Format and Filter File to CSV Table

Format and filter file to CSV table

Using GNU awk:

awk -F: -v OFS=, '
  # Lines beginning with "at": take the time (2nd word) and the course name
  # (3rd and 4th words) and append the time to that course's comma-separated list.
  /^at/ {
    split($0, f, " ")
    time = f[2]
    course = f[3] " " f[4]
    times[course] = times[course] OFS time
  }
  # For theory/application lines, append the next-to-last colon-separated field
  # to the current course's list.
  $2 == "oftheory"     {th[course] = th[course] OFS $(NF-1)}
  $2 == "ofapplicaton" {ap[course] = ap[course] OFS $(NF-1)}
  END {
    PROCINFO["sorted_in"] = "@ind_str_asc"   # GNU awk: loop over courses in sorted order
    for (c in times) {
      printf "%s%s\n", c, times[c]
      printf "application%s\n", ap[c]
      printf "theory%s\n", th[c]
      print ""
    }
  }
' file

Output:
carl 1,10:00,14:00
application,onehour,twohours
theory,nothing,nothing

carl 2,10:00,14:00
application,twohour,twohours
theory,math,music

david 1,10:00,14:00
application,halfhour,onehours
theory,geo,programmation

david 2,10:00,14:00
application,nothing,nothing
theory,history,philosophy

Filter CSV files for specific value before importing

setwd("E:/Data/")
files <- list.files(path = "E:/Data/",pattern = "*.csv")
temp <- lapply(files, function(x) subset(fread(x, sep=",", fill=TRUE, integer64="numeric",header=FALSE), V1=="aa"))
DF <- rbindlist(temp)

Untested, but this will probably work: replace your function call with an anonymous function that applies subset() to each file as it is read.
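For comparison, here is a rough pandas sketch of the same filter-before-binding idea; the folder E:/Data/, the value "aa", and header=None (mirroring header=FALSE) are taken from the R snippet above, and the first column is addressed by position since the files have no header:

import glob
import pandas as pd

files = glob.glob("E:/Data/*.csv")
# Filter each file as it is read (first column equal to "aa"), then concatenate
# only the already-filtered pieces: the pandas analogue of lapply() + rbindlist().
parts = [df[df[0] == "aa"] for df in (pd.read_csv(f, header=None) for f in files)]
DF = pd.concat(parts, ignore_index=True)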

How to filter columns within a .CSV file and then save those filtered columns to a new .CSV file in Python?

This can be done quickly using pandas:

import pandas as pd

weather_data = pd.read_csv('Data.csv')
filtered_weather = weather_data[['Column_1', 'Column_2']]  # select the column names that you want
filtered_weather.to_csv('new_file.csv', index=False)       # write only those columns to a new CSV
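If the file is large, pandas can also skip loading the unwanted columns entirely via the usecols argument of read_csv (same placeholder column names as above):

import pandas as pd

# Read only the columns of interest; the other columns are never loaded into memory.
filtered_weather = pd.read_csv('Data.csv', usecols=['Column_1', 'Column_2'])
filtered_weather.to_csv('new_file.csv', index=False)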

How to filter a csv file by a date column using awk when the date format of the constraint does not match the date format of the column?

You can use a regex that matches the start of your field, i.e., only the first 10 characters (YYYY-MM-DD) of the field.

today=$(date '+%Y-%m-%d')
awk -v regex="^$today" -F';' '$25 ~ regex' input.csv > today.csv

This passes the value of the $today variable with -v to awk and prepends a ^ to match the start of the field.
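If awk is not a requirement, here is a rough Python sketch of the same prefix match, assuming the same ';' delimiter and the date in field 25 (index 24) as in the command above:

import csv
from datetime import date

today = date.today().isoformat()  # YYYY-MM-DD, like date '+%Y-%m-%d'

with open('input.csv', newline='') as src, open('today.csv', 'w', newline='') as dst:
    reader = csv.reader(src, delimiter=';')
    writer = csv.writer(dst, delimiter=';')
    for row in reader:
        # Keep the row when the 25th field starts with today's date.
        if len(row) >= 25 and row[24].startswith(today):
            writer.writerow(row)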

Filter csv files and create a new set of .csv files with the data

Edit: Added a target folder and a cd command for a source folder.

This works here - test it on some sample files.

@echo off
setlocal enabledelayedexpansion
set "target=d:\target\folder"
cd /d "c:\source\folder"

rem Create one output file per two-digit suffix (01..48), containing only the header row.
for /L %%a in (101,1,148) do (
    set "num=%%a"
    del "%target%\-!num:~-2!.csv" 2>nul
    >"%target%\-!num:~-2!.csv.txt" echo Code,type,head,file,make,run,style,line,edge,model,letter,status
)

rem For every source .csv: skip its header line, split each data line on "-" and ",",
rem and append lines whose first token starts with "HH" to the file named by the second token.
for %%a in (*.csv) do (
    for /f "skip=1 usebackq delims=" %%b in ("%%a") do (
        for /f "tokens=1,2 delims=-," %%c in ("%%b") do (
            set "line=%%c"
            if /i "!line:~0,2!"=="HH" >>"%target%\-%%d.csv.txt" echo %%b
        )
    )
)

rem Drop the trailing .txt so the finished files end in .csv
ren "%target%\*.csv.txt" *.
pause
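For reference, a rough Python sketch of the same splitting logic; the folder names, the 101-148 range, the header row, and the HH prefix are taken from the batch script above, and the split on "-" and "," mirrors delims=-,:

import re
from pathlib import Path

source = Path("c:/source/folder")
target = Path("d:/target/folder")
header = "Code,type,head,file,make,run,style,line,edge,model,letter,status"

# One output file per two-digit suffix 01..48, starting with just the header row.
for n in range(101, 149):
    (target / "-{}.csv".format(str(n)[-2:])).write_text(header + "\n")

for src in source.glob("*.csv"):
    with src.open() as rows:
        next(rows, None)                       # skip the source file's header line
        for line in rows:
            line = line.rstrip("\n")
            tokens = re.split(r"[-,]+", line)  # first two tokens, split on "-" or ","
            if len(tokens) >= 2 and tokens[0].upper().startswith("HH"):
                with (target / "-{}.csv".format(tokens[1])).open("a") as out:
                    out.write(line + "\n")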

CSV Filtering a column with mixed data types

If your CSV data is non-trivial, with things like commas inside quoted fields, a tool that's aware of the format is a better option than trying to use awk or the like on it.

Example Perl one-liner using the Text::CSV_XS module (install it via your OS package manager or favorite CPAN client):

$ perl -MText::CSV_XS=csv -e 'csv(in => \*STDIN, filter => { 4 => sub { ! $seen{$_}++ }})' < input.csv
71508050,"HUNT, RICHARD F"," ","1009 # B FATHOM DR"
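The same format-aware approach is easy to sketch with Python's csv module (reading from stdin and, as in the one-liner above, keeping only the first row seen for each value of the 4th field):

import csv
import sys

seen = set()
reader = csv.reader(sys.stdin)   # quote-aware: commas inside quoted fields stay in one field
writer = csv.writer(sys.stdout)

for row in reader:
    key = row[3]                 # 4th field, matching the { 4 => ... } filter
    if key not in seen:          # keep only the first occurrence of each value
        seen.add(key)
        writer.writerow(row)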

Creating user-input filters on a CSV file that contains a large amount of data

The pandas library in Python lets you view and manipulate CSV data. The solution below imports pandas, reads the CSV into a DataFrame with read_csv(), and then asks for the input values: State and Crime type are read as strings (str) and Year as an integer (int). It then applies a simple query to filter the rows you need from the DataFrame, built so that all three conditions must be met and so that the input strings may be lowercase.

In [125]: import pandas as pd
In [126]: df = pd.read_csv('test.csv')

In [127]: df
Out[127]:
        State Crime type  Occurrences  Year
0  CALIFORNIA    ROBBERY           12  1999
1  CALIFORNIA    ASSAULT           45  2003
2    NEW YORK      ARSON            9  1999

In [128]: state = str(input("Enter State: "))
Enter State: California

In [129]: crime_type = str(input("Enter Crime Type: "))
Enter Crime Type: robbery

In [130]: year = int(input("Enter Year: "))
Enter Year: 1999

In [131]: df.loc[lambda x:(x['State'].str.lower().str.contains(state.lower()))
...: & (x['Crime type'].str.lower().str.contains(crime_type.lower())) & (x
...: ['Year'] == year)]
Out[131]:
        State Crime type  Occurrences  Year
0  CALIFORNIA    ROBBERY           12  1999

