Format and filter file to CSV table
GNU awk
awk -F: -v OFS=, '
/^at/ {                                    # schedule line: "at <time> <name> <number>"
    split($0, f, " ")
    time   = f[2]
    course = f[3] " " f[4]
    times[course] = times[course] OFS time
}
$2 == "oftheory"     {th[course] = th[course] OFS $(NF-1)}
$2 == "ofapplicaton" {ap[course] = ap[course] OFS $(NF-1)}
END {
    PROCINFO["sorted_in"] = "@ind_str_asc"  # GNU awk: iterate courses in sorted order
    for (c in times) {
        printf "%s%s\n", c, times[c]
        printf "application%s\n", ap[c]
        printf "theory%s\n", th[c]
        print ""
    }
}
' file
carl 1,10:00,14:00
application,onehour,twohours
theory,nothing,nothing
carl 2,10:00,14:00
application,twohour,twohours
theory,math,music
david 1,10:00,14:00
application,halfhour,onehours
theory,geo,programmation
david 2,10:00,14:00
application,nothing,nothing
theory,history,philosophy
Filter CSV files for specific value before importing
library(data.table)  # provides fread() and rbindlist()

setwd("E:/Data/")
files <- list.files(path = "E:/Data/", pattern = "\\.csv$")
temp <- lapply(files, function(x) subset(fread(x, sep = ",", fill = TRUE, integer64 = "numeric", header = FALSE), V1 == "aa"))
DF <- rbindlist(temp)
Untested, but this will probably work - replace your function call with an anonymous function.
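The same read-then-filter-then-bind pattern translates to Python with pandas; a minimal sketch, assuming the files sit in one folder, have no header row, and the filter value lives in the first column (the folder path and the value `"aa"` are placeholders, like in the R version):

```python
import glob
import os

import pandas as pd

def load_filtered(folder, value):
    """Read every .csv in folder (no header), keep only the rows whose
    first column equals value, then concatenate the surviving rows."""
    frames = []
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        df = pd.read_csv(path, header=None)
        frames.append(df[df[0] == value])
    return pd.concat(frames, ignore_index=True)
```

Filtering each file immediately after reading it, as the `subset()` inside `lapply()` does, keeps only the matching rows in memory instead of the full concatenated data.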
How to filter columns within a .CSV file and then save those filtered columns to a new .CSV file in Python?
This can be done quickly with pandas:
import pandas as pd

weather_data = pd.read_csv('Data.csv')
filtered_weather = weather_data[['Column_1', 'Column_2']]  # select the column names that you want
filtered_weather.to_csv('new_file.csv', index=False)
How to filter a csv file by date column using awk when the filter's date format does not match the column's date format?
You can use a regex to match the start of your field, i.e. match the first 10 characters (YYYY-MM-DD) of the field.
today=$(date '+%Y-%m-%d')
awk -v regex="^$today" -F';' '$25 ~ regex' input.csv > today.csv
This passes the value of the $today variable to awk with -v and prepends a ^ so the regex matches at the start of the field.
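The same prefix match can be sketched in Python with the standard csv module, under the same assumptions as the awk one-liner: a semicolon delimiter and the date in the 25th column (index 24):

```python
import csv
from datetime import date

def rows_for_today(path, col=24, delimiter=";"):
    """Yield the rows whose date column starts with today's
    date in YYYY-MM-DD form, ignoring any trailing time part."""
    today = date.today().isoformat()  # e.g. "2024-05-01"
    with open(path, newline="") as f:
        for row in csv.reader(f, delimiter=delimiter):
            if len(row) > col and row[col].startswith(today):
                yield row
```

As with the awk version, matching on the fixed-width date prefix sidesteps having to parse whatever timestamp format the column actually uses.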
Filter csv files and create a new set of .csv files with the data
Edit: Added a target folder and a cd command for a source folder.
This works here - test it on some sample files.
@echo off
setlocal enabledelayedexpansion
set "target=d:\target\folder"
cd /d "c:\source\folder"
for /L %%a in (101,1,148) do (
set num=%%a
del "%target%\-!num:~-2!.csv" 2>nul
>"%target%\-!num:~-2!.csv.txt" echo Code,type,head,file,make,run,style,line,edge,model,letter,status
)
for %%a in (*.csv) do (
for /f "skip=1 usebackq delims=" %%b in ("%%a") do (
for /f "tokens=1,2 delims=-," %%c in ("%%b") do (
set "line=%%c"
if /i "!line:~0,2!"=="HH" >> "%target%\-%%d.csv.txt" echo %%b
)
)
)
ren "%target%\*.csv.txt" *.
pause
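The routing logic in the batch script reads more directly in Python; a sketch under the assumption that each data row starts with a code like HHxxx-NN and should land in a per-suffix file named -NN.csv (folder names, the HH prefix test, and the header row mirror the batch version):

```python
import csv
import glob
import os

HEADER = ["Code", "type", "head", "file", "make", "run",
          "style", "line", "edge", "model", "letter", "status"]

def split_by_suffix(src_folder, target_folder):
    """Append rows whose first field starts with 'HH' to a per-suffix
    CSV in target_folder, e.g. 'HH123-45,...' goes to '-45.csv'.
    Output files are created lazily, each starting with HEADER."""
    files, writers = {}, {}
    for path in sorted(glob.glob(os.path.join(src_folder, "*.csv"))):
        with open(path, newline="") as f:
            reader = csv.reader(f)
            next(reader, None)  # skip the header line, like skip=1
            for row in reader:
                if not row or not row[0].upper().startswith("HH"):
                    continue
                if "-" not in row[0]:
                    continue
                suffix = row[0].split("-", 1)[1]
                if suffix not in files:
                    out = open(os.path.join(target_folder, f"-{suffix}.csv"),
                               "w", newline="")
                    files[suffix] = out
                    writers[suffix] = csv.writer(out)
                    writers[suffix].writerow(HEADER)
                writers[suffix].writerow(row)
    for out in files.values():
        out.close()
```

Unlike the batch script, this creates output files only for suffixes that actually occur, so there are no empty leftovers to delete afterwards.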
CSV Filtering a column with mixed data types
If your CSV data is non-trivial, with things like commas inside quoted fields, a tool that's aware of the format is a better option than trying to use awk or the like on it.
Example perl one-liner using the Text::CSV_XS module (install via your OS package manager or favorite CPAN client):
$ perl -MText::CSV_XS=csv -e 'csv(in => \*STDIN, filter => { 4 => sub { ! $seen{$_}++ }})' < input.csv
71508050,"HUNT, RICHARD F"," ","1009 # B FATHOM DR"
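The same keep-first-occurrence filter can be sketched with Python's format-aware csv module; the perl filter keys on field 4 (1-based), which becomes index 3 here:

```python
import csv
import sys

def unique_by_column(rows, col=3):
    """Yield only the first row seen for each distinct value in
    column col, mirroring the ! $seen{$_}++ idiom in the perl filter."""
    seen = set()
    for row in rows:
        if len(row) > col and row[col] not in seen:
            seen.add(row[col])
            yield row

# Usage (stream stdin to stdout, like the one-liner):
#   writer = csv.writer(sys.stdout)
#   for row in unique_by_column(csv.reader(sys.stdin)):
#       writer.writerow(row)
```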
Creating a user-input filters on csv file that contains large data
The pandas library in Python lets you view and manipulate CSV data. The solution below imports pandas, reads the CSV into a dataframe with the read_csv() function, asks for the input values (State and Crime type are read as strings, Year is cast to int), then applies a query to filter the dataframe. The query requires all three conditions to hold, and the string matches are case-insensitive so lowercase input works too.
In [125]: import pandas as pd
In [126]: df = pd.read_csv('test.csv')
In [127]: df
Out[127]:
State Crime type Occurrences Year
0 CALIFORNIA ROBBERY 12 1999
1 CALIFORNIA ASSAULT 45 2003
2 NEW YORK ARSON 9 1999
In [128]: state = str(input("Enter State: "))
Enter State: California
In [129]: crime_type = str(input("Enter Crime Type: "))
Enter Crime Type: robbery
In [130]: year = int(input("Enter Year: "))
Enter Year: 1999
In [131]: df.loc[lambda x:(x['State'].str.lower().str.contains(state.lower()))
...: & (x['Crime type'].str.lower().str.contains(crime_type.lower())) & (x
...: ['Year'] == year)]
Out[131]:
State Crime type Occurrences Year
0 CALIFORNIA ROBBERY 12 1999
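The interactive session above can be folded into a small reusable function; a sketch assuming the same column names as the sample data:

```python
import pandas as pd

def filter_crimes(df, state, crime_type, year):
    """Case-insensitive substring match on State and Crime type,
    exact match on Year - all three conditions must hold."""
    return df.loc[
        df["State"].str.lower().str.contains(state.lower())
        & df["Crime type"].str.lower().str.contains(crime_type.lower())
        & (df["Year"] == int(year))
    ]
```

Wrapping the query this way keeps the input() prompts separate from the filtering logic, which also makes the filter easy to test on its own.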