Split delimited file into smaller files by column
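#!/bin/bash
# Split a tab-delimited file into files of <inc> columns each.
(($# == 2)) || { echo -e "\nUsage: $0 <file to split> <# columns in each split>\n\n"; exit 1; }
infile="$1"
inc=$2
# Count the columns on the first line; use tab as separator to match cut's default.
ncol=$(awk -F'\t' 'NR==1{print NF}' "$infile")
((inc < ncol)) || { echo -e "\nSplit size >= number of columns\n\n"; exit 1; }
# Loop while start <= ncol so that no empty trailing file is written
# when ncol is an exact multiple of inc.
for ((i=0, start=1, end=inc; start <= ncol; i++, start+=inc, end+=inc)); do
    cut -f"$start-$end" "$infile" > "${infile}.$i"
done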
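For comparison, the same chunking idea can be sketched in Python. This is only an illustration, not part of the original answer: the function names `column_ranges` and `split_by_columns` are hypothetical, the input is assumed to be non-empty and tab-delimited, and the whole file is read into memory for brevity.

```python
def column_ranges(ncol, inc):
    """Return 1-based inclusive (start, end) ranges of width inc covering ncol columns."""
    return [(start, min(start + inc - 1, ncol)) for start in range(1, ncol + 1, inc)]


def split_by_columns(path, inc, sep="\t"):
    """Write path.0, path.1, ... each holding at most inc columns of the input.

    Assumes a non-empty file; the column count is taken from the first line,
    as in the shell script.
    """
    with open(path) as f:
        rows = [line.rstrip("\r\n").split(sep) for line in f]
    ncol = len(rows[0])
    outputs = []
    for i, (start, end) in enumerate(column_ranges(ncol, inc)):
        out_path = f"{path}.{i}"
        with open(out_path, "w") as out:
            for row in rows:
                # Slices are 0-based and end-exclusive, hence start - 1.
                out.write(sep.join(row[start - 1:end]) + "\n")
        outputs.append(out_path)
    return outputs
```

Computing the ranges separately makes the boundary case easy to check: five columns split by two gives the ranges (1, 2), (3, 4) and (5, 5), so the last file simply holds fewer columns.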
split one file into multiple files according to columns using bash cut or awk
With awk:
awk -F '[\t;]' '{for(i=1; i<=NF; i++) print $i >> ("column" i ".txt")}' file
This uses tab and semicolon as field separators. NF contains the number of fields in the current row, which is also the number of its last column. $i contains the content of the current column, and i is the number of the current column.
This creates 11 files. column11.txt contains:
k
p
k
k
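A rough Python counterpart of the awk one-liner, as a sketch only: the function name `split_columns` and its `prefix` parameter are hypothetical. It splits on tab or semicolon, appends to one file per column (mirroring `>>`), and skips empty lines, for which awk would report zero fields.

```python
import re


def split_columns(path, prefix="column", pattern=r"[\t;]"):
    """Append field i of every line to '<prefix><i>.txt'; return the column numbers written."""
    sep = re.compile(pattern)
    handles = {}
    try:
        with open(path) as f:
            for line in f:
                text = line.rstrip("\r\n")
                if not text:
                    continue  # awk sees NF == 0 here and writes nothing
                for i, value in enumerate(sep.split(text), start=1):
                    if i not in handles:
                        # Append mode, like awk's ">>" redirection.
                        handles[i] = open(f"{prefix}{i}.txt", "a")
                    handles[i].write(value + "\n")
    finally:
        for handle in handles.values():
            handle.close()
    return sorted(handles)
```

Like the awk version, this keeps one output file open per column, so a very wide input could run into the per-process limit on open files.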
split file into multiple files (by columns)
To summarise my comments, I suggest something like this (untested as I have no sample file):
NM=$(awk 'NR==1{print NF-2}' file.txt)
echo "$NM"
for (( i=1; i <= NM; i++ ))
do
    echo "$i"
    # Pass the column number with -v instead of splicing it into the awk program
    awk -v col="$i" '{print $col}' file.txt > "tmpgrid_0${i}.dat"
done
How to split a huge .CSV file into n smaller files when an index in a particular column changes?
Something like this might be sufficient to split the csv file into smaller files each grouped by the first column in the csv:
awk -F, '{ print >> ($1".part.csv") }' file.csv
Breakdown
# awk iterates over each line in the specified input file
awk -F, # tell awk to split the line into columns on ","
'{ print # print whole line
>> # append to file
($1".part.csv") }' # output file is the first column suffixed with ".part.csv"
file.csv # input file
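The same grouping can be sketched in Python. This is an illustration under assumptions, not the answer's method: the function name `split_by_first_column` and the `out_dir` parameter are hypothetical, and fields are split on a plain comma just as `awk -F,` does, so quoted fields containing commas are not handled. Only one output file is open at a time, and it is switched whenever the key in the first column changes.

```python
import os


def split_by_first_column(path, suffix=".part.csv", out_dir="."):
    """Append each line to '<first column><suffix>'; return the output paths in first-seen order."""
    current_key = None
    out = None
    written = []
    try:
        with open(path) as f:
            for line in f:
                text = line.rstrip("\r\n")
                key = text.split(",", 1)[0]
                if key != current_key:
                    # The key changed: close the previous file and switch.
                    if out is not None:
                        out.close()
                    out_path = os.path.join(out_dir, key + suffix)
                    out = open(out_path, "a")  # append, like awk's ">>"
                    current_key = key
                    if out_path not in written:
                        written.append(out_path)
                out.write(text + "\n")
    finally:
        if out is not None:
            out.close()
    return written
```

Because files are opened in append mode, a key that reappears later in the input still ends up in the same file, and the number of simultaneously open files stays at one regardless of how many distinct keys there are.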
How can I split a large tab delimited text file into separate files by date field and use the date in the file name
Split the line into a variable first, then use the DateTime class to parse and reformat the date:
$fields = $line -split '\t'
$namePart1 = $fields[2]
$date = [DateTime]::ParseExact($fields[0], 'M\/d\/yyyy H\:m', [CultureInfo]::InvariantCulture)
$namePart2 = $date.ToString('yyyyMM')
$newFile = "${namePart1}_$namePart2"
- In the ParseExact() call, the 2nd argument specifies a custom date and time format. Certain characters like / and : are special characters which must be \-escaped to use them as literal characters.
- In "${namePart1}_$namePart2", the curly braces are used to make sure the underscore is not interpreted as part of the variable name.
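For readers working outside PowerShell, the parse-and-reformat step might look like this in Python. It is a sketch under the same assumptions as the snippet above (date in the first tab-separated field, name in the third); the function name `output_name` is hypothetical.

```python
from datetime import datetime


def output_name(line):
    """Build '<third field>_<yyyyMM>' from a tab-delimited line whose first field is a date."""
    fields = line.rstrip("\r\n").split("\t")
    name_part1 = fields[2]
    # strptime also accepts non-zero-padded values for %m, %d, %H and %M,
    # which corresponds to the 'M/d/yyyy H:m' format used with ParseExact().
    date = datetime.strptime(fields[0], "%m/%d/%Y %H:%M")
    return f"{name_part1}_{date:%Y%m}"
```

In `strptime` format strings, `/` and `:` are ordinary literal characters, so no escaping is needed here.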
How to split text file with Pipe delimiter using Python and then pick columns based on condition?
You can store your split lines in a dictionary and make a Series out of it:
import pandas as pd

output_dict = {}
with open("file.txt", "r") as f:
    for line in f:
        fields = line.strip("\n").split('|')
        if fields[1] == "Number":
            output_dict[fields[0]] = fields[2]
        elif fields[1] == "Text":
            output_dict[fields[0]] = fields[3]
        elif fields[1] == "Columns":
            # fields[2] holds how many values to take, starting at index 4
            output_dict[fields[0]] = fields[4:4 + int(fields[2])]

series = pd.Series(output_dict)
# explode() puts each element of a list value on its own row
print(series.explode())
Output:
Attribute1 7
Attribute2 "sample text"
Attribute3 "data1"
Attribute3 "data2"
Attribute3 "data3"
Attribute3 "data4"