Split Delimited File into Smaller Files by Column

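#!/bin/bash

# Require exactly two arguments: the input file and the columns per split.
(($# == 2)) || { echo -e "\nUsage: $0 <file to split> <# columns in each split>\n" >&2; exit 1; }

infile="$1"
inc=$2

# Count tab-separated columns in the first row (cut also splits on tabs by default).
ncol=$(awk -F'\t' 'NR==1{print NF; exit}' "$infile")

((inc < ncol)) || { echo -e "\nSplit size >= number of columns\n" >&2; exit 1; }

# Number of output files, rounded up so no empty trailing file is created
# when ncol is an exact multiple of inc.
nfiles=$(( (ncol + inc - 1) / inc ))

for ((i=0, start=1, end=inc; i < nfiles; i++, start+=inc, end+=inc)); do
    cut -f"$start-$end" "$infile" > "${infile}.$i"
done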
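
The range arithmetic behind the loop can be checked on its own. Below is a minimal Python sketch, assuming a hypothetical helper named `column_ranges` (it is not part of the script above), that lists the 1-based inclusive field ranges one would pass to `cut -f`:

```python
def column_ranges(ncol, inc):
    """Return 1-based inclusive (start, end) column ranges of width `inc`.

    Hypothetical helper for illustration; the last range is clipped to `ncol`.
    """
    if inc <= 0:
        raise ValueError("inc must be positive")
    return [(start, min(start + inc - 1, ncol))
            for start in range(1, ncol + 1, inc)]


if __name__ == "__main__":
    # 7 columns, 3 per split -> [(1, 3), (4, 6), (7, 7)]
    print(column_ranges(7, 3))
    # 6 columns, 3 per split -> [(1, 3), (4, 6)]
    print(column_ranges(6, 3))
```

Note that a column count that is an exact multiple of the split size yields exactly `ncol / inc` ranges, so no empty trailing file is needed.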

Split one file into multiple files according to columns using bash cut or awk

With awk:

awk -F '[\t;]' '{for(i=1; i<=NF; i++) print $i >> ("column" i ".txt")}' file

This uses tab and semicolon as field separators. NF holds the number of fields in the current row, $i is the content of the current field, and i is its column number. The parentheses around the output file name keep the concatenation unambiguous across awk implementations.

This creates 11 files. column11.txt contains:


k
p
k
k
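
For comparison, the same per-column grouping can be sketched in Python. This is only an illustration: the function name `split_columns` and the sample input are assumptions, not taken from the original question, and the sketch returns the columns in memory instead of writing `columnN.txt` files.

```python
import re
from collections import defaultdict


def split_columns(lines):
    """Group field values by 1-based column index, splitting on tab or ';'."""
    columns = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # awk sees NF == 0 on an empty line and prints nothing
        for i, field in enumerate(re.split(r"[\t;]", line), start=1):
            columns[i].append(field)
    return dict(columns)


if __name__ == "__main__":
    sample = ["a\tb;c\n", "d\te;f\n"]  # made-up input for illustration
    print(split_columns(sample))
    # {1: ['a', 'd'], 2: ['b', 'e'], 3: ['c', 'f']}
```

Each dictionary entry corresponds to one output file of the awk command, with one value per input row that has that column.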

Split file into multiple files (by columns)

To summarise my comments, I suggest something like this (untested as I have no sample file):

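# Number of columns to extract: all but the last two fields of the first row
NM=$(awk 'NR==1{print NF-2}' file.txt)
echo "$NM"

for (( i=1; i <= NM; i++ ))
do
    echo "$i"
    # Pass the column number with -v instead of splicing it into the awk program
    awk -v col="$i" '{print $col}' file.txt > "tmpgrid_0${i}.dat"
done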

How to split a huge .CSV file into n smaller files when an index in a particular column changes?

Something like this might be sufficient to split the CSV file into smaller files, one per distinct value in the first column:

awk -F, '{ print >> ($1".part.csv") }' file.csv

Breakdown

# awk iterates over each line in the specified input file
awk -F, # tell awk to split the line into columns on ","
'{ print # print the whole line
>> # append to file
($1".part.csv") }' # output file name is the first column followed by ".part.csv"
file.csv # input file
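
The grouping that the awk one-liner performs can also be sketched in Python. The function name `group_rows_by_first_column` and the sample data are hypothetical. One difference to be aware of: the `csv` module honours quoted fields that contain commas, which `awk -F,` does not.

```python
import csv
import io
from collections import defaultdict


def group_rows_by_first_column(csv_text):
    """Map each first-column value to the list of rows that carry it."""
    groups = defaultdict(list)
    for row in csv.reader(io.StringIO(csv_text)):
        if row:  # skip blank lines
            groups[row[0]].append(row)
    return dict(groups)


if __name__ == "__main__":
    sample = "1,a\n1,b\n2,c\n"  # made-up data for illustration
    for key, rows in group_rows_by_first_column(sample).items():
        # Each key would become one output file, e.g. "1.part.csv"
        print(f"{key}.part.csv", rows)
```

Holding every group in memory is fine for a sketch; for a genuinely huge file, appending each row to its output file as it is read (as the awk version does) avoids that cost.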

How can I split a large tab delimited text file into separate files by date field and use the date in the file name

Split the line into a variable first, then use the DateTime class to parse and reformat the date:

$fields    = $line -split '\t'
$namePart1 = $fields[2]
$date = [DateTime]::ParseExact($fields[0], 'M\/d\/yyyy H\:m', [CultureInfo]::InvariantCulture)
$namePart2 = $date.ToString('yyyyMM')
$newFile = "${namePart1}_$namePart2"
  • In the ParseExact() call, the 2nd argument specifies a custom date and time format. Certain characters like / and : are special characters which must be \-escaped to use them as literal characters.
  • In "${namePart1}_$namePart2", the curly braces are used to make sure the underscore is not interpreted as part of the variable name.
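
The same parse-and-reformat step can be illustrated in Python. This is a sketch under assumptions: the helper name `output_name` and the sample line are made up, and the input is assumed to carry the date in the first field and the name part in the third, as in the PowerShell snippet.

```python
from datetime import datetime


def output_name(line):
    """Build '<field 3>_<yyyyMM>' from a tab-delimited line whose first field is a date."""
    fields = line.rstrip("\n").split("\t")
    # %m, %d, %H and %M also accept values without a leading zero when parsing.
    date = datetime.strptime(fields[0], "%m/%d/%Y %H:%M")
    return f"{fields[2]}_{date:%Y%m}"


if __name__ == "__main__":
    print(output_name("3/7/2021 9:05\tx\tsales\n"))  # sales_202103
```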

How to split text file with Pipe delimiter using Python and then pick columns based on condition?

You can store the split lines in a dictionary and build a pandas Series from it:

import pandas as pd

output_dict = {}
with open("file.txt", "r") as f:
    for line in f:
        fields = line.strip("\n").split('|')
        # The second field names the value type and decides which field(s) to keep.
        if fields[1] == "Number":
            output_dict[fields[0]] = fields[2]
        elif fields[1] == "Text":
            output_dict[fields[0]] = fields[3]
        elif fields[1] == "Columns":
            # fields[2] holds the number of values that follow, starting at fields[4].
            output_dict[fields[0]] = fields[4:4 + int(fields[2])]

#print(output_dict)

series = pd.Series(output_dict)
print(series.explode())

Output:

Attribute1                7
Attribute2 "sample text"
Attribute3 "data1"
Attribute3 "data2"
Attribute3 "data3"
Attribute3 "data4"

