Splitting a File Using Awk on Mac OS X

Splitting a file using AWK on Mac OS X

You can fix the script from the question by building the output filename in a variable first:

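awk '/SEPARATOR/{close(filename); n++} {filename = "part" n ".txt"; print > filename}' in.txt

Each SEPARATOR line starts a new output file (part1.txt, part2.txt, ...). Closing the previous file first means only one output file is open at a time, which matters for awk implementations that limit the number of open files.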
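To check the behavior, here is a small run on a hypothetical in.txt, in a scratch directory. It uses the variant that calls close() on the previous part before opening the next one.

```shell
# Work in a scratch directory so the demo files don't clutter the current one.
cd "$(mktemp -d)" || exit 1

# Hypothetical input: each SEPARATOR line marks the start of a new part.
printf 'SEPARATOR\na\nb\nSEPARATOR\nc\n' > in.txt

# Build the filename in a variable, close the previous part, print to the current one.
awk '/SEPARATOR/{close(filename); n++} {filename = "part" n ".txt"; print > filename}' in.txt

cat part1.txt   # SEPARATOR, a, b
cat part2.txt   # SEPARATOR, c
```

Each part begins with its own SEPARATOR line. Any lines before the first separator would end up in part.txt, because n is still empty at that point.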

Splitting text file and adding line count in header with awk on OS X

With GNU Awk or Mawk:

awk -v RS='\nB       \\*        -                     \\|[0-9]+\\|\n' 'NF {
  numLines = gsub("(^|\n)>", "\n") # replace line-initial ">" and count lines in block
  fname = "part" ++n # determine next output filename
  printf "%s%s\n", numLines " 120", $0 > fname # output header + block
  close(fname) # close output file
}' file

Note: Unless the last line in the input file is a separator line, the last output file will have a trailing empty line (the data-line count in the header will still be correct); the OP has confirmed that this is not a problem.

  • GNU Awk or Mawk is needed because, unlike the BSD awk that comes with macOS, they support multi-character, regex-based RS (input-record separator) values. It is possible to solve this problem differently, but it would be a little more cumbersome.

    • Both GNU Awk and Mawk can be installed on macOS via the Homebrew package manager: with Homebrew installed, run brew install gawk or brew install mawk.
  • The approach splits the input into blocks of lines at the B separator lines. Thus, each block must fit into memory as a whole (presumably two copies at once, due to the string substitution).

  • Having the whole block of lines in memory before writing it to the output file is what allows counting the lines up front and putting that count in the header.

    • numLines = gsub("(^|\n)>", "\n") both removes the line-initial > characters and determines the number of lines in the block, taking advantage of the fact that gsub() returns the number of substitutions made.
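For the stock macOS awk, one of the "more cumbersome" alternatives is to read line by line and buffer each block in a variable until a separator line appears. The sketch below uses only POSIX awk features, so it should also run with BSD awk. The separator pattern and the sample input are assumptions modeled on the RS value above (the real spacing may differ). As with the gsub() version, the header counts the lines that started with ">", using the return value of sub().

```shell
# Work in a scratch directory so the demo files don't clutter the current one.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample input: data lines start with ">", and a line such as
# "B * - |1|" separates the blocks.
printf '>a\n>b\nB * - |1|\n>c\n' > file

awk '
# Write the buffered block to the next output file, preceded by the count header.
function flush() {
  if (block == "") return
  n++
  fname = "part" n
  printf "%d 120\n%s", count, block > fname
  close(fname)
  block = ""; count = 0
}
# A separator line ends the current block.
/^B[ \t]+[*][ \t]+-[ \t]+[|][0-9]+[|]$/ { flush(); next }
# Any other line: strip a leading ">" (sub returns 1 if it did) and buffer it.
{
  count += sub(/^>/, "")
  block = block $0 "\n"
}
# The last block has no separator after it.
END { flush() }
' file
```

Like the RS-based version, this still holds one block in memory at a time, but it does not produce the trailing empty line in the last file.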

Using awk to split CSV file by column

I've resolved this. Following the logic of this thread, I checked my line endings with the file command and learned that the file had old-style Mac (CR) line terminators. I opened my input CSV file in TextWrangler and saved it again with Unix-style line endings. Once I did that, my awk command worked as expected: it took about 5 seconds to create 63 new CSV files broken out by date.
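The same fix can be done from the command line: tr can turn the CR line terminators into LF before awk sees the file. The original awk command is not shown here, so the split step below is only a sketch. It assumes, hypothetically, that the date is in the first column and that there is no header row. Note that awk keeps each output file open, and some awk implementations may limit how many files can be open at once.

```shell
# Work in a scratch directory so the demo files don't clutter the current one.
cd "$(mktemp -d)" || exit 1

# Hypothetical input.csv with old-style Mac (CR-only) line endings.
printf '2014-01-01,a\r2014-01-02,b\r2014-01-01,c\r' > input.csv

# `file input.csv` reports CR line terminators for a file like this one.
# Convert every CR to LF, equivalent to re-saving with Unix line endings.
tr '\r' '\n' < input.csv > input_unix.csv

# Split by the first column (assumed to hold the date): one CSV file per date.
awk -F, '{ out = $1 ".csv"; print > out }' input_unix.csv
```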

Rename output file using the split command on Mac OS X

If you check man split on macOS, you'll find that the --additional-suffix=SUFFIX option is not supported by this (BSD) version; it is an extension of GNU split.

To achieve what I understand you want, you'll need an Automator workflow or a shell script, e.g.:

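#!/bin/bash
# Uses bash features ($'\n', +=, ${var:offset:length}), so run it with bash rather than sh.

DONE=false
until $DONE; do
    # Collect up to 16 non-empty lines of the input file into $lines.
    for i in $(seq 1 16); do
        IFS= read -r line || DONE=true
        [ -z "$line" ] && continue
        lines+=$line$'\n'
    done
    # Drop the last 10 characters of the collected block.
    ratio=${lines::${#lines}-10}
    (cat "Ratio"; echo "$ratio .txt";)
    #echo "--- DONE SPLITTING ---";
    lines=
done < "$1"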
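If the goal is simply split output files that end in .txt, another option is to let split do the splitting and add the suffix afterwards with mv. This is a sketch: the 16-line piece size mirrors the script above, and input.txt and the part_ prefix are hypothetical names. split -l is supported by both BSD and GNU split.

```shell
# Work in a scratch directory so the demo files don't clutter the current one.
cd "$(mktemp -d)" || exit 1

# Hypothetical 20-line input file.
seq 1 20 > input.txt

# 16 lines per piece, written to part_aa, part_ab, ...
split -l 16 input.txt part_

# Add the .txt suffix afterwards, since BSD split has no --additional-suffix.
for f in part_*; do
  mv "$f" "$f.txt"
done
```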

How can I split a large text file into smaller files with an equal number of lines?

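Have a look at the split command. The help text below is from GNU split (coreutils); the BSD split that ships with macOS has no long options, but it does accept -l: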

$ split --help
Usage: split [OPTION] [INPUT [PREFIX]]
Output fixed-size pieces of INPUT to PREFIXaa, PREFIXab, ...; default
size is 1000 lines, and default PREFIX is `x'. With no INPUT, or when INPUT
is -, read standard input.

Mandatory arguments to long options are mandatory for short options too.
  -a, --suffix-length=N   use suffixes of length N (default 2)
  -b, --bytes=SIZE        put SIZE bytes per output file
  -C, --line-bytes=SIZE   put at most SIZE bytes of lines per output file
  -d, --numeric-suffixes  use numeric suffixes instead of alphabetic
  -l, --lines=NUMBER      put NUMBER lines per output file
      --verbose           print a diagnostic to standard error just
                            before each output file is opened
      --help     display this help and exit
      --version  output version information and exit

You could do something like this:

split -l 200000 filename

which creates files named xaa, xab, xac, ..., each with 200000 lines (the last one may have fewer).
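For comparison, the same line-count split can be written in awk. This is a minimal sketch: n=3 and the piece00, piece01, ... names are hypothetical, chosen to keep the demo small.

```shell
# Work in a scratch directory so the demo files don't clutter the current one.
cd "$(mktemp -d)" || exit 1
seq 1 7 > filename

# Start a new piece every n lines; the piece names are hypothetical.
awk -v n=3 '
(NR - 1) % n == 0 {
  if (out != "") close(out)   # keep only one piece open at a time
  out = sprintf("piece%02d", i++)
}
{ print > out }
' filename
```

With 7 input lines, this gives piece00 (1 to 3), piece01 (4 to 6) and piece02 (7).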

Another option is to split by output file size (this still splits at line breaks):

 split -C 20m --numeric-suffixes input_filename output_prefix

creates files named output_prefix00, output_prefix01, output_prefix02, ..., each at most 20 megabytes in size (numeric suffixes start at 00).
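Where split has no -C option, as with the BSD split on macOS, a rough awk counterpart is to track the byte size of the current piece and start a new piece whenever the next line would exceed the limit. This is a sketch under stated assumptions. max=10 keeps the demo small; for 20 MiB you would use max=20971520. LC_ALL=C makes length() count bytes. Unlike GNU -C, a single line longer than max is not broken up and simply gets a piece of its own.

```shell
# Work in a scratch directory so the demo files don't clutter the current one.
cd "$(mktemp -d)" || exit 1
printf 'aaaa\nbbbb\ncccc\nd\n' > input_filename

LC_ALL=C awk -v max=10 -v prefix=output_prefix '
{
  len = length($0) + 1   # bytes in this line plus its newline (C locale)
  # Start a new piece if this line would push the current one past max bytes.
  if (out == "" || size + len > max) {
    if (out != "") close(out)
    out = sprintf("%s%02d", prefix, i++)
    size = 0
  }
  print > out
  size += len
}' input_filename
```

Here output_prefix00 gets "aaaa" and "bbbb" (10 bytes), and output_prefix01 gets "cccc" and "d".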

Multisplitting in AWK

Try this:

echo "$test" | awk -F'[;&]' '{print $4}'

-F'[;&]' sets the field separator to a bracket expression, so either ; or & separates fields.
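A quick check with a hypothetical value of $test. Because awk treats a multi-character FS as a regular expression, the second command also shows that alternation can mix a single-character separator with a longer one.

```shell
# Hypothetical value of $test; fields are separated by either ";" or "&".
test='a;b&c;d&e'

# The fields are a, b, c, d, e, so this prints "d".
echo "$test" | awk -F'[;&]' '{print $4}'

# FS is a regex: ";" or "&&" both separate fields here, so this prints "c".
echo 'a&&b;c' | awk -F';|&&' '{print $3}'
```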


