Splitting bulk text files every n lines
for f in filename*.txt; do split -d -a1 -l10000 --additional-suffix=.txt "$f" "${f%.txt}-"; done
Or, written over multiple lines:
for f in filename*.txt
do
    split -d -a1 -l10000 --additional-suffix=.txt "$f" "${f%.txt}-"
done
How it works:
-d tells split to use numeric suffixes.
-a1 tells split to use suffixes one digit long. That allows at most ten pieces per input file; choose a larger value (for example -a2) if a file may need more.
-l10000 tells split to split every 10,000 lines.
--additional-suffix=.txt tells split to add .txt to the end of the names of the new files.
"$f" tells split the name of the file to split.
"${f%.txt}-" tells split the prefix name to use for the split files.
Example
Suppose that we start with these files:
$ ls
filename1.txt filename2.txt
Then we run our command:
$ for f in filename*.txt; do split -d -a1 -l10000 --additional-suffix=.txt "$f" "${f%.txt}-"; done
When this is done, we now have the original files and the new split files:
$ ls
filename1-0.txt filename1-1.txt filename1.txt filename2-0.txt filename2-1.txt filename2.txt
Using older, less featureful forms of split
If your split does not offer --additional-suffix, then consider:
for f in filename*.txt
do
    split -d -a1 -l10000 "$f" "${f%.txt}-"
    for g in "${f%.txt}-"*
    do
        mv "$g" "$g.txt"
    done
done
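If no suitable split is available at all, the same loop can be approximated in Python. This is only a sketch, not a reimplementation of split: the function name split_every_n_lines and its parameters are made up for illustration, and it assumes the pieces should be named like the ones above (filename1-0.txt, filename1-1.txt, ...).

```python
from itertools import islice
from pathlib import Path


def split_every_n_lines(path, lines_per_file, suffix_length=1):
    """Write consecutive chunks of `lines_per_file` lines to numbered files.

    For "filename1.txt" the pieces are named "filename1-0.txt",
    "filename1-1.txt", ... Returns the list of paths that were written.
    (Hypothetical helper, named here only for illustration.)
    """
    src = Path(path)
    written = []
    # Binary mode: lines are copied byte for byte, whatever the encoding.
    with src.open("rb") as fin:
        index = 0
        while True:
            chunk = list(islice(fin, lines_per_file))
            if not chunk:
                break
            number = str(index).zfill(suffix_length)
            if len(number) > suffix_length:
                # Same limit as a fixed-length numeric suffix.
                raise ValueError("output file suffixes exhausted")
            dst = src.with_name(f"{src.stem}-{number}{src.suffix}")
            with dst.open("wb") as fout:
                fout.writelines(chunk)
            written.append(dst)
            index += 1
    return written
```

Calling split_every_n_lines(path, 10000) for each path in sorted(Path(".").glob("filename*.txt")) mirrors the shell loop. Only one chunk of lines is held in memory at a time.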
How can I split a large text file into smaller files with an equal number of lines?
Have a look at the split command:
$ split --help
Usage: split [OPTION] [INPUT [PREFIX]]
Output fixed-size pieces of INPUT to PREFIXaa, PREFIXab, ...; default
size is 1000 lines, and default PREFIX is `x'. With no INPUT, or when INPUT
is -, read standard input.
Mandatory arguments to long options are mandatory for short options too.
  -a, --suffix-length=N   use suffixes of length N (default 2)
  -b, --bytes=SIZE        put SIZE bytes per output file
  -C, --line-bytes=SIZE   put at most SIZE bytes of lines per output file
  -d, --numeric-suffixes  use numeric suffixes instead of alphabetic
  -l, --lines=NUMBER      put NUMBER lines per output file
      --verbose           print a diagnostic to standard error just
                          before each output file is opened
      --help              display this help and exit
      --version           output version information and exit
You could do something like this:
split -l 200000 filename
which will create files of 200000 lines each (the last one may hold fewer), named xaa, xab, xac, ...
Another option, split by size of output file (still splits on line breaks):
split -C 20m --numeric-suffixes input_filename output_prefix
creates files like output_prefix00 output_prefix01 output_prefix02 ...
each of maximum size 20 megabytes.
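To make the "size limit, but only on line breaks" idea concrete, here is a rough Python sketch of the packing logic. The names pack_lines and split_by_size are hypothetical. It also makes one simplifying assumption that differs from GNU split: a single line longer than the limit is kept whole in a file of its own, whereas split -C would break such a line across files.

```python
def pack_lines(lines, max_bytes):
    """Yield lists of complete lines whose total size is at most max_bytes.

    `lines` is an iterable of bytes objects, each including its newline.
    Assumption: an over-long line is emitted alone rather than cut.
    """
    current, size = [], 0
    for line in lines:
        # Close the current chunk if this line would push it over the limit.
        if current and size + len(line) > max_bytes:
            yield current
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        yield current


def split_by_size(path, prefix, max_bytes):
    """Write the chunks to prefix00, prefix01, ... and return the names."""
    names = []
    with open(path, "rb") as fin:
        for index, chunk in enumerate(pack_lines(fin, max_bytes)):
            name = f"{prefix}{index:02d}"
            with open(name, "wb") as fout:
                fout.writelines(chunk)
            names.append(name)
    return names
```

With these helpers, split_by_size("input_filename", "output_prefix", 20 * 1024 * 1024) would play roughly the role of the command above.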
How to split a large text file in Windows?
If you have installed Git for Windows, you should have Git Bash installed, since that comes with Git.
Use the split command in Git Bash to split a file:
into files of size 500MB each:
split myLargeFile.txt -b 500m
into files with 10000 lines each:
split myLargeFile.txt -l 10000
Tips:
If you don't have Git/Git Bash, download it at https://git-scm.com/download
If you lost the shortcut to Git Bash, you can run it using
C:\Program Files\Git\git-bash.exe
That's it!
I always like examples though...
Example:
The files generated by split are named xaa, xab, xac, etc.
These names are made up of a prefix and a suffix, which you can specify. Since I didn't specify what I want the prefix or suffix to look like, the prefix defaulted to x, and the suffix defaulted to a two-character alphabetical enumeration.
Another Example:
This example demonstrates
- using a filename prefix of MySlice (instead of the default x),
- the -d flag for using numerical suffixes (instead of aa, ab, ac, etc...),
- and the option -a 5 to tell it I want the suffixes to be 5 digits long.
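The naming rules from the two examples can be sketched in a few lines of Python. This is an illustration of the pattern only: split_names is a made-up helper, and it assumes a fixed suffix length, as when -a is given explicitly.

```python
from itertools import islice, product
import string


def split_names(prefix="x", suffix_length=2, numeric=False):
    """Yield output names in order: prefix followed by a fixed-length suffix.

    Hypothetical helper; letters a-z are used unless numeric=True (like -d).
    """
    alphabet = string.digits if numeric else string.ascii_lowercase
    for combo in product(alphabet, repeat=suffix_length):
        yield prefix + "".join(combo)


# Defaults: prefix x, two-letter suffixes.
print(list(islice(split_names(), 3)))
# ['xaa', 'xab', 'xac']

# Prefix MySlice, numeric suffixes, 5 digits long (like -d -a 5).
print(list(islice(split_names("MySlice", 5, numeric=True), 3)))
# ['MySlice00000', 'MySlice00001', 'MySlice00002']
```

A suffix length of N gives room for 26^N alphabetic or 10^N numeric names.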
Splitting a file into smaller files of at most n characters without cutting any line
This should do it:
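BEGIN {
    maxChars = 700
    out = "file.0"
}
{
    # length($0) does not count the trailing newline
    numChars = length($0)
    totChars += numChars
    # start a new output file when this line would exceed the limit
    if ( totChars > maxChars ) {
        close(out)
        out = "file." (++cnt)
        totChars = numChars
    }
    print > out
}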
Python: a way to split a file into sections every time a blank line appears, and manipulate the sections
In the first case, where the details are separated by blank lines, you can do:
file_name = "./details.txt"

def print_details(name, age, hobbies):
    """Helper function to print the details in a nice format"""
    print(f"Hi {name}. You are aged {age}. You like {hobbies}")

# add details to a list until a blank line is reached, print then reset
with open(file_name, 'r') as fin:
    details = []
    for line in fin:
        if line == '\n':
            print_details(*details)
            details = []
        else:
            details.append(line.strip())
    # print the last section, unless the file ended with a blank line
    if details:
        print_details(*details)
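If the sections need to be manipulated before printing, one possible variant collects them into a list first. This is a sketch under assumptions: split_sections is a hypothetical helper, the sample text is invented for illustration, and runs of several blank lines are treated as a single separator.

```python
def split_sections(text):
    """Return one list of stripped lines per blank-line-separated section."""
    sections = []
    current = []
    for line in text.splitlines():
        if line.strip():
            current.append(line.strip())
        elif current:
            # A blank line closes the section collected so far.
            sections.append(current)
            current = []
    if current:
        sections.append(current)
    return sections


# Invented sample data in the name / age / hobbies layout.
sample = "Ann\n31\nchess\n\nBob\n45\nhiking\n"
for name, age, hobbies in split_sections(sample):
    print(f"Hi {name}. You are aged {age}. You like {hobbies}")
```

For a real file, pass the result of open(file_name).read() instead of the sample string.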