Split large files by size limit without cutting lines
From the split man-page:
...
-C, --line-bytes=SIZE
put at most SIZE bytes of lines per output file
...
The description of this option is not very obvious, but it seems to cover what you are asking for: each output file ends at the last line break that occurs before SIZE bytes are reached, so no line is cut in two.
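To see the behaviour on a small scale, here is a minimal sketch. It assumes GNU coreutils split (other implementations may not have -C), and the file names and the 25-byte limit are made up for the demo.

```shell
# Work in a scratch directory so nothing existing is overwritten.
tmp=$(mktemp -d) && cd "$tmp" || exit 1

# Five lines of exactly 10 bytes each (9 characters plus the newline).
printf 'line %04d\n' 1 2 3 4 5 > input.txt

# At most 25 bytes per piece, made of whole lines only:
# two lines (20 bytes) fit, a third would exceed the limit.
split -C 25 input.txt part_

# Show the size of every piece.
wc -c part_*
```

One caveat from the GNU documentation: a single line longer than SIZE cannot be kept whole, so such a line is itself broken across several output files.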
Split large file into files with a set number of lines based on 1st column value
By scanning the file twice (first to count the records per key, then to write them out), you can do:
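$ awk -F'|' -v size=5 '
    NR==FNR {a[$1]++; next}            # pass 1: count the records of each key
    FNR==1 || p!=$1 {                  # pass 2: a new key starts (input must be grouped by key)
        if (count && count+a[$1]>size) {f++; count=0}   # key would overflow: start a new file
        count+=a[$1]; p=$1
    }
    {print > ("_file_" f+0)}' file file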
$ head _f*
==> _file_0 <==
A.B|100|20
A.B|101|20
A.X|101|30
A.X|1000|20
==> _file_1 <==
B.Y|1|1
B.Y|1|2
Note, however, that if a single key has more records than the desired file length, the two goals (never splitting a key and never exceeding the maximum file length) conflict. In this script, I assumed that not splitting a key is more important. For example, run it on the same input file with size=1: the keys won't be split across files, but the files will be longer than 1 line.
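If you want to confirm that no key ended up in two files, a check along these lines may help. It is only a sketch: keys_unsplit is a hypothetical helper name, not part of the answer above, and the two sample files are made up to mimic the output shown earlier.

```shell
# Hypothetical helper: succeeds only if no value of the first "|"-separated
# field appears in more than one of the files given as arguments.
keys_unsplit() {
    awk -F'|' '
        ($1 in seen) && seen[$1] != FILENAME {
            print "key split across files: " $1
            bad = 1
        }
        { seen[$1] = FILENAME }        # remember which file each key lives in
        END { exit bad + 0 }
    ' "$@"
}

# Demo on made-up data in a scratch directory.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf 'A.B|100|20\nA.B|101|20\n' > _file_0
printf 'B.Y|1|1\nB.Y|1|2\n'       > _file_1

keys_unsplit _file_* && echo "no key is split"
wc -l _file_*                          # the file lengths, for comparison with size
```

The helper only reports on the key constraint; whether the line counts printed by wc -l stay within size depends on the data, as described above.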
Shell command to split large file into 10 smaller files
Use split, e.g. to split a file every 3.4 million lines (which should give you 10 files):
$ split -l 3400000 file
See the manual for the remaining options:
$ man split
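Rather than hard-coding the line count, you could derive it from the file itself. The following is a sketch under assumptions: the file name big.txt, the chunk_ prefix and the 95-line sample are all made up for the demo.

```shell
# Work in a scratch directory with a small stand-in for the large file.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
seq 1 95 > big.txt

parts=10
total=$(wc -l < big.txt)

# Round up, so the last lines never spill into an 11th file.
per_file=$(( (total + parts - 1) / parts ))

split -l "$per_file" big.txt chunk_

# List the pieces with their line counts.
wc -l chunk_*
```

Because of the rounding, this gives at most 10 files; a very short input can produce fewer. With GNU split, `split -n l/10 big.txt chunk_` is another option: it asks for 10 pieces directly without splitting lines, dividing by size rather than by line count.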