Split and Rename the Split Files in Shell Script

Split and rename the split files in shell script

There are many ways of approaching this. Other answers show how to do it purely with arguments passed to split; however, the version of split on Ubuntu 12.04 doesn't appear to support the arguments used in those answers.

Here is one. It splits the file and relies on split's default behaviour of prefixing the output file names with an x. It then takes those files in order and renames them as required.

split -l 100 date.csv        # default behaviour: pieces named xaa, xab, xac, ...
i=1
for x in x*                  # the glob expands in sorted order, matching split's naming
do
    mv "$x" "date_$i.csv"
    i=$(($i+1))
done
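
For what it's worth, a GNU split recent enough to support the relevant options (the Ubuntu 12.04 one was not) can produce the numbered names directly, which makes the rename loop unnecessary:

split -l 100 --numeric-suffixes=1 --additional-suffix=.csv date.csv date_
# produces date_01.csv, date_02.csv, ...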

rename multiple files splitting filenames by '_' and retaining first and last fields

Your rename attempt was close; you just need to make sure the final group is greedy.

rename 's/^([^_]*).*_([^_]*[.]txt)$/$1_$2/' *_*_*.txt

I added a _ before the last opening parenthesis (this is the crucial fix), and a $ anchor at the end, and also extended the wildcard so that you don't process any files which don't contain at least two underscores.
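
It can be worth doing a dry run first; Perl rename's -n option prints the planned renames without touching anything. With a made-up name such as data_2020_extra_final.txt, the expression keeps only the first and last underscore-separated fields:

rename -n 's/^([^_]*).*_([^_]*[.]txt)$/$1_$2/' *_*_*.txt
# data_2020_extra_final.txt -> data_final.txt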

The equivalent in Awk might look something like

find . -name "*_*_*.txt" |
awk -F _ '{ system("mv " $0 " " $1 "_" $(NF)) }'

This is somewhat brittle because of the system call; you might need to rethink your approach if your file names could contain whitespace or other shell metacharacters. You could add quoting to partially fix that, but then the command will fail if the file name contains literal quotes. You could fix that, too, but then this will be a little too complex for my taste.

Here's a less brittle approach which should cope with completely arbitrary file names, even ones with newlines in them:

find . -name "*_*_*.txt" -exec sh -c 'for f; do
mv "$f" "${f%%_*}_${f##*_}"
done' _ {} +

find will supply a leading path before each file name, so we don't need mv -- here (there will never be a file name which starts with a dash).

The parameter expansion ${f##pattern} produces the value of the variable f with the longest available match on pattern trimmed off from the beginning; ${f%%pattern} does the same, but trims from the end of the string.
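
A quick illustration of those two expansions (the file name here is made up):

f=report_2021_final.txt
echo "${f%%_*}"              # report     (longest suffix matching _* trimmed)
echo "${f##*_}"              # final.txt  (longest prefix matching *_ trimmed)
echo "${f%%_*}_${f##*_}"     # report_final.txt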

Splitting and renaming a file using a word delimiter in Linux

This one-liner should help:

awk '/file part/{fn=$NF ".txt"}{print > fn}' split.txt

The idea is the same as in your code; instead of a sequence number, the last word on the matching line is used as the file name.
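
For instance, with a hypothetical input along these lines, each section lands in a file named after the last word of its "file part" line (the marker line itself is included, since the print block runs for every line):

$ cat split.txt
this is file part one
some data
more data
this is file part two
other data
$ awk '/file part/{fn=$NF ".txt"}{print > fn}' split.txt
$ cat two.txt
this is file part two
other data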

rename files which were produced by split

Tip: don't do ls -l if you only need the file names. Even better, don't use ls at all, just use the shell's globbing ability. x* expands to all file names starting with x.

Here's a way to do it:

i=1; for f in x*; do mv $f $(printf 'part-%d.gz' $i); ((i++)); done

This initializes i to 1, and then loops over all file names starting with x in alphabetical order, assigning each file name in turn to the variable f. Inside the loop, it renames $f to $(printf 'part-%d.gz' $i), where the printf command replaces %d with the current value of i. You might want something like %02d if you need to prefix the number with zeros. Finally, still inside the loop, it increments i so that the next file receives the next number.

Note that none of this is safe if the input file names contain spaces, but yours don't.
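
For example, a quoted, zero-padded variant of the same loop might look like this (assuming the pieces still start with x):

i=1
for f in x*; do
    mv -- "$f" "$(printf 'part-%02d.gz' "$i")"   # part-01.gz, part-02.gz, ...
    i=$((i+1))
done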

Splitting a file in a shell script adds unwanted newlines

Option 1

Get rid of all empty lines. This only works if you don't need to retain any of the empty lines in the middle of a section.
Change:

    echo "$line" >&3

To:

    [[ -n "$line" ]] && echo "$line" >&3

Option 2

Rewrite each file using command substitution to trim any trailing newlines. Works best with short files. Change:

    exec 3>&-
    exec 3<> outputfile2

To:

    exec 3>&-
    data=$(<outputfile1)
    echo "$data" >outputfile1
    exec 3<> outputfile2

Option 3

Have the loop write the line from the prior iteration, and then do not write the final line from the prior file when you start a new file:

#!/bin/zsh

rm inputfile outputfile1 outputfile2
IFS=''
printf "section1\nsection1end\n\nsection2\nsection2end\n" >inputfile

echo " open outputfile1"
exec 3<> outputfile1
counter=1
IFS=$'\n'

priorLine=MARKER
while IFS= read line; do
    if [[ "$line" == "section2" ]]; then
        echo " Matched start of section2. Close outputfile1 and open outputfile2"
        exec 3>&-
        exec 3<> outputfile2
    elif [[ "$priorLine" != MARKER ]]; then
        echo "$priorLine" >&3
    fi
    echo $counter $line
    let "counter = $counter + 1"
    priorLine="$line"
done <inputfile
echo "$priorLine" >&3
echo " Close outputfile2"
exec 3>&-

echo
unset IFS
echo `wc -l inputfile`
echo `wc -l outputfile1`
echo `wc -l outputfile2`
echo " The above should show 5, 2, 2 as desired number of newlines in these files."

How to split out and rename code files in git while preserving history?

The minimum theoretical part that you must learn is this: Git doesn't have file history. Git has commits, and the commits are the history. Each commit has a full snapshot of every file.[1]

Git can, at any time, compare any two existing commits. If there is a file named F in the old commit, and a file named F in the new commit, we generally assume that this is the same file. But suppose that the old commit has a file named old/path/to/name1.py and the new commit has a file named new/name/of/name2.py.[2] Then maybe those should be considered "the same file", even though they have different names.

If some commit renames some file, Git can try to detect that rename. This rename detection depends on the files being similar enough in terms of content. A 100% match on content guarantees that Git can find the rename pretty easily. So when you have a commit that just renames the files, telling Git "tell me what changed in this one commit, and by the way, detect renames while you're doing that"[3] will make Git compare the "before" snapshot to the "after" snapshot, and it will find all the renames.

In order to show you a pretend "file history" with git log --follow -- path, Git simply looks at each commit. Git starts at the end and works backwards (it always does this), comparing the before-and-after snapshots, with rename detection enabled. If path is in the "after" commit, and Git finds that it's renamed from some previous path in the "before" commit, Git tells you about that, and then starts looking for the old path name.

That's essentially all you get. Your best bet when renaming a file or restructuring a project, then, is to commit just the renaming, as one commit, then commit any other changes required. You do not have to do this, as the rename detector can often detect a renamed-and-changed file as renamed, but you get a better rename-detection guarantee when you have the rename committed separately, so that each file 100%-matches the previous one.
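
A minimal sketch of that two-commit workflow, reusing the example paths from above:

# Commit 1: only the rename, so old and new content match 100%
mkdir -p new/name/of
git mv old/path/to/name1.py new/name/of/name2.py
git commit -m "Rename name1.py to name2.py"

# Commit 2: after editing the renamed file, commit the actual changes
git commit -am "Rework name2.py after the move"

# Later, follow the pretend file history across the rename
git log --follow -- new/name/of/name2.py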

Note that whether any particular GUI turns on rename-detection, and if so, how, is up to that GUI. All Git provides are the commits.


[1] The files inside a commit are stored in a special, read-only, Git-only, compressed and de-duplicated format. This means that if you make a thousand commits in a row, and only change README.md once, you have, say, 998 shared copies of the old one and 2 shared copies of the new one, or 400 shared copies of the old one and 600 shared copies of the new one, so that either way, it's really only in the repository twice, rather than a thousand times.

This also, however, means that the files you see and work on, when you work with a Git repository, are not in the Git repository. The files you see and work with are copies that were extracted from the repository, and turned back into usable files in the process. This explains a lot about why Git behaves the way it does.

[2] Note that the slashes—which go forwards, though you can use backslashes on Windows—are part of each file's name: the name is old/path/to/name1.py, for instance. That's not a folder named old containing a folder named path and so on, that's just a file whose name is old/path/to/name1.py.

[3] From the command line, use git diff --find-renames or git show --find-renames to enable the rename detector, or set diff.renames to true. In Git version 2.9 and later, diff.renames is set to true by default; in earlier versions, it is set to false by default.

Split a file into multiple files based on a pattern and name the new files by the search pattern in Unix?

Try:

awk '/^GROUP[0-9]+$/{x=$0;next}{print > x;}' cdw_all_jobs_reduced3.txt

If you want the GROUP "filename" lines to be kept in the output files as well, remove the next statement:

awk '/^GROUP[0-9]+$/{x=$0}{print > x;}' cdw_all_jobs_reduced3.txt
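
With a hypothetical input like the one below, the first command produces one file per group, each named after its GROUP line:

$ cat cdw_all_jobs_reduced3.txt
GROUP1
job_a
job_b
GROUP2
job_c
$ awk '/^GROUP[0-9]+$/{x=$0;next}{print > x;}' cdw_all_jobs_reduced3.txt
$ cat GROUP1
job_a
job_b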

How to split a file and keep the first line in each of the pieces?

This is robhruska's script cleaned up a bit:

tail -n +2 file.txt | split -l 4 - split_
for file in split_*
do
    head -n 1 file.txt > tmp_file
    cat "$file" >> tmp_file
    mv -f tmp_file "$file"
done

I removed wc, cut, ls and echo in the places where they're unnecessary. I changed some of the filenames to make them a little more meaningful. I broke it out onto multiple lines only to make it easier to read.

If you want to get fancy, you could use mktemp or tempfile to create a temporary filename instead of using a hard coded one.

Edit

Using GNU split it's possible to do this:

split_filter () { { head -n 1 file.txt; cat; } > "$FILE"; }; export -f split_filter; tail -n +2 file.txt | split --lines=4 --filter=split_filter - split_

Broken out for readability:

split_filter () { { head -n 1 file.txt; cat; } > "$FILE"; }
export -f split_filter
tail -n +2 file.txt | split --lines=4 --filter=split_filter - split_

When --filter is specified, split runs the command (a function in this case, which must be exported) for each output file and sets the variable FILE, in the command's environment, to the filename.

A filter script or function could do any manipulation it wanted to the output contents or even the filename. An example of the latter might be to output to a fixed filename in a variable directory: > "$FILE/data.dat" for example.
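
For instance, a sketch of that directory-per-chunk idea (the names and layout here are hypothetical):

split_to_dir () { mkdir -p "$FILE" && { head -n 1 file.txt; cat; } > "$FILE/data.dat"; }
export -f split_to_dir
tail -n +2 file.txt | split --lines=4 --filter=split_to_dir - split_
# creates split_aa/data.dat, split_ab/data.dat, ... each starting with the header line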

rename output file using split function on mac osx

If you check man split you'll find that the argument --additional-suffix=SUFFIX is not supported by the BSD version of split that ships with macOS.

To achieve what I understand you want, you'll need an Automator script or a shell script, e.g.:

#!/bin/bash
# bash (not plain sh) is needed for +=, $'\n' and ${var::length} below

DONE=false
until $DONE; do
    for i in $(seq 1 16); do
        read line || DONE=true;
        [ -z "$line" ] && continue;
        lines+=$line$'\n';
    done
    ratio=${lines::${#lines}-10}
    (cat "Ratio"; echo "$ratio .txt";)
    #echo "--- DONE SPLITTING ---";
    lines=;
done < "$1"
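
Alternatively, if installing GNU coreutils is an option (for example via Homebrew), the GNU version of split is available as gsplit and does support that flag; a hypothetical invocation:

brew install coreutils
gsplit -l 16 --additional-suffix=.txt input.txt part_
# produces part_aa.txt, part_ab.txt, ...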

How do I know the total number of files after splitting in Linux Split

You can do that with GNU Parallel.

First make a 10MB file to work with:

dd if=/dev/zero bs=10240 count=1024 > data.bin

Now split into 1MB chunks, naming each chunk suffix{TOTALCHUNKS}-{CHUNKNUMBER}

parallel --recend '' --plus --pipepart --block 1M cat \> suffix{##}-{#} :::: data.bin

Result

-rw-r--r--  1 mark  staff  1048576  9 Aug 16:57 suffix10-1
-rw-r--r--  1 mark  staff  1048576  9 Aug 16:57 suffix10-2
-rw-r--r--  1 mark  staff  1048576  9 Aug 16:57 suffix10-3
-rw-r--r--  1 mark  staff  1048576  9 Aug 16:57 suffix10-4
-rw-r--r--  1 mark  staff  1048576  9 Aug 16:57 suffix10-5
-rw-r--r--  1 mark  staff  1048576  9 Aug 16:57 suffix10-6
-rw-r--r--  1 mark  staff  1048576  9 Aug 16:57 suffix10-7
-rw-r--r--  1 mark  staff  1048576  9 Aug 16:57 suffix10-8
-rw-r--r--  1 mark  staff  1048576  9 Aug 16:57 suffix10-9
-rw-r--r--  1 mark  staff  1048576  9 Aug 16:57 suffix10-10

Notes:

  • You need --recend '' to stop GNU Parallel trying to split your file on linefeeds

  • You need --plus so that {##} is set to the total number of jobs

  • You need --pipepart to make it faster on seekable files - if your file is not seekable, use --pipe instead

  • {##} means the total number of chunks

  • {#} means the current chunk number
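
If you split with plain split instead, one way to get the total afterwards is to let the shell count the pieces (chunk_ is an arbitrary prefix, so this is only a sketch):

split -b 1M data.bin chunk_
set -- chunk_*               # positional parameters now hold the piece names
echo "created $# chunks"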


