extract unique block of lines from a file using shell script
Thanks @tripleee and @Jarmund for your suggestions. From your inputs I was finally able to figure out a solution to my problem. The hint about associative arrays led me to build a unique key for each block, so here is what I did:
Take file-2 and convert each block into a single line:

awk '/[Ss]etup.*[Tt]est/ || /perl\/[[:alpha:]]*\/[Ss]etup[Rr]eq/{if(b) exit; else b=1}1' file-2 > $TESTSETFILE
sed ':a;N;$!ba;s/\n//g;s/ //g' $TESTSETFILE >> $SINGLELINEFILE

Now each line in this file is a unique entry.
- After this I grep for each line of File-1 to find its respective block (which is now a single line).
- Then I use awk or sort -u to find the unique entries in my solution file.
Maybe this solution is not the best, but it is a lot faster than the previous one.
Here is my new script:
FLBATCHLIST=$1
BATCHFILE=$2
TEMPDIR="./tempBatchdir"
rm -rf $TEMPDIR/*
WORKFILE="$TEMPDIR/failedTestList.txt"
CPBATCHFILE="$TEMPDIR/orig.test"
TESTSETFILE="$TEMPDIR/testset.txt"
DIFFFILE="$TEMPDIR/diff.txt"
SINGLELINEFILE="$TEMPDIR/singleline.txt"
TEMPFILE="$TEMPDIR/temp.txt"
#Output
FAILEDBATCH="$TEMPDIR/FailedBatch.test"
LOGFILE="$TEMPDIR/log.txt"
convertSingleLine ()
{
sed -i 's/^[[:space:]]*//;s/[[:space:]]*$//g;/^$/d' $CPBATCHFILE
STATUS=1
while [ $STATUS -ne "0" ]
do
if [ ! -s $CPBATCHFILE ]; then
echo "$CPBATCHFILE is empty" >> $LOGFILE
STATUS=0
continue # nothing left to process; go back to the loop test so it can exit
fi
awk '/[Ss]etup.*[Tt]est/ || /perl\/[[:alpha:]]*\/[Ss]etup[Rr]eq/{if(b) exit; else b=1}1' $CPBATCHFILE > $TESTSETFILE
sed ':a;N;$!ba;s/\n//g;s/ //g' $TESTSETFILE >> $SINGLELINEFILE
echo "**" >> $SINGLELINEFILE
TSTFLLINES=$(wc -l < $TESTSETFILE)
CPBTCHLINES=$(wc -l < $CPBATCHFILE)
DIFF=$(expr $CPBTCHLINES - $TSTFLLINES)
tail -n $DIFF $CPBATCHFILE > $DIFFFILE
mv $DIFFFILE $CPBATCHFILE
done
}
####STARTS HERE####
mkdir -p $TEMPDIR
sed 's/^[eE][xX][eE][cC]//g;s/^[[:space:]]*//;s/[[:space:]]*$//g;/^$/d' $FLBATCHLIST > $WORKFILE
sed -i 's/\([\/\.\"]\)/\\\1/g' $WORKFILE
cp $BATCHFILE $CPBATCHFILE
convertSingleLine
for fltest in $(cat $WORKFILE)
do
echo $fltest >> $LOGFILE
grep "$fltest" $SINGLELINEFILE >> $FAILEDBATCH
if [ $? -eq "0" ]; then
echo "TEST FOUND" >> $LOGFILE
else
ABSTEST=$(echo $fltest | sed 's/\\//g')
echo "FATAL ERROR: Test \"$ABSTEST\" not found in $BATCHFILE" | tee -a $LOGFILE
fi
done
awk '!x[$0]++' $FAILEDBATCH > $TEMPFILE
mv $TEMPFILE $FAILEDBATCH
sed -i "s/exec/\\nexec /g;s/#/\\n#/g" $FAILEDBATCH
sed -i '1d;s/\//\\/g' $FAILEDBATCH
Here is the output:
$ crflbatch file-1 file-2
FATAL ERROR: Test "perl/RRP/RRP-1.30/JEDI/CommonReq/confAbvExp" not found in file-2
FATAL ERROR: Test "this/or/that" not found in file-2
$ cat tempBatchdir/FailedBatch.test
exec 1.20\setup\testinit
exec 1.20\abc\this_is_test_1
exec 1.20\abc\this_is_test_1
exec perl\RRP\SetupReq\testdef_ijk
exec perl\RRP\RRP-1.30\JEDI\SetupReq\confAbvExp
exec perl\RRP\RRP-1.30\JEDI\JEDIExportSuccess2
exec perl\LRP\SetupReq\testird_LRP("LRP")
exec perl\BaseLibs\launch_client("LRP")
exec perl\LRP\LRP-classic-4.14\churrip\chorSingle
exec perl\LRP\BaseLibs\setupLRPMMMTab
exec perl\LRP\BaseLibs\launchMMM
exec perl\LRP\BaseLibs\launchLRPCHURRTA("TYRE")
#PAUSEExpandChurriptreeview&openallnodes
exec perl\LRP\LRP-classic-4.14\Corrugator\multipleSeriesWeb
exec perl\BaseLibs\ShutApp("SelfDestructionSystem")
exec perl\LRP\BaseLibs\close-MMM
$
How can I extract a predetermined range of lines from a text file on Unix?
sed -n '16224,16482p;16483q' filename > newfile
From the sed manual:
p -
Print out the pattern space (to the standard output). This command is usually only used in conjunction with the -n command-line option.
n -
If auto-print is not disabled, print the pattern space, then, regardless, replace the pattern space with the next line of input. If there is no more input then sed exits without processing any more commands.
q -
Exit sed without processing any more commands or input. Note that the current pattern space is printed if auto-print is not disabled with the -n option.
and
Addresses in a sed script can be in any of the following forms:
number
Specifying a line number will match only that line in the input.
An address range can be specified by specifying two addresses separated by a comma (,). An address range matches lines starting from where the first address matches, and continues until the second address matches (inclusively).
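As a sketch of how the command above behaves, here is the same range extraction on a small made-up file (the file name, contents, and the range 2–4 are all assumed for illustration):

```shell
# Build a small sample file (made-up data).
printf 'one\ntwo\nthree\nfour\nfive\n' > sample.txt

# Print lines 2 through 4, then quit at line 5 so sed stops reading the rest.
sed -n '2,4p;5q' sample.txt > newfile
cat newfile
# prints:
# two
# three
# four

rm -f sample.txt newfile
```

The `5q` matters on large files: without it, sed would keep scanning to end of input even though nothing more can be printed.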
How to generate list of unique lines in text file using a Linux shell script?
If you don't mind the output being sorted, use
sort -u
This sorts the input and removes duplicates in one step.
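A minimal illustration, using a throwaway input file invented for the example:

```shell
# Made-up input with repeated lines, out of order.
printf 'banana\napple\nbanana\ncherry\napple\n' > fruits.txt

sort -u fruits.txt
# prints:
# apple
# banana
# cherry

rm -f fruits.txt
```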
extracting unique values between 2 sets/files
$ awk 'FNR==NR {a[$0]++; next} !($0 in a)' file1 file2
6
7
Explanation of how the code works:
- If we're working on file1, track each line of text we see.
- If we're working on file2, and have not seen the line text, then print it.
Explanation of details:
- FNR is the current file's record number
- NR is the current overall record number from all input files
- FNR==NR is true only when we are reading file1
- $0 is the current line of text
- a[$0] is a hash with the key set to the current line of text
- a[$0]++ tracks that we've seen the current line of text
- !($0 in a) is true only when we have not seen the line text
- Print the line of text if the above pattern returns true; this is the default awk behavior when no explicit action is given
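The two input files are not shown above, so here is a self-contained run with assumed contents, chosen so the result matches the 6 and 7 printed earlier:

```shell
# Hypothetical contents for file1 and file2.
printf '1\n2\n3\n4\n5\n' > file1
printf '3\n4\n5\n6\n7\n' > file2

# First pass (FNR==NR) records every line of file1 in array a;
# second pass prints only file2 lines that were never recorded.
awk 'FNR==NR {a[$0]++; next} !($0 in a)' file1 file2
# prints:
# 6
# 7

rm -f file1 file2
```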
Select unique or distinct values from a list in UNIX shell script
You might want to look at the uniq and sort applications.
./yourscript.ksh | sort | uniq
(FYI, yes, the sort is necessary in this command line; uniq only strips duplicate lines that are immediately adjacent to each other.)
EDIT: Contrary to what has been posted by Aaron Digulla in relation to uniq's command-line options:
Given the following input:
class
jar
jar
jar
bin
bin
java
uniq will output all lines exactly once:
class
jar
bin
java
uniq -d will output all lines that appear more than once, and it will print them once:
jar
bin
uniq -u will output all lines that appear exactly once, and it will print them once:
class
java
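The three modes can be checked side by side on the sample input above (written to a throwaway file; no sort is needed here because the duplicates are already adjacent):

```shell
# The sample input from above, written to a scratch file.
printf 'class\njar\njar\njar\nbin\nbin\njava\n' > sample.txt

uniq sample.txt      # each run collapsed to one line: class jar bin java
uniq -d sample.txt   # only the repeated lines, once each: jar bin
uniq -u sample.txt   # only the never-repeated lines: class java

rm -f sample.txt
```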
How to print only the unique lines in BASH?
Using awk:
awk '{!seen[$0]++};END{for(i in seen) if(seen[i]==1)print i}' file
eagle
forest
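The input file is not shown above, so here is a sketch against made-up contents where "tiger" is the only repeated line; the output is piped through sort because awk's for-in iteration order is unspecified:

```shell
# Made-up input: 'eagle' and 'forest' occur once, 'tiger' occurs twice.
printf 'eagle\ntiger\nforest\ntiger\n' > animals.txt

# Count every line, then print only those seen exactly once.
awk '{!seen[$0]++};END{for(i in seen) if(seen[i]==1)print i}' animals.txt | sort
# prints:
# eagle
# forest

rm -f animals.txt
```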