Extracting columns from text file with different delimiters in Linux
If the command should work with both tabs and spaces as the delimiter, I would use awk:
awk '{print $100,$101,$102,$103,$104,$105}' myfile > outfile
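As long as you only need a handful of fields (six here), it is imo fine to just type them. For longer ranges you can use a for loop, taking care to keep the fields of each record on one output line:
awk '{for(i=100;i<=105;i++) printf "%s%s", $i, (i<105 ? OFS : ORS)}' myfile > outfile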
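The range loop can be wrapped in a small function so the bounds are not hard-coded. This is only a sketch: the name field_range and its two arguments are hypothetical, and it relies on awk's default splitting on any run of spaces and tabs.

```shell
# Hypothetical helper: field_range FIRST LAST   (reads records from stdin)
field_range() {
    awk -v first="$1" -v last="$2" '
        {
            # Print fields first..last on one line, separated by OFS.
            for (i = first; i <= last; i++)
                printf "%s%s", $i, (i < last ? OFS : ORS)
        }'
}

# Mixed tabs and repeated spaces are all treated as one separator.
printf 'a\tb  c d\te\n' | field_range 2 4
```

Fields past the end of a short record come out as empty strings rather than causing an error.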
If you want to use cut, you need to use the -f option:
cut -f100-105 myfile > outfile
If the field delimiter is different from TAB, you need to specify it using -d:
cut -d' ' -f100-105 myfile > outfile
Check the man page for more info on the cut command.
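One difference worth checking on your own data: cut treats every single delimiter character as a field boundary, while awk's default splitting collapses runs of whitespace. A minimal sketch with a made-up sample line:

```shell
line='a  b c'   # note the two spaces between a and b

# cut sees four fields: "a", "", "b", "c", so field 2 is empty.
with_cut=$(printf '%s\n' "$line" | cut -d' ' -f2)

# awk collapses the run of spaces, so field 2 is "b".
with_awk=$(printf '%s\n' "$line" | awk '{print $2}')

# Squeezing repeated spaces first makes cut agree with awk.
with_tr=$(printf '%s\n' "$line" | tr -s ' ' | cut -d' ' -f2)

printf 'cut: [%s]  awk: [%s]  tr+cut: [%s]\n' "$with_cut" "$with_awk" "$with_tr"
```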
Extract Column(s) from text file having Multi Character Delimiter i.e. %$%
The symbol $ is a special character in a regex, so you need to escape it with a \, which is also a special character for the string literal, so it needs to be escaped again.
So, finally we have:
$ cat sample
ghkjlj;lk%$%23e;k32poek%$%eqdje2oijd%$%xrgtdy5h
$ awk -F'%\\$%' '{print $1}' sample
ghkjlj;lk
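If the double backslash is hard to read, a bracket expression is an alternative worth trying: inside [...] the dollar sign is an ordinary character. A short sketch using the same sample line (the variable names are just for illustration):

```shell
line='ghkjlj;lk%$%23e;k32poek%$%eqdje2oijd%$%xrgtdy5h'

# [$] matches a literal dollar sign, so no backslashes are needed.
second=$(printf '%s\n' "$line" | awk -F'%[$]%' '{print $2}')
nfields=$(printf '%s\n' "$line" | awk -F'%[$]%' '{print NF}')

echo "$second"
echo "$nfields"
```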
Extract specific columns from delimited file using Awk
I don't know if it's possible to do ranges in awk. You could do a for loop, but you would have to add handling to filter out the columns you don't want. It's probably easier to do this:
awk -F, '{OFS=",";print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$20,$21,$22,$23,$24,$25,$30,$33}' infile.csv > outfile.csv
Something else to consider, which is faster and more concise:
cut -d "," -f1-10,20-25,30,33 infile.csv > outfile.csv
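Ranges can also be expanded inside awk with a little bookkeeping. The following is a sketch, not a standard awk feature: the function name awk_ranges and its spec format (comma-separated numbers or lo-hi ranges, as in cut -f) are assumptions made for illustration, and the comma splitting does not handle quoted CSV fields.

```shell
# Hypothetical helper: awk_ranges "1-3,5"   (reads comma-separated data from stdin)
awk_ranges() {
    awk -F, -v OFS=, -v spec="$1" '
        BEGIN {
            # Expand the spec into an ordered list of column numbers.
            n = split(spec, parts, ",")
            for (p = 1; p <= n; p++) {
                m = split(parts[p], r, "-")
                lo = r[1] + 0
                hi = (m > 1 ? r[2] : r[1]) + 0
                for (i = lo; i <= hi; i++) cols[++k] = i
            }
        }
        {
            for (j = 1; j <= k; j++)
                printf "%s%s", $(cols[j]), (j < k ? OFS : ORS)
        }'
}

printf 'a,b,c,d,e,f\n' | awk_ranges "1-2,4,6"
```

Unlike cut, this prints the columns in the order given in the spec, so "3,1" swaps them.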
As to the second part of your question, I would probably write a script in Perl that knows how to handle header rows, parsing the column names from stdin or a file and then doing the filtering. It's probably a tool I would want to have for other things. I am not sure about doing it in a one-liner, although I am sure it can be done.
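For what it's worth, the header-aware filtering can be sketched in awk rather than Perl. This is an illustration under assumptions: the function name select_cols is hypothetical, the first line is taken to be the header, and fields are split on plain commas (no quoted fields).

```shell
# Hypothetical helper: select_cols "name1,name2"   (reads CSV with a header row from stdin)
select_cols() {
    awk -F, -v OFS=, -v want="$1" '
        NR == 1 {
            # Map each header name to its column number.
            for (i = 1; i <= NF; i++) pos[$i] = i
            n = split(want, names, ",")
            for (j = 1; j <= n; j++) {
                if (!(names[j] in pos)) {
                    print "no such column: " names[j] > "/dev/stderr"
                    exit 1
                }
                idx[j] = pos[names[j]]
            }
        }
        {
            # Applies to the header line too, so the output keeps a header.
            for (j = 1; j <= n; j++)
                printf "%s%s", $(idx[j]), (j < n ? OFS : ORS)
        }'
}

printf 'id,name,city\n1,Ann,Oslo\n2,Bob,Rome\n' | select_cols "city,id"
```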
How to extract a certain column in a space-delimited .txt file and store each unique value along with the number of times it appears [Unix - Bash]
awk '{print $2}' extracts the second column, not row.
You can indeed use sort and uniq to do this, and that's the traditional Unix 'toolbox' method, which a great many people before you have also thought of:
awk '{print $2}' file.txt | sort -n | uniq -c
(uniq -c counts adjacent duplicates instead of removing them. On any non-weird Unix system, you can use man {programname} to get documentation on a program, and man uniq shows you several options that can be useful for various things including -c.)
But awk can also do the whole job (or nearly) by itself:
awk '{++c[$2]} END{for(v in c){print c[v],v}}' file.txt
awk has 'associative' arrays subscripted or 'keyed' by any values, not just more-or-less consecutive integers; this was the 1970s name for what nowadays is often called a dictionary. (And all array elements, and variables other than predefined ones like NR NF OFS etc, are initialized to an empty value, which is treated numerically as zero.)
Since this is normally implemented as a hash table, the for..in statement in traditional awk can produce the values in an arbitrary order, and the standard (POSIX) codifies this. If you want them in numeric order (as the sort|uniq method produces), you can add ... | sort -nk2, or, only on non-ancient versions of GNU awk (which is now common but not universal), you can use:
awk '{++c[$2]} END{PROCINFO["sorted_in"]="@ind_num_asc";for(v in c){print c[v],v}}' file.txt
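As a worked example of the portable variant (awk for counting, sort for ordering), here is a small sketch. The function name count_col2 is made up for illustration; it prints "count value" pairs ordered numerically by value.

```shell
# Hypothetical helper: count_col2 FILE
count_col2() {
    # Count occurrences of each distinct second-column value,
    # then order the "count value" lines numerically by the value.
    awk '{ ++c[$2] } END { for (v in c) print c[v], v }' "$1" | sort -k2,2n
}

sample=$(mktemp)
printf 'a 3\nb 1\nc 3\nd 2\n' > "$sample"
count_col2 "$sample"
rm -f "$sample"
```

Sorting on the first field instead (for example sort -k1,1nr) would list the most frequent values first.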
Extract several space-delimited fields from file with varying delimiters into another file in Bash
I figured out a solution.
- Remove the header line.
- Filter all lines based on the word "rectangle" using grep.
- Replace whitespaces with commas to make it easier to deal with.
- Iterate through each line, saving to file as needed.
#!/bin/bash
# Code here to retrieve the file from command arguments and set it as $inputFile (removed for brevity)
sed -i 1d "$inputFile" # Remove header line (this edits the input file in place).
sed 's/^ *//' < "$inputFile" > work.txt # Remove leading spaces from each line.
tr -s ' ' < work.txt | tr ' ' ',' > work2.txt # Squeeze repeated spaces, then switch spaces for commas.
grep "rectangle" work2.txt > work3.txt # Print all lines containing "rectangle" in them to new file.
rm -f lineout.txt # Delete output file in case script was run previously.
touch lineout.txt
count=0
while IFS='' read -r line || [[ -n "$line" ]]; do
    # Use an explicit format so that % or \ in the data is not interpreted by printf.
    printf '%s' "$line" > line.txt
    awk 'BEGIN { FS="," } { printf "%s", $1 >> "lineout.txt" }' line.txt
    printf "," >> lineout.txt
    awk 'BEGIN { FS="," } { printf "%s", $2 >> "lineout.txt" }' line.txt
    printf "," >> lineout.txt
    count=$((count + 1))
    # The third output column is 1 for the first line of each group of four, otherwise 0.
    if [[ $count = "1" ]]
    then
        printf '%s\n' "$count" >> lineout.txt
    else
        printf "0\n" >> lineout.txt
        if [[ $count = "4" ]]
        then
            count=0
        fi
    fi
done < work3.txt
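The same steps can probably be collapsed into a single awk call with no temporary files. This is a sketch under assumptions, not the script above: the function name extract_rectangles is hypothetical, it leaves the input file untouched instead of deleting the header in place, and it relies on awk's default whitespace splitting to absorb leading and repeated spaces.

```shell
# Hypothetical helper: extract_rectangles FILE
extract_rectangles() {
    awk '
        # Skip the header (line 1) and keep only lines mentioning "rectangle".
        NR > 1 && /rectangle/ {
            n = n % 4 + 1                             # cycles 1, 2, 3, 4, 1, ...
            print $1 "," $2 "," (n == 1 ? 1 : 0)      # flag the first line of each group of four
        }' "$1"
}

# Usage: extract_rectangles "$inputFile" > lineout.txt
```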
Awk command to extract columns on dual delimiter
awk interprets the field separator as a regular expression, so you need to escape each special character with a double backslash (\\) to match it literally.
echo 'name[^legalName[^code[^type[^contactNumber1[^contactNumber2' | awk -F'\\[\\^' '{print $2}'
legalName
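If escaping gets awkward, one way around it is to avoid regular expressions entirely and split on the literal string with index() and substr(). The sketch below assumes a hypothetical helper named literal_field; note that a delimiter containing a backslash would still need doubling, because awk processes escape sequences in -v assignments.

```shell
# Hypothetical helper: literal_field DELIM N   (reads records from stdin)
literal_field() {
    awk -v d="$1" -v n="$2" '
        {
            rest = $0
            # Drop the first n-1 fields by cutting at each literal delimiter.
            for (k = 1; k < n; k++) {
                p = index(rest, d)
                if (p == 0) { rest = ""; break }
                rest = substr(rest, p + length(d))
            }
            # What remains up to the next delimiter (or end of line) is field n.
            p = index(rest, d)
            print (p ? substr(rest, 1, p - 1) : rest)
        }'
}

echo 'name[^legalName[^code[^type' | literal_field '[^' 2
```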
Ubuntu: How do I extract only specific columns from tab-delimited file if it contains a specific string?
Simplifying your code (with code borrowed from Extract column using grep):
grep -E "chr6.fa" FC305JN_s_1_eland_result.txt > out.txt
awk -v OFS='\t' '{print $1, $2, $7, $8, $9}' out.txt > outfile.txt
produces output:
FC305JN_20080525:1:15:944:72 GATGACTTCCTTAATTTTCTTTATNNNN chr6.fa 7200804 R
FC305JN_20080525:1:15:1799:100 TTCAGCTTATTGATAAAGAAGCACNNNN chr6.fa 20979453 R
FC305JN_20080525:1:15:771:1076 GAGTTCACTAAACAAAAGAGTGTCNNNN chr6.fa 136877852 R
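The grep step and the intermediate file can likely be folded into awk itself. A minimal sketch, assuming (as the output above suggests) that the genome file name sits in column 7; the function name chr_rows is hypothetical.

```shell
# Hypothetical helper: chr_rows FILE NAME
chr_rows() {
    # Keep only rows whose 7th field equals NAME exactly,
    # and print the selected columns separated by tabs.
    awk -v OFS='\t' -v chr="$2" '$7 == chr { print $1, $2, $7, $8, $9 }' "$1"
}

# Usage: chr_rows FC305JN_s_1_eland_result.txt chr6.fa > outfile.txt
```

Comparing field 7 as a string is stricter than the grep pattern, where the dot matches any character and the text may appear anywhere on the line.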