Joining Line Breaks in Fasta File with Condition in Sed/Awk/Perl One-Liner


$ awk '/^>/ && NR>1 {print ""} {printf "%s", (/^>/ ? $0 " " : $0)} END {print ""}' file
> sq1 foofoofoobarfoofoofoo
> sq2 quxquxquxbarquxquxquxbarquxx
> sq3 paxpaxpaxpax
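A minimal sketch of the same approach wrapped in a reusable shell function. The name fasta_oneline is hypothetical. The sketch assumes the first input line is a ">" header, and it uses an END rule so the last record also ends with a newline.

```shell
# Minimal sketch: print each FASTA header and its sequence on one line,
# separated by a space. Assumes the first input line is a ">" header.
# fasta_oneline is a hypothetical helper name.
fasta_oneline() {
  awk '
    /^>/ && NR > 1 { print "" }        # end the previous record
    /^>/ { printf "%s ", $0; next }    # header followed by a space
    { printf "%s", $0 }                # sequence line, no newline
    END { if (NR > 0) print "" }       # end the last record
  ' "$@"
}

# Usage: reads the named files, or standard input when none are given
printf '> sq1\nfoofoofoo\nbarfoofoofoo\n> sq2\nquxqux\n' | fasta_oneline
```

Because the function passes "$@" to awk, it can be called as fasta_oneline file or used at the end of a pipeline.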

AWK join lines between record separator

Here is an awk solution:

awk '/^>/ {print (NR==1?"":RS)$0;next} {printf "%s",$0} END {print ""}' file
>1
AAAAAABBBCCCCCCCCDDDDFFF
>2
AAAAACCC
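A minimal sketch of the same idea as a shell function. The name join_records is hypothetical. Instead of reusing RS (the input record separator) as output, it writes the separating newline explicitly, and an END rule terminates the last sequence line. It assumes the first input line is a header.

```shell
# Minimal sketch: keep each ">" header on its own line and join the
# sequence lines that follow it into one line.
# join_records is a hypothetical helper name; assumes line 1 is a header.
join_records() {
  awk '
    /^>/ { if (NR > 1) printf "\n"; print; next }   # header on its own line
    { printf "%s", $0 }                              # append sequence, no newline
    END { if (NR > 0) printf "\n" }                  # end the last sequence line
  ' "$@"
}

printf '>1\nAAAAAA\nBBB\nCCCCCCCC\n>2\nAAAAA\nCCC\n' | join_records
```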

Merge multiple lines to single line in a file skipping the header

With Perl:

$ perl -pe 's/\n// if $. > 1 && !eof' AAB08704.1.fasta 
>gi|1117824|gb|AAB08704.1| ecdysteroid regulated 16 kDa [Manduca sexta]
MLFYITVTVLLVSAQAKFYTDCGSKLATVQSVGVSGWPENARECVLKRNSNVTISIDFSPTTDVSAITTEVHGVIMSLPVPFPCRSPDACKDNGLTCPIKAGVVANYKTTLPVLKSYPKVSVDVKWELKKDEEDLVCILIPARIH
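  • s/\n// removes the newline
  • if $. > 1 && !eof applies the substitution only when the line number ($.) is greater than one and the current line is not the last line of the file, so the header line and the final newline are kept
  • This assumes the file holds a single record: any later header lines would also be joined onto the sequence
  • Use perl -i -pe for in-place editing. See the Command Switches section of perlrun for documentation on -i, -p and -e.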

How to find patterns across multiple lines using grep?

grep is awkward for this, because it matches input one line at a time.

pcregrep, which is packaged for most modern Linux distributions, can be used instead:

pcregrep -M 'abc.*(\n|.)*efg' test.txt

where -M (--multiline) allows patterns to match more than one line.

There is also a newer pcre2grep. Both are provided by the PCRE project.

On macOS, pcre2grep is available via MacPorts as part of the pcre2 port:

% sudo port install pcre2 

and via Homebrew, where the pcre formula provides pcregrep:

% brew install pcre

and the pcre2 formula provides pcre2grep:

% brew install pcre2

pcre2grep is also available on Linux, for example on Ubuntu 18.04 and later:

$ sudo apt install pcre2-utils # PCRE2
$ sudo apt install pcregrep # Older PCRE
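If neither tool is installed, a swapped-in alternative (not pcregrep itself) is Perl's slurp mode. This is a minimal sketch that assumes perl is available. -0777 reads the whole input as one string, and the /s modifier lets "." match newlines, so a match can span lines. The function name multiline_match is hypothetical, and abc/efg are the example patterns from above.

```shell
# Minimal sketch: multi-line matching with perl instead of pcregrep.
# -0777 slurps the whole input; /s lets "." match newlines; /g with
# "while" reports every non-overlapping match.
# multiline_match is a hypothetical helper name.
multiline_match() {
  perl -0777 -ne 'print "$&\n" while /abc.*?efg/sg' "$@"
}

printf 'xx abc one\ntwo\nthree efg yy\n' | multiline_match
```

Two design differences from the pcregrep command: the non-greedy .*? reports each abc...efg block separately instead of one match running to the last efg, and only the matched text is printed, whereas pcregrep prints the complete lines the match spans.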

Swap two columns - awk, sed, python, perl

You can do this by swapping the values of the first two fields:

awk ' { t = $1; $1 = $2; $2 = t; print; } ' input_file
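Assigning to $1 or $2 makes awk rebuild the whole line with OFS, which is a single space by default. As a result, tabs and runs of spaces in the input are collapsed. A minimal sketch for tab-separated input sets both FS and OFS to a tab so the separator is preserved. The function name swap_tsv is hypothetical.

```shell
# Minimal sketch: swap the first two columns of tab-separated input.
# FS = OFS = "\t" keeps tabs as separators when awk rebuilds the line,
# and fields may contain spaces. swap_tsv is a hypothetical helper name.
swap_tsv() {
  awk 'BEGIN { FS = OFS = "\t" } { t = $1; $1 = $2; $2 = t; print }' "$@"
}

printf 'a\tb\tc\n1\t2\t3\n' | swap_tsv
```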

