Extract Lines When Column K Is Empty with Awk/Perl

You need to set the field separator explicitly to a TAB character. The columns in qq.in are tab-separated, and the second column is empty on two of the lines:

> cat qq.in
foo 78 xxx
bar yyy
qux 99 zzz
xuq xyz
> cat qq.in | awk 'BEGIN {FS="\t"} $2=="" {print}'
bar yyy
xuq xyz

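By default, FS is a single space, and awk treats that value as a special case. From the man page:

In the special case that FS is a single space, fields are separated by runs of spaces and/or tabs and/or newlines.

Note the "and/or tabs": with the default FS, two consecutive tabs collapse into a single separator, so the empty column disappears and the fields after it shift left.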
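
The same test can be written for an arbitrary column K by passing the column number in with -v. This is only a minimal sketch: the function name empty_col, the variable name k, and the sample rows are made up for illustration, and the input is generated with printf so that the tabs are unambiguous.

```shell
# Print rows whose k-th tab-separated column is empty.
# "k" is an illustrative variable name, not something awk defines.
empty_col() {
    awk -F '\t' -v k="$1" '$k == ""'
}

# Rows 2 and 4 have an empty second column (two consecutive tabs).
printf 'foo\t78\txxx\nbar\t\tyyy\nqux\t99\tzzz\nxuq\t\txyz\n' | empty_col 2
```

With the sample input this should print the bar and xuq rows, matching the transcript above.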

Filtering Rows Based On Number of Columns with AWK

You need to use the NF (number of fields) variable to control the actions, such as in the following transcript:

$ echo '0333 foo
> bar
> 23243 qux' | awk 'NF==2{print}{}'
0333 foo
23243 qux

This prints the line if the number of fields is two and does nothing otherwise. I use the (seemingly) strange construct NF==2{print}{} because some implementations of awk will print by default if no rules are matched for a line, and the empty command {} guarantees that this will not happen. A POSIX-conforming awk does not print unmatched lines, so there the empty rule is harmless but unnecessary.

If you're lucky enough to have one of those that doesn't do this, you can get away with:

awk 'NF==2'

but the first solution above will work in both cases.
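
To check which lines such a filter would drop, the complementary test can be handy. The following is a sketch on made-up input; the function name report_bad_rows and the variable name want are assumptions for illustration.

```shell
# Report lines whose field count differs from the expected one,
# prefixed with the line number (NR) and the actual count (NF).
report_bad_rows() {
    awk -v want="$1" 'NF != want { print NR ": " NF " fields: " $0 }'
}

printf '0333 foo\nbar\n23243 qux\n' | report_bad_rows 2
# prints: 2: 1 fields: bar
```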

AWK include empty/null entries in delimited text file

Tell awk that your field separator is a tab using the -F option. For example, to print the third column of your sample file (the first line has an empty second column):

$ cat foo.txt
Col1 Col3 Col4
Col1 Col2 Col3 Col4
$ awk -F '\t' '{print $3}' foo.txt
Col3
Col3
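
One way to confirm that the empty entry is really kept is to compare field counts with and without the tab separator. This is a small sketch; the sample rows are recreated with printf on the assumption that the first row contains two consecutive tabs.

```shell
# Count fields per line with the default separator and with a tab separator.
# The first sample row has an empty second column (two consecutive tabs).
sample() { printf 'Col1\t\tCol3\tCol4\nCol1\tCol2\tCol3\tCol4\n'; }

sample | awk '{ print NF }'            # default FS: the empty column is swallowed
sample | awk -F '\t' '{ print NF }'    # tab FS: the empty column is kept
```

The first command should report 3 and 4 fields, the second 4 and 4.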

Using AWK to list several columns from a db queried list when some fields are blank?

If the columns have a consistent/static width, and the spacing is handled with ' ' characters, not tabs (meaning every line is the same length, regardless of missing fields), you could use cut with an appropriate list of field start/stop positions.
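
As a rough illustration of the cut approach, here is a sketch with an invented fixed-width layout (name in characters 1-6, age in 7-10, city in 11-15). The column positions and the data are assumptions; a real query result needs its own positions.

```shell
# Fixed-width sample: name in columns 1-6, age in 7-10, city in 11-15.
# The layout is invented for illustration; blank fields are padded with spaces.
rows() { printf '%-6s%-4s%-5s\n' alice 30 NYC bob '' LA carol 41 ''; }

# Select name and city by character position rather than by field number,
# so a blank age does not shift the city into the wrong column.
rows | cut -c1-6,11-15
```

Because cut -c works on character positions, a missing value simply comes out as padding instead of pulling the next field into its place.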

Surround every line with single quote except empty lines

.* means zero or more characters; you want one or more characters, which in any sed is ..*:

$ sed "s/..*/'&'/" file
'Quote1'
'Quote2'

'Quote3'

You can also write that regexp as .\+ in GNU sed, .\{1,\} in POSIX seds, and .+ in GNU or OSX/BSD sed when invoked with -E.

The above assumes lines of all blanks should be quoted. If that's wrong then:

$ sed "s/.*[^[:blank:]].*/'&'/" file
'Quote1'
'Quote2'

'Quote3'

In any awk assuming lines of all blanks should be quoted:

$ awk '/./{$0="\047" $0 "\047"}1' file
'Quote1'
'Quote2'

'Quote3'

otherwise:

$ awk 'NF{$0="\047" $0 "\047"}1' file
'Quote1'
'Quote2'

'Quote3'

You can see the difference between the above with this:

$ printf '   \n' | sed "s/..*/'&'/"
'   '
$ printf '   \n' | sed "s/.*[^[:blank:]].*/'&'/"

$ printf '   \n' | awk '/./{$0="\047" $0 "\047"}1'
'   '
$ printf '   \n' | awk 'NF{$0="\047" $0 "\047"}1'

$

To remove blank lines in data set

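It can be done in many ways.

For example, with awk:

awk '/./' yourFile

(The shorter awk '$0' yourFile is often suggested, but it also drops any line that evaluates to numeric zero, such as a line containing only 0.)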

or sed:

sed '/^$/d' yourFile

or grep:

grep -v '^$' yourFile
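
The commands above remove only truly empty lines. If lines holding nothing but spaces or tabs should go as well, the patterns need a small change. A sketch on made-up input:

```shell
# Input: a text line, an empty line, a line holding only a space and a tab, a text line.
sample() { printf 'one\n\n \t\ntwo\n'; }

sample | sed '/^$/d'                 # keeps the whitespace-only line
sample | awk 'NF'                    # drops it: that line has no fields
sample | sed '/^[[:blank:]]*$/d'     # drops it as well
sample | grep -v '^[[:blank:]]*$'    # drops it as well
```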

grep (awk) a file from A to first empty line

sed might be best:

sed -n '/PATTERN/,/^$/p' file

To avoid printing the empty line:

sed -n '/PATTERN/,/^$/{/^$/d; p}' file

or even better - thanks jthill!:

sed -n '/PATTERN/,/^$/{/./p}' file

The solutions above print more than needed if PATTERN appears more than once, because every later match starts a new range. To avoid that, quit as soon as the empty line is found, as jaypal's answer suggests:

sed -n '/PATTERN/,/^$/{/^$/q; p}' file

Explanation

  • ^$ matches empty lines, because ^ stands for the beginning of the line and $ for the end. So ^$ means: lines with nothing between the beginning and the end.
  • /PATTERN/,/^$/{/^$/d; p}
    • /PATTERN/,/^$/ matches lines from PATTERN to the next empty line.
    • {/^$/d; p} deletes (d) the lines matching ^$ and prints (p) the rest.
  • {/./p} just prints the lines that have at least one character.

With awk you can use:

awk '!NF{f=0} /PATTERN/ {f=1} f' file

As with sed, this prints too much if several lines contain PATTERN. To handle that, exit once the empty line is found:

awk 'f && !NF{exit} /PATTERN/ {f=1} f' file

Explanation

  • !NF{f=0} if there are no fields (that is, line is empty), unset the flag f.
  • /PATTERN/ {f=1} if PATTERN is found, set the flag f.
  • f if flag f is set, this is True, so it performs the default awk behaviour: print the line.

Test

$ cat a
aa
bb
hello
aaaaaa
bbb

ttt

$ awk '!NF{f=0} /hello/ {f=1} f' a
hello
aaaaaa
bbb
$ sed -n '/hello/,/^$/{/./p}' a
hello
aaaaaa
bbb
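
The test file above contains only one match, so it does not show the difference between the flag version and the exit version. The following sketch uses made-up input with two blocks that start with hello:

```shell
# Two blocks start with "hello"; each ends at an empty line.
sample() { printf 'hello\na\n\nx\nhello\nb\n\ny\n'; }

# Flag version: the flag is set again on the second match, so both blocks print.
sample | awk '!NF{f=0} /hello/{f=1} f'

# Exit version: stops at the first empty line after a match.
sample | awk 'f && !NF{exit} /hello/{f=1} f'
```

The first command should print hello, a, hello, b; the second only hello and a.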

Using awk to print all columns from the nth to the last

Print all columns:

awk '{print $0}' somefile

Print all but the first column:

awk '{$1=""; print $0}' somefile

Print all but the first two columns:

awk '{$1=$2=""; print $0}' somefile
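
Blanking fields this way leaves the separators in place, so the output starts with one leading space per removed column. If that matters, a loop from the nth field to NF is one alternative. This is a sketch only; the function name from_col and the variable name n are illustrative.

```shell
# Print fields n..NF joined by OFS (a single space by default),
# without leading separators.
# "n" is an illustrative variable name passed in with -v.
from_col() {
    awk -v n="$1" '{
        out = ""
        for (i = n; i <= NF; i++)
            out = out (i > n ? OFS : "") $i
        print out
    }'
}

printf 'a b c d\n1 2 3\n' | from_col 3
```

Note that the loop rebuilds the line with OFS, so runs of spaces or tabs between the original fields are not preserved.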
