Linux, Print All Lines in a File, Not Starting With

Use the -v option of grep to negate the condition:

grep -v '^#' file
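As a minimal sketch of the command above in action, assume a small sample file (its contents are made up for illustration). Note that a `#` later in a line does not cause that line to be dropped, because the pattern is anchored to the line start.

```shell
# Hypothetical sample data; the contents are illustrative only.
sample=$(mktemp)
printf '%s\n' '# a comment' 'alpha' '#another comment' 'beta # trailing' > "$sample"

# -v inverts the match, so only lines that do NOT begin with '#' are printed.
result=$(grep -v '^#' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

With this sample, only the `alpha` and `beta # trailing` lines survive.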

Print all lines in a file not beginning with hash and also filter by a specific column

EDIT: Since the OP changed Input_file, here is a solution that also follows Ed's suggestion in the comments.

awk '!(/^#/ || $2~/^0\/0/)'  Input_file

Could you please try the following:

awk '!/^#meta/ && $2!="0/0"'  Input_file

OR

awk '!/^#/ && $2!="0/0"'  Input_file

OR

awk '!(/^#/ || $2=="0/0")'  Input_file

Set BEGIN{FS=OFS="\t"} if your Input_file is TAB-separated and you want the output to be TAB-separated too.
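A small sketch of why the TAB setting matters, assuming a hypothetical TAB-separated file whose first column contains a space. With the default whitespace splitting, `$2` would be the second word of the first column rather than the genotype-like value.

```shell
# Hypothetical TAB-separated sample; the first column contains a space.
sample=$(mktemp)
printf '#meta\tinfo\n' > "$sample"
printf 'sample 1\t0/0\tref\n' >> "$sample"
printf 'sample 2\t0/1\thet\n' >> "$sample"

# FS="\t" splits on TABs only, so "sample 1" stays a single field
# and $2 is the value between the first and second TAB.
result=$(awk 'BEGIN { FS = OFS = "\t" } !/^#/ && $2 != "0/0"' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```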

Printing lines from a file where a specific field does not start with something

The regex could be a tiny bit cleaner:

awk -F: '$3 !~ /^ ?#/ { print }'

It's often better to expect repeated whitespace (space or tab) rather than a single space character, which can look identical in printed output.

awk -F: '$3 !~ /^[[:space:]]*#/ { print }'
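To illustrate the difference between the two patterns, here is a sketch using made-up colon-separated records. It assumes an awk that supports POSIX character classes such as `[[:space:]]` (gawk and current mawk do).

```shell
# Hypothetical records; the third field starts with '#', optionally
# preceded by a space, a TAB, or two spaces.
sample=$(mktemp)
printf 'a:b:#plain\n' > "$sample"
printf 'a:b: #one space\n' >> "$sample"
printf 'a:b:\t#tab\n' >> "$sample"
printf 'a:b:  #two spaces\n' >> "$sample"
printf 'a:b:value\n' >> "$sample"

# Allows at most one literal space before '#': the TAB and the
# two-space records slip through.
loose=$(awk -F: '$3 !~ /^ ?#/' "$sample")

# Allows any run of whitespace before '#': only the real value remains.
strict=$(awk -F: '$3 !~ /^[[:space:]]*#/' "$sample")

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
rm -f "$sample"
```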

Check if a line does not start with a specific string with grep

Simply use the below grep command,

grep -v '^Nov 06' file

From grep --help,

-v, --invert-match        select non-matching lines

Another hack through regex,

grep -P '^(?!Nov 06)' file

Regex Explanation:

  • ^ Asserts that we are at the start.
  • (?!Nov 06) This negative lookahead asserts that the line start is not followed by the string Nov 06. If the assertion holds, the pattern matches the empty position before the first character of the line, which is enough for grep to print that line.

Another regex based solution through PCRE verb (*SKIP)(*F)

grep -P '^Nov 06(*SKIP)(*F)|^' file

Fast way of finding lines in one file that are not in another?

You can achieve this by controlling the formatting of the old/new/unchanged lines in GNU diff output:

diff --new-line-format="" --unchanged-line-format=""  file1 file2

The input files should be sorted for this to work. With bash (and zsh) you can sort on the fly with process substitution <( ):

diff --new-line-format="" --unchanged-line-format="" <(sort file1) <(sort file2)

In the above, new and unchanged lines are suppressed, so only changed lines (i.e. removed lines in your case) are output. You may also use a few diff options that other solutions don't offer, such as -i to ignore case, or various whitespace options (-E, -b, -w etc.) for less strict matching.
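A minimal sketch of the diff approach, assuming GNU diff and two hypothetical files that are already sorted. It avoids process substitution so that it also runs in a plain POSIX shell.

```shell
# Hypothetical, already-sorted sample files; names and contents are illustrative.
workdir=$(mktemp -d)
printf '%s\n' apple banana cherry date > "$workdir/file1"
printf '%s\n' banana date > "$workdir/file2"

# Suppress new and unchanged lines so only lines unique to file1 remain.
# diff exits with status 1 when the files differ, hence the "|| true"
# (it matters in scripts that run under "set -e").
result=$(diff --new-line-format="" --unchanged-line-format="" \
    "$workdir/file1" "$workdir/file2" || true)
printf '%s\n' "$result"

rm -rf "$workdir"
```

With these files, the lines `apple` and `cherry` are reported as present only in file1.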


Explanation

The options --new-line-format, --old-line-format and --unchanged-line-format let you control the way diff formats the differences, similar to printf format specifiers. These options format new (added), old (removed) and unchanged lines respectively. Setting one to empty "" prevents output of that kind of line.

If you are familiar with unified diff format, you can partly recreate it with:

diff --old-line-format="-%L" --unchanged-line-format=" %L" \
--new-line-format="+%L" file1 file2

The %L specifier is the line in question, and we prefix each with "+" "-" or " ", like diff -u
(note that it only outputs differences, it lacks the --- +++ and @@ lines at the top of each grouped change).
You can also use this to do other useful things like number each line with %dn.


The diff method (along with the other suggestions, comm and join) produces the expected output only with sorted input, though you can use <(sort ...) to sort on the fly. Here's a simple awk (nawk) script (inspired by the scripts linked to in Konsolebox's answer) which accepts arbitrarily ordered input files and outputs the missing lines in the order they occur in file1.

# output lines in file1 that are not in file2
BEGIN { FS="" } # preserve whitespace
(NR==FNR) { ll1[FNR]=$0; nl1=FNR; } # file1, index by lineno
(NR!=FNR) { ss2[$0]++; } # file2, index by string
END {
for (ll=1; ll<=nl1; ll++) if (!(ll1[ll] in ss2)) print ll1[ll]
}

This stores the entire contents of file1 line by line in a line-number indexed array ll1[], and the entire contents of file2 line by line in a line-content indexed associative array ss2[]. After both files are read, iterate over ll1 and use the in operator to determine if the line in file1 is present in file2. (This will have different output from the diff method if there are duplicates.)
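The following sketch shows one way to run the script above on unsorted input. The file name linesnotin.awk and the sample contents are assumptions made for illustration.

```shell
workdir=$(mktemp -d)

# Save the script shown above under an assumed name.
cat > "$workdir/linesnotin.awk" <<'EOF'
# output lines in file1 that are not in file2
BEGIN { FS="" }
(NR==FNR) { ll1[FNR]=$0; nl1=FNR; }
(NR!=FNR) { ss2[$0]++; }
END {
for (ll=1; ll<=nl1; ll++) if (!(ll1[ll] in ss2)) print ll1[ll]
}
EOF

# Hypothetical unsorted inputs.
printf '%s\n' pear apple fig banana > "$workdir/file1"
printf '%s\n' banana pear > "$workdir/file2"

# file1 must come first: NR==FNR is only true while the first file is read.
result=$(awk -f "$workdir/linesnotin.awk" "$workdir/file1" "$workdir/file2")
printf '%s\n' "$result"

rm -rf "$workdir"
```

The missing lines come out in file1 order (`apple`, then `fig`), without any sorting step.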

In the event that the files are sufficiently large that storing them both causes a memory problem, you can trade CPU for memory by storing only file1 and deleting matches along the way as file2 is read.

BEGIN { FS="" }
(NR==FNR) { # file1, index by lineno and string
ll1[FNR]=$0; ss1[$0]=FNR; nl1=FNR;
}
(NR!=FNR) { # file2
if ($0 in ss1) { delete ll1[ss1[$0]]; delete ss1[$0]; }
}
END {
for (ll=1; ll<=nl1; ll++) if (ll in ll1) print ll1[ll]
}

The above stores the entire contents of file1 in two arrays, one indexed by line number ll1[], one indexed by line content ss1[]. Then as file2 is read, each matching line is deleted from ll1[] and ss1[]. At the end the remaining lines from file1 are output, preserving the original order.

In this case, with the problem as stated, you can also divide and conquer using GNU split (filtering is a GNU extension), repeated runs with chunks of file1 and reading file2 completely each time:

split -l 20000 --filter='gawk -f linesnotin.awk - file2' < file1

Note the use and placement of - meaning stdin on the gawk command line. This is provided by split from file1 in chunks of 20000 lines per invocation.

For users on non-GNU systems, there is almost certainly a GNU coreutils package you can obtain, including on OSX as part of the Apple Xcode tools which provides GNU diff, awk, though only a POSIX/BSD split rather than a GNU version.

Linux command to filter records from a file that does not start with number

[1,9] means "one of the characters 1, comma, or 9". That's probably not what you meant. Maybe you meant [19] (one or nine) or maybe you meant [0-9] (any digit). I'm assuming that you meant "any digit" because that's what you said in the title. If you meant something else, I hope that it will be obvious how to fix it.

To invert the set of characters which could match, put a ^ right after the [. So [^0-9] means "anything other than a digit".

The ^ at the beginning of the pattern means "only match at the beginning of a line". But it only means that at the beginning of the pattern.

So ^[^0-9] matches a line which starts with something other than a digit.

Neither of those patterns will match an empty line, since both of them match exactly one character.

Normally, grep searches for the pattern anywhere in a line, and prints the line if it finds the pattern. But if you put a ^ at the beginning of the pattern, it only checks the beginning of the line. You can also put a $ at the end of the pattern to mean "only match at the end of the line". If you use both ^ at the beginning and $ at the end, you are asking grep to print lines which precisely match the pattern. You don't need to know that for this question, but it will come in handy some day.
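A short sketch of the empty-line caveat, using made-up sample lines. If empty lines should count as "not starting with a digit", one option is to invert a positive match with -v instead of using a negated bracket expression.

```shell
# Hypothetical sample: two digit-led lines, two other lines, one empty line.
sample=$(mktemp)
printf '%s\n' '1 one' 'two' '' '9 nine' '-5 minus' > "$sample"

# Requires one non-digit character at the line start, so the empty line is skipped.
bracket_count=$(grep -c '^[^0-9]' "$sample")

# Drops lines that start with a digit; the empty line is kept.
inverted_count=$(grep -v -c '^[0-9]' "$sample")

printf 'negated bracket: %s lines, inverted match: %s lines\n' \
    "$bracket_count" "$inverted_count"
rm -f "$sample"
```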

Find rows in a file which don't contain a string and add text at the start of them

grep is not the tool here, I'd use awk or sed. Using awk:

$ awk '
BEGIN {
FS=OFS="," # set delimiters to ,
}
NF==1 { # if there is only one field (consider NF<=1 for empty records)
$1=OFS $1 # add a delimiter in front of it
}
1' file # output

Output:

name,path:A:B
,loc:D
name,for:B:C
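The input is not shown above, so the following sketch assumes a file inferred from the output: three records, the second of which lacks the leading name field.

```shell
# Assumed input, reconstructed from the output shown above.
sample=$(mktemp)
printf '%s\n' 'name,path:A:B' 'loc:D' 'name,for:B:C' > "$sample"

result=$(awk '
BEGIN { FS = OFS = "," }   # comma in, comma out
NF == 1 { $1 = OFS $1 }    # single-field record: prepend a delimiter
1                          # print every record
' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```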

grep lines NOT starting with # or empty lines


grep -v '^#' fileNameIGrepFor | grep -v '^$'

can be simplified into:

grep -v '^#\|^$' fileNameIGrepFor

To remove the ugly \ you can use grep -E, or equivalently egrep (recent GNU grep releases warn that egrep is obsolescent, so prefer grep -E in new scripts):

egrep -v '^#|^$' fileNameIGrepFor

You could then clarify this a bit by grouping the terms:

egrep -v '^(#|$)' fileNameIGrepFor

And then make it a little more robust by including a check for whitespace before the #:

egrep -v '^(\s*#|$)' fileNameIGrepFor

Maybe you'll also want to exclude blank lines that contain only whitespace? In that case, again, the change is simple:

egrep -v '^\s*(#|$)' fileNameIGrepFor
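A sketch of the final pattern on made-up input. It uses grep -E with `[[:space:]]` instead of egrep with `\s`; that substitution is a portability assumption, since `\s` is a GNU extension.

```shell
# Hypothetical config-like sample with comments, an empty line,
# and a whitespace-only line.
sample=$(mktemp)
printf '%s\n' '# comment' '   # indented comment' '' '   ' \
    'key=value' 'x # trailing' > "$sample"

# Drop lines that are empty, whitespace-only, or comments
# (optionally indented). A '#' later in a line does not matter.
result=$(grep -E -v '^[[:space:]]*(#|$)' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```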

How to print lines of file that start with d and end with number

What about this:

grep "^d" korad | grep "[0-9]$"

This first keeps the lines starting with the letter "d", then filters those results down to the lines ending with a digit. That way, you don't need to worry about what is present between the first and the last character.

In case you don't understand the vertical bar, it's called a pipe: it feeds the output of the first command into the second command as its input.
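For comparison, the two conditions can also be combined into a single pattern. The sketch below uses made-up contents for the korad file and checks that both forms select the same lines.

```shell
# Hypothetical contents for the file named korad.
workdir=$(mktemp -d)
printf '%s\n' 'data1' 'dog' 'cat7' 'd' 'd9' '5d' > "$workdir/korad"

# Two greps joined by a pipe, as above.
piped=$(grep '^d' "$workdir/korad" | grep '[0-9]$')

# One grep: 'd' at the start, anything in between, a digit at the end.
single=$(grep '^d.*[0-9]$' "$workdir/korad")

printf '%s\n' "$single"
rm -rf "$workdir"
```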

How to join lines not starting with specific pattern to the previous line in UNIX?

Please try the following:

awk 'BEGIN {accum_line = "";} /^These/{if(length(accum_line)){print accum_line; accum_line = "";}} {accum_line = (length(accum_line) ? accum_line " " : "") $0;} END {if(length(accum_line)){print accum_line; }}' < data.txt

The code consists of three parts:

  1. The block marked by BEGIN is executed before anything else. It's useful for global initialization.
  2. The block marked by END is executed once regular processing has finished. It is good for wrapping things up, such as printing the last collected data, which has not been printed yet because no further line starting with These followed it (as is the case here).
  3. The rest is the code performed for each line. First, the pattern is checked: if the line starts with These, the data collected so far is printed and the accumulator is reset. Second, the current line is appended to the accumulator regardless of its contents.
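The three parts can be seen at work in the sketch below. It is a multi-line variant of the same idea rather than the exact one-liner, the variable name acc and the sample data are made up, and it joins pieces without adding a leading space.

```shell
# Hypothetical data: lines starting with "These" begin a new record,
# all other lines continue the previous one.
sample=$(mktemp)
printf '%s\n' 'These are' 'continued lines' 'These stand alone' \
    'These also' 'wrap' 'twice' > "$sample"

result=$(awk '
# A new record starts: flush what has been collected so far.
/^These/ && acc != "" { print acc; acc = "" }
# Every line: append to the accumulator, separated by one space.
{ acc = (acc == "" ? $0 : acc " " $0) }
# End of input: flush the last record.
END { if (acc != "") print acc }
' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```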

