Use Sed to Delete Certain Lines Using an Index with the Line Numbers to Delete

Do not call sed in a loop; that will be very slow.

You could transform the index file into a sed script, then call sed once on the data file:

sed -i.bak "$(sed 's/$/d/' index.txt)" file.txt

Or, as @Hazzard17 points out, ignore lines that don't contain just digits:

script=$(sed -n '/^[[:blank:]]*[[:digit:]]\+[[:blank:]]*$/ s/$/d/p' index.txt)
sed -i.bak "$script" file.txt
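To make the generated script visible, here is a minimal sketch on throwaway files. The scratch directory and the names idx, data and delete.sed are made up for illustration. It writes the script to a file and passes it with -f, which may be preferable to a command-line argument when the index is very large.

```shell
# Work in a scratch directory; all file names here are illustrative.
tmp=$(mktemp -d)
printf '%s\n' 2 4 > "$tmp/idx"
printf '%s\n' a b c d e > "$tmp/data"

# Turn each line number N into the sed command "Nd".
sed 's/$/d/' "$tmp/idx" > "$tmp/delete.sed"
cat "$tmp/delete.sed"

# Run sed once with the generated script file.
result=$(sed -f "$tmp/delete.sed" "$tmp/data")
printf '%s\n' "$result"
rm -rf "$tmp"
```

The script file contains 2d and 4d, so only lines 2 and 4 of the data file are dropped.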

A demo:

$ seq 20000 | sed 's/^/line/' > file.txt
$ wc file.txt
20000 20000 188894 file.txt
$ seq 20000 | while read n; do [[ $RANDOM -le 5000 ]] && echo $n; done > index.txt
$ wc index.txt
3083 3083 16789 index.txt
$ sed -i.bak "$(sed 's/$/d/' index.txt)" file.txt
$ wc -l file.txt{,.bak}
16917 file.txt
20000 file.txt.bak
36917 total

To read a file into an array, you can do:

mapfile -t indices < index.txt
for i in "${indices[@]}"; do ...; done

Or just iterate over the file:

while IFS= read -r i; do ...; done < index.txt
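If you do loop over the index, one option is to use the loop only to build the sed script and still run sed a single time. The sketch below assumes every line of the index is a plain line number; the scratch directory and the variable name script are illustrative.

```shell
tmp=$(mktemp -d)
printf '%s\n' 1 3 > "$tmp/index.txt"
printf '%s\n' one two three four > "$tmp/file.txt"

# Collect one "Nd;" command per index line instead of running sed per line.
script=
while IFS= read -r i; do
  script="${script}${i}d;"
done < "$tmp/index.txt"

# script is now "1d;3d;", so a single sed call performs all deletions.
result=$(sed "$script" "$tmp/file.txt")
printf '%s\n' "$result"
rm -rf "$tmp"
```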

Delete specific line number(s) from a text file using sed?

If you want to delete lines 5 through 10 and line 12:

sed -e '5,10d;12d' file

This will print the results to the screen. If you want to save the results to the same file:

sed -i.bak -e '5,10d;12d' file

This will store the unmodified file as file.bak, and delete the given lines.

Note: Line numbers start at 1. The first line of the file is 1, not 0.
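A quick way to check the addresses is to run the same expression on numbered input, where each line's content equals its line number. This sketch assumes seq is available.

```shell
# seq prints 1..15, one number per line.
result=$(seq 1 15 | sed -e '5,10d;12d')
printf '%s\n' "$result"
```

Lines 5 through 10 and line 12 are gone; everything else passes through unchanged.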

How to keep certain line numbers and delete the rest

How about suppressing sed's automatic printing with -n and then naming the lines you want to print with p?

sed -i.bak -n '6p;8p;15p;' Input_file

From man sed:

   -n, --quiet, --silent
          suppress automatic printing of pattern space

   p      Print the current pattern space.
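The same idea on numbered input, as a small sketch. The q on the last wanted line is an optional addition that is not part of the answer above: it makes sed stop reading once nothing more can be printed.

```shell
# Print only lines 6, 8 and 15, then quit so the rest of the input is skipped.
result=$(seq 1 100 | sed -n '6p;8p;15{p;q;}')
printf '%s\n' "$result"
```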

Delete a range of lines, whether or not the line numbers are known, in Unix using head and tail?

Adding a solution as per the OP's request, to make this a proper answer.

Approach: You provide the line numbers to skip, counted from the start and from the end of Input_file, and those lines are left out of the output.

What the code does: It generates an awk command from the line numbers you supply and then runs it.

cat print_lines.ksh
start_line="2,3"   # lines to skip, counted from the top
end_line="2,3"     # lines to skip, counted from the bottom (1 = last line)
total_lines=$(wc -l < Input_file)

awk -v len="$total_lines" -v OFS="||" -v s1="'" -v start="$start_line" -v end="$end_line" '
BEGIN{
  num_start=split(start, a, ",")
  num_end=split(end, b, ",")
  # Build one FNR==N test per line counted from the top.
  for(i=1;i<=num_start;i++){
    val=(val ? val OFS : "") "FNR==" a[i]
  }
  # Convert each bottom-relative number to an absolute line number.
  for(j=1;j<=num_end;j++){
    b[j]=len-(b[j]-1)
    val=(val ? val OFS : "") "FNR==" b[j]
  }
  # Emit the generated awk command; the pipe to sh runs it.
  print "awk " s1 val "{next} 1" s1 " Input_file"
}
' | sh

Change Input_file to your actual file name and let me know how it goes.


The following awk may help with the same task (I don't have an HP system, so I couldn't test it there).

awk -v lines=$(wc -l <Input_file) 'FNR==2 || FNR==3 || FNR==(lines-1) || FNR==(lines-2){next} 1'  Input_file

EDIT: Adding a multi-line form of the solution as well.

awk -v lines=$(wc -l <Input_file) '
FNR==2 || FNR==3 || FNR==(lines-1) || FNR==(lines-2){
next}
1
' Input_file
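To see which lines survive, here is a minimal sketch that runs the same condition on an 8-line file of numbers. The scratch directory is illustrative, and Input_file is created inside it only for this demonstration.

```shell
tmp=$(mktemp -d)
seq 1 8 > "$tmp/Input_file"

# Skip lines 2 and 3 from the top, plus the 2nd and 3rd lines from the bottom.
result=$(awk -v lines="$(wc -l < "$tmp/Input_file")" \
  'FNR==2 || FNR==3 || FNR==(lines-1) || FNR==(lines-2) {next} 1' "$tmp/Input_file")
printf '%s\n' "$result"
rm -rf "$tmp"
```

With 8 lines, lines-1 is 7 and lines-2 is 6, so lines 1, 4, 5 and 8 remain.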

Use sed to delete all lines starting with pattern b after line with pattern a

With GNU sed, you may use

sed '/DELETE ME/{:a;N;s/\n[[:blank:]]*-.*//;ta;!P;D}' file

See the online sed demo:

s='first line
second line DELETE ME
- third line
- fourth line
fifth line
sixth line DELETE ME
seventh line
- eighth line'
sed '/DELETE ME/{:a;N;s/\n[[:blank:]]*-.*//;ta;!P;D}' <<< "$s"

Output:

first line
fifth line
seventh line
- eighth line

Details

  • /DELETE ME/ - finds all lines that contain DELETE ME string
  • {:a;N;s/\n[[:blank:]]*-.*//;ta;!P;D} - if the line matching DELETE ME is found, this block is entered:
    • :a - an a label marks the current position
    • N - appends a newline and the next input line to the pattern space
    • s/\n[[:blank:]]*-.*// - finds and removes the newline, 0+ blank chars, - and the rest of the string
    • ta - if the substitution occurred, sed goes to the position marked with a
    • !P - a plain P would print the pattern space up to the first newline, but ! with no address negates "every line", so the command never runs and the DELETE ME line itself is not printed
    • D - deletes the pattern space content until the first new line, i.e. deletes the first line inside pattern space, and restarts cycle with the resultant pattern space, without reading a new line of input.
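For readers without GNU sed, the same filtering can be approximated in awk. This is a sketch of a different technique, not the answer's method: the flag name skip is made up, and [ \t] stands in for [[:blank:]]. On the sample input it reproduces the output shown above.

```shell
s='first line
second line DELETE ME
- third line
- fourth line
fifth line
sixth line DELETE ME
seventh line
- eighth line'

# skip is set by a DELETE ME line and cleared by the first following line
# that does not start with optional blanks and a "-".
result=$(printf '%s\n' "$s" | awk '
  /DELETE ME/          { skip = 1; next }
  skip && /^[ \t]*-/   { next }
                       { skip = 0; print }
')
printf '%s\n' "$result"
```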

Delete line from text file with line numbers from another file

An awk one-liner should work for you; see the test below:

kent$  head lines.txt doc.txt 
==> lines.txt <==
1
3
5
7

==> doc.txt <==
a
b
c
d
e
f
g
h

kent$ awk 'NR==FNR{l[$0];next;} !(FNR in l)' lines.txt doc.txt
b
d
f
h

As Levon suggested, here is some explanation:

awk                      # the awk command
'NR==FNR{l[$0];next;}    # while reading the first file (lines.txt), save each line (a line number to delete) as a key in array "l"
!(FNR in l)'             # for the second file (doc.txt), print the line if its line number is not in "l"
lines.txt                # 1st argument: lines.txt
doc.txt                  # 2nd argument: doc.txt
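One edge case worth knowing: if lines.txt is empty, NR==FNR is also true while awk reads doc.txt, so the one-liner prints nothing. A possible variant, sketched here with scratch files, compares FILENAME with ARGV[1] instead.

```shell
tmp=$(mktemp -d)
: > "$tmp/lines.txt"                  # empty list: nothing should be deleted
printf '%s\n' a b c > "$tmp/doc.txt"

# FILENAME==ARGV[1] is true only while the first file is being read,
# even when that file has no lines at all.
result=$(awk 'FILENAME==ARGV[1]{l[$0];next} !(FNR in l)' "$tmp/lines.txt" "$tmp/doc.txt")
printf '%s\n' "$result"
rm -rf "$tmp"
```

Here all three lines of doc.txt are printed, which is the expected result for an empty deletion list.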

Delete lines by pattern in specific range of lines

You can use this sed command:

sed -i.bak '731,1000{/some_pattern/d}' yourfile

Test:

$ cat a
1
2
3
13
23
4
5

$ sed '2,4{/3/d}' a
1
2
23
4
5

Delete lines shorter than a certain length and the one above it (remove short sequences in a FASTA file)

With GNU sed, you can use

sed -E '/>/N;/\n[^>].{0,4}$/d'

Details:

  • />/ - finds lines with > (if it must be at the start, add ^ before >)
  • N - reads the next line and appends it to the pattern space with a leading newline
  • \n[^>].{0,4}$ - a newline, a char other than a > (as the first char should not be >) and then zero to four chars till end of the string
  • d removes the value in pattern space.

See the online demo:

#!/bin/bash
s='>seq1
GAAAT
>seq2
CATCTCGGGA
>seq3
GAC
>seq4
ATTCCGTGCC'
sed -E '/>/N;/\n[^>].{0,4}$/d' <<< "$s"

Output:

>seq2
CATCTCGGGA
>seq4
ATTCCGTGCC
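If the length threshold needs to vary, an awk variant with a parameter may be easier to adjust. This is only a sketch of an alternative, not the answer's method: it assumes each record is exactly one header line followed by one sequence line, and the names min and hdr are made up.

```shell
s='>seq1
GAAAT
>seq2
CATCTCGGGA
>seq3
GAC
>seq4
ATTCCGTGCC'

# Remember each header; print header and sequence only if the sequence
# has at least min characters.
result=$(printf '%s\n' "$s" | awk -v min=6 '
  /^>/              { hdr = $0; next }
  length($0) >= min { print hdr; print }
')
printf '%s\n' "$result"
```

With min=6 this drops the same records as the sed command, which removes sequences of five characters or fewer.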

