Removing Parts of a String That Contain Digits with sed/Perl

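With sed, you need the -r switch (extended regular expressions, so that + works as a quantifier) and a character class. BSD sed spells the switch -E, which modern GNU sed also accepts.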

$ echo "AB208804_1 446 576 AB208804_1orf 0" | sed -r 's/_[0-9]+//g'
AB208804 446 576 AB208804orf 0

Or, since you asked, in Perl:

$ echo "AB208804_1 446 576 AB208804_1orf 0" | perl -ne 's/_\d+//g; print $_'
AB208804 446 576 AB208804orf 0

Remove everything from the beginning up to a certain part of a string

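sed -E 's/^(.*)_([^_]*)$/_\2/' < input.txt

This keeps only the part of each line from the last underscore onward. The -E switch is required: without it, sed uses basic regular expressions, in which ( and ) are literal characters, so the pattern would not match ordinary input.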
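
Because .* is greedy, the same edit (drop everything before the last underscore, keep the underscore and what follows) can also be written without capture groups. A minimal sketch; the function name and sample values are hypothetical:

```shell
# Hypothetical helper: keep only the text from the last "_" onward.
# .* is greedy, so it consumes everything up to the LAST underscore;
# lines without an underscore are left unchanged.
keep_from_last_underscore() {
  sed 's/.*_/_/'
}

printf '%s\n' 'abc_def_ghi' 'no-underscore' | keep_from_last_underscore
```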

How to delete all lines that contain a specific string from a text file?

To remove the line and print the output to standard out:

sed '/pattern to match/d' ./infile

To directly modify the file – does not work with BSD sed:

sed -i '/pattern to match/d' ./infile

Same, but for BSD sed (Mac OS X and FreeBSD) – does not work with GNU sed:

sed -i '' '/pattern to match/d' ./infile

To directly modify the file (and create a backup) – works with BSD and GNU sed:

sed -i.bak '/pattern to match/d' ./infile
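
Since the -i syntax differs between GNU and BSD sed, a script that must run on both can instead write the result to a temporary file and copy it back. A minimal sketch; the function name delete_matching_lines is hypothetical, and it assumes mktemp is available (it is on Linux and macOS) and that the pattern contains no unescaped /:

```shell
# Hypothetical helper: delete lines matching a basic regular expression from a
# file without sed -i. Usage: delete_matching_lines 'pattern' file
delete_matching_lines() {
  tmp=$(mktemp) || return 1
  # Write the filtered text to the temp file, then copy it back over the
  # original; "cat >" keeps the original file's permissions and inode.
  sed "/$1/d" "$2" > "$tmp" && cat "$tmp" > "$2"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

For example, delete_matching_lines 'pattern to match' ./infile modifies ./infile in place, like the sed -i commands above, without leaving a backup file.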

How to remove part of a string only if a condition is met

This works:

$ sed -E 's/^([^:]*:[^:]*):[0-9][0-9]$/\1/' file

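The [^:] means 'any character other than :'. Because the pattern is anchored at both ends, it matches only lines that contain exactly two colons, and only on those lines is the trailing :NN (a colon followed by two digits) deleted.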

This awk works too:

$ awk 'gsub(/:/,":")==2 {sub(/:[0-9][0-9]$/,"")} 1' file

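Here gsub(/:/,":") replaces each colon with itself and returns the number of replacements made, i.e. the number of colons. If there are exactly two, sub deletes the trailing :NN; the final 1 prints every line.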
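
To check that the sed and awk versions agree, here is a small sketch run on made-up sample lines (the question's real input isn't shown, so these values are assumptions): only the line with exactly two colons loses its trailing :NN.

```shell
# Made-up sample input with one, two, and three colons
sample='A*02:01
A*02:01:01
B*07:02:01:03'

trim_sed() {
  sed -E 's/^([^:]*:[^:]*):[0-9][0-9]$/\1/'
}

trim_awk() {
  # gsub(/:/, ":") replaces each colon with itself and returns the colon count
  awk 'gsub(/:/, ":") == 2 { sub(/:[0-9][0-9]$/, "") } 1'
}

printf '%s\n' "$sample" | trim_sed
printf '%s\n' "$sample" | trim_awk
```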

You can also use GNU grep with PCRE (-P) and -o to print only the part of each line that matches the pattern you are looking for:

$ grep -oP '^\w+\*\d\d:\d\d' file

Or the same way in Perl:

$ perl -lnE 'say "$1" if /(^\w+\*\d\d:\d\d)/' file

Remove leading and trailing digits from a string, keeping 2 digits at each end, using sed or awk

You may try this sed:

sed -E 's/^[0-9]+([0-9]{2})|([0-9]{2})[0-9]+$/\1\2/g' file

Output:

51word24
anotherword
12yetanother1
62andherese123anotherline43
23andherese123anotherline45
53andherese123anotherline41

Command Details:

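  • ^[0-9]+([0-9]{2}): Matches 1 or more digits at the start that are followed by 2 more digits (captured in group #1); the whole run is replaced by those 2 digits.
  • ([0-9]{2})[0-9]+$: Matches 2 digits (captured in group #2) followed by 1 or more digits at the end; the whole run is replaced by those 2 digits.

Since only one alternative matches at a time, the other group is empty, so \1\2 expands to just the 2 kept digits.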
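
The original input is not shown, so the sketch below applies the same command to made-up lines chosen to reproduce several of the output lines above (an assumption for illustration only; the function name is hypothetical):

```shell
# Keep only the last 2 digits of a leading run of 3+ digits,
# and only the first 2 digits of a trailing run of 3+ digits.
trim_digit_runs() {
  sed -E 's/^[0-9]+([0-9]{2})|([0-9]{2})[0-9]+$/\1\2/g'
}

printf '%s\n' '123451word24' '9912yetanother1' '0062andherese123anotherline43210' | trim_digit_runs
```

Runs of only 1 or 2 digits at either end (like the 24 and the trailing 1 above) are left alone, because each alternative needs at least 3 digits.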

Sed Regex to delete all numbers except ordinals

Since sed doesn't support lookarounds, you have to spell out each allowed path explicitly:

[0-9]+(([sS]([^Tt]|$)|[Tt]([^Hh]|$)|[RNrn]([^Dd]|$))|[^RNSTrnst0-9]|$)

For case-insensitivity, both uppercase and lowercase letters are included in the bracket expressions.

GNU sed command (POSIX ERE):

sed -r 's/[0-9]+(([sS]([^Tt]|$)|[Tt]([^Hh]|$)|[RNrn]([^Dd]|$))|[^RNSTrnst0-9]|$)/\1/g' file

Regex breakdown:

[0-9]+                 # Match digits
(                      # Start of capturing group #1
  (                    # Start of capturing group #2
    [sS]               # Match S or s
    (                  # Start of capturing group #3
      [^Tt]            # If a character follows S, it must not be T
      |                # Or
      $                # Match end-of-line position
    )                  # End of capturing group #3
    |                  # Or
    [Tt]               # Match T or t
    (                  # Start of capturing group #4
      [^Hh]            # If a character follows T, it must not be H
      |                # Or
      $                # Match end-of-line position
    )                  # End of capturing group #4
    |                  # Or
    [RNrn]             # Match R, N, r or n
    (                  # Start of capturing group #5
      [^Dd]            # If a character follows R or N, it must not be D
      |                # Or
      $                # Match end-of-line position
    )                  # End of capturing group #5
  )                    # End of capturing group #2
  |                    # Or
  [^RNSTrnst0-9]       # Match any character that is not one of these letters or a digit
  |                    # Or
  $                    # Match end-of-line position
)                      # End of capturing group #1
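
A minimal usage sketch of the same expression, wrapped in a hypothetical shell function and using -E (which GNU sed accepts as a synonym for -r). The sample sentences are made up:

```shell
# Delete numbers unless they are followed by st, nd, rd or th (any case).
# The character after a removed number is put back through \1.
strip_non_ordinals() {
  sed -E 's/[0-9]+(([sS]([^Tt]|$)|[Tt]([^Hh]|$)|[RNrn]([^Dd]|$))|[^RNSTrnst0-9]|$)/\1/g'
}

echo "1st 2nd 3rd 4th 42 apples" | strip_non_ordinals
echo "Chapter 10" | strip_non_ordinals
```

Because the character following a removed number is kept, "42 apples" loses only its digits, leaving a double space between "4th" and "apples".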

How to delete certain characters after a pattern using sed or awk?

1st solution: Try the following, written and tested with the sample input in GNU awk (it assumes ;;; occurs only once per line).

awk '
match($0,/.*;;;/){
laterPart=substr($0,RSTART+RLENGTH)
gsub(/[,.:;()~?]/,"",laterPart)
print substr($0,RSTART,RLENGTH) laterPart
}' Input_file

Explanation: the same program, with a comment on each line:

awk '                                      ##Starting awk program from here.
match($0,/.*;;;/){                         ##Using match function to match everything up to and including the last ;;; here.
laterPart=substr($0,RSTART+RLENGTH)        ##Creating variable laterPart which holds the rest of the line after the matched part.
gsub(/[,.:;()~?]/,"",laterPart)            ##Globally substituting ,.:;()~? with an empty string in laterPart.
print substr($0,RSTART,RLENGTH) laterPart  ##Printing the matched part followed by laterPart.
}' Input_file                              ##Mentioning Input_file name here.
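
As written, the 1st solution prints only lines that contain ;;;, because its single match(...) rule is the only thing that prints; other lines are dropped. If they should pass through unchanged, adding next plus a catch-all 1 rule is one way to do it. A sketch of that variant, not the original answer; the function name and sample lines are made up:

```shell
# Variant of the 1st solution: clean everything after the last ;;; and
# print lines without ;;; unchanged instead of dropping them.
clean_after_marker() {
  awk '
  match($0, /.*;;;/) {
    rest = substr($0, RSTART + RLENGTH)     # text after the last ;;;
    gsub(/[,.:;()~?]/, "", rest)            # remove the unwanted characters
    print substr($0, RSTART, RLENGTH) rest  # marker part + cleaned rest
    next                                    # skip the catch-all rule below
  }
  1                                         # lines without ;;; are printed as-is
  '
}

printf '%s\n' 'keep,this;;; drop, these: (chars)?' 'no marker, here.' | clean_after_marker
```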



2nd solution: If lines can contain several occurrences of ;;; and you want to remove these characters from every field after the first ;;;, try the following:

awk 'BEGIN{FS=OFS=";;;"} {for(i=2;i<=NF;i++){gsub(/[,.:;()~?]/,"",$i)}} 1' Input_file
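
A quick check on a made-up line with two ;;; markers (the function name is hypothetical): every field after the first is cleaned. For comparison, the greedy /.*;;;/ in the 1st solution would clean only the text after the last ;;;, giving a;;;b,c;;;de for the same line.

```shell
# 2nd solution: split on ;;; and clean every field after the first one.
# Assigning to $i rebuilds the line with OFS, which is also ;;;.
clean_all_fields() {
  awk 'BEGIN { FS = OFS = ";;;" }
       { for (i = 2; i <= NF; i++) gsub(/[,.:;()~?]/, "", $i) }
       1'
}

printf '%s\n' 'a;;;b,c;;;d.e' | clean_all_fields
```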

Removing a specific character from anywhere between two specific strings?

Using a substitution and a loop (GNU sed, since \s and \t inside a bracket expression are GNU extensions):

sed ':l s/\(number="[^" \t]*\)\s\s*/\1/g;tl' input

This gives:

number="+123123123" text="This is some text"
number="+123456" text="This may contain numbers"
number="+123456789" text="Numbers here should keep their spaces"
number="+98765" text="example 123 123 123"
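
The command above relies on GNU sed extensions. A sketch of a more portable variant, assuming a POSIX sed: [[:space:]] replaces \s and \t, and the label and branch are passed as separate -e chunks so the label ends where BSD sed expects it to. The function name and sample lines are hypothetical:

```shell
# Remove whitespace inside the number="..." value, looping until none is left.
join_number_digits() {
  sed -e ':l' \
      -e 's/\(number="[^"[:space:]]*\)[[:space:]][[:space:]]*/\1/g' \
      -e 't l'
}

echo 'number="+123 123 123" text="This is some text"' | join_number_digits
```

Each pass removes one run of whitespace inside the value, and t l jumps back to the label as long as a substitution was made, so spaces in other attributes such as text="..." are never touched.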

Removing non-alphanumeric characters with sed

tr's -c (complement) flag may be an option: combined with -d, it deletes every character that is not in the given set.

echo "Â10.41.89.50-._ " | tr -cd '[:alnum:]._-'
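
Since the question asks about sed, here is a sed equivalent as a minimal sketch (the function name is hypothetical). Setting LC_ALL=C is a deliberate assumption: in a UTF-8 locale, [:alnum:] may count letters such as Â as alphanumeric, whereas in the C locale its bytes are not, so they are deleted. Unlike tr -d, sed also keeps the trailing newline.

```shell
# Delete every character that is not alphanumeric, ".", "_" or "-".
# LC_ALL=C makes the character classes byte-based ASCII, so non-ASCII bytes go too.
keep_safe_chars() {
  LC_ALL=C sed 's/[^[:alnum:]._-]//g'
}

printf '%s\n' 'Â10.41.89.50-._ ' | keep_safe_chars
```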


Related Topics



Leave a reply



Submit