Sed: Remove Whole Words Containing a Character Class

sed: remove whole words containing a character class

Using awk:

s="ok 0bad ba1d bad3 4bad4 5bad5bad5"
awk '{ofs=""; for (i=1; i<=NF; i++) if ($i ~ /^[[:alpha:]]+$/)
{printf "%s%s", ofs, $i; ofs=OFS} print ""}' <<< "$s"
ok

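This awk command loops over every field (word) and prints it only if it matches the regex /^[[:alpha:]]+$/, that is, if it consists entirely of letters. The variable ofs starts out empty and is set to OFS after the first word is printed, so the kept words are separated by OFS without a leading separator. The final print "" ends the output line.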
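
Since the question asked for sed, the same idea can be sketched there as well. This is only a minimal sketch, assuming GNU sed (for \?) and space-separated words. Note that it deletes every word that contains a digit instead of keeping only all-letter words, so it agrees with the awk version for this sample but not for words containing punctuation. The function name strip_digit_words is made up for illustration.

```shell
# Hypothetical helper: delete space-separated words that contain a digit.
# Assumes GNU sed, which supports \? in basic regular expressions.
strip_digit_words() {
    # [^ ]*[[:digit:]][^ ]*  one whole word with at least one digit
    # ' \?'                  plus the space after it, if any
    # s/ $//                 drop a trailing space left behind
    sed 's/[^ ]*[[:digit:]][^ ]* \?//g; s/ $//'
}

s="ok 0bad ba1d bad3 4bad4 5bad5bad5"
printf '%s\n' "$s" | strip_digit_words   # prints: ok
```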

Using grep + tr together:

s="ok 0bad ba1d bad3 4bad4 5bad5bad5"
r=$(grep -o '[^ ]\+' <<< "$s"|grep '^[[:alpha:]]\+$'|tr '\n' ' ')
echo "$r"
ok

The first grep -o breaks the string into individual words, one per line. The second grep keeps only the words that consist entirely of letters. Finally, tr translates each \n to a space, which also leaves a trailing space after the last word.
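
If the trailing space matters, one possible variation is to join the lines with paste instead of tr. This is a sketch under the same assumptions as the command above (GNU grep, space-separated words); the helper name alpha_words and the sample string are made up.

```shell
# Hypothetical helper: keep only all-letter words, joined by single spaces.
# Assumes GNU grep, since \+ in a basic regular expression is a GNU extension.
alpha_words() {
    grep -o '[^ ]\+' |          # one word per line
    grep '^[[:alpha:]]\+$' |    # keep all-letter words only
    paste -s -d ' ' -           # join lines with spaces, no trailing space
}

printf '%s\n' "ok 0bad fine ba1d" | alpha_words   # prints: ok fine
```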

Remove words starting with _ in file using sed in bash

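Try this with GNU sed. The -i and -e options are given separately because sed reads -ie as -i with the backup suffix "e":

sed -i -e 's/_[A-Za-z0-9]* / /g' here.txt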
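
The pattern above matches an underscore anywhere in a word and needs a space after the word, so it also cuts foo_bar and misses a word at the end of a line. A stricter variant can be sketched with word boundaries. This assumes GNU sed for \<, \> and \?; the helper name strip_underscore_words and the sample line are made up, and the sketch deletes the word rather than replacing it with a space.

```shell
# Hypothetical helper: remove words that START with an underscore.
# Assumes GNU sed for \<, \> and \?.
strip_underscore_words() {
    # \<_            an underscore at the start of a word
    # [[:alnum:]_]*  the rest of the word
    # \> \?          end of word, plus one following space if present
    sed 's/\<_[[:alnum:]_]*\> \?//g'
}

printf '%s\n' "keep _drop foo_bar _x9 end" | strip_underscore_words
# prints: keep foo_bar end
```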

Remove one-character words

You have to use word boundaries: \b matches the empty string at either edge of a word, while \< and \> match the empty string at the beginning and at the end of a word, respectively.

echo "a b c d e f g h ijkl m n opqrst u v" | sed 's/\b\w\b \?//g'

(OR)

echo "a b c d e f g h ijkl m n opqrst u v" | sed 's/\<.\> \?//g'
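
\b, \<, \> and \w are GNU extensions, so where they are not available the same filtering can be sketched with awk. Here a "word" is a whitespace-separated field, so a token such as "x," counts as two characters and is kept. The helper name drop_single_chars is made up for illustration.

```shell
# Hypothetical helper: drop one-character fields using awk only.
drop_single_chars() {
    awk '{
        out = ""
        for (i = 1; i <= NF; i++)
            if (length($i) > 1)                      # keep fields of 2+ chars
                out = (out == "" ? $i : out OFS $i)  # join kept fields with OFS
        print out
    }'
}

printf '%s\n' "a b c d e f g h ijkl m n opqrst u v" | drop_single_chars
# prints: ijkl opqrst
```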

sed: removing alphanumeric words from a file

If you want to remove all words that mix letters and digits, leaving only words that consist entirely of digits or entirely of letters:

sed 's/\<\([[:alpha:]]\+[[:digit:]]\+[[:alnum:]]*\|[[:digit:]]\+[[:alpha:]]\+[[:alnum:]]*\) \?//g' inputfile

Example:

$ echo 'abc def ghi 111 222 ab3 a34 43a a34a 4ab3' | sed 's/\<\([[:alpha:]]\+[[:digit:]]\+[[:alnum:]]*\|[[:digit:]]\+[[:alpha:]]\+[[:alnum:]]*\) \?//g'
abc def ghi 111 222
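
The same expression is easier to read with extended regular expressions, where the grouping and alternation need no backslashes. A minimal sketch, assuming GNU sed (for -E together with \<); the helper name strip_mixed_words is made up, and the second expression only trims the trailing space that the first one can leave behind.

```shell
# Hypothetical helper: remove words that mix letters and digits.
# Assumes GNU sed (-E together with \< is a GNU feature).
strip_mixed_words() {
    sed -E \
        -e 's/\<([[:alpha:]]+[[:digit:]]+[[:alnum:]]*|[[:digit:]]+[[:alpha:]]+[[:alnum:]]*) ?//g' \
        -e 's/ $//'
}

printf '%s\n' 'abc def ghi 111 222 ab3 a34 43a a34a 4ab3' | strip_mixed_words
# prints: abc def ghi 111 222
```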

remove data between two words with sed

  • sed cannot parse XML, so there are many cases where it does not
    work well (e.g. tags inside a comment tag).
  • sed regexes do not support non-greedy matching, so we need a
    workaround.

With those caveats in mind, please try:

sed $'s/<tag1>/&\\\n/g' input | sed '/<tag1>/,/<\/tag1>/d'

Output:

data1
data2
data3
data4
data5

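The first sed just puts a line break after each <tag1>, so the closing tag always ends up on a later line than the opening tag. This matters because a sed address range never tests its end pattern against the line that started the range.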

Although it works for the provided example, please note there are
many cases it doesn't work well (e.g. </tag1> is missing).
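
To see why the first sed is needed, here is a small sketch on made-up input (the original input file is not shown here). The helper names strip_tag1_blocks and sample are hypothetical, and the newline is written as backslash-newline so the script does not depend on bash's $'...' quoting.

```shell
# Hypothetical helper: delete <tag1>...</tag1> blocks, line by line.
strip_tag1_blocks() {
    # Step 1: break the line after every <tag1> (backslash-newline is the
    # portable way to put a newline in a sed replacement).
    # Step 2: delete from each <tag1> line through the next </tag1> line.
    sed 's/<tag1>/&\
/g' | sed '/<tag1>/,/<\/tag1>/d'
}

# Made-up input: one multi-line block and one single-line block.
sample() {
    printf '%s\n' '<tag1>' 'junk' '</tag1>' 'data1' '<tag1>junk</tag1>' 'data2'
}

sample | strip_tag1_blocks             # prints data1 and data2
sample | sed '/<tag1>/,/<\/tag1>/d'    # range alone prints only data1
```

With the range alone, the single-line block opens a range that never sees its own closing tag, so everything up to the end of the input is deleted.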

sed delete match within quotes on line containing several quotes

Use the negated character class [^"]*, which matches zero or more characters other than ". The .* in your regex is greedy: it consumes everything up to the last double quote on the line, so the match runs from Stacey all the way to the closing quote after Ford. You also need a word boundary \b before NAME so that the pattern does not match the NAME inside SURNAME. \b matches between a word character and a non-word character.

sed 's/\bNAME="[^"]*"/NAME="Jack"/g' names.xml
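
The difference between the three patterns is easiest to see side by side. The line below is a made-up sample built from the attribute names and values mentioned above; the \b form assumes GNU sed.

```shell
# Made-up sample line using the attribute names from the answer.
line='<person NAME="Stacey" SURNAME="Ford"/>'

# Greedy .* runs to the LAST double quote and swallows SURNAME too.
printf '%s\n' "$line" | sed 's/NAME=".*"/NAME="Jack"/g'
# prints: <person NAME="Jack"/>

# [^"]* stops at the next quote, but NAME still matches inside SURNAME.
printf '%s\n' "$line" | sed 's/NAME="[^"]*"/NAME="Jack"/g'
# prints: <person NAME="Jack" SURNAME="Jack"/>

# Adding \b (GNU sed) restricts the match to the standalone NAME attribute.
printf '%s\n' "$line" | sed 's/\bNAME="[^"]*"/NAME="Jack"/g'
# prints: <person NAME="Jack" SURNAME="Ford"/>
```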

Sed command to remove all lines not containing punctuation

You can use the [:punct:] character class, which corresponds to

[!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~]

and negate it:

$ sed '/[[:punct:]]/!d' infile
111.222.123.234
222.11.34.54
www.facebook.com
www.stackoverflow.com
random@email.com

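Or, instead of negating the match, negate the character class and delete the lines that consist only of non-punctuation characters. The ^ and $ anchors matter here: /[^[:punct:]]/d on its own would delete every line that contains any non-punctuation character at all.

sed '/^[^[:punct:]]*$/d'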

Or don't print anything unless a line does contain a punctuation character:

sed -n '/[[:punct:]]/p'

Or use grep instead of sed:

grep '[[:punct:]]' infile
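
As a quick check, the variants can be compared on a few made-up lines (the sample helper and its contents are hypothetical). The first four commands each print the two lines that contain punctuation; the last one shows what happens when the negated class is used without anchors.

```shell
# Made-up input: two lines with punctuation, two without.
sample() {
    printf '%s\n' '111.222.123.234' 'hello world' 'random@email.com' '12345'
}

sample | sed '/[[:punct:]]/!d'       # delete lines that do not match
sample | sed '/^[^[:punct:]]*$/d'    # delete lines made only of non-punctuation
sample | sed -n '/[[:punct:]]/p'     # print only lines that match
sample | grep '[[:punct:]]'          # same filter with grep

# Unanchored negated class: every sample line contains some
# non-punctuation character, so all four lines are deleted.
sample | sed '/[^[:punct:]]/d'
```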

