Sed: Remove Whole Words Containing a Character Class

sed: remove whole words containing a character class

Using awk:

s="ok 0bad ba1d bad3 4bad4 5bad5bad5"
awk '{ofs=""; for (i=1; i<=NF; i++) if ($i ~ /^[[:alpha:]]+$/)
         {printf "%s%s", ofs, $i; ofs=OFS} print ""}' <<< "$s"
ok

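This awk command loops over every field and prints only those that match the regex /^[[:alpha:]]+$/, that is, words made up entirely of letters. The lowercase ofs variable starts out empty and is set to OFS after the first word is printed, so the kept words are separated by a single space with no leading separator. The final print "" ends the line.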

Using grep + tr together:

s="ok 0bad ba1d bad3 4bad4 5bad5bad5"
r=$(grep -o '[^ ]\+' <<< "$s"|grep '^[[:alpha:]]\+$'|tr '\n' ' ')
echo "$r"
ok

The first grep -o splits the string into one word per line. The second grep keeps only the words that consist entirely of letters. Finally, tr translates each \n to a space, which leaves a trailing space in r.
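
Since the question asks about sed, here is a minimal sed-only sketch for comparison. It assumes words are separated by plain spaces, as in the sample string; the clean-up substitutions after the first one are an addition for tidiness and are not part of the answers above.

```shell
s="ok 0bad ba1d bad3 4bad4 5bad5bad5"
# 1) delete every space-delimited word that contains a non-letter
# 2) squeeze the runs of spaces left behind into one space
# 3) trim a leading or trailing space
result=$(printf '%s\n' "$s" |
  sed 's/[^ ]*[^[:alpha:] ][^ ]*//g; s/  */ /g; s/^ //; s/ $//')
echo "$result"
```

For the sample string this prints ok, matching the awk and grep versions.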

Remove words starting with _ in file using sed in bash

Try this (note -i -e rather than -ie, which sed parses as -i with a backup suffix of e):

sed -i -e 's/_[A-Za-z0-9]* / /g' here.txt
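
The substitution above only matches a word that is followed by a space, and it also matches an underscore in the middle of a word. A hedged alternative sketch, run here on a hypothetical sample line rather than on here.txt, anchors the match to the start of the line or to preceding whitespace:

```shell
# Hypothetical sample input: snake_case must survive, _bar and _qux must go.
line="foo _bar snake_case _qux"
# Match "_word" only at line start or after whitespace, and drop it together
# with that leading whitespace. -E enables extended regular expressions.
result=$(printf '%s\n' "$line" | sed -E 's/(^|[[:space:]])_[^[:space:]]*//g')
echo "$result"
```

On the sample line this leaves foo snake_case. Add -i only once the output looks right.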

Remove one-character words

Use a word boundary: \b matches at either edge of a word, while \< and \> match the empty string at the beginning and at the end of a word, respectively.

echo "a b c d e f g h ijkl m n opqrst u v" | sed 's/\b\w\b \?//g'

(OR)

echo "a b c d e f g h ijkl m n opqrst u v" | sed 's/\<.\> \?//g'
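
Both commands leave a trailing space when the last word on the line is removed. A small sketch of the first one with an extra clean-up step, assuming GNU sed (for \b, \w and \?):

```shell
words="a b c d e f g h ijkl m n opqrst u v"
# Remove each one-character word plus one following space (if any),
# then trim the space left at the end of the line.
result=$(printf '%s\n' "$words" | sed 's/\b\w\b \?//g; s/ $//')
echo "$result"
```

Only ijkl and opqrst remain, separated by a single space.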

sed: removing alphanumeric words from a file

If you want to remove every word that mixes letters and digits, leaving only words that consist of all digits or all letters:

sed 's/\<\([[:alpha:]]\+[[:digit:]]\+[[:alnum:]]*\|[[:digit:]]\+[[:alpha:]]\+[[:alnum:]]*\) \?//g' inputfile

Example:

$ echo 'abc def ghi 111 222 ab3 a34 43a a34a 4ab3' | sed 's/\<\([[:alpha:]]\+[[:digit:]]\+[[:alnum:]]*\|[[:digit:]]\+[[:alpha:]]\+[[:alnum:]]*\) \?//g'
abc def ghi 111 222
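
The same expression may be easier to read with extended regular expressions and the alternation pulled out into a variable. This sketch assumes GNU sed, since \< combined with -E is not guaranteed elsewhere; the variable name mixed is only for illustration.

```shell
input='abc def ghi 111 222 ab3 a34 43a a34a 4ab3'
# A "mixed" word is letters-then-digits or digits-then-letters, followed by
# any further alphanumerics; remove it with one optional trailing space.
mixed='[[:alpha:]]+[[:digit:]]+[[:alnum:]]*|[[:digit:]]+[[:alpha:]]+[[:alnum:]]*'
result=$(printf '%s\n' "$input" | sed -E "s/\<($mixed) ?//g")
echo "$result"
```

As with the basic-regex version, the space after 222 is kept, so the output ends with a trailing space.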

remove data between two words with sed

sed cannot parse XML, so there are many cases where it does not
work well (e.g. tags inside a comment). sed regexes also do not
support non-greedy matching, so a workaround is needed.

With that in mind, please try:

sed $'s/<tag1>/&\\\n/g' input | sed '/<tag1>/,/<\/tag1>/d'

Output:

data1
data2
data3
data4
data5

The first sed just puts a line break after the <tag1>.

Although it works for the provided example, please note there are
many cases it doesn't work well (e.g. </tag1> is missing).
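
To see why the line break matters, here is a small sketch on hypothetical input (the original input file is not shown above). It writes the backslash-newline literally inside the sed script instead of using $'...' quoting, which produces the same script but also works in shells without $'...'.

```shell
# Hypothetical input: one tag pair on a single line, one spread over lines.
input='data1
<tag1>junk</tag1>
data2
<tag1>
more junk
</tag1>
data3'
# First sed: break the line after every <tag1>.
# Second sed: delete from each <tag1> line through the next </tag1> line.
result=$(printf '%s\n' "$input" | sed 's/<tag1>/&\
/g' | sed '/<tag1>/,/<\/tag1>/d')
printf '%s\n' "$result"
```

Without the first sed, the one-line <tag1>junk</tag1> would open a range whose end is only looked for on later lines, so data2 would be deleted as well.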

sed delete match within quotes on line containing several quotes

Use a negated character class, [^"]*, which matches zero or more characters other than ". The .* in your regex is greedy by default: it consumes everything up to the last double quote on the line, which is why the match runs from Stacey all the way to the last Ford. You also need a word boundary, \b, before NAME so that the pattern does not match the NAME inside SURNAME. \b matches between a word character and a non-word character.

sed 's/\bNAME="[^"]*"/NAME="Jack"/g' names.xml
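
The difference is easy to see on a single line. The sample line below is hypothetical (the original names.xml is not shown), and \b assumes GNU sed:

```shell
# Hypothetical sample line, not taken from the original names.xml.
line='<person NAME="Stacey" SURNAME="Ford"/>'
# Greedy .* runs to the last double quote and swallows SURNAME too.
greedy=$(printf '%s\n' "$line" | sed 's/NAME=".*"/NAME="Jack"/')
# [^"]* stops at the next quote; \b keeps the NAME in SURNAME from matching.
fixed=$(printf '%s\n' "$line" | sed 's/\bNAME="[^"]*"/NAME="Jack"/g')
echo "$greedy"
echo "$fixed"
```

The greedy version loses the SURNAME attribute entirely, while the fixed version changes only the value of NAME.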

Sed command to remove all lines not containing punctuation

You can use the [:punct:] character class, which corresponds to

[!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~]

and negate it:

$ sed '/[[:punct:]]/!d' infile
111.222.123.234
222.11.34.54
www.facebook.com
www.stackoverflow.com
random@email.com

Or, instead of negating the match, delete the lines that consist entirely of non-punctuation characters:

sed '/^[^[:punct:]]*$/d' infile

Or don't print anything unless a line does contain a punctuation character:

sed -n '/[[:punct:]]/p'

Or use grep instead of sed:

grep '[[:punct:]]' infile
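
As a quick sanity check, the sketch below runs three of the forms above (the !d form, the -n ... p form, and grep) on a few hypothetical lines and lets you confirm that they keep the same lines:

```shell
# Hypothetical input: two lines with punctuation, two without.
input='111.222.123.234
plainword
random@email.com
12345'
a=$(printf '%s\n' "$input" | sed '/[[:punct:]]/!d')
b=$(printf '%s\n' "$input" | sed -n '/[[:punct:]]/p')
c=$(printf '%s\n' "$input" | grep '[[:punct:]]')
# All three should print only the IP address and the email address.
printf '%s\n' "$a"
[ "$a" = "$b" ] && [ "$a" = "$c" ] && echo "all three agree"
```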
