sed: remove whole words containing a character class
Using awk:
s="ok 0bad ba1d bad3 4bad4 5bad5bad5"
awk '{ofs=""; for (i=1; i<=NF; i++) if ($i ~ /^[[:alpha:]]+$/)
{printf "%s%s", ofs, $i; ofs=OFS} print ""}' <<< "$s"
ok
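This awk command loops through all fields (words) of the line and prints a field only if it matches the regex /^[[:alpha:]]+$/, that is, only if it consists entirely of letters. The ofs variable starts out empty and is set to OFS after the first word has been printed, so the separator appears between printed words but never before the first one. The final print "" ends the output line with a newline.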
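To see the separator logic on more than one line, the same loop can be wrapped in a small shell function. This is only a sketch: the function name keep_alpha_words, the variable sep, and the sample input are made up for illustration, and an awk that supports POSIX character classes is assumed.

```shell
# Hypothetical helper: on every input line, keep only the fields made of letters.
keep_alpha_words() {
  awk '{
    sep = ""                          # nothing is printed before the first kept word
    for (i = 1; i <= NF; i++)
      if ($i ~ /^[[:alpha:]]+$/) {
        printf "%s%s", sep, $i
        sep = OFS                     # every later word is preceded by OFS
      }
    print ""                          # terminate the output line
  }'
}

# Assumed sample input: two lines mixing clean and mixed words.
alpha_out=$(printf '%s\n' 'ok 0bad ba1d fine' 'x1 yes no2 done' | keep_alpha_words)
printf '%s\n' "$alpha_out"
```

Because sep is reset at the start of every record, each output line starts without a leading separator.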
Using grep + tr together:
s="ok 0bad ba1d bad3 4bad4 5bad5bad5"
r=$(grep -o '[^ ]\+' <<< "$s"|grep '^[[:alpha:]]\+$'|tr '\n' ' ')
echo "$r"
ok
The first grep -o breaks the string into individual words, one per line. The second grep keeps only the words that consist entirely of letters. Finally, tr translates each \n to a space, so the result ends with a trailing space.
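If the trailing space matters, one possible variant (not the pipeline from the answer above) splits with tr and joins with paste instead. The sample string is an assumption, extended with a second clean word so that the joining step is visible.

```shell
# Assumed sample string with two purely alphabetic words.
s="ok 0bad ba1d bad3 fine 5bad5bad5"

# tr -s puts one word per line, grep -E keeps all-letter words,
# paste -s joins the surviving lines with single spaces and no trailing one.
r=$(printf '%s\n' "$s" | tr -s ' ' '\n' | grep -E '^[[:alpha:]]+$' | paste -sd ' ' -)
printf '[%s]\n' "$r"
```

The brackets in the printf format are there only to make any stray whitespace visible.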
Remove words starting with _ in file using sed in bash
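Try this (GNU sed syntax; it edits here.txt in place):
sed -i -e 's/_[A-Za-z0-9]* / /g' here.txt
Note that the pattern requires a space after the word, so a word at the end of a line is left alone, and that it also matches a _ in the middle of a word.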
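A stricter variant can require the underscore to sit at the start of a word. The following is a sketch under assumptions: the sample line is invented, sed -E (extended regular expressions) is assumed to be available, and the result is printed rather than written back to a file.

```shell
# Assumed sample line: foo_bar has an underscore in the middle and should survive.
cleaned=$(printf '%s\n' 'keep _drop this _x9 and foo_bar _end' |
  sed -E 's/(^|[[:space:]])_[[:alnum:]_]*//g')
printf '%s\n' "$cleaned"
```

The leading (^|[[:space:]]) only lets a match begin at the start of the line or after whitespace, and the preceding whitespace is removed together with the word, so no double spaces are left behind.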
Remove one-character words
You have to use a word boundary: either \b, or \< and \>, which match the empty string at the beginning and at the end of a word, respectively (these are GNU sed extensions).
echo "a b c d e f g h ijkl m n opqrst u v" | sed 's/\b\w\b \?//g'
(OR)
echo "a b c d e f g h ijkl m n opqrst u v" | sed 's/\<.\> \?//g'
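Both commands leave a trailing space when the last word on the line is removed. A minimal sketch that also trims it, assuming GNU sed (\b, \w and \? are GNU extensions); the variable names are made up for illustration.

```shell
# Assumes GNU sed: \b, \w and \? are GNU extensions.
words="a b c d e f g h ijkl m n opqrst u v"

# First expression drops one-character words, second trims trailing spaces.
long_words=$(printf '%s\n' "$words" | sed -e 's/\b\w\b \?//g' -e 's/ *$//')
printf '[%s]\n' "$long_words"
```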
sed: removing alphanumeric words from a file
If you want to remove all words that mix letters and digits, leaving only words that consist entirely of digits or entirely of letters:
sed 's/\<\([[:alpha:]]\+[[:digit:]]\+[[:alnum:]]*\|[[:digit:]]\+[[:alpha:]]\+[[:alnum:]]*\) \?//g' inputfile
Example:
$ echo 'abc def ghi 111 222 ab3 a34 43a a34a 4ab3' | sed 's/\<\([[:alpha:]]\+[[:digit:]]\+[[:alnum:]]*\|[[:digit:]]\+[[:alpha:]]\+[[:alnum:]]*\) \?//g'
abc def ghi 111 222
remove data between two words with sed
- As sed cannot parse XML files, there are many cases where sed does not work well (e.g. tags within a comment tag).
- As sed regex does not support non-greedy matching, we need to consider workarounds.
Based on the above, would you please try:
sed $'s/<tag1>/&\\\n/g' input | sed '/<tag1>/,/<\/tag1>/d'
Output:
data1
data2
data3
data4
data5
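The first sed just puts a line break after each <tag1>, so that an opening tag and its closing tag never share a line. This matters because sed starts looking for the end of a range only on the line after the one that opened it. The second sed then deletes every range of lines from <tag1> through </tag1>.
Although it works for the provided example, please note there are many cases where it doesn't work well (e.g. when </tag1> is missing).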
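The effect of the two steps can be tried on a small input. The sample lines below are hypothetical (the original input is not shown here), and the newline is written as a literal backslash-newline inside the sed script, which is assumed to behave like the $'...' form above while also working in shells without $'...' quoting.

```shell
# Hypothetical input: one tag pair on a single line, another spread over several lines.
tag_out=$(printf '%s\n' \
  'data1' \
  '<tag1>junk</tag1>' \
  'data2' \
  '<tag1>' \
  'more junk' \
  '</tag1>' \
  'data3' |
  sed 's/<tag1>/&\
/g' |
  sed '/<tag1>/,/<\/tag1>/d')
printf '%s\n' "$tag_out"
```

Without the first sed, the range opened by the single-line pair would stay open until the next </tag1> and would remove data2 as well.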
sed delete match within quotes on line containing several quotes
You need to use a negated character class, [^"]*, which matches any character except " zero or more times. The .* in your regex is greedy by default: it consumes every character up to the last double quote on the line, which is why the match ran from Stacey all the way to the quote after Ford. You also need to add a word boundary \b before NAME, so that it won't match the string NAME inside SURNAME. \b matches between a word character and a non-word character.
sed 's/\bNAME="[^"]*"/NAME="Jack"/g' names.xml
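The difference between the two patterns can be seen side by side. This is a sketch only: the sample line and the greedy command are reconstructions based on the names mentioned above, not the asker's actual file or regex, and GNU sed is assumed because \b is a GNU extension.

```shell
# Hypothetical sample line; assumes GNU sed because \b is a GNU extension.
line='<person NAME="Stacey" SURNAME="Ford"/>'

# Greedy .* runs to the last double quote on the line and swallows SURNAME too.
greedy=$(printf '%s\n' "$line" | sed 's/NAME=".*"/NAME="Jack"/')

# [^"]* stops at the first closing quote; \b keeps SURNAME from matching.
fixed=$(printf '%s\n' "$line" | sed 's/\bNAME="[^"]*"/NAME="Jack"/g')

printf '%s\n' "$greedy" "$fixed"
```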
Sed command to remove all lines not containing punctuation
You can use the [:punct:] character class, which in the POSIX locale corresponds to
[!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~]
and negate the match:
$ sed '/[[:punct:]]/!d' infile
111.222.123.234
222.11.34.54
www.facebook.com
www.stackoverflow.com
random@email.com
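Or, instead of negating the match, negate the character class and anchor it, so that only lines made up entirely of non-punctuation characters are deleted:
sed '/^[^[:punct:]]*$/d' infile
Or don't print anything unless a line does contain a punctuation character:
sed -n '/[[:punct:]]/p' infile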
Or use grep instead of sed:
grep '[[:punct:]]' infile
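As a quick check, the negated-match, print-only, and grep forms can be run on the same input and compared. The input lines and variable names below are invented for illustration.

```shell
# Hypothetical input: two lines with punctuation, two without, and one empty line.
input=$(printf '%s\n' '111.222.123.234' 'plainword' 'random@email.com' '' '12345')

by_delete=$(printf '%s\n' "$input" | sed '/[[:punct:]]/!d')   # delete non-matching lines
by_print=$(printf '%s\n' "$input" | sed -n '/[[:punct:]]/p')  # print matching lines only
by_grep=$(printf '%s\n' "$input" | grep '[[:punct:]]')        # same filter with grep

printf '%s\n' "$by_delete"
```

All three variables should hold the same two lines; the empty line is dropped as well, since it contains no punctuation.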