Delete the Word Whose Length Is Less Than 2 in Bash

Edit (based on ennuikiller's sed answer)

Using pure Bash:

newstring=${exmple// ? / }   # remove one-character words that have a space on each side

To normalize the whitespace:

read newstring <<< $newstring

shopt -s extglob
newstring=${newstring//+( )/ }
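
One caveat with the " ? " pattern, sketched here on a made-up string: it only matches a single character with a space on both sides, and matches cannot overlap. So a one-character word at the very start or end of the string, or one that directly follows a word that was just removed, is left in place.

```shell
#!/usr/bin/env bash
# Hypothetical sample in which every word is one character long.
sample="a b c d"

# Only " b " matches: "a" has no leading space, "d" has no trailing space,
# and the space before "c" was already consumed by the first match.
stripped=${sample// ? / }
echo "$stripped"   # prints: a c d
```

If that matters for your input, the word-by-word loop is the safer choice.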

Original:

exmple="This is a lovey 7 words   string"
for word in $exmple
do
    if (( ${#word} >= 2 ))
    then
        newstring+=$sp$word
        sp=' '
    fi
done
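
The loop above can be wrapped in a function so the minimum length becomes a parameter. This is only a sketch: the name drop_short_words and its argument order are my own, and it assumes a single line of input. It uses read -a rather than an unquoted expansion so that characters such as * in the text are not expanded as globs.

```shell
#!/usr/bin/env bash
# drop_short_words MIN TEXT
# Print TEXT without the words shorter than MIN characters.
# (Hypothetical helper; name and argument order are illustrative.)
drop_short_words() {
    local min=$1 text=$2 word
    local -a words kept=()
    # read -a splits on whitespace without expanding glob characters.
    read -r -a words <<< "$text"
    for word in "${words[@]}"; do
        if (( ${#word} >= min )); then
            kept+=("$word")
        fi
    done
    # "${kept[*]}" joins the kept words with single spaces (default IFS).
    printf '%s\n' "${kept[*]}"
}

result=$(drop_short_words 2 "This is a lovey 7 words   string")
echo "$result"   # prints: This is lovey words string
```

Because the words are rejoined with single spaces, the whitespace is normalized as a side effect.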

Remove Words Shorter Than 4 Characters Using Linux

$ echo Here is an example line of fantastic data | sed -E 's/\b\w{1,3}\b\s*//g'
Here example line fantastic data

How to remove last n characters from a string in Bash?

First, it's usually better to be explicit about your intent. So if you know the string ends in .rtf, and you want to remove that .rtf, you can just use var2=${var%.rtf}. One potentially-useful aspect of this approach is that if the string doesn't end in .rtf, it is not changed at all; var2 will contain an unmodified copy of var.

If you want to remove a filename suffix but don't know or care exactly what it is, you can use var2=${var%.*} to remove everything starting with the last dot. Or, if you only want to keep everything up to but not including the first dot, you can use var2=${var%%.*}. Those options have the same result if there's only one dot, but if there might be more than one, you get to pick which end of the string to work from. On the other hand, if there's no dot in the string at all, var2 will again be an unchanged copy of var.
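
A quick illustration of the difference, using made-up file names with more than one dot and with none:

```shell
# Hypothetical file names, chosen only to show the two behaviours.
var="backup.2024.tar.gz"
plain="README"

last_dot=${var%.*}     # shortest matching suffix: removes ".gz"
first_dot=${var%%.*}   # longest matching suffix: removes ".2024.tar.gz"
no_dot=${plain%.*}     # no dot, so nothing matches and nothing is removed

echo "$last_dot"    # prints: backup.2024.tar
echo "$first_dot"   # prints: backup
echo "$no_dot"      # prints: README
```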

If you really want to always remove a specific number of characters, here are some options.

You tagged this bash specifically, so we'll start with bash builtins. The one which has worked the longest is the same suffix-removal syntax I used above: to remove four characters, use var2=${var%????}. Or to remove four characters only if the first one is a dot, use var2=${var%.???}, which is like var2=${var%.*} but only removes the suffix if the part after the dot is exactly three characters. As you can see, to count characters this way, you need one question mark per unknown character removed, so this approach gets unwieldy for larger substring lengths.
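
If typing the question marks by hand is the only obstacle, the pattern can be built in a loop. The following is a sketch that should work in any POSIX-style shell; strip_last_n is a name I made up, and it uses plain global variables because POSIX sh has no local.

```shell
# strip_last_n STRING N
# Remove the last N characters of STRING by building a pattern of N "?".
# (Hypothetical helper; sets the global variables s, n and pat.)
strip_last_n() {
    s=$1
    n=$2
    pat=
    while [ "$n" -gt 0 ]; do
        pat="$pat?"
        n=$((n - 1))
    done
    # $pat is deliberately unquoted here so it is treated as a pattern.
    printf '%s\n' "${s%$pat}"
}

strip_last_n "report.rtf" 4   # prints: report
strip_last_n "abc" 4          # prints: abc (too short, pattern cannot match)
```

Note the second call: as with the hand-written pattern, a string shorter than N is returned unchanged rather than emptied.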

An option in newer shell versions is substring extraction: var2=${var:0:${#var}-4}. Here you can put any number in place of the 4 to remove a different number of characters. The ${#var} is replaced by the length of the string, so this is actually asking to extract and keep (length - 4) characters starting with the first one (at index 0). With this approach, you lose the option to make the change only if the string matches a pattern; no matter what the actual value of the string is, the copy will include all but its last four characters.

You can leave the start index out; it defaults to 0, so you can shorten that to just var2=${var::${#var}-4}. In fact, newer versions of bash (specifically 4.2+, which means the 3.2 that ships with macOS won't work) recognize negative lengths as the index of the character to stop at, counting back from the end of the string. So in those versions you can get rid of the string-length expression, too: var2=${var::-4}.
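
One thing to watch with the index-based forms: if the string is shorter than the number of characters you are removing, the length expression goes negative, and depending on the bash version that is either an error or gets reinterpreted as an end offset. A small wrapper avoids the question. This is a sketch, and drop_last_n is my own name for it:

```shell
#!/usr/bin/env bash
# drop_last_n STRING N
# Print STRING without its last N characters; print an empty line when
# STRING has N characters or fewer. (Hypothetical helper.)
drop_last_n() {
    local s=$1 n=$2
    if (( n >= ${#s} )); then
        printf '\n'
    else
        # The length is guaranteed positive here, so this form does not
        # depend on negative-length support.
        printf '%s\n' "${s:0:${#s}-n}"
    fi
}

drop_last_n "document.rtf" 4   # prints: document
drop_last_n "abc" 4            # prints an empty line
```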

If you're not actually using bash but some other POSIX-type shell, the pattern-based suffix removal with % will still work – even in plain old dash, where the index-based substring extraction won't. Ksh and zsh do both support substring extraction, but require the explicit 0 start index; zsh also supports the negative end index, while ksh requires the length expression. Note that zsh, which indexes arrays starting at 1, nonetheless indexes strings starting at 0 if you use this bash-compatible syntax. But zsh also allows you to treat scalar parameters as if they were arrays of characters, in which case the substring syntax uses a 1-based count and places the start and (inclusive) end positions in brackets separated by commas: var2=$var[1,-5].

Instead of using built-in shell parameter expansion, you can of course run some utility program to modify the string and capture its output with command substitution. There are several commands that will work; one is var2=$(sed 's/.\{4\}$//' <<<"$var").

Remove a fixed prefix/suffix from a string in Bash

$ prefix="hell"
$ suffix="ld"
$ string="hello-world"
$ foo=${string#"$prefix"}
$ foo=${foo%"$suffix"}
$ echo "${foo}"
o-wor

This is documented in the Shell Parameter Expansion section of the manual:

${parameter#word}
${parameter##word}
The word is expanded to produce a pattern and matched according to the rules described below (see Pattern Matching). If the pattern matches the beginning of the expanded value of parameter, then the result of the expansion is the expanded value of parameter with the shortest matching pattern (the # case) or the longest matching pattern (the ## case) deleted. […]

${parameter%word}
${parameter%%word}
The word is expanded to produce a pattern and matched according to the rules described below (see Pattern Matching). If the pattern matches a trailing portion of the expanded value of parameter, then the result of the expansion is the value of parameter with the shortest matching pattern (the % case) or the longest matching pattern (the %% case) deleted. […]
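
To make the shortest/longest distinction concrete, here are the four operators applied to a made-up path, followed by the reason the prefix and suffix are quoted in the example above: quoting turns pattern characters into literal text.

```shell
# Hypothetical path, chosen so that "/" occurs several times.
path="/usr/local/lib/libfoo.so.1"

base=${path##*/}    # longest prefix matching "*/" removed
rest=${path#*/}     # shortest prefix matching "*/" removed (just the first "/")
dir=${path%/*}      # shortest suffix matching "/*" removed
empty=${path%%/*}   # longest suffix matching "/*" removed (the whole string)

echo "$base"   # prints: libfoo.so.1
echo "$rest"   # prints: usr/local/lib/libfoo.so.1
echo "$dir"    # prints: /usr/local/lib

# Quoting the word makes "*" literal instead of a wildcard.
literal="a*b-c"
quoted=${literal#"a*"}   # removes the two characters "a*"
unquoted=${literal#a*}   # pattern a*: the shortest match is just "a"

echo "$quoted"     # prints: b-c
echo "$unquoted"   # prints: *b-c
```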

Remove words that are less than two characters long AND don't contain a vowel

Could you please try the following?

awk '
{
  val=""
  for(i=1;i<=NF;i++){
    if($i!~/[aieou]/ && length($i)<2){ a="" }
    else{ val=(val?val OFS:"")$i            }
  }
  print val
}
' Input_file

Explanation: here is the same program with a comment on each line.

awk '                                             ##Starting an awk program from here.
{
  val=""                                          ##Nullifying val value here.
  for(i=1;i<=NF;i++){                             ##Starting a for loop from here.
    if($i!~/[aieou]/ && length($i)<2){ a="" }     ##If the field contains no vowel and is shorter than 2 characters, do nothing (the field is dropped).
    else{ val=(val?val OFS:"")$i            }     ##Otherwise append the current field to val, separated by OFS.
  }
  print val                                       ##Printing val here.
}
' Input_file                                      ##Mentioning Input_file name here.
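
Here is a quick way to try the same logic on a single line without an input file. It is a sketch, not the exact program above: filter_line is a name I made up, the loop uses continue instead of the no-op branch, and the sample text is invented. Note that the character class only lists lowercase vowels, so an uppercase "I" counts as having no vowel.

```shell
# filter_line: read lines on stdin and drop fields that are shorter than
# 2 characters and contain no lowercase vowel. (Hypothetical wrapper.)
filter_line() {
    awk '{
        out = ""
        for (i = 1; i <= NF; i++) {
            # Skip one-character fields without a lowercase vowel.
            if ($i !~ /[aeiou]/ && length($i) < 2) continue
            out = (out == "" ? $i : out OFS $i)
        }
        print out
    }'
}

result=$(printf '%s\n' "a b xy I z to" | filter_line)
echo "$result"   # prints: a xy to
```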

How can I remove all text after a character in bash?

An example might have been useful, but if I understood you correctly, this would work:

echo "Hello: world" | cut -f1 -d":"

This will convert Hello: world into Hello.
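
If the text is already in a shell variable, the same result is available without starting another process. A minimal sketch, with line as an assumed variable name:

```shell
# Assumed variable holding the text to trim.
line="Hello: world"

with_cut=$(printf '%s\n' "$line" | cut -d: -f1)
with_expansion=${line%%:*}   # remove the longest suffix starting at a ":"

echo "$with_cut"         # prints: Hello
echo "$with_expansion"   # prints: Hello
```

Both forms leave the string unchanged when it contains no colon.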

Remove all words bigger than 6 characters using sed

This should do the trick to remove words containing more than six letters - if you define a word as being made up of letters A-Z and a-z:

sed -e 's/[A-Za-z]\{7,\}//g'
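
Here is a sample run on an invented line. The expression removes only the word itself, so the spaces around it stay behind; squeezing them with tr is one possible follow-up, not part of the answer above.

```shell
# Hypothetical input line.
line="short extraordinary words remain"

# Remove every run of 7 or more letters.
raw=$(printf '%s\n' "$line" | sed -e 's/[A-Za-z]\{7,\}//g')

# Optional: collapse the repeated spaces left behind.
clean=$(printf '%s\n' "$raw" | tr -s ' ')

echo "$raw"     # prints: short  words remain   (two spaces after "short")
echo "$clean"   # prints: short words remain
```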

delete all the files having similar pattern whose date is less than particular date

A first naive approach to the problem; tweak it to your needs:

find . | awk -F'_' '$3<20130125' | xargs rm

To prevent find from doing a recursive search and make it stay in the current folder:

find . \( ! -name . -prune \) -type f | ...

2nd update:

Add the -name parameter to only list files whose names start with the string "EXPORT_v1x0":

find . \( ! -name . -prune \) -type f -name "EXPORT_v1x0*" | ...

A simpler way to make find non-recursive is to use the -maxdepth option:

find . -maxdepth 1 -type f -name "EXPORT_v1x0*" | ...
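
Before piping anything into rm, it may be worth listing what would be deleted. The sketch below does that in plain bash. It rests on an assumption that is not in the question: that the names look like EXPORT_v1x0_YYYYMMDD followed by an extension. The function name list_older_than and the demo files are made up as well.

```shell
#!/usr/bin/env bash
# list_older_than DIR CUTOFF
# Print files in DIR named EXPORT_v1x0_YYYYMMDD* whose date is before CUTOFF.
# (Hypothetical helper; the name layout is an assumption.)
list_older_than() {
    local dir=$1 cutoff=$2 path name stamp
    for path in "$dir"/EXPORT_v1x0_*; do
        [ -f "$path" ] || continue           # skips an unmatched glob too
        name=${path##*/}                     # strip the directory part
        stamp=${name#EXPORT_v1x0_}           # strip the fixed prefix
        stamp=${stamp%%[!0-9]*}              # keep only the leading digits
        [ "${#stamp}" -eq 8 ] || continue    # ignore names without a full date
        if [ "$stamp" -lt "$cutoff" ]; then
            printf '%s\n' "$path"
        fi
    done
}

# Dry run in a throwaway directory.
demo_dir=$(mktemp -d)
touch "$demo_dir/EXPORT_v1x0_20130101.csv" \
      "$demo_dir/EXPORT_v1x0_20130125.csv" \
      "$demo_dir/EXPORT_v1x0_20130201.csv"
old_files=$(list_older_than "$demo_dir" 20130125)
echo "$old_files"   # only the 20130101 file is listed
rm -r "$demo_dir"
```

Once the list looks right, replacing the printf with rm -- "$path" would delete the files instead of printing them.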

remove lines when string has certain length with awk or sed

This awk command does the job:

 awk '{a[NR]=$0}
    END{for(i=2;i<=NR;i+=4)
            if(length(a[i])==9)
                p[i-1]=p[i]=p[i+1]=p[i+2]=1
        for(x=1;x<=NR;x++)
                if(p[x])print a[x]}' file

The idea is to save all lines in an array, check the second line of each four-line block, and decide whether that block should be printed.

test with your example:

kent$  cat f
A1
NNNNNNNNN
A3
A4
B1
NNNNNNN
B3
B4
C1
NNNNNNNNN
C3
C4

kent$  awk '{a[NR]=$0}
        END{for(i=2;i<=NR;i+=4)
                        if(length(a[i])==9)
                                p[i-1]=p[i]=p[i+1]=p[i+2]=1
                for(x=1;x<=NR;x++)
                        if(p[x])print a[x]}' f
A1
NNNNNNNNN
A3
A4
C1
NNNNNNNNN
C3
C4

how to remove words of specific length in a string in R?

Try this:

gsub('\\b\\w{1,2}\\b','',str)
[1] "hello  have  nice day"

EDIT
\b is a word boundary. If you also need to drop the extra space, change it to one of these:

gsub('\\b\\w{1,2}\\s','',str)

gsub('(?<=\\s)(\\w{1,2}\\s)','',str,perl=T)
