Linux Shell Scripting: How to Remove Final Numbers in a Word List File

Remove the last line from a file in Bash

Using GNU sed:

sed -i '$ d' foo.txt

The -i option does not exist in GNU sed versions older than 3.95. With those versions, run sed as a filter and go through a temporary file:

cp foo.txt foo.txt.tmp
sed '$ d' foo.txt.tmp > foo.txt
rm -f foo.txt.tmp

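Of course, in that case you could also use head -n -1 instead of sed, provided your head is the GNU version. Negative line counts are a GNU extension and are not required by POSIX.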
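If neither sed -i nor a head that accepts negative counts is available, the temporary-file approach can be wrapped in a small function. This is only a sketch: the name drop_last_line is made up for illustration, and it assumes mktemp is installed (common, but not required by POSIX).

```shell
# Hypothetical helper: delete the last line of a file without sed -i.
drop_last_line() {
    file=$1
    tmp=$(mktemp) || return 1          # assumption: mktemp is available
    # '$d' deletes only the final line; all other lines pass through.
    if sed '$d' "$file" > "$tmp"; then
        cat "$tmp" > "$file"           # overwrite contents, keep the same file
        status=0
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}

# Demo on a throwaway file.
demo=$(mktemp)
printf 'one\ntwo\nthree\n' > "$demo"
drop_last_line "$demo"
result=$(cat "$demo")
rm -f "$demo"
printf '%s\n' "$result"
```

Writing back with cat instead of mv keeps the original file's permissions and inode, at the cost of not being atomic.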

macOS:

On Mac OS X (as of 10.7.4), the equivalent of the sed -i command above is:

sed -i '' -e '$ d' foo.txt

How to use sed to remove the last n lines of a file

I don't know about sed, but GNU head can do it. A negative count prints everything except the last n lines, here the last two:

head -n -2 myfile.txt
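Where head does not accept a negative count, the same result can be approximated by counting the lines first. A minimal sketch, assuming only wc and head; the function name drop_last_n is hypothetical:

```shell
# Hypothetical helper: print FILE without its last N lines.
# Usage: drop_last_n N FILE
drop_last_n() {
    n=$1
    file=$2
    # Number of lines to keep = total lines minus N.
    keep=$(( $(wc -l < "$file") - n ))
    # Print nothing when N covers the whole file.
    if [ "$keep" -gt 0 ]; then
        head -n "$keep" "$file"
    fi
}

# Demo on a throwaway five-line file (assumes mktemp is available).
demo=$(mktemp)
printf '1\n2\n3\n4\n5\n' > "$demo"
kept=$(drop_last_n 2 "$demo")
rm -f "$demo"
printf '%s\n' "$kept"
```

Like head -n -2, this writes to standard output and leaves the file itself untouched.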

Bash: remove numbers at the end of names.

Here's one way using bash parameter substitution:

for i in *.jpg; do mv "$i" "${i%-*}.jpg"; done

Or for the more general case (i.e. if you have other file extensions), try:

for i in *.*; do mv "$i" "${i%-*}.${i##*.}"; done

Results:

alien-skull.jpg
dead-space-album.jpg
snow-birds-red-arrows-thunderbirds-blue-angels.jpg
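Before renaming anything, it can help to see what the two expansions produce for a single name. The sketch below only prints the result; the file names are invented for illustration.

```shell
# Hypothetical file names, used only to show the expansions.
i="dead-space-album-1024.jpg"
stem=${i%-*}        # remove the shortest suffix matching "-*": dead-space-album
ext=${i##*.}        # remove the longest prefix matching "*.": jpg
renamed="$stem.$ext"

j="portrait.jpg"    # contains no hyphen, so ${j%-*} leaves it unchanged
odd="${j%-*}.${j##*.}"

printf '%s -> %s\n' "$i" "$renamed" "$j" "$odd"
```

Note the second case: a name without a hyphen ends up with its extension doubled (portrait.jpg.jpg), so a dry run like this is worth doing before running the mv loop on a mixed directory.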

As per the comments below, try this bash script (associative arrays via declare -A need bash 4 or later):

declare -A array

for i in *.*; do

    j="${i%-*}.${i##*.}"

    # k="$j"
    # k="${i%-*}-0.${i##*.}"

    for x in "${!array[@]}"; do
        if [[ "$j" == "$x" ]]; then
            k="${i%-*}-${array[$j]}.${i##*.}"
        fi
    done

    (( array["$j"]++ ))

    mv "$i" "$k"
done

Note that you will need to uncomment a value for k depending on how you would like to format the filenames. If you uncomment the first line, only the duplicate basenames will be incremented:

dead-space-album.jpg
dead-space-album-1.jpg
dead-space-album-2.jpg
dead-space-album-3.jpg

If you uncomment the second line, you'll get the following:

alien-skull-0.jpg
alien-skull-1.jpg
alien-skull-2.jpg
alien-skull-3.jpg

How to remove last n characters from a string in Bash?

First, it's usually better to be explicit about your intent. So if you know the string ends in a .rtf that you want to remove, you can just use var2=${var%.rtf}. One potentially-useful aspect of this approach is that if the string doesn't end in .rtf, it is not changed at all; var2 will contain an unmodified copy of var.

If you want to remove a filename suffix but don't know or care exactly what it is, you can use var2=${var%.*} to remove everything starting with the last .. Or, if you only want to keep everything up to but not including the first ., you can use var2=${var%%.*}. Those options have the same result if there's only one . in the string, but if there might be more than one, you get to pick which end of the string to work from. On the other hand, if there's no . in the string at all, var2 will again be an unchanged copy of var.
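A short illustration of the difference between % and %% when the string contains more than one dot. The sample values are made up.

```shell
var="backup.tar.gz"
last_dot=${var%.*}      # shortest match from the end: backup.tar
first_dot=${var%%.*}    # longest match from the end:  backup

plain="README"          # no dot at all
same=${plain%.*}        # pattern does not match, so the value is unchanged

printf '%s\n' "$last_dot" "$first_dot" "$same"
```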

If you really want to always remove a specific number of characters, here are some options.

You tagged this bash specifically, so we'll start with bash builtins. The one which has worked the longest is the same suffix-removal syntax I used above: to remove four characters, use var2=${var%????}. Or to remove four characters only if the first one is a dot, use var2=${var%.???}, which is like var2=${var%.*} but only removes the suffix if the part after the dot is exactly three characters. As you can see, to count characters this way, you need one question mark per unknown character removed, so this approach gets unwieldy for larger substring lengths.
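The question-mark patterns can be tried on a couple of sample strings (invented here) to see when they match and when they leave the value alone:

```shell
var="report.rtf"
any_four=${var%????}     # any four trailing characters removed: report
dot_three=${var%.???}    # dot plus exactly three characters removed: report

other="notes.md"
kept=${other%.???}       # suffix is a dot plus two characters, so no match: notes.md
cut4=${other%????}       # still removes four characters regardless: note

printf '%s\n' "$any_four" "$dot_three" "$kept" "$cut4"
```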

An option in newer shell versions is substring extraction: var2=${var:0:${#var}-4}. Here you can put any number in place of the 4 to remove a different number of characters. The ${#var} is replaced by the length of the string, so this is actually asking to extract and keep (length - 4) characters starting with the first one (at index 0). With this approach, you lose the option to make the change only if the string matches a pattern. As long as the string has at least four characters, no matter what its actual value is, the copy will include all but its last four characters.

You can leave the start index out; it defaults to 0, so you can shorten that to just var2=${var::${#var}-4}. In fact, newer versions of bash (4.2 and later, which rules out the bash 3.2 that ships with macOS) treat a negative length as the index of the character to stop at, counting back from the end of the string. So in those versions you can get rid of the string-length expression, too: var2=${var::-4}. This interpretation is also triggered if you leave the string length in but the string is shorter than four characters, since then ${#var}-4 is negative. For example, if the string has three characters, ${var:0:${#var}-4} becomes ${var:0:-1} and removes only the last character.
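As a quick check of the substring forms, the sketch below uses only the length-expression variants, which do not depend on negative-length support. It needs bash rather than plain sh; the variable names and sample value are illustrative.

```shell
# Requires bash: ${var:offset:length} is not POSIX sh syntax.
var="document.rtf"
n=4                              # number of trailing characters to drop

explicit=${var:0:${#var}-n}      # keep (length - n) characters from index 0
implied=${var::${#var}-n}        # same thing, start index left out

printf '%s\n' "$explicit" "$implied"
```

The length field is evaluated as arithmetic, which is why the bare n works there without a dollar sign.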

If you're not actually using bash but some other POSIX-type shell, the pattern-based suffix removal with % will still work – even in plain old dash, where the index-based substring extraction won't. Ksh and zsh do both support substring extraction, but require the explicit 0 start index; zsh also supports the negative end index, while ksh requires the length expression. Note that zsh, which indexes arrays starting at 1, nonetheless indexes strings starting at 0 if you use this bash-compatible syntax. But zsh also allows you to treat scalar parameters as if they were arrays of characters, in which case the substring syntax uses a 1-based count and places the start and (inclusive) end positions in brackets separated by commas: var2=$var[1,-5].

Instead of using built-in shell parameter expansion, you can of course run some utility program to modify the string and capture its output with command substitution. There are several commands that will work; one is var2=$(sed 's/.\{4\}$//' <<<"$var").
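For shells without here-strings, the same sed expression can be fed through a pipe instead. A minimal sketch with made-up sample values:

```shell
var="document.rtf"
# .\{4\}$ matches exactly four characters anchored at the end of the line.
var2=$(printf '%s\n' "$var" | sed 's/.\{4\}$//')

short="abc"
# Fewer than four characters: the pattern cannot match, so nothing is removed.
short2=$(printf '%s\n' "$short" | sed 's/.\{4\}$//')

printf '%s\n' "$var2" "$short2"
```

Unlike the substring form, this leaves a string shorter than four characters untouched.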

How to delete certain words from files with shell script?

The while loop should keep running as long as there are arguments left, so its condition has to test that the argument count is not zero. The right form would be

while [ "$#" -ne 0 ]; do

Now, what you really want is to get each argument and do something with it. That automatically implies a for loop. So what you really should be doing is

for file in $@; do

Now, a file name can contain spaces, so "$@" and every later use of the file name should be quoted. You should also check for existence, and that it is a regular file, before editing:

for file in "$@"; do
    if [ ! -e "$file" ]; then
        printf "file doesn't exist: %s\n" "$file"
        continue
    fi
    if [ ! -f "$file" ]; then
        printf "not a file: %s\n" "$file"
        continue
    fi
    sed -i "s/[^ ]*[0-9][^ ]*//g" "$file"
done
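To see what that sed expression does before pointing it at real files, it can be run on a single sample line. The second sed call, which tidies the leftover spaces, is an addition for illustration and not part of the loop above.

```shell
line='apple b4nana cherry 42 kiwi7'

# Remove every space-delimited word that contains at least one digit.
stripped=$(printf '%s\n' "$line" | sed 's/[^ ]*[0-9][^ ]*//g')

# Optional tidy-up: squeeze runs of spaces and trim both ends.
cleaned=$(printf '%s\n' "$stripped" | sed -e 's/  */ /g' -e 's/^ //' -e 's/ $//')

printf '[%s]\n' "$stripped" "$cleaned"
```

The words are deleted but the spaces around them stay, which is why the first result contains doubled and trailing spaces.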

I could expand more on sed: the -i switch is not part of POSIX, and GNU and BSD sed handle it differently. Apart from that, you may also want to keep a backup of the file in case something goes wrong.

But this is another topic I guess.

Shell script - remove first and last quote (") from a variable

There's a simpler and more efficient way, using the native shell prefix/suffix removal feature:

temp="${opt%\"}"
temp="${temp#\"}"
echo "$temp"

${opt%\"} will remove the suffix " (escaped with a backslash to prevent shell interpretation).

${temp#\"} will remove the prefix " (escaped with a backslash to prevent shell interpretation).

Another advantage is that it will remove surrounding quotes only if there are surrounding quotes.

BTW, your solution always removes the first and last character, whatever they may be (of course, I'm sure you know your data, but it's always better to be sure of what you're removing).
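The two expansions can be wrapped in a function and tried on a few inputs. This is a sketch; the name strip_quotes and the sample strings are made up.

```shell
# Hypothetical helper: strip one trailing and one leading double quote, if present.
strip_quotes() {
    s=${1%\"}      # drop a trailing " if there is one
    s=${s#\"}      # drop a leading " if there is one
    printf '%s\n' "$s"
}

quoted=$(strip_quotes '"hello world"')   # hello world
bare=$(strip_quotes 'hello world')       # unchanged: hello world
inner=$(strip_quotes 'say "hi"')         # only the trailing quote goes: say "hi

printf '%s\n' "$quoted" "$bare" "$inner"
```

The last case shows that each end is handled independently: a quote at one end is removed even when the other end has none.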

Using sed:

echo "$opt" | sed -e 's/^"//' -e 's/"$//'

(Improved version, as indicated by jfgagne, getting rid of echo)

sed -e 's/^"//' -e 's/"$//' <<<"$opt"

So it replaces a leading " with nothing and a trailing " with nothing too, all in the same invocation. There isn't any need to pipe into a second sed; with -e you can pass several editing commands at once.

How to delete from a text file, all lines that contain a specific string?

To remove the line and print the output to standard out:

sed '/pattern to match/d' ./infile

To directly modify the file – does not work with BSD sed:

sed -i '/pattern to match/d' ./infile

Same, but for BSD sed (Mac OS X and FreeBSD) – does not work with GNU sed:

sed -i '' '/pattern to match/d' ./infile

To directly modify the file (and create a backup) – works with BSD and GNU sed:

sed -i.bak '/pattern to match/d' ./infile
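The backup form can be tried safely in a scratch directory. A minimal sketch, assuming mktemp -d is available; the file contents are invented.

```shell
workdir=$(mktemp -d)     # assumption: mktemp is installed
infile="$workdir/infile"
printf 'keep me\ndrop this pattern\nkeep me too\n' > "$infile"

# -i.bak (suffix attached, no space) edits in place and saves the original as infile.bak.
sed -i.bak '/pattern/d' "$infile"

after=$(cat "$infile")
before=$(cat "$infile.bak")
rm -rf "$workdir"

printf 'edited:\n%s\n\nbackup:\n%s\n' "$after" "$before"
```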

How can I remove the extension of a filename in a shell script?

You should use the command substitution syntax $(command) when you want to capture the output of a command inside a script or another command.

So your line would be

name=$(echo "$filename" | cut -f 1 -d '.')

Code explanation:

  1. echo takes the value of the variable $filename and sends it to standard output
  2. The pipe passes that output to the cut command
  3. cut uses . as the delimiter (also known as the separator) to split the string into segments, and -f selects which segment to output
  4. The $() command substitution captures that output and returns it as a value
  5. The returned value is assigned to the variable named name

Note that this gives the portion of the variable up to the first period .:

$ filename=hello.world
$ echo "$filename" | cut -f 1 -d '.'
hello
$ filename=hello.hello.hello
$ echo "$filename" | cut -f 1 -d '.'
hello
$ filename=hello
$ echo "$filename" | cut -f 1 -d '.'
hello
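For comparison, shell parameter expansion gives the same result as cut -f 1 without starting an extra process, and also offers a variant that strips only the last extension. A sketch using one of the sample values above:

```shell
filename="hello.hello.hello"

via_cut=$(echo "$filename" | cut -f 1 -d '.')   # up to the first dot: hello
first=${filename%%.*}                           # same result as the cut call: hello
last=${filename%.*}                             # strips only the last extension: hello.hello

printf '%s\n' "$via_cut" "$first" "$last"
```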

