Delete Repeated Characters Without Back-Referencing with Sed

Remove Repeating and Control Characters in sed

See "limiting repetition" from this site: http://www.regular-expressions.info/repeat.html

A working script for GNU sed (\+ is a GNU extension in basic regular expressions), inspired by chown and that site:

sed 's/\([a-zA-Z]\)\1\+/\1/g' 

However, this turns HELLO into HELO: a regex alone cannot know that the word should keep two L's. For that, you would need to check candidates against a dictionary. The regex can still help there, since a pattern such as H+E+L+O+ matches every stretched spelling of the word.
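Since the title asks for a solution without back-references, tr's squeeze option is worth a look. This is a minimal sketch, assuming only runs of ASCII letters need collapsing; the sample word is made up for illustration.

```shell
# Collapse runs of the same letter with tr -s (squeeze repeats);
# no back-reference is involved.
squeezed=$(printf 'HEEELLLLOOO\n' | tr -s 'a-zA-Z')
printf '%s\n' "$squeezed"   # HELO
```

As with the sed version, legitimate double letters are collapsed too.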

For the control characters, GNU sed accepts \oNNN (octal), \dNNN (decimal) and \xHH (hexadecimal) escapes for arbitrary characters. ^H is the backspace character: ASCII 8, i.e. \o010 or \x08.
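As a small illustration of removing ^H, here is a sketch that deletes backspace characters with tr, which takes octal escapes portably. The sample string is an assumption, and the sed form in the comment assumes GNU sed.

```shell
# ^H is the backspace character: octal 010, hex 08.
# The sample input is "ab", a backspace, then "c".
cleaned=$(printf 'ab\010c\n' | tr -d '\010')
printf '%s\n' "$cleaned"   # abc
# GNU sed equivalent: sed 's/\x08//g'
```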

How to correctly use sed to remove characters from file? - Invalid back reference

A better way of removing all characters with the 8th bit set is

tr -d '\200-\377' < m.txt > m-no-8bit.txt
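To see the effect on a concrete string, the sketch below feeds tr through a pipe instead of a file. The sample text is an assumption, and LC_ALL=C is added so that tr works on single bytes.

```shell
# "caf" followed by the two UTF-8 bytes of e-acute (octal 303 251);
# both bytes have the 8th bit set, so both are deleted.
ascii_only=$(printf 'caf\303\251\n' | LC_ALL=C tr -d '\200-\377')
printf '%s\n' "$ascii_only"   # caf
```

Note that tr reads only standard input, so a file has to be redirected into it.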

Remove some character in the start and in the end using sed

You can use sed:

sed -n 's/.*\[profile *\([^][]*\).*/\1/p' ~/.aws/config

Details:

  • -n - suppress default line output
  • .*\[profile *\([^][]*\).* - match any text, then [profile, zero or more spaces, then capture into Group 1 zero or more chars other than [ and ], and then match the rest of the line
  • \1 - replace with Group 1 value
  • p - print the result of the substitution.

See an online demo:

s='[profile gateway]
[profile personal]
[profile DA]
[profile CX]'
sed -n 's/.*\[profile *\([^][]*\).*/\1/p' <<< "$s"

Output:

gateway
personal
DA
CX

With a GNU grep

grep -oP '(?<=\[profile )[^]]+' ~/.aws/config

The (?<=\[profile )[^]]+ regex matches a location that is immediately preceded by the string [profile and a space, and then matches one or more chars other than ]. The -o option makes grep print only the matches, and -P enables PCRE regex syntax.

With awk

You may also use awk:

awk '/^\[profile .*]$/{print substr($2, 1, length($2)-1)}' ~/.aws/config

It finds all lines that start with [profile and end with ], and outputs the second field without its last char (the closing ], which gets omitted). Character positions in awk's substr start at 1.
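The awk approach can be tried without touching ~/.aws/config. In this sketch the config text is a hypothetical sample, and substr starts at position 1 because awk strings are 1-indexed.

```shell
# Hypothetical config contents; only the [profile ...] headers should match.
config='[default]
region = us-east-1
[profile gateway]
[profile personal]'

# $2 is e.g. "gateway]"; drop its last character, the closing bracket.
profiles=$(printf '%s\n' "$config" |
  awk '/^\[profile .*\]$/ { print substr($2, 1, length($2) - 1) }')
printf '%s\n' "$profiles"
# gateway
# personal
```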

using sed to copy lines and delete characters from the duplicates

That is pretty easy to do with sed, and you do not even need the hold space (sed's auxiliary buffer). Given the input file below:

$ cat input 
@"Afghanistan.png",
@"Albania.png",
@"Algeria.png",
@"American_Samoa.png",

you should use this command:

sed 's/@"\([^.]*\)\.png",/&\
@"\1",/' input

The result:

$ sed 's/@"\([^.]*\)\.png",/&\
@"\1",/' input
@"Afghanistan.png",
@"Afghanistan",
@"Albania.png",
@"Albania",
@"Algeria.png",
@"Algeria",
@"American_Samoa.png",
@"American_Samoa",

This command is just a substitution (s///). It matches @" followed by a run of non-period chars ([^.]*) and then .png",. The non-period chars are wrapped in the group delimiters \( and \), so the replacement can refer to what they matched. So, this is the regular expression to be replaced:

@"\([^.]*\)\.png",

Next comes the replacement part of the command. The & inserts everything that was matched by @"\([^.]*\)\.png",. If it were the only element of the replacement, the output would be unchanged. However, the & is followed by a newline, written as a backslash \ followed by an actual newline, and on the new line we add the @" string, then the content of the first group (\1) and then the string ",.

This is just a brief explanation of the command. Hope this helps. Also, note that you can use the \n string to represent newlines in some versions of sed (such as GNU sed). It would render a more concise and readable command:

sed 's/@"\([^.]*\)\.png",/&\n@"\1",/' input 
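If embedding a newline in the replacement is awkward, a different technique gives the same result: print the matching line first with p, then edit the copy. This is a sketch of that alternative, not the command explained above, and it uses two of the sample lines as input.

```shell
# For lines ending in .png", print the line unchanged (p), then strip ".png"
# from the pattern space; sed prints that edited copy at the end of the cycle.
result=$(printf '%s\n' '@"Afghanistan.png",' '@"Albania.png",' |
  sed -e '/\.png",$/p' -e 's/\.png",$/",/')
printf '%s\n' "$result"
```

Lines that do not end in .png", pass through once, unchanged.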

use sed to match on string but not remove anything past any number of specific characters then a character

You can use

sed -i -E 's/(setting01 = )("[^"]*"|[^[:space:]]+)/\11/g' conf_file.txt

Details:

  • -E - enable POSIX ERE regex syntax
  • (setting01 = ) - Group 1: setting01 =
  • ("[^"]*"|[^[:space:]]+) - Group 2:
    • "[^"]*" - a ", then zero or more chars other than " and then a "
    • | - or
    • [^[:space:]]+ - one or more non-whitespace chars

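The \11 in the replacement is the Group 1 value followed by a literal 1. POSIX regexes allow no more than nine backreferences (\1 to \9), so sed does not parse \11 as an 11th backreference. Group 2, the old value, is matched but not reused, which is how it gets replaced.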

See the online demo:

#!/bin/bash
s='setting01 = 0 # Comment for setting 01
setting02 = 1 # Comment for setting 02
setting03 = "./folder" # Comment for setting 03
setting04 = "string" # Comment for setting 04
setting05 = 1 # Comment for setting 05'

sed -E 's/(setting01 = )("[^"]*"|[^[:space:]]+)/\11/g' <<< "$s"

Output:

setting01 = 1 # Comment for setting 01
setting02 = 1 # Comment for setting 02
setting03 = "./folder" # Comment for setting 03
setting04 = "string" # Comment for setting 04
setting05 = 1 # Comment for setting 05

How to remove duplicated characters from string in Bash?

Can you use awk?

awk -v FS="" '{
for(i=1;i<=NF;i++)str=(++a[$i]==1?str $i:str)
}
END {print str}' <<< "cabbagee"
cabge

A couple of other ways:

GNU awk:

awk -v RS='[a-z]' '{str=(++a[RT]==1?str RT: str)}END{print str}' <<< "cabbagee"
cabge


awk -v RS='[a-z]' -v ORS= '++a[RT]==1{print RT}END{print "\n"}' <<< "cabbagee"
cabge

GNU sed and awk:

sed 's/./&\n/g' <<< "cabbagee" | awk '!a[$1]++' | sed ':a;N;s/\n//;ba'
cabge

Removing duplicate words using sed

If you just want to get the first column and the last three, you can use the following awk one-liner:

awk '{$2=$(NF-2); $3=$(NF-1); $4=$NF; NF=4}1' file

It returns:

410011515534576 923000720575 10.225.4.236 CokeVPN
410011515534579 923000720578 10.225.4.239 CokeVPN
410018137112489 923054440014 10.225.1.212 CokeVPN

It rebuilds the line by setting the 2nd field to the antepenultimate one, the 3rd to the penultimate and the 4th to the last, and then truncating the record to four fields with NF=4. The trailing 1 is an always-true condition that triggers awk's default action: {print $0}.


To avoid mangling shorter lines, you can add a condition so that this happens only when the number of fields is greater than or equal to 4:

awk 'NF>=4{$2=$(NF-2); $3=$(NF-1); $4=$NF; NF=4}1' file
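The guarded version can be checked on a small input. The two sample lines below are hypothetical, and the sketch assumes an awk that rebuilds the record when NF is assigned, as gawk and mawk do.

```shell
# Hypothetical input: the first line has 7 fields, the second only 2.
sample='id a b c d e f
x y'

# Long lines are reduced to the first field plus the last three;
# lines with fewer than 4 fields are printed unchanged.
trimmed=$(printf '%s\n' "$sample" |
  awk 'NF>=4 { $2=$(NF-2); $3=$(NF-1); $4=$NF; NF=4 } 1')
printf '%s\n' "$trimmed"
# id d e f
# x y
```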

match repeated character in sed on mac

If slurping the whole file is acceptable:

perl -0777pe 's/(\n){3,}/\n\n/g' newlines.txt

Where you should replace \n with whatever newline sequence is appropriate.

-0777 tells perl to read the whole file as a single record instead of line by line, which lets a regex match across lines.

If you are satisfied with the result, -i causes perl to replace the file in-place rather than output to stdout:

perl -i -0777pe 's/(\n){3,}/\n\n/g' newlines.txt

You can also write -i~ to keep a backup file with the given suffix (~ in this case).

If slurping the whole file is not acceptable:

perl -ne 'if (/^$/) {$i++}else{$i=0}print if $i<3' newlines.txt

This prints every line except the third (and any later) consecutive empty line, so up to two empty lines in a row are kept. -i works the same way here.

P.S. macOS comes with Perl installed.

How to delete duplicate lines in a file without sorting it in Unix

awk '!seen[$0]++' file.txt

seen is an associative array keyed by the full text of each line ($0). The first time a line appears, seen[$0] is unset and evaluates to false. The ! is the logical NOT operator and inverts the false to true. AWK prints the lines for which the expression evaluates to true.

The ++ is a post-increment: it runs after the value has been tested, so seen[$0] == 1 after the first time a line is found, then seen[$0] == 2, and so on.
AWK treats everything except 0 and "" (the empty string) as true. When a duplicate line comes along, seen[$0] is already non-zero, so !seen[$0] evaluates to false and the line is not written to the output.
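A quick run on a made-up five-line input shows the effect; the letters stand in for arbitrary lines.

```shell
# Keep only the first occurrence of each line, preserving the original order.
unique=$(printf '%s\n' b a b c a | awk '!seen[$0]++')
printf '%s\n' "$unique"
# b
# a
# c
```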


