Remove Repeating and Control Characters in sed
See "limiting repetition" from this site: http://www.regular-expressions.info/repeat.html
An actual script, as inspired by chown and that site:
sed 's/\([a-zA-Z]\)\1\+/\1/g'
However, you won't be able to get HELLO back; you would only get HELO. A regex is not sophisticated enough to determine that there should be two L's. For that, you would need to match the word against a dictionary, though you could use a regex such as H+E+L+O+ for that lookup.
For the control characters, GNU sed accepts escapes such as \oNNN (octal) and \xHH (hexadecimal) that match arbitrary ASCII characters. ^H is the backspace character (octal 010, hex 08).
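As a quick check of both ideas, here is a minimal sketch. The function names squeeze_letters and strip_backspace are made up for illustration, and the pattern uses \1* rather than \1\+ so that it does not rely on the \+ operator, which is a GNU extension in basic regular expressions.

```shell
#!/bin/sh
# Collapse any run of the same ASCII letter into a single letter.
# \(x\)\1* is plain POSIX BRE, so it should also work outside GNU sed.
squeeze_letters() {
  sed 's/\([a-zA-Z]\)\1*/\1/g'
}

# Delete backspace characters (^H, octal 010) from the stream.
strip_backspace() {
  tr -d '\010'
}

printf 'HEEELLLLOOO\n' | squeeze_letters     # prints HELO
printf 'ab\010c\n' | strip_backspace         # prints abc
```

Using tr for the control character sidesteps the question of which escape syntax a given sed supports.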
How to correctly use sed to remove characters from file? - Invalid back reference
A better way of removing all characters with the 8th bit set is to use tr. Note that tr reads only from standard input, so the file has to be redirected in:
tr -d '\200-\377' < m.txt > m-no-8bit.txt
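A small sketch of the same filter applied to a pipe instead of a file. The function name strip_high_bit is hypothetical, and LC_ALL=C is an added assumption so that tr works on single bytes regardless of the current locale.

```shell
#!/bin/sh
# Delete every byte in the range 0x80-0xFF; tr filters stdin to stdout.
# LC_ALL=C (an assumption, not part of the original command) forces
# byte-wise processing.
strip_high_bit() {
  LC_ALL=C tr -d '\200-\377'
}

# "café" encoded as UTF-8: the é is the two bytes \303\251.
printf 'caf\303\251\n' | strip_high_bit      # prints caf
```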
Remove some character in the start and in the end using sed
You can use sed:
sed -n 's/.*\[profile *\([^][]*\).*/\1/p' ~/.aws/config
Details:
-n - suppress default line output
.*\[profile *\([^][]*\).* - find any text, [profile, zero or more spaces, then capture into Group 1 any zero or more chars other than [ and ], and then match the rest of the text
\1 - replace with Group 1 value
p - print the result of the substitution.
See an online demo:
s='[profile gateway]
[profile personal]
[profile DA]
[profile CX]'
sed -n 's/.*\[profile *\([^][]*\).*/\1/p' <<< "$s"
Output:
gateway
personal
DA
CX
With GNU grep:
grep -oP '(?<=\[profile )[^]]+' ~/.aws/config
The (?<=\[profile )[^]]+ regex matches a location that is immediately preceded by the string "[profile " and then matches one or more chars other than ]. The -o option makes grep extract only the matches, and -P enables the PCRE regex syntax.
With awk
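You may also use awk:
awk '/^\[profile .*]$/{print substr($2, 1, length($2)-1)}' ~/.aws/config
It finds all lines that start with [profile and end with ], and outputs the second field without its last char (the ] that gets omitted). The substr start index is 1 because awk strings are 1-based; with a start of 0, some awks (gawk, for example) count the nonexistent position 0 against the length and return one character fewer.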
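To try the awk approach without touching a real config file, here is a minimal sketch that reads from standard input. The function name list_profiles and the sample lines (including the region setting) are invented for the demonstration.

```shell
#!/bin/sh
# Print the name inside each "[profile NAME]" header read from stdin.
# $2 is "NAME]", so dropping its last character leaves NAME.
list_profiles() {
  awk '/^\[profile .*]$/ { print substr($2, 1, length($2) - 1) }'
}

printf '[default]\n[profile gateway]\nregion = us-east-1\n[profile DA]\n' | list_profiles
# prints:
# gateway
# DA
```

Because awk splits on whitespace, this assumes profile names contain no spaces.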
using sed to copy lines and delete characters from the duplicates
That is pretty easy to do with sed, and you do not even need to use the hold space (the sed auxiliary buffer). Given the input file below:
$ cat input
@"Afghanistan.png",
@"Albania.png",
@"Algeria.png",
@"American_Samoa.png",
you should use this command:
sed 's/@"\([^.]*\)\.png",/&\
@"\1",/' input
The result:
$ sed 's/@"\([^.]*\)\.png",/&\
@"\1",/' input
@"Afghanistan.png",
@"Afghanistan",
@"Albania.png",
@"Albania",
@"Algeria.png",
@"Algeria",
@"American_Samoa.png",
@"American_Samoa",
This command is just a substitution command (s///). It matches anything starting with @" followed by non-period chars ([^.]*) and then by .png",. It also captures the non-period chars before .png", using the group delimiters \( and \), so we can reuse what was matched by this group. So, this is the to-be-replaced regular expression:
@"\([^.]*\)\.png",
Then follows the replacement part of the command. The & inserts everything that was matched by @"\([^.]*\)\.png", into the changed content. If it were the only element of the replacement part, nothing would change in the output. However, following the & there is a newline character, represented by a backslash \ followed by an actual newline, and on the new line we add the @" string followed by the content of the first group (\1) and then the string ",.
This is just a brief explanation of the command. Hope this helps. Also, note that some versions of sed (such as GNU sed) accept \n in the replacement to represent a newline, which gives a more concise and readable command:
sed 's/@"\([^.]*\)\.png",/&\n@"\1",/' input
use sed to match on string but not remove anything past any number of specific characters then a character
You can use
sed -i -E 's/(setting01 = )("[^"]*"|[^[:space:]]+)/\11/g' conf_file.txt
Details:
-E - enable POSIX ERE regex syntax
(setting01 = ) - Group 1: setting01 =
("[^"]*"|[^[:space:]]+) - Group 2:
  "[^"]*" - a ", then zero or more chars other than ", and then a "
  | - or
  [^[:space:]]+ - one or more non-whitespace chars
The \11 replaces the match with the Group 1 value followed by a literal 1 (as there can be no more than nine backreferences, \1 through \9, in a POSIX regex, \11 is not parsed by sed as the 11th backreference).
See the online demo:
#!/bin/bash
s='setting01 = 0 # Comment for setting 01
setting02 = 1 # Comment for setting 02
setting03 = "./folder" # Comment for setting 03
setting04 = "string" # Comment for setting 04
setting05 = 1 # Comment for setting 05'
sed -E 's/(setting01 = )("[^"]*"|[^[:space:]]+)/\11/g' <<< "$s"
Output:
setting01 = 1 # Comment for setting 01
setting02 = 1 # Comment for setting 02
setting03 = "./folder" # Comment for setting 03
setting04 = "string" # Comment for setting 04
setting05 = 1 # Comment for setting 05
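The same substitution can be wrapped in a small helper to make the \1-followed-by-a-digit behavior easier to see. This is only a sketch: set_value is a hypothetical function, and it assumes the key and the new value contain no characters that are special to sed (such as /, & or \).

```shell
#!/bin/sh
# set_value KEY VALUE: replace the value that follows "KEY = " on stdin.
# With VALUE=1 the replacement text becomes \11, which sed reads as
# backreference \1 followed by a literal 1.
set_value() {
  sed -E 's/('"$1"' = )("[^"]*"|[^[:space:]]+)/\1'"$2"'/'
}

printf 'setting01 = "old" # note\n' | set_value setting01 1
# prints: setting01 = 1 # note
```

The trailing comment survives because Group 2 stops at the closing quote or at the first whitespace.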
How to remove duplicated characters from string in Bash?
Can you use awk?
awk -v FS="" '{
for(i=1;i<=NF;i++)str=(++a[$i]==1?str $i:str)
}
END {print str}' <<< "cabbagee"
cabge
A couple of other ways, with GNU awk:
awk -v RS='[a-z]' '{str=(++a[RT]==1?str RT: str)}END{print str}' <<< "cabbagee"
cabge
awk -v RS='[a-z]' -v ORS= '++a[RT]==1{print RT}END{print "\n"}' <<< "cabbagee"
cabge
With GNU sed and awk:
sed 's/./&\n/g' <<< "cabbagee" | awk '!a[$1]++' | sed ':a;N;s/\n//;ba'
cabge
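The commands above lean on an empty FS or on RT, which not every awk provides. As a possible alternative, here is a sketch that walks the string with substr instead; the function name dedupe_chars is made up, and unlike the first command it works per line rather than across the whole input.

```shell
#!/bin/sh
# Keep only the first occurrence of each character on every input line.
dedupe_chars() {
  awk '{
    split("", seen)                  # clear the array for each new line
    out = ""
    for (i = 1; i <= length($0); i++) {
      c = substr($0, i, 1)           # i-th character of the line
      if (!(c in seen)) {            # first time we meet this character
        seen[c] = 1
        out = out c
      }
    }
    print out
  }'
}

printf 'cabbagee\n' | dedupe_chars   # prints cabge
```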
Removing duplicate words using sed
If you just want to get the first column and the last three, you can use the following awk one-liner:
awk '{$2=$(NF-2); $3=$(NF-1); $4=$NF; NF=4}1' file
It returns:
410011515534576 923000720575 10.225.4.236 CokeVPN
410011515534579 923000720578 10.225.4.239 CokeVPN
410018137112489 923054440014 10.225.1.212 CokeVPN
It rebuilds the line by setting the 2nd field to the third-from-last one, the 3rd to the second-from-last, and the 4th to the last, and then truncating the record to four fields with NF=4. Then 1 triggers the default action for awk: {print $0}.
To be sure you don't mangle shorter lines, you can add a condition: do this only if the number of fields is greater than or equal to 4:
awk 'NF>=4{$2=$(NF-2); $3=$(NF-1); $4=$NF; NF=4}1' file
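Since the original input file is not shown here, the following sketch uses invented placeholder fields to show what the field shuffle does. The function name keep_first_and_last3 is hypothetical, and truncating a record by assigning to NF is assumed to work as it does in gawk and mawk.

```shell
#!/bin/sh
# Keep field 1 plus the last three fields of every line with >= 4 fields;
# shorter lines pass through unchanged.
keep_first_and_last3() {
  awk 'NF>=4{$2=$(NF-2); $3=$(NF-1); $4=$NF; NF=4}1'
}

printf 'id a b c x y z\n' | keep_first_and_last3   # prints id x y z
printf 'only two\n' | keep_first_and_last3         # prints only two
```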
match repeated character in sed on mac
If slurping the whole file is acceptable:
perl -0777pe 's/(\n){3,}/\n\n/g' newlines.txt
Where you should replace \n with whatever newline sequence is appropriate.
-0777 tells perl to read the whole file as a single record instead of line by line, which allows a regex that works across lines to function.
If you are satisfied with the result, -i causes perl to replace the file in-place rather than output to stdout:
perl -i -0777pe 's/(\n){3,}/\n\n/g' newlines.txt
You can also write -i~ to create a backup file with the given suffix (~ in this case).
If slurping the whole file is not acceptable:
perl -ne 'if (/^$/) {$i++}else{$i=0}print if $i<2' newlines.txt
This prints any line that is not the second (or higher) consecutive empty line, which matches the substitution above: three or more newlines in a row collapse to two. -i works the same way with this command.
P.S. macOS comes with perl installed.
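For comparison, the streaming counter idea can also be written in awk rather than perl. This is a sketch of that swapped-in variant, not the command from the answer; the function name collapse_blank_lines is made up, and it aims for the same result as the slurp-mode substitution (three or more newlines in a row become two).

```shell
#!/bin/sh
# Print at most one empty line out of any run of consecutive empty lines.
collapse_blank_lines() {
  awk '
    /^$/ { if (++blank < 2) print; next }   # only the first empty line of a run
         { blank = 0; print }               # non-empty line: reset the counter
  '
}

printf 'a\n\n\n\nb\n' | collapse_blank_lines
# prints "a", one empty line, then "b"
```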
How to delete duplicate lines in a file without sorting it in Unix
awk '!seen[$0]++' file.txt
seen is an associative array keyed by the full text of each line ($0). If a line isn't in the array yet, then seen[$0] evaluates to false. The ! is the logical NOT operator and inverts the false to true. AWK prints the lines where the expression evaluates to true.
The ++ increments seen[$0] after it has been tested, so that seen[$0] == 1 after the first time a line is found, then seen[$0] == 2, and so on.
AWK evaluates everything but 0 and "" (empty string) as true. If a line is already in seen, then !seen[$0] evaluates to false and the line is not written to the output.
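The one-liner packs the test and the increment into a single expression. Spelled out step by step, the logic described above might look like the following sketch (dedupe_lines is a made-up name):

```shell
#!/bin/sh
# Long-hand equivalent of: awk '!seen[$0]++'
dedupe_lines() {
  awk '{
    if (!($0 in seen)) print   # first occurrence of this line: print it
    seen[$0]++                 # count every occurrence, printed or not
  }'
}

printf 'b\na\nb\nc\na\n' | dedupe_lines
# prints:
# b
# a
# c
```

The original order of the first occurrences is preserved, which is what distinguishes this from sort -u.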