How to Remove All Special Characters in Linux Text

How to remove all special characters in Linux text

Remove everything except printable characters (character class [:print:]) and tabs with sed. Line breaks are kept as well, because sed processes its input line by line:

sed $'s/[^[:print:]\t]//g' file.txt

[:print:] includes:

  • [:alnum:] (alphanumerics)
  • [:punct:] (punctuation)
  • the space character

ANSI-C quoting ($'...') makes the shell turn \t into a literal tab character before sed sees it. It is supported by bash, ksh, zsh and similar shells.
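If your shell does not support ANSI-C quoting, one way to get the same effect is to build the tab with printf. This is a sketch that assumes only a POSIX shell and sed; the function name strip_nonprintable is hypothetical:

```shell
# Portable equivalent of: sed $'s/[^[:print:]\t]//g'
# printf '\t' produces a literal tab, so no $'...' quoting is required.
strip_nonprintable() {
    tab=$(printf '\t')
    # Delete every character that is neither printable nor a tab.
    sed "s/[^[:print:]${tab}]//g" "$@"
}

# \001 (a control character) is deleted; the tab between b and c is kept.
printf 'a\001b\tc\n' | strip_nonprintable
```

Passing "$@" to sed lets the function read either the files named as arguments or standard input.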

Remove all special characters and case from string in bash

cat yourfile.txt | tr -dc '[:alnum:]\n\r' | tr '[:upper:]' '[:lower:]'

The first tr deletes the special characters: -d means delete and -c means complement (invert the character set), so -dc deletes every character except those listed. \n and \r are included to preserve Linux (LF) or Windows (CRLF) line endings, which I assume you want.

The second tr translates uppercase characters to lowercase.
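A minimal sketch wrapping the two tr commands in a reusable function. The name normalize is hypothetical, and the function reads standard input instead of yourfile.txt:

```shell
# Hypothetical helper: drop everything except letters, digits, \n and \r,
# then lowercase what is left.
normalize() {
    tr -dc '[:alnum:]\n\r' | tr '[:upper:]' '[:lower:]'
}

printf 'Hello, World!\n' | normalize   # prints: helloworld
```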

Removing all special characters from a string in Bash

You can use tr to keep only the printable characters of a string or file. Run the following command on your input file:

tr -cd "[:print:]\n" < file1   

The -d flag deletes the characters in the set given as the argument, and -c complements that set (inverts what is provided). Without -c, the command would delete all printable characters from the input stream; with it, only the non-printable characters are removed. The newline character \n is added to the set to preserve the line endings of the input file; without it, the output would be one long line.

[:print:] is a POSIX character class that combines [:alnum:], [:punct:] and the space character. In the C locale, [:alnum:] is the same as [0-9A-Za-z], and [:punct:] includes the characters ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~
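A small sketch of the same tr filter as a function (the name keep_printable is hypothetical). It also illustrates a caveat: a terminal color code is an ESC byte followed by ordinary printable characters, so only the ESC byte disappears:

```shell
# Hypothetical helper: keep printable characters and newlines only.
keep_printable() {
    tr -cd '[:print:]\n'
}

# \033 (ESC) is not printable, but the rest of the color code is.
printf '\033[31mred\033[0m\n' | keep_printable   # prints: [31mred[0m
```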

how to remove special characters using sed

If you don't want sed to treat, for example, Arabic characters as alphabetic (which they are), you need to use a locale that does not classify them as such.

The "C" locale only knows the basic ASCII character set, i.e. only [A-Za-z] are alphabetic. I am assuming you want to delete everything that is not a letter from that range (the question is not entirely clear about what is wanted):

echo -e "A \xd8\xa8" | LC_CTYPE=C sed -r "s/[^[:alpha:]]//g" | hexdump -C

Output:

00000000  41 0a
00000002
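The same idea as a reusable function, sketched under two assumptions: the name ascii_letters_only is hypothetical, and LC_ALL is used instead of LC_CTYPE because LC_ALL takes precedence over every other locale variable, so an LC_ALL already set in the environment cannot override the C locale. printf octal escapes replace echo -e so the example also works where echo does not accept -e:

```shell
# Hypothetical helper: keep only [A-Za-z], byte by byte.
# LC_ALL overrides LC_CTYPE and LANG, so the C locale always applies here.
ascii_letters_only() {
    LC_ALL=C sed 's/[^[:alpha:]]//g'
}

# \330\250 are the octal forms of the bytes 0xd8 0xa8 (Arabic letter beh in UTF-8).
printf 'A \330\250\n' | ascii_letters_only   # prints: A
```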

Remove special characters from a specific field

Assuming you want to remove all characters that are not uppercase or lowercase letters or digits ([A-Za-z0-9]) from the last |-separated field of every line, you can use:

awk -F '|' -v 'OFS=|' '{ gsub(/[^A-Za-z0-9]/,"",$NF); print}' inputfile > outputfile

For the input line given in the question, this produces exactly the requested output line.
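The field to clean need not be the last one. Here is a sketch of a generalization that takes the field number as an argument (the function clean_field and the awk variable f are hypothetical names); $f refers to field number f just as $NF refers to the last field:

```shell
# Hypothetical generalization: clean field number $1 instead of the last field.
# Assigning to a field makes awk rebuild the line with OFS ("|").
clean_field() {
    awk -F '|' -v OFS='|' -v f="$1" '{ gsub(/[^A-Za-z0-9]/, "", $f); print }'
}

printf 'a-1|b c|x!y\n' | clean_field 1   # prints: a1|b c|x!y
printf 'a-1|b c|x!y\n' | clean_field 3   # prints: a-1|b c|xy
```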

How to remove special characters from text file

Remove every character that is not one of the following ASCII codes (written in octal, as tr expects):

  • \11 = tab
  • \12 = newline
  • \40-\176 = space through ~; this range covers all letters, digits and symbols on a standard US keyboard

cat test.txt | tr -cd '\11\12\40-\176' > temp && mv temp test.txt

NOTE: Every character outside these codes is deleted, including non-ASCII characters such as accented letters (é) and the carriage return (\r) of Windows line endings.
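A sketch that makes the effect of the NOTE above visible. The function name ascii_only is hypothetical, and it reads standard input instead of test.txt:

```shell
# Hypothetical helper: keep tab (\11), newline (\12) and space..tilde (\40-\176).
ascii_only() {
    tr -cd '\11\12\40-\176'
}

# "é" is the two UTF-8 bytes \303\251; both are deleted, as is the \r.
# Output: "caf", a tab, "ok", then a newline.
printf 'caf\303\251\tok\r\n' | ascii_only
```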

Remove all special characters even 'éèô' from string

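The simplest way is to run the command in the C locale. LC_ALL is used here rather than LANG, because LANG is ignored whenever LC_ALL or LC_CTYPE is set in the environment:

echo "SamPlE_@tExT%, reééééally ?" | LC_ALL=C sed 's/[^a-zA-Z]//g'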

Output:

SamPlEtExTreally
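A sketch of the same command as a function (the name letters_only is hypothetical). The accented letters are written as UTF-8 octal escapes so the example does not depend on how your terminal or editor encodes é:

```shell
# Hypothetical helper: keep ASCII letters only, whatever the current locale.
letters_only() {
    LC_ALL=C sed 's/[^a-zA-Z]//g'
}

# \303\251 is "é" in UTF-8; %% is printf's escape for a literal percent sign.
printf 'SamPlE_@tExT%%, re\303\251\303\251ally ?\n' | letters_only   # prints: SamPlEtExTreally
```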

how to remove the special characters from a variable using shell

The following solution uses the tr command:

$ str='"#$hello,)&^this I!s> m@ani: /& "'
$ echo "$str" | tr -cd "[:alnum:]\"\n"
"hellothisIsmani"

Only letters, digits, double quotes (") and newlines are kept. To allow more or fewer characters, add them to or remove them from the set passed to tr.
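One way to make the allowed set easy to change is to pass it as an argument. This is a sketch; keep_chars is a hypothetical helper, and it uses printf '%s\n' rather than echo because some echo implementations treat backslashes or a leading -n or -e in the string specially:

```shell
# Hypothetical helper: print $1, keeping only the characters in the tr set $2.
keep_chars() {
    printf '%s\n' "$1" | tr -cd "$2"
}

str='"#$hello,)&^this I!s> m@ani: /& "'
keep_chars "$str" '[:alnum:]"\n'         # prints: "hellothisIsmani"
keep_chars "$str" '[:alnum:][:space:]'   # also keeps spaces and the newline
```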