remove ^M characters from file using sed
Use tr:
tr -d '^M' < inputfile
(Note that the ^M character can be input using Ctrl+V Ctrl+M.)
EDIT: As suggested by Glenn Jackman, if you're using bash, you could also say:
tr -d $'\r' < inputfile
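A quick way to check the result, as a sketch only: the sample text is made up and strip_cr is a hypothetical helper name. Note that this deletes every carriage return in the input, not only those at the ends of lines.

```shell
# strip_cr: hypothetical helper; deletes every CR (0x0d) read from stdin.
strip_cr() {
  tr -d '\r'
}

# Build a two-line sample with DOS line endings and dump what is left.
printf 'one\r\ntwo\r\n' | strip_cr | od -c
```

The od -c dump should no longer show any \r entries.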
how to remove ^@ from text files in unix?
The ^@ that you're seeing isn't a literal string. It's an escape code for a NUL (character value 0). If you want to remove them all:
tr -d '\0' <test.txt >newfile.txt
To help diagnose this sort of thing, the od (octal dump) utility is handy. I ran this on the test file you linked, to confirm that they were NULs:
$ od -c test.txt | head
0000000 \0 A \0 i \0 r \0 Q \0 u \0 a \0 l \0 i
0000020 \0 t \0 y \0 S \0 t \0 a \0 t \0 i \0 o
0000040 \0 n \0 E \0 o \0 I \0 C \0 o \0 d \0 e
0000060 \0 \n \0 D \0 E \0 H \0 E \0 0 \0 4 \0 4
*
0000400 \0 \n \0 D \0 E \0 H \0 E \0 0 \0 4 \0
0000420 4 \0 \n \0 D \0 E \0 H \0 E \0 0 \0 4 \0
*
0422160 4 \0 \n \n
0422164
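The same approach on a small made-up input, as a sketch (the sample bytes and the strip_nul name are illustrative only). A NUL before every character, as in the dump above, often indicates UTF-16 text; if that applies to your file, converting the encoding may suit better than deleting bytes, but that is an assumption about the data rather than something the dump proves.

```shell
# strip_nul: hypothetical helper; deletes every NUL byte (value 0) from stdin.
strip_nul() {
  tr -d '\0'
}

# Sample input with a NUL before each letter, similar to the dump above.
printf '\000A\000i\000r\n' | strip_nul | od -c
```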
How can I remove special characters from expect?
You could use the tr utility (a standalone program, not a bash builtin). From the man page:
NAME
tr -- translate characters
DESCRIPTION
The tr utility copies the standard input to the standard output with
substitution or deletion of selected characters.
SYNOPSIS
tr [-Ccsu] string1 string2
tr [-Ccu] -d string1
-C Complement the set of characters in string1, that is ``-C ab''
includes every character except for `a' and `b'.
-c Same as -C but complement the set of values in string1.
-d Delete characters in string1 from the input.
To strip out non-printable characters from file1:
tr -cd "[:print:]\n" < file1 # This is all you need.
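A sketch of that command on made-up input containing two control characters (keep_printable is a hypothetical helper name):

```shell
# keep_printable: hypothetical helper; -c complements the set and -d deletes,
# so everything except printable characters and newlines is removed.
keep_printable() {
  tr -cd '[:print:]\n'
}

# Sample line containing a BEL (\007) and an ESC (\033) control character.
printf 'ab\007c\033d\n' | keep_printable
```

Be aware that tr implementations which work byte by byte may also delete non-ASCII text such as UTF-8 sequences, so check the output if the file is not plain ASCII.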
Remove carriage return in Unix
I'm going to assume you mean carriage returns (CR, "\r", 0x0d) at the ends of lines rather than just blindly within a file (you may have them in the middle of strings for all I know). Using this test file with a CR at the end of the first line only:
$ cat infile
hello
goodbye
$ cat infile | od -c
0000000 h e l l o \r \n g o o d b y e \n
0000017
dos2unix is the way to go if it's installed on your system:
$ cat infile | dos2unix -U | od -c
0000000 h e l l o \n g o o d b y e \n
0000016
If for some reason dos2unix is not available to you, then sed will do it:
$ cat infile | sed 's/\r$//' | od -c
0000000 h e l l o \n g o o d b y e \n
0000016
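The \r escape in that pattern is understood by GNU sed, but not every sed implementation interprets it the same way. A more portable sketch, assuming a POSIX shell, puts a literal carriage return into the pattern instead; the cr variable and the strip_trailing_cr helper are hypothetical names:

```shell
# Capture a literal CR so the sed pattern does not rely on the \r escape.
cr=$(printf '\r')

# strip_trailing_cr: hypothetical helper; removes a CR only at the end of a line.
strip_trailing_cr() {
  sed "s/${cr}\$//"
}

printf 'hello\r\ngoodbye\n' | strip_trailing_cr | od -c
```

Because the pattern is anchored with $, a carriage return in the middle of a line is left alone.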
If for some reason sed is not available to you, then ed will do it, in a complicated way:
$ echo ',s/\r\n/\n/
> w !cat
> Q' | ed infile 2>/dev/null | od -c
0000000 h e l l o \n g o o d b y e \n
0000016
If you don't have any of those tools installed on your box, you've got bigger problems than trying to convert files :-)
How do I find and remove emojis in a text file?
2020 UPDATE: Perl v5.32 uses Unicode 13 and supports several properties that deal with emoji. You can simply use the Emoji property:
#!perl
use v5.32;
use utf8;
use open qw(:std :utf8);
while( <<>> ) { # double diamond (from v5.26)
s/\p{Emoji}//g;
print;
}
As a one-liner, this turns into:
% perl -CS -pe 's/\p{Emoji}//g' file1 file2 ...
Character classes for older Perls
In Perl, removing the emojis can be this easy. At its core, this is very close to what you'd do in sed. Update the pattern and other details for your task:
#!perl
use utf8;
use open qw(:std :utf8);
my $pattern = "[\x{1f300}-\x{1f5ff}\x{1f900}-\x{1f9ff}\x{1f600}-\x{1f64f}\x{1f680}-\x{1f6ff}\x{2600}-\x{26ff}\x{2700}-\x{27bf}\x{1f1e6}-\x{1f1ff}\x{1f191}-\x{1f251}\x{1f004}\x{1f0cf}\x{1f170}-\x{1f171}\x{1f17e}-\x{1f17f}\x{1f18e}\x{3030}\x{2b50}\x{2b55}\x{2934}-\x{2935}\x{2b05}-\x{2b07}\x{2b1b}-\x{2b1c}\x{3297}\x{3299}\x{303d}\x{00a9}\x{00ae}\x{2122}\x{23f3}\x{24c2}\x{23e9}-\x{23ef}\x{25b6}\x{23f8}-\x{23fa}]";
while( <DATA> ) { # use <> to read from command line
s/$pattern//g;
print;
}
__DATA__
Emoji at end
Emoji at beginning
Emoji in middle
UTS #51 mentions an Emoji property, but it's not listed in perluniprop. Were there such a thing, you could simplify this by removing anything with that property:
while( <DATA> ) {
s/\p{Emoji}//g;
print;
}
There is the Emoticon property, but that doesn't cover your character class. I haven't looked to see if it would be the same as the Emoji property in UTS #51.
User-defined Unicode properties
You can make your own properties by defining a subroutine whose name begins with In or Is, followed by the property name you choose. That subroutine returns a potentially multi-line string where each line is either a single hex code number or two hex code numbers separated by horizontal whitespace (a range). Any character covered by those lines is then part of your property.
Here's that same character class as a user-defined Unicode property. Note that I use the squiggly (indented) heredoc, mostly because I can write the program locally with leading space so I can paste directly into StackOverflow. The lines in IsEmoji cannot have leading space, though, but the indented heredoc takes care of that:
#!perl
use v5.26; # for indented heredoc
use utf8;
use open qw(:std :utf8);
while( <DATA> ) { # use <> to read from command line
s/\p{IsEmoji}//g;
print;
}
sub IsEmoji { <<~"HERE";
1f300 1f5ff
1f900 1f9ff
1f600 1f64f
1f680 1f6ff
2600 26ff
2700 27bf
1f1e6 1f1ff
1f191 1f251
1f004
1f0cf
1f170 1f171
1f17e 1f17f
1f18e
3030
2b50
2b55
2934 2935
2b05 2b07
2b1b 2b1c
3297
3299
303d
00a9
00ae
2122
23f3
24c2
23e9 23ef
25b6
23f8 23fa
HERE
}
__DATA__
Emoji at end
Emoji at beginning
Emoji in middle
You can put that in a module:
# IsEmoji.pm
sub IsEmoji { <<~"HERE";
1f300 1f5ff
... # all that other stuff too
23f8 23fa
HERE
}
1;
Now you can use that in a one-liner (the -I. adds the current directory to the module search path and the -M denotes a module to load):
$ perl -CS -I. -MIsEmoji -pe 's/\p{IsEmoji}//g' file1 file2
Beyond that, you're stuck with the long character class in your one-liner.