How to Search for Files Containing DOS Line Endings (CRLF) with grep on Linux

How do you search for files containing DOS line endings (CRLF) with grep on Linux?

grep probably isn't the tool you want for this. It prints a line for every matching line in every file, so unless you want to, say, run todos 10 times on a 10-line file, it isn't the best way to go about it. Using find to run file on every file in the tree and then grepping that output for "CRLF" gets you one line of output for each file that has DOS-style line endings:

find . -not -type d -exec file "{}" ";" | grep CRLF

will get you something like:

./1/dos1.txt: ASCII text, with CRLF line terminators
./2/dos2.txt: ASCII text, with CRLF line terminators
./dos.txt: ASCII text, with CRLF line terminators
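A variant of the same pipeline, as a sketch: -type f limits the search to regular files (-not -type d also passes symlinks, devices and so on to file), and ending -exec with + lets find hand many paths to a single file process instead of starting one process per file. find_crlf_files is a hypothetical helper name, and the exact wording of file's output can vary between versions of file:

```shell
#!/usr/bin/env bash
# List regular files under a directory (default: .) whose `file` description
# mentions CRLF line terminators. find_crlf_files is a hypothetical name.
find_crlf_files() {
  # -type f: regular files only; `+` batches many paths into one `file` call
  find "${1:-.}" -type f -exec file {} + | grep CRLF
}

# Usage: find_crlf_files ./src
```

It prints lines in the same "path: description" format as the output above.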

Find files with at least one CR LF

You can use this grep command to list all the files in a directory with at least one CR-LF:

grep -l $'\r$' *

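The pattern $'\r$' matches a \r (carriage return) immediately before the end of a line. $'...' is bash's ANSI-C quoting, which turns \r into a literal carriage return before grep sees the pattern.

Or, using the hex value: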

grep -l $'\x0D$' *

Here \x0D is the hexadecimal code for \r (ASCII 13).
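To search a whole directory tree instead of only the files matched by *, the same pattern can be combined with grep's -r (recurse) and -I (skip binary files) options. A minimal sketch, assuming GNU grep; list_crlf_files is a hypothetical name:

```shell
#!/usr/bin/env bash
# Print the path of every non-binary file under a directory (default: .)
# that has at least one line ending in CR, i.e. a CRLF line ending.
list_crlf_files() {
  # -r: recurse, -l: print file names only, -I: ignore binary files
  # $'\r$' is ANSI-C quoting: a literal CR followed by the $ anchor
  grep -rlI $'\r$' -- "${1:-.}"
}

# Usage: list_crlf_files ./src
```

Dropping -I makes grep also report binary files that happen to contain a CR byte right before a newline.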

How does grep handle DOS end of line?

In this case grep really does match the string "aline\r", but you don't see it because the text gets erased by the ANSI escape sequences grep uses to print color. Pipe the output to od -c and you'll see:


$ grep aline file.txt
aline
$ grep aline$'\r' file.txt

$ grep aline$'\r' --color=never file.txt
aline
$ grep aline$'\r' --color=never file.txt | od -c
0000000 a l i n e \r \n
0000007
$ grep aline$'\r' --color=always file.txt | od -c
0000000 033 [ 0 1 ; 3 1 m 033 [ K a l i n e
0000020 \r 033 [ m 033 [ K \n
0000030

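With --color=never you can see the matched line because grep doesn't emit any color codes: \r simply moves the cursor back to the start of the line, the \n that follows starts a new line, and nothing is overwritten. With --color=auto, however, grep checks whether its output is a terminal and, if so, prints the match in color (many distributions alias grep to grep --color=auto, which is why this happens by default). In the colored output the \r is followed by \033[m, which resets the color, and \033[K, which erases from the cursor to the end of the line. Since \r has just moved the cursor back to column 0, \033[K wipes out "aline" before the \n is printed, so the line looks empty.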
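Besides od -c, the l command of POSIX sed prints a line with non-printing characters escaped, which makes the hidden carriage return visible at a glance. A small sketch that recreates the one-line file.txt from the example:

```shell
#!/usr/bin/env bash
# Recreate the example file: a single line "aline" terminated by CR LF.
printf 'aline\r\n' > file.txt

# Match the line including its CR, then let `sed -n l` print it unambiguously:
# the CR is shown as \r and the end of the line is marked with $.
grep --color=never aline$'\r' file.txt | sed -n l
```

For this file the pipeline prints aline\r$.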

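To make grep treat \n as ordinary data rather than as the line separator, use the -z option, which makes the null byte the line separator instead. The whole file then becomes a single record, which grep prints followed by \0: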


$ grep -z aline$'\r'$'\n' --color=never file.txt
aline
$ grep -z aline$'\r'$'\n' --color=never file.txt | od -c
0000000 a l i n e \r \n \0
0000010
$ grep -z aline$'\r'$'\n' --color=always file.txt | od -c
0000000 033 [ 0 1 ; 3 1 m 033 [ K a l i n e
0000020 \r 033 [ m 033 [ K \n \0
0000031

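Your last command, grep aline$'\n' file.txt, works, but not because of word splitting. $'...' is a form of quoting, so bash passes aline plus a newline to grep as a single argument. grep then treats every newline in its pattern argument as a separator between patterns, so the newline is never matched as a literal character. The same applies to the 3rd line, grep aline$'\r'$'\n' file.txt. Quoting doesn't help here, and building the pattern with command substitution, as in "aline$(echo $'\n')", loses the newline entirely, because command substitution strips trailing newlines and leaves just "aline":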


$ echo "aline" | grep -z "aline$(echo $'\n')"
aline

To see what that quoted pattern actually matches, I added another line to the file. Because -z makes the whole file a single record, the match on aline prints both lines:


$ cat file.txt
aline
another line
$ grep -z "aline$(echo $'\n')" file.txt | od -c
0000000 a l i n e \r \n a n o t h e r l
0000020 i n e \n \0
0000025
$ grep -z "aline$(echo $'\n')" file.txt
aline
another line
$
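To check exactly what bash hands to grep in these commands, you can dump each argument through od -c. A sketch; show_args is a hypothetical helper, not a standard command:

```shell
#!/usr/bin/env bash
# Print every argument this function receives, byte by byte, one od dump each.
show_args() {
  local arg
  for arg in "$@"; do
    printf '%s' "$arg" | od -c
  done
}

# $'...' is quoting, so the newline stays inside one argument
show_args aline$'\n'
# Command substitution strips trailing newlines, leaving only "aline"
show_args "aline$(echo $'\n')"
```

The first call shows a \n after the five letters; the second shows only a l i n e.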

How to find out line-endings in a text file?

You can use the file utility to give you an indication of the type of line endings.

Unix:

$ file testfile1.txt
testfile1.txt: ASCII text

"DOS":

$ file testfile2.txt
testfile2.txt: ASCII text, with CRLF line terminators

To convert from "DOS" to Unix:

$ dos2unix testfile2.txt

To convert from Unix to "DOS":

$ unix2dos testfile1.txt

Converting an already converted file has no effect, so it's safe to run these blindly (i.e. without checking the format first), although the usual disclaimers apply.
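If dos2unix and unix2dos are not installed, sed can do the same conversions. A sketch, assuming bash (for the $'\r' quoting) and a sed that accepts a literal CR in its script; to_unix and to_dos are hypothetical names, and both write to standard output rather than editing files in place:

```shell
#!/usr/bin/env bash
# DOS -> Unix: strip one trailing CR from every line.
to_unix() { sed $'s/\r$//' "$@"; }

# Unix -> DOS: replace any trailing CRs with exactly one, so plain LF lines
# gain a CR and lines that already end in CR LF are left unchanged.
to_dos() { sed $'s/\r*$/\r/' "$@"; }

# Usage: to_unix testfile2.txt > testfile2.unix.txt
```

As with the dos2unix tools, running to_dos on a file that is already in DOS format leaves it as it was.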

Grep to find a pattern and replace in the same line

Why use grep at all? sed does pattern matching itself:

sed -e 's/btn-primary\(.*{.*Save\)/btn-primary Save\1/g'

or:

sed -e 's/\(btn-primary\)\(.*{.*Save\)/\1 Save\2/g'

If you are using grep to try to trim down the number of files that sed will operate on, you're fooling yourself if you believe that is more efficient. By doing that, you will read every file that doesn't match only once, but every file that does match will be read twice. If you only use sed, every file will be read only once.
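To see the two expressions in action, here they are applied to a made-up input line (the HTML is purely illustrative, not taken from the question):

```shell
#!/usr/bin/env bash
# Hypothetical input: btn-primary, followed later on the line by "{ ... Save".
line='<button class="btn-primary" onclick="{ Save }">'

# Version 1: capture everything after btn-primary up to and including Save.
printf '%s\n' "$line" | sed -e 's/btn-primary\(.*{.*Save\)/btn-primary Save\1/g'

# Version 2: capture btn-primary separately and reuse it as \1.
printf '%s\n' "$line" | sed -e 's/\(btn-primary\)\(.*{.*Save\)/\1 Save\2/g'

# Both print: <button class="btn-primary Save" onclick="{ Save }">
```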

How to check the end-of-line format of a text file to see whether it is Unix or DOS?


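This awk one-liner exits with status 0 as soon as it finds a line ending in \r, and with status 1 otherwise, including for an empty file:

if awk '/\r$/ { found = 1; exit } END { exit !found }' myFile
then
  echo "is DOS"
fi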
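The awk test can also be wrapped in a function and run over several files. This sketch checks every line rather than stopping after the first one and reports an empty file as Unix; is_dos is a hypothetical name, and the *.txt glob is an assumption to adjust:

```shell
#!/usr/bin/env bash
# Exit status 0 if any line of the file ends in CR (CRLF endings), 1 otherwise.
is_dos() { awk '/\r$/ { found = 1; exit } END { exit !found }' "$1"; }

for f in *.txt; do
  [ -f "$f" ] || continue   # skip the literal pattern when nothing matches
  if is_dos "$f"; then
    echo "$f: DOS"
  else
    echo "$f: Unix"
  fi
done
```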

Using bash to list files with a certain combination of characters

It sounds like you want grep -l, which will list the files that contain a particular string. You can also just pass the filename arguments directly to grep and skip cat.

grep -l "desiredString" *
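If the string may contain characters that are special in regular expressions, such as . or *, grep's -F option treats it as a fixed string, and -r extends the search into subdirectories. A sketch, assuming GNU grep; files_containing is a hypothetical name:

```shell
#!/usr/bin/env bash
# List files under a directory (default: .) that contain the literal string $1.
files_containing() {
  # -r: recurse, -l: names only, -F: fixed string (no regex),
  # -e: the next argument is the pattern even if it starts with "-"
  grep -rlF -e "$1" -- "${2:-.}"
}

# Usage: files_containing "desiredString" ./docs
```

With -F, a search for a.b matches only the literal text a.b, not axb.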

