Why can't you use cat to read a file line by line where each line has delimiters
The problem is not in cat, nor in the for loop per se; it is in the use of back quotes. When you write either:
for i in `cat file`
or (better):
for i in $(cat file)
or (in bash):
for i in $(<file)
the shell executes the command and captures the output as a string, separating the words at the characters in $IFS. If you want whole lines assigned to $i, you either have to fiddle with IFS or use the while loop. The while loop is better if there's any danger that the files processed will be large; it doesn't have to read the whole file into memory all at once, unlike the versions using $(...).
IFS='
'
for i in $(<file)
do echo "$i"
done
The quotes around the "$i" are generally a good idea. In this context, with the modified $IFS, they aren't actually critical, but good habits are good habits even so. They do matter in the following script:
old="$IFS"
IFS='
'
for i in $(<file)
do
(
IFS="$old"
echo "$i"
)
done
when the data file contains multiple spaces between words:
$ cat file
abc                  123,         comma
the   quick   brown   fox
jumped   over   the   lazy   dog
comma,   comma
$
Output:
$ sh bq.sh
abc                  123,         comma
the   quick   brown   fox
jumped   over   the   lazy   dog
comma,   comma
$
Without the double quotes:
$ cat bq.sh
old="$IFS"
IFS='
'
for i in $(<file)
do
(
IFS="$old"
echo $i
)
done
$ sh bq.sh
abc 123, comma
the quick brown fox
jumped over the lazy dog
comma, comma
$
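The difference between the word-splitting for loop and the line-oriented while loop described earlier can be sketched as follows. This is only an illustration: the sample file contents and the counter names words and lines are made up.

```shell
# Sample data: two lines, each containing spaces (made up for this sketch).
tmp=$(mktemp)
printf 'abc 123\nthe quick fox\n' > "$tmp"

# for + command substitution: the captured output is split at every $IFS
# character (space, tab, newline by default), so each word is one iteration.
words=0
for i in $(cat "$tmp"); do
    words=$((words + 1))
done

# while + read: one iteration per line, and the file is streamed instead of
# being captured in memory as a single string.
lines=0
while IFS= read -r line; do
    lines=$((lines + 1))
done < "$tmp"

echo "for loop iterations:   $words"   # 5
echo "while loop iterations: $lines"   # 2
rm -f "$tmp"
```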
Read a file line by line assigning the value to a variable
The following reads a file passed as an argument line by line:
while IFS= read -r line; do
echo "Text read from file: $line"
done < my_filename.txt
This is the standard form for reading lines from a file in a loop. Explanation:
- IFS= (or IFS='') prevents leading/trailing whitespace from being trimmed.
- -r prevents backslash escapes from being interpreted.
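To see what those two settings change, here is a small sketch that reads the same line with and without them. The sample text and the variable names plain and safe are invented for the illustration.

```shell
# Made-up sample: two leading spaces and one backslash.
sample='  two leading spaces and a back\slash'

# Bare read: leading whitespace is trimmed and the backslash is consumed
# as an escape character.
plain=$(printf '%s\n' "$sample" | { read line; printf '%s' "$line"; })

# IFS= read -r: the line arrives unchanged.
safe=$(printf '%s\n' "$sample" | { IFS= read -r line; printf '%s' "$line"; })

printf 'plain: [%s]\n' "$plain"   # plain: [two leading spaces and a backslash]
printf 'safe:  [%s]\n' "$safe"    # safe:  [  two leading spaces and a back\slash]
```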
Alternatively, you can put the loop in a Bash helper script. Example contents:
#!/bin/bash
while IFS= read -r line; do
echo "Text read from file: $line"
done < "$1"
If the above is saved to a script with filename readfile, it can be run as follows:
chmod +x readfile
./readfile filename.txt
If the file isn’t a standard POSIX text file (= not terminated by a newline character), the loop can be modified to handle trailing partial lines:
while IFS= read -r line || [[ -n "$line" ]]; do
echo "Text read from file: $line"
done < "$1"
Here, || [[ -n $line ]] prevents the last line from being ignored if it doesn't end with a \n (since read returns a non-zero exit code when it encounters EOF).
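A quick way to check this behavior is to count iterations over a file whose last line has no trailing newline. The sketch below assumes a made-up three-line sample and uses the POSIX test [ -n "$line" ] in place of [[ -n "$line" ]], which behaves the same way here.

```shell
# A made-up sample whose last line lacks the trailing newline.
tmp=$(mktemp)
printf 'first\nsecond\nlast without newline' > "$tmp"

# Plain loop: read fails at EOF on the partial line, so the body is skipped.
plain=0
while IFS= read -r line; do
    plain=$((plain + 1))
done < "$tmp"

# Guarded loop: read has already stored the partial line before failing,
# so the extra non-empty test lets the body run one more time.
guarded=0
while IFS= read -r line || [ -n "$line" ]; do
    guarded=$((guarded + 1))
done < "$tmp"

echo "plain: $plain, guarded: $guarded"   # plain: 2, guarded: 3
rm -f "$tmp"
```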
If the commands inside the loop also read from standard input, the file descriptor used by read can be changed to something else (avoid the standard file descriptors), e.g.:
while IFS= read -r -u3 line; do
echo "Text read from file: $line"
done 3< "$1"
(Non-Bash shells might not know read -u3; use read <&3 instead.)
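The following sketch shows why the separate descriptor matters. It uses the portable read <&3 form, and cat > /dev/null stands in for any command inside the loop that reads standard input; the sample file and the counters shared and separate are made up.

```shell
tmp=$(mktemp)
printf 'alpha\nbeta\ngamma\n' > "$tmp"

# Loop fed through standard input: cat swallows the rest of the file
# after the first iteration, so read finds nothing left.
shared=0
while IFS= read -r line; do
    shared=$((shared + 1))
    cat > /dev/null
done < "$tmp"

# Loop fed through descriptor 3: cat reads the loop's own standard input
# (/dev/null here, so the sketch never waits on a terminal) and the file
# is left for read alone.
separate=0
while IFS= read -r line <&3; do
    separate=$((separate + 1))
    cat > /dev/null
done 3< "$tmp" < /dev/null

echo "shared stdin: $shared, descriptor 3: $separate"   # shared stdin: 1, descriptor 3: 3
rm -f "$tmp"
```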
How to cat EOF a file containing code?
You only need a minimal change: single-quote the here-document delimiter after <<.
cat <<'EOF' >> brightup.sh
or equivalently backslash-escape it:
cat <<\EOF >>brightup.sh
Without quoting, the here document will undergo variable substitution, backticks will be evaluated, etc, like you discovered.
If you need to expand some, but not all, values, you need to individually escape the ones you want to prevent.
cat <<EOF >>brightup.sh
#!/bin/sh
# Created on $(date # : <<-- this will be evaluated before cat;)
echo "\$HOME will not be evaluated because it is backslash-escaped"
EOF
will produce
#!/bin/sh
# Created on Fri Feb 16 11:00:18 UTC 2018
echo "$HOME will not be evaluated because it is backslash-escaped"
As suggested by @fedorqui, here is the relevant section from man bash:
Here Documents
This type of redirection instructs the shell to read input from the
current source until a line containing only delimiter (with no
trailing blanks) is seen. All of the lines read up to that point are
then used as the standard input for a command.

The format of here-documents is:

<<[-]word
        here-document
delimiter

No parameter expansion, command substitution, arithmetic expansion,
or pathname expansion is performed on word. If any characters in word
are quoted, the delimiter is the result of quote removal on word, and
the lines in the here-document are not expanded. If word is
unquoted, all lines of the here-document are subjected to parameter
expansion, command substitution, and arithmetic expansion. In the
latter case, the character sequence \<newline> is ignored, and \
must be used to quote the characters \, $, and `.
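As a rough side-by-side check of the quoted and unquoted forms, the sketch below writes the same body through both kinds of here-document and compares the results. The variable name and the temporary file are made up for the illustration.

```shell
name=world   # made-up variable for this sketch
out=$(mktemp)

# Unquoted delimiter: parameter and arithmetic expansion happen in the body;
# a backslash keeps the following $ literal.
cat <<EOF > "$out"
hello $name, sum=$((1 + 1)), literal \$name
EOF
expanded=$(cat "$out")

# Quoted delimiter: the body is written exactly as typed.
cat <<'EOF' > "$out"
hello $name, sum=$((1 + 1)), literal \$name
EOF
literal=$(cat "$out")

printf '%s\n' "$expanded"   # hello world, sum=2, literal $name
printf '%s\n' "$literal"    # hello $name, sum=$((1 + 1)), literal \$name
rm -f "$out"
```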
How to get the part of a file after the first line that matches a regular expression
The following will print from the line matching TERMINATE till the end of the file:
sed -n -e '/TERMINATE/,$p'
Explained: -n disables sed's default behavior of printing each line after executing its script on it; -e indicates a script to sed; /TERMINATE/,$ is an address (line) range selection meaning from the first line matching the TERMINATE regular expression (like grep) to the end of the file ($); and p is the print command, which prints the current line.
This will print from the line that follows the line matching TERMINATE till the end of the file:
(from AFTER the matching line to EOF, NOT including the matching line)
sed -e '1,/TERMINATE/d'
Explained: 1,/TERMINATE/ is an address (line) range selection meaning from the first line of the input to the first line matching the TERMINATE regular expression, and d is the delete command, which deletes the current line and skips to the next line. As sed's default behavior is to print the lines, it will print the lines after TERMINATE to the end of the input.
If you want the lines before TERMINATE:
sed -e '/TERMINATE/,$d'
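The three commands above can be tried on a small input. This is only a sketch: the five-line sample and the variable names are made up.

```shell
# Made-up five-line input with TERMINATE in the middle.
input=$(printf 'one\ntwo\nTERMINATE\nfour\nfive')

# From the matching line to the end, matching line included.
from_match=$(printf '%s\n' "$input" | sed -n -e '/TERMINATE/,$p')

# Everything after the matching line.
after_match=$(printf '%s\n' "$input" | sed -e '1,/TERMINATE/d')

# Everything before the matching line.
before_match=$(printf '%s\n' "$input" | sed -e '/TERMINATE/,$d')

printf 'from:\n%s\n' "$from_match"       # TERMINATE, four, five
printf 'after:\n%s\n' "$after_match"     # four, five
printf 'before:\n%s\n' "$before_match"   # one, two
```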
And if you want both the lines before and after TERMINATE in two different files in a single pass:
sed -e '1,/TERMINATE/w before
/TERMINATE/,$w after' file
Both the before and after files will contain the line matching TERMINATE, so to process each one without it, use (the negative count in head -n -1 is a GNU extension):
head -n -1 before
tail -n +2 after
If you do not want to hard-code the filenames in the sed script, you can use shell variables:
before=before.txt
after=after.txt
sed -e "1,/TERMINATE/w $before
/TERMINATE/,\$w $after" file
But then you have to escape the $ meaning the last line, so the shell will not try to expand the $w variable (note that we now use double quotes around the script instead of single quotes).
Note that the newline after each filename in the script is important, so that sed knows where the filename ends.
How would you replace the hardcoded TERMINATE by a variable?
You would make a variable for the matching text and then do it the same way as the previous example:
matchtext=TERMINATE
before=before.txt
after=after.txt
sed -e "1,/$matchtext/w $before
/$matchtext/,\$w $after" file
To use a variable for the matching text with the previous examples:
## Print the line containing the matching text, till the end of the file:
## (from the matching line to EOF, including the matching line)
matchtext=TERMINATE
sed -n -e "/$matchtext/,\$p"
## Print from the line that follows the line containing the
## matching text, till the end of the file:
## (from AFTER the matching line to EOF, NOT including the matching line)
matchtext=TERMINATE
sed -e "1,/$matchtext/d"
## Print all the lines before the line containing the matching text:
## (from line-1 to BEFORE the matching line, NOT including the matching line)
matchtext=TERMINATE
sed -e "/$matchtext/,\$d"
The important points about replacing text with variables in these cases are:
- Variables ($variablename) enclosed in single quotes ['] won't "expand", but variables inside double quotes ["] will. So, you have to change all the single quotes to double quotes if they contain text you want to replace with a variable.
- The sed ranges also contain a $ and are immediately followed by a letter, like $p, $d, $w. They will also look like variables to be expanded, so you have to escape those $ characters with a backslash [\], like \$p, \$d, \$w.
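Both points can be checked with a short sketch. The second form, which splices the variable between two single-quoted pieces so that no $ needs escaping, is an alternative not taken from the answer above; the sample input is made up.

```shell
matchtext=TERMINATE
input=$(printf 'one\nTERMINATE\nthree')

# Double quotes: the shell expands $matchtext, and \$ reaches sed as a plain $.
escaped=$(printf '%s\n' "$input" | sed -n -e "/$matchtext/,\$p")

# Alternative: keep the sed script in single quotes and splice the
# double-quoted variable in between.
spliced=$(printf '%s\n' "$input" | sed -n -e '/'"$matchtext"'/,$p')

printf '%s\n' "$escaped"   # TERMINATE, three
printf '%s\n' "$spliced"   # TERMINATE, three
```

In either form, sed still interprets the value of matchtext as a regular expression, so a value containing / or other special characters would need escaping first.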
Using multiple delimiters in awk
The delimiter can be a regular expression.
awk -F'[/=]' '{print $3 "\t" $5 "\t" $8}' file
Produces:
tc0001 tomcat7.1 demo.example.com
tc0001 tomcat7.2 quest.example.com
tc0001 tomcat7.5 www.example.com
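The input file is not shown above, so the sketch below assumes a hypothetical line shaped to give the first row of that output. It mainly illustrates how the field numbers fall out when both / and = act as separators.

```shell
# Hypothetical input line; the real input file is not part of the answer.
line='/logs/tc0001/tomcat/tomcat7.1/conf/catalina.properties:app.env.server.name=demo.example.com'

# Splitting on / or = gives: $1 empty (the text before the leading /),
# $2 logs, $3 tc0001, $4 tomcat, $5 tomcat7.1, $6 conf,
# $7 the properties key, $8 demo.example.com.
result=$(printf '%s\n' "$line" | awk -F'[/=]' '{print $3 "\t" $5 "\t" $8}')

printf '%s\n' "$result"   # tc0001, tomcat7.1 and demo.example.com, tab-separated
```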
Java delimiter while reading text file - regex/or not?
Use the String#split or Pattern#split method.
For example,
String[] list = "AB523:[joe, pierre][charlie][dogs,cat]".split("[:\\[\\]]+");
for (String s : list)
    System.out.println(s);
how to split one line with customized separator and assign to variables in BASH?
I would suggest using shell arrays to store the individual field values, with a slightly different awk command:
IFS=$'\03' read -ra arr < <(awk -F'#\\$' -v OFS='\03' '{$1=$1}1' file)
# check array content
declare -p arr
declare -a arr='([0]="hah a" [1]="hehe" [2]="hoho")'
We are using the control character \03 as the output field separator and using the same character in IFS to make read split fields on \03.
Alternatively, you can use sed instead of awk (the \x03 escape in the replacement is a GNU sed feature):
IFS=$'\03' read -ra arr < <(sed 's/#\$/\x03/g' file)
Using a NUL byte as the separator (readarray -d requires Bash 4.4 or later):
readarray -d $'\0' arr < <(
awk -F'#\\$' -v OFS='\0' '{ORS=OFS; $1=$1} 1' file)
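The same separator trick can also be sketched without arrays, using the positional parameters so that it runs in a plain POSIX sh as well. This is a variant, not the method from the answer. The sample line is inferred from the array contents shown above, and the awk field separator is written as '#[$]', which matches the same literal #$ sequence.

```shell
line='hah a#$hehe#$hoho'   # assumed sample line
us=$(printf '\003')        # control character used as single-byte separator

# Rebuild the record with \003 between the fields.
converted=$(printf '%s\n' "$line" | awk -F'#[$]' -v OFS="$us" '{$1=$1}1')

# Split on \003 only; set -f keeps the unquoted expansion from globbing.
old_ifs=$IFS
set -f
IFS=$us
set -- $converted
IFS=$old_ifs
set +f

count=$#
first=$1
printf '%s fields, first is [%s]\n' "$count" "$first"   # 3 fields, first is [hah a]
```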
Read txt file to pandas dataframe with unique delimiter and end of line
As pointed out by matheubv, there appears to be no option to solve this with pd.read_csv. However, it can be worked around in a few lines of code. Open the file (sample.csv in the example) and normalize the delimiters with the string method .replace(). You can then build the DataFrame from the resulting string, stored in string_data, with a very basic list comprehension.
Hope this workaround helps.
import pandas as pd
from pathlib import Path
p = Path("Data/sample.csv")
with p.open() as f:
    string_data = f.readline().replace('#%#',';').replace('##@##','\n')
df = pd.DataFrame([x.split(';') for x in string_data.split('\n')])
print(df)
Output:
       0      1      2       3
0    cat    dog    rat     cow
1    red   blue  green  yellow
2  north  south   east    west
Import text file with uneven column number and complicated delimiter
To provide another example in addition to the one provided by @JD Long, you could use a regular expression plus a list comprehension. The pattern splits on runs of two or more spaces, so a single space stays inside a field:
import re, pandas as pd
string = """
apple pear banana peach orange grape
dog cat white horse
salmon
tiger lion eagle hawk monkey
"""
rx = re.compile(r'''[ ]{2,}''')
items = [(rx.split(line)) for line in string.split("\n") if line]
df = pd.DataFrame.from_records(items)
print(df)
... which yields:
0 1 2 3
0 apple pear banana peach orange grape
1 dog cat white horse None
2 salmon None None None
3 tiger lion eagle hawk monkey