Python Read File Determined by Separator \r\n

Python read file determined by separator \r\n

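Open the file in binary mode ('rb') so no newline translation happens, then split on the bytes separator b'\r\n':

with open('file.txt', 'rb') as f:
    lines = f.read().split(b'\r\n')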

I found it a bit of a challenge to create a text file with lone CR and lone LF line endings, but Notepad++ helped me.

With this content:

CRLF\r\nCR\rLF\nCRLF\r\n

using print open('file.txt', 'rb').read().split('\r\n') (Python 2 syntax; in Python 3 the separator must be b'\r\n' and the items come back as bytes)

I got this output:

['CRLF', 'CR\rLF\nCRLF', '']
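In Python 3 the same split can also be done in text mode: passing newline='' to open() disables universal-newline translation, so lone '\r' and '\n' reach your code untouched and you can split on the str '\r\n'. A minimal sketch; the helper name split_on_crlf, the file name crlf_demo.txt and the UTF-8 encoding are assumptions for illustration:

```python
import os
import tempfile

def split_on_crlf(path, encoding="utf-8"):
    """Return the file's contents split only on CRLF."""
    # newline="" turns off universal newlines, so lone CR and LF survive
    with open(path, encoding=encoding, newline="") as f:
        return f.read().split("\r\n")

# Recreate the sample file from above: CRLF, lone CR, lone LF, CRLF
path = os.path.join(tempfile.mkdtemp(), "crlf_demo.txt")
with open(path, "wb") as f:
    f.write(b"CRLF\r\nCR\rLF\nCRLF\r\n")

print(split_on_crlf(path))
# ['CRLF', 'CR\rLF\nCRLF', '']
```

Unlike the 'rb' version, the pieces are str rather than bytes, so no decoding step is needed afterwards.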

Python: Reading a file by using \n as the newline character. File also contains \r\n

I'm sure your answers are correct and technically sound.
Sadly, the CSV file is not RFC 4180 compliant at all.

Therefore I'm going with the following solution and will correct the temporary "||" characters afterwards:

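with open(outputfile_corrected, 'w') as correctedfile_handle:
    # newline='' keeps '\r\n' intact so it can be told apart from a bare '\n'
    with open(outputfile, encoding="ISO-8859-15", newline='') as csvfile:
        csvfile_content = csvfile.read()
        # Replace the embedded CRLF breaks with the temporary marker "||"
        csvfile_content_new = csvfile_content.replace('\r\n', '||')
        correctedfile_handle.write(csvfile_content_new)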

(Someone suggested this in a comment, but that answer has since been deleted.)
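If the goal is only to split records on a lone '\n' while leaving embedded '\r\n' sequences inside their fields, a regular expression with a negative lookbehind can do it without the temporary "||" markers. This is a sketch of an alternative technique, not the poster's solution; split_records and the sample text are made up for illustration, and a real file should still be opened with newline='' so Python does not translate '\r\n' first:

```python
import re

# Match '\n' only when it is NOT preceded by '\r',
# so embedded '\r\n' sequences stay inside their record.
LONE_LF = re.compile(r"(?<!\r)\n")

def split_records(text):
    return LONE_LF.split(text)

sample = "id;comment\n1;first part\r\nsecond part\n2;single line\n"
for record in split_records(sample):
    print(repr(record))
```

A trailing '\n' at the end of the text produces a final empty string, which you may want to drop.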

Python read file (or string) into dictionary by first separator only

You can read the file using plain Python:

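dicti = {}
with open("file.txt", "r") as f:
    for x in f.read().splitlines():
        if not x.strip():
            continue  # skip blank lines, which have no key/value pair
        # Split on the first space only; the rest of the line is the value
        key, value = x.split(' ', maxsplit=1)
        dicti[key] = value

print(dicti)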

and the output will be:

{'AGE': '32', 'JOB': 'clerk', 'NAME': 'Bob Young'}
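A slightly more defensive variant uses str.partition, which also splits at the first separator only but never raises an error when a line has no separator (the value is then an empty string). A sketch, assuming the same "KEY value" layout; parse_first_separator is a hypothetical name and the sample text is made up to mirror the output above:

```python
def parse_first_separator(text, sep=" "):
    """Build a dict from lines of the form 'KEY<sep>rest of line'."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        key, _, value = line.partition(sep)  # split at the first sep only
        result[key] = value
    return result

sample = "NAME Bob Young\nAGE 32\nJOB clerk\n"
print(parse_first_separator(sample))
# {'NAME': 'Bob Young', 'AGE': '32', 'JOB': 'clerk'}
```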

Reading a file with a specified delimiter for newline

You could use a generator:

def myreadlines(f, newline):
    buf = ""
    while True:
        # Emit every complete record already in the buffer
        while newline in buf:
            pos = buf.index(newline)
            yield buf[:pos]
            buf = buf[pos + len(newline):]
        chunk = f.read(4096)
        if not chunk:
            # End of file: emit whatever is left (may be empty)
            yield buf
            break
        buf += chunk

with open('file') as f:
    for line in myreadlines(f, "."):
        print(line)
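The buffer is what lets a multi-character delimiter that is split across two read() calls still be recognised. The sketch below repeats the generator with a hypothetical chunk_size parameter (the original hard-codes 4096) and feeds it an in-memory io.StringIO with a tiny chunk size to exercise that case:

```python
import io

def myreadlines(f, newline, chunk_size=4096):
    buf = ""
    while True:
        # Emit every complete record already in the buffer
        while newline in buf:
            pos = buf.index(newline)
            yield buf[:pos]
            buf = buf[pos + len(newline):]
        chunk = f.read(chunk_size)
        if not chunk:
            yield buf  # whatever follows the last delimiter (may be '')
            break
        buf += chunk

# With chunk_size=3 the delimiter '||' straddles read() boundaries
data = io.StringIO("alpha||beta||gamma")
print(list(myreadlines(data, "||", chunk_size=3)))
# ['alpha', 'beta', 'gamma']
```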

Reading a text file in pandas with separator as linefeed (\n) and line terminator as two linefeeds (\n\n)

Try this:

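import io
import pandas as pd

with open(filename, 'r') as f:
    # Single linefeeds separate fields, double linefeeds separate records;
    # rstrip drops trailing newlines so the last record gets no extra empty field
    data = f.read().rstrip('\n').replace('\n', ',').replace(',,', '\n')

In [7]: pd.read_csv(io.StringIO(data), header=None)
Out[7]:
   0  1  2
0  2  8  4
1  3  1  9
2  6  5  7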
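The comma round-trip breaks if a field itself contains a comma. A sketch of a variant that splits on the two delimiters directly and only then hands the rows to pandas; parse_blocks is a hypothetical name and the sample text is assumed to follow the layout of the question (one value per line, a blank line between records):

```python
def parse_blocks(text):
    """Split records on blank lines and fields on single newlines."""
    blocks = text.strip("\n").split("\n\n")
    return [block.split("\n") for block in blocks]

sample = "2\n8\n4\n\n3\n1\n9\n\n6\n5\n7\n"
rows = parse_blocks(sample)
print(rows)
# [['2', '8', '4'], ['3', '1', '9'], ['6', '5', '7']]
```

pd.DataFrame(rows) would then build the frame; the values stay strings, so something like .astype(int) is needed if they are all integers.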

Python readline with custom delimiter

Python 3 lets you define what counts as a newline for a particular file through the newline argument of open(). This is seldom needed, because the default universal newlines mode is very tolerant:

When reading input from the stream, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller.

So here you should make explicit that only '\r\n' ends a line:

with open("f.txt", mode='r', encoding='utf8', newline='\r\n') as f:
    # use enumerate to show that the second line is read as a whole
    for i, line in enumerate(f):
        print(i, line)
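To see the difference between the default and an explicit newline='\r\n', here is a small experiment with a throwaway file (the name newline_demo.txt and the helper read_lines are arbitrary) that contains a lone '\r' inside a CRLF-terminated line:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "newline_demo.txt")
with open(path, "wb") as f:
    f.write(b"a\rb\r\nc\r\n")  # lone CR inside the first CRLF-terminated line

def read_lines(path, newline):
    with open(path, encoding="utf8", newline=newline) as f:
        return f.readlines()

# Default (universal newlines): '\r' also ends a line and is translated to '\n'
print(read_lines(path, None))    # ['a\n', 'b\n', 'c\n']
# Only '\r\n' ends a line, and the terminator is returned untranslated
print(read_lines(path, "\r\n"))  # ['a\rb\r\n', 'c\r\n']
```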

Read files line by line with \r, \n or \r\n as line separator

I suggest you first determine the line separator. I've assumed that you can do that by reading characters until you encounter "\n" or "\r" (or reach the end of the file, in which case we can regard "\n" as the line separator). If the character "\n" is found, I assume that to be the separator; if "\r" is found I attempt to read the next character. If I can do so and it is "\n", I return "\r\n" as the separator. If "\r" is the last character in the file or is followed by a character other than "\n", I return "\r" as the separator.

def separator(fname)
  File.open(fname) do |f|  # block form closes the file afterwards
    enum = f.each_char
    c = enum.next
    loop do                 # loop exits quietly on StopIteration (end of file)
      case c[/\r|\n/]
      when "\n" then break
      when "\r"
        c << "\n" if enum.peek == "\n"
        break
      end
      c = enum.next
    end
    c[0][/\r|\n/] ? c : "\n"
  end
end

Then process the file line by line:

def process(fname)
  sep = separator(fname)
  IO.foreach(fname, sep) { |line| puts line }
end

I haven't converted "\r" or "\r\n" to "\n", but of course you could do that easily. Just open a file for writing and, in process, write each line to the output file with the default line separator.

Let's try it (for clarity I show the value returned by separator):

fname = "temp"

IO.write(fname, "slash n line 1\nslash n line 2\n")
#=> 30
separator(fname)
#=> "\n"
process(fname)
# slash n line 1
# slash n line 2

IO.write(fname, "slash r line 1\rslash r line 2\r")
#=> 30
separator(fname)
#=> "\r"
process(fname)
# slash r line 1
# slash r line 2

IO.write(fname, "slash r slash n line 1\r\nslash r slash n line 2\r\n")
#=> 48
separator(fname)
#=> "\r\n"
process(fname)
# slash r slash n line 1
# slash r slash n line 2
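For readers who want the same idea in Python rather than Ruby, here is a rough port. It is a sketch, not the answerer's code: detect_separator and iter_lines are hypothetical names, the file is opened with newline='' so Python does not normalise the endings first, and the whole file is read into memory for simplicity:

```python
import os
import tempfile

def detect_separator(path, encoding="utf-8"):
    """Return CRLF, CR or LF depending on the first line break in the file."""
    # newline="" stops Python from translating '\r' and '\r\n' to '\n'
    with open(path, encoding=encoding, newline="") as f:
        while True:
            c = f.read(1)
            if c == "" or c == "\n":
                return "\n"  # bare LF, or no line break at all
            if c == "\r":
                # CR followed by LF is CRLF; CR at EOF or before anything else is CR
                return "\r\n" if f.read(1) == "\n" else "\r"

def iter_lines(path, encoding="utf-8"):
    """Yield the file's lines, without separators, split on the detected separator."""
    sep = detect_separator(path, encoding)
    with open(path, encoding=encoding, newline="") as f:
        parts = f.read().split(sep)
    if parts[-1] == "":
        parts.pop()  # drop the empty piece after a trailing separator
    yield from parts

tmp = os.path.join(tempfile.mkdtemp(), "temp")
for text in ["slash n line 1\nslash n line 2\n",
             "slash r line 1\rslash r line 2\r",
             "slash r slash n line 1\r\nslash r slash n line 2\r\n"]:
    with open(tmp, "w", encoding="utf-8", newline="") as f:
        f.write(text)
    print(repr(detect_separator(tmp)), list(iter_lines(tmp)))
```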

How to read a text file where some of the contents have line breaks?

Given your example input, you can use a regex with a forward lookahead:

import re
from pprint import pprint

pat = re.compile(r'^(\d\d\/\d\d\/\d\d\d\d.*?)(?=^\d\d\/\d\d\/\d\d\d\d|\Z)', re.S | re.M)

# fn is the path to the exported chat log
with open(fn) as f:
    pprint([m.group(1) for m in pat.finditer(f.read())])

Prints:

['06/01/2016, 10:40 pm - abcde\n',
'07/01/2016, 12:04 pm - abcde\n',
'07/01/2016, 12:05 pm - abcde\n',
'07/01/2016, 12:05 pm - abcde\n',
'07/01/2016, 6:14 pm - abcde\n\nfghe\n',
'07/01/2016, 6:20 pm - abcde\n',
'07/01/2016, 7:58 pm - abcde\n\nfghe\n\nijkl\n',
'07/01/2016, 7:58 pm - abcde\n']

With the Dropbox example, prints:

['11/11/2015, 3:16 pm - IK: 12\n',
'13/11/2015, 12:10 pm - IK: Hi.\n\nBut this is not about me.\n\nA donation, however small, will go a long way.\n\nThank you.\n',
'13/11/2015, 12:11 pm - IK: Boo\n',
'15/11/2015, 8:36 pm - IR: Root\n',
'15/11/2015, 8:36 pm - IR: LaTeX?\n',
'15/11/2015, 8:43 pm - IK: Ws\n']

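If you want to remove the \n characters from what is captured, use m.group(1).strip().replace('\n', '') instead of m.group(1) in the list comprehension above. Note that this joins the continuation lines directly, without a space.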


Explanation of the regex:

^(\d\d\/\d\d\/\d\d\d\d.*?)(?=^\d\d\/\d\d\/\d\d\d\d|\Z)

^                           start of line
\d\d\/\d\d\/\d\d\d\d        pattern for a date
.*?                         capture the rest (lazily; re.S lets . match newlines)...
(?=                         until (look ahead)
^\d\d\/\d\d\/\d\d\d\d       another date at the start of a line
|                           or
\Z                          end of string
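To try the pattern without the original chat export, here is a sketch that runs it on a short in-memory string (the sample messages are made up) and also shows the variant that strips the embedded newlines:

```python
import re

pat = re.compile(r'^(\d\d\/\d\d\/\d\d\d\d.*?)(?=^\d\d\/\d\d\/\d\d\d\d|\Z)', re.S | re.M)

chat = (
    "06/01/2016, 10:40 pm - abcde\n"
    "07/01/2016, 6:14 pm - abcde\n"
    "\n"
    "fghe\n"
    "07/01/2016, 6:20 pm - abcde\n"
)

# Each match runs from one date to just before the next date (or the end)
messages = [m.group(1) for m in pat.finditer(chat)]
print(messages)

# Same matches with the embedded newlines removed
flat = [m.group(1).strip().replace('\n', '') for m in pat.finditer(chat)]
print(flat)
```

The second message keeps its continuation line ('fghe'), which is the whole point of the lookahead: a line only starts a new message if it begins with a date.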

