Python read file determined by separator \r\n
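Open the file in binary mode ('rb') so that no newline translation takes place, then split on '\r\n' (in Python 3, binary mode returns bytes, so the separator has to be b'\r\n'):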
open('file.txt', 'rb').read().split('\r\n')
I found it a bit of a challenge to create a text file containing lone CR and lone LF line endings, but Notepad++ helped me.
With this content:
CRLF\r\nCR\rLF\nCRLF\r\n
and running print open('file.txt', 'rb').read().split('\r\n') (Python 2 syntax),
I got this output:
['CRLF', 'CR\rLF\nCRLF', '']
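In Python 3 the same idea needs one change. A file opened with 'rb' returns bytes, so the separator must be the bytes literal b'\r\n'. The sketch below uses a hypothetical helper name, split_crlf. It assumes an ASCII-compatible encoding such as UTF-8 or Latin-1, because it splits the raw bytes before decoding each piece.

```python
import os
import tempfile


def split_crlf(path, encoding="utf-8"):
    # Hypothetical helper. Binary mode disables newline translation, and
    # bytes.split needs a bytes separator, hence b"\r\n" rather than "\r\n".
    with open(path, "rb") as f:
        raw = f.read()
    # Lone CR and lone LF stay inside the pieces; only CRLF separates them.
    return [part.decode(encoding) for part in raw.split(b"\r\n")]


if __name__ == "__main__":
    # Recreate the sample content from above: CRLF, lone CR, lone LF, CRLF.
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"CRLF\r\nCR\rLF\nCRLF\r\n")
    print(split_crlf(path))  # ['CRLF', 'CR\rLF\nCRLF', '']
    os.remove(path)
```

Opening in text mode with newline='' also leaves line endings untranslated. So open(path, newline='').read().split('\r\n') gives the same list of str, and that variant works for any encoding.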
Python: Reading a file by using \n as the newline character. File also contains \r\n
I'm sure your answers are completely correct and technically advanced.
Sadly, the CSV file is not RFC 4180 compliant at all.
Therefore I'm going with the following solution and will fix my temporary "||" markers afterwards:
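# write with the same encoding, and newline='' so '\n' is not turned into '\r\n' on Windows
with open(outputfile_corrected, 'w', encoding="ISO-8859-15", newline='') as correctedfile_handle:
    # newline='' keeps '\r\n' untranslated, so it can be told apart from a bare '\n'
    with open(outputfile, encoding="ISO-8859-15", newline='') as csvfile:
        csvfile_content = csvfile.read()
        # replace the embedded '\r\n' with a temporary marker; bare '\n' stay record separators
        csvfile_content_new = csvfile_content.replace('\r\n', '||')
        correctedfile_handle.write(csvfile_content_new)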
(Someone suggested this in a comment, but that answer has since been deleted.)
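The "||" placeholder may exist only so the embedded '\r\n' survives a later split. If so, you can skip it and split on a '\n' that is not preceded by '\r', using a negative lookbehind. This is a different technique from the one above. The sketch assumes that every record ends with a bare '\n' and that '\r\n' only ever occurs inside fields. split_records is a hypothetical name.

```python
import re

# A lone LF (one not preceded by CR) ends a record; CRLF is treated as data.
RECORD_END = re.compile(r"(?<!\r)\n")


def split_records(text):
    records = RECORD_END.split(text)
    # A trailing record separator leaves one empty string at the end; drop it.
    if records and records[-1] == "":
        records.pop()
    return records


print(split_records("a;b\r\nc\nd;e\n"))  # ['a;b\r\nc', 'd;e']
```

The text has to be read with newline='' (as in the snippet above), for example split_records(open(outputfile, encoding="ISO-8859-15", newline='').read()). Otherwise Python's universal newlines mode turns every '\r\n' into '\n' before the split, and the two can no longer be told apart.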
python read file (or string) into dictionary by first separator only
You can read the file using plain Python:
dicti = {}
with open("file.txt", "r") as f:
    for x in f.read().splitlines():
        # split only at the first space: the key, then the rest of the line
        key, value = x.split(' ', maxsplit=1)
        dicti[key] = value
print(dicti)
and the output will be:
{'AGE': '32', 'JOB': 'clerk', 'NAME': 'Bob Young'}
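str.partition splits at the first occurrence of the separator only, so the line does not need to be split twice. The sketch below is minimal. The function name parse_first_sep and the sample lines are hypothetical: the question's file is not shown here, so the sample only assumes lines like "NAME Bob Young".

```python
import io


def parse_first_sep(lines, sep=" "):
    result = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines instead of raising
        # partition returns (before, sep, after) split at the FIRST sep only
        key, _, value = line.partition(sep)
        result[key] = value
    return result


sample = io.StringIO("NAME Bob Young\nAGE 32\nJOB clerk\n")
print(parse_first_sep(sample))
```

A real file object works the same way as the StringIO here. A line with no separator becomes a key with an empty-string value, and blank lines are skipped. The split-based version raises an exception in both cases.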
Reading a file with a specified delimiter for newline
You could use a generator:
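def myreadlines(f, newline):
    buf = ""
    while True:
        # emit every complete record already in the buffer
        while newline in buf:
            pos = buf.index(newline)
            yield buf[:pos]
            buf = buf[pos + len(newline):]
        # refill the buffer in fixed-size chunks
        chunk = f.read(4096)
        if not chunk:
            # end of file: whatever is left is the last record
            yield buf
            break
        buf += chunk

with open('file') as f:
    for line in myreadlines(f, "."):
        print(line)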
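The buffering is what makes the generator correct. A delimiter can straddle two chunks, but each chunk is appended to buf before the search, so the delimiter is still found. The variant below adds two things, both hypothetical: the name read_records and a chunk_size parameter. Exposing the chunk size lets you check this with a tiny chunk. The variant also does not yield a final empty record when the input ends with the delimiter, whereas myreadlines yields "" in that case.

```python
import io


def read_records(f, newline, chunk_size=4096):
    buf = ""
    while True:
        # Emit every complete record currently in the buffer.
        while newline in buf:
            record, buf = buf.split(newline, 1)
            yield record
        chunk = f.read(chunk_size)
        if not chunk:
            # End of input: emit the tail unless the data ended exactly on a delimiter.
            if buf:
                yield buf
            return
        buf += chunk


# A 2-character chunk size forces the "||" delimiter to straddle chunk boundaries.
data = io.StringIO("alpha||beta||gamma||")
print(list(read_records(data, "||", chunk_size=2)))  # ['alpha', 'beta', 'gamma']
```

If an empty final record is meaningful in your format, keep the original behaviour instead.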
Reading a text file in pandas with separator as linefeed (\n) and line terminator as two linefeeds (\n\n)
Try this:
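import io
import pandas as pd

with open(filename, 'r') as f:
    # values are separated by '\n' and rows by '\n\n': turn every '\n' into ','
    # and then each resulting ',,' back into a row break
    data = f.read().replace('\n', ',').replace(',,', '\n')

In [7]: pd.read_csv(io.StringIO(data), header=None)
Out[7]:
   0  1  2
0  2  8  4
1  3  1  9
2  6  5  7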
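The replace trick relies on the data containing no commas of its own. The sketch below avoids the round trip through CSV text. It assumes, as the output above suggests, that every value is an integer on its own line and that rows are separated by one blank line. parse_blocks is a hypothetical name, and the sample string is a guessed reconstruction of the input, not the asker's file.

```python
def parse_blocks(text):
    # Rows are separated by a blank line ("\n\n"), values within a row by "\n".
    # Stripping outer newlines avoids an empty trailing row.
    rows = []
    for block in text.strip("\n").split("\n\n"):
        rows.append([int(v) for v in block.split("\n")])
    return rows


text = "2\n8\n4\n\n3\n1\n9\n\n6\n5\n7\n"
print(parse_blocks(text))  # [[2, 8, 4], [3, 1, 9], [6, 5, 7]]
```

With pandas imported, pd.DataFrame(parse_blocks(text)) should then give the same 3x3 frame as read_csv above.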
Python readline with custom delimiter
Python 3 lets you specify what counts as a newline for a particular file, through the newline parameter of open(). It is seldom used because the default universal newlines mode is very tolerant:
When reading input from the stream, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller.
So here you should make explicit that only '\r\n' is an end of line:
with open("f.txt", mode='r', encoding='utf8', newline='\r\n') as fd:
    # use enumerate to show that the second line is read as a whole
    for i, line in enumerate(fd):
        print(i, line)
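The sketch below shows the difference between the default universal newlines mode and newline='\r\n'. It writes a small file with mixed line endings (hypothetical content) and reads it both ways. read_lines is a hypothetical helper.

```python
import os
import tempfile


def read_lines(path, newline=None):
    # newline=None: universal newlines (\n, \r and \r\n all end a line, translated to \n).
    # newline="\r\n": only \r\n ends a line, and it is returned untranslated.
    with open(path, encoding="utf8", newline=newline) as f:
        return list(f)


fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"first\r\nsecond\nstill second\r\nthird")

print(read_lines(path))          # ['first\n', 'second\n', 'still second\n', 'third']
print(read_lines(path, "\r\n"))  # ['first\r\n', 'second\nstill second\r\n', 'third']
os.remove(path)
```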
Read files line by line with \r, \n or \r\n as line separator
I suggest you first determine the line separator. I've assumed that you can do that by reading characters until you encounter "\n" or "\r" (or reach the end of the file, in which case we can regard "\n" as the line separator). If the character "\n" is found, I assume that to be the separator; if "\r" is found I attempt to read the next character. If I can do so and it is "\n", I return "\r\n" as the separator. If "\r" is the last character in the file or is followed by a character other than "\n", I return "\r" as the separator.
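def separator(fname)
  File.open(fname) do |f|              # block form closes the file when done
    return "\n" if f.eof?              # empty file: default to "\n"
    enum = f.each_char
    c = enum.next
    loop do                            # loop stops quietly on StopIteration (end of file)
      case c[/\r|\n/]
      when "\n" then break
      when "\r"
        c << "\n" if enum.peek == "\n" # CR followed by LF => "\r\n"
        break
      end
      c = enum.next
    end
    c[0][/\r|\n/] ? c : "\n"           # no line break found => "\n"
  end
end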
Then process the file line by line:
def process(fname)
  sep = separator(fname)
  IO.foreach(fname, sep) { |line| puts line }
end
I haven't converted "\r" or "\r\n" to "\n", but of course you could do that easily: open a file for writing and, in process, read each line and write it to the output file with the default line separator.
Let's try it (for clarity I show the value returned by separator):
fname = "temp"
IO.write(fname, "slash n line 1\nslash n line 2\n")
#=> 30
separator(fname)
#=> "\n"
process(fname)
# slash n line 1
# slash n line 2
IO.write(fname, "slash r line 1\rslash r line 2\r")
#=> 30
separator(fname)
#=> "\r"
process(fname)
# slash r line 1
# slash r line 2
IO.write(fname, "slash r slash n line 1\r\nslash r slash n line 2\r\n")
#=> 48
separator(fname)
#=> "\r\n"
process(fname)
# slash r slash n line 1
# slash r slash n line 2
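The same detect-then-split approach can be sketched in Python as follows. detect_separator and split_lines are hypothetical names. Python's default universal newlines mode already accepts "\n", "\r" and "\r\n" and translates them all to "\n", which is the conversion mentioned above. Detection therefore mostly matters when the other characters must survive as data, or when you need to know which convention the file uses.

```python
import os
import tempfile


def detect_separator(path):
    # Binary mode: no newline translation, so CR and LF are seen exactly as stored.
    with open(path, "rb") as f:
        while True:
            ch = f.read(1)
            if not ch or ch == b"\n":
                # LF found, or no line break at all: fall back to "\n".
                return "\n"
            if ch == b"\r":
                # CR then LF is CRLF; CR at EOF or before any other byte is a lone CR.
                return "\r\n" if f.read(1) == b"\n" else "\r"


def split_lines(path):
    sep = detect_separator(path)
    # newline=sep: the text layer ends lines only at sep and leaves it in place,
    # so it is sliced off here.
    with open(path, newline=sep) as f:
        return [line[:-len(sep)] if line.endswith(sep) else line for line in f]


if __name__ == "__main__":
    for content in (b"slash n line 1\nslash n line 2\n",
                    b"slash r line 1\rslash r line 2\r",
                    b"slash r slash n line 1\r\nslash r slash n line 2\r\n"):
        fd, path = tempfile.mkstemp()
        with os.fdopen(fd, "wb") as f:
            f.write(content)
        print(repr(detect_separator(path)), split_lines(path))
        os.remove(path)
```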
How to read a text file where some of the contents have line breaks?
Given your example input, you can use a regex with a forward lookahead:
import re
from pprint import pprint

pat = re.compile(r'^(\d\d\/\d\d\/\d\d\d\d.*?)(?=^^\d\d\/\d\d\/\d\d\d\d|\Z)', re.S | re.M)
with open(fn) as f:  # fn: path to the input file
    pprint([m.group(1) for m in pat.finditer(f.read())])
Prints:
['06/01/2016, 10:40 pm - abcde\n',
'07/01/2016, 12:04 pm - abcde\n',
'07/01/2016, 12:05 pm - abcde\n',
'07/01/2016, 12:05 pm - abcde\n',
'07/01/2016, 6:14 pm - abcde\n\nfghe\n',
'07/01/2016, 6:20 pm - abcde\n',
'07/01/2016, 7:58 pm - abcde\n\nfghe\n\nijkl\n',
'07/01/2016, 7:58 pm - abcde\n']
With the Dropbox example, it prints:
['11/11/2015, 3:16 pm - IK: 12\n',
'13/11/2015, 12:10 pm - IK: Hi.\n\nBut this is not about me.\n\nA donation, however small, will go a long way.\n\nThank you.\n',
'13/11/2015, 12:11 pm - IK: Boo\n',
'15/11/2015, 8:36 pm - IR: Root\n',
'15/11/2015, 8:36 pm - IR: LaTeX?\n',
'15/11/2015, 8:43 pm - IK: Ws\n']
If you want to remove the \n characters from what is captured, use m.group(1).strip().replace('\n', '') instead of m.group(1) in the list comprehension above.
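Note that replace('\n', '') glues the continuation lines together, so "abcde" and "fghe" become "abcdefghe". If you would rather keep a space between them, you can collapse each run of line breaks into one space instead. tidy is a hypothetical helper, and the sample text is taken from the example output above.

```python
import re

pat = re.compile(r'^(\d\d\/\d\d\/\d\d\d\d.*?)(?=^^\d\d\/\d\d\/\d\d\d\d|\Z)', re.S | re.M)


def tidy(record):
    # Collapse each run of line breaks (plus surrounding whitespace) into one space.
    return re.sub(r"\s*\n\s*", " ", record.strip())


text = "07/01/2016, 6:14 pm - abcde\n\nfghe\n07/01/2016, 6:20 pm - abcde\n"
print([tidy(m.group(1)) for m in pat.finditer(text)])
# ['07/01/2016, 6:14 pm - abcde fghe', '07/01/2016, 6:20 pm - abcde']
```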
Explanation of regex:
^(\d\d\/\d\d\/\d\d\d\d.*?)(?=^^\d\d\/\d\d\/\d\d\d\d|\Z)

^                          start of line
\d\d\/\d\d\/\d\d\d\d       pattern for a date
.*?                        capture the rest... (non-greedy; re.S lets . match newlines)
(?=                        until (look ahead)
^^\d\d\/\d\d\/\d\d\d\d     another date at the start of a line
|                          or
\Z                         end of string
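The regex needs the whole file in memory (f.read()). A streaming alternative groups lines as they are read. This is a different technique, sketched under the same assumption that every record starts with a dd/mm/yyyy date at the beginning of a line. group_records is a hypothetical name.

```python
import io
import re

# Assumption: a new record starts with a dd/mm/yyyy date at the start of a line.
DATE_START = re.compile(r"\d\d/\d\d/\d\d\d\d")


def group_records(lines):
    # Yield multi-line records one at a time, without reading the whole file.
    record = []
    for line in lines:
        if DATE_START.match(line) and record:
            yield "".join(record)
            record = []
        record.append(line)
    if record:
        yield "".join(record)


sample = io.StringIO(
    "07/01/2016, 6:14 pm - abcde\n"
    "\n"
    "fghe\n"
    "07/01/2016, 6:20 pm - abcde\n"
)
print(list(group_records(sample)))
# ['07/01/2016, 6:14 pm - abcde\n\nfghe\n', '07/01/2016, 6:20 pm - abcde\n']
```

One behavioural difference: lines that appear before the first date come out as a record of their own, whereas the regex skips them.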