Turn String into a List and Remove Carriage Returns (Python)

How can I remove carriage return from a text file with Python?

Technically, there is an answer!

with open(filetoread, "rb") as inf:
    with open(filetowrite, "wb") as fixed:
        for line in inf:
            # Binary mode leaves b"\r" untouched, so it can be removed explicitly
            fixed.write(line.replace(b"\r", b""))

The b in open(filetoread, "rb") opens the file in binary mode, which turns off newline translation, so the carriage returns stay visible and can be removed. This answer actually came from Stack Overflow user Kenneth Reitz, off the site.
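To see what the b changes, here is a minimal sketch that reads the same file in binary mode and in text mode. The sample data and the temporary file name are made up for illustration.

```python
import os
import tempfile

# Hypothetical sample data: two Windows-style lines ending in CR LF
sample = b"first\r\nsecond\r\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "wb") as f:
        f.write(sample)

    # Binary mode: bytes come back exactly as stored, \r included
    with open(path, "rb") as f:
        binary_lines = f.readlines()

    # Default text mode in Python 3: \r\n is translated to \n while reading
    with open(path, "r") as f:
        text_lines = f.readlines()

print(binary_lines)  # [b'first\r\n', b'second\r\n']
print(text_lines)    # ['first\n', 'second\n']
```

So in Python 3 the default text mode already hides the carriage returns, while binary mode is the one to use when you want to inspect or remove them yourself.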

Thanks everyone!

Replace carriage returns in list python

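You have to count the fields, so that each output line ends up with exactly 5:

import re

with open(filetoread, "r") as inf:
    with open(filetowrite, "w") as fixed:
        # A record is four pipe-terminated fields plus a fifth field ending in a newline
        for l in re.finditer(r'(?:.*?\|){4}(?:.*?)\n', inf.read(), re.DOTALL):
            # Drop line breaks embedded in the record, then end it with a single newline
            fixed.write(l.group(0).replace('\n', '') + '\n')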
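Here is the same idea as a small sketch that works on an in-memory string, so you can try it without any files. The function name rejoin_records and the sample data are hypothetical. It assumes line breaks only show up inside the first four fields, never inside the last one.

```python
import re

def rejoin_records(text, fields=5):
    """Rebuild pipe-delimited records whose fields contain stray line breaks."""
    # (fields - 1) pipe-terminated fields, then a last field ending in a newline.
    # re.DOTALL lets '.' match the line breaks embedded in a field.
    pattern = re.compile(r'(?:.*?\|){%d}.*?\n' % (fields - 1), re.DOTALL)
    records = []
    for match in pattern.finditer(text):
        # Remove every \r and \n inside the record, then add one clean newline
        record = match.group(0).replace('\r', '').replace('\n', '')
        records.append(record + '\n')
    return ''.join(records)

# The second record has a line break in the middle of its third field
broken = "a|b|c|d|e\n1|2|th\nree|4|5\n"
print(rejoin_records(broken))
```

Counting the pipes is what tells the regex where a record really ends, which is why a break inside the last field cannot be detected this way.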

Remove all line breaks from a long string of text

How do you enter line breaks with raw_input (input in Python 3)? Pressing Enter just ends the input. But once you have a string containing characters you want to get rid of, just replace them.

>>> mystr = raw_input('please enter string: ')
please enter string: hello world, how do i enter line breaks?
>>> # pressing enter didn't work...
...
>>> mystr
'hello world, how do i enter line breaks?'
>>> mystr.replace(' ', '')
'helloworld,howdoienterlinebreaks?'
>>>

In the example above, I replaced all spaces. The string '\n' represents a newline, and '\r' represents a carriage return (if you're on Windows, you might be getting these, and a second replace will handle them for you!).

Basically:

# you probably want to use a space ' ' to replace `\n`
mystring = mystring.replace('\n', ' ').replace('\r', '')
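A quick sketch of that one-liner on a made-up string with mixed line endings, next to an alternative based on str.splitlines() that gives the same result here:

```python
# Hypothetical input mixing Windows (\r\n) and UNIX (\n) line endings
mystring = "line one\r\nline two\nline three"

# Two replaces: newlines become spaces, carriage returns disappear
flattened = mystring.replace('\n', ' ').replace('\r', '')

# Alternative: splitlines() recognises \r\n, \n and \r as line boundaries
joined = ' '.join(mystring.splitlines())

print(flattened)  # line one line two line three
print(joined)     # line one line two line three
```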

Note also that it is a bad idea to call your variable string, as this shadows the standard-library module string. Another name I'd avoid, but would love to use sometimes, is file, for a similar reason: in Python 2 it shadows the built-in file type.

How to remove \n and \r from a string

A simple solution is to strip trailing whitespace:

with open('gash.txt', 'r') as var:
    for line in var:
        line = line.rstrip()
        print(line)

The advantage of rstrip() over using a [:-2] slice is that it is safe for UNIX-style files as well, where lines end in a single \n rather than \r\n.
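A small sketch of the difference, using two made-up lines. Note that a bare rstrip() removes all trailing whitespace, so pass '\r\n' if trailing spaces matter to you.

```python
windows_line = 'hello\r\n'
unix_line = 'hello\n'

# The slice assumes a two-character line ending and eats a letter on UNIX
print(repr(windows_line[:-2]))  # 'hello'
print(repr(unix_line[:-2]))     # 'hell'

# rstrip() handles both endings
print(repr(windows_line.rstrip()))  # 'hello'
print(repr(unix_line.rstrip()))     # 'hello'

# rstrip('\r\n') strips only line-ending characters and keeps trailing spaces
padded = 'hello  \r\n'
print(repr(padded.rstrip('\r\n')))  # 'hello  '
```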

However, if you only want to get rid of \r and they might not be at the end-of-line, then str.replace() is your friend:

line = line.replace('\r', '')

If you have a bytes object (that's the leading b'), then you can convert it to a native Python 3 string using:

line = line.decode()
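For example, assuming the bytes are UTF-8 encoded (which is also what decode() uses when called with no arguments), a sketch with a made-up value:

```python
# Hypothetical bytes value, e.g. read from a file opened in binary mode
raw = b'caf\xc3\xa9\r\n'

# decode() turns bytes into str; the line ending is still there afterwards
line = raw.decode('utf-8')
print(repr(line))  # 'café\r\n'

# Now the ordinary str methods apply
cleaned = line.replace('\r', '').rstrip('\n')
print(repr(cleaned))  # 'café'
```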

Replace Carriage Return (CR) and Carriage Return and Line Feed (CRLF) python

As the error message says, supply a bytes object.

line = line.replace(b'\r\r\n', b'\r\n')

To get the desired output:

line = line.replace(b'\r\r\n', b'')
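To make the error concrete, here is a sketch with a made-up bytes value: passing str arguments to bytes.replace() raises a TypeError, while bytes arguments work.

```python
# Hypothetical line read from a file opened in binary mode
line = b'some text\r\r\n'

# Mixing bytes with str arguments fails
try:
    line.replace('\r\r\n', '\r\n')
    raised = False
except TypeError as exc:
    raised = True
    print('TypeError:', exc)

# Bytes arguments (note the b prefix) work as expected
fixed = line.replace(b'\r\r\n', b'\r\n')
print(fixed)  # b'some text\r\n'
```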

Replace all newline characters using python

I don't have access to your PDF file, so I processed one on my system. I also don't know if you need to remove all newlines or just double newlines. The code below removes double newlines, which makes the output more readable.

Please let me know if this works for your current needs.

from tika import parser

filename = 'myfile.pdf'

# Parse the PDF
parsedPDF = parser.from_file(filename)

# Extract the text content from the parsed PDF
pdf = parsedPDF["content"]

# Convert double newlines into single newlines
pdf = pdf.replace('\n\n', '\n')

#####################################
# Do something with the PDF
#####################################
print(pdf)
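One thing to watch: a single replace('\n\n', '\n') pass only halves longer runs of newlines. If the extracted text has three or more newlines in a row, a regular expression may be a better fit. A sketch on a made-up string (no PDF or tika needed):

```python
import re

# Hypothetical extracted text with a run of four newlines
text = "Title\n\n\n\nFirst paragraph\n\nSecond paragraph\n"

# One replace pass turns four newlines into two, not one
single_pass = text.replace('\n\n', '\n')

# \n{2,} matches any run of two or more newlines and collapses it to one
collapsed = re.sub(r'\n{2,}', '\n', text)

print(repr(single_pass))  # 'Title\n\nFirst paragraph\nSecond paragraph\n'
print(repr(collapsed))    # 'Title\nFirst paragraph\nSecond paragraph\n'
```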

