How to Remove Carriage Return in a Dataframe

Remove carriage return and newline feeds within a list of dictionaries - Python

Finally found a working solution to remove carriage returns and line feeds in a list of dictionaries.

First, use json.dumps, which takes a dictionary as input and returns a string, so that you can use .replace, which only works on strings.

Once the line feeds and carriage returns have been removed from the string, it can be converted back to a dictionary using json.loads, which takes a string as input and returns a dictionary as output.

import json
from pandas import json_normalize  # top-level in pandas >= 1.0

docs2 = json.dumps(docs)
# In the dumped string the escapes are literal backslash sequences, so raw
# strings like r"\r\n" match them. Replace r"\r\n" first so the single
# replacements afterwards don't have to clean up leftovers.
docs2 = docs2.replace(r"\r\n", '').replace(r"\n", '').replace(r"\r", '')
docs2 = json.loads(docs2)
docs2 = json_normalize(docs2)
print(docs2)

Why can't I replace a newline in my pandas dataframe?

Try this:
df['Title'] = df['Title'].str.replace("\n"," ")

This will replace every line break with a simple space, in every row.

If you want for all columns:
df = df.replace(r'\n',' ', regex=True)
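A minimal, self-contained sketch of both approaches; the frame and column names are made up for illustration:

```python
import pandas as pd

# Hypothetical frame with embedded newlines in two columns.
df = pd.DataFrame({
    "Title": ["First\nline", "clean"],
    "Body": ["a\nb", "c"],
})

# Column-level: Series.str.replace works on one column at a time.
df["Title"] = df["Title"].str.replace("\n", " ")

# Frame-level: DataFrame.replace with regex=True covers every column.
df = df.replace(r"\n", " ", regex=True)

print(df)
```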

How to remove newline in pandas dataframe columns?

From what I've learnt, the third parameter of .replace() is a count: the maximum number of times to replace the old substring with the new one. Since you don't know how many newlines each value contains, just drop the third parameter so every occurrence is replaced.

new_f = f[keep_col].replace('\\n', ' ', regex=True)

This should help
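To see the count behaviour on a plain Python string:

```python
s = "a\nb\nc"

# With a count, only that many occurrences are replaced (left to right).
print(s.replace("\n", " ", 1))  # "a b\nc"

# Without a count, every occurrence is replaced.
print(s.replace("\n", " "))     # "a b c"
```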

Removing Carriage Returns from Csv String

Rather than stripping the carriage returns from your CSV file, ensure that those fields that contain them are quoted. One way is to just quote all fields:

import csv
import pandas as pd

df.to_csv(sep=',', encoding='utf-8', index=False, header=False, quoting=csv.QUOTE_ALL)

Alternatively you can use quoting=csv.QUOTE_NONNUMERIC to quote only those fields likely to contain \r.

One other way is to set the line terminator to \r\n (or just \r), which will indirectly cause any field that contains \r to be quoted. This might be preferred because only those individual "cells" that require it are quoted:

df.to_csv(sep=',', encoding='utf-8', index=False, header=False, lineterminator='\r\n')

(The keyword was spelled line_terminator before pandas 1.5.)
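As a sketch of why quoting matters, here is a round trip with a made-up frame containing an embedded carriage return (the column and file-less I/O setup are illustrative):

```python
import csv
import io

import pandas as pd

# Hypothetical frame: one cell contains an embedded carriage return.
df = pd.DataFrame({"name": ["alice", "bob\rcarol"], "score": [1, 2]})

# QUOTE_ALL wraps every field in quotes, so the embedded \r survives
# instead of being read as a record separator.
text = df.to_csv(index=False, quoting=csv.QUOTE_ALL)

# Reading it back recovers the original cell intact.
df2 = pd.read_csv(io.StringIO(text))
print(repr(df2.loc[1, "name"]))
```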

Ignore carriage returns (U+000D) with read_csv in python pandas

Pandas supports multiline CSV files if the file is properly escaped and quoted. If you cannot read a CSV file in Python using pandas or the csv module, and MS Excel cannot open it either, then it's probably a non-compliant "CSV" file.

I recommend manually editing a small sample of the CSV file until Excel can open it, then recreating those steps programmatically in Python to normalize the large file.

Use this code to create a sample CSV file by copying the first ~100 lines into a new file.

with open('bigfile.csv', "r") as csvin, open('test.csv', "w") as csvout:
    line = csvin.readline()
    count = 0
    while line and count < 100:
        csvout.write(line)
        count += 1
        line = csvin.readline()

Now you have a small test file to work with. If the original CSV file has millions of rows and "bad" rows are found much later in the file then you need to add some logic to find the "bad" lines.
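A minimal sketch of such logic, assuming the simplest symptom: a stray, unquoted line break splits one record into two, so the broken records have the wrong field count (the sample data here is made up):

```python
import csv
import io

# Hypothetical sample: the third data line is the tail of a record that
# was split by an unquoted line break, so it parses as one field.
sample = "id,name\n1,alice\n2,bob\ncarol\n"

reader = csv.reader(io.StringIO(sample))
expected = len(next(reader))  # the header defines the expected field count

bad = []
for recno, row in enumerate(reader, start=2):
    if len(row) != expected:
        bad.append((recno, row))

print(bad)
```

The same loop works on the real file by swapping io.StringIO(sample) for open('bigfile.csv', newline='').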


