Remove carriage return and newline feeds within a list of dictionaries - Python
Finally found a working solution to remove carriage returns and newline feeds in a list of dictionaries.
First, use json.dumps, which serializes the list of dictionaries to a JSON string so that you can call .replace on it, as .replace only works with strings. In the JSON string, line breaks appear as the two-character escape sequences \n and \r, which is why the patterns below are raw strings.
Once the newlines and carriage returns have been removed from the string, convert it back with json.loads, which takes a string as input and returns the original structure (here, a list of dictionaries).
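import json
from pandas import json_normalize

# Serialize so that line breaks become the literal escape sequences \n and \r.
docs2 = json.dumps(docs)
# Remove the escaped line breaks (\r\n, \n and \r).
docs2 = docs2.replace(r"\r\n", '').replace(r"\n", '').replace(r"\r", '')
docs2 = json.loads(docs2)
docs2 = json_normalize(docs2)
print(docs2)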
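One caveat with the string round trip: r"\n" also matches inside an escaped backslash followed by n (a value such as C:\new is serialized as C:\\new), so a plain replace can alter or break such values. As an alternative sketch, not the approach from the answer above, the structure can be walked directly. The function name strip_line_breaks and the sample data are hypothetical, introduced only for illustration.

```python
def strip_line_breaks(value, replacement=""):
    """Recursively remove CR/LF characters from every string value in a nested structure."""
    if isinstance(value, str):
        return (
            value.replace("\r\n", replacement)
            .replace("\n", replacement)
            .replace("\r", replacement)
        )
    if isinstance(value, dict):
        # Only values are cleaned; dictionary keys are left untouched.
        return {key: strip_line_breaks(item, replacement) for key, item in value.items()}
    if isinstance(value, list):
        return [strip_line_breaks(item, replacement) for item in value]
    # Numbers, None, booleans and other types pass through unchanged.
    return value


# Hypothetical sample data standing in for the original docs list.
docs = [
    {"id": 1, "text": "first line\r\nsecond line"},
    {"id": 2, "tags": ["a\nb", "c\rd"], "path": "C:\\new"},
]
cleaned = strip_line_breaks(docs)
print(cleaned)
```

The cleaned list can then be passed to json_normalize in the same way as before.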
Why can't I replace a newline in my pandas dataframe?
Try this:
df['Title'] = df['Title'].str.replace("\n", " ")
This replaces every line break in the Title column with a single space, in every row.
If you want this for all columns:
df = df.replace(r'\n', ' ', regex=True)
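A minimal sketch of both variants on a small, made-up DataFrame (the column names and values are assumptions for illustration). The character class [\r\n]+ is an addition here so that \r\n pairs and lone carriage returns are covered as well as \n.

```python
import pandas as pd

# Hypothetical sample data containing \n, \r\n and \r line breaks.
df = pd.DataFrame(
    {
        "Title": ["first\nsecond", "plain"],
        "Body": ["a\r\nb", "c\rd"],
    }
)

# Single column: literal (non-regex) replacement through the .str accessor.
df["Title"] = df["Title"].str.replace("\n", " ", regex=False)

# All columns: regex replacement; each run of \r and/or \n becomes one space.
df = df.replace(r"[\r\n]+", " ", regex=True)

print(df)
```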
How to remove newline in pandas dataframe columns?
From what I've learnt, the third argument to Python's str.replace() is the count, that is, the maximum number of occurrences to replace. Since you don't know how many newlines there are, leave the third argument out so that every occurrence is replaced.
new_f = f[keep_col].replace('\\n',' ')
Note that if f[keep_col] is a pandas DataFrame rather than a string, .replace only matches whole cell values unless you also pass regex=True.
This should help
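To illustrate the count argument, here is a small sketch using plain Python strings (the sample text is made up):

```python
text = "line one\nline two\nline three"

# No count: every newline is replaced.
all_replaced = text.replace("\n", " ")

# Count of 1 (passed positionally): only the first newline is replaced.
first_only = text.replace("\n", " ", 1)

print(repr(all_replaced))
print(repr(first_only))
```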
Removing Carriage Returns from Csv String
Rather than stripping the carriage returns from your CSV file, ensure that those fields that contain them are quoted. One way is to just quote all fields:
import csv
import pandas as pd
df.to_csv(sep=',', encoding='utf-8', index=False, header=False, quoting=csv.QUOTE_ALL)
Alternatively, you can use quoting=csv.QUOTE_NONNUMERIC to quote only the non-numeric fields, which are the ones likely to contain \r.
Another way is to set the line terminator to \r\n (or just \r), which will indirectly cause any field that contains \r to be quoted. This might be preferred because only those individual "cells" that require it are quoted:
# pandas 1.5 and later spell this parameter lineterminator; older releases use line_terminator.
df.to_csv(sep=',', encoding='utf-8', index=False, header=False, lineterminator='\r\n')
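The quoting behaviour can be seen with the standard-library csv module alone. This is a sketch on made-up rows, and it assumes pandas applies the same quoting rules when it writes CSV; to_csv_string is a hypothetical helper.

```python
import csv
import io

# Hypothetical rows; the second record has a carriage return inside a field.
rows = [["id", "note"], ["1", "line one\rline two"], ["2", "plain"]]


def to_csv_string(rows, **writer_options):
    """Write rows with csv.writer and return the result as a string."""
    buffer = io.StringIO(newline="")  # newline="" avoids any newline translation
    csv.writer(buffer, **writer_options).writerows(rows)
    return buffer.getvalue()


# Quote every field.
quote_all = to_csv_string(rows, quoting=csv.QUOTE_ALL)

# Default minimal quoting with \r\n as the line terminator:
# only the field containing \r is quoted.
quote_minimal = to_csv_string(rows, lineterminator="\r\n")

# Reading the output back recovers the embedded carriage return intact.
parsed = list(csv.reader(io.StringIO(quote_minimal, newline="")))

print(repr(quote_all))
print(repr(quote_minimal))
print(parsed)
```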
Ignore carriage returns (U+000D) with read_csv in python pandas
Pandas supports multiline CSV files if the file is properly escaped and quoted. If you cannot read a CSV file in Python using the pandas or csv modules, nor open it in MS Excel, then it's probably a non-compliant "CSV" file.
I recommend manually editing a sample of the CSV file until it opens in Excel. Then reproduce those steps programmatically in Python to normalize the large file.
Use this code to create a sample CSV file by copying the first ~100 lines into a new file.
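# newline='' keeps the original line endings, including stray carriage returns, in the sample.
with open('bigfile.csv', "r", newline='') as csvin, open('test.csv', "w", newline='') as csvout:
    line = csvin.readline()
    count = 0
    while line and count < 100:
        csvout.write(line)
        count += 1
        line = csvin.readline()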
Now you have a small test file to work with. If the original CSV file has millions of rows and "bad" rows are found much later in the file then you need to add some logic to find the "bad" lines.
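One possible form of that logic, as a rough sketch: treat the first row as the header and report every row whose field count differs from it. The function name find_bad_rows, its parameters, and the in-memory sample are hypothetical; for a real file you would pass open('bigfile.csv', newline='') instead of the StringIO object.

```python
import csv
import io


def find_bad_rows(lines, limit=10):
    """Return (line_number, row) pairs whose field count differs from the header's."""
    reader = csv.reader(lines)
    expected_fields = None
    bad = []
    for row in reader:
        if expected_fields is None:
            # Assumption: the first row is a well-formed header.
            expected_fields = len(row)
            continue
        if len(row) != expected_fields:
            # reader.line_num is the number of physical lines read so far.
            bad.append((reader.line_num, row))
            if len(bad) >= limit:
                break
    return bad


# Made-up sample: a stray unquoted \r splits one record, a quoted line break
# is fine, and the last record has too many fields.
sample = io.StringIO(
    'id,title\n1,ok\n2,broken\rrow\n3,"quoted\nfine"\n4,too,many\n',
    newline="",
)
bad_rows = find_bad_rows(sample)
print(bad_rows)
```

The reported line numbers point at where to look in the original file. A mismatched field count is only a heuristic, so rows that are malformed in other ways would need additional checks.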