Is there a way to trim/strip whitespace in multiple columns of a pandas dataframe?
Use DataFrame.apply with a list of columns:
cols = ['col_1', 'col_2', 'col_4']
df[cols] = df[cols].apply(lambda x: x.str.strip())
Or process only the object-dtype columns, which typically hold strings:
cols = df.select_dtypes(object).columns
df[cols] = df[cols].apply(lambda x: x.str.strip())
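As a quick check of the first approach, here is a minimal sketch on a made-up frame (the column names and values are assumptions for illustration only). It shows that only the listed columns are stripped:

```python
import pandas as pd

# Hypothetical sample data; only col_1 and col_2 should be cleaned.
df = pd.DataFrame({
    "col_1": ["  a", "b  "],
    "col_2": [" x ", "y"],
    "col_3": [" untouched ", "z "],
})

cols = ["col_1", "col_2"]
# Strip leading/trailing whitespace column by column via the .str accessor.
df[cols] = df[cols].apply(lambda x: x.str.strip())

print(df["col_1"].tolist())
print(df["col_3"].tolist())  # col_3 keeps its whitespace
```

Columns that are not in cols are left exactly as they were.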
Remove whitespace from list of strings with pandas/python
If I understand correctly, some of your dataframe cells hold list-type values.
The content of file_name.json is below:
[
  {
    "key1": "value1 ",
    "key2": "2",
    "key3": ["a", "b 2 ", " exp white space 210"]
  },
  {
    "key1": "value1 ",
    "key2": "2",
    "key3": []
  }
]
Possible solution in this case is the following:
import pandas as pd
import re

df = pd.read_json("file_name.json")

def cleanup_data(value):
    # strip the ends and collapse internal whitespace runs to a single space
    if value and type(value) is list:
        return [re.sub(r'\s+', ' ', x.strip()) for x in value]
    elif value and type(value) is str:
        return re.sub(r'\s+', ' ', value.strip())
    else:
        return value

# apply cleanup function to all cells in dataframe
# (DataFrame.applymap was renamed to DataFrame.map in pandas 2.1)
df = df.applymap(cleanup_data)
df
Returns
     key1  key2                           key3
0  value1     2  [a, b 2, exp white space 210]
1  value1     2                             []
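Note that the cleanup function above does two things: it strips the ends and it collapses runs of internal whitespace into a single space. A small pure-Python sketch of that normalization step follows; the helper name normalize_ws and the sample strings are assumptions for illustration:

```python
import re

def normalize_ws(text):
    """Strip the ends, then collapse each whitespace run to one space."""
    return re.sub(r"\s+", " ", text.strip())

samples = ["  b   2 ", " exp  white\tspace 210"]
normalized = [normalize_ws(s) for s in samples]

# str.split() with no arguments splits on whitespace runs, so joining the
# pieces with a single space gives the same result without a regex.
joined = [" ".join(s.split()) for s in samples]

print(normalized)
```

If only the ends need trimming, plain str.strip() is enough and the regex can be dropped.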
Strip / trim all strings of a dataframe
You can use DataFrame.select_dtypes to select the string columns and then apply the str.strip function.
Notice: the values cannot be types like dicts or lists, because columns holding them also have object dtype.
df_obj = df.select_dtypes(['object'])
print (df_obj)
0 a
1 c
df[df_obj.columns] = df_obj.apply(lambda x: x.str.strip())
print (df)
0 1
0 a 10
1 c 5
But if there are only a few columns, use str.strip directly:
df[0] = df[0].str.strip()
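The notice about dicts and lists matters because the .str accessor returns NaN for any value that is not a string. The sketch below, on a made-up mixed Series, shows that effect next to a type-checking alternative that leaves non-string values alone; the variable names are assumptions for illustration:

```python
import pandas as pd

# An object-dtype Series mixing a string, a list and an integer.
mixed = pd.Series([" a ", ["x "], 3], dtype=object)

# .str.strip() only handles strings; other values come back as NaN.
stripped = mixed.str.strip()

# Alternative: strip strings and pass every other value through unchanged.
safe = mixed.map(lambda v: v.strip() if isinstance(v, str) else v)

print(stripped.tolist())
print(safe.tolist())
```

The map-based variant is slower than the vectorized accessor on large columns, but it does not silently replace non-string cells.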
How can I strip the whitespace from Pandas DataFrame headers?
You can pass a function to the rename method. The str.strip() method should do what you want:
In [5]: df
Out[5]:
Year Month Value
0 1 2 3
[1 rows x 3 columns]
In [6]: df.rename(columns=lambda x: x.strip())
Out[6]:
Year Month Value
0 1 2 3
[1 rows x 3 columns]
Note that this returns a new DataFrame object, which is shown as output on screen, but the changes are not actually applied to your columns. To keep the changes, either use this in a method chain or re-assign the df variable:
df = df.rename(columns=lambda x: x.strip())
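To make the point about re-assignment concrete, here is a minimal sketch with made-up padded headers (the exact padding is an assumption). It also shows df.columns.str.strip() as an alternative when all headers are strings:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3]], columns=[" Year", "Month ", " Value "])

# rename returns a new frame; the original headers are unchanged.
renamed = df.rename(columns=str.strip)
before = list(df.columns)
after = list(renamed.columns)

# Alternative for all-string headers: the Index has its own .str accessor.
via_index = list(df.columns.str.strip())

print(before)
print(after)
```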
Removing spaces from a nested list of objects with pandas
We can create a lambda function to strip the spaces from the string values in each dictionary, then map this function over the details column of the dataframe:
strip = lambda d: {k: v.strip() if isinstance(v, str) else v for k, v in d.items()}
df['details'] = df['details'].map(lambda L: [strip(d) for d in L])
Result
>>> df.to_dict('records')
[{'name': 'Book1',
'details': [{'id': 30278752,
'isbn': '1594634025',
'isbn13': '9781594634024',
'text_reviews_count': 417,
'work_reviews_count': 3313007,
'work_text_reviews_count': 109912,
'average_rating': '3.92'}]},
{'name': 'Book2',
'details': [{'id': 34006942,
'isbn': '1501173219',
'isbn13': '9781501173219',
'text_reviews_count': 565,
'work_reviews_count': 2142280,
'work_text_reviews_count': 75053,
'average_rating': '4.33'}]}]
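The lambda above handles one level of nesting (a list of flat dicts). If the structure can nest deeper, a recursive helper is one option. This is a sketch only: strip_nested is a hypothetical name, and the padded values are invented to have something to strip:

```python
def strip_nested(value):
    """Recursively strip str values inside lists and dicts; leave other types as-is."""
    if isinstance(value, str):
        return value.strip()
    if isinstance(value, list):
        return [strip_nested(v) for v in value]
    if isinstance(value, dict):
        return {k: strip_nested(v) for k, v in value.items()}
    return value

# Hypothetical cell value shaped like one entry of the details column.
details = [{"id": 30278752, "isbn": " 1594634025 ", "average_rating": "3.92 "}]
cleaned = strip_nested(details)
print(cleaned)
```

It could then be applied per cell with df['details'].map(strip_nested).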
Wanted: function to remove whitespace from column headers that is robust to column headers not being strings
You could use a list comprehension, which is quite unusual when working with Pandas as it's usually more efficient to apply built-in Pandas functions (as you've done). But for something as simple as fixing column names, this should be fine:
df = pd.DataFrame(columns=[1, 2, 'A '])
df.columns = [col.strip() if isinstance(col, str) else col for col in df.columns]
Results:
In [75]: df.columns
Out[75]: Index([1, 2, 'A'], dtype='object')
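The same type check also works with rename, which avoids assigning to df.columns directly and fits into a method chain. A minimal sketch, where strip_headers is a hypothetical helper name:

```python
import pandas as pd

def strip_headers(df):
    """Return a copy of df with whitespace stripped from string column labels only."""
    return df.rename(columns=lambda c: c.strip() if isinstance(c, str) else c)

df = pd.DataFrame(columns=[1, 2, "A ", " B"])
clean = strip_headers(df)
print(list(clean.columns))
```

Non-string labels such as the integers here pass through unchanged.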