Strip / trim all strings of a dataframe
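You can use DataFrame.select_dtypes to select the string columns and then apply str.strip to each of them.
Note: this assumes the object columns contain only strings. Columns holding dicts or lists also have dtype object, so select_dtypes cannot tell them apart from string columns.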
df_obj = df.select_dtypes(['object'])
print (df_obj)
0 a
1 c
df[df_obj.columns] = df_obj.apply(lambda x: x.str.strip())
print (df)
0 1
0 a 10
1 c 5
But if there are only a few columns, use str.strip directly:
df[0] = df[0].str.strip()
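A minimal, self-contained sketch of the select_dtypes approach above. The sample data and the column names `name` and `qty` are hypothetical, and the string column is built with dtype=object on purpose, since some pandas versions may store strings in a dedicated string dtype that select_dtypes(['object']) would not pick up:

```python
import pandas as pd

# Hypothetical sample data; dtype=object is set explicitly so the string
# column is selected by select_dtypes(include=["object"]).
df = pd.DataFrame({
    "name": pd.Series([" a ", "c  "], dtype=object),
    "qty": [10, 5],
})

# Select the object columns and strip leading/trailing whitespace in each.
obj_cols = df.select_dtypes(include=["object"]).columns
df[obj_cols] = df[obj_cols].apply(lambda col: col.str.strip())

names = df["name"].tolist()
quantities = df["qty"].tolist()
```

The numeric column is never passed to .str.strip(), so it is left unchanged.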
Is there a way to trim/strip whitespace in multiple columns of a pandas dataframe?
Use DataFrame.apply with a list of columns:
cols = ['col_1', 'col_2', 'col_4']
df[cols] = df[cols].apply(lambda x: x.str.strip())
Or process only the object columns, assuming they all hold strings:
cols = df.select_dtypes(object).columns
df[cols] = df[cols].apply(lambda x: x.str.strip())
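As a sketch of the column-list variant, the stripping can be wrapped in a small helper that leaves unlisted columns alone. The helper name `strip_columns`, the sample data, and `col_3` are made up for illustration:

```python
import pandas as pd

def strip_columns(frame, columns):
    """Return a copy of `frame` with whitespace stripped in `columns` only."""
    out = frame.copy()
    out[columns] = out[columns].apply(lambda col: col.str.strip())
    return out

# Hypothetical sample data: col_3 is deliberately not in the list.
df = pd.DataFrame({
    "col_1": [" a", "b "],
    "col_2": ["c ", " d"],
    "col_3": [" keep ", "e"],
})

cleaned = strip_columns(df, ["col_1", "col_2"])
```

Working on a copy keeps the original frame intact, which can help when the raw values are still needed for comparison.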
Integrate strip or trim in python script
You should be able to do this right between two of your lines:
df_o = df.astype(str)
df_o = df_o.applymap(lambda x: x.strip() if isinstance(x, str) else x)
df_o.to_json(filename_json, orient = "records", lines = bool, date_format = "iso", double_precision = 15, force_ascii = False, date_unit = 'ms', default_handler = str)
Or wherever you want to do this stripping. Note that the other answer, which operates directly on a dictionary, is valid too.
How to remove white space from strings of data frame column?
I would stack, strip, get_dummies, and groupby.max.
If the separator is ', ':
df.stack().str.strip().str.get_dummies(sep=', ').groupby(level=0).max()
Otherwise:
df.stack().str.replace(r'\s', '', regex=True).str.get_dummies(sep=',').groupby(level=0).max()
Output:
ab ac ba bc bd be df fg gh hj-jk
0 1 0 1 0 0 0 0 0 0 0
1 0 0 0 1 0 0 0 1 0 0
2 0 1 0 0 1 0 0 0 0 0
3 0 0 0 0 0 1 0 0 0 1
4 0 1 0 0 0 1 0 0 0 0
5 0 0 0 0 0 1 1 0 1 0
6 0 0 1 0 0 0 0 0 1 0
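The question's input frame is not shown here, so the following is only a sketch of the same chain on a small, made-up frame (the column names `c1`, `c2` and the values are assumptions). It removes all whitespace first, then builds one indicator column per comma-separated token and collapses back to one row per original row:

```python
import pandas as pd

# Hypothetical input: comma-separated tokens with stray whitespace.
df = pd.DataFrame({
    "c1": ["ab, ba", "bc"],
    "c2": [" ac", "ab , bd"],
})

out = (
    df.stack()                                  # one long Series, indexed by (row, column)
      .str.replace(r"\s", "", regex=True)       # drop every whitespace character
      .str.get_dummies(sep=",")                 # one 0/1 column per token
      .groupby(level=0).max()                   # merge the columns of each original row
)
```

Taking the max per row marks a token as present if it appeared in any of that row's columns.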
Pandas - Strip white space
You can strip() an entire Series in Pandas using .str.strip():
df1['employee_id'] = df1['employee_id'].str.strip()
df2['employee_id'] = df2['employee_id'].str.strip()
This will remove leading/trailing whitespace from the employee_id column in both df1 and df2.
Alternatively, you can modify your read_csv lines to also use skipinitialspace=True:
# pass only one of sep/delimiter; pandas rejects a call that specifies both
df1 = pd.read_csv('input1.csv', sep=',', encoding="utf-8", skipinitialspace=True)
df2 = pd.read_csv('input2.csv', sep=',', encoding="utf-8", skipinitialspace=True)
It looks like you are attempting to remove spaces in a string containing numbers. You can do this by:
df1['employee_id'] = df1['employee_id'].str.replace(" ","")
df2['employee_id'] = df2['employee_id'].str.replace(" ","")
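To illustrate how these options differ, here is a small sketch that reads an in-memory CSV instead of a file (the data and the `name` column are invented for the example). skipinitialspace only skips spaces that follow a delimiter, so trailing spaces still need .str.strip(), while .str.replace(" ", "") removes every space, including interior ones:

```python
import io
import pandas as pd

# Hypothetical CSV content with spaces after the commas and trailing spaces.
raw = "employee_id, name\nA1 , Ann\nB2,  Bob \n"

df = pd.read_csv(io.StringIO(raw), sep=",", skipinitialspace=True)

# Spaces after each comma are gone, but trailing spaces are still there.
ids_after_read = df["employee_id"].tolist()

# .str.strip() removes leading and trailing whitespace only.
ids_stripped = df["employee_id"].str.strip().tolist()

# .str.replace(" ", "") removes all spaces, wherever they occur.
spaced = pd.Series(["A 1", " B 2 "], dtype=object)
ids_no_spaces = spaced.str.replace(" ", "", regex=False).tolist()
```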
Pandas trim leading & trailing white space in a dataframe
You need to check whether each value is a string, because the columns hold mixed values (numbers alongside strings), and call strip only on the strings:
df = df.applymap(lambda x: x.strip() if isinstance(x, str) else x)
print (df)
A B C
0 A b 2 3.0
1 NaN 2 3.0
2 random 43 4.0
3 any txt is possible 2 1 22.0
4 23 99.0
5 help 23 NaN
If each column has a single dtype, you can use the .str accessor instead. With mixed columns it returns NaNs for the non-string values, as for the numeric values in column B of your sample:
cols = df.select_dtypes(['object']).columns
df[cols] = df[cols].apply(lambda x: x.str.strip())
print (df)
A B C
0 A b NaN 3.0
1 NaN NaN 3.0
2 random NaN 4.0
3 any txt is possible 2 1 22.0
4 NaN 99.0
5 help NaN NaN
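The difference between the two approaches can be reproduced on a single mixed column. This is a sketch with made-up values, not the data from the question:

```python
import pandas as pd

# Hypothetical object column mixing strings and an integer.
mixed = pd.Series([" b ", 2, "x "], dtype=object)

# Element-wise check: strings are stripped, other values pass through untouched.
checked = mixed.map(lambda v: v.strip() if isinstance(v, str) else v)

# .str accessor: non-string values come back as NaN.
via_str = mixed.str.strip()
```

So the isinstance check is the safer choice whenever a column may contain anything other than strings.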
Strip space from all column values in Pandas
Without an example it is not fully clear what you want to accomplish, but maybe the following will help:
import pandas as pd
df = pd.DataFrame({'A ': [1, 2], 'B ': [4, 5], 'C': [8,9]})
The column headers do have trailing whitespace:
df.columns
Index([u'A ', u'B ', u'C'], dtype='object')
Now you can use map and strip to get rid of it:
df.columns = df.columns.map(lambda x: x.strip())
or alternatively
df.columns = df.columns.map(str.strip)
or simply (which should be the preferred option)
df.columns = df.columns.str.strip()
If you now call
df.columns
it yields
Index([u'A', u'B', u'C'], dtype='object')
If it is about the values and not the headers, you can also use applymap:
df = pd.DataFrame({'A': ['1', '2 '], 'B': ['4 ', '5 '], 'C': ['8 ','9']})
A B C
0 1 4 8
1 2 5 9
Then the following gets rid of the trailing whitespace:
df.applymap(lambda x: x.strip())
or alternatively (which is the better option):
df.applymap(str.strip)
A B C
0 1 4 8
1 2 5 9
Note: This assumes that you have only strings in your columns.
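Putting both steps together, a sketch that cleans the headers and then the values might look like this. DataFrame.applymap was deprecated in pandas 2.1 in favor of DataFrame.map, so the example picks whichever method the installed version provides; it assumes, as above, that every cell is a string:

```python
import pandas as pd

# Same kind of data as above: whitespace in both headers and values.
df = pd.DataFrame({"A ": ["1", "2 "], "B ": ["4 ", "5 "], "C": ["8 ", "9"]})

# Clean the column headers.
df.columns = df.columns.str.strip()

# Clean every cell; use DataFrame.map where available, applymap otherwise.
if hasattr(pd.DataFrame, "map"):
    df = df.map(str.strip)
else:
    df = df.applymap(str.strip)

headers = list(df.columns)
values = df.values.tolist()
```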