Strip/Trim All Strings of a Dataframe

Strip / trim all strings of a dataframe

You can use DataFrame.select_dtypes to select the string columns and then apply the function str.strip to them.

Note: the values must not be dicts or lists. Columns holding them also have the object dtype, so they would be selected too, and .str.strip would turn those values into NaN.

import pandas as pd

# df is the DataFrame from the question; keep only its object (string) columns
df_obj = df.select_dtypes(['object'])
print(df_obj)
0    a  
1    c  

df[df_obj.columns] = df_obj.apply(lambda x: x.str.strip())
print (df)

   0   1
0  a  10
1  c   5
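A self-contained version of the snippet above. The sample frame is hypothetical, since the question's data is not shown. The selection also includes 'string', which is an addition to the original answer: columns stored with pandas' dedicated string dtype are not matched by 'object' alone.

```python
import pandas as pd

# Hypothetical sample data: column 0 holds padded strings, column 1 integers
df = pd.DataFrame([['  a  ', 10], [' c ', 5]])

# Newer pandas may store text with the dedicated string dtype, which
# select_dtypes(['object']) does not match, so both dtypes are included here
df_obj = df.select_dtypes(include=['object', 'string'])
df[df_obj.columns] = df_obj.apply(lambda x: x.str.strip())

print(df)
```

The integer column is never passed to .str.strip, so it keeps its values and dtype.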

If there are only a few columns, call str.strip on each one directly:

df[0] = df[0].str.strip()

Is there a way to trim/strip whitespace in multiple columns of a pandas dataframe?

Use DataFrame.apply with a list of columns:

cols = ['col_1', 'col_2', 'col_4']
df[cols] = df[cols].apply(lambda x: x.str.strip())

Or process only the object columns, which is where pandas normally stores strings:

cols = df.select_dtypes(object).columns
df[cols] = df[cols].apply(lambda x: x.str.strip())
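If the explicit list contains a non-string column, the .str accessor raises an AttributeError, which is one reason to let select_dtypes pick the columns. A sketch with a hypothetical frame in which col_4 is numeric:

```python
import pandas as pd

# Hypothetical frame: col_1 and col_2 hold strings, col_4 holds integers
df = pd.DataFrame({'col_1': [' a '], 'col_2': ['b '], 'col_4': [7]})

cols = ['col_1', 'col_2', 'col_4']
try:
    df[cols] = df[cols].apply(lambda x: x.str.strip())
    failed = False
except AttributeError as exc:
    # .str only works on string-like columns; col_4 is int64
    failed = True
    print('AttributeError:', exc)

# Restricting the selection to string columns avoids the error
cols = df.select_dtypes(include=['object', 'string']).columns
df[cols] = df[cols].apply(lambda x: x.str.strip())
print(df)
```

Because the exception is raised before the assignment, df is left unchanged by the failed attempt.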

Integrate strip or trim in python script

You should be able to do this right between two of your lines:

    df_o = df.astype(str)
    # after astype(str) every cell is a string, so strip each one
    df_o = df_o.applymap(lambda x: x.strip() if isinstance(x, str) else x)
    df_o.to_json(filename_json, orient="records", lines=True, date_format="iso", double_precision=15, force_ascii=False, date_unit="ms", default_handler=str)

Or put it wherever you want the stripping to happen. Note that the other answer, which operates directly on a dictionary, is valid too.
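A minimal, self-contained sketch of the strip-then-serialize idea, assuming a hypothetical two-column frame. Instead of writing to filename_json it returns the JSON text, and because astype(str) leaves only strings it strips each column with .str.strip:

```python
import json
import pandas as pd

# Hypothetical frame standing in for the one in the question
df = pd.DataFrame({'name': ['  Ann ', 'Bob  '], 'city': [' Oslo', 'Rome ']})

# astype(str) makes every cell a string, so .str.strip is safe on each column
df_o = df.astype(str)
df_o = df_o.apply(lambda col: col.str.strip())

# With no path argument, to_json returns the JSON text instead of writing a file
json_text = df_o.to_json(orient='records', lines=True, force_ascii=False)

records = [json.loads(line) for line in json_text.splitlines() if line]
print(records)  # [{'name': 'Ann', 'city': 'Oslo'}, {'name': 'Bob', 'city': 'Rome'}]
```

Be aware that, at least in pandas 2.x, astype(str) turns missing values into the literal string 'nan', which then ends up in the JSON output.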

how to remove white space from strings of data frame column?

I would stack, strip, get_dummies, and groupby.max:

If the separator is ', ':

df.stack().str.strip().str.get_dummies(sep=', ').groupby(level=0).max()

Otherwise, remove all whitespace first and then split on ',':

df.stack().str.replace(r'\s', '', regex=True).str.get_dummies(sep=',').groupby(level=0).max()
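The question's data is not reproduced here, so this sketch uses a small hypothetical frame with two columns and irregular spacing around the commas. It runs the second variant and shows how groupby(level=0).max() folds the stacked cells back into one row per original row:

```python
import pandas as pd

# Hypothetical data: comma-separated tags with inconsistent spacing
df = pd.DataFrame({'x': ['ab, ba', 'bc'], 'y': ['ac ,bd', ' fg , ab']})

out = (
    df.stack()                                   # one row per (row, column) cell
      .str.replace(r'\s', '', regex=True)        # drop every whitespace character
      .str.get_dummies(sep=',')                  # one indicator column per tag
      .groupby(level=0).max()                    # merge cells of the same original row
)
print(out)
#    ab  ac  ba  bc  bd  fg
# 0   1   1   1   0   1   0
# 1   1   0   0   1   0   1
```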

output:

   ab  ac  ba  bc  bd  be  df  fg  gh  hj-jk
0   1   0   1   0   0   0   0   0   0      0
1   0   0   0   1   0   0   0   1   0      0
2   0   1   0   0   1   0   0   0   0      0
3   0   0   0   0   0   1   0   0   0      1
4   0   1   0   0   0   1   0   0   0      0
5   0   0   0   0   0   1   1   0   1      0
6   0   0   1   0   0   0   0   0   1      0

Pandas - Strip white space

You can strip() an entire Series in Pandas using .str.strip():

df1['employee_id'] = df1['employee_id'].str.strip()
df2['employee_id'] = df2['employee_id'].str.strip()

This removes leading and trailing whitespace from the employee_id column in both df1 and df2.
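A sketch of why this matters, assuming (this is not stated above) that the goal is to merge the two frames on employee_id. The data is hypothetical:

```python
import pandas as pd

# Hypothetical frames whose IDs differ only by surrounding whitespace
df1 = pd.DataFrame({'employee_id': [' 101', '102 '], 'name': ['Ann', 'Bob']})
df2 = pd.DataFrame({'employee_id': ['101 ', ' 102'], 'dept': ['HR', 'IT']})

before = len(df1.merge(df2, on='employee_id'))
print(before)  # 0: no key matches while the padding is still there

df1['employee_id'] = df1['employee_id'].str.strip()
df2['employee_id'] = df2['employee_id'].str.strip()

merged = df1.merge(df2, on='employee_id')
print(merged)
```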

Alternatively, you can pass skipinitialspace=True to read_csv, which skips spaces immediately after each delimiter (trailing spaces are kept):

import pandas as pd

# pass either sep or delimiter, not both; read_csv raises a ValueError otherwise
df1 = pd.read_csv('input1.csv', sep=',', encoding="utf-8", skipinitialspace=True)
df2 = pd.read_csv('input2.csv', sep=',', encoding="utf-8", skipinitialspace=True)
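skipinitialspace=True only removes spaces that follow a delimiter, so a trailing space at the end of a field survives. A quick check with hypothetical CSV text read from memory through io.StringIO:

```python
import io
import pandas as pd

# Hypothetical CSV: spaces after each comma, plus one trailing space before a newline
csv_text = "id, name\na1, Ann \na2, Bob\n"

df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)
raw_names = df['name'].tolist()
print(df.columns.tolist())  # ['id', 'name']
print(raw_names)            # ['Ann ', 'Bob'] (the trailing space is still there)

# .str.strip handles whatever skipinitialspace leaves behind
df['name'] = df['name'].str.strip()
print(df['name'].tolist())  # ['Ann', 'Bob']
```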

It looks like you are trying to remove spaces inside a string that contains numbers. You can do this with str.replace:

df1['employee_id'] = df1['employee_id'].str.replace(" ","")
df2['employee_id'] = df2['employee_id'].str.replace(" ","")
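Replacing ' ' removes only ordinary spaces; tabs and other whitespace characters stay. A sketch contrasting it with a regex on \s, using hypothetical IDs:

```python
import pandas as pd

# Hypothetical IDs: one with an inner space, one with a tab and padding
ids = pd.Series(['12 34', ' 56\t78 '])

spaces_only = ids.str.replace(' ', '').tolist()
all_ws = ids.str.replace(r'\s+', '', regex=True).tolist()

print(spaces_only)  # ['1234', '56\t78']
print(all_ws)       # ['1234', '5678']
```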

Pandas trim leading & trailing white space in a dataframe

You need to check whether each value is a string, because the columns mix numeric and string values; call strip only on the strings:

df = df.applymap(lambda x: x.strip() if isinstance(x, str) else x)
print (df)
                     A    B     C
0                  A b    2   3.0
1                  NaN    2   3.0
2               random   43   4.0
3  any txt is possible  2 1  22.0
4                        23  99.0
5                 help   23   NaN
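In pandas 2.1 and later, DataFrame.applymap is deprecated and renamed DataFrame.map. A sketch that runs on either side of that change, with hypothetical data modeled on the sample above:

```python
import pandas as pd

# Hypothetical data: strings with padding, a missing value, and numeric columns
df = pd.DataFrame({'A': ['  A b ', None, ' random'], 'B': [2, 2, 43], 'C': [3.0, 3.0, 4.0]})

def strip_if_str(x):
    """Strip strings; return every other value (numbers, NaN, None) unchanged."""
    return x.strip() if isinstance(x, str) else x

# DataFrame.map exists from pandas 2.1; fall back to applymap on older versions
elementwise = df.map if hasattr(pd.DataFrame, 'map') else df.applymap
df = elementwise(strip_if_str)
print(df)
```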

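The select_dtypes approach works only if each column holds a single type. In a column that mixes strings and numbers, .str.strip turns the numeric values into NaN, as happens with column B of your sample: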

cols = df.select_dtypes(['object']).columns
df[cols] = df[cols].apply(lambda x: x.str.strip())
print (df)
                     A    B     C
0                  A b  NaN   3.0
1                  NaN  NaN   3.0
2               random  NaN   4.0
3  any txt is possible  2 1  22.0
4                       NaN  99.0
5                 help  NaN   NaN
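To keep the select_dtypes column selection but leave the numbers intact, map over each value and strip only the strings. A sketch with a hypothetical frame whose columns A and B mix strings and integers:

```python
import pandas as pd

# Hypothetical data: columns A and B mix strings and integers, C is float
df = pd.DataFrame({'A': [' A b ', 2], 'B': [' 2 1', 23], 'C': [3.0, 4.0]})

cols = df.select_dtypes(['object']).columns

# Series.map calls the function on every value, so non-strings pass through
# unchanged instead of becoming NaN as they would with .str.strip()
df[cols] = df[cols].apply(
    lambda s: s.map(lambda x: x.strip() if isinstance(x, str) else x)
)
print(df)
```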

Strip space from all column values in Pandas

Without an example it is not fully clear what you want to accomplish, but maybe the following will help:

import pandas as pd

df = pd.DataFrame({'A  ': [1, 2], 'B ': [4, 5], 'C': [8,9]})

The column headers do have trailing white spaces:

df.columns
Index(['A  ', 'B ', 'C'], dtype='object')

Now you can use map and strip to get rid of them:

df.columns = df.columns.map(lambda x: x.strip())

or alternatively

df.columns = df.columns.map(str.strip)

or, more simply (the preferred option):

df.columns = df.columns.str.strip()

If you now call

df.columns

it yields

Index(['A', 'B', 'C'], dtype='object')
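Another option, not mentioned in the answer, is DataFrame.rename with a callable. It returns a new frame and leaves the original untouched, which is handy in method chains. A sketch on the same sample frame:

```python
import pandas as pd

df = pd.DataFrame({'A  ': [1, 2], 'B ': [4, 5], 'C': [8, 9]})

# rename applies str.strip to every column label and returns a new DataFrame
clean = df.rename(columns=str.strip)
print(list(clean.columns))  # ['A', 'B', 'C']
```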

If it is about the values and not the headers, you can also use applymap:

df = pd.DataFrame({'A': ['1', '2  '], 'B': ['4 ', '5 '], 'C': ['8 ','9']})

     A   B   C
0    1  4   8 
1  2    5    9

Then the following gets rid of the trailing white spaces:

df.applymap(lambda x: x.strip())

or alternatively (which is the better option):

df.applymap(str.strip)

   A  B  C
0  1  4  8
1  2  5  9

Note: this assumes that your columns contain only strings; calling strip on a non-string value such as an int raises an error.
