Strip/Trim All Strings of a Dataframe

Strip / trim all strings of a dataframe

You can use DataFrame.select_dtypes to select the string columns and then apply str.strip to each of them.

Note: the values must not be types such as dicts or lists, because columns holding them also have dtype object and would be selected as well.

df_obj = df.select_dtypes(['object'])
print (df_obj)
0 a
1 c

df[df_obj.columns] = df_obj.apply(lambda x: x.str.strip())
print(df)
   0   1
0  a  10
1  c   5

But if only a few columns need it, call str.strip on each one directly:

df[0] = df[0].str.strip()
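The snippets above do not show how df was built, so here is a minimal self-contained sketch of the same approach. The sample data is hypothetical, and dtype=object is set explicitly so the example does not depend on how a given pandas version infers string columns. The last lines illustrate the caveat about non-string values: in the pandas versions I know, they come back as NaN rather than raising.

```python
import pandas as pd

# Hypothetical sample data: column 0 holds padded strings, column 1 integers.
# dtype=object is explicit so select_dtypes(['object']) picks the column up
# regardless of the pandas version's default string dtype.
df = pd.DataFrame({
    0: pd.Series(["  a  ", "  c  "], dtype=object),
    1: [10, 5],
})

# Select only the object columns and strip them in one assignment.
df_obj = df.select_dtypes(["object"])
df[df_obj.columns] = df_obj.apply(lambda col: col.str.strip())

print(df[0].tolist())  # ['a', 'c']
print(df[1].tolist())  # [10, 5]

# Caveat: lists and dicts also live in object columns, and .str.strip
# returns NaN for them instead of leaving them unchanged.
mixed = pd.Series(["  a  ", ["x"], {"k": 1}], dtype=object)
print(mixed.str.strip().isna().tolist())  # [False, True, True]
```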

Is there a way to trim/strip whitespace in multiple columns of a pandas dataframe?

Use DataFrame.apply with list of columns:

cols = ['col_1', 'col_2', 'col_4']
df[cols] = df[cols].apply(lambda x: x.str.strip())

Or process only the object columns, which normally hold the strings:

cols = df.select_dtypes(object).columns
df[cols] = df[cols].apply(lambda x: x.str.strip())
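If you would rather not modify the original frame, the same idea can be wrapped in a small helper. This is only a sketch: strip_columns is a hypothetical name, and the column names and values are made up for illustration.

```python
import pandas as pd


def strip_columns(df, cols):
    """Return a copy of df with leading/trailing whitespace removed in cols."""
    out = df.copy()
    out[cols] = out[cols].apply(lambda col: col.str.strip())
    return out


# Hypothetical sample data; col_3 is numeric and is deliberately left out.
df = pd.DataFrame({
    "col_1": [" x", "y "],
    "col_2": [" 1 ", "2"],
    "col_3": [1.5, 2.5],
    "col_4": ["a  ", "  b"],
})

clean = strip_columns(df, ["col_1", "col_2", "col_4"])
print(clean["col_1"].tolist())  # ['x', 'y']
print(df["col_1"].tolist())     # [' x', 'y '] (original is unchanged)
```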

Integrate strip or trim in python script

You should be able to do this right between two of your lines:

df_o = df.astype(str)
df_o = df_o.applymap(lambda x: x.strip() if isinstance(x, str) else x)
df_o.to_json(filename_json, orient = "records", lines = bool, date_format = "iso", double_precision = 15, force_ascii = False, date_unit = 'ms', default_handler = str)

Or do the stripping wherever it fits in your script. Note that the other answer, which operates directly on a dictionary, is valid too.
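As a rough illustration of stripping right before serialising, here is a sketch that skips the astype(str) step so numbers stay numbers in the JSON. The frame, the helper name strip_if_str, and the reduced set of to_json arguments are my assumptions, not taken from the original script. It writes to a string instead of a file so the result can be inspected.

```python
import json

import pandas as pd


def strip_if_str(value):
    """Strip strings, pass every other value through unchanged."""
    return value.strip() if isinstance(value, str) else value


# Hypothetical data standing in for the frame in the question.
df = pd.DataFrame({"name": ["  Ann ", "Bob  "], "age": [31, 45]})

# Series.map per column works across pandas versions; on pandas >= 2.1
# DataFrame.map(strip_if_str) does the same in one call.
df_o = df.apply(lambda col: col.map(strip_if_str))

# With no path given, to_json returns the JSON text.
json_lines = df_o.to_json(orient="records", lines=True, force_ascii=False)
records = [json.loads(line) for line in json_lines.splitlines()]
print(records)  # [{'name': 'Ann', 'age': 31}, {'name': 'Bob', 'age': 45}]
```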

how to remove white space from strings of data frame column?

I would stack, strip, get_dummies, and groupby.max:

If the separator is ', ':

df.stack().str.strip().str.get_dummies(sep=', ').groupby(level=0).max()

Otherwise, remove all whitespace first and split on ',':

df.stack().str.replace(r'\s', '', regex=True).str.get_dummies(sep=',').groupby(level=0).max()

Output:

   ab  ac  ba  bc  bd  be  df  fg  gh  hj-jk
0   1   0   1   0   0   0   0   0   0      0
1   0   0   0   1   0   0   0   1   0      0
2   0   1   0   0   1   0   0   0   0      0
3   0   0   0   0   0   1   0   0   0      1
4   0   1   0   0   0   1   0   0   0      0
5   0   0   0   0   0   1   1   0   1      0
6   0   0   1   0   0   0   0   0   1      0
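The input frame is not shown above, so here is a small sketch of the second variant on made-up data. The column names first and second and their values are assumptions. Each step of the chain is split out so you can see what stack, the whitespace removal, get_dummies, and groupby.max each contribute.

```python
import pandas as pd

# Hypothetical input: comma-separated tags with stray whitespace,
# spread over two columns.
df = pd.DataFrame({
    "first": [" ab ,ba", "bc"],
    "second": ["ac", " fg, bc "],
})

# 1. stack: one long Series indexed by (row, column).
stacked = df.stack()

# 2. Remove every whitespace character so ',' is a clean separator.
cleaned = stacked.str.replace(r"\s", "", regex=True)

# 3. One indicator column per tag, one row per original cell.
per_cell = cleaned.str.get_dummies(sep=",")

# 4. Collapse back to one row per original row.
dummies = per_cell.groupby(level=0).max()

print(list(dummies.columns))  # ['ab', 'ac', 'ba', 'bc', 'fg']
print(dummies.loc[0].astype(int).tolist())  # [1, 1, 1, 0, 0]
print(dummies.loc[1].astype(int).tolist())  # [0, 0, 0, 1, 1]
```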

Pandas - Strip white space

You can strip() an entire Series in Pandas using .str.strip():

df1['employee_id'] = df1['employee_id'].str.strip()
df2['employee_id'] = df2['employee_id'].str.strip()

This removes leading and trailing whitespace from the employee_id column in both df1 and df2.

Alternatively, you can modify your read_csv calls to use skipinitialspace=True, which skips spaces after each delimiter (trailing spaces still need str.strip). Pass either sep or delimiter, not both:

df1 = pd.read_csv('input1.csv', sep=',', encoding="utf-8", skipinitialspace=True)
df2 = pd.read_csv('input2.csv', sep=',', encoding="utf-8", skipinitialspace=True)
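To see what skipinitialspace actually covers, here is a sketch that reads a small in-memory CSV instead of the input files, which I do not have. The CSV content is invented. Spaces after a delimiter disappear, including in the header, but trailing spaces survive and still need str.strip.

```python
import io

import pandas as pd

# Hypothetical CSV text with spaces after the commas and one trailing space.
csv_text = "employee_id, name\n101, Ann \n102,  Bob\n"

df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

print(list(df.columns))     # ['employee_id', 'name']
print(df["name"].tolist())  # ['Ann ', 'Bob'] (trailing space is kept)

# Finish the job for trailing whitespace.
clean = df["name"].str.strip()
print(clean.tolist())       # ['Ann', 'Bob']
```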

It looks like you are attempting to remove spaces in a string containing numbers. You can do this by:

df1['employee_id'] = df1['employee_id'].str.replace(" ","")
df2['employee_id'] = df2['employee_id'].str.replace(" ","")
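The difference between the two calls is easy to miss, so here is a quick side-by-side on invented IDs: str.strip only touches the ends, while str.replace removes every space, including the ones inside the value.

```python
import pandas as pd

# Hypothetical IDs with leading, trailing, and interior spaces.
ids = pd.Series([" 12 345 ", "678 "])

stripped = ids.str.strip()
no_spaces = ids.str.replace(" ", "", regex=False)

print(stripped.tolist())   # ['12 345', '678']
print(no_spaces.tolist())  # ['12345', '678']
```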

Pandas trim leading & trailing white space in a dataframe

I think you need to check whether each value is a string, because the columns hold mixed values (numbers and strings), and call strip only on the strings:

df = df.applymap(lambda x: x.strip() if isinstance(x, str) else x)
print(df)
                     A    B     C
0                  A b    2   3.0
1                  NaN    2   3.0
2               random   43   4.0
3  any txt is possible  2 1  22.0
4                        23  99.0
5                 help   23   NaN

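If every value in a column is a string, you can use .str.strip on the object columns instead. With mixed columns you get NaN for the non-string values, as for the numeric values in column B of your sample: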

cols = df.select_dtypes(['object']).columns
df[cols] = df[cols].apply(lambda x: x.str.strip())
print(df)
                     A    B     C
0                  A b  NaN   3.0
1                  NaN  NaN   3.0
2               random  NaN   4.0
3  any txt is possible  2 1  22.0
4                       NaN  99.0
5                 help  NaN   NaN
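Here is a compact sketch of both behaviours on a made-up frame whose column B mixes integers with one string. The data and the helper name strip_if_str are assumptions for illustration. The elementwise check keeps the integers, while the .str accessor replaces them with NaN.

```python
import pandas as pd


def strip_if_str(value):
    """Strip strings, leave numbers and missing values alone."""
    return value.strip() if isinstance(value, str) else value


# Hypothetical data: B is an object column holding ints and one string.
df = pd.DataFrame({
    "A": [" A b ", " random", "help "],
    "B": [2, " 2 1 ", 23],
    "C": [3.0, 4.0, 22.0],
})

# Elementwise check: non-strings pass through untouched.
elementwise = df.apply(lambda col: col.map(strip_if_str))
print(elementwise["B"].tolist())  # [2, '2 1', 23]

# .str accessor: non-strings in the mixed column become NaN.
via_accessor = df["B"].str.strip()
print(via_accessor.isna().tolist())  # [True, False, True]
```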

Strip space from all column values in Pandas

Without an example it is not fully clear what you want to accomplish, but maybe the following will help:

import pandas as pd

df = pd.DataFrame({'A ': [1, 2], 'B ': [4, 5], 'C': [8,9]})

The column headers have trailing whitespace:

df.columns
Index([u'A ', u'B ', u'C'], dtype='object')

Now you can use map and strip to get rid of them:

df.columns = df.columns.map(lambda x: x.strip())

or alternatively

df.columns = df.columns.map(str.strip)

or simply (which should be the preferred option)

df.columns = df.columns.str.strip()

If you now call

df.columns

it yields

Index([u'A', u'B', u'C'], dtype='object')
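If you prefer not to assign to df.columns in place, DataFrame.rename also accepts a function and returns a new frame. A small sketch, with one leading space added to the sample headers as my own variation:

```python
import pandas as pd

# Same idea as above, with one header padded on the left for variety.
df = pd.DataFrame({"A ": [1, 2], " B": [4, 5], "C": [8, 9]})

# rename applies str.strip to every column label and returns a copy.
renamed = df.rename(columns=str.strip)

print(list(renamed.columns))  # ['A', 'B', 'C']
print(list(df.columns))       # ['A ', ' B', 'C'] (original is unchanged)
```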

If it is about the values and not the headers, you can also use applymap (deprecated since pandas 2.1 in favor of DataFrame.map):

df = pd.DataFrame({'A': ['1', '2  '], 'B': ['4 ', '5 '], 'C': ['8 ','9']})

A B C
0 1 4 8
1 2 5 9

Then the following gets rid of the trailing white spaces:

df.applymap(lambda x: x.strip())

or alternatively (which is the better option):

df.applymap(str.strip)

A B C
0 1 4 8
1 2 5 9

Note: this assumes that your columns contain only strings.
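To show why that assumption matters, here is a sketch with one integer slipped into column B (my own variation on the sample data). Calling str.strip on a non-string raises TypeError, so the plain version fails, while an isinstance guard gets through. I use Series.map per column here because it behaves the same across pandas versions.

```python
import pandas as pd

# Variation on the sample above: B now contains a real integer.
df = pd.DataFrame({"A": ["1", "2  "], "B": ["4 ", 5]})

# The unguarded version fails as soon as it reaches the integer.
try:
    df.apply(lambda col: col.map(str.strip))
    raised = False
except TypeError:
    raised = True
print(raised)  # True

# Guarded version: only strings are stripped.
safe = df.apply(
    lambda col: col.map(lambda x: x.strip() if isinstance(x, str) else x)
)
print(safe["A"].tolist())  # ['1', '2']
print(safe["B"].tolist())  # ['4', 5]
```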


