Pythonic/Efficient Way to Strip Whitespace from Every Pandas Data Frame Cell That Has a Stringlike Object in It

Is there a way to trim/strip whitespace in multiple columns of a pandas dataframe?

Use DataFrame.apply with a list of columns:

cols = ['col_1', 'col_2', 'col_4']
df[cols] = df[cols].apply(lambda x: x.str.strip())

Or process only the object-dtype columns, which usually (but not always) hold strings:

cols = df.select_dtypes(object).columns
df[cols] = df[cols].apply(lambda x: x.str.strip())
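
One caveat with the object-column approach: an object column can hold non-string values, and the .str accessor returns NaN for those elements instead of leaving them alone. A minimal sketch of the difference, using a made-up Series (the names s, naive and safe are only for illustration):

```python
import pandas as pd

# Hypothetical object-dtype column mixing strings with a non-string value.
s = pd.Series([" a ", 1, " b"], dtype=object)

# The .str accessor yields NaN for elements that are not strings.
naive = s.str.strip()

# Element-wise alternative: strip strings, pass everything else through.
safe = s.map(lambda v: v.strip() if isinstance(v, str) else v)

print(naive.tolist())
print(safe.tolist())  # ['a', 1, 'b']
```

The element-wise version is slower on large frames, so it is probably only worth using when a column really is mixed.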

Remove whitespace from list of strings with pandas/python

If I understand correctly, some of your dataframe cells hold list values.

The file_name.json content is below:

[
    {
        "key1": "value1 ",
        "key2": "2",
        "key3": ["a", "b  2 ", " exp  white   space 210"]
    }, 
    {
        "key1": "value1 ",
        "key2": "2",
        "key3": []
    }
]

A possible solution in this case is the following:

import pandas as pd
import re

df = pd.read_json("file_name.json")


def cleanup_data(value):
    # strip the ends and collapse inner runs of whitespace to one space
    if isinstance(value, list):
        return [re.sub(r'\s+', ' ', x.strip()) for x in value]
    elif isinstance(value, str):
        return re.sub(r'\s+', ' ', value.strip())
    else:
        return value

# apply cleanup function to all cells in dataframe
# (applymap is deprecated since pandas 2.1; use df.map(cleanup_data) there)
df = df.applymap(cleanup_data)

df

Returns

     key1  key2                           key3
0  value1     2  [a, b 2, exp white space 210]
1  value1     2                             []

Strip / trim all strings of a dataframe

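You can use DataFrame.select_dtypes to select the string columns and then apply str.strip to them.

Note: this assumes the selected columns hold only strings. Columns holding dicts or lists also have dtype object, so they get selected too, and .str.strip() will not work as intended on them.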

df_obj = df.select_dtypes(['object'])
print (df_obj)
0    a  
1    c  

df[df_obj.columns] = df_obj.apply(lambda x: x.str.strip())
print (df)

   0   1
0  a  10
1  c   5

But if there are only a few columns, use str.strip on each one directly:

df[0] = df[0].str.strip()

How can I strip the whitespace from Pandas DataFrame headers?

You can give functions to the rename method. The str.strip() method should do what you want:

In [5]: df
Out[5]: 
   Year  Month   Value
0     1       2      3

[1 rows x 3 columns]

In [6]: df.rename(columns=lambda x: x.strip())
Out[6]: 
   Year  Month  Value
0     1      2      3

[1 rows x 3 columns]

Note that this returns a new DataFrame, which is what gets displayed; the columns of the original df are not changed. To keep the changes, either use this in a method chain or re-assign the df variable:

df = df.rename(columns=lambda x: x.strip())
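
If all the headers are strings, assigning through the string accessor on the column index should give the same result as rename. A small sketch with a made-up frame that mirrors the one above:

```python
import pandas as pd

# Hypothetical frame with stray spaces around the header names.
df = pd.DataFrame([[1, 2, 3]], columns=["Year ", " Month", " Value "])

# Index.str.strip() returns a new Index; assign it back to df.columns.
df.columns = df.columns.str.strip()

print(list(df.columns))  # ['Year', 'Month', 'Value']
```

This modifies df in place, so no re-assignment of df itself is needed.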

Removing spaces from a nested list of objects with pandas

We can create a lambda function that strips the spaces from the string values of a dictionary, then map it over every dictionary in the lists of the details column:

strip = lambda d: {k: v.strip() if isinstance(v, str) else v for k, v in d.items()}
df['details'] = df['details'].map(lambda L: [strip(d) for d in L])

Result

>>> df.to_dict('records')

[{'name': 'Book1',
  'details': [{'id': 30278752,
    'isbn': '1594634025',
    'isbn13': '9781594634024',
    'text_reviews_count': 417,
    'work_reviews_count': 3313007,
    'work_text_reviews_count': 109912,
    'average_rating': '3.92'}]},
 {'name': 'Book2',
  'details': [{'id': 34006942,
    'isbn': '1501173219',
    'isbn13': '9781501173219',
    'text_reviews_count': 565,
    'work_reviews_count': 2142280,
    'work_text_reviews_count': 75053,
    'average_rating': '4.33'}]}]
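
For anyone who wants to run this end to end, here is a self-contained sketch. The input frame is an assumption (a trimmed-down guess at what the question's data looked like, with whitespace added), and strip_values is just a named version of the lambda above:

```python
import pandas as pd

# Hypothetical input: each cell of 'details' is a list of dicts
# whose string values carry stray whitespace.
df = pd.DataFrame({
    "name": ["Book1", "Book2"],
    "details": [
        [{"id": 30278752, "isbn": " 1594634025 ", "average_rating": " 3.92"}],
        [{"id": 34006942, "isbn": "1501173219  ", "average_rating": "4.33 "}],
    ],
})

def strip_values(d):
    # Strip string values only; ints and other types pass through unchanged.
    return {k: v.strip() if isinstance(v, str) else v for k, v in d.items()}

df["details"] = df["details"].map(lambda items: [strip_values(d) for d in items])

records = df.to_dict("records")
print(records[0]["details"][0]["isbn"])  # 1594634025
```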

Wanted: function to remove whitespace from column headers that is robust to column headers not being strings

You could use a list comprehension, which is quite unusual when working with Pandas as it's usually more efficient to apply built-in Pandas functions (as you've done). But for something as simple as fixing column names, this should be fine:

df = pd.DataFrame(columns=[1, 2, 'A '])
df.columns = [col.strip() if isinstance(col, str) else col for col in df.columns]

Results:

In [75]: df.columns
Out[75]: Index([1, 2, 'A'], dtype='object')
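
The same isinstance guard should also work with rename, if you prefer something that fits in a method chain. A sketch using the same example frame:

```python
import pandas as pd

df = pd.DataFrame(columns=[1, 2, "A "])

# rename calls the function on every label; non-string labels pass through.
df = df.rename(columns=lambda col: col.strip() if isinstance(col, str) else col)

print(list(df.columns))  # [1, 2, 'A']
```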