Remove White Space from Entire Dataframe

Remove white space from pandas data frame

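Use pd.DataFrame.applymap to clean the cell values and pd.DataFrame.rename to clean the column names. This assumes every cell is a string. Note that applymap is deprecated since pandas 2.1 in favor of pd.DataFrame.map, which takes the same function.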

df.applymap(str.strip).rename(columns=str.strip)

   A    B    C
0  1    2    3
1  4  5 6    7
2  8    9  1 0
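A minimal, self-contained sketch of the same idea. The sample frame is made up for illustration, and the `elementwise` name is just a helper so the snippet runs on both older and newer pandas versions:

```python
import pandas as pd

# Hypothetical sample data: every cell and every column label is a string
# padded with stray spaces.
df = pd.DataFrame(
    {
        " A ": [" 1", "4 ", " 8 "],
        "B ": ["2 ", " 5 6", "9"],
        " C": [" 3", "7 ", " 1 0 "],
    }
)

# DataFrame.map replaced applymap in pandas 2.1; fall back on older versions.
elementwise = df.map if hasattr(df, "map") else df.applymap

# Strip every cell, then strip every column label.
cleaned = elementwise(str.strip).rename(columns=str.strip)
print(cleaned)
```

Only leading and trailing whitespace is removed, so values such as "5 6" keep their inner space.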

To handle values wrapped in single quotes, strip the quotes, strip the whitespace inside them, then put the quotes back:

f = lambda x: "'" + x.strip("'").strip() + "'"
df.applymap(f).rename(columns=f)

   'A'    'B'    'C'
0  '1'    '2'    '3'
1  '4'  '5 6'    '7'
2  '8'    '9'  '1 0'

How to remove excess whitespaces in entire python dataframe columns

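You could use applymap, collapsing each run of whitespace inside a string to a single space and leaving non-string values alone: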

df = df.applymap(lambda x: " ".join(x.split()) if isinstance(x, str) else x)
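To see what the lambda does to a single value, here is a small pure-Python sketch. The function name `collapse_whitespace` and the sample values are made up for illustration:

```python
def collapse_whitespace(value):
    """Collapse whitespace runs in strings; return other values unchanged."""
    if isinstance(value, str):
        # split() with no arguments splits on any whitespace run and drops
        # leading/trailing whitespace, so joining with " " normalizes it.
        return " ".join(value.split())
    return value


samples = ["  too   many\tspaces \n", "clean", 42, None]
result = [collapse_whitespace(v) for v in samples]
print(result)
```

The isinstance check is what keeps numbers and missing values from raising an AttributeError.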

Can't remove spaces from pandas dataframe

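Remove all whitespace from the column, turn the resulting 'nan' strings back into missing values, and convert to numeric. Pass regex=True explicitly, since newer pandas versions treat the pattern as a literal string by default:

import numpy as np

df['gdp_per_capita'] = df['gdp_per_capita'].astype(str).str.replace(r'\s+', '', regex=True).replace('nan', np.nan)
df['gdp_per_capita'] = pd.to_numeric(df['gdp_per_capita'])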
print(df)

                      region  gdp_per_capita
0              Coasts of USA           71546
1  USA: New York, New Jersey           81615
2            USA: California           74205
3           USA: New England           74000
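A runnable sketch of that conversion on made-up values (the sample numbers and the `cleaned` name are assumptions, not the asker's data):

```python
import numpy as np
import pandas as pd

# Hypothetical column: numbers stored as text with stray whitespace,
# plus one missing value.
df = pd.DataFrame({"gdp_per_capita": ["71 546", " 81615", "74\t205", np.nan]})

cleaned = (
    df["gdp_per_capita"]
    .astype(str)                          # missing values may become "nan"
    .str.replace(r"\s+", "", regex=True)  # drop every whitespace run
    .replace("nan", np.nan)               # restore missing values
)
df["gdp_per_capita"] = pd.to_numeric(cleaned)
print(df)
```

Because of the missing value, the resulting column is float rather than integer.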

Removing Whitespace From a Whole Data Frame in R

If I understood you correctly, you want to remove all white space from the entire data frame. The code you are using is fine for removing spaces in the column names. For the values, try this:

 apply(myData,2,function(x)gsub('\\s+', '',x))

Hope this works.

However, this returns a character matrix. To get a data frame back (with every column as character), wrap the call:

as.data.frame(apply(myData,2,function(x)gsub('\\s+', '',x)))

EDIT In 2020:

Using lapply with trimws (whose default is which = "both") removes leading and trailing spaces but not spaces inside the string. Since the OP provided no input data, I am adding a dummy example to produce the results.

DATA:

df <- data.frame(val = c(" abc"," kl m","dfsd "),val1 = c("klm ","gdfs","123"),num=1:3,num1=2:4,stringsAsFactors = FALSE)

# Situation 1 (base R): remove only the leading and trailing spaces, not the spaces inside the string values, using trimws

cols_to_be_rectified <- names(df)[vapply(df, is.character, logical(1))]
# df[cols] (list-style indexing) stays a data frame even when only one column matches
df[cols_to_be_rectified] <- lapply(df[cols_to_be_rectified], trimws)

# Situation 2 (base R): remove spaces everywhere in the character columns of the data frame (inside the strings as well as at the leading and trailing ends)

(This was the solution originally proposed, there using apply. An apply-based version works but is very slow. The question also did not make clear whether the OP wanted to remove only leading/trailing blanks or every blank in the data.)

cols_to_be_rectified <- names(df)[vapply(df, is.character, logical(1))]
# df[cols] (list-style indexing) stays a data frame even when only one column matches
df[cols_to_be_rectified] <- lapply(df[cols_to_be_rectified], function(x) gsub('\\s+', '', x))

## situation: 1 (Using data.table, removing only leading and trailing blanks)

library(data.table)
setDT(df)
cols_to_be_rectified <- names(df)[vapply(df, is.character, logical(1))]
df[,c(cols_to_be_rectified) := lapply(.SD, trimws), .SDcols = cols_to_be_rectified]

Output from situation1:

    val val1 num num1
1:  abc  klm   1    2
2: kl m gdfs   2    3
3: dfsd  123   3    4

## situation: 2 (Using data.table, removing every blank inside as well as leading/trailing blanks)

cols_to_be_rectified <- names(df)[vapply(df, is.character, logical(1))]
df[,c(cols_to_be_rectified) := lapply(.SD, function(x)gsub('\\s+', '', x)), .SDcols = cols_to_be_rectified]

Output from situation2:

    val val1 num num1
1:  abc  klm   1    2
2:  klm gdfs   2    3
3: dfsd  123   3    4

Note the difference between the two outputs. In row 2, trimws removes only the leading and trailing blanks ("kl m"), whereas the regex solution removes every blank ("klm").

I hope this helps. Thanks.
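For readers following the pandas answers on this page, the same trim-versus-remove-all distinction can be sketched in plain Python on the dummy values from the val column. The variable names are made up for illustration:

```python
import re

# Same dummy strings as the "val" column in the R example.
values = [" abc", " kl m", "dfsd "]

# Like trimws: only leading and trailing whitespace is removed.
trimmed = [v.strip() for v in values]

# Like gsub('\\s+', '', x): every whitespace run is removed, inner ones too.
squeezed = [re.sub(r"\s+", "", v) for v in values]

print(trimmed)
print(squeezed)
```

The second element is where the two approaches differ: stripping keeps the inner space, the regex removes it.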

Pandas trim leading & trailing white space in a dataframe

You need to check whether each value is a string, because the columns hold mixed values (numbers alongside strings), and call strip only on the strings:

df = df.applymap(lambda x: x.strip() if isinstance(x, str) else x)
print (df)
                     A    B     C
0                  A b    2   3.0
1                  NaN    2   3.0
2               random   43   4.0
3  any txt is possible  2 1  22.0
4                        23  99.0
5                 help   23   NaN

If every column has a single dtype, you can strip the object columns with the .str accessor instead. With mixed columns, as in your sample, this turns the numeric values in column B into NaN:

cols = df.select_dtypes(['object']).columns
df[cols] = df[cols].apply(lambda x: x.str.strip())
print (df)
                     A    B     C
0                  A b  NaN   3.0
1                  NaN  NaN   3.0
2               random  NaN   4.0
3  any txt is possible  2 1  22.0
4                       NaN  99.0
5                 help  NaN   NaN
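The difference between the two approaches can be reproduced on a small made-up frame. The data and the names `elementwise`, `safe` and `via_str` are assumptions for illustration:

```python
import pandas as pd

# Hypothetical frame: column B mixes integers with a padded string.
df = pd.DataFrame(
    {"A": [" A b ", "random ", " help"], "B": [2, " 2 1 ", 23]}
)

# DataFrame.map replaced applymap in pandas 2.1; fall back on older versions.
elementwise = df.map if hasattr(df, "map") else df.applymap

# Element-wise strip with a type check: numbers pass through untouched.
safe = elementwise(lambda x: x.strip() if isinstance(x, str) else x)

# The .str accessor returns NaN for every non-string element.
via_str = df["B"].str.strip()

print(safe)
print(via_str)
```

So the .str version is fine for columns that hold only strings, while the isinstance check is the safer choice for mixed columns.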

Remove whitespace in a string for multiple dataframe columns

You can use .replace:

df = df.replace(r'([A-Z]+)\s(\d+)', r'\1\2', regex=True)

Or, to only replace in specified columns:

df[['Title', 'Description', 'Numbers']] = df[['Title', 'Description', 'Numbers']].replace(r'([A-Z]+)\s(\d+)', r'\1\2', regex=True)

Here, ([A-Z]+) captures one or more capital letters into Group 1, \s matches a single whitespace character without capturing it, and (\d+) captures one or more digits into Group 2. The replacement \1\2 writes the two captured groups back next to each other, so the whitespace between them is removed.
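The same pattern and replacement can be tried outside pandas with the standard re module. The sample strings below are made up for illustration:

```python
import re

# Same pattern as in the answer: capital letters, one whitespace, digits.
pattern = re.compile(r"([A-Z]+)\s(\d+)")

samples = ["AB 123", "Model XY 7 in stock", "no match here"]

# \1\2 writes both groups back without the whitespace between them.
joined = [pattern.sub(r"\1\2", s) for s in samples]
print(joined)
```

Only whitespace that sits directly between capital letters and digits is touched; all other spaces in the string are kept.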


