Remove white space from pandas data frame
Use pd.DataFrame.applymap to take care of the data and pd.DataFrame.rename to take care of the column names. (In pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map.)
df.applymap(str.strip).rename(columns=str.strip)
   A    B    C
0  1    2    3
1  4  5 6    7
2  8    9  1 0
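The one-liner above can be tried on a small frame. This is a minimal sketch: the sample values and the `elementwise` helper name are assumptions made for illustration, and the `hasattr` check is only there so the snippet runs on pandas versions both before and after `applymap` was deprecated.

```python
import pandas as pd

# Hypothetical sample data: every cell and column label carries stray spaces.
df = pd.DataFrame(
    [[" 1", "2 ", " 3 "], ["4 ", " 5 6", "7"], [" 8", "9", " 1 0 "]],
    columns=[" A", "B ", " C "],
)

# DataFrame.map replaces applymap in pandas >= 2.1; fall back on older versions.
elementwise = df.map if hasattr(df, "map") else df.applymap

# Strip every cell, then strip every column label.
cleaned = elementwise(str.strip).rename(columns=str.strip)
print(cleaned)
```

Note that `str.strip` only works here because every cell is a string; a frame with numbers or NaN needs an `isinstance` check first.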
To account for the single quotes that wrap each value:
f = lambda x: "'" + x.strip("'").strip() + "'"
df.applymap(f).rename(columns=f)
   'A'    'B'    'C'
0  '1'    '2'    '3'
1  '4'  '5 6'    '7'
2  '8'    '9'  '1 0'
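To see what the quote-handling lambda does to a single value, here is the same logic written as a named function and run on a few strings. The function name `requote` and the sample strings are made up for this sketch.

```python
def requote(text):
    """Drop the outer single quotes, trim the padding inside, re-add the quotes."""
    return "'" + text.strip("'").strip() + "'"

# Hypothetical cell values with padding inside the quotes.
samples = ["' 5 6'", "'1 0 '", "' A '"]
cleaned = [requote(s) for s in samples]
print(cleaned)  # ["'5 6'", "'1 0'", "'A'"]
```

This only handles padding inside the quotes. If a value has spaces outside the quotes, `strip("'")` removes nothing and the result ends up double-quoted, so strip the outer whitespace first in that case.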
How to remove excess whitespaces in entire python dataframe columns
You could use applymap, which applies a function to every element:
df = df.applymap(lambda x: " ".join(x.split()) if isinstance(x, str) else x)
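The lambda relies on `str.split()` with no arguments, which splits on any run of whitespace and discards leading and trailing whitespace. A small sketch of that behaviour outside pandas, with a hypothetical helper name and made-up sample values:

```python
def squeeze_spaces(value):
    """Collapse each run of whitespace to one space; pass non-strings through."""
    if isinstance(value, str):
        # split() with no argument splits on any whitespace run and drops
        # leading/trailing whitespace, so join() rebuilds a clean string.
        return " ".join(value.split())
    return value

samples = ["  too   many\tspaces \n", "fine", 42, None]
cleaned = [squeeze_spaces(v) for v in samples]
print(cleaned)  # ['too many spaces', 'fine', 42, None]
```

Tabs and newlines are collapsed as well, not only spaces.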
Can't remove spaces from pandas dataframe
Just this should do:
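import numpy as np
import pandas as pd

# regex=True is required in pandas >= 2.0, where str.replace treats the pattern literally by default
df['gdp_per_capita'] = df['gdp_per_capita'].astype(str).str.replace(r'\s+', '', regex=True).replace('nan', np.nan)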
df['gdp_per_capita'] = pd.to_numeric(df['gdp_per_capita'])
print(df)
                      region  gdp_per_capita
0              Coasts of USA           71546
1  USA: New York, New Jersey           81615
2            USA: California           74205
3           USA: New England           74000
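A self-contained version of the same idea, as a rough sketch: the input frame below is invented for illustration (the question's raw data is not shown here), and `errors="coerce"` is an assumption of mine that turns anything unparseable into NaN instead of raising.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: digits broken up by spaces, plus one missing value.
df = pd.DataFrame({
    "region": ["Coasts of USA", "USA: California", "Unknown"],
    "gdp_per_capita": ["71 546", " 74205 ", np.nan],
})

# Remove every whitespace character; regex=True makes \s+ a pattern.
stripped = df["gdp_per_capita"].astype(str).str.replace(r"\s+", "", regex=True)

# Convert to numbers; values that cannot be parsed become NaN.
df["gdp_per_capita"] = pd.to_numeric(stripped, errors="coerce")
print(df)
```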
Removing Whitespace From a Whole Data Frame in R
If I understood you correctly, you want to remove all the white space from the entire data frame. I guess the code you are using is good for removing spaces in the column names. I think you should try this:
apply(myData,2,function(x)gsub('\\s+', '',x))
Hope this works.
This returns a matrix, however. If you want to change it to a data frame, do:
as.data.frame(apply(myData,2,function(x)gsub('\\s+', '',x)))
EDIT In 2020:
Using lapply with the trimws function (which = "both", the default) removes leading and trailing spaces but not the spaces inside a string. Since the OP provided no input data, I am adding a dummy example to produce the results.
DATA:
df <- data.frame(val = c(" abc"," kl m","dfsd "),val1 = c("klm ","gdfs","123"),num=1:3,num1=2:4,stringsAsFactors = FALSE)
# situation: 1 (Using Base R), when we want to remove spaces only at the leading and trailing ends, NOT inside the string values, we can use trimws
cols_to_be_rectified <- names(df)[vapply(df, is.character, logical(1))]
# df[cols] keeps a data frame even when only one column is selected
df[cols_to_be_rectified] <- lapply(df[cols_to_be_rectified], trimws)
# situation: 2 (Using Base R), when we want to remove spaces everywhere in the character columns of the data frame (inside a string as well as at the leading and trailing ends).
(This was the initial solution, proposed using apply. Please note that a solution using apply seems to work but would be very slow. Also, it is not very clear from the question whether the OP wanted to remove only leading/trailing blanks or every blank in the data.)
cols_to_be_rectified <- names(df)[vapply(df, is.character, logical(1))]
df[cols_to_be_rectified] <- lapply(df[cols_to_be_rectified], function(x) gsub('\\s+', '', x))
## situation: 1 (Using data.table, removing only leading and trailing blanks)
library(data.table)
setDT(df)
cols_to_be_rectified <- names(df)[vapply(df, is.character, logical(1))]
df[,c(cols_to_be_rectified) := lapply(.SD, trimws), .SDcols = cols_to_be_rectified]
Output from situation 1:
    val val1 num num1
1:  abc  klm   1    2
2: kl m gdfs   2    3
3: dfsd  123   3    4
## situation: 2 (Using data.table, removing every blank inside as well as leading/trailing blanks)
cols_to_be_rectified <- names(df)[vapply(df, is.character, logical(1))]
df[,c(cols_to_be_rectified) := lapply(.SD, function(x)gsub('\\s+', '', x)), .SDcols = cols_to_be_rectified]
Output from situation 2:
    val val1 num num1
1:  abc  klm   1    2
2:  klm gdfs   2    3
3: dfsd  123   3    4
Note the difference between the outputs of the two situations. In row 2 you can see that trimws removes only the leading and trailing blanks, while the regex solution removes every blank.
I hope this helps, thanks.
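For readers working in pandas rather than R, the same two situations can be sketched as follows. This is only an illustration that reuses the dummy data from the R answer; the names `text_cols`, `trimmed`, and `squeezed` are made up for the example, and it is not a translation endorsed by the original answer.

```python
import pandas as pd

# Same dummy data as the R example.
df = pd.DataFrame({
    "val": [" abc", " kl m", "dfsd "],
    "val1": ["klm ", "gdfs", "123"],
    "num": [1, 2, 3],
    "num1": [2, 3, 4],
})

# Pick the columns whose values are all strings.
text_cols = [c for c in df.columns
             if df[c].map(lambda v: isinstance(v, str)).all()]

# Situation 1: remove only leading and trailing blanks.
trimmed = df.copy()
trimmed[text_cols] = trimmed[text_cols].apply(lambda s: s.str.strip())

# Situation 2: remove every blank, including those inside the strings.
squeezed = df.copy()
squeezed[text_cols] = squeezed[text_cols].apply(
    lambda s: s.str.replace(r"\s+", "", regex=True)
)

print(trimmed)
print(squeezed)
```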
Pandas trim leading & trailing white space in a dataframe
I think you need to check whether each value is a string, because the columns hold mixed values (numbers together with strings), and call strip only on the strings:
df = df.applymap(lambda x: x.strip() if isinstance(x, str) else x)
print (df)
                     A    B     C
0                  A b    2   3.0
1                  NaN    2   3.0
2               random   43   4.0
3  any txt is possible  2 1  22.0
4                        23  99.0
5                 help   23   NaN
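If each column holds a single dtype, you can use the .str accessor instead. With mixed columns it returns NaN for the non-string values, as happens for the numeric values in column B of your sample: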
cols = df.select_dtypes(['object']).columns
df[cols] = df[cols].apply(lambda x: x.str.strip())
print (df)
                     A    B     C
0                  A b  NaN   3.0
1                  NaN  NaN   3.0
2               random  NaN   4.0
3  any txt is possible  2 1  22.0
4                       NaN  99.0
5                 help  NaN   NaN
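The difference between the two approaches shows up on a single mixed column. A minimal sketch, with made-up values and variable names:

```python
import pandas as pd

# Hypothetical mixed column: two padded strings and one integer.
mixed = pd.Series([" 2 ", 43, " 2 1"], dtype=object)

# The .str accessor yields NaN for anything that is not a string.
via_str = mixed.str.strip()

# An isinstance check strips strings and leaves other values untouched.
via_check = mixed.map(lambda x: x.strip() if isinstance(x, str) else x)

print(via_str.tolist())
print(via_check.tolist())
```

So the element-wise check is the safer choice when a column mixes numbers and text, while `.str.strip()` is fine for columns that contain only strings and missing values.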
Remove whitespace in a string for multiple dataframe columns
You can use .replace with regex=True:
df = df.replace(r'([A-Z]+)\s(\d+)', r'\1\2', regex=True)
Or, to only replace in specified columns:
df[['Title', 'Description', 'Numbers']] = df[['Title', 'Description', 'Numbers']].replace(r'([A-Z]+)\s(\d+)', r'\1\2', regex=True)
Here, ([A-Z]+)\s(\d+) captures one or more capital letters into Group 1 and one or more digits into Group 2. The single whitespace character between them (\s) is matched but not captured. The replacement \1\2 contains only the two backreferences to the captured groups, so the whitespace in between is removed.
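The pattern can be checked on plain strings with Python's `re` module before applying it to a frame. The sample strings below are invented for the sketch.

```python
import re

# Capital letters, one whitespace character, then digits.
pattern = re.compile(r"([A-Z]+)\s(\d+)")

samples = ["AB 123", "order XY 7 shipped", "no match here"]

# \1\2 re-emits both captured groups and drops the whitespace between them.
cleaned = [pattern.sub(r"\1\2", s) for s in samples]
print(cleaned)  # ['AB123', 'order XY7 shipped', 'no match here']
```

Because `\s` matches exactly one character, a value such as "AB  123" with two spaces is left unchanged; use `\s+` if runs of whitespace should be removed too.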