How to fix spaces in column names of a data.frame (remove spaces, inject dots)?
UPDATE (Aug 2022):
df %>% rename_with(make.names)
The old code below still works:
As of Jan 2021, a brief dplyr solution that needs nothing beyond dplyr and magrittr (for the %<>% assignment pipe) is
df %<>% dplyr::rename_all(make.names)
Credit goes to a commenter.
Can I remove whitespace from all column names with dplyr?
As @camille mentions, you can use rename_all:
library(tidyverse)
mpg %>%
rename("tr ans" = trans, "mo del" = model) %>%
rename_all(~str_replace_all(., "\\s+", ""))
Or rename_at with everything():
mpg %>%
rename("tr ans" = trans, "mo del" = model) %>%
rename_at(vars(everything()), ~str_replace_all(., "\\s+", ""))
remove spaces in selected pandas columns at once
Use Series.str.strip, because you are working with Series (columns):
print (df)
   A  B  C  D  E
0  d  d  s  s  a
1  a  a  s  a  r
df[['A','B','D','E']]=df[['A','B','D','E']].apply(lambda x : x.str.strip())
print (df)
   A  B  C  D  E
0  d  d  s  s  a
1  a  a  s  a  r
Your solution should also be possible with DataFrame.applymap (deprecated since pandas 2.1 in favor of DataFrame.map) for element-wise processing:
df[['A','B','D','E']]=df[['A','B','D','E']].applymap(lambda x : x.strip())
Or, if possible, skip the spaces that follow each delimiter when reading the file:
df = pd.read_csv(file, skipinitialspace=True)
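A minimal, self-contained sketch of both ideas above, assuming a small made-up frame (the column names `A`, `B`, `C`, the sample values, and the in-memory CSV are hypothetical). It strips the string cells of selected columns with `Series.str.strip`, then parses a CSV with `skipinitialspace=True`, which only drops spaces that directly follow a delimiter.

```python
import io

import pandas as pd

# Hypothetical sample data with stray leading/trailing spaces.
df = pd.DataFrame({
    "A": [" d ", "a "],
    "B": ["d", " a"],
    "C": [" s ", "s"],  # deliberately left untouched
})

cols = ["A", "B"]
# Apply Series.str.strip column by column; column C keeps its spaces.
df[cols] = df[cols].apply(lambda s: s.str.strip())

# skipinitialspace=True drops spaces right after each delimiter while parsing.
csv_text = "A,B\nd, e\nf, g\n"
parsed = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)
```

Selecting the columns explicitly keeps non-string columns out of the `.str` accessor, which would otherwise raise an error.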
remove and replace spaces in columns for multiple dataframes
Does
import pandas as pd
df_1 = pd.DataFrame(columns = ['iQ Name','Cx Name'] )
df_2 = pd.DataFrame(columns = ['Cn Class'])
df_columns = df_1.columns.tolist() + df_2.columns.tolist()
df_columns = [item.replace(' ', '_') for item in df_columns]
df_columns
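give you the output you are looking for? It concatenates the column names of both frames into one list, replaces the spaces with underscores, and returns the result as a list. Note that it does not rename the columns in the frames themselves.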
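If the goal is to rename the columns inside each frame rather than only build a list of names, one possible sketch is the following. The helper name `underscore_columns` is hypothetical; it relies on `DataFrame.rename` accepting a function for `columns`.

```python
import pandas as pd

# Same frames as in the snippet above.
df_1 = pd.DataFrame(columns=["iQ Name", "Cx Name"])
df_2 = pd.DataFrame(columns=["Cn Class"])


def underscore_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df whose column names use '_' instead of spaces."""
    return df.rename(columns=lambda name: name.replace(" ", "_"))


# Rebind both names to the renamed copies in one pass.
df_1, df_2 = (underscore_columns(df) for df in (df_1, df_2))
```

Returning new frames instead of mutating in place keeps the helper safe to reuse on frames that are shared elsewhere.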
Removing spaces from a column in pandas
You want:
df.loc[df['column'] == 'foo', 'py'].apply(lambda x: x.replace(' ',''))
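Note the loc notation: the row mask and the column label go inside a single pair of brackets. The expression returns a new Series, so assign it back if you want to keep the change.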
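As a sketch of writing the cleaned values back, assuming made-up data (only the column names `column` and `py` come from the snippet above), the same selection can appear on both sides of the assignment. This version uses the vectorized `Series.str.replace` instead of `apply`.

```python
import pandas as pd

# Hypothetical data; 'column' and 'py' mirror the names used above.
df = pd.DataFrame({
    "column": ["foo", "bar", "foo"],
    "py": ["a b", "c d", " e f "],
})

mask = df["column"] == "foo"
# regex=False makes this a literal replacement of every space character.
df.loc[mask, "py"] = df.loc[mask, "py"].str.replace(" ", "", regex=False)
```

Rows where `column` is not `foo` are left unchanged.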
How to remove blank spaces from column names of Spark DataFrame?
You can use the selectExpr or withColumn approaches described below, shown with a full example.
When using selectExpr, column names that contain spaces have to be wrapped in backticks, like this:
"`Device ID` as DeviceId", "`Office Address` as OfficeAddress"
println("selectExpr approach")
val basedf = Seq(
(1, "100abcd", "8100 Memorial Ln Plano Texas")
, (0, "100abcd1", "8100 Memorial Ln Plano Texas")
, (0, "100abcd2", "8100 Memorial Ln Plano Texas")
, (1, "100abcd2", "8100 Memorial Ln Plano Texas")
, (1, "100abcd2", "8100 Memorial Ln Plano Texas")
).toDF("Type", "Device ID", "Office Address")
basedf.show(false)
basedf.selectExpr("Type as type", "`Device ID` as DeviceId", "`Office Address` as OfficeAddress").show(false)
// second example
println("with column approach")
val df1 = basedf
  .withColumn("DeviceID", $"Device ID")
  .withColumn("OfficeAddress", $"Office Address")
  .drop("Device ID", "Office Address")
df1.show(false)
Result :
selectExpr approach
+----+---------+----------------------------+
|Type|Device ID|Office Address |
+----+---------+----------------------------+
|1 |100abcd |8100 Memorial Ln Plano Texas|
|0 |100abcd1 |8100 Memorial Ln Plano Texas|
|0 |100abcd2 |8100 Memorial Ln Plano Texas|
|1 |100abcd2 |8100 Memorial Ln Plano Texas|
|1 |100abcd2 |8100 Memorial Ln Plano Texas|
+----+---------+----------------------------+
+----+--------+----------------------------+
|type|DeviceId|OfficeAddress |
+----+--------+----------------------------+
|1 |100abcd |8100 Memorial Ln Plano Texas|
|0 |100abcd1|8100 Memorial Ln Plano Texas|
|0 |100abcd2|8100 Memorial Ln Plano Texas|
|1 |100abcd2|8100 Memorial Ln Plano Texas|
|1 |100abcd2|8100 Memorial Ln Plano Texas|
+----+--------+----------------------------+
with column approach
+----+--------+----------------------------+
|Type|DeviceID|OfficeAddress |
+----+--------+----------------------------+
|1 |100abcd |8100 Memorial Ln Plano Texas|
|0 |100abcd1|8100 Memorial Ln Plano Texas|
|0 |100abcd2|8100 Memorial Ln Plano Texas|
|1 |100abcd2|8100 Memorial Ln Plano Texas|
|1 |100abcd2|8100 Memorial Ln Plano Texas|
+----+--------+----------------------------+
A generic way that works regardless of which column names contain white space is shown below:
println("Generic column rename approach for n number of Columns")
basedf.printSchema()
var newDf: DataFrame = basedf
newDf.columns.foreach { col =>
  println(col + " after column replace " + col.replaceAll(" ", ""))
  newDf = newDf.withColumnRenamed(col, col.replaceAll(" ", ""))
}
newDf.printSchema()
newDf.show(false)
Result :
Generic column rename approach for n number of Columns
root
|-- Type: integer (nullable = false)
|-- Device ID: string (nullable = true)
|-- Office Address: string (nullable = true)
Type after column replace Type
Device ID after column replace DeviceID
Office Address after column replace OfficeAddress
root
|-- Type: integer (nullable = false)
|-- DeviceID: string (nullable = true)
|-- OfficeAddress: string (nullable = true)
+----+--------+----------------------------+
|Type|DeviceID|OfficeAddress |
+----+--------+----------------------------+
|1 |100abcd |8100 Memorial Ln Plano Texas|
|0 |100abcd1|8100 Memorial Ln Plano Texas|
|0 |100abcd2|8100 Memorial Ln Plano Texas|
|1 |100abcd2|8100 Memorial Ln Plano Texas|
|1 |100abcd2|8100 Memorial Ln Plano Texas|
+----+--------+----------------------------+
Conclusion:
Of these three approaches I prefer the generic one: it needs no per-column code, so it handles a large number of columns without hiccups.
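For PySpark users, the generic idea above might be sketched as follows. The helper `strip_spaces` is hypothetical and is plain Python, so it runs without a Spark session; the duplicate check is an added assumption, guarding against two names collapsing into one (for example "Device ID" and "DeviceID").

```python
from collections import Counter


def strip_spaces(names):
    """Return names with all spaces removed; raise if two names would collide."""
    cleaned = [name.replace(" ", "") for name in names]
    dupes = sorted(n for n, count in Counter(cleaned).items() if count > 1)
    if dupes:
        raise ValueError(f"renaming would create duplicate columns: {dupes}")
    return cleaned


# Column names from the example above.
columns = ["Type", "Device ID", "Office Address"]
new_columns = strip_spaces(columns)

# With a PySpark DataFrame `df`, the whole rename is then one call:
# df = df.toDF(*strip_spaces(df.columns))
```

Passing all names to `toDF` in one call renames every column at once instead of once per loop iteration.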