Remove or Replace Spaces in Column Names

How to fix spaces in column names of a data.frame (remove spaces, inject dots)?

UPDATE 2022 Aug:

df %>% rename_with(make.names)

OLD code (still works, though), as of Jan 2021: a brief dplyr solution that needs no extra libraries beyond magrittr, which provides the %<>% operator:

df %<>% dplyr::rename_all(make.names)

credit goes to commenter.

Can I remove whitespace from all column names with dplyr?

As @camille mentions, you can use rename_all:

library(tidyverse)

mpg %>%
  rename("tr ans" = trans, "mo del" = model) %>%
  rename_all(~str_replace_all(., "\\s+", ""))

Or rename_at with everything():

mpg %>%
  rename("tr ans" = trans, "mo del" = model) %>%
  rename_at(vars(everything()), ~str_replace_all(., "\\s+", ""))
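
For comparison, the same regex-based cleanup can be sketched in pandas. This is only an illustration for readers who are not using dplyr: the frame below is hypothetical sample data that reuses the spaced names "tr ans" and "mo del" from the example above, and it relies on DataFrame.rename accepting a function for its columns argument.

```python
import re

import pandas as pd

# Hypothetical frame with the same spaced names as the dplyr example.
df = pd.DataFrame({"tr ans": ["auto(l5)"], "mo del": ["a4"], "year": [1999]})

# rename calls the function once per column label and returns a new frame.
clean = df.rename(columns=lambda name: re.sub(r"\s+", "", name))
print(clean.columns.tolist())
```

The original frame is left unchanged; only the returned copy carries the cleaned names.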

Remove spaces in selected pandas columns at once

Use Series.str.strip inside apply, because apply passes each selected column to the function as a Series:

print (df)
A B C D E
0 d d s s a
1 a a s a r

df[['A','B','D','E']]=df[['A','B','D','E']].apply(lambda x : x.str.strip())
print (df)
A B C D E
0 d d s s a
1 a a s a r

Your solution should also be possible with DataFrame.applymap for element-wise processing (from pandas 2.1 on, applymap is deprecated in favor of DataFrame.map):

df[['A','B','D','E']]=df[['A','B','D','E']].applymap(lambda x : x.strip())

Or, if possible, skip the spaces after each delimiter while reading the file:

df = pd.read_csv(file, skipinitialspace=True)
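
A minimal, self-contained sketch of the two options above. The sample values and the in-memory CSV text are assumptions made up for illustration; they are not the asker's data.

```python
import io

import pandas as pd

# Hypothetical sample data: values in A, B, D and E carry stray spaces.
df = pd.DataFrame({
    "A": [" d", "a "],
    "B": ["d ", " a"],
    "C": ["s", "s"],
    "D": [" s ", "a"],
    "E": ["a", " r"],
})

cols = ["A", "B", "D", "E"]
# apply hands each selected column to the lambda as a Series,
# so the vectorised .str.strip() is available.
df[cols] = df[cols].apply(lambda x: x.str.strip())

# skipinitialspace=True drops spaces that follow a delimiter while parsing.
csv_text = "A,B\nd, x\na, y\n"
parsed = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)
```

Stripping at read time avoids a second pass over the data, but it only helps with spaces that come right after a delimiter.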

Remove and replace spaces in columns for multiple dataframes

Does

import pandas as pd

df_1 = pd.DataFrame(columns = ['iQ Name','Cx Name'] )
df_2 = pd.DataFrame(columns = ['Cn Class'])

df_columns = df_1.columns.tolist() + df_2.columns.tolist()
df_columns = [item.replace(' ', '_') for item in df_columns]

df_columns

give you the output you are looking for? It concatenates the column names of both dataframes into one list, replaces each space with an underscore, and returns the result as a list.
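
If the goal is to rename the columns on each dataframe rather than to collect the names in one list, a sketch along these lines may help. It reuses df_1 and df_2 from the snippet above; looping over a tuple of frames is an assumption about how the dataframes are held.

```python
import pandas as pd

df_1 = pd.DataFrame(columns=["iQ Name", "Cx Name"])
df_2 = pd.DataFrame(columns=["Cn Class"])

# Assign the cleaned labels back to each frame.
# regex=False treats " " as a literal string, not a pattern.
for frame in (df_1, df_2):
    frame.columns = frame.columns.str.replace(" ", "_", regex=False)

print(df_1.columns.tolist())
print(df_2.columns.tolist())
```

Assigning to frame.columns changes the existing objects, so df_1 and df_2 keep their names and gain the new labels.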

Removing spaces from a column in pandas

You want:

df.loc[df['column'] == 'foo', 'py'].apply(lambda x: x.replace(' ',''))

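Note the loc notation: the row condition comes first, then the column label. The expression only returns the cleaned values; assign the result back if you want to store it.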
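
A small sketch of the same selection with the result written back to the frame. The sample rows are hypothetical; only the column names 'column' and 'py' come from the line above.

```python
import pandas as pd

# Hypothetical frame reusing the column names from the snippet above.
df = pd.DataFrame({
    "column": ["foo", "bar", "foo"],
    "py": ["a b", "c d", " e f "],
})

mask = df["column"] == "foo"
# .loc[row_mask, column_label] picks the cells; assigning stores the result.
df.loc[mask, "py"] = df.loc[mask, "py"].str.replace(" ", "", regex=False)
```

Rows where 'column' is not 'foo' keep their spaces, because the mask excludes them from both sides of the assignment.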

How to remove blank spaces from column names of Spark DataFrame?

You can use the selectExpr or withColumn approaches described below, with a full example.

With selectExpr, column names that contain spaces must be wrapped in backticks, like this:

"`Device ID` as DeviceId", "`Office Address` as OfficeAddress" 
// Assumes spark.implicits._ is in scope (spark-shell imports it automatically)
println("selectExpr approach")

val basedf = Seq(
  (1, "100abcd", "8100 Memorial Ln Plano Texas"),
  (0, "100abcd1", "8100 Memorial Ln Plano Texas"),
  (0, "100abcd2", "8100 Memorial Ln Plano Texas"),
  (1, "100abcd2", "8100 Memorial Ln Plano Texas"),
  (1, "100abcd2", "8100 Memorial Ln Plano Texas")
).toDF("Type", "Device ID", "Office Address")
basedf.show(false)
basedf.selectExpr("Type as type", "`Device ID` as DeviceId", "`Office Address` as OfficeAddress").show(false)
// second example
println("with column approach")
val df1 = basedf
  .withColumn("DeviceID", $"Device ID")
  .withColumn("OfficeAddress", $"Office Address")
  .drop("Device ID", "Office Address")
df1.show(false)

Result :

selectExpr approach
+----+---------+----------------------------+
|Type|Device ID|Office Address |
+----+---------+----------------------------+
|1 |100abcd |8100 Memorial Ln Plano Texas|
|0 |100abcd1 |8100 Memorial Ln Plano Texas|
|0 |100abcd2 |8100 Memorial Ln Plano Texas|
|1 |100abcd2 |8100 Memorial Ln Plano Texas|
|1 |100abcd2 |8100 Memorial Ln Plano Texas|
+----+---------+----------------------------+

+----+--------+----------------------------+
|type|DeviceId|OfficeAddress |
+----+--------+----------------------------+
|1 |100abcd |8100 Memorial Ln Plano Texas|
|0 |100abcd1|8100 Memorial Ln Plano Texas|
|0 |100abcd2|8100 Memorial Ln Plano Texas|
|1 |100abcd2|8100 Memorial Ln Plano Texas|
|1 |100abcd2|8100 Memorial Ln Plano Texas|
+----+--------+----------------------------+

with column approach
+----+--------+----------------------------+
|Type|DeviceID|OfficeAddress |
+----+--------+----------------------------+
|1 |100abcd |8100 Memorial Ln Plano Texas|
|0 |100abcd1|8100 Memorial Ln Plano Texas|
|0 |100abcd2|8100 Memorial Ln Plano Texas|
|1 |100abcd2|8100 Memorial Ln Plano Texas|
|1 |100abcd2|8100 Memorial Ln Plano Texas|
+----+--------+----------------------------+

A generic approach that works regardless of which column names contain whitespace looks like this:

println("Generic column rename approach for n number of Columns")
basedf.printSchema()
// DataFrame comes from org.apache.spark.sql (import it outside spark-shell)
var newDf: DataFrame = basedf
newDf.columns.foreach { col =>
  println(col + " after column replace " + col.replaceAll(" ", ""))
  newDf = newDf.withColumnRenamed(col, col.replaceAll(" ", ""))
}
newDf.printSchema()
newDf.show(false)

Result :

Generic column rename approach for n number of Columns
root
|-- Type: integer (nullable = false)
|-- Device ID: string (nullable = true)
|-- Office Address: string (nullable = true)

Type after column replace Type
Device ID after column replace DeviceID
Office Address after column replace OfficeAddress
root
|-- Type: integer (nullable = false)
|-- DeviceID: string (nullable = true)
|-- OfficeAddress: string (nullable = true)

+----+--------+----------------------------+
|Type|DeviceID|OfficeAddress |
+----+--------+----------------------------+
|1 |100abcd |8100 Memorial Ln Plano Texas|
|0 |100abcd1|8100 Memorial Ln Plano Texas|
|0 |100abcd2|8100 Memorial Ln Plano Texas|
|1 |100abcd2|8100 Memorial Ln Plano Texas|
|1 |100abcd2|8100 Memorial Ln Plano Texas|
+----+--------+----------------------------+

Conclusion:

Of these 3 approaches, I prefer the generic one: if you have a large number of columns, it handles every rename without hiccups and without listing each column by hand.
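
One case the generic loop does not check for is two column names that differ only by spaces, such as "Device ID" and "DeviceID", which collapse to the same name. The sketch below shows the idea in plain Python, without Spark; plan_renames is a hypothetical helper, not part of any Spark API, and the resulting mapping would still have to be applied with withColumnRenamed.

```python
from collections import Counter


def plan_renames(columns):
    """Map each column name to its space-free form, refusing duplicates."""
    mapping = {name: name.replace(" ", "") for name in columns}
    # Count how often each cleaned name occurs; more than once means
    # two source columns would end up with the same name.
    counts = Counter(mapping.values())
    clashes = sorted(new for new, n in counts.items() if n > 1)
    if clashes:
        raise ValueError(f"renaming would create duplicate columns: {clashes}")
    return mapping


renames = plan_renames(["Type", "Device ID", "Office Address"])
print(renames)
```

Building the mapping first keeps the check separate from the renaming itself, so a clash is reported before any dataframe is touched.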


