Specifying Column Names in a Data.Frame Changes Spaces to "."

Specifying column names in a data.frame changes spaces to .

You don't.

With the space you desire the format would not satisfy the requirements for an identifier that come to play when you use df$column.1 -- that could not cope with a space. So see the make.names() function for details or an example:

> make.names(c("Foo Bar", "tic tac"))
[1] "Foo.Bar" "tic.tac"
>

Edit eleven years later: The answer still stands that R prefers column names can be valid variable names. But R is flexible: if you insist you can use the other form _but then need to require the not-otherwise-valid-within-the-language column names explicitly:

> x <- c(1:10,1:10,1:10,1:10,1:10,1:10,1:10,1:10,1:10,1:10)
> df <- data.frame("Label 1"=x,"Label 2"=rnorm(100), check.names=FALSE)
> summary( df$`Label 2` )
Min. 1st Qu. Median Mean 3rd Qu. Max.
-2.2719 -0.7148 -0.0971 -0.0275 0.6559 2.5820
>

So by saying check.names=FALSE we override the default (and sensible) check, and by wrapping the identifier in backticks we can access the column.

Renaming dataframe column names which contain a space

You can use the dplyr function rename_with() to rename all columns that match a certain condition (in this case that it contains a space). In this example I replace the space in the column name with an underscore:

library(dplyr)

df <- data.frame(a = 1:2,
b = LETTERS[1:2],
c = 101:102)
names(df) <- c("a", "b b", "c e f")

df %>%
rename_with(~ gsub(" ","_", .x), contains(" "))

Pandas column access w/column names containing spaces

I think the default way is to use the bracket method instead of the dot notation.

import pandas as pd

df1 = pd.DataFrame({
'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'],
'dat a1': range(7)
})

df1['dat a1']

The other methods, like exposing it as an attribute are more for convenience.

How to deal with spaces in column names?

This is a "bug" in the package ggplot2 that comes from the fact that the function as.data.frame() in the internal ggplot2 function quoted_df converts the names to syntactically valid names. These syntactically valid names cannot be found in the original dataframe, hence the error.

To remind you :

syntactically valid names consists of letters, numbers and the dot or
underline characters, and start with a letter or the dot (but the dot
cannot be followed by a number)

There's a reason for that. There's also a reason why ggplot allows you to set labels using labs, eg using the following dummy dataset with valid names:

X <-data.frame(
PonOAC = rep(c('a','b','c','d'),2),
AgeGroup = rep(c("over 80",'under 80'),each=4),
NumberofPractices = rpois(8,70)
)

You can use labs at the end to make this code work

ggplot(X, aes(x=PonOAC,y=NumberofPractices, fill=AgeGroup)) +
geom_bar() +
facet_grid(AgeGroup~ .) +
labs(x="% on OAC", y="Number of Practices",fill = "Age Group")

To produce

Sample Image

Select columns with spaced heading in R

We can use backquotes to select those unusual names i.e. column names that doesn't start with letters

subset(df, select = c(height, `80% height`))

-output

#   height 80% height
#1 1020 816.0
#2 2053 1642.4
#3 1840 1472.0
#4 3301 2640.8
#5 2094 1675.2

Also, the dplyr use with specifying df twice is not needed. We can have select function from dplyr

library(dplyr)
df %>%
select(height, `80% height`)

-output

#   height 80% height
#1 1020 816.0
#2 2053 1642.4
#3 1840 1472.0
#4 3301 2640.8
#5 2094 1675.2

It may be also better to remove spaces and append a letter for those column names that start with numbers. clean_names from janitor does

library(janitor)
df %>%
clean_names()

Substitute multiple periods in all column names in R

You can use gsub for name replacement

names(df) <- gsub(".", "-", names(df), fixed=TRUE)

Note that you need fixed=TRUE because normally gsub expects regular expressions and . is a special regular expression character.

But be aware that - is a non-standard character for variable names. If you try to use those columns with functions that use non-standard evaluation, you will need to surround the names in back-ticks to use them. For example

dplyr::filter(df, `a-dfs-56`=="a")

Converting Pandas dataframe to dictionary renames column headers with spaces

It can be done

[y.iloc[0,:].to_dict() for x , y in df.groupby(level=0)]
[{'City': 'Seattle', 'Distance (ft)': 1, 'Temp (F)': 10}, {'City': 'Portland', 'Distance (ft)': 2, 'Temp (F)': 20}, {'City': 'Spokane', 'Distance (ft)': 3, 'Temp (F)': 30}, {'City': 'Everett', 'Distance (ft)': 4, 'Temp (F)': 40}, {'City': 'Tacoma', 'Distance (ft)': 5, 'Temp (F)': 50}]

How to fix spaces in column names of a data.frame (remove spaces, inject dots)?

UDPDATE 2022 Aug:

df %>% rename_with(make.names)

OLD code was: (still works though)
as of Jan 2021: drplyr solution that is brief and uses no extra libraries is

df %<>% dplyr::rename_all(make.names)

credit goes to commenter.

Renaming column names in Pandas

Just assign it to the .columns attribute:

>>> df = pd.DataFrame({'$a':[1,2], '$b': [10,20]})
>>> df
$a $b
0 1 10
1 2 20

>>> df.columns = ['a', 'b']
>>> df
a b
0 1 10
1 2 20


Related Topics



Leave a reply



Submit