Specifying column names in a data.frame changes spaces to .
You don't.
With the space you desire the format would not satisfy the requirements for an identifier that come to play when you use df$column.1
-- that could not cope with a space. So see the make.names()
function for details or an example:
> make.names(c("Foo Bar", "tic tac"))
[1] "Foo.Bar" "tic.tac"
>
Edit eleven years later: The answer still stands that R prefers column names can be valid variable names. But R is flexible: if you insist you can use the other form _but then need to require the not-otherwise-valid-within-the-language column names explicitly:
> x <- c(1:10,1:10,1:10,1:10,1:10,1:10,1:10,1:10,1:10,1:10)
> df <- data.frame("Label 1"=x,"Label 2"=rnorm(100), check.names=FALSE)
> summary( df$`Label 2` )
Min. 1st Qu. Median Mean 3rd Qu. Max.
-2.2719 -0.7148 -0.0971 -0.0275 0.6559 2.5820
>
So by saying check.names=FALSE
we override the default (and sensible) check, and by wrapping the identifier in backticks we can access the column.
Renaming dataframe column names which contain a space
You can use the dplyr function rename_with() to rename all columns that match a certain condition (in this case that it contains a space). In this example I replace the space in the column name with an underscore:
library(dplyr)
df <- data.frame(a = 1:2,
b = LETTERS[1:2],
c = 101:102)
names(df) <- c("a", "b b", "c e f")
df %>%
rename_with(~ gsub(" ","_", .x), contains(" "))
Pandas column access w/column names containing spaces
I think the default way is to use the bracket method instead of the dot notation.
import pandas as pd
df1 = pd.DataFrame({
'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'],
'dat a1': range(7)
})
df1['dat a1']
The other methods, like exposing it as an attribute are more for convenience.
How to deal with spaces in column names?
This is a "bug" in the package ggplot2
that comes from the fact that the function as.data.frame()
in the internal ggplot2 function quoted_df
converts the names to syntactically valid names. These syntactically valid names cannot be found in the original dataframe, hence the error.
To remind you :
syntactically valid names consists of letters, numbers and the dot or
underline characters, and start with a letter or the dot (but the dot
cannot be followed by a number)
There's a reason for that. There's also a reason why ggplot allows you to set labels using labs
, eg using the following dummy dataset with valid names:
X <-data.frame(
PonOAC = rep(c('a','b','c','d'),2),
AgeGroup = rep(c("over 80",'under 80'),each=4),
NumberofPractices = rpois(8,70)
)
You can use labs at the end to make this code work
ggplot(X, aes(x=PonOAC,y=NumberofPractices, fill=AgeGroup)) +
geom_bar() +
facet_grid(AgeGroup~ .) +
labs(x="% on OAC", y="Number of Practices",fill = "Age Group")
To produce
Select columns with spaced heading in R
We can use backquotes to select those unusual names i.e. column names that doesn't start with letters
subset(df, select = c(height, `80% height`))
-output
# height 80% height
#1 1020 816.0
#2 2053 1642.4
#3 1840 1472.0
#4 3301 2640.8
#5 2094 1675.2
Also, the dplyr
use with specifying df
twice is not needed. We can have select
function from dplyr
library(dplyr)
df %>%
select(height, `80% height`)
-output
# height 80% height
#1 1020 816.0
#2 2053 1642.4
#3 1840 1472.0
#4 3301 2640.8
#5 2094 1675.2
It may be also better to remove spaces and append a letter for those column names that start with numbers. clean_names
from janitor
does
library(janitor)
df %>%
clean_names()
Substitute multiple periods in all column names in R
You can use gsub
for name replacement
names(df) <- gsub(".", "-", names(df), fixed=TRUE)
Note that you need fixed=TRUE
because normally gsub
expects regular expressions and .
is a special regular expression character.
But be aware that -
is a non-standard character for variable names. If you try to use those columns with functions that use non-standard evaluation, you will need to surround the names in back-ticks to use them. For example
dplyr::filter(df, `a-dfs-56`=="a")
Converting Pandas dataframe to dictionary renames column headers with spaces
It can be done
[y.iloc[0,:].to_dict() for x , y in df.groupby(level=0)]
[{'City': 'Seattle', 'Distance (ft)': 1, 'Temp (F)': 10}, {'City': 'Portland', 'Distance (ft)': 2, 'Temp (F)': 20}, {'City': 'Spokane', 'Distance (ft)': 3, 'Temp (F)': 30}, {'City': 'Everett', 'Distance (ft)': 4, 'Temp (F)': 40}, {'City': 'Tacoma', 'Distance (ft)': 5, 'Temp (F)': 50}]
How to fix spaces in column names of a data.frame (remove spaces, inject dots)?
UDPDATE 2022 Aug:
df %>% rename_with(make.names)
OLD code was: (still works though)
as of Jan 2021: drplyr solution that is brief and uses no extra libraries is
df %<>% dplyr::rename_all(make.names)
credit goes to commenter.
Renaming column names in Pandas
Just assign it to the .columns
attribute:
>>> df = pd.DataFrame({'$a':[1,2], '$b': [10,20]})
>>> df
$a $b
0 1 10
1 2 20
>>> df.columns = ['a', 'b']
>>> df
a b
0 1 10
1 2 20
Related Topics
Marker Mouse Click Event in R Leaflet for Shiny
How to Position Strip Labels in Facet_Wrap Like in Facet_Grid
Formatting Reactive Data.Frames in Shiny
Using Row-Wise Column Indices in a Vector to Extract Values from Data Frame
Practical Limits of R Data Frame
Network Chord Diagram Woes in R
Recommendations for Windows Text Editor for R
Automatically Create Formulas for All Possible Linear Models
Add Secondary X Axis Labels to Ggplot with One X Axis
Subfigures or Subcaptions with Knitr
Deleting Columns from a Data.Frame Where Na Is More Than 15% of the Column Length
Splitting a Data.Frame by a Variable
Time Out an R Command via Something Like Try()