How to Deal with Spaces in Column Names

How to deal with spaces in column names?

This is a "bug" in the package ggplot2 that comes from the fact that the function as.data.frame() in the internal ggplot2 function quoted_df converts the names to syntactically valid names. These syntactically valid names cannot be found in the original dataframe, hence the error.

To remind you :

syntactically valid names consists of letters, numbers and the dot or
underline characters, and start with a letter or the dot (but the dot
cannot be followed by a number)

There's a reason for that. There's also a reason why ggplot allows you to set labels using labs, eg using the following dummy dataset with valid names:

X <-data.frame(
PonOAC = rep(c('a','b','c','d'),2),
AgeGroup = rep(c("over 80",'under 80'),each=4),
NumberofPractices = rpois(8,70)
)

You can use labs at the end to make this code work

ggplot(X, aes(x=PonOAC,y=NumberofPractices, fill=AgeGroup)) +
geom_bar() +
facet_grid(AgeGroup~ .) +
labs(x="% on OAC", y="Number of Practices",fill = "Age Group")

To produce

Sample Image

How to select a column name with a space in MySQL

Generally the first step is to not do that in the first place, but if this is already done, then you need to resort to properly quoting your column names:

SELECT `Business Name` FROM annoying_table

Usually these sorts of things are created by people who have used something like Microsoft Access and always use a GUI to do their thing.

Renaming dataframe column names which contain a space

You can use the dplyr function rename_with() to rename all columns that match a certain condition (in this case that it contains a space). In this example I replace the space in the column name with an underscore:

library(dplyr)

df <- data.frame(a = 1:2,
b = LETTERS[1:2],
c = 101:102)
names(df) <- c("a", "b b", "c e f")

df %>%
rename_with(~ gsub(" ","_", .x), contains(" "))

How do you deal with blank spaces in column names in SQL Server?

select [Response Status Code], [Client Response Status Code]
from TC_Sessions (NOLOCK)
WHERE StartDate BETWEEN '05-15-2012' AND '06-01-2012'
AND SupplyID = 3367

Wrap the names in square brackets.

It is , however, best to avoid spaces in names if possible. It just creates more work for you down the road...

How do you handle column names having spaces in them when using pd.read_clipboard?

What I do in this situation is that I make all my columns two or more spaces apart, then I use sep='\s\s+' for my delimiter, this way when I do have column headings with a single space such as, Col #3 above it treats it as one column.

         A         B     Col #3
NaN NaN NaN
-0.041158 -0.161571 0.329038
0.238156 0.525878 0.110370
0.606738 0.854177 -0.095147
0.200166 0.385453 0.166235

df = pd.read_clipboard(sep='\s\s+')

You do get this warning, but you can ignore it since it as done it right. Or you could put the engine='python' if your OCD gets the best of you. :)

C:\Program
Files\Anaconda3\lib\site-packages\pandas\io\clipboards.py:63:
ParserWarning: Falling back to the 'python' engine because the 'c'
engine does not support regex separators (separators > 1 char and
different from '\s+' are interpreted as regex); you can avoid this
warning by specifying engine='python'. return
read_table(StringIO(text), sep=sep, **kwargs)

print(df)

A B Col #3
0 NaN NaN NaN
1 -0.041158 -0.161571 0.329038
2 0.238156 0.525878 0.110370
3 0.606738 0.854177 -0.095147
4 0.200166 0.385453 0.166235

How to fix spaces in column names of a data.frame (remove spaces, inject dots)?

UDPDATE 2022 Aug:

df %>% rename_with(make.names)

OLD code was: (still works though)
as of Jan 2021: drplyr solution that is brief and uses no extra libraries is

df %<>% dplyr::rename_all(make.names)

credit goes to commenter.

How to use $ in R to select a column with space in its colname?

By using `` as in

df$`Avg Estimates`         

Select columns with spaced heading in R

We can use backquotes to select those unusual names i.e. column names that doesn't start with letters

subset(df, select = c(height, `80% height`))

-output

#   height 80% height
#1 1020 816.0
#2 2053 1642.4
#3 1840 1472.0
#4 3301 2640.8
#5 2094 1675.2

Also, the dplyr use with specifying df twice is not needed. We can have select function from dplyr

library(dplyr)
df %>%
select(height, `80% height`)

-output

#   height 80% height
#1 1020 816.0
#2 2053 1642.4
#3 1840 1472.0
#4 3301 2640.8
#5 2094 1675.2

It may be also better to remove spaces and append a letter for those column names that start with numbers. clean_names from janitor does

library(janitor)
df %>%
clean_names()


Related Topics



Leave a reply



Submit