R: Import CSV with Column Names That Contain Spaces

Unless you specify check.names=FALSE, R converts column names that are not valid variable names (for example, names that contain spaces or special characters, or that start with a number) into valid ones, e.g. by replacing spaces with dots. Try names(s_data) to see the result. If you do use check.names=FALSE, then surround the names with single back-quotes (`) when you refer to them.
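As a minimal sketch of the difference, the snippet below reads the same small CSV twice, once with the default and once with check.names=FALSE. The CSV text and the column names in it are made up for illustration.

```r
# Hypothetical two-column CSV with spaces in the header.
csv_text <- "scada id,main status\n1,ok\n2,fault"

default_names <- read.csv(text = csv_text)                  # check.names = TRUE
raw_names <- read.csv(text = csv_text, check.names = FALSE)

names(default_names)  # "scada.id" "main.status"
names(raw_names)      # "scada id" "main status"

# Non-syntactic names need back-quotes with $, or a string with [[ ]].
raw_names$`main status`
raw_names[["scada id"]]
```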

I would also recommend using rename from the reshape package (or, these days, dplyr::rename).

s_data <- read.csv2( file=f_name )
library(reshape)
# reshape::rename takes a named character vector: c(old_name="new_name", ...)
s_df <- rename(s_data,c(ID="scada_id",
PlantNo="plant",DateTime="date",Main.status="main_code",
Additional.status="seco_code",MainStatustext="main_text",
AddStatustext="seco_test",Duration="duration"))
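If you would rather not load a package, the same old-to-new mapping can be applied in base R. This is only a sketch: the small data frame stands in for the imported file, and only three of the columns are shown.

```r
# Stand-in for the imported data; the names are what R leaves behind
# after the default check.names conversion.
s_data <- data.frame(ID = 1:2, PlantNo = c(10, 11), Main.status = c("a", "b"))

# Lookup vector: old name -> new name (a subset of the mapping, for brevity).
lookup <- c(ID = "scada_id", PlantNo = "plant", Main.status = "main_code")

s_df <- s_data
hits <- names(s_df) %in% names(lookup)          # columns that have a new name
names(s_df)[hits] <- unname(lookup[names(s_df)[hits]])
names(s_df)
# [1] "scada_id"  "plant"     "main_code"
```

Note that dplyr::rename takes the pairs the other way round, as new_name = old_name, e.g. dplyr::rename(s_data, scada_id = ID).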

For what it's worth, the tidyverse tools (i.e. readr::read_csv) have the opposite default; they don't transform the column names to make them legal R symbols unless you explicitly request it.
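A short sketch of that behaviour, assuming readr 2.0 or later is installed (the I() wrapper for literal data and the show_col_types argument belong to that version line). The CSV text is made up for illustration.

```r
# Guarded so the sketch does nothing when readr is not installed.
if (requireNamespace("readr", quietly = TRUE)) {
  csv_text <- "scada id,main status\n1,ok\n2,fault"

  # Default: header names are kept as written, spaces included.
  kept <- readr::read_csv(I(csv_text), show_col_types = FALSE)
  print(names(kept))

  # Explicitly request syntactic names via name_repair.
  repaired <- readr::read_csv(I(csv_text), name_repair = "universal",
                              show_col_types = FALSE)
  print(names(repaired))
}
```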

Import CSV file with spaces in header using read_csv from readr

You could use make.names after you read in the data.

df <- data.frame(x=NA)
colnames(df) <- c("This col name has spaces")
colnames(df) <- make.names(colnames(df), unique=TRUE)

It will return column names with periods rather than spaces as separators.

colnames(df)
[1] "This.col.name.has.spaces"

According to the help page, make.names takes a character vector and returns a character vector of syntactically valid names:

A syntactically valid name consists of letters, numbers and the dot or
underline characters and starts with a letter or the dot not followed
by a number

EDIT: Including an example with special characters.

df <- data.frame(x=NA)
colnames(df) <- c("Higher than 80(°F)")
colnames(df) <- make.names(colnames(df), unique=TRUE)

colnames(df)
[1] "Higher.than.80..F."

As you can see, make.names replaces 'illegal' characters with periods, which prevents syntax errors when you refer to a column by name directly.

If you want to collapse runs of repeated periods into one, add:

colnames(df) <- gsub('(\\.)\\1+', '\\1', colnames(df))
colnames(df)
[1] "Higher.than.80.F."
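The two steps can be wrapped into one function. The sketch below uses a hypothetical helper name, tidy_names, which is not part of base R; it also drops a trailing period, which goes slightly beyond the gsub call above.

```r
# Hypothetical helper: make names syntactic, then tidy up the dots.
tidy_names <- function(x) {
  x <- make.names(x, unique = TRUE)  # invalid characters become "."
  x <- gsub("\\.+", ".", x)          # collapse runs of dots into one
  x <- sub("\\.$", "", x)            # drop a single trailing dot
  make.unique(x)                     # collapsing may have created duplicates
}

tidy_names(c("Rate (%) per day", "2nd value"))
# [1] "Rate.per.day" "X2nd.value"
```

To apply it to a data frame: colnames(df) <- tidy_names(colnames(df)).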

Avoid that space in column name is replaced with period (.) when using read.csv()

If you set check.names=FALSE in read.csv when you read the data in, the names will not be changed and you will not need to edit them before writing the data back out. This of course means that, while editing, you need to quote the column names (with back-quotes in some cases) or refer to the columns by position rather than by name.
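A minimal round trip along those lines, using a temporary file and made-up column names:

```r
# Write a small example file with spaces in the header.
path <- tempfile(fileext = ".csv")
writeLines(c("plant no,main status", "1,ok", "2,fault"), path)

dat <- read.csv(path, check.names = FALSE)

# Edit by back-quoted name or by position.
dat$`main status` <- toupper(dat$`main status`)
dat[[1]] <- dat[[1]] + 100

# The header is written back unchanged (write.csv quotes it by default).
write.csv(dat, path, row.names = FALSE)
readLines(path)[1]
# [1] "\"plant no\",\"main status\""
```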

Read a csv file in sparkR where columns have spaces

The following worked for me:

df = collect(df)
colnames_df<-colnames(df)
colnames_df<-gsub(" ","_",colnames_df)
colnames(df)<-colnames_df
df <- createDataFrame(sqlContext, df)
printSchema(df)

Here we first need to collect the data locally, which converts the Spark data frame to a normal R data frame. I am sceptical whether this is a good solution, as I would rather not call collect. However, I investigated and found that even to use ggplot libraries we need to convert the data into a local data frame.

Read dataframe with different number of spaces in between columns

I think you want to use read.fwf - it allows the reading of data with fixed width columns which is what you appear to have.
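A small sketch of read.fwf on made-up data. The column widths (5, 10 and 2 characters) and the column names are assumptions chosen to match this example file, not values from the question.

```r
# Example fixed-width file: ID in columns 1-5, Name in 6-15, Score in 16-17.
lines <- c("ID   Name      Score",
           "1    Alice     90",
           "22   Bob       85")
path <- tempfile(fileext = ".txt")
writeLines(lines, path)

fixed <- read.fwf(path, widths = c(5, 10, 2), skip = 1,
                  col.names = c("ID", "Name", "Score"),
                  strip.white = TRUE)   # trim the padding around each field
fixed

# If no field contains spaces itself, read.table also copes with a varying
# number of spaces, because its default sep = "" means "any run of whitespace".
loose <- read.table(text = lines, header = TRUE)
```

read.fwf is the safer choice when a field may itself contain spaces, since it splits by position rather than by whitespace.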


