R: Import CSV with column names that contain spaces
Unless you specify check.names=FALSE, R will convert column names that are not valid variable names (e.g. names that contain spaces or special characters, or that start with a number) into valid variable names, e.g. by replacing spaces with dots. Try names(s_data) to see the converted names. If you do use check.names=FALSE, then use single back-quotes (`) to surround the names when you refer to them.
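To make the difference concrete, here is a minimal sketch using an inline CSV. The column names and values are hypothetical and are not taken from the original question.

```r
# Hypothetical two-column CSV, supplied inline so the example is self-contained
csv_text <- "Main status,Plant no\n1,A\n2,B"

# Default behaviour (check.names = TRUE): names are made syntactically valid
d1 <- read.csv(text = csv_text)
names(d1)  # "Main.status" "Plant.no"

# check.names = FALSE keeps the header exactly as written in the file
d2 <- read.csv(text = csv_text, check.names = FALSE)
names(d2)  # "Main status" "Plant no"

# Names with spaces need back-quotes with `$`, or a string with `[[`
d2$`Main status`
d2[["Plant no"]]
```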
I would also recommend using rename from the reshape package (or, these days, dplyr::rename).
s_data <- read.csv2( file=f_name )
library(reshape)
# reshape::rename takes one named vector of the form c(old_name="new_name")
s_df <- rename(s_data,c(ID="scada_id",
    PlantNo="plant",DateTime="date",Main.status="main_code",
    Additional.status="seco_code",MainStatustext="main_text",
    AddStatustext="seco_test",Duration="duration"))
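For the dplyr route mentioned above, note that the argument order is reversed: dplyr::rename expects new_name = old_name. A minimal sketch, assuming dplyr is installed and using a small hypothetical stand-in for s_data:

```r
library(dplyr)

# Hypothetical stand-in for the imported data (names as read.csv2 would repair them)
s_data <- data.frame(ID = 1:2, Main.status = c(10L, 20L))

# dplyr::rename uses new_name = old_name, the reverse of reshape::rename
s_df <- rename(s_data, scada_id = ID, main_code = Main.status)
names(s_df)  # "scada_id" "main_code"
```

If the data were read with check.names=FALSE, the old name would be written in back-quotes instead, e.g. main_code = `Main status`.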
For what it's worth, the tidyverse tools (i.e. readr::read_csv) have the opposite default; they don't transform the column names to make them legal R symbols unless you explicitly request it.
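A short sketch of that default, assuming readr 2.0 or later (needed here for literal data via I(), show_col_types and name_repair). The CSV content is hypothetical.

```r
library(readr)

# Hypothetical inline CSV; I() marks the string as literal data, not a path
csv_text <- "Main status,Plant no\n1,A\n2,B"

# Default: the header is kept as written, spaces included
d <- read_csv(I(csv_text), show_col_types = FALSE)
names(d)

# Explicitly requesting repair yields syntactically valid names
d_fixed <- read_csv(I(csv_text), name_repair = "universal",
                    show_col_types = FALSE)
names(d_fixed)
```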
Import CSV file with spaces in header using read_csv from readr
You could use make.names after you read in the data.
df <- data.frame(x=NA)
colnames(df) <- c("This col name has spaces")
colnames(df) <- make.names(colnames(df), unique=TRUE)
It will return column names with periods rather than spaces as separators.
colnames(df)
[1] "This.col.name.has.spaces"
According to the help page, make.names takes a character vector and returns syntactically valid names:
"A syntactically valid name consists of letters, numbers and the dot or underline characters and starts with a letter or the dot not followed by a number."
EDIT: Including an example with special characters.
df <- data.frame(x=NA)
colnames(df) <- c("Higher than 80(°F)")
colnames(df) <- make.names(colnames(df), unique=TRUE)
colnames(df)
[1] "Higher.than.80..F."
As you can see, make.names replaces 'illegal' characters with periods, to prevent syntax errors when you refer to a column by name.
If you want to collapse runs of repeated periods into a single one, then add:
colnames(df) <- gsub('(\\.)\\1+', '\\1', colnames(df))
colnames(df)
[1] "Higher.than.80.F."
Avoid that space in column name is replaced with period (.) when using read.csv()
If you set check.names=FALSE in read.csv when you read the data in, then the names will not be changed and you will not need to edit them before writing the data back out. This of course means that you would need to quote the column names (with back-quotes in some cases) or refer to the columns by position rather than by name while editing.
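A minimal round-trip sketch of this approach, using a hypothetical inline CSV and a temporary output file:

```r
# Hypothetical input with spaces in the header
csv_text <- "Main status,Plant no\n1,A\n2,B"
d <- read.csv(text = csv_text, check.names = FALSE)

# Edit one column by back-quoted name and another by position
d$`Main status` <- d$`Main status` * 10
d[[2]] <- paste0("plant_", d[[2]])

# Write back out; the header still contains the spaces
out <- tempfile(fileext = ".csv")
write.csv(d, out, row.names = FALSE)
readLines(out, n = 1)
```

By default write.csv wraps the header names in double quotes; pass quote = FALSE if the output should not be quoted.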
Read a csv file in sparkR where columns have spaces
The following worked for me:
df = collect(df)
colnames_df<-colnames(df)
colnames_df<-gsub(" ","_",colnames_df)
colnames(df)<-colnames_df
df <- createDataFrame(sqlContext, df)
printSchema(df)
Here we first need to collect the data locally, which converts the Spark data frame to an ordinary R data frame. I am sceptical whether this is a good solution, as I would rather not call collect. However, I investigated and found that even to use the ggplot libraries we need to convert the data into a local data frame.
Read dataframe with different number of spaces in between columns
I think you want to use read.fwf - it reads data with fixed-width columns, which is what you appear to have.
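As an illustration only, here is a small read.fwf sketch. The field widths, column names and records are hypothetical and would have to be matched to the real file.

```r
# Build hypothetical fixed-width records: 8-char name, 3-char age,
# one separator character, 6-char city
lines <- sprintf("%-8s%3d %-6s",
                 c("Alice", "Bob"), c(30L, 4L), c("Berlin", "Oslo"))
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

# widths gives the size of each field; a negative width skips characters
df <- read.fwf(tmp, widths = c(8, 3, -1, 6),
               col.names = c("name", "age", "city"),
               strip.white = TRUE)
df
```

If the columns are merely separated by a varying amount of whitespace rather than aligned to fixed positions, read.table with its default sep="" may be the simpler choice.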