Why are Xs added to data frame variable names when using read.csv?
read.table and read.csv have a check.names= argument that you can set to FALSE.
For example, try it with this input consisting of just a header:
> read.csv(text = "a,1,b")
[1] a X1 b
<0 rows> (or 0-length row.names)
versus
> read.csv(text = "a,1,b", check.names = FALSE)
[1] a 1 b
<0 rows> (or 0-length row.names)
Why am I getting X. in my column names when reading a data frame?
read.csv() is a wrapper around the more general read.table() function. The latter has an argument check.names, which is documented as:
check.names: logical. If ‘TRUE’ then the names of the variables in the
data frame are checked to ensure that they are syntactically
valid variable names. If necessary they are adjusted (by
‘make.names’) so that they are, and also to ensure that there
are no duplicates.
If your header contains labels that are not syntactically valid, then make.names() will replace each with a valid name based on the invalid one, translating invalid characters to a dot and possibly prepending X:
R> make.names("$Foo")
[1] "X.Foo"
This is documented in ?make.names:
Details:
A syntactically valid name consists of letters, numbers and the
dot or underline characters and starts with a letter or the dot
not followed by a number. Names such as ‘".2way"’ are not valid,
and neither are the reserved words.
The definition of a _letter_ depends on the current locale, but
only ASCII digits are considered to be digits.
The character ‘"X"’ is prepended if necessary. All invalid
characters are translated to ‘"."’. A missing value is translated
to ‘"NA"’. Names which match R keywords have a dot appended to
them. Duplicated values are altered by ‘make.unique’.
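As a quick illustration of the rules quoted above, here is a minimal sketch that runs make.names() on a few header labels. The labels themselves are made up for the example and are not taken from any particular file.

```r
# Hypothetical header labels, chosen to hit each rule in ?make.names.
headers <- c("a", "1", "$Foo", "my col", "if", "a")

# unique = TRUE mirrors what read.table() does when check.names = TRUE.
fixed <- make.names(headers, unique = TRUE)

# Show the original label next to the adjusted one.
print(data.frame(original = headers, adjusted = fixed))
# "1"      -> "X1"     (X is prepended)
# "$Foo"   -> "X.Foo"  (invalid character becomes a dot, X is prepended)
# "my col" -> "my.col" (space becomes a dot)
# "if"     -> "if."    (keyword gets a dot appended)
# 2nd "a"  -> "a.1"    (duplicate altered by make.unique)
```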
The behaviour you are seeing is entirely consistent with the documented way read.table() loads your data. That suggests that you have syntactically invalid labels in the header row of your CSV file. Note the point above from ?make.names that what counts as a letter depends on the locale of your system. For example, the CSV file might include a character that your text editor displays correctly, but if R is not running in the same locale, that character may not be valid there.
I would look at the CSV file and identify any non-ASCII characters in the header line; there may also be non-visible characters (or escape sequences such as \t) in the header row. A lot may be going on between reading in the file with the non-valid names and displaying it in the console, and that might mask the non-valid characters, so don't take the fact that nothing looks wrong when you read without check.names as indicating that the file is OK.
Posting the output of sessionInfo() would also be useful.
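One way to inspect the header line for hidden characters is to read it as plain text and look at its bytes. The following is only a sketch: the file is a temporary one created for the example, and its header deliberately hides a tab inside a label.

```r
# Write a small example file; the header contains an embedded tab.
path <- tempfile(fileext = ".csv")
writeLines(c("id,va\tlue", "1,2"), path)

# Read only the header line, without any name checking.
header <- readLines(path, n = 1)

# Look at the raw bytes: anything outside printable ASCII (32..126)
# is a candidate for a character that make.names() will replace.
bytes <- as.integer(charToRaw(header))
suspicious <- which(bytes < 32 | bytes > 126)

print(charToRaw(header))   # hex dump of the header line
print(suspicious)          # byte positions worth a closer look

unlink(path)
```

Here the only flagged position is the tab; in a real file, non-ASCII letters or a byte order mark would show up the same way.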
Reading CSV in R creates new columns with X
You can try selecting the first three columns after reading the file:
df = df[,c(1:3)]
Avoid that space in column name is replaced with period (.) when using read.csv()
If you set check.names=FALSE in read.csv when you read the data in, then the names will not be changed and you will not need to edit them before writing the data back out. This of course means that you need to quote the column names (with back quotes in some cases) or refer to the columns by location rather than by name while editing.
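A minimal sketch of what working with unchanged names looks like, using an inline two-column input invented for the example:

```r
# Inline data with one name containing a space and one starting with a digit.
df <- read.csv(text = "my col,2nd\n1,2", check.names = FALSE)

names(df)          # names are kept exactly as in the header

# Non-syntactic names need back quotes with $ ...
first_by_name <- df$`my col`

# ... or a quoted string with [[ ]] ...
second_by_name <- df[["2nd"]]

# ... or can be avoided altogether by using the column position.
first_by_position <- df[, 1]
```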
Prevent variable name getting mangled by read.csv/read.table?
This is a BOM (Byte Order Mark) UTF-8 issue.
To prevent this from happening, there are two options:
- Save your file as UTF-8 without a BOM / signature, or
- Use fileEncoding = "UTF-8-BOM" when using read.table or read.csv
Example:
mydata <- read.table(file = "myfile.txt", fileEncoding = "UTF-8-BOM")
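To check whether a file actually starts with a UTF-8 BOM, you can compare its first three bytes with EF BB BF. The sketch below builds a temporary file with a BOM purely for illustration, then reads it with the fileEncoding option from above. How the first column name gets mangled without that option depends on platform and locale, so that case is not shown.

```r
# Build an example CSV that begins with the UTF-8 byte order mark.
path <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
writeBin(c(bom, charToRaw("a,b\n1,2\n")), path)

# Detect the BOM by looking at the first three bytes of the file.
first_bytes <- readBin(path, what = "raw", n = 3)
has_bom <- identical(first_bytes, bom)

# Read the file, asking R to discard the BOM.
fixed <- read.csv(path, fileEncoding = "UTF-8-BOM")
names(fixed)

unlink(path)
```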
R: read.csv adding sub-script X in header
According to @Joshua:
read.csv("filename.csv",check.names=FALSE)
R: Why am I getting an extra column titled X.1 in my dataframe after reading my .txt file?
If all other column names are correct, you probably have a trailing \t in the text file. R tries to include it and gives it the generic column name X.1.
You could try reading the file first as plain text, removing the trailing \t, and only then using read.csv:
file_connection <- file("Objects_Population - AllCells.txt")
content <- readLines(file_connection)
close(file_connection)
Now we try to get rid of these trailing \t characters (this might need some testing to fit your needs):
sanitized <- gsub("\\t$", "", content)
And then we read this sanitized string as if it were a file (using the argument text):
df <- read.csv(text = paste0(sanitized, collapse = "\n"), sep = "\t", skip = 9, header = TRUE, fill = TRUE)
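The effect of a trailing tab can be reproduced on a small inline input. This is a sketch with made-up data rather than the file from the question; because the example has no other column called X, the extra column is named X here instead of X.1.

```r
# Two tab-separated lines, each ending in a stray tab.
content <- c("a\tb\t", "1\t2\t")

# Read as-is: the empty trailing field becomes an extra column.
raw_df <- read.csv(text = paste0(content, collapse = "\n"), sep = "\t")
names(raw_df)

# Strip the trailing tab from every line, then read again.
sanitized <- gsub("\\t$", "", content)
clean_df <- read.csv(text = paste0(sanitized, collapse = "\n"), sep = "\t")
names(clean_df)
```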
R read.csv Importing Column Names Incorrectly
If you set the argument check.names = FALSE in read.csv, then R will not override the names. But these names are not valid in R, and they will have to be handled differently from valid names.
Give column name when read csv file pandas
I'd do it like this:
colnames=['TIME', 'X', 'Y', 'Z']
user1 = pd.read_csv('dataset/1.csv', names=colnames, header=None)