With the R package xlsx, is it possible to set na.strings when reading an Excel file?
No, this is not possible, because read.xlsx has no handling for custom missing-value strings. Such handling could be a useful enhancement to the getCellValue function.
You can either replace the missing-value strings after reading, with something like:
Data[Data=="no info"] <- NA
Or convert your data to CSV and use read.csv, which supports na.strings. Alternatively, as suggested in the comments, use another package that handles missing values.
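As a minimal sketch of the first two options, the snippet below uses a small made-up data frame (the names Data, na_strings, csv_text and FromCsv are illustrative, not from the question) in place of a sheet read with read.xlsx. It assumes the sentinel strings arrive as character values.

```r
# Hypothetical data standing in for a sheet read with read.xlsx
Data <- data.frame(
  id    = c("a", "b", "c"),
  score = c("1.5", "no info", "N/A"),
  stringsAsFactors = FALSE
)

# Option 1: replace a single sentinel string in place
Data[Data == "no info"] <- NA

# Several sentinels at once: test each column with %in%
na_strings <- c("no info", "N/A")
Data[] <- lapply(Data, function(col) {
  col[col %in% na_strings] <- NA
  col
})

# Option 2: the CSV route, where na.strings is supported directly.
# read.csv(text = ...) is used here only to avoid needing a file on disk.
csv_text <- "id,score\na,1.5\nb,no info\nc,N/A"
FromCsv <- read.csv(text = csv_text, na.strings = na_strings)
```

Note that the in-place replacement leaves score as a character column, whereas read.csv can type it as numeric because the sentinels are already treated as missing while parsing.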
Edit: use the XLConnect package.
The more performant XLConnect package handles missing values through its setMissingValue function. The equivalent code can be written as:
library("XLConnect")
wb <- loadWorkbook("my file.xlsx")
setMissingValue(wb, value = "no info")
readWorksheet(wb, sheet = "MyData")
Importing -100 as NA
This appears to be a bug in openxlsx::read.xlsx. I created a small .xlsx document with two columns, num and char, and four rows, then tried reading it with read.xlsx. The na.strings argument does not seem to work well: it drops the last row, which holds two "N/A" values (not desired), and keeps the "-99" values as-is rather than replacing them with NA as desired:
library(openxlsx)
read.xlsx("test.xlsx", na.strings = c("N/A", "-99"))
#   num  char
# 1   1 hello
# 2 -99   -99
# 3   3     3
# for comparison, without na.strings
read.xlsx("test.xlsx")
#   num  char
# 1   1 hello
# 2 -99   -99
# 3   3     3
# 4 N/A   N/A
The readxl package does much better:
library(readxl)
read_excel("test.xlsx", na = "-99")
# # A tibble: 4 x 2
#     num char
#   <dbl> <chr>
# 1     1 hello
# 2    NA NA
# 3     3 3
# 4    NA NA
This was with a freshly installed openxlsx version 4.1.0 and readxl version 1.2.0 (the current version is 1.3.0).
The openxlsx GitHub page has an open issue regarding na.strings, to which I added this example. You can track or comment on the issue there.
Import with xlsx package in R gives NA, NA and empty entries, can't delete NA values
It seems that the empty cells in the Glucose and Acetate sheets are recognized as text, although I am not sure why (Excel is not really my area of expertise).
When I replace the empty cells in a column of the xlsx file with 0 and then delete those 0s again, read.xlsx imports the column as a numeric vector instead of a factor and assigns NA to the empty cells. You can then use data <- data[rowSums(is.na(data))==0,] to remove the rows that contain NAs.
I can't tell you exactly what is going on here, but the above solution seems to work.
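The row filter above can be tried on a small made-up data frame. The column names and values below are hypothetical stand-ins for the imported sheet, and the last part sketches one way to convert a factor column with empty entries to numeric without editing the Excel file, assuming the non-empty entries are valid numbers.

```r
# Hypothetical data with gaps, standing in for the imported sheet
data <- data.frame(
  time    = c(0, 1, 2, 3),
  glucose = c(5.1, NA, 4.2, 3.9),
  acetate = c(0.2, 0.4, NA, 0.9)
)

# Keep only the rows that contain no NA (the approach from the answer)
clean <- data[rowSums(is.na(data)) == 0, ]

# Base R equivalents of the same filter
clean2 <- data[complete.cases(data), ]
clean3 <- na.omit(data)

# A column imported as a factor with empty strings can be converted
# through character; the empty entries then become NA
f <- factor(c("1.2", "", "3"))
f_num <- as.numeric(as.character(f))
```

Converting through as.character matters here: calling as.numeric directly on a factor returns its internal level codes rather than the printed values.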
Using read_excel(na = ) how do you specify more than one NA character string?
As you gathered, read_excel does not accept more than one value (in the released version at the time of writing). Consider using gdata::read.xls instead.
gdata::read.xls("file.xlsx", na.strings = c("N/A", "n/a"))
Edit: Note that you need to have Perl installed to run this. On Windows you may need to specify something like perl="C:/Perl/bin/perl.exe" in the call to read.xls.
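Edit 2: As @r2evans suggested in the comments, the development version of readxl supports multiple na values (later CRAN releases accept a character vector for na as well, so installing from GitHub is only needed with an old version):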
devtools::install_github("tidyverse/readxl")
readxl::read_excel(path = "file.xlsx", na = c("N/A", "n/a"))
write.xlsx in R giving incorrect NA in cell
This can be done in either R or Excel.
Using the xlsx package, you can write NA values as blank cells:
write.xlsx(y, file = "test.xlsx", showNA=FALSE)
In Excel, you can ignore the NA values with an array formula. Remember to press Ctrl+Shift+Enter:
{=SUM(IF(ISNA(A3:D3),0,A3:D3))}
R read xlsx, NA not as characters
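The following works for me:
library(xlsx)
# sheetIndex = 1 is an assumption: xlsx::read.xlsx needs a sheet index or a sheet name
data <- read.xlsx(file = "test.xlsx", sheetIndex = 1, header = TRUE)
data[data == "NA"] <- NA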
data type with read.xlsx in R
You can also try:
df[] <- lapply(df, type.convert, as.is = TRUE)
type.convert attempts to find the appropriate class for each column and converts accordingly. Without the option as.is=TRUE it converts character columns to factors.
It also handles NA strings. The default, na.strings="NA", should be fine for you.
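A short sketch of this on made-up data follows. The data frames df and df2 are hypothetical, and the second call assumes you want extra missing-value markers ("-99" and "N/A" here) rather than the default "NA".

```r
# Hypothetical data frame whose columns all arrived as character
df <- data.frame(
  num  = c("1", "NA", "3"),
  char = c("hello", "NA", "x"),
  stringsAsFactors = FALSE
)

# Re-type every column; "NA" strings become real NA values
df[] <- lapply(df, type.convert, as.is = TRUE)
str(df)

# Custom missing-value markers can be passed through na.strings
df2 <- data.frame(num = c("1", "-99", "N/A"), stringsAsFactors = FALSE)
df2[] <- lapply(df2, type.convert,
                na.strings = c("-99", "N/A"), as.is = TRUE)
```

Assigning to df[] rather than df keeps the data frame structure, so only the column contents are replaced by the converted vectors.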