With the R Package xlsx, How to Set na.strings When Reading an Excel File

With the R package xlsx, is it possible to set na.strings when reading an Excel file?

No, this is not possible, because read.xlsx does not handle special missing values. It could be a possible enhancement for the getCellValue function.

You can either replace the placeholder values after reading, using something like:

 Data[Data=="no info"] <- NA

Or convert your data to CSV and use read.csv, or, as suggested in the comments, use another package that handles missing values.
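
The CSV route can be sketched in base R as follows. This is only a minimal illustration, assuming the sheet has already been exported to CSV; the file contents and the names csv_path and from_csv are made up for the example.

```r
# Write a small CSV to a temporary file
# (hypothetical data standing in for a sheet exported from Excel)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("num,char", "1,hello", "-99,-99", "3,3", "N/A,N/A"), csv_path)

# na.strings accepts a character vector, so several placeholders
# can be mapped to NA in a single call
from_csv <- read.csv(csv_path,
                     na.strings = c("N/A", "-99"),
                     stringsAsFactors = FALSE)

# Inspect the column types and the resulting NA values
str(from_csv)
```

Because the placeholders are removed before type detection, the num column should come back as a numeric type rather than as text.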

Edit: use the XLConnect package

The more performant XLConnect package handles missing values through its setMissingValue function. The equivalent code can be written as:

library("XLConnect")
wb <- loadWorkbook("my file.xlsx")
setMissingValue(wb, value = "no info")
readWorksheet(wb, sheet = "MyData")

Importing -100 as NA

This appears to be a bug in openxlsx::read.xlsx. I created a small .xlsx document with two columns:

[Image: the test spreadsheet, with columns num and char]

Then tried reading it with read.xlsx. The na.strings argument doesn't seem to work very well. It omits the last row with two "N/A" values (not desired) and keeps the "-99" values as-is rather than replacing them with NA as desired:

library(openxlsx)
read.xlsx("test.xlsx", na.strings = c("N/A", "-99"))
#   num  char
# 1   1 hello
# 2 -99   -99
# 3   3     3

# for comparison, without na.strings
read.xlsx("test.xlsx")
#   num  char
# 1   1 hello
# 2 -99   -99
# 3   3     3
# 4 N/A   N/A

The readxl package does much better:

library(readxl)
read_excel("test.xlsx", na = c("N/A", "-99"))
# # A tibble: 4 x 2
#     num char
#   <dbl> <chr>
# 1     1 hello
# 2    NA NA
# 3     3 3
# 4    NA NA

This was using a freshly installed openxlsx version 4.1.0, and readxl version 1.2.0 (current version is 1.3.0).


The openxlsx GitHub page has an open issue regarding na.strings, to which I added this example. You can track or comment on the issue there.

Import with xlsx package in R gives NA, NA and empty entries, can't delete NA values

It seems that the empty cells in the Glucose and Acetate sheets are recognized as text, although I am not sure why (Excel is not really my area of expertise).

When I replace the empty cells in a column of the xlsx file with 0 and then delete those 0s again, read.xlsx imports the column as a numeric vector instead of a factor and assigns NA to the empty cells. You can then use data <- data[rowSums(is.na(data)) == 0, ] to remove the rows that contain NAs.

I can't tell you exactly what is going on here, but the above solution seems to work.
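
The row filter mentioned above can be tried on a small data frame. This is a sketch only; the column names and values are hypothetical and do not come from the original workbook. It also shows complete.cases, a base R function that should select the same rows.

```r
# Toy data frame with some missing measurements (hypothetical values)
data <- data.frame(
  time    = c(0, 1, 2, 3),
  glucose = c(5.1, NA, 4.2, 3.9),
  acetate = c(0.0, 0.4, NA, 1.1)
)

# Keep only rows in which no column is NA: count the NAs per row
# and keep the rows where that count is zero
kept_rowsums <- data[rowSums(is.na(data)) == 0, ]

# Base R alternative: complete.cases() flags rows without any NA
kept_complete <- data[complete.cases(data), ]

nrow(kept_rowsums)
nrow(kept_complete)
```

Note that this drops a row if any column is missing. To filter on a single column instead, something like data[!is.na(data$glucose), ] may be closer to what is wanted.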

Using read_excel(na = ) how do you specify more than one NA character string?

As you gathered, read_excel does not accept more than one value. Consider using gdata::read.xls instead.

gdata::read.xls("file.xlsx", na.strings = c("N/A", "n/a"))

Edit: Note that you need to have Perl installed to run this. If you're on Windows, you may need to specify something like perl="C:/Perl/bin/perl.exe" in the call to read.xls.

Edit 2: As @r2evans suggested in the comments, the development version of readxl supports multiple na values:

devtools::install_github("tidyverse/readxl")
readxl::read_excel(path = "file.xlsx", na = c("N/A", "n/a"))

write.xlsx in R giving incorrect NA in cell

This can be done either in R or in Excel.

Using the xlsx package, you can write NA values as blank cells:

 write.xlsx(y, file = "test.xlsx", showNA=FALSE)

In Excel, you can ignore the NA values with an array formula. Remember to press Ctrl+Shift+Enter:

{=SUM(IF(ISNA(A3:D3),0,A3:D3))}

R read xlsx, NA not as characters

The following works for me

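# assuming xlsx::read.xlsx, which needs a sheet to be specified; sheetIndex = 1 is an assumption
data <- read.xlsx(file = "test.xlsx", sheetIndex = 1, header = TRUE)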
data[data == "NA"] <- NA
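
If more than one placeholder string is involved, the same idea can be applied column by column. The following is a rough sketch on made-up data; the na_tokens vector and the column names are assumptions for illustration, not part of the original answer.

```r
# Toy data standing in for what read.xlsx returned (hypothetical values)
data <- data.frame(
  a = c("1", "NA", "3"),
  b = c("x", "n/a", "NA"),
  stringsAsFactors = FALSE
)

# Placeholder strings to treat as missing (an assumption for this example)
na_tokens <- c("NA", "n/a")

# Replace every matching cell with a real NA, one column at a time
data[] <- lapply(data, function(col) {
  col[col %in% na_tokens] <- NA
  col
})

# Column a now holds only numbers and NA, so it can be converted
data$a <- as.numeric(data$a)

# Count the missing values per column
colSums(is.na(data))
```

Replacing the strings does not change the column type by itself, which is why the conversion step is done separately.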

data type with read.xlsx in R

You can also try

df[] <- lapply(df, type.convert, as.is = TRUE)

type.convert tries to find the appropriate class for each column and converts accordingly. Without as.is = TRUE, it converts character columns to factors. It also handles NA strings; the default na.strings = "NA" should be fine for you.
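
To illustrate, here is a small sketch with a non-default na.strings, assuming every column arrived as character. The data and the chosen placeholder strings are hypothetical.

```r
# Toy data frame whose columns were all read as character (hypothetical values)
df <- data.frame(
  num  = c("1", "-99", "3", "N/A"),
  char = c("hello", "-99", "3", "N/A"),
  stringsAsFactors = FALSE
)

# Convert each column, treating several placeholder strings as missing;
# as.is = TRUE keeps text columns as character instead of factor
df[] <- lapply(df, type.convert,
               na.strings = c("NA", "N/A", "-99"), as.is = TRUE)

# Check the resulting column classes
sapply(df, class)
```

Using df[] on the left-hand side keeps the data frame structure while replacing each column with its converted version.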


