How to read csv file in R where some values contain the percent symbol (%)
There is no "percentage" type in R, so you need to do some post-processing after reading the file:
DF <- read.table(text="actual,simulated,percent error
2.1496,8.6066,-300%
0.9170,8.0266,-775%
7.9406,0.2152,97%
4.9637,3.5237,29%", sep=",", header=TRUE)
# Strip the "%" sign, convert to numeric and rescale to a proportion
DF[,3] <- as.numeric(gsub("%", "", DF[,3]))/100
# actual simulated percent.error
#1 2.1496 8.6066 -3.00
#2 0.9170 8.0266 -7.75
#3 7.9406 0.2152 0.97
#4 4.9637 3.5237 0.29
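The same post-processing can be generalised to every column whose values contain a %. The sketch below is a minimal base-R version under two assumptions: the helper name pct_to_num is hypothetical, and the percent columns come in as character (the read.table() default since R 4.0.0; older versions give factors, which grepl() and gsub() also accept).

```r
# Hypothetical helper: drop the literal "%" and rescale to a proportion
pct_to_num <- function(x) as.numeric(gsub("%", "", x, fixed = TRUE)) / 100

DF <- read.table(text = "actual,simulated,percent error
2.1496,8.6066,-300%
0.9170,8.0266,-775%", sep = ",", header = TRUE)

# Flag the columns in which at least one value contains a "%"
has_pct <- vapply(DF, function(col) any(grepl("%", col, fixed = TRUE)), logical(1))

# Convert only the flagged columns; the numeric ones are left untouched
DF[has_pct] <- lapply(DF[has_pct], pct_to_num)
str(DF)
```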
Reading csv files with R with percentages as X% and varying NA characters
With NAs you don't necessarily need to use a solution involving gsub or one of its relatives. There is an argument na.strings in read.table(), and you can specify several NA strings at the same time. For example, the example table you posted could be read in R with the following command:
test<-read.table("clipboard", header=TRUE, sep="\t", na.strings=c("9", "does not apply"))
That takes the table from the clipboard, and converts both "9" and "does not apply" to NAs in the resulting table:
test
x1 x2 x3
1 1 10% 1
2 2 20% 2
3 3 30% NA
4 NA 40% 4
This works fine, unless some of the columns contain, e.g., "9" as data and others have it meaning NA.
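Reading from "clipboard" depends on the platform, so for a reproducible check the same call can be run on a text= string instead. The input below is made up to match the printed result; where exactly "9" and "does not apply" appeared in the original table is an assumption.

```r
# Made-up tab-separated input: both "9" and "does not apply" mean "missing"
txt <- "x1\tx2\tx3
1\t10%\t1
2\t20%\t2
3\t30%\tdoes not apply
9\t40%\t4"

test <- read.table(text = txt, header = TRUE, sep = "\t",
                   na.strings = c("9", "does not apply"))
test
```

The x2 column still holds the raw strings such as "10%"; na.strings only decides which fields become NA.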
As for the percentage problem, that might be easiest to solve using the gsub method. Another solution might be to define a new coercion function and then specify the colClasses argument in read.table(). Something like this should work:
# New coercion function (setClass() registers the class name used in colClasses)
setClass("num_pct")
setAs("character", "num_pct", function(from) as.numeric(gsub("%", "", from))/100)
# Define column classes for the columns in the table
test<-read.table("clipboard", header=TRUE, sep="\t", na.strings=c("9", "does not apply"),
colClasses=c("character", "num_pct", "character"))
This command now reads in the table with the specified classes for the columns, and converts the percentages in the second column of the table to decimal numbers on the fly.
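A self-contained version of the colClasses approach, assuming the same made-up tab-separated input as a text= string. Here x1 and x3 are declared "integer" rather than "character" (an illustrative choice), and setClass() registers num_pct as a virtual class before setAs() defines the coercion that read.table() applies through as():

```r
library(methods)  # setClass()/setAs(); attached by default in R sessions

# Register the class name, then define how a character column becomes "num_pct"
setClass("num_pct")
setAs("character", "num_pct",
      function(from) as.numeric(gsub("%", "", from, fixed = TRUE)) / 100)

txt <- "x1\tx2\tx3
1\t10%\t1
2\t20%\t2
3\t30%\tdoes not apply
9\t40%\t4"

test <- read.table(text = txt, header = TRUE, sep = "\t",
                   na.strings = c("9", "does not apply"),
                   colClasses = c("integer", "num_pct", "integer"))
str(test)
```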
read.table: percent sign (%) and forward slash (/) in headers replaced by dot (.)
By default, read.table() uses check.names = TRUE to make sure that the column names of the data frame you are importing are syntactically valid. Syntactically valid names cannot contain symbols such as % or / (or others, as defined in make.names()), so these are replaced by dots.
We can, however, override this behavior using check.names = FALSE:
read.table(text = "Subject,Exp1_BSL_SDNN,Exp1_BSL_LF/HF,Exp1_BSL_%LF
s1,123,123,123
s2,123,123,123", sep=",", header=TRUE, check.names = FALSE)
# Subject Exp1_BSL_SDNN Exp1_BSL_LF/HF Exp1_BSL_%LF
#1 s1 123 123 123
#2 s2 123 123 123
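To see what check.names = TRUE would have produced, make.names() can be called directly. Once the original names are kept they are non-syntactic, so they have to be quoted with backticks after $ or passed as strings to [[ ]]. A small sketch (the object name df is arbitrary):

```r
# make.names() is the conversion that check.names = TRUE applies to the header
make.names(c("Exp1_BSL_LF/HF", "Exp1_BSL_%LF"))
# "Exp1_BSL_LF.HF" "Exp1_BSL_.LF"

df <- read.table(text = "Subject,Exp1_BSL_SDNN,Exp1_BSL_LF/HF,Exp1_BSL_%LF
s1,123,123,123
s2,123,123,123", sep = ",", header = TRUE, check.names = FALSE)

# Non-syntactic names: backticks with $, or a plain string with [[ ]]
df$`Exp1_BSL_%LF`
df[["Exp1_BSL_LF/HF"]]
```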
Read csv file in R with currency column as numeric
I'm not sure how to read it in directly, but you can modify it once it's in:
> A <- read.csv("~/Desktop/data.csv")
> A
id desc price
1 0 apple $1.00
2 1 banana $2.25
3 2 grapes $1.97
> A$price <- as.numeric(sub("\\$","", A$price))
> A
id desc price
1 0 apple 1.00
2 1 banana 2.25
3 2 grapes 1.97
> str(A)
'data.frame': 3 obs. of 3 variables:
$ id : int 0 1 2
$ desc : Factor w/ 3 levels "apple","banana",..: 1 2 3
$ price: num 1 2.25 1.97
I think it might just have been a missing escape in your sub. In regular expressions, $ matches the end of the string, whereas \$ matches a literal dollar sign. Because the backslash must itself be escaped inside an R string literal, the pattern has to be written as "\\$".
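A sketch of equivalent ways to write the pattern, using hypothetical price strings. One of them contains a thousands separator, which a single sub("\\$", "", ...) would leave in place, so as.numeric() would return NA for it.

```r
price <- c("$1.00", "$2.25", "$1,234.50")  # hypothetical values

# "\\$" in R source is the regex \$, i.e. a literal dollar sign
no_dollar <- sub("\\$", "", price)

# Inside a bracket expression $ is already literal, so [$,] strips both symbols
clean_regex <- as.numeric(gsub("[$,]", "", price))

# fixed = TRUE skips regex interpretation altogether
clean_fixed <- as.numeric(gsub(",", "", sub("$", "", price, fixed = TRUE), fixed = TRUE))
```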
Reading X%-formatted percentages into R
Here's a dplyr and readr solution (stringr is used to detect the % sign):
library(dplyr) # Version >= 1.0.0
library(readr)
library(stringr)
# `data` is the data frame from the question
data %>%
  mutate(across(where(~any(str_detect(.,"%"))), parse_number))
# A tibble: 3 x 3
name count percentage
<chr> <dbl> <dbl>
1 Alice 4 40
2 Bob 10 65
3 Carol 15 15
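Feel free to replace any with all if you prefer.
A benefit of this approach is that it detects the columns that contain a % and only parses those, so there is no need to know in advance which columns need to be converted. Note that parse_number() keeps the values on the 0 to 100 scale (40 rather than 0.40); divide by 100 if you need proportions.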
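A variant that also rescales to proportions, assuming the input looks like the data frame printed above (the tibble below is reconstructed from that output, not taken from the question). The is.character() check keeps str_detect() away from numeric columns, and na.rm = TRUE stops a missing value from turning the column test into NA:

```r
library(dplyr)    # >= 1.0.0 for across() and where()
library(readr)
library(stringr)

# Assumed input, reconstructed from the printed result
data <- tibble(name = c("Alice", "Bob", "Carol"),
               count = c(4, 10, 15),
               percentage = c("40%", "65%", "15%"))

out <- data %>%
  mutate(across(where(~ is.character(.x) &&
                        any(str_detect(.x, fixed("%")), na.rm = TRUE)),
                ~ parse_number(.x) / 100))
out
```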
How to read data when some numbers contain commas as thousand separator?
I want to use R rather than pre-processing the data as it makes it easier when the data are revised. Following Shane's suggestion of using gsub, I think this is about as neat as I can do:
x <- read.csv("file.csv", header=TRUE, colClasses="character")  # read every column as text
col2cvt <- 15:41  # positions of the columns holding comma-formatted numbers
x[,col2cvt] <- lapply(x[,col2cvt], function(col) as.numeric(gsub(",", "", col)))
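A self-contained sketch of the same idea on a made-up CSV, selecting the columns by name instead of by position (the column names are hypothetical). fixed = TRUE makes gsub() treat the comma literally:

```r
# Made-up CSV: quoted numbers that contain thousands separators
txt <- 'region,sales,units
North,"1,234,567","2,001"
South,"987,654","15"'

x <- read.csv(text = txt, colClasses = "character")  # keep everything as text first
col2cvt <- c("sales", "units")                       # columns to convert, by name
x[col2cvt] <- lapply(x[col2cvt], function(col) as.numeric(gsub(",", "", col, fixed = TRUE)))
str(x)
```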
Read CSV file up to line with unique marker
Three thoughts:
1. Use readLines (as @user2554330 suggested), find/remove the specific row, filter it, then parse the text vector with read.csv, the least of the three.
2. before[seq_len(min(head(which(!grepl("^[^- ]+$", before$Total)),1)-1L,nrow(before))),]; a bit complicated, granted, but it does what you need (assuming that you've already filtered the first 14 rows with skip=).
3. Use an external script such as sed -e '1,14d;/^[ -]\+$/{g;q;}' in a pipe(...)-type thing.
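The first option might look like this in base R. Everything about the file is an assumption for illustration: 14 preamble lines, a header, and a marker row consisting only of dashes or spaces, with the path coming from tempfile():

```r
# Hypothetical file: 14 preamble lines, CSV data, a marker row, then trailing junk
path <- tempfile(fileext = ".csv")
writeLines(c(paste("preamble", 1:14),
             "Name,Total",
             "a,1",
             "b,2",
             "-----",
             "trailing,junk"), path)

lines <- readLines(path)
lines <- lines[-(1:14)]                   # same effect as skip = 14
marker <- grep("^[ -]+$", lines)[1]       # first line made only of spaces/dashes
if (!is.na(marker)) lines <- lines[seq_len(marker - 1L)]
dat <- read.csv(text = lines)             # parse the remaining text vector
dat
```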