Read csv file in R with currency column as numeric
I'm not sure how to read it in directly, but you can modify it once it's in:
> A <- read.csv("~/Desktop/data.csv")
> A
id desc price
1 0 apple $1.00
2 1 banana $2.25
3 2 grapes $1.97
> A$price <- as.numeric(sub("\\$","", A$price))
> A
id desc price
1 0 apple 1.00
2 1 banana 2.25
3 2 grapes 1.97
> str(A)
'data.frame': 3 obs. of 3 variables:
$ id : int 0 1 2
$ desc : Factor w/ 3 levels "apple","banana",..: 1 2 3
$ price: num 1 2.25 1.97
I think the problem was just a missing escape in your sub call. In a regular expression, $ anchors the end of the string, while \$ matches a literal dollar sign. Inside an R string literal the backslash itself has to be escaped, which is why the pattern is written "\\$".
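To make the escaping point concrete, here is a small sketch comparing a few ways of matching a literal dollar sign in base R. The price vector is made up for illustration; the character-class and fixed = TRUE variants are alternatives I am adding, not part of the answer above.

```r
# Hypothetical values, mirroring the price column above
price <- c("$1.00", "$2.25", "$1.97")

# "\\$" in R source is the two-character regex \$, i.e. a literal dollar sign
escaped <- as.numeric(sub("\\$", "", price))

# A character class avoids backslashes entirely
bracketed <- as.numeric(sub("[$]", "", price))

# fixed = TRUE turns regex interpretation off, so "$" is matched literally
literal <- as.numeric(sub("$", "", price, fixed = TRUE))

# An unescaped "$" only matches the empty end of the string, so nothing is removed
unchanged <- sub("$", "", price)
```

All three escaped forms should give the same numeric vector, while the unescaped pattern leaves the strings as they were.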
Importing csv file into R - numeric values read as characters
Whatever algebra you are doing in Excel to create the new column could probably be done more effectively in R.
Please try the following: read the raw file (before any Excel manipulation) into R using read.csv(... stringsAsFactors=FALSE). [If that does not work, please take a look at ?read.table (which read.csv wraps); however, there may be some other underlying issue.]
For example:
delim = "," # or is it "\t" ?
dec = "." # or is it "," ?
myDataFrame <- read.csv("path/to/file.csv", header=TRUE, sep=delim, dec=dec, stringsAsFactors=FALSE)
Then, let's say your numeric column is column 4:
myDataFrame[, 4] <- as.numeric(myDataFrame[, 4]) # you can also refer to the column by "itsName"
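A self-contained sketch of the same steps, assuming a made-up file whose fourth column has one non-numeric entry. The contents are passed inline through the text argument of read.csv rather than a path, and the column names are hypothetical.

```r
# Hypothetical file contents, supplied inline instead of a path
csv_text <- "id,desc,qty,amount
1,apple,3,1.50
2,banana,5,n/a
3,grapes,2,4.25"

df <- read.csv(text = csv_text, header = TRUE, sep = ",", dec = ".",
               stringsAsFactors = FALSE)

# A single non-numeric entry ("n/a") makes the whole column character
class(df$amount)

# Entries that cannot be parsed become NA; look at those rows before converting
bad <- is.na(suppressWarnings(as.numeric(df[, 4])))
df[bad, ]

# Convert once the offending rows are understood
df[, 4] <- suppressWarnings(as.numeric(df[, 4]))
```

Checking which rows turn into NA before overwriting the column is often the quickest way to find out why R read the values as characters in the first place.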
Lastly, if you need help accomplishing in R the same tasks that you've done in Excel, there are plenty of folks here who would be happy to help you out.
How can I strip dollar signs ($) from a data frame in R?
If you only need to remove the $ and do not want to change the class of the columns:
indx <- sapply(data, is.factor)
data[indx] <- lapply(data[indx], function(x)
as.factor(gsub("\\$", "", x)))
If you need numeric columns, you can strip out the , as well (contributed by @David Arenburg) and convert to numeric with as.numeric:
data[indx] <- lapply(data[indx], function(x) as.numeric(gsub("[,$]", "", x)))
You can wrap this in a function
f1 <- function(dat, pat="[$]", Class="factor"){
  indx <- sapply(dat, is.factor)
  if(Class=="factor"){
    dat[indx] <- lapply(dat[indx], function(x) as.factor(gsub(pat, "", x)))
  } else {
    dat[indx] <- lapply(dat[indx], function(x) as.numeric(gsub(pat, "", x)))
  }
  dat
}
f1(data)
f1(data, pat="[,$]", "numeric")
data
set.seed(24)
data <- data.frame(Year=1:6, Prog.Cost= sample(c("-$3,3333", "$0"),
6, replace=TRUE), Total.Benefits= sample(c("$2,155","$2,418",
"$2,312"), 6, replace=TRUE),
stringsAsFactors=TRUE) # needed in R >= 4.0, where strings are no longer factors by default
Parse currency values from CSV, convert numerical suffixes for Million and Billion
We could use gsubfn to replace the 'B', 'M' with 'e+9', 'e+6' and convert to numeric (as.numeric).
is.na(v1) <- v1=='N/A'
options(scipen=999)
library(gsubfn)
as.numeric(gsubfn('([A-Z]|\\$)', list(B='e+9', M='e+6',"$"=""),v1))
#[1] 1200000 3100000000 NA
EDIT: Modified based on @nicola's suggestion
data
v1 <- c('$1.2M', '$3.1B', 'N/A')
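If installing gsubfn is not an option, the same idea can be sketched in base R. This is an alternative I am adding, not part of the answer above: parse_money is a hypothetical helper name, and the multiplier table only covers the M and B suffixes seen in the data.

```r
v1 <- c("$1.2M", "$3.1B", "N/A")

# Hypothetical helper: strip currency symbols, then scale by the suffix
parse_money <- function(x, mult = c(M = 1e6, B = 1e9)) {
  x <- as.character(x)
  x[x %in% "N/A"] <- NA                       # treat the placeholder as missing
  # Last letter of each value; values without a trailing letter come back unchanged
  suffix <- sub(".*([A-Za-z])$", "\\1", x)
  # Use the multiplier when the suffix is known, otherwise leave the number as is
  scale <- ifelse(suffix %in% names(mult), mult[suffix], 1)
  # Keep only digits, decimal point and minus sign before converting
  as.numeric(gsub("[^0-9.-]", "", x)) * scale
}

res <- parse_money(v1)
```

Multiplying by a lookup value instead of pasting 'e+6' into the string makes it straightforward to add further suffixes through the mult argument.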
How to read in numbers with a comma as decimal separator?
When you check ?read.table you will probably find all the answers that you need.
There are two issues with (continental) European csv files:
- What does the c in csv stand for? For standard csv this is a comma (,), for European csv this is a semicolon (;). sep is the corresponding argument in read.table.
- What is the character for the decimal point? For standard csv this is a period (.), for European csv this is a comma (,). dec is the corresponding argument in read.table.
To read standard csv use read.csv, to read European csv use read.csv2. These two functions are just wrappers to read.table that set the appropriate arguments.
If your file does not follow either of these standards, set the arguments manually.
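As a quick illustration, the sketch below reads the same semicolon-separated, comma-decimal data once with read.csv2 and once with read.table and explicit arguments. The data is made up and passed inline through the text argument.

```r
# Hypothetical European-style csv: ";" separates fields, "," is the decimal mark
eu_text <- "id;desc;price
1;apple;1,50
2;banana;2,25"

# Wrapper with the European defaults
via_csv2 <- read.csv2(text = eu_text)

# The same thing spelled out with read.table
via_table <- read.table(text = eu_text, header = TRUE, sep = ";", dec = ",")

str(via_csv2)
```

In both cases the price column should come back as numeric rather than character.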
How to read data when some numbers contain commas as thousand separator?
I want to use R rather than pre-processing the data as it makes it easier when the data are revised. Following Shane's suggestion of using gsub, I think this is about as neat as I can do:
x <- read.csv("file.csv",header=TRUE,colClasses="character")
col2cvt <- 15:41
x[,col2cvt] <- lapply(x[,col2cvt],function(x){as.numeric(gsub(",", "", x))})
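Here is a self-contained version of the same approach that can be run without a file on disk. The inline data and column names are made up, and I select the columns by name instead of by position.

```r
# Hypothetical file contents; values with thousands separators are quoted
csv_text <- 'item,units,revenue
widget,"1,200","15,300.50"
gadget,950,"2,100,000"'

# Read everything as character so nothing is converted prematurely
x <- read.csv(text = csv_text, header = TRUE, colClasses = "character")

# Columns to convert, chosen by name here
col2cvt <- c("units", "revenue")
x[col2cvt] <- lapply(x[col2cvt], function(col) as.numeric(gsub(",", "", col)))

str(x)
```

Selecting by name tends to survive revisions of the file better than a hard-coded range such as 15:41, at the cost of listing the columns explicitly.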
R - identify which columns contain currency data $
Using the dplyr and stringr packages, you can use mutate_if to identify columns that have any string starting with a $ and then change them accordingly.
library(dplyr)
library(stringr)
emp_data %>%
mutate_if(~any(str_detect(., '^\\$'), na.rm = TRUE),
~as.numeric(str_replace_all(., '[$,]', '')))
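For comparison, a base R sketch of the same detect-then-convert idea, with no packages required. The emp_data frame below is invented for illustration, since the original question's data is not shown here.

```r
# Hypothetical data; the real emp_data is not shown in the answer
emp_data <- data.frame(
  name   = c("Ann", "Bob", "Cy"),
  salary = c("$52,000", "$48,500", NA),
  bonus  = c("$1,200.50", "$0", "$300"),
  dept   = c("HR", "IT", "IT"),
  stringsAsFactors = FALSE
)

# A column counts as currency if it is character and any value starts with "$"
is_currency <- vapply(
  emp_data,
  function(col) is.character(col) && any(grepl("^\\$", col)),
  logical(1)
)

# Strip "$" and "," from those columns only, then convert to numeric
emp_data[is_currency] <- lapply(
  emp_data[is_currency],
  function(col) as.numeric(gsub("[$,]", "", col))
)

names(emp_data)[is_currency]
```

Missing values stay NA after the conversion, because grepl returns FALSE for NA and gsub passes NA through unchanged.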