Read CSV File in R with Currency Column as Numeric

I'm not sure how to read it in directly, but you can modify it once it's in:

> A <- read.csv("~/Desktop/data.csv")
> A
  id   desc price
1  0  apple $1.00
2  1 banana $2.25
3  2 grapes $1.97
> A$price <- as.numeric(sub("\\$","", A$price))
> A
  id   desc price
1  0  apple  1.00
2  1 banana  2.25
3  2 grapes  1.97
> str(A)
'data.frame':   3 obs. of  3 variables:
 $ id   : int  0 1 2
 $ desc : Factor w/ 3 levels "apple","banana",..: 1 2 3
 $ price: num  1 2.25 1.97

I think it might just have been a missing escape in your sub. In a regular expression, $ matches the end of the string, while \$ matches a literal dollar sign. Inside an R string the backslash itself has to be escaped, so the pattern is written "\\$".
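
To make the escaping concrete, here is a small sketch. The vector price is made up for illustration and is not read from a file.

```r
price <- c("$1.00", "$2.25", "$1.97")  # illustrative values only

# Unescaped: "$" is the end-of-string anchor, so the strings come back unchanged.
unescaped <- sub("$", "", price)

# Escaped: "\\$" in R source is the regex \$, which matches a literal dollar sign.
escaped <- as.numeric(sub("\\$", "", price))

# fixed = TRUE treats the pattern as plain text, so no escaping is needed.
fixed <- as.numeric(sub("$", "", price, fixed = TRUE))

print(unescaped)
print(escaped)
print(fixed)
```

Whether you escape the pattern or pass fixed = TRUE is mostly a matter of taste; fixed = TRUE is a little harder to get wrong.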

Importing csv file into R - numeric values read as characters

Whatever algebra you are doing in Excel to create the new column could probably be done more effectively in R.

Please try the following: read the raw file (before any Excel manipulation) into R using read.csv(..., stringsAsFactors=FALSE). If that does not work, take a look at ?read.table (which read.csv wraps), although there may be some other underlying issue.

For example:

delim <- ","  # or is it "\t"?
dec <- "."    # or is it ","?
myDataFrame <- read.csv("path/to/file.csv", header=TRUE, sep=delim, dec=dec, stringsAsFactors=FALSE)

Then, let's say your numeric column is column 4:

myDataFrame[, 4] <- as.numeric(myDataFrame[, 4])  # you can also refer to the column by "itsName"
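
Note that as.numeric returns NA (with a warning) for any string it cannot parse, which is usually the reason a column came in as character in the first place. A rough sketch for finding the offending entries, using a made-up vector raw in place of a real column:

```r
# Hypothetical column contents, as they might look after
# read.csv(..., stringsAsFactors = FALSE)
raw <- c("10.5", "$3.20", "1,200", "7", NA)

# Convert, silencing the "NAs introduced by coercion" warning
converted <- suppressWarnings(as.numeric(raw))

# Entries that were present in the input but failed to parse
bad <- raw[is.na(converted) & !is.na(raw)]
print(bad)
```

Looking at bad tells you which characters (currency symbols, thousands separators, stray text) need to be stripped before converting.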


Lastly, if you need any help accomplishing in R the same tasks that you've done in Excel, there are plenty of folks here who would be happy to help you out.

How can I strip dollar signs ($) from a data frame in R?

If you only need to remove the $ and do not want to change the class of the columns:

indx <- sapply(data, is.factor)
data[indx] <- lapply(data[indx], function(x) as.factor(gsub("\\$", "", x)))

If you need numeric columns, you can strip out the , as well (contributed by @David Arenburg) and convert with as.numeric:

data[indx] <- lapply(data[indx], function(x) as.numeric(gsub("[,$]", "", x)))

You can wrap this in a function:

f1 <- function(dat, pat="[$]", Class="factor") {
  indx <- sapply(dat, is.factor)
  if (Class == "factor") {
    dat[indx] <- lapply(dat[indx], function(x) as.factor(gsub(pat, "", x)))
  } else {
    dat[indx] <- lapply(dat[indx], function(x) as.numeric(gsub(pat, "", x)))
  }
  dat
}

f1(data)
f1(data, pat="[,$]", "numeric")
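
One caveat: the code above selects columns with is.factor. Since R 4.0.0, data.frame and read.csv leave strings as character by default, so that test may select nothing. A minimal sketch of the same idea for character columns, using a small made-up data frame dat:

```r
# Illustrative data with character (not factor) columns
dat <- data.frame(Year = 1:3,
                  Prog.Cost = c("-$3,333", "$0", "$0"),
                  Total.Benefits = c("$2,155", "$2,418", "$2,312"),
                  stringsAsFactors = FALSE)

# Select character columns instead of factor columns
indx <- sapply(dat, is.character)

# Strip "$" and "," and convert to numeric
dat[indx] <- lapply(dat[indx], function(x) as.numeric(gsub("[,$]", "", x)))
str(dat)
```

This converts every character column, so it assumes all of them hold currency values; otherwise restrict indx by hand.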

data

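set.seed(24)
# stringsAsFactors=TRUE keeps the string columns as factors on R >= 4.0.0,
# which the is.factor() test above relies on
data <- data.frame(Year=1:6,
                   Prog.Cost=sample(c("-$3,3333", "$0"), 6, replace=TRUE),
                   Total.Benefits=sample(c("$2,155", "$2,418", "$2,312"), 6, replace=TRUE),
                   stringsAsFactors=TRUE)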

Parse currency values from CSV, convert numerical suffixes for Million and Billion

We could use gsubfn to replace the 'B', 'M' with 'e+9', 'e+6' and convert to numeric (as.numeric).

is.na(v1) <- v1=='N/A'
options(scipen=999)
library(gsubfn)
as.numeric(gsubfn('([A-Z]|\\$)', list(B='e+9', M='e+6',"$"=""),v1))
#[1] 1200000 3100000000 NA

EDIT: Modified based on @nicola's suggestion

data

v1 <- c('$1.2M', '$3.1B', 'N/A')
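
If you would rather not depend on gsubfn, roughly the same result can be had in base R. This is only a sketch: parse_money is a hypothetical helper name, and it assumes the suffix is a single trailing M or B.

```r
v1 <- c("$1.2M", "$3.1B", "N/A")

# Hypothetical helper: turn "$1.2M"-style strings into numbers
parse_money <- function(x) {
  x[x %in% "N/A"] <- NA        # treat the literal "N/A" as missing
  x <- gsub("[$,]", "", x)     # drop dollar signs and thousands separators
  x <- sub("M$", "e6", x)      # trailing M -> millions
  x <- sub("B$", "e9", x)      # trailing B -> billions
  as.numeric(x)
}

res <- parse_money(v1)
print(res)
```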

How to read in numbers with a comma as decimal separator?

When you check ?read.table you will probably find all the answers you need.

There are two issues with (continental) European csv files:

  1. What does the c in csv stand for? For standard csv this is a comma (,); for European csv it is a semicolon (;).
    sep is the corresponding argument in read.table.
  2. What is the character for the decimal point? For standard csv this is a period (.); for European csv it is a comma (,).
    dec is the corresponding argument in read.table.

To read standard csv use read.csv, to read European csv use read.csv2. These two functions are just wrappers to read.table that set the appropriate arguments.

If your file does not follow either of these standards set the arguments manually.
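
As a quick illustration, here is a sketch that reads a tiny European-style csv from a string instead of a file. The contents of txt are made up.

```r
# Illustrative European-style CSV: ";" separates fields, "," is the decimal mark
txt <- "id;price\n1;1,50\n2;2,25"

# read.csv2 defaults to sep = ";" and dec = ","
eu <- read.csv2(text = txt)

# The same thing with the arguments set by hand in read.table
manual <- read.table(text = txt, header = TRUE, sep = ";", dec = ",")

str(eu)
```

In both cases price should come back as a numeric column rather than as character.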

How to read data when some numbers contain commas as thousand separator?

I want to use R rather than pre-processing the data as it makes it easier when the data are revised. Following Shane's suggestion of using gsub, I think this is about as neat as I can do:

x <- read.csv("file.csv",header=TRUE,colClasses="character")
col2cvt <- 15:41
x[,col2cvt] <- lapply(x[,col2cvt],function(x){as.numeric(gsub(",", "", x))})
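
Another option is to do the conversion while reading, by registering a custom class for colClasses. The sketch below assumes the methods package is available; the class name num.with.commas and the sample text are made up for illustration.

```r
library(methods)

# Register a class and a character -> number conversion that drops commas
setClass("num.with.commas")
setAs("character", "num.with.commas",
      function(from) as.numeric(gsub(",", "", from)))

# Illustrative input; the quoted fields contain thousands separators
txt <- 'id,amount\n1,"1,234"\n2,"5,678.5"'

x <- read.csv(text = txt, colClasses = c("integer", "num.with.commas"))
str(x)
```

This keeps the cleanup in one place, at the cost of having to list the class of every column (or use NA for columns that should be guessed).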

R - identify which columns contain currency data $

Using the dplyr and stringr packages, you can use mutate_if to identify columns that have any string starting with a $ and then change them accordingly.

library(dplyr)
library(stringr)

emp_data %>%
  mutate_if(~any(str_detect(., '^\\$'), na.rm = TRUE),
            ~as.numeric(str_replace_all(., '[$,]', '')))
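
For comparison, a base R sketch of the same idea without dplyr or stringr. emp_data is not shown in the answer, so the data frame emp below is a made-up stand-in.

```r
# Illustrative stand-in for emp_data
emp <- data.frame(name = c("Ann", "Bob"),
                  salary = c("$52,000", "$48,500.50"),
                  bonus = c("$1,000", NA),
                  stringsAsFactors = FALSE)

# A column counts as currency if any value starts with "$"
# (grepl returns FALSE for NA, so missing values are ignored)
is_currency <- vapply(emp, function(col) any(grepl("^\\$", col)), logical(1))

# Strip "$" and "," from those columns and convert to numeric
emp[is_currency] <- lapply(emp[is_currency],
                           function(col) as.numeric(gsub("[$,]", "", col)))
str(emp)
```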

