Convert a Character Vector of Mixed Numbers, Fractions, and Integers to Numeric

Anything less "hackish" will have to parse your inputs and match them to a number of pre-defined patterns. I came up with this:

mixedToFloat <- function(x){
  is.integer  <- grepl("^\\d+$", x)
  is.fraction <- grepl("^\\d+\\/\\d+$", x)
  is.mixed    <- grepl("^\\d+ \\d+\\/\\d+$", x)
  stopifnot(all(is.integer | is.fraction | is.mixed))

  numbers <- strsplit(x, "[ /]")

  ifelse(is.integer,  as.numeric(sapply(numbers, `[`, 1)),
  ifelse(is.fraction, as.numeric(sapply(numbers, `[`, 1)) /
                      as.numeric(sapply(numbers, `[`, 2)),
                      as.numeric(sapply(numbers, `[`, 1)) +
                      as.numeric(sapply(numbers, `[`, 2)) /
                      as.numeric(sapply(numbers, `[`, 3))))
}

mixedToFloat(c('1 1/2', '2 3/4', '2/3', '11 1/4', '1'))
# [1] 1.5000000 2.7500000 0.6666667 11.2500000 1.0000000
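
As a more compact variant of the same idea, the three patterns can be validated with one regular expression and then dispatched on the number of pieces left after splitting. This is only a sketch: the name mixed_to_float and the combined pattern are assumptions made for illustration, not part of the answer above.

```r
# Hypothetical variant: validate once, then branch on the number of pieces.
mixed_to_float <- function(x) {
  # integer, fraction, or "whole numerator/denominator"
  ok <- grepl("^(\\d+|\\d+/\\d+|\\d+ \\d+/\\d+)$", x)
  stopifnot(all(ok))

  vapply(strsplit(x, "[ /]"), function(p) {
    p <- as.numeric(p)
    switch(length(p),
           p[1],                  # 1 piece:  integer
           p[1] / p[2],           # 2 pieces: fraction
           p[1] + p[2] / p[3])    # 3 pieces: mixed number
  }, numeric(1))
}

mixed_to_float(c("1 1/2", "2/3", "11 1/4", "1"))
```

Inputs outside the three patterns, such as "1.5", stop with an error instead of silently producing NA.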

R - Converting Fractions in Text to Numeric

You can try converting the Unicode characters to ASCII directly when reading the table, by passing a custom element function (elFun):

library(XML)
library(stringi)
readHTMLTable(url, which = 1, header = FALSE, stringsAsFactors = FALSE,
              elFun = function(node) {
                val <- xmlValue(node)
                stri_trans_general(val, "latin-ascii")
              })

You can then use @Metrics' suggestion to convert it to numbers.

For example, using @G. Grothendieck's function from this post, you could clean up the Arms data:

library(XML)
library(stringi)
library(gsubfn)
# the calc function is by @G. Grothendieck
calc <- function(s) {
  x <- c(if (length(s) == 2) 0, as.numeric(s), 0:1)
  x[1] + x[2] / x[3]
}

url <- "http://mockdraftable.com/players/2014/"

combine <- readHTMLTable(url, which = 1, header = FALSE, stringsAsFactors = FALSE,
                         elFun = function(node) {
                           val <- xmlValue(node)
                           stri_trans_general(val, "latin-ascii")
                         })

names(combine) <- c("Name", "Pos", "Hght", "Wght", "Arms", "Hands",
                    "Dash40yd", "Dash20yd", "Dash10yd", "Bench", "Vert", "Broad",
                    "Cone3", "ShortShuttle20")

sapply(strapplyc(gsub('\"', "", combine$Arms), "\\d+"), calc)

#[1] 30.000 31.500 30.000 31.750 31.875 29.875 31.000 31.000 30.250 33.000 32.500 31.625 32.875

There might be some encoding issues depending on your machine (see the comments).
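
The padding trick inside calc is easy to miss, so here is a small trace of it. This sketch uses base R's gregexpr and regmatches as a stand-in for strapplyc, and the arms vector is made-up sample data, not values from the site.

```r
# calc as given above (by @G. Grothendieck)
calc <- function(s) {
  x <- c(if (length(s) == 2) 0, as.numeric(s), 0:1)
  x[1] + x[2] / x[3]
}

# Hypothetical inputs: an integer, a mixed number, and a plain fraction
arms <- c("30", "31 1/2", "7/8")

# Base R stand-in for strapplyc(arms, "\\d+"): all digit runs per element
digits <- regmatches(arms, gregexpr("\\d+", arms))

# How x is padded before x[1] + x[2] / x[3] is evaluated:
#   "30"          -> c(30, 0, 1)        -> 30 + 0/1
#   "31" "1" "2"  -> c(31, 1, 2, 0, 1)  -> 31 + 1/2
#   "7" "8"       -> c(0, 7, 8, 0, 1)   -> 0 + 7/8
sapply(digits, calc)
```

The trailing 0:1 supplies a zero fractional part for plain integers, and the leading 0 supplies a zero whole part when only a numerator and denominator are present.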

R: Convert Fraction to Decimal within a String

One option is gsubfn, applied here to the input vector ef:

gsubfn("(\\d+) (\\d+)", ~ as.numeric(x) + as.numeric(y),
       gsubfn("(\\d+)/(\\d+)", ~ as.numeric(x)/as.numeric(y), ef))
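
For readers without gsubfn, roughly the same replacement can be sketched in base R with gregexpr and the regmatches<- replacement function. The function name frac_to_decimal and the sample string are hypothetical. Unlike the two nested gsubfn calls, this version uses a single pattern that covers both mixed numbers and plain fractions.

```r
# Hypothetical base R sketch: replace "a b/c" or "b/c" inside a string
# with its decimal value.
frac_to_decimal <- function(s) {
  m <- gregexpr("(\\d+ )?\\d+/\\d+", s)
  regmatches(s, m) <- lapply(regmatches(s, m), function(hits) {
    vapply(strsplit(hits, "[ /]"), function(p) {
      p <- as.numeric(p)
      value <- if (length(p) == 3) p[1] + p[2] / p[3] else p[1] / p[2]
      as.character(value)
    }, character(1))
  })
  s
}

frac_to_decimal("add 1 1/2 cups and 3/4 tsp")
```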

How to convert a symmetrical character matrix to a numerical data frame?

If you want the fractions represented as numeric values, you can use eval together with parse (as described in the link that @SymbolixAU gave you).

Here is a matrix with numeric entries:

MYmatrix02 <- matrix(sapply(MYmatrix, function(x) eval(parse(text = x))),
                     nrow = nrow(MYmatrix), dimnames = dimnames(MYmatrix))

> MYmatrix02
     t534 t535 t830
t534  0.0  0.2  0.2
t535  0.2  0.0  0.3
t830  0.2  0.3  0.0

Or if you want a data frame:

MYdataframe <- as.data.frame(MYmatrix02)
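
Because eval(parse(text = ...)) will run whatever R code the strings contain, it may be worth avoiding when the input is not trusted. A minimal alternative, assuming every cell is either an integer or a simple "a/b" fraction, is to split on the slash. The matrix m_chr and the helper frac_to_num below are hypothetical stand-ins for illustration.

```r
# Hypothetical symmetric character matrix holding fraction strings
ids <- c("t534", "t535", "t830")
m_chr <- matrix(c("0",   "1/5",  "1/5",
                  "1/5", "0",    "3/10",
                  "1/5", "3/10", "0"),
                nrow = 3, byrow = TRUE, dimnames = list(ids, ids))

# Convert "a/b" or "a" to a number without evaluating it as code
frac_to_num <- function(s) {
  parts <- as.numeric(strsplit(s, "/", fixed = TRUE)[[1]])
  if (length(parts) == 2) parts[1] / parts[2] else parts[1]
}

m_num <- matrix(vapply(m_chr, frac_to_num, numeric(1)),
                nrow = nrow(m_chr), dimnames = dimnames(m_chr))
m_num
```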

Extract numeric part of strings of mixed numbers and characters in R

Using gsub or sub you can do this:

gsub('.*-([0-9]+).*','\\1','Ab_Cd-001234.txt')
"001234"

You can also use gregexpr with regmatches:

m <- gregexpr('[0-9]+','Ab_Cd-001234.txt')
regmatches('Ab_Cd-001234.txt',m)
[[1]]
[1] "001234"

EDIT: both methods are vectorized and work on a vector of strings.

x <- c('Ab_Cd-001234.txt','Ab_Cd-001234.txt')
sub('.*-([0-9]+).*','\\1',x)
"001234" "001234"

m <- gregexpr('[0-9]+',x)
regmatches(x,m)
[[1]]
[1] "001234"

[[2]]
[1] "001234"
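
The approaches differ once a string contains more than one run of digits. The sketch below uses made-up file names to show the difference between the anchored sub pattern, regexpr (first match only) and gregexpr (all matches), and then converts the result to numeric.

```r
# Hypothetical file names; the second one has digits before the hyphen
x <- c("Ab_Cd-001234.txt", "v2_Ef-000042.txt")

# sub with the '-' anchor keeps only the digits after the hyphen
after_hyphen <- sub(".*-([0-9]+).*", "\\1", x)

# regexpr returns the first run of digits per string (a character vector)
first_match <- regmatches(x, regexpr("[0-9]+", x))

# gregexpr returns every run of digits per string (a list)
all_matches <- regmatches(x, gregexpr("[0-9]+", x))

# as.numeric drops the leading zeros
as.numeric(after_hyphen)
```

Note that regmatches with regexpr drops strings that have no match at all, so the result can be shorter than the input.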

r - data.table 1.10.0 - why does a named column index value not work while an integer column index value works without with = FALSE

The long note 3 in v1.9.8 NEWS starts:

When j contains no unquoted variable names (whether column names or not), with= is now automatically set to FALSE. Thus ...

But your j does contain an unquoted variable name. In fact, it is solely an unquoted variable name. So that item does not apply to it.

That's what the options(datatable.WhenJisSymbolThenCallingScope=TRUE) was about so you could try out the new feature going forward. Please read that same NEWS item about that again. If you set that option, it will work as you expected it to.

HOWEVER, please don't: yesterday I changed it, and in development that option is now gone. A migration timeline is no longer needed. The new strategy needs no code changes and causes no breakage. Please see the new notes in the latest development NEWS for v1.10.1. I won't copy them here, to avoid duplication.

So going forward, when j is a symbol (i.e. an unquoted variable name) you either still need with=FALSE:

water_nonair[2, water_nonair_column, with=FALSE]

or you can use the new .. prefix from v1.10.1, added yesterday:

water_nonair[2, ..water_nonair_column]

Otherwise, if j is a symbol it must be a column name for safety, consistency and backwards compatibility. If not, you'll now get the new more helpful error message :

DT = data.table(a=1:3, b=4:6)
myCols = "b"
DT[,myCols]
Error in `[.data.table`(DT, , myCols) :
j (the 2nd argument inside [...]) is a single symbol but column name
'myCols' is not found. Perhaps you intended DT[,..myCols] or
DT[,myCols,with=FALSE]. This difference to data.frame is deliberate
and explained in FAQ 1.1.
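
For the same DT and myCols, the two forms suggested by the error message can be tried side by side. This is a minimal sketch that assumes an installed data.table version recent enough to include the .. prefix described above; the names by_with and by_prefix are just illustrative.

```r
library(data.table)

DT <- data.table(a = 1:3, b = 4:6)
myCols <- "b"   # a variable in calling scope, not a column of DT

# Explicitly tell data.table that j holds column names
by_with <- DT[, myCols, with = FALSE]

# The .. prefix says "look up myCols one level up, in calling scope"
by_prefix <- DT[, ..myCols]

by_with
by_prefix
```

Both calls return a one-column data.table containing column b.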

As mentioned in NEWS, I reran all 313 CRAN and Bioconductor packages that use data.table against data.table v1.10.1 and 2 of them do break with this change. But that is what we want because they do have a bug (the value of j in calling scope is being returned literally which cannot be what was intended). I've informed their maintainers. This is exactly what we wanted to reveal and improve. The other 311 packages all pass with this change. It doesn't rely on test coverage (which is weak for many packages). The new error happens when j is a symbol that isn't a column, whether there's a test for the result or not.

Allow User Input to Just Fractions and Integers (but Not Mixed Numbers) in HTML?

You can use this filter: