Convert a character vector of mixed numbers, fractions, and integers to numeric
Anything less "hackish" will have to parse your inputs and match them to a number of pre-defined patterns. I came up with this:
mixedToFloat <- function(x){
  # classify each element: integer ("3"), fraction ("2/3") or mixed number ("1 1/2")
  is.integer  <- grepl("^\\d+$", x)
  is.fraction <- grepl("^\\d+\\/\\d+$", x)
  is.mixed    <- grepl("^\\d+ \\d+\\/\\d+$", x)
  # fail early on anything that matches none of the patterns
  stopifnot(all(is.integer | is.fraction | is.mixed))

  # split on space or slash to get the numeric pieces of each element
  numbers <- strsplit(x, "[ /]")

  ifelse(is.integer,  as.numeric(sapply(numbers, `[`, 1)),
  ifelse(is.fraction, as.numeric(sapply(numbers, `[`, 1)) /
                      as.numeric(sapply(numbers, `[`, 2)),
                      as.numeric(sapply(numbers, `[`, 1)) +
                      as.numeric(sapply(numbers, `[`, 2)) /
                      as.numeric(sapply(numbers, `[`, 3))))
}
mixedToFloat(c('1 1/2', '2 3/4', '2/3', '11 1/4', '1'))
# [1] 1.5000000 2.7500000 0.6666667 11.2500000 1.0000000
R - Converting Fractions in Text to Numeric
You can try to transform the unicode encoding to ASCII directly when reading the XML using a special return function:
library(XML)
library(stringi)
readHTMLTable(url, which = 1, header = FALSE, stringsAsFactors = FALSE,
              elFun = function(node) {
                stri_trans_general(xmlValue(node), "latin-ascii")
              })
You can then use @Metrics' suggestion to convert it to numbers.
For example, you could use @G. Grothendieck's function from this post to clean up the Arms data:
library(XML)
library(stringi)
library(gsubfn)
#the calc function is by @G. Grothendieck
calc <- function(s) {
x <- c(if (length(s) == 2) 0, as.numeric(s), 0:1)
x[1] + x[2] / x[3]
}
url <- "http://mockdraftable.com/players/2014/"
combine <- readHTMLTable(url, which = 1, header = FALSE, stringsAsFactors = FALSE,
                         elFun = function(node) {
                           stri_trans_general(xmlValue(node), "latin-ascii")
                         })
names(combine) <- c("Name", "Pos", "Hght", "Wght", "Arms", "Hands",
"Dash40yd", "Dash20yd", "Dash10yd", "Bench", "Vert", "Broad",
"Cone3", "ShortShuttle20")
sapply(strapplyc(gsub('\"',"",combine$Arms), "\\d+"), calc)
#[1] 30.000 31.500 30.000 31.750 31.875 29.875 31.000 31.000 30.250 33.000 32.500 31.625 32.875
There might be some encoding issues depending on your machine (see the comments)
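To see why calc works for integers, plain fractions and mixed numbers alike, here is a small sketch that runs it on a few made-up values. The arms vector is hypothetical, and I've swapped strapplyc for base R's regmatches/gregexpr so the snippet runs without gsubfn; the padding with 0 and 0:1 is what lets the same x[1] + x[2] / x[3] expression cover all three cases.

```r
# calc() as given above: pads the digit groups so x[1] + x[2] / x[3] always works.
#   1 group  ("30")      -> c(30, 0, 1)       -> 30 + 0/1
#   2 groups ("7/8")     -> c(0, 7, 8, 0, 1)  -> 0 + 7/8
#   3 groups ("31 1/2")  -> c(31, 1, 2, 0, 1) -> 31 + 1/2
calc <- function(s) {
  x <- c(if (length(s) == 2) 0, as.numeric(s), 0:1)
  x[1] + x[2] / x[3]
}

# Hypothetical measurements in the same style as the Arms column.
arms <- c('31 1/2"', '30"', '7/8"')

# Base-R stand-in for strapplyc(): pull out every run of digits per string.
digits <- regmatches(arms, gregexpr("[0-9]+", arms))

result <- vapply(digits, calc, numeric(1))
print(result)
```

With these inputs the result should be 31.5, 30 and 0.875.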
R: Convert Fraction to Decimal within a String
One option is gsubfn:
gsubfn("(\\d+) (\\d+)", ~ as.numeric(x) + as.numeric(y),
       gsubfn("(\\d+)/(\\d+)", ~ as.numeric(x)/as.numeric(y), ef))
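If you would rather not depend on gsubfn, the inner step (replacing each "a/b" inside a string with its decimal value) can be sketched in base R with gregexpr and the replacement form of regmatches. The function name frac_to_decimal and the sample string are made up for illustration, and this sketch only rewrites plain fractions; it does not fold a leading whole number into the result the way the nested gsubfn call does.

```r
# Replace every "a/b" inside each string with its decimal value (base R only).
frac_to_decimal <- function(s) {
  m <- gregexpr("[0-9]+/[0-9]+", s)
  regmatches(s, m) <- lapply(regmatches(s, m), function(fr) {
    # fr holds all fractions found in one string, e.g. c("1/2", "3/4")
    vapply(strsplit(fr, "/", fixed = TRUE),
           function(p) format(as.numeric(p[1]) / as.numeric(p[2])),
           character(1))
  })
  s
}

out <- frac_to_decimal("take 1/2 cup and 3/4 tsp")
print(out)
```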
How to convert a symmetrical character matrix to a numerical data frame?
If you want the fractions represented as numeric values, you can use eval together with parse (see, e.g., the link that @SymbolixAU gave you).
Here is a matrix with numeric entries:
MYmatrix02 <- matrix(sapply(MYmatrix, function(x) eval(parse(text = x))),
nrow = nrow(MYmatrix), dimnames = dimnames(MYmatrix))
> MYmatrix02
t534 t535 t830
t534 0.0 0.2 0.2
t535 0.2 0.0 0.3
t830 0.2 0.3 0.0
Or if you want a data frame:
MYdataframe <- as.data.frame(MYmatrix02)
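Since MYmatrix itself isn't shown here, this is a self-contained sketch with a made-up input. The cell values ("1/5", "3/10") are an assumption chosen to reproduce the numbers printed above, and vapply is used instead of sapply only to make the numeric return type explicit.

```r
# Hypothetical input: a symmetric character matrix whose cells hold fractions.
ids <- c("t534", "t535", "t830")
MYmatrix <- matrix(c("0",   "1/5",  "1/5",
                     "1/5", "0",    "3/10",
                     "1/5", "3/10", "0"),
                   nrow = 3, dimnames = list(ids, ids))

# Evaluate each cell's text as an R expression, e.g. "1/5" -> 0.2.
to_number <- function(cell) eval(parse(text = cell))

MYmatrix02 <- matrix(vapply(MYmatrix, to_number, numeric(1)),
                     nrow = nrow(MYmatrix), dimnames = dimnames(MYmatrix))
MYdataframe <- as.data.frame(MYmatrix02)
print(MYmatrix02)
```

Keep in mind that eval(parse()) runs whatever text it is given, so it is only reasonable when you trust the contents of the matrix.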
Extract numeric part of strings of mixed numbers and characters in R
Using gsub or sub you can do this:
gsub('.*-([0-9]+).*','\\1','Ab_Cd-001234.txt')
"001234"
Alternatively, you can use gregexpr with regmatches:
m <- gregexpr('[0-9]+','Ab_Cd-001234.txt')
regmatches('Ab_Cd-001234.txt',m)
"001234"
EDIT: both methods are vectorized and work on a vector of strings.
x <- c('Ab_Cd-001234.txt','Ab_Cd-001234.txt')
sub('.*-([0-9]+).*','\\1',x)
"001234" "001234"
m <- gregexpr('[0-9]+',x)
> regmatches(x,m)
[[1]]
[1] "001234"
[[2]]
[1] "001234"
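If you actually need numbers rather than digit strings, a minimal sketch along the same lines is below. The second file name is made up, and regexpr (first match only) is used here instead of gregexpr so that regmatches returns a plain character vector instead of a list.

```r
x <- c("Ab_Cd-001234.txt", "Ef_Gh-000042.txt")   # second name is hypothetical

# regexpr() locates the first run of digits in each string.
digits <- regmatches(x, regexpr("[0-9]+", x))

# as.numeric() drops the leading zeros.
values <- as.numeric(digits)
print(values)
```

One caveat: with regexpr, regmatches silently drops strings that contain no digits, so the result can be shorter than x.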
r - data.table 1.10.0 - why does a named column index value not work while a integer column index value works without with = FALSE
The long note 3 in v1.9.8 NEWS starts :
When j contains no unquoted variable names (whether column names or not), with= is now automatically set to FALSE. Thus ...
But your j does contain an unquoted variable name. In fact, it is solely an unquoted variable name. So that item does not apply to it.
That's what the options(datatable.WhenJisSymbolThenCallingScope=TRUE) was about, so you could try out the new feature going forward. Please read that same NEWS item about that again. If you set that option, it will work as you expected it to.
HOWEVER please don't. Because yesterday I changed it and in development that option has now gone. A migration timeline is no longer needed. The new strategy needs no code changes and has no breakage. Please see the new notes in the latest development NEWS for v1.10.1. I won't copy them here to save duplication.
So going forward, when j is a symbol (i.e. an unquoted variable name) you either still need with=FALSE:
water_nonair[2, water_nonair_column, with=FALSE]
or you can use the new .. prefix from v1.10.1 added yesterday:
water_nonair[2, ..water_nonair_column]
Otherwise, if j is a symbol it must be a column name for safety, consistency and backwards compatibility. If not, you'll now get the new more helpful error message:
DT = data.table(a=1:3, b=4:6)
myCols = "b"
DT[,myCols]
Error in `[.data.table`(DT, , myCols) :
j (the 2nd argument inside [...]) is a single symbol but column name
'myCols' is not found. Perhaps you intended DT[,..myCols] or
DT[,myCols,with=FALSE]. This difference to data.frame is deliberate
and explained in FAQ 1.1.
As mentioned in NEWS, I reran all 313 CRAN and Bioconductor packages that use data.table against data.table v1.10.1 and 2 of them do break with this change. But that is what we want because they do have a bug (the value of j in calling scope is being returned literally which cannot be what was intended). I've informed their maintainers. This is exactly what we wanted to reveal and improve. The other 311 packages all pass with this change. It doesn't rely on test coverage (which is weak for many packages). The new error happens when j is a symbol that isn't a column, whether there's a test for the result or not.
Allow User Input to Just Fractions and Integers (but Not Mixed Numbers) in HTML?
You can use this filter:
<form>
<input type="text" oninput="this.value = this.value.replace(/^(-?(?:\d+(?:\/\d*)?)?).*$/, '$1')" />
</form>