Converting unit abbreviations to numbers
- So you want to translate SI prefix abbreviations ('K','M',...) into exponents, and thus numerical powers of ten.
Given that all the prefixes are single letters and consecutive prefixes differ by a uniform factor of 10**3, here's working code that handles 'Kilo' through 'Yotta', and any future prefixes you append to the string:
> 10 ** (3*as.integer(regexpr('T', 'KMGTPEZY')))
[1] 1e+12
Then just multiply that power-of-ten by the decimal value you have.
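A minimal sketch of the whole conversion for a single string, assuming the value is a plain decimal number followed by at most one upper-case prefix letter from 'KMGTPEZY'. The function name parse_si is hypothetical, and unlike a silent default it stops on unknown prefixes instead of guessing.

```r
# Hypothetical helper: convert e.g. "2.5T" to 2.5e12.
# Assumes a plain decimal number followed by at most one prefix letter.
parse_si <- function(s) {
  prefix <- sub("^[0-9.]+", "", s)                 # trailing text, "" if none
  value  <- as.numeric(sub("[A-Za-z]+$", "", s))   # numeric part
  if (prefix == "") return(value)                  # no prefix: plain number
  pos <- if (nchar(prefix) == 1) {
    as.integer(regexpr(prefix, "KMGTPEZY", fixed = TRUE))
  } else {
    -1L                                            # multi-letter: not supported
  }
  if (pos < 0) stop("unknown prefix: ", prefix)
  value * 10^(3 * pos)                             # K -> 10^3, M -> 10^6, ...
}

parse_si("2.5T")   # 2.5e12
parse_si("750")    # 750
```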
- Also, you probably want to detect and handle the no-match case for unknown letter prefixes: regexpr() then returns -1, and you'd get a nonsensical 10^(-1*3) = 0.001.
unit_to_power <- function(u) {
  pos <- as.integer(regexpr(u, 'KMGTPEZY', fixed = TRUE))
  # regexpr() returns -1 for no match, so treat an unknown prefix as multiplier 1
  if (pos > 0) 10^(3 * pos) else 1
}
Now if you want to match both 'k' and 'K' to kilo case-insensitively (computer people often write 'K', even though the official SI symbol for kilo is lowercase 'k'), you'll need to special-case it, e.g. with an if-else ladder or expression. SI prefixes are case-sensitive in general: 'M' means 'mega' but 'm' strictly means 'milli', even if disk-drive users say otherwise; upper-case symbols are used for the positive prefixes from mega upward. So for just a few prefixes, @DanielV's case-specific code is better.
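A minimal sketch of that special-casing, assuming only 'k' should be folded to 'K' (everything else stays case-sensitive, so 'm' is not treated as mega). It looks prefixes up with match() on a vector of letters rather than with regexpr(), which is this sketch's own choice and also lets it handle a whole vector of prefixes; the name prefix_multiplier is hypothetical.

```r
# Hypothetical helper: multiplier for each prefix letter in u (vectorised).
prefix_multiplier <- function(u) {
  u <- ifelse(u == "k", "K", u)                        # fold lower-case k to kilo
  pos <- match(u, c("K", "M", "G", "T", "P", "E", "Z", "Y"))
  ifelse(is.na(pos), 1, 10^(3 * pos))                  # unknown prefix -> 1
}

prefix_multiplier(c("k", "K", "M", "m"))   # 1e3 1e3 1e6 1 ('m' is milli, not matched)
```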
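If you want negative SI prefixes too, use
10^(3 * (as.integer(regexpr(u, 'yzafpnum@KMGTPEZY', fixed = TRUE)) - 9))
where @ is just a throwaway character that keeps the spacing uniform (it marks exponent 0 and should never actually be matched), and u stands in for µ (micro). The no-match case still needs handling: an unknown prefix gives -1 - 9 = -10, i.e. a bogus 10^-30. Prefixes that are not powers of 10**3, like 'deci' and 'centi', require special-casing, or the general dict-based approach WeNYoBen uses.
Note that base::regexpr is vectorized only over its text argument, not over pattern, so it cannot look up a whole vector of prefixes in one call. If you want to vectorize (and potentially get better performance on big inputs), use stringr::str_locate, which is vectorized over both string and pattern.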
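If you'd rather stay in base R, one alternative (swapped in here, not what the answer uses) is to split the prefix string into single characters and look them up with match(), which is vectorized over its first argument. This sketch assumes the string 'yzafpnum@KMGTPEZY' (yocto to yotta, with u for µ and @ at exponent 0); si_exponent is a hypothetical name and returns NA for unknown prefixes.

```r
# Prefix letters in order of exponent; '@' is the exponent-0 placeholder.
si_letters <- strsplit("yzafpnum@KMGTPEZY", "")[[1]]

# Hypothetical helper: power-of-ten exponent for each prefix in u (vectorised).
si_exponent <- function(u) {
  pos <- match(u, si_letters)          # NA for anything not in the table
  pos[which(u == "@")] <- NA           # the placeholder is not a real prefix
  3L * (pos - 9L)                      # '@' sits at position 9, i.e. exponent 0
}

si_exponent(c("m", "K", "u", "Y", "X"))   # -3 3 -6 24 NA
10^si_exponent(c("n", "G"))               # 1e-09 1e+09
```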
Convert a factor column with numbers in k format into numeric without losing any data
First, detect which records contain a "k".
df$is_k <- grepl("k", df$Likes)
Strip the "k", then convert to numeric. If the record had a "k", multiply by 1000; otherwise multiply by 1.
df$Likes_num <- as.numeric(gsub("k", "", df$Likes)) * ifelse(df$is_k, 1000, 1)
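A small self-contained run of the two lines above on made-up data (the three Likes values are illustrative only). It also shows why the gsub() step matters for a factor column: calling as.numeric() on the factor directly returns its internal level codes rather than the numbers.

```r
# Made-up example data: a factor column mixing plain counts and "k" values.
df <- data.frame(Likes = factor(c("99k", "997", "15.5k")))

df$is_k <- grepl("k", df$Likes)      # grepl()/gsub() work on the factor's labels
df$Likes_num <- as.numeric(gsub("k", "", df$Likes)) * ifelse(df$is_k, 1000, 1)
df$Likes_num                         # 99000 997 15500

# Pitfall: converting the factor itself gives level codes, not the values.
as.numeric(df$Likes)                 # 3 2 1 (levels sort as "15.5k", "997", "99k")
```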
Edit
For multiple units, I adapted something I had elsewhere for a more complex problem. This shows the steps and is simple enough, though I am not sure how robust it is.
Function
convert_units <- function(x) {
if (is.numeric(x)) return(x)  # already numeric: nothing to convert
# named vector of scalings (you can add to this)
unit_scale <- c("k" = 1e3, "m" = 1e6)
# clean up some potential nuisances with the input
x_str <- gsub(",", "", trimws(tolower(as.character(x))))
# extract out the letters
unit_char <- gsub("[^a-z]", "", x_str)
# extract out the numbers and convert to numeric
x_num <- as.numeric(gsub("[a-z]", "", x_str))
# develop a vector of multipliers
multiplier <- unname(unit_scale[match(unit_char, names(unit_scale))])
multiplier[is.na(multiplier)] <- 1
# multiply
x_num * multiplier
}
Application
df$Likes2 <- convert_units(df$Likes)
Sample Result
  ID Likes Likes2
1  1   99k  99000
2  2   997    997
3  3 15.5k  15500
4  4 9.25k   9250
5  5   575    575
6  6   800    800
7  7  8.5k   8500
8  8 2,400   2400
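One possible extension, sketched under the assumption that you want to supply the scaling table per call rather than editing the function: the steps are the same as in convert_units(), with the table moved into an argument. The name convert_units_scaled and the "b" = 1e9 entry are illustrative, not part of the original answer. Unknown suffixes still silently get a multiplier of 1, as in the original.

```r
# Hypothetical variant of convert_units() with the scaling table as an argument.
convert_units_scaled <- function(x, unit_scale = c(k = 1e3, m = 1e6, b = 1e9)) {
  if (is.numeric(x)) return(x)                         # nothing to convert
  x_str <- gsub(",", "", trimws(tolower(as.character(x))))
  unit_char <- gsub("[^a-z]", "", x_str)               # letters only
  x_num <- as.numeric(gsub("[a-z]", "", x_str))        # number without letters
  multiplier <- unname(unit_scale[match(unit_char, names(unit_scale))])
  multiplier[is.na(multiplier)] <- 1                   # no or unknown suffix
  x_num * multiplier
}

convert_units_scaled(c("1.5b", "2,400", "8.5k"))   # 1.5e9 2400 8500
convert_units_scaled("3x", c(x = 10))              # 30, with a custom table
```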
R: Convert string (digit with K/M/G suffix) to double
This can be approached like this: first, subset the vector to those values whose last character is K (grepl("K$", x)), remove the suffix with sub, convert the result to numeric with as.numeric, and multiply it by 10^3. As a final step, convert the whole vector to numeric.
is_k <- grepl("K$", x)
x[is_k] <- as.numeric(sub("K$", "", x[is_k])) * 10^3
x <- as.numeric(x)
[1]    10 10010 20000
The same works for the suffix M, using "M$" and 10^6.
Data:
x <- c("10", "10.01K", "20K")
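Generalizing the answer's subset-and-replace step to several suffixes could look like the sketch below. It writes into a separate numeric vector instead of back into x, which avoids the numeric-to-character round trip; the function name suffix_to_number and the K/M multipliers are assumptions you can adjust.

```r
# Hypothetical helper: apply the subset-and-replace step once per suffix.
suffix_to_number <- function(x, multipliers = c(K = 1e3, M = 1e6)) {
  out <- rep(NA_real_, length(x))
  plain <- rep(TRUE, length(x))                 # values without any suffix
  for (s in names(multipliers)) {
    pattern <- paste0(s, "$")
    hit <- grepl(pattern, x)
    out[hit] <- as.numeric(sub(pattern, "", x[hit])) * multipliers[[s]]
    plain <- plain & !hit
  }
  out[plain] <- as.numeric(x[plain])            # plain numbers unchanged
  out
}

suffix_to_number(c("10", "10.01K", "20K", "3M"))   # 10 10010 20000 3e6
```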
How to replace K for thousands and M for Millions in the same column in R
Supposing you have data that looks like this:
fifa2 <- data.frame(Value = c("€565K", "€5.65M", "€777777"))
you can do this:
library(dplyr)
fifa2 %>%
  mutate(Value1 = as.numeric(gsub("[€MK]", "", Value)),
         Value1 = ifelse(grepl("K$", Value), Value1 * 1000,
                  ifelse(grepl("M$", Value), Value1 * 1000000,
                         Value1)))
Value Value1
1 €565K 565000
2 €5.65M 5650000
3 €777777 777777
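If you'd rather avoid dplyr and the nested ifelse(), a base R alternative (a different technique, not the answer's) is to pull out the trailing K or M and look it up in a named vector of multipliers. It assumes the suffix, if any, is the last character and that the number itself contains only digits and a decimal point.

```r
fifa2 <- data.frame(Value = c("€565K", "€5.65M", "€777777"))

multipliers <- c(K = 1e3, M = 1e6)
# Trailing K or M; strings without one come back unchanged and match no name.
suffix <- sub(".*([KM])$", "\\1", fifa2$Value)
scale  <- unname(multipliers[suffix])
scale[is.na(scale)] <- 1                      # no suffix: keep the value as is
# Drop everything except digits and the decimal point (removes the € and K/M).
fifa2$Value1 <- as.numeric(gsub("[^0-9.]", "", fifa2$Value)) * scale
fifa2
```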
Converting number abbreviations (5.2k, 1.7m, etc) into valid integers with PHP
Something like:
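$suffix = strtolower(substr($input, -1));
$number = (float) substr($input, 0, -1);   // numeric part without the suffix
switch ($suffix) {
    case 'k':
        $input = $number * 1000;
        break;
    // similarly for 'm' (1000000) and 'b' (1000000000).
}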
This assumes your data is well formatted. If not, more checks are needed, for example:
if (!preg_match('/^\d+(?:\.\d+)?[mbk]$/i',$input)) {
// $input is not a valid format.
}