Converting Unit Abbreviations to Numbers



  • So you want to translate SI prefix letters ('K', 'M', ...) into exponents, and thus into numerical powers of ten.
    Given that all the prefixes are single letters and the exponents are uniformly spaced powers of 10^3, this code handles 'Kilo' through 'Yotta', and any further prefixes appended to the string in the same pattern:
    > 10 ** (3*as.integer(regexpr('T', 'KMGTPEY')))
[1] 1e+12

Then just multiply that power-of-ten by the decimal value you have.
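As a worked example of that last step, here is a minimal sketch that splits one hypothetical input string into its numeric part and its trailing prefix letter, then multiplies. It assumes the prefix is always the final character and is one of 'KMGTPEY'; unknown letters are not handled here.

```r
value <- "2.5T"  # hypothetical input: a number followed by one SI prefix letter

# assumption: the prefix is always the last character
prefix   <- substr(value, nchar(value), nchar(value))
mantissa <- as.numeric(substr(value, 1, nchar(value) - 1))

# the match position is the multiple of 3 in the exponent: T -> 4 -> 10^12
result <- mantissa * 10^(3 * as.integer(regexpr(prefix, "KMGTPEY", fixed = TRUE)))
result
```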

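  • Also, you probably want to detect and handle the no-match case for unknown prefix letters: regexpr() returns -1 when there is no match, so you would get a nonsensical 10^(-3). The check has to be on the match position, before exponentiating, because 10^anything is always positive:
    unit_to_power <- function(u) {
      # regexpr() returns -1 when u is not found in the string
      pos <- as.integer(regexpr(u, 'KMGTPEY', fixed = TRUE))
      if (pos < 0) return(1)  # unknown prefix: leave the value unscaled
      10^(3 * pos)
    }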
  • If you want to match both 'k' and 'K' as kilo (computer people often write 'K', even though the SI symbol is strictly lowercase 'k'), you'll need to special-case it, e.g. with an if-else ladder or a switch() expression. SI prefixes are case-sensitive in general: 'M' means mega but 'm' strictly means milli, whatever disk-drive users say, and upper case is reserved for positive exponents (mega and above). So for a handful of prefixes, @DanielV's case-specific code is better.

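  • If you want negative SI prefixes too, use as.integer(regexpr(u, 'zafpnum@KMGTPEY')-8), where 'u' stands in for micro (µ) and @ is just a throwaway placeholder for exponent 0 that keeps the spacing uniform; it should never actually be matched. Note that an unknown prefix now gives -1 - 8 = -9, so the no-match check must test the regexpr() result before subtracting. Again, handling prefixes that are not powers of 10^3, such as deci and centi, requires special-casing, or the general dictionary-based approach WeNYoBen uses.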

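  • base::regexpr() is vectorized only over its text argument, not over pattern (it uses only the first element of pattern, with a warning), so unit_to_power() handles one prefix at a time. If you want to vectorize over many prefixes and get better performance on big inputs, use stringr::str_locate(), which is vectorized over both string and pattern.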
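The same lookup can also be vectorized without stringr. The sketch below swaps regexpr() for base match() against the prefix letters split into a vector, so it accepts a whole vector of prefixes at once; it also folds in the no-match fallback, the negative prefixes and the lowercase SI symbol 'k'. The function name si_prefix_power is hypothetical.

```r
# Hypothetical helper: vectorized SI-prefix lookup using base R only
si_prefix_power <- function(prefixes) {
  # one character per power of 10^3, from zepto (-7) to yotta (+8);
  # '@' is a placeholder for exponent 0 and is not expected as input
  ladder <- strsplit("zafpnum@KMGTPEY", "")[[1]]

  # accept the SI symbol 'k' as well as the common 'K' for kilo
  prefixes[prefixes %in% "k"] <- "K"

  pos   <- match(prefixes, ladder)  # NA for unknown or empty prefixes
  power <- 10^(3 * (pos - 8))
  power[is.na(power)] <- 1          # no recognised prefix: no scaling
  power
}

si_prefix_power(c("k", "K", "m", "T", "x", ""))
```

Because 'm' sits in the negative half of the ladder, it maps to milli (10^-3), in line with the case-sensitivity point above.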

Convert a factor column with numbers in k format into numeric without losing any data

First, detect which records contain a "k".

df$is_k <- grepl("k", df$Likes)

Strip the "k", then convert to numeric. If the record had a "k", multiply by 1000; otherwise multiply by 1.

df$Likes_num <- as.numeric(gsub("k", "", df$Likes)) * ifelse(df$is_k, 1000, 1)
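To check the two steps on a small factor column, here is a sketch with made-up values. grepl() and gsub() work on the factor's labels, so the numbers survive, unlike as.numeric() applied directly to a factor, which returns the level codes. A value such as "2,400" would still become NA here, because the comma is not removed.

```r
# hypothetical sample data: a factor column mixing plain numbers and "k" values
df <- data.frame(Likes = factor(c("99k", "997", "15.5k", "8.5k")))

# flag the "k" records, strip the "k", then scale only the flagged ones
df$is_k <- grepl("k", df$Likes)
df$Likes_num <- as.numeric(gsub("k", "", df$Likes)) * ifelse(df$is_k, 1000, 1)
df
```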


Edit

For multiple units, I adapted something I had elsewhere for a more complex problem. This shows the steps and is simple enough, though I am not sure how robust it is.

Function

convert_units <- function(x) {

  # already numeric: nothing to convert
  if (is.numeric(x)) return(x)

  # named vector of scalings (you can add to this)
  unit_scale <- c("k" = 1e3, "m" = 1e6)

  # clean up some potential nuisances with the input
  x_str <- gsub(",", "", trimws(tolower(as.character(x))))

  # extract out the letters
  unit_char <- gsub("[^a-z]", "", x_str)

  # extract out the numbers and convert to numeric
  x_num <- as.numeric(gsub("[a-z]", "", x_str))

  # develop a vector of multipliers (1 when there is no recognised unit)
  multiplier <- unit_scale[match(unit_char, names(unit_scale))]
  multiplier[is.na(multiplier)] <- 1

  # multiply; unname() drops the unit names carried over from unit_scale
  unname(x_num * multiplier)
}

Application

df$Likes2 <- convert_units(df$Likes)

Sample Result

  ID Likes Likes2
1  1   99k  99000
2  2   997    997
3  3 15.5k  15500
4  4 9.25k   9250
5  5   575    575
6  6   800    800
7  7  8.5k   8500
8  8 2,400   2400

R: Convert string (digit with K/M/G suffix) to double

One way to approach this: first, subset the vector to the values whose last character is K, use sub() to remove the suffix, convert the result to numeric with as.numeric(), and multiply by 10^3. As a final step, convert the whole vector to numeric.

# index both sides with the same subset so the lengths match
k <- grepl("K$", x)
x[k] <- as.numeric(sub("K$", "", x[k])) * 10^3
x <- as.numeric(x)
[1]    10 10010 20000

The same approach works for the suffix M, multiplying by 10^6 instead.
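To handle K and M in one pass, one possible sketch (not part of the original answer) looks each suffix up in a named vector of multipliers. It assumes the number part contains only digits and a decimal point, and that the only suffixes are K and M.

```r
x <- c("10", "10.01K", "20K", "1.5M")  # hypothetical sample data

# the suffix is whatever follows the leading number ("" when there is none)
suffix <- sub("^[0-9.]+", "", x)

# one multiplier per element; an empty or unknown suffix matches no name and gives NA
scale <- c(K = 1e3, M = 1e6)[suffix]
scale[is.na(scale)] <- 1

result <- as.numeric(sub("[KM]$", "", x)) * unname(scale)
result
```

Adding another suffix only means extending the named vector and the character class in the final sub().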

Data:

x <- c("10", "10.01K", "20K")

How to replace K for thousands and M for Millions in the same column in R

Supposing you have data that looks like this:

 fifa2 <- data.frame(Value = c("€565K", "€5.65M", "€777777"))

you can do this:

library(dplyr)
fifa2 %>%
  mutate(Value1 = as.numeric(gsub("[€MK]", "", Value)),
         Value1 = ifelse(grepl("K$", Value), Value1 * 1000,
                  ifelse(grepl("M$", Value), Value1 * 1000000,
                         Value1)))
    Value  Value1
1   €565K  565000
2  €5.65M 5650000
3 €777777  777777

Converting number abbreviations (5.2k, 1.7m, etc) into valid integers with PHP

Something like:

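switch (strtolower(substr($input, -1))) {
    case 'k':
        // strip the suffix first instead of relying on PHP's lenient parsing of "5.2k"
        $input = (float) substr($input, 0, -1) * 1000;
        break;
    case 'm':
        $input = (float) substr($input, 0, -1) * 1000000;
        break;
    // similarly for b (billion); strtolower() already folds K, M and B
}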

This assumes your data is well-formatted. If it is not, you need an extra check such as:

if (!preg_match('/^\d+(?:\.\d+)?[mbk]$/i',$input)) {
// $input is not a valid format.
}

