Convert Roman Numerals to Numbers in R

as.roman() (from the utils package, attached by default) returns an object of class roman, which R stores as an integer and prints as a Roman numeral. You can turn it back into an Arabic numeral with as.numeric(). If you have a string that is a valid Roman numeral, coerce it to a roman object with as.roman() and then to a number by composing the two coercion functions. Consider:

> as.roman(79)
[1] LXXIX
> x <- as.roman(79)
> x
[1] LXXIX
> str(x)
Class 'roman'  int 79
> as.roman("LXXIX")
[1] LXXIX
> as.numeric(as.roman("LXXIX"))
[1] 79
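
The same composition works on whole vectors. A minimal sketch, assuming an illustrative input vector (the name numerals is not from the original answer); strings that are not valid Roman numerals come back as NA:

```r
# as.roman() is in utils, which is attached by default.
numerals <- c("IV", "XLII", "MMXXIV", "not roman")  # illustrative input

# Invalid strings become NA; any warning about them is silenced here.
values <- suppressWarnings(as.integer(as.roman(numerals)))
values

# roman objects also support arithmetic before converting back.
total <- as.roman("XIV") + as.roman("VI")
as.integer(total)
```

Checking is.na(values) afterwards is a simple way to find the entries that failed to parse.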

Is there a fast way to convert Roman numerals in the string to Arabic in R?

You can directly pass a function to the replacement argument of str_replace:

library(stringr)

str_replace(A, "[IVX]+$", function(x) as.numeric(as.roman(x)))
#> [1] "Case 1"         "Big Case 2"     "Not a Case"     "This is Case 4"
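
The input vector A is not shown above. A base R sketch of the same replacement, assuming an input that would produce the output shown (the values of A and the name converted are assumptions), could use regexpr() and regmatches():

```r
# Assumed input; the original question's vector is not shown.
A <- c("Case I", "Big Case II", "Not a Case", "This is Case IV")

converted <- A
m <- regexpr("[IVX]+$", converted)  # trailing run of I, V, X (or -1 if none)

# Replace each matched numeral with its Arabic value, as text.
regmatches(converted, m) <- as.character(as.integer(as.roman(regmatches(converted, m))))
converted
```

Elements without a match ("Not a Case") are left untouched, because regmatches<- only assigns to the matched positions.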

Convert numbers to roman numerals in sapply in R

I think you have to add an as.character() call, like this:

> sapply(sequence.to_roman, function(x) as.character(if(grepl("^[1-9]\\d*$",x)) as.roman(x) else "Some symbols"))
             1              2             V1             df              3 
           "I"           "II" "Some symbols" "Some symbols"          "III"

It looks like the simplification that sapply applies to its result drops the roman class, so the values fall back to their underlying integers. Converting every result to character first prevents this.

Try:

> lapply(sequence.to_roman, function(x) if(grepl("^[1-9]\\d*$",x)) as.roman(x) else "Some symbols")
[[1]]
[1] I

[[2]]
[1] II

[[3]]
[1] "Some symbols"

[[4]]
[1] "Some symbols"

[[5]]
[1] III

This is what we want, but:

> unlist(lapply(sequence.to_roman, function(x) if(grepl("^[1-9]\\d*$",x)) as.roman(x) else "Some symbols"))
[1] "1"            "2"            "Some symbols" "Some symbols" "3"

also gives the recoded (numeric) forms.

A more direct illustration of what causes the problem:

> as.roman("3")
[1] III
> as.character(as.roman("3"))
[1] "III"
> c(as.roman("3"), "test")
[1] "3"    "test"
> c(as.character(as.roman("3")), "test")
[1] "III"  "test"
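
Putting this together, a sketch that converts to character inside the function and uses vapply() so the result type is explicit. The contents of sequence.to_roman are an assumption reconstructed from the output above:

```r
# Assumed input, inferred from the names in the printed output.
sequence.to_roman <- c("1", "2", "V1", "df", "3")

out <- vapply(sequence.to_roman, function(x) {
  # Convert to character before returning, so no roman object is ever
  # combined with a plain string.
  if (grepl("^[1-9]\\d*$", x)) as.character(as.roman(x)) else "Some symbols"
}, character(1))
out
```

vapply() raises an error if any branch returns something other than a single string, which makes the silent class loss harder to reintroduce.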

How can I convert between numeral systems in R?

You could write your own S3 class:

base <- function(b, base = 10)
{
  base <- as.integer(base)
  if(base > 36 | base < 2) stop("'base' must be between 2 and 36.")
  
  structure(lapply(b, function(x) 
    {
      n   <- ceiling(log(x, base))
      vec <- numeric()
      val <- x
      
      while(n >= 0)
      {
        rem <- val %/% base^n
        val <- val - rem * base^n
        vec <- c(vec, rem)
        n <- n - 1
      }
      
      while(vec[1] == 0 & length(vec) > 1) vec <- vec[-1]
      structure(x, base = base, representation = vec) 
    }), class = "base")
}

Which will need a format and print method:

format.base <- function(b, ...) 
{
  sapply(b, function(x) 
    {
      glyphs <- c(0:9, LETTERS)
      base   <- attr(x, "base")
      vec    <- attr(x, "representation")
      paste0(glyphs[vec + 1], collapse = "")
    })
}

print.base <- function(b, ...) print(format(b), quote = FALSE)

We also need to make sure that maths operations work properly:

Ops.base <- function(e1, e2) {
  base <- attr(e1[[1]], "base")
  e1   <- unlist(e1)
  e2   <- unlist(e2)
  base(NextMethod(.Generic), base)
}

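Math.base <- function(x, ...) {
  # Math group generics (sqrt, abs, ...) dispatch on a single argument x,
  # so there is no second operand to unlist here.
  base <- attr(x[[1]], "base")
  x    <- unlist(x)
  base(NextMethod(.Generic), base)
}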

And if you want to use it inside a data frame you need an as.data.frame method:

as.data.frame.base <- function(b, ...) 
{
  structure(list(b),  
            class = "data.frame", 
            row.names = seq_along(b))
}

Which all allows the following behaviour:

data.frame(binary = base(1:20, 2), hex = base(1:20, 16), oct = base(1:20, 8))
#>    binary hex oct
#> 1       1   1   1
#> 2      10   2   2
#> 3      11   3   3
#> 4     100   4   4
#> 5     101   5   5
#> 6     110   6   6
#> 7     111   7   7
#> 8    1000   8  10
#> 9    1001   9  11
#> 10   1010   A  12
#> 11   1011   B  13
#> 12   1100   C  14
#> 13   1101   D  15
#> 14   1110   E  16
#> 15   1111   F  17
#> 16  10000  10  20
#> 17  10001  11  21
#> 18  10010  12  22
#> 19  10011  13  23
#> 20  10100  14  24

And:

x <- base(67, 11)
y <- base(35, 2)
x + y
#> [1] 93

base(x + y, 10)
#> [1] 102
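
For one-off conversions where a full class is more than you need, a minimal sketch is shown below. The helper name to_base is hypothetical; strtoi() is base R and handles the reverse direction for bases 2 to 36:

```r
# Hypothetical helper: render a single non-negative whole number in a base.
to_base <- function(x, base = 2L) {
  stopifnot(length(x) == 1, x >= 0, base >= 2, base <= 36)
  glyphs <- c(0:9, LETTERS)            # "0".."9", "A".."Z"
  digits <- character()
  repeat {
    digits <- c(glyphs[x %% base + 1], digits)  # prepend least significant digit
    x <- x %/% base
    if (x == 0) break
  }
  paste(digits, collapse = "")
}

to_base(102, 11)          # same value as x + y above
to_base(255, 16)
strtoi("93", base = 11L)  # string in base 11 back to an integer
```

This trades the arithmetic and data frame support of the S3 class for something short enough to paste into a script.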

R remove roman numerals from column

Try this:

#Code
employee_df$employee <-gsub('^([0-9]+)|([IVXLCM]+)\\.?$','',employee_df$employee)

Output:

       employee salary
1   JOHN SMITH   21000
2  PETER RABBIT  23400
3 POPE GREGORY   26800
4     MARY SUE  100000

Or cleaner:

#Code2
employee_df$employee <- trimws(gsub('^([0-9]+)|([IVXLCM]+)\\.?$','',employee_df$employee))

Output:

      employee salary
1   JOHN SMITH  21000
2 PETER RABBIT  23400
3 POPE GREGORY  26800
4     MARY SUE 100000

The numeric component of regex is not necessary (Many thanks @BenBolker). You can use:

#Code3
employee_df$employee <- trimws(gsub('([IVXLCM]+)\\.?$','',employee_df$employee))

And obtain the same result.
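
One caveat: without a separator in front, the pattern also eats a trailing run of the letters I, V, X, L, C, M inside an ordinary name. A sketch with assumed example names (employee_df itself is not shown, so the vector below is illustrative) that requires whitespace before the numeral:

```r
# Illustrative names; not the original employee_df.
employees <- c("JOHN SMITH II", "PETER RABBIT", "POPE GREGORY XIII.", "LEE KIM")

# Pattern from the answer: also trims the "IM" in "KIM".
loose <- trimws(gsub("([IVXLCM]+)\\.?$", "", employees))

# Requiring whitespace before the numeral leaves ordinary names alone.
strict <- sub("\\s+[IVXLCM]+\\.?$", "", employees)

loose
strict
```

A surname made up entirely of those letters would still be removed, so this is a heuristic rather than a guarantee.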

How to convert Roman numerals to int while rejecting invalid numbers using standard C?

To create some level of rule flexibility, the following Roman_string_to_unsigned0() employs a table.

It follows the strtol() style of functionality in that an end-pointer is returned indicating where parsing stopped. Dereference it and test against '\0' for success.

The function has a bool subtractive parameter to steer the two major types of Roman Numeral parsing: basic, subtractive.

#include <stdbool.h>  // bool
#include <stddef.h>   // size_t
#include <string.h>   // strlen, strncmp

static const struct Roman_digit {
  char ch[3];
  bool subtractive;
  unsigned char limit;
  unsigned char nextdown;  // with parse success, offset to next element to try
  unsigned value;
} Roman_table[] = {
    { "I", false, 4, 1, 1 }, //
    { "IV", true, 1, 2, 4 }, //
    { "V", false, 1, 2, 5 }, //
    { "IX", true, 1, 4, 9 }, //
    { "X", false, 4, 1, 10 }, //
    { "XL", true, 1, 2, 40 }, //
    { "L", false, 1, 2, 50 }, //
    { "XC", true, 1, 4, 90 }, //
    { "C", false, 4, 1, 100 }, //
    { "CD", true, 1, 2, 400 }, //
    { "D", false, 1, 2, 500 }, //
    { "CM", true, 1, 4, 900 }, //
    { "M", false, 4, 1, 1000 }, //
};
#define Roman_table_N (sizeof Roman_table / sizeof Roman_table[0])

const char *Roman_string_to_unsigned0(unsigned *dest, const char *src, bool subtractive){
  *dest = 0;
  for (unsigned i = Roman_table_N; i > 0;) {
    const struct Roman_digit *digit = &Roman_table[i - 1];
    if (!subtractive && digit->subtractive) {
      i--;
      continue;
    }
    unsigned limit = digit->limit;  // repeat count
    if (limit > 1 && subtractive) limit--;
    size_t ch_length = strlen(digit->ch);
    size_t next_i = i-1;
    for (unsigned j=0; j<limit; j++) {
      if (strncmp(src, digit->ch, ch_length) == 0) {
        *dest += digit->value;
        if (*dest < digit->value) { // Overflow detection
          return (char*) src;
        }
        src += ch_length;
        next_i = i - digit->nextdown;  // With success, maybe skip down the list 
      } else {
        break;
      }
    }
    i = next_i;
  }
  return (char*) src;
}

Notes: Case insensitivity not yet encoded. An empty string returns 0. By this code working most-to-least significant, "XXXMMM" does not pass.
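
For comparison with the R answers on this page, the same strictness can be sketched in R rather than C. This is not a translation of the table-driven parser above: it validates with a regular expression for standard subtractive notation (1 to 3999) and then reuses as.roman(). The names roman_pattern and strict_roman_to_int are hypothetical:

```r
# Standard subtractive form: thousands, hundreds, tens, units.
roman_pattern <- "^M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$"

strict_roman_to_int <- function(x) {
  # Reject NA, empty strings, and anything not in strict subtractive form.
  ok  <- !is.na(x) & nzchar(x) & grepl(roman_pattern, x)
  out <- rep(NA_integer_, length(x))
  if (any(ok)) out[ok] <- as.integer(as.roman(x[ok]))
  out
}

strict_roman_to_int(c("MCMXCIV", "XIV", "IIII", "XXXMMM", ""))
```

Invalid input yields NA instead of an end-pointer, and like the C version this sketch is case sensitive.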

Converting integers into words and roman numerals

Take a look at cl-format. It can return "twenty one"; I used it for Project Euler.

http://clojuredocs.org/clojure_core/1.2.0/clojure.pprint/cl-format

and Roman too:

~@R prints arg as a Roman numeral: IV; and ~:@R prints arg as an old Roman numeral: IIII.

R capitalize roman numerals only in string

Here is a base R option using substr, sub, and paste:

people <- c("PERSON I", "PERSON II", "PERSON III", "PERSON IV")
people <- paste0(substr(people, 1, 1), tolower(sub("^\\S(\\S+).*$", "\\1", people)),
                 " ", sub("^.*?(\\S+)$", "\\1", people))
people

[1] "Person I"   "Person II"  "Person III" "Person IV"
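
The approach above keeps only the first word and the last token. For names with more words, a sketch that title-cases every word except those made up solely of Roman numeral letters (the function name title_case_keep_roman is hypothetical):

```r
title_case_keep_roman <- function(x) {
  vapply(strsplit(x, " ", fixed = TRUE), function(words) {
    # Treat words made only of numeral letters as Roman numerals.
    is_roman <- grepl("^[IVXLCDM]+$", words)
    w <- words[!is_roman]
    words[!is_roman] <- paste0(substr(w, 1, 1), tolower(substring(w, 2)))
    paste(words, collapse = " ")
  }, character(1))
}

title_case_keep_roman(c("PERSON IV", "JOHN PAUL II"))
```

Ordinary words spelled only with those letters (for example "MIX") would be left in capitals, so this is a heuristic.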

Replace only the numbers of a data frame column with roman numerals in R

This isn't particularly elegant, but as long as you aren't dealing with millions of entries, it should work well enough. It makes use of the as.roman function, which is part of the utils package that R attaches by default, so only stringr needs to be loaded.

library(stringr)

sa<-c("Phase 1","Phase 2","Phase 1 | Phase 2","Phase 4")

sub_roman <- function(x){
  # identify any numbers (up to three digits)
  num <- as.numeric(unlist(str_extract_all(x, "\\d{1,3}")))
  for (i in seq_along(num)){
    # loop through the numbers and replace with the roman numeral
    x <- str_replace(x, "\\d{1,3}", as.character(as.roman(num[i])))
  }
  x
}

# Run the previously defined function over the vector.
sa <- 
  vapply(sa,
         sub_roman,
         character(1))

# replace the pipe with a slash.
sa <- str_replace_all(sa, "[|]", "/")
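
A base R alternative without the explicit loop, as a sketch on the same example data: gregexpr() finds every run of digits and regmatches<- replaces each with its Roman numeral. The variable name sa2 is introduced here for illustration:

```r
sa2 <- c("Phase 1", "Phase 2", "Phase 1 | Phase 2", "Phase 4")

m <- gregexpr("[0-9]+", sa2)  # all digit runs in each string

# Replace each matched number with its Roman numeral, as text.
regmatches(sa2, m) <- lapply(regmatches(sa2, m), function(d) {
  as.character(as.roman(as.integer(d)))
})

# Replace the pipe with a slash, as in the original.
sa2 <- gsub("|", "/", sa2, fixed = TRUE)
sa2
```

Unlike the three-digit limit above, this matches digit runs of any length, so numbers outside the range as.roman() supports would become NA.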
