Convert roman numerals to numbers in R
as.roman() returns an object of class roman, so R recognizes it as a Roman numeral. You can turn it back into an Arabic numeral directly with as.numeric(). If you have a string that is a valid Roman numeral, you can coerce it to a roman object with as.roman() and then coerce that into an Arabic numeral by composing the two coercion functions. Consider:
> as.roman(79)
[1] LXXIX
> x <- as.roman(79)
> x
[1] LXXIX
> str(x)
Class 'roman' int 79
> as.roman("LXXIX")
[1] LXXIX
> as.numeric(as.roman("LXXIX"))
[1] 79
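Strings that are not valid Roman numerals come back as NA, so a small wrapper can make that behaviour explicit. This is a minimal sketch; roman_to_int is a hypothetical helper name, not something base R provides.

```r
# Hypothetical helper: convert Roman numeral strings to integers,
# returning NA for anything as.roman() cannot parse.
roman_to_int <- function(x) {
  # as.roman() may warn about invalid numerals; here the NA result is enough.
  as.integer(suppressWarnings(as.roman(x)))
}

roman_to_int(c("LXXIX", "IV", "hello"))
# 79 4 NA
```

Checking the result with is.na() afterwards is one way to separate valid numerals from ordinary words.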
Is there a fast way to convert Roman numerals in the string to Arabic in R?
You can pass a function directly to the replacement argument of str_replace:
library(stringr)
str_replace(A, "[IVX]+$", function(x) as.numeric(as.roman(x)))
#> [1] "Case 1" "Big Case 2" "Not a Case" "This is Case 4"
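If stringr is not available, the same replacement can be sketched in base R with regexpr() and regmatches(). The input vector below is an assumption reconstructed from the output shown above, and roman_suffix_to_arabic is a hypothetical name.

```r
# Hypothetical helper: replace a trailing Roman numeral with its Arabic value.
roman_suffix_to_arabic <- function(x) {
  m <- regexpr("[IVX]+$", x)          # match position, or -1 where there is none
  numerals <- regmatches(x, m)        # only the matched pieces
  # Assign one replacement per matched element; unmatched strings are untouched.
  regmatches(x, m) <- as.character(as.integer(as.roman(numerals)))
  x
}

# Assumed input, inferred from the output in the answer above.
A <- c("Case I", "Big Case II", "Not a Case", "This is Case IV")
roman_suffix_to_arabic(A)
```

Like the stringr version, this only looks at numerals built from I, V and X at the very end of each string.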
Convert numbers to roman numerals in sapply in R
Hi, I think you have to add an as.character, like this:
> sapply(sequence.to_roman, function(x) as.character(if(grepl("^[1-9]\\d*$",x)) as.roman(x) else "Some symbols"))
1 2 V1 df 3
"I" "II" "Some symbols" "Some symbols" "III"
It looks like the simplification step in sapply strips the roman class, so the values fall back to their underlying integers. Converting every result to character first prevents this.
Try:
> lapply(sequence.to_roman, function(x) if(grepl("^[1-9]\\d*$",x)) as.roman(x) else "Some symbols")
[[1]]
[1] I
[[2]]
[1] II
[[3]]
[1] "Some symbols"
[[4]]
[1] "Some symbols"
[[5]]
[1] III
This is what we want, but:
> unlist(lapply(sequence.to_roman, function(x) if(grepl("^[1-9]\\d*$",x)) as.roman(x) else "Some symbols"))
[1] "1" "2" "Some symbols" "Some symbols" "3"
also gives the recoded forms.
Here is a clearer illustration of what causes the problem:
> as.roman("3")
[1] III
> as.character(as.roman("3"))
[1] "III"
> c(as.roman("3"), "test")
[1] "3" "test"
> c(as.character(as.roman("3")), "test")
[1] "III" "test"
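One way to make the character conversion hard to forget is vapply(), which insists that every result is a single string. This is a sketch only; to_roman_label is a hypothetical name, and the input is assumed from the names in the sapply output above.

```r
# Hypothetical helper: Roman numeral for positive integer strings,
# a fallback label for everything else.
to_roman_label <- function(x, fallback = "Some symbols") {
  vapply(x, function(el) {
    if (grepl("^[1-9][0-9]*$", el)) {
      as.character(as.roman(el))   # convert before the results are combined
    } else {
      fallback
    }
  }, character(1))
}

# Assumed input, inferred from the output names shown earlier.
to_roman_label(c("1", "2", "V1", "df", "3"))
```

Because vapply() is told to expect character(1), a branch that forgot as.character() would raise an error instead of silently returning "1".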
How can I convert between numeral systems in R?
You could write your own S3 class:
base <- function(b, base = 10)
{
  base <- as.integer(base)
  if(base > 36 | base < 2) stop("'base' must be between 2 and 36.")
  structure(lapply(b, function(x)
  {
    n <- ceiling(log(x, base))
    vec <- numeric()
    val <- x
    while(n >= 0)
    {
      rem <- val %/% base^n
      val <- val - rem * base^n
      vec <- c(vec, rem)
      n <- n - 1
    }
    while(vec[1] == 0 & length(vec) > 1) vec <- vec[-1]
    structure(x, base = base, representation = vec)
  }), class = "base")
}
Which will need a format and print method:
format.base <- function(b, ...)
{
  sapply(b, function(x)
  {
    glyphs <- c(0:9, LETTERS)
    base <- attr(x, "base")
    vec <- attr(x, "representation")
    paste0(glyphs[vec + 1], collapse = "")
  })
}
print.base <- function(b, ...) print(format(b), quote = FALSE)
We also need to make sure that maths operations work properly:
Ops.base <- function(e1, e2) {
  base <- attr(e1[[1]], "base")
  e1 <- unlist(e1)
  e2 <- unlist(e2)
  base(NextMethod(.Generic), base)
}
# Math group generics take a single main argument, so there is no e2 here.
Math.base <- function(x, ...) {
  base <- attr(x[[1]], "base")
  x <- unlist(x)
  base(NextMethod(.Generic), base)
}
And if you want to use it inside a data frame you need an as.data.frame method:
as.data.frame.base <- function(b, ...)
{
  structure(list(b),
            class = "data.frame",
            row.names = seq_along(b))
}
Which all allows the following behaviour:
data.frame(binary = base(1:20, 2), hex = base(1:20, 16), oct = base(1:20, 8))
#> binary hex oct
#> 1 1 1 1
#> 2 10 2 2
#> 3 11 3 3
#> 4 100 4 4
#> 5 101 5 5
#> 6 110 6 6
#> 7 111 7 7
#> 8 1000 8 10
#> 9 1001 9 11
#> 10 1010 A 12
#> 11 1011 B 13
#> 12 1100 C 14
#> 13 1101 D 15
#> 14 1110 E 16
#> 15 1111 F 17
#> 16 10000 10 20
#> 17 10001 11 21
#> 18 10010 12 22
#> 19 10011 13 23
#> 20 10100 14 24
And:
x <- base(67, 11)
y <- base(35, 2)
x + y
#> [1] 93
base(x + y, 10)
#> [1] 102
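If only the string form is needed rather than a full class, a lighter sketch is possible. to_base below is a hypothetical helper that reuses the same digit glyphs as the class above; strtoi() is base R and handles the reverse direction.

```r
# Hypothetical helper: digits of a single non-negative whole number
# in the given base (2 to 36), as a string.
to_base <- function(n, base = 2L) {
  stopifnot(length(n) == 1, n >= 0, base >= 2, base <= 36)
  glyphs <- c(0:9, LETTERS)
  if (n == 0) return("0")
  digits <- character()
  while (n > 0) {
    digits <- c(glyphs[n %% base + 1], digits)  # prepend the least significant digit
    n <- n %/% base
  }
  paste(digits, collapse = "")
}

to_base(102, 11)           # "93", matching x + y above
strtoi("93", base = 11L)   # 102
```

This version is not vectorised and carries no arithmetic methods, which is exactly what the S3 class adds.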
R remove roman numerals from column
Try this:
#Code
employee_df$employee <-gsub('^([0-9]+)|([IVXLCM]+)\\.?$','',employee_df$employee)
Output:
employee salary
1 JOHN SMITH 21000
2 PETER RABBIT 23400
3 POPE GREGORY 26800
4 MARY SUE 100000
Or cleaner:
#Code2
employee_df$employee <- trimws(gsub('^([0-9]+)|([IVXLCM]+)\\.?$','',employee_df$employee))
Output:
employee salary
1 JOHN SMITH 21000
2 PETER RABBIT 23400
3 POPE GREGORY 26800
4 MARY SUE 100000
The numeric component of the regex is not necessary (many thanks @BenBolker). You can use:
#Code3
employee_df$employee <- trimws(gsub('([IVXLCM]+)\\.?$','',employee_df$employee))
And obtain the same result.
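One caveat with these patterns is that they also trim ordinary names that happen to end in the letters I, V, X, L, C, D or M (for example, "TIM" becomes "T"). A stricter sketch requires whitespace before the numeral. The sample data and the name strip_roman_suffix are assumptions for illustration, not the original employee_df.

```r
# Hypothetical helper: drop a trailing Roman numeral (with optional period)
# only when it is a separate, whitespace-delimited token.
strip_roman_suffix <- function(x) {
  sub("[[:space:]]+[IVXLCM]+\\.?$", "", x)
}

# Made-up sample values.
employees <- c("JOHN SMITH II", "PETER RABBIT", "POPE GREGORY XIII.", "TIM")
strip_roman_suffix(employees)
```

A surname made up entirely of those letters would still be removed, so this is a heuristic rather than a guarantee.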
How to convert Roman numerals to int while rejecting invalid numbers using standard C?
To create some level of rule flexibility, the following Roman_string_to_unsigned0() employs a table.
It follows the strtol() style of functionality in that an end-pointer is returned indicating where parsing stopped. Dereference it and test against '\0' for success.
The function has a bool subtractive parameter to steer between the two major types of Roman numeral parsing: basic and subtractive.
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static const struct Roman_digit {
  char ch[3];
  bool subtractive;
  unsigned char limit;
  unsigned char nextdown; // with parse success, offset to next element to try
  unsigned value;
} Roman_table[] = {
  { "I", false, 4, 1, 1 }, //
  { "IV", true, 1, 2, 4 }, //
  { "V", false, 1, 2, 5 }, //
  { "IX", true, 1, 4, 9 }, //
  { "X", false, 4, 1, 10 }, //
  { "XL", true, 1, 2, 40 }, //
  { "L", false, 1, 2, 50 }, //
  { "XC", true, 1, 4, 90 }, //
  { "C", false, 4, 1, 100 }, //
  { "CD", true, 1, 2, 400 }, //
  { "D", false, 1, 2, 500 }, //
  { "CM", true, 1, 4, 900 }, //
  { "M", false, 4, 1, 1000 }, //
};
#define Roman_table_N (sizeof Roman_table / sizeof Roman_table[0])

const char *Roman_string_to_unsigned0(unsigned *dest, const char *src, bool subtractive) {
  *dest = 0;
  // Walk the table from the most significant entry down.
  for (unsigned i = Roman_table_N; i > 0;) {
    const struct Roman_digit *digit = &Roman_table[i - 1];
    if (!subtractive && digit->subtractive) {
      i--;
      continue;
    }
    unsigned limit = digit->limit; // repeat count
    if (limit > 1 && subtractive) limit--;
    size_t ch_length = strlen(digit->ch);
    size_t next_i = i - 1;
    for (unsigned j = 0; j < limit; j++) {
      if (strncmp(src, digit->ch, ch_length) == 0) {
        *dest += digit->value;
        if (*dest < digit->value) { // Overflow detection
          return src;
        }
        src += ch_length;
        next_i = i - digit->nextdown; // With success, maybe skip down the list
      } else {
        break;
      }
    }
    i = next_i;
  }
  return src;
}
Notes: Case insensitivity is not yet implemented. An empty string returns 0. Because this code works from most to least significant, "XXXMMM" does not pass.
Converting integers into words and roman numerals
Take a look at cl-format; it can return "twenty one". I used it for Project Euler.
http://clojuredocs.org/clojure_core/1.2.0/clojure.pprint/cl-format
and Roman too:
~@R prints arg as a Roman numeral: IV; and ~:@R prints arg as an old Roman numeral: IIII.
R capitalize roman numerals only in string
Here is a base R option using substr, sub, and paste0:
people <- c("PERSON I", "PERSON II", "PERSON III", "PERSON IV")
people <- paste0(substr(people, 1, 1), tolower(sub("^\\S(\\S+).*$", "\\1", people)),
" ", sub("^.*?(\\S+)$", "\\1", people))
people
[1] "Person I" "Person II" "Person III" "Person IV"
Replace only the numbers of a data frame column with roman numerals in R
This isn't particularly elegant, but as long as you aren't dealing with millions of entries, it should work well enough. It makes use of the as.roman function, which is available in the utils package that ships with R (the answer also loads gtools).
library(gtools)
library(stringr)
sa <- c("Phase 1", "Phase 2", "Phase 1 | Phase 2", "Phase 4")
sub_roman <- function(x) {
  # identify any numbers (up to three digits)
  num <- as.numeric(unlist(str_extract_all(x, "\\d{1,3}")))
  for (i in seq_along(num)) {
    # replace the first remaining number with its roman numeral
    x <- str_replace(x, "\\d{1,3}", as.character(as.roman(num[i])))
  }
  x
}
# Run the previously defined function over the vector.
sa <- vapply(sa, sub_roman, character(1))
# replace the pipe with a slash.
sa <- str_replace_all(sa, "[|]", "/")
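The loop can also be avoided with gregexpr() and regmatches() from base R, which replace every number in one pass. This is a sketch under the same assumptions as the answer above; numbers_to_roman is a hypothetical name.

```r
# Hypothetical helper: replace every run of digits with its Roman numeral.
numbers_to_roman <- function(x) {
  m <- gregexpr("[0-9]+", x)
  regmatches(x, m) <- lapply(regmatches(x, m), function(d) {
    # Strings without digits yield character(0); leave those alone.
    if (length(d)) as.character(as.roman(as.integer(d))) else d
  })
  x
}

phases <- c("Phase 1", "Phase 2", "Phase 1 | Phase 2", "Phase 4")
# fixed = TRUE treats the pipe as a literal character, not regex alternation.
gsub("|", "/", numbers_to_roman(phases), fixed = TRUE)
```

Unlike the vapply() version, the result carries no names, since nothing is iterated over by element name.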