How to Find the Length of a String in R

Use nchar; see ?nchar for the details. For example:

> nchar("foo")
[1] 3
> set.seed(10)
> strn <- paste(sample(LETTERS, 10), collapse = "")
> strn
[1] "NHKPBEFTLY"
> nchar(strn)
[1] 10
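
Two behaviours of nchar are easy to trip over, so here is a minimal sketch (the vector x is made up for illustration). nchar is vectorised and counts characters per element, whereas length counts the elements of the vector.

```r
# Hypothetical example vector, not taken from the original question
x <- c("foo", "", "hello world")

length(x)   # number of elements in the vector: 3
nchar(x)    # characters per element: 3 0 11

# A missing string stays missing, but a bare logical NA is coerced to "NA"
nchar(NA_character_)   # NA
nchar(NA)              # 2
```

If the input may contain missing values, it is probably safest to check is.na() before relying on the counts.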

How to get the length of elements in a character vector in R?

One option is to count the words with str_count from stringr (group.name is the string from the question, which is not shown here):

library(stringr)
str_count(group.name, "\\w+")
#[1] 3

Or remove every character that is not a delimiter, use nchar to count the commas that remain, and add 1 (there is one fewer delimiter than there are words):

nchar(gsub("[^,]+", "", group.name)) + 1
#[1] 3

Or using gregexpr

lengths(gregexpr("\\w+", group.name))
#[1] 3

It can be turned into a function

f1 <- function(stringObj) {
  nchar(gsub("[^,]+", "", stringObj)) + 1
}

f1(group.name)
#[1] 3
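
Since group.name is not shown, here is a small self-contained sketch with made-up strings. It compares the delimiter-counting approach with a plain strsplit, and shows why counting words with "\\w+" is not the same thing once an item contains a space. Both input strings are assumptions for illustration only.

```r
# Hypothetical inputs; the original group.name is not shown
group.name <- "apple,banana,cherry"
spaced <- "New York,Paris"

# Count the delimiters and add 1
nchar(gsub("[^,]+", "", group.name)) + 1            # 3
# Split on the delimiter and count the pieces
lengths(strsplit(group.name, ",", fixed = TRUE))    # 3

# Word-based counting differs once an item contains a space
lengths(gregexpr("\\w+", spaced))                   # 3 words
lengths(strsplit(spaced, ",", fixed = TRUE))        # 2 items
```

Which count is right depends on whether you want words or comma-separated items.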

Finding the length of each string within a column of a data-frame in R

You can also apply nchar to your dataframe and get the result from the corresponding column:

data.frame(names = temp$name, chr = apply(temp, 2, nchar)[, 2])
           names chr
1     KOSH ENTRP  10
2       JOHN DOE   8
3         S KHAN   6
4 JASINT PVT LTD  14
5 KOSH ENTRPRISE  14
6     JOHN S DOE  10
7         S KHAN   6
8         S KHAN   6
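
Because nchar is vectorised, you can probably skip apply altogether and call it on the column directly. A minimal sketch, assuming a data frame with a character column called name (the data below is a made-up subset, and the id column is hypothetical):

```r
# Hypothetical data frame; only the name column mirrors the output above
temp <- data.frame(
  id = 1:3,
  name = c("KOSH ENTRP", "JOHN DOE", "S KHAN"),
  stringsAsFactors = FALSE
)

# nchar works element-wise on the column, no apply needed
temp$chr <- nchar(temp$name)
temp$chr   # 10 8 6
```

If the column is a factor, wrap it in as.character() first, since nchar expects character input.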

Is there a way to find the mean length of words in a string in R?

Split the titles into words and compute the mean number of characters per word.

mean(nchar(unlist(strsplit(titles, '\\s+'))))
#[1] 5.161017

Note that since we are splitting on whitespace, the result includes tokens like "1981-1991", "(Scott", "#1)" etc., which should be OK for larger samples. If you don't want to include them, you may need to clarify what constitutes a word.
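
To make the difference concrete, here is a sketch with two made-up titles (the original titles vector is not shown). It compares splitting on whitespace with one possible stricter definition of a word, namely a run of letters; that definition is an assumption, not the asker's requirement.

```r
# Hypothetical titles, for illustration only
titles <- c("The Old Man", "Sea (1981-1991)")

# Whitespace split: "(1981-1991)" counts as one 11-character word
ws_words <- unlist(strsplit(titles, "\\s+"))
mean(nchar(ws_words))      # 4.6

# Stricter: keep only runs of letters
alpha_words <- unlist(regmatches(titles, gregexpr("[[:alpha:]]+", titles)))
mean(nchar(alpha_words))   # 3
```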

How to find the length of the first line of a string in a multi line string in R

You can use sub to remove everything from the first \n onward and then use nchar.

nchar(sub("\n.*", "", my_string))
#[1] 29

or using strsplit

nchar(strsplit(my_string, "\n")[[1]][1])
#nchar(strsplit(my_string, "\n")[[c(1,1)]]) #Alternative
#[1] 29
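
For a self-contained check, here is a sketch with a made-up my_string (the original one is not shown). Splitting once gives the length of every line, and the first element is the answer to the question.

```r
# Hypothetical multi-line string
my_string <- "first line\nsecond line\nthird"

# Length of every line, then pick the first
line_lengths <- nchar(strsplit(my_string, "\n", fixed = TRUE)[[1]])
line_lengths      # 10 11 5
line_lengths[1]   # 10

# A string without any newline is its own first line
nchar(strsplit("single", "\n", fixed = TRUE)[[1]][1])   # 6
```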

Get length of string in R as it would be `cat()`d (tab handling)

Interesting question.

I think you might just need to brute force this one, with something like the following. (It's based on the observations that: (1) tabs are displayed using at least one space; and (2) each tab-terminated substring is allocated a block of space that is the smallest multiple of 8 characters that's able to accommodate it.)

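catLength <- function(x) {
  # Split after each tab, so every tab-terminated chunk keeps its trailing tab
  xx <- strsplit(x, "(?<=\\t)", perl = TRUE)[[1]]
  ii <- grepl("\\t", xx)
  # Chunks ending in a tab fill whole blocks of 8; the rest count their characters
  sum(ii * 8 * ceiling(nchar(xx) / 8)) + sum((!ii) * nchar(xx))
}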

catLength("\t\t")
# [1] 16
catLength("A")
# [1] 1
catLength("\tA")
# [1] 9
catLength("1234567\tA")
# [1] 9
catLength("12345678\tA")
# [1] 17
catLength("12345678\tAB")
# [1] 18
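
As a cross-check, the same two observations can be sketched as a simple walk over the characters: a tab jumps to the next tab stop, anything else advances by one column. The function name catWidth and the tab argument are hypothetical, and the sketch assumes tab stops every 8 columns and that every other character is one column wide.

```r
# Hypothetical helper: width of x when printed with fixed tab stops
catWidth <- function(x, tab = 8L) {
  col <- 0L
  for (ch in strsplit(x, "", fixed = TRUE)[[1]]) {
    if (ch == "\t") {
      # Jump to the next multiple of `tab`
      col <- (col %/% tab + 1L) * tab
    } else {
      col <- col + 1L
    }
  }
  col
}

catWidth("\t\t")            # 16
catWidth("1234567\tA")      # 9
catWidth("12345678\tAB")    # 18
```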

How to get the length of a formula in R?

We can use all.vars to get the unique variable names in the formula and then take the length of the result:

length(all.vars(myformula))
[1] 5
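
"Length" of a formula can mean a few different things, so here is a sketch with a made-up formula (the original myformula is not shown) that contrasts three counts.

```r
# Hypothetical formula
myformula <- y ~ a + b + c + d

# Variables on both sides of the formula
length(all.vars(myformula))                      # 5
# Terms on the right-hand side only
length(attr(terms(myformula), "term.labels"))    # 4
# length() on the formula object itself: `~`, left side, right side
length(myformula)                                # 3
```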

Find the shortest string by categories R

This should do what you need

library(dplyr)
library(stringr)

ex %>%
  group_by(category) %>%
  mutate(length = min(nchar(string)),
         string = str_sub(string, 1, length))

We don't need the lapply inside the mutate to find the length. We can just run that transformation on the string column directly. And here I used stringr::str_sub to get the substring with the right number of characters since you already seem to be using tidyverse functions. You could also use the base substr function instead.
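
For reference, a base R sketch of the same idea, using ave for the per-group minimum and substr for the truncation. The data frame ex below is made up, since the original data is not shown; only the column names category and string follow the code above.

```r
# Hypothetical data
ex <- data.frame(
  category = c("a", "a", "b", "b"),
  string = c("hello", "hi", "world", "wide"),
  stringsAsFactors = FALSE
)

# Shortest string length within each category, repeated for every row
ex$length <- ave(nchar(ex$string), ex$category, FUN = min)
# Truncate each string to its group's shortest length
ex$string <- substr(ex$string, 1, ex$length)

ex$length   # 2 2 4 4
ex$string   # "he" "hi" "worl" "wide"
```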

Rowwise comparison of the length of a string against a list of string lengths

Update: removed the first answer; thanks to akrun for pointing this out. The concept is the same, using nchar with case_when; the difference is that ALT is first split into one row per value with separate_rows from the tidyr package:

library(dplyr)
library(tidyr)

df %>%
  mutate(id = row_number()) %>%
  separate_rows(ALT, sep = ",") %>%
  mutate(TYPE = case_when(nchar(REF) == nchar(ALT) ~ "SNM",
                          nchar(REF) < nchar(ALT) ~ "INS",
                          nchar(REF) > nchar(ALT) ~ "DEL",
                          TRUE ~ NA_character_)) %>%
  group_by(id) %>%
  mutate(TYPE = toString(TYPE)) %>%
  slice(1)
  REF                               ALT                                    id TYPE
  <chr>                             <chr>                               <int> <chr>
1 TTG                               T                                       1 DEL
2 CGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT CGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT     2 INS, DEL
3 T                                 TTG                                     3 INS
4 TTGTGTGTGTGTGTGTGTGTGT            TTGTGTGTGTGTGTGTGTGTGTGT                4 INS
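
The same classification can be sketched in base R, which may help to see what the pipeline does per row. The data frame and the helper name classify are hypothetical; the labels SNM, INS and DEL are taken from the code above.

```r
# Hypothetical data; ALT may hold several comma-separated values
df <- data.frame(
  REF = c("TTG", "T", "CG"),
  ALT = c("T", "TTG", "CGT,C"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: compare one REF against each ALT value
classify <- function(ref, alt) {
  alts <- strsplit(alt, ",", fixed = TRUE)[[1]]
  d <- nchar(alts) - nchar(ref)
  toString(ifelse(d == 0, "SNM", ifelse(d > 0, "INS", "DEL")))
}

df$TYPE <- unname(mapply(classify, df$REF, df$ALT))
df$TYPE   # "DEL" "INS" "INS, DEL"
```

Unlike the slice(1) version, this keeps the full ALT string in each row.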

Finding length of a character string which is separated by commas in R

Using @akrun's sample data, here's the count.fields approach I mentioned in the comments.

> count.fields(textConnection(DF$Values), sep = ",")
[1] 4 7 6

If they are factors, just use textConnection(as.character(DF$Values)) instead.
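
Since the sample data is not reproduced here, a self-contained sketch with a made-up DF follows. It also shows lengths(strsplit(...)) as an alternative that needs no connection.

```r
# Hypothetical data frame with a comma-separated character column
DF <- data.frame(Values = c("a,b,c,d", "1,2", "x"), stringsAsFactors = FALSE)

# count.fields reads from a connection, one element per line
con <- textConnection(DF$Values)
n_fields <- count.fields(con, sep = ",")
close(con)
n_fields   # 4 2 1

# Alternative without a connection
n_split <- lengths(strsplit(DF$Values, ",", fixed = TRUE))
n_split    # 4 2 1
```

One thing to watch: count.fields skips blank lines by default (blank.lines.skip = TRUE), so with empty strings in the column the result may not line up with the rows.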


