How to find the length of a string in R
See ?nchar. For example:
> nchar("foo")
[1] 3
> set.seed(10)
> strn <- paste(sample(LETTERS, 10), collapse = "")
> strn
[1] "NHKPBEFTLY"
> nchar(strn)
[1] 10
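nchar is vectorized, so the same call also covers every element of a character vector. A minimal sketch, assuming a made-up vector x (not from the question), that contrasts it with length, which counts elements rather than characters:

```r
# x is a hypothetical example vector
x <- c("foo", "barbaz", "")

length(x)   # number of elements in the vector: 3
nchar(x)    # number of characters in each element: 3 6 0

# type = "bytes" counts bytes instead of characters
nchar("foo", type = "bytes")
```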
How to get the length of elements in a character vector in R?
An option would be to count the words with str_count
library(stringr)
str_count(group.name, "\\w+")
#[1] 3
Or replace every non-delimiter character with an empty string, use nchar to count the delimiters that remain, and add 1 (there is one fewer delimiter than there are words)
nchar(gsub("[^,]+", "", group.name)) + 1
#[1] 3
Or using gregexpr
lengths(gregexpr("\\w+", group.name))
#[1] 3
It can be turned into a function
f1 <- function(stringObj){
  nchar(gsub("[^,]+", "", stringObj)) + 1
}
f1(group.name)
#[1] 3
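The snippets above rely on a group.name object that is not shown. As a rough, self-contained check, the base R variants can be run on a made-up comma-separated vector (the values below are an assumption, chosen only for illustration):

```r
# group.name is a hypothetical stand-in for the object used above
group.name <- c("alpha,beta,gamma", "x,y")

# count delimiters, then add 1
by_delims <- nchar(gsub("[^,]+", "", group.name)) + 1

# count runs of word characters
by_words <- lengths(gregexpr("\\w+", group.name))

# split on the literal comma and count the pieces
by_split <- lengths(strsplit(group.name, ",", fixed = TRUE))

by_delims
by_words
by_split
```

All three give 3 and 2 for this input. They can disagree on other inputs, for example when an item itself contains a space or punctuation, so which one fits depends on how the items are delimited.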
Finding the length of each string within a column of a data-frame in R
You can also apply nchar to your data frame and take the result from the corresponding column:
data.frame(names=temp$name,chr=apply(temp,2,nchar)[,2])
           names chr
1     KOSH ENTRP  10
2       JOHN DOE   8
3         S KHAN   6
4 JASINT PVT LTD  14
5 KOSH ENTRPRISE  14
6     JOHN S DOE  10
7         S KHAN   6
8         S KHAN   6
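Because nchar is vectorized, it can also be called on the column directly, which avoids converting the whole data frame to a matrix with apply. A minimal sketch, assuming a small stand-in for temp (the id column is invented; the names are copied from the output above):

```r
# temp is a hypothetical stand-in for the data frame in the question
temp <- data.frame(
  id   = 1:4,
  name = c("KOSH ENTRP", "JOHN DOE", "S KHAN", "JASINT PVT LTD"),
  stringsAsFactors = FALSE
)

# as.character() guards against factor columns, which nchar() rejects
temp$chr <- nchar(as.character(temp$name))
temp
```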
Is there a way to find the mean length of words in a string in R?
Split the titles into words and compute the mean number of characters per word.
mean(nchar(unlist(strsplit(titles, '\\s+'))))
#[1] 5.161017
Note that since we are splitting on whitespace, this counts tokens like "1981-1991", "(Scott", and "#1)" as words, which should be fine for larger samples. If you don't want to include them, you will need to clarify what constitutes a word.
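If a word is taken to mean a run of letters or digits, one possible variant extracts those runs instead of splitting on whitespace. This is only a sketch of one definition; the titles vector below is made up for illustration:

```r
# titles is a hypothetical example vector
titles <- c("The Hobbit", "Dune (Dune #1)")

# whitespace split: punctuation stays attached to the tokens
mean_ws <- mean(nchar(unlist(strsplit(titles, "\\s+"))))

# alphanumeric runs only: "(Dune" becomes "Dune", "#1)" becomes "1"
words <- unlist(regmatches(titles, gregexpr("[[:alnum:]]+", titles)))
mean_alnum <- mean(nchar(words))

mean_ws     # 4.2
mean_alnum  # 3.6
```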
How to find the length of the first line of a string in a multi line string in R
You can use sub to remove the first \n and everything after it, and then use nchar.
nchar(sub("\n.*", "", my_string))
#[1] 29
or using strsplit
nchar(strsplit(my_string, "\n")[[1]][1])
#nchar(strsplit(my_string, "\n")[[c(1,1)]]) #Alternative
#[1] 29
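The strsplit approach extends to several strings at once. A small sketch, where first_line_length is a hypothetical helper name and my_string is a made-up value rather than the one from the question:

```r
# my_string is a hypothetical example value
my_string <- "first line here\nsecond\nthird"

# first_line_length is an illustrative helper, not a base R function
first_line_length <- function(x) {
  # fixed = TRUE treats "\n" as a literal newline rather than a regex
  parts <- strsplit(x, "\n", fixed = TRUE)
  vapply(parts, function(p) nchar(p[1]), integer(1))
}

first_line_length(my_string)            # 15
first_line_length(c("a\nbc", "xyz"))    # 1 3
```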
Get length of string in R as it would be `cat()`d (tab handling)
Interesting question.
I think you might just need to brute force this one, with something like the following. (It's based on the observations that: (1) tabs are displayed using at least one space; and (2) each tab-terminated substring is allocated a block of space that is the smallest multiple of 8 characters that's able to accommodate it.)
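catLength <- function(x) {
    # Split after each tab, so every piece except possibly the last ends in "\t"
    xx <- strsplit(x, "(?<=\\t)", perl=TRUE)[[1]]
    ii <- grepl("\\t", xx)
    # Tab-terminated pieces fill whole 8-column blocks; nchar() already counts the tab.
    # The parentheses around !ii matter: unary ! binds more loosely than *.
    sum(ii * 8 * ceiling(nchar(xx) / 8)) + sum((!ii) * nchar(xx))
}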
catLength("\t\t")
# [1] 16
catLength("A")
# [1] 1
catLength("\tA")
# [1] 9
catLength("1234567\tA")
# [1] 9
catLength("12345678\tA")
# [1] 17
catLength("12345678\tAB")
# [1] 18
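The tab-stop rule can also be checked by simulating the cursor one character at a time. The helper below, cat_width, is a hypothetical name; it assumes tab stops every 8 columns and that every other character is exactly one column wide:

```r
# cat_width is an illustrative helper, not part of the answer above
cat_width <- function(x, tab = 8L) {
  col <- 0L
  for (ch in strsplit(x, "")[[1]]) {
    if (ch == "\t") {
      col <- (col %/% tab + 1L) * tab   # jump to the next tab stop
    } else {
      col <- col + 1L                   # ordinary character: one column
    }
  }
  col
}

inputs <- c("\t\t", "A", "\tA", "1234567\tA", "12345678\tA", "12345678\tAB")
widths <- vapply(inputs, cat_width, integer(1), USE.NAMES = FALSE)
widths
```

For these six inputs it reproduces the values listed above (16, 1, 9, 9, 17, 18).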
How to get the length of a formula in R?
We may use all.vars to get the variables in the formula and then take the length of the result
length(all.vars(myformula))
[1] 5
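What counts as the "length" of a formula depends on what is being counted. A short sketch with a made-up formula (myformula below is an assumption, not the one from the question) shows three different counts:

```r
# myformula is a hypothetical example
myformula <- y ~ x1 + x2 + log(x3) + x4

n_vars  <- length(all.vars(myformula))                    # distinct variable names
n_terms <- length(attr(terms(myformula), "term.labels"))  # right-hand-side terms
n_parts <- length(myformula)                              # `~`, left side, right side

c(n_vars, n_terms, n_parts)   # 5 4 3
```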
Find the shortest string by categories R
This should do what you need
library(dplyr)
library(stringr)
ex %>%
  group_by(category) %>%
  mutate(length = min(nchar(string)),
         string = str_sub(string, 1, length))
We don't need lapply inside mutate to find the length; we can run that transformation on the string column directly. I used stringr::str_sub to get the substring with the right number of characters since you already seem to be using tidyverse functions, but the base substr function would work as well.
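For comparison, roughly the same result can be obtained without any packages by using ave for the per-group minimum and substr for the truncation. This is a sketch on made-up data; the ex data frame below is an assumption that only mirrors the column names used above:

```r
# ex is a hypothetical stand-in for the data in the question
ex <- data.frame(
  category = c("a", "a", "b", "b"),
  string   = c("apple", "fig", "banana", "kiwi"),
  stringsAsFactors = FALSE
)

# shortest string length within each category, repeated for every row
ex$length <- ave(nchar(ex$string), ex$category, FUN = min)

# truncate each string to its group's minimum length
ex$string <- substr(ex$string, 1, ex$length)
ex
```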
Rowwise comparison of the length of a string against a list of string lengths
Update: I removed my first answer (thanks to akrun for pointing out the problem). The concept is the same, using nchar with case_when; the difference is the use of separate_rows from the tidyr package:
library(dplyr)
library(tidyr)
df %>%
  mutate(id = row_number()) %>%
  separate_rows(ALT, sep = ",") %>%
  mutate(TYPE = case_when(nchar(REF) == nchar(ALT) ~ "SNM",
                          nchar(REF) <  nchar(ALT) ~ "INS",
                          nchar(REF) >  nchar(ALT) ~ "DEL",
                          TRUE ~ NA_character_)) %>%
  group_by(id) %>%
  mutate(TYPE = toString(TYPE)) %>%
  slice(1)
  REF                               ALT                                    id TYPE
  <chr>                             <chr>                               <int> <chr>
1 TTG                               T                                       1 DEL
2 CGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT CGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT     2 INS, DEL
3 T                                 TTG                                     3 INS
4 TTGTGTGTGTGTGTGTGTGTGT            TTGTGTGTGTGTGTGTGTGTGTGT                4 INS
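The same length comparison can be sketched in base R by splitting ALT on commas and comparing each piece with REF. classify_alt is a hypothetical helper name and the inputs are toy values, so treat this as an illustration of the idea rather than a replacement for the pipeline above:

```r
# classify_alt is an illustrative helper, not part of dplyr or tidyr
classify_alt <- function(ref, alt) {
  alts <- strsplit(alt, ",", fixed = TRUE)
  mapply(function(r, a) {
    d <- nchar(a) - nchar(r)   # one difference per comma-separated ALT
    toString(ifelse(d == 0, "SNM", ifelse(d > 0, "INS", "DEL")))
  }, ref, alts, USE.NAMES = FALSE)
}

types <- classify_alt(
  ref = c("TTG", "T", "CGT", "A"),
  alt = c("T", "TTG", "CGTGT,C", "G")
)
types
```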
Finding length of a character string which is separated by commas in R
Using @akrun's sample data, here's the count.fields approach I mentioned in the comments.
> count.fields(textConnection(DF$Values), sep = ",")
[1] 4 7 6
If they are factors, just use textConnection(as.character(DF$Values)) instead.
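Since the sample data is not reproduced here, the following sketch uses a made-up DF whose rows happen to have 4, 7 and 6 fields. It also closes the text connection explicitly and cross-checks the counts with strsplit:

```r
# DF is a hypothetical stand-in for the sample data
DF <- data.frame(
  Values = c("1,2,3,4", "a,b,c,d,e,f,g", "u,v,w,x,y,z"),
  stringsAsFactors = FALSE
)

con <- textConnection(DF$Values)
n_fields <- count.fields(con, sep = ",")
close(con)   # release the connection once the counts are read

# cross-check by splitting on the literal comma
n_split <- lengths(strsplit(DF$Values, ",", fixed = TRUE))

n_fields
n_split
```

By default count.fields treats quote characters and "#" specially (its quote and comment.char arguments), so values containing those characters may need the defaults changed.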