R Get Last Element from Str_Split

As the comment on your question suggests, this is suitable for gsub:

gsub("^.*_", "", string_thing)

I'd recommend checking how it behaves on strings with no underscore and on strings that end in an underscore:

string_thing <- c("I_AM_STRING", "I_AM_ALSO_STRING_THING", "AM I ONE", "STRING_")
gsub("^.*_", "", string_thing)
[1] "STRING" "THING" "AM I ONE" ""
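
For comparison, here is a minimal sketch of how the regex approach and a split-then-tail approach differ on the same input. It assumes base R only; the names by_regex and by_split are made up for illustration. The notable difference is the trailing-underscore case, because strsplit drops a trailing empty piece.

```r
string_thing <- c("I_AM_STRING", "I_AM_ALSO_STRING_THING", "AM I ONE", "STRING_")

# Regex approach: remove everything up to and including the last underscore.
by_regex <- sub("^.*_", "", string_thing)

# Split approach: split on "_" and keep the last piece of each string.
by_split <- vapply(strsplit(string_thing, "_", fixed = TRUE),
                   tail, character(1), n = 1)

print(by_regex)
# [1] "STRING"   "THING"    "AM I ONE" ""
print(by_split)
# [1] "STRING"   "THING"    "AM I ONE" "STRING"
```

Which result is "right" for "STRING_" depends on whether an empty last field is meaningful in your data.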

Use strsplit to get last character in r

To make your strsplit approach work, combine tail with sapply:

df$LastInit <- sapply(strsplit(as.character(df$Name), ""), tail, 1)
df
#      Name Sex LastInit
# 1    Anna   F        a
# 2 Michael   M        l
# 3   David   M        d
# 4   Sarah   F        h

Alternatively, you can use substring, starting at the position of the last character:

with(df, substring(Name, nchar(Name)))
# [1] "a" "l" "d" "h"
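
A self-contained sketch that checks the two approaches agree. The data frame is rebuilt here from the printed output above, with stringsAsFactors = FALSE as an assumption (with a factor column you would need as.character first, as in the strsplit call).

```r
# Sample data reconstructed from the printed output above.
df <- data.frame(Name = c("Anna", "Michael", "David", "Sarah"),
                 Sex = c("F", "M", "M", "F"),
                 stringsAsFactors = FALSE)

# Split each name into single characters and keep the last one.
last_by_split <- vapply(strsplit(df$Name, ""), tail, character(1), n = 1)

# Take the substring that starts at the last character.
last_by_substring <- substring(df$Name, nchar(df$Name))

print(last_by_split)
# [1] "a" "l" "d" "h"
print(identical(last_by_split, last_by_substring))
# [1] TRUE
```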

Write a function to get later elements from str_split()

We may use tail. Because more than one element is returned for each string, the result is returned as a list column:

library(stringr)

Orgsplit_abrev <- function(x) {
  lapply(str_split(x, " "), tail, 2)
}

Testing:

foo %>%
  summarise(Orgsplit_abrev(Organisms))
Orgsplit_abrev(Organisms)
1 Enterobacter, aerogenes
2 Enterobacter, aerogenes
3 Klebsiella, pneumoniae
4 Acinetobacter, baumannii
5 Enterobacter, cloacae
6 Klebsiella, pneumoniae

If we want to pick elements by position instead, we can use an anonymous function:

Orgsplit_abrev <- function(x) {
  lapply(str_split(x, " "), function(x) x[c(3, 4)])
}

Or we may pass the extraction function `[` directly:

Orgsplit_abrev <- function(x) {
  lapply(str_split(x, " "), `[`, c(3, 4))
}
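
The tail and index variants only agree when every string has exactly four space-separated tokens. The sketch below illustrates the difference. It swaps in base strsplit for str_split so that it runs without stringr, and the organisms values are hypothetical sample strings, not the asker's data.

```r
# Hypothetical sample values: one with four tokens, one with only two.
organisms <- c("Isolate 12 Enterobacter aerogenes", "Klebsiella pneumoniae")

parts <- strsplit(organisms, " ", fixed = TRUE)

# Last two tokens, however many tokens there are.
last_two <- lapply(parts, tail, 2)

# Tokens at fixed positions 3 and 4; missing positions come back as NA.
by_index <- lapply(parts, `[`, c(3, 4))

print(last_two[[2]])
# [1] "Klebsiella" "pneumoniae"
print(by_index[[2]])
# [1] NA NA
```

So tail is the safer choice when the number of leading tokens varies.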

R split text string into last and first elements

You can use tail to grab the last element:

df$name2 = as.character(lapply(strsplit(as.character(df$PREFIX), split="_"),
                               tail, n=1))
df
#   PREFIX VALUE name1 name2
# 1    A_B     1     A     B
# 2    A_C     2     A     C
# 3    A_D     3     A     D
# 4    B_A     4     B     A
# 5  A_B_C     5     A     C
# 6  B_D_E     6     B     E
# 7  C_B_A     7     C     A
# 8    B_A     8     B     A
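
As a rough, self-contained sketch, both the first and the last element can be taken from a single split. The three rows are a subset of the output above, and vapply is used here in place of as.character(lapply(...)) as a stylistic assumption, since it checks that each result is a single string.

```r
# A small subset of the data shown above.
df <- data.frame(PREFIX = c("A_B", "A_B_C", "C_B_A"),
                 VALUE = c(1, 5, 7),
                 stringsAsFactors = FALSE)

# Split once, then reuse the pieces for both columns.
parts <- strsplit(df$PREFIX, "_", fixed = TRUE)

df$name1 <- vapply(parts, `[`, character(1), 1)       # first element
df$name2 <- vapply(parts, tail, character(1), n = 1)  # last element

print(df$name1)
# [1] "A" "A" "C"
print(df$name2)
# [1] "B" "C" "A"
```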

How to get empty last elements from strsplit() in R?

Here are a couple of ideas:

scan(text="1,2,3,", sep=",", quiet=TRUE)
#[1] 1 2 3 NA

unlist(read.csv(text="1,2,3,", header=FALSE), use.names=FALSE)
#[1] 1 2 3 NA

Both return numeric vectors (scan gives doubles, while the read.csv version gives integers). You can wrap as.character around either of them to get the exact output you show in the question:

as.character(scan(text="1,2,3,", sep=",", quiet=TRUE))
#[1] "1" "2" "3" NA

Or, you could specify what="character" in scan, or colClasses="character" in read.csv, to get an empty string instead of NA in the last position:

scan(text="1,2,3,", sep=",", quiet=TRUE, what="character")
#[1] "1" "2" "3" ""

unlist(read.csv(text="1,2,3,", header=FALSE, colClasses="character"), use.names=FALSE)
#[1] "1" "2" "3" ""

You could also specify na.strings="" along with colClasses="character" to turn that empty string back into NA:

unlist(read.csv(text="1,2,3,", header=FALSE, colClasses="character", na.strings=""),
       use.names=FALSE)
#[1] "1" "2" "3" NA
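
If you would rather stay with strsplit, one possible workaround (not part of the ideas above, so treat it as an assumption to check against your own data) is to append one extra delimiter before splitting. strsplit drops only the piece after the final delimiter, so the originally trailing empty field survives.

```r
x <- "1,2,3,"

# Plain strsplit silently drops the empty field after the last comma.
plain <- strsplit(x, ",", fixed = TRUE)[[1]]

# Appending one more delimiter keeps the original trailing empty field.
padded <- strsplit(paste0(x, ","), ",", fixed = TRUE)[[1]]

print(plain)
# [1] "1" "2" "3"
print(padded)
# [1] "1" "2" "3" ""
```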

accessing individual values split by str_split in R, finding the last one?

Taken from: Find file name from full file path

basename("C:/some_dir/a")
# [1] "a"

dirname("C:/some_dir/a")
# [1] "C:/some_dir"

Although I think the above approach is much better, you can also split the path yourself. I mention this mainly to show how to select the last element of each list entry using lapply and tail:

example <- c("C:/some_dir/a","C:/some_dir/sdfs/a","C:/some_dir/asdf/asdf/a")
example.split <- strsplit(example,"/")
files <- unlist(lapply(example.split, tail, 1))
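
A quick sanity check, sketched with the same example paths, that the split approach and basename give the same file names:

```r
example <- c("C:/some_dir/a", "C:/some_dir/sdfs/a", "C:/some_dir/asdf/asdf/a")

# Split each path on "/" and keep the last component.
example.split <- strsplit(example, "/")
files <- unlist(lapply(example.split, tail, 1))

print(files)
# [1] "a" "a" "a"
print(identical(files, basename(example)))
# [1] TRUE
```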

split string last delimiter

These use no packages. They assume that each element of col2 has at least one underscore. (See note if lifting this restriction is needed.)

1) The first regular expression, (.*)_.*, matches everything up to the last underscore (captured by the parentheses), then the underscore itself, then everything remaining. The first sub replaces the entire match with the captured part. This works because matching is greedy: the first .* takes as much as it can, leaving only the text after the last underscore for the second .* . The second regular expression, .*_, matches everything up to and including the last underscore, and the second sub replaces that with the empty string. Both calls see the original col2, because transform evaluates its arguments against the unmodified data frame.

transform(df, col2 = sub("(.*)_.*", "\\1", col2), col3 = sub(".*_", "", col2))

2) Here is a variation that is a bit more symmetric. It uses the same regular expression for both sub calls.

pat <- "(.*)_(.*)"
transform(df, col2 = sub(pat, "\\1", col2), col3 = sub(pat, "\\2", col2))

Note: If we did want to handle strings with no underscore at all such that "xyz" is split into "xyz" and "" then use this for the second sub. It tries to match the left hand side of the | first and if that fails (which will occur if there are no underscores) then the entire string will match the right hand side and sub will replace that with the empty string.

sub(".*_|^[^_]*$", "", col2)
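
To make the behaviour concrete, here is a small sketch on made-up data. The data frame df and its values are hypothetical, and only base R is used.

```r
# Hypothetical input: every col2 value contains at least one underscore.
df <- data.frame(col1 = 1:3,
                 col2 = c("a_b_c", "x_y_z_w", "p_q"),
                 stringsAsFactors = FALSE)

# Split at the last underscore; both sub calls see the original col2.
out <- transform(df,
                 col2 = sub("(.*)_.*", "\\1", col2),
                 col3 = sub(".*_", "", col2))

print(out$col2)
# [1] "a_b"   "x_y_z" "p"
print(out$col3)
# [1] "c" "w" "q"

# The variant from the note: a string with no underscore yields "".
no_us <- sub(".*_|^[^_]*$", "", c("a_b", "xyz"))
print(no_us)
# [1] "b" ""
```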

