Extract Last Word in String in R

Extract last word in string in R


tail(strsplit('this is a sentence',split=" ")[[1]],1)

Basically as suggested by @Señor O.
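
The one-liner above handles a single string. For a whole character vector, a minimal base R sketch could look like the following; the helper name `last_word` is made up for illustration and is not from the original answer.

```r
# Hypothetical helper: last whitespace-separated word of each element.
last_word <- function(x) {
  # Trim first so leading/trailing spaces do not create empty tokens.
  parts <- strsplit(trimws(x), "[[:space:]]+")
  # tail(p, 1) is the last token; an empty string yields NA.
  vapply(parts, function(p) if (length(p)) tail(p, 1) else NA_character_,
         character(1))
}

last_word(c("this is a sentence", "  leading space", "single"))
# [1] "sentence" "space"    "single"
```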

Extract last word in string in R - error faced

I realised that there was whitespace at the beginning of some rows of the Description variable, which is not visible when the data is viewed in R.

Removing the whitespace using stri_trim() from the stringi package solved the issue.

library(stringi)
c1$Description = stri_trim(c1$Description, "left") # remove leading whitespace
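
If you want to check for this kind of hidden whitespace before trimming, something along these lines may help. It is only a sketch in base R: `desc` is made-up data standing in for `c1$Description`, and `trimws()` is used as the base counterpart of `stri_trim()`.

```r
# Hypothetical data standing in for c1$Description.
desc <- c("  red apple", "green pear", "\tblue plum")

# Flag rows that start with whitespace; these are easy to miss in a viewer.
has_leading_ws <- grepl("^[[:space:]]", desc)
has_leading_ws
# [1]  TRUE FALSE  TRUE

# Base R counterpart of stri_trim(x, "left").
trimws(desc, which = "left")
# [1] "red apple"  "green pear" "blue plum"
```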

extracting the second last word between the special characters /

You can use word(), but you need to specify the separator:

library(stringr)

word(url, -2, sep = '/')
#[1] "ani" "bmc"
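
The `url` vector itself is not shown in the answer. For a version without stringr, a rough base R sketch could split on "/" and pick the second-to-last piece. The URLs below are invented purely to reproduce the shape of the output.

```r
# Hypothetical URLs; the original `url` vector is not shown in the source.
url <- c("http://example.com/a/ani/page1", "http://example.com/b/bmc/page2")

# Split on "/" and take the second-to-last piece of each element.
second_last <- vapply(
  strsplit(url, "/", fixed = TRUE),
  function(p) if (length(p) >= 2) p[length(p) - 1] else NA_character_,
  character(1)
)
second_last
# [1] "ani" "bmc"
```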

extract last word from string only if more than one word R

Maybe something like the following.

x <- c("Genus species", "Genus", "Genus (word) species")
y <- gsub(".*[[:blank:]](\\w+)$", "\\1", x)
is.na(y) <- y == "Genus"
y
[1] "species" NA "species"

Note that it would be very difficult to search for "species" directly, since we don't have a full list of them. Strings without a blank do not match the pattern and come back unchanged, which is why I opted to set the elements of the result y to NA when they are equal to "Genus".
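
If hard-coding "Genus" is a concern, one possible alternative (a sketch, not part of the original answer) is to test for a blank first and only then strip everything up to the last blank:

```r
x <- c("Genus species", "Genus", "Genus (word) species")

# Keep the last word only when the string contains a blank, i.e. has
# more than one word; otherwise return NA.
y <- ifelse(grepl("[[:blank:]]", x),
            sub(".*[[:blank:]]", "", x),
            NA_character_)
y
# [1] "species" NA        "species"
```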

R remove last word from string

This will work:

gsub("\\s*\\w*$", "", df1$city)
[1] "Middletown" "Sunny Valley" "Hillside"

It removes any substring consisting of zero or more whitespace characters, followed by any number of "word" characters (letters, digits, or underscores), followed by the end of the string.
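
One thing to watch: because the whitespace is optional in that pattern, a one-word string is matched in full. The sketch below uses made-up city names (the contents of `df1$city` are not shown) and compares the original pattern with a stricter variant that requires whitespace before the last word.

```r
# Hypothetical city strings; df1$city is not shown in the source.
city <- c("Middletown Upper", "Sunny Valley East", "Hillside")

# Original pattern: the whitespace is optional, so a one-word string is
# matched in full and becomes empty.
gsub("\\s*\\w*$", "", city)
# [1] "Middletown"   "Sunny Valley" ""

# Variant requiring at least one whitespace character before the last word.
sub("\\s+\\w+$", "", city)
# [1] "Middletown"   "Sunny Valley" "Hillside"
```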

Extract last word in a string after comma if there are multiple words else the first word

You can try sub

df$country <- sub('.*,\\s*', '', df$location)
df$country
#[1] "New Zealand" "USA" "France"

Or

library(stringr)
str_extract(df$location, '\\b[^,]+$')
#[1] "New Zealand" "USA" "France"
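
To try the sub() version end to end, here is a small self-contained sketch. The data frame is invented, since the original `df` is not shown.

```r
# Hypothetical data frame; the original df is not shown in the source.
df <- data.frame(location = c("Auckland, New Zealand", "USA", "Paris, France"),
                 stringsAsFactors = FALSE)

# ".*," is greedy, so it consumes everything up to the last comma;
# strings without a comma do not match and are returned unchanged.
df$country <- sub(".*,\\s*", "", df$location)
df$country
# [1] "New Zealand" "USA"         "France"
```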

Extracting the last n characters from a string in R

I'm not aware of anything in base R, but it's straightforward to write a function to do this using substr and nchar:

x <- "some text in a string"

substrRight <- function(x, n){
  substr(x, nchar(x)-n+1, nchar(x))
}

substrRight(x, 6)
[1] "string"

substrRight(x, 8)
[1] "a string"

This is vectorised, as @mdsumner points out. Consider:

x <- c("some text in a string", "I really need to learn how to count")
substrRight(x, 6)
[1] "string" " count"
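
Two edge cases are worth a quick look: asking for more characters than the string has, and the leading space in " count" above. The following is a small sketch that repeats the function so it runs on its own.

```r
substrRight <- function(x, n) {
  substr(x, nchar(x) - n + 1, nchar(x))
}

# Asking for more characters than the string has returns the whole string,
# because substr() treats start positions below 1 as 1.
substrRight("abc", 5)
# [1] "abc"

# Leading spaces picked up by a fixed width can be stripped afterwards.
trimws(substrRight("how to count", 6))
# [1] "count"
```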

R: Extract last N words from character column in data.table

I would probably use

n = 5
patt = sprintf("\\w+( \\w+){0,%d}$", n-1)

library(data.table)
library(stringi)
test[, ext := stri_extract(original, regex = patt)]

                                       original                          ext
1: the green shirt totally brings out your eyes totally brings out your eyes
2:                         ford focus hatchback         ford focus hatchback

Comments:

  • This breaks if you set n=0, but there's probably no good reason to do that.
  • This is vectorized, in case you have n differing across rows (e.g., n=3:4).
  • @eddi provided a base analogue (for fixed n):

    test[, ext := sub('.*?(\\w+( \\w+){4})$', '\\1', original)]
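
For a version that needs neither stringi nor a regex, a rough sketch with strsplit and tail might look like this. The helper name `last_n_words` is made up, and the input is a plain character vector rather than a data.table column.

```r
# Hypothetical helper: last n space-separated words of each element.
last_n_words <- function(x, n) {
  parts <- strsplit(x, " ", fixed = TRUE)
  # tail() returns all words when there are fewer than n.
  vapply(parts, function(p) paste(tail(p, n), collapse = " "), character(1))
}

original <- c("the green shirt totally brings out your eyes",
              "ford focus hatchback")
last_n_words(original, 5)
# [1] "totally brings out your eyes" "ford focus hatchback"
```

With n = 0 this returns empty strings rather than failing.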

