R Remove Last Word from String

R remove last word from string

This will work:

gsub("\\s*\\w*$", "", df1$city)
[1] "Middletown" "Sunny Valley" "Hillside"

It removes any substring consisting of zero or more whitespace characters, followed by any number of "word" characters (letters, digits, or underscores), followed by the end of the string.

Extract last word in string in R

tail(strsplit('this is a sentence',split=" ")[[1]],1)

Basically as suggested by @Señor O.

Remove last word of string to first word in R

Here is an option using dplyr and stringr.

library(dplyr)
library(stringr)

df %>%
  mutate(temp = str_extract(string, str_c(trail, collapse = '|')),
         result = ifelse(is.na(temp), string,
                         str_c(temp, str_remove(string, temp), sep = ' '))) %>%
  select(-temp)

#                   string                  result
#1      ABA PRIMARY SCHOOL      PRIMARY SCHOOL ABA
#2 BLABLA SECONDARY SCHOOL SECONDARY SCHOOL BLABLA
#3           WAZA INSTITUT           INSTITUT WAZA
#4           INSTITUT WAMA           INSTITUT WAMA
#5     PRIMARY SCHOOL WAMA     PRIMARY SCHOOL WAMA

data

string <- c("ABA PRIMARY SCHOOL", "BLABLA SECONDARY SCHOOL", "WAZA INSTITUT", "INSTITUT WAMA", "PRIMARY SCHOOL WAMA")
df <- data.frame(string)
trail = c(" PRIMARY SCHOOL", " SECONDARY SCHOOL", " INSTITUT")

R - Regex to Remove Last Word from String

We can capture the two parts of the string as groups in the pattern passed to sub, insert a delimiter (,) between the capture groups in the replacement, and then use that delimiter as sep in read.table. If there are leading/trailing spaces, remove them by looping through the columns with str_trim from stringr.

library(stringr)
d1 <- read.table(text=sub('(.*)\\s+(\\S+)$', '\\1,\\2', v1),sep=',')
d1[] <- lapply(d1, str_trim)
d1
#               V1        V2
#1       PLAYSTORE   BANGKOK
#2   FLOAT@THE BAY SINGAPORE
#3          YANTRA SINGAPORE
#4 AIRASIA_QS9DQQL SINGAPORE

Or, as suggested by @RichardScriven, a base R option for trimming leading/trailing spaces is trimws.

d1[] <- lapply(d1, trimws)

data

v1 <- c('PLAYSTORE BANGKOK','FLOAT@THE BAY          SINGAPORE',
'YANTRA SINGAPORE',
'AIRASIA_QS9DQQL SINGAPORE')

extract last word from string only if more than one word R

Maybe something like the following.

x <- c("Genus species", "Genus", "Genus (word) species")
y <- gsub(".*[[:blank:]](\\w+)$", "\\1", x)
is.na(y) <- y == "Genus"
y
[1] "species" NA "species"

Note that matching on "species" directly would be very difficult, since we don't have a full list of species names. That's why I opted to set the elements of the result y to NA when they are equal to "Genus", i.e. when the input had only one word and the regex left it unchanged.

Extract last word in string in R - error faced

I realised that there is whitespace at the beginning of some rows of the Description variable, which isn't visible when the data is viewed in R.

Removing the whitespace using stri_trim() solved the issue.

library(stringi)
c1$Description <- stri_trim(c1$Description, "left")  # remove leading whitespace

How to remove the last word in a string using JavaScript

Use:

var str = "I want to remove the last word.";
var lastIndex = str.lastIndexOf(" ");

str = str.substring(0, lastIndex);

Find the index of the last space, then take the substring up to that index.

Delete the first 2 words and the last 2 words in a string in a dataframe using Regex Python

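You can use a single call to Series.str.replace. Pass regex=True explicitly, since pandas 2.0 and later treat the pattern as a literal string by default:

df['Sentence'].str.replace(r'(?<![^,])\s*\w+(?:\W+\w+)?\s*|\s*\w+(?:\W+\w+)?\s*(?![^,])', '', regex=True)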

See the Pandas demo:

>>> pattern = r'(?<![^,])\s*\w+(?:\W+\w+)?\s*|\s*\w+(?:\W+\w+)?\s*(?![^,])'
>>> df['Sentence'].str.replace(pattern, '', regex=True)
0 is jumping off
1 jumped over the,is
2

Regex details

  • (?<![^,]) - a comma or start of string must appear immediately to the left of the current location
  • \s* - 0+ whitespaces
  • \w+ - one or more word chars
  • (?:\W+\w+)? - an optional occurrence of one or more non-word chars followed with one or more word chars
  • \s* - 0+ whitespaces
  • | - or
  • \s* - 0+ whitespaces
  • \w+ - a word (one or more word chars)
  • (?:\W+\w+)? - an optional occurrence of one or more non-word chars followed with one or more word chars
  • \s* - 0+ whitespaces
  • (?![^,]) - end of string, or a location that is immediately followed with a comma.
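To check how the two alternatives behave outside pandas, here is a minimal sketch using Python's standard re module. The sample sentences and the helper name strip_edge_words are made up for illustration (the original input data is not shown above); only the pattern itself is taken from the answer.

```python
import re

# Pattern from the answer above: the first alternative removes the first
# two words of a comma-separated segment, the second removes the last two.
PATTERN = re.compile(
    r'(?<![^,])\s*\w+(?:\W+\w+)?\s*|\s*\w+(?:\W+\w+)?\s*(?![^,])'
)


def strip_edge_words(text):
    """Hypothetical helper: drop the first two and last two words of each
    comma-separated segment of `text`."""
    return PATTERN.sub('', text)


# Hypothetical sample sentences, chosen to resemble the demo output.
samples = [
    "The fox is jumping off the ledge",
    "A cat jumped over the lazy dog,The dog is barking loudly",
    "Hello there",
]

for sentence in samples:
    print(repr(strip_edge_words(sentence)))
# -> 'is jumping off'
# -> 'jumped over the,is'
# -> ''
```

A segment with four words or fewer is consumed entirely, which is why the last sample comes back empty.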

