Separate String After Last Underscore

Split string after last underscore in R

One option would be to use strsplit with a negative lookahead which asserts that the underscore on which to split is the final one in the input:

input <- "ABC_Something_Filename"
parts <- strsplit(input, "_(?!.*_)", perl=TRUE)[[1]]
parts

[1] "ABC_Something" "Filename"
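
For comparison, the same lookahead idea can be sketched in Python. This is only an illustration and not part of the original R answer; the variable name text is an assumption.

```python
import re

text = "ABC_Something_Filename"

# Split on an underscore that is not followed by any later underscore,
# i.e. the last underscore in the string (same pattern as the R call).
parts = re.split(r"_(?!.*_)", text)
print(parts)  # ['ABC_Something', 'Filename']
```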

Obtain everything from string after last underscore in every string

Here is one possible solution:

SELECT SUBSTRING(your_field, LOCATE('MOB_',your_field)+4)
FROM your_table;

Or with a regular expression:

SELECT REGEXP_SUBSTR(your_field,'(?<=MOB_).*$') 
FROM your_table

EDIT, with Oracle:

SELECT substr(regexp_substr(your_field,'MOB_(.*)'),5) "result" FROM your_table;

If you don't want to use substr (http://sqlfiddle.com/#!4/a4021/20):

SELECT regexp_substr(your_field,'MOB_(.*)', 1, 1, NULL, 1) "result" FROM your_table;

Edit 2, with Oracle:
See more examples here: http://sqlfiddle.com/#!4/a4021/31
There is one with regexp_replace and one with instr (Oracle's equivalent of find).

Separate string after last underscore

library(tidyr)

df <- data.frame(Name = c("A","B","C"),
                 Var_1_EVU = c(2,NA,NA),
                 Var_1_BdS = c(NA,3,4),
                 Var_2_BdS = c(NA,3,4))

df %>% 
  gather("type", "value", -Name) %>% 
  separate(type, into = c("type", "type_num", "var")) %>% 
  unite(type, type, type_num, sep = "") %>% 
  spread(type, value)

#   Name var Var1 Var2
# 1    A BdS   NA   NA
# 2    A EVU    2   NA
# 3    B BdS    3    3
# 4    B EVU   NA   NA
# 5    C BdS    4    4
# 6    C EVU   NA   NA

An example using tidyr::extract to deal with variable names that have an arbitrary number of underscores:

library(dplyr)
library(tidyr)

df <- data.frame(Name = c("A","B","C"),
                 Var_x_1_EVU = c(2,NA,NA),
                 Var_x_1_BdS = c(NA,3,4),
                 Var_x_y_2_BdS = c(NA,3,4))

df %>% 
  gather("col_name", "value", -Name) %>% 
  extract(col_name, c("var", "type"), "(.*)_(.*)") %>% 
  spread(var, value)

#   Name type Var_x_1 Var_x_y_2
# 1    A  BdS      NA        NA
# 2    A  EVU       2        NA
# 3    B  BdS       3         3
# 4    B  EVU      NA        NA
# 5    C  BdS       4         4
# 6    C  EVU      NA        NA
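
The pattern "(.*)_(.*)" cuts at the last underscore because the first group is greedy: it consumes as much as it can and backtracks only far enough to leave one underscore for the literal match. A minimal Python sketch of that behaviour, using the column names from the example (the helper name split_last is made up for illustration):

```python
import re

def split_last(col_name):
    # Hypothetical helper: the greedy first group grabs everything up to
    # the final underscore; the second group gets whatever follows it.
    match = re.fullmatch(r"(.*)_(.*)", col_name)
    return match.groups() if match else None

for name in ["Var_x_1_EVU", "Var_x_1_BdS", "Var_x_y_2_BdS"]:
    print(split_last(name))
# ('Var_x_1', 'EVU')
# ('Var_x_1', 'BdS')
# ('Var_x_y_2', 'BdS')
```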

You can avoid a potential problem with duplicate observations by first adding a row-number column with mutate(n = row_number()), which makes each observation unique. You can also avoid tidyr::extract being masked by magrittr by calling it explicitly as tidyr::extract:

library(dplyr)
library(tidyr)
library(data.table)
library(magrittr)

dt <- data.table(Name = c("A", "A", "B", "C"),
                 Var_1_EVU = c(1, 2, NA, NA),
                 Var_1_BdS = c(1, NA, 3, 4),
                 Var_x_2_BdS = c(1, NA, 3, 4))

dt %>% 
  mutate(n = row_number()) %>% 
  gather("col_name", "value", -n, -Name) %>% 
  tidyr::extract(col_name, c("var", "type"), "(.*)_(.*)") %>% 
  spread(var, value)

#   Name n type Var_1 Var_x_2
# 1    A 1  BdS     1       1
# 2    A 1  EVU     1      NA
# 3    A 2  BdS    NA      NA
# 4    A 2  EVU     2      NA
# 5    B 3  BdS     3       3
# 6    B 3  EVU    NA      NA
# 7    C 4  BdS     4       4
# 8    C 4  EVU    NA      NA

Splitting the string with last underscore

You can use String#rpartition, which searches for a given pattern from the right end of the string and splits where it finds it.

'a_b_c_d_e'.rpartition(/_/)
=> ["a_b_c_d", "_", "e"]

s = 'a_b_c_d_e'
parts = s.rpartition(/_/)
[parts.first, parts.last]
=> ["a_b_c_d", "e"]

EDIT: applying advice from the comments:

'a_b_c_d_e'.rpartition('_').values_at(0,2)
=> ["a_b_c_d", "e"]
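
Python strings have a method of the same name, so the idea carries over almost unchanged. A short sketch for comparison (not part of the original Ruby answer):

```python
s = "a_b_c_d_e"

# str.rpartition splits at the last occurrence of the separator and
# returns a 3-tuple: (head, separator, tail).
head, sep, tail = s.rpartition("_")
print([head, tail])  # ['a_b_c_d', 'e']
```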

How to extract the last part of the string in Excel after the last underscore

You can find the string after the last _ using the RIGHT function. The tricky part is to find the position of the last _.

First, we need to determine how many _ there are. This can be done by removing all _ and comparing the lengths of both strings:

LEN(A1)-LEN(SUBSTITUTE(A1;"_";""))

Since we now know which occurrence of _ is the last one (the second, in this example), we can use the SUBSTITUTE function to replace that occurrence with another character. This has to be a character that is not part of your original string; I have chosen #.

Now, we use the FIND function to search for the position of the # character. This position can now be delivered to the RIGHT function.

Your final formula will be:

=RIGHT(A1;LEN(A1)-FIND("#";SUBSTITUTE(A1;"_";"#";LEN(A1)-LEN(SUBSTITUTE(A1;"_";"")));1))
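
To make the three steps easier to follow, here is a rough Python trace of the same logic. The function names, the sample value "one_two_three", and the choice to return the input unchanged when there is no underscore are assumptions for illustration; the Excel formula itself would return an error in that case.

```python
def substitute_nth(text, old, new, n):
    # Mimics SUBSTITUTE(text; old; new; n): replace only the n-th occurrence.
    if n < 1:
        return text
    pos = -1
    for _ in range(n):
        pos = text.find(old, pos + 1)
        if pos == -1:
            return text
    return text[:pos] + new + text[pos + len(old):]

def after_last_underscore(text, marker="#"):
    # Step 1: count underscores by comparing lengths with and without them.
    count = len(text) - len(text.replace("_", ""))
    if count == 0:
        return text  # assumption: nothing to split on, return input as-is
    # Step 2: replace only the last underscore with a marker character
    # that must not occur in the original text.
    marked = substitute_nth(text, "_", marker, count)
    # Step 3: find the marker and keep everything to its right
    # (str.find is 0-based, unlike Excel's 1-based FIND).
    return marked[marked.find(marker) + 1:]

print(after_last_underscore("one_two_three"))  # three
```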

Get the last string after a character with multiple occurrences in Excel

How can I remove string after last underscore in python dataframe?

import pandas as pd

pd.DataFrame({'col': ['AA_XX', 'AAA_BB_XX', 'AA_BB_XYX', 'AA_A_B_YXX']})['col'].apply(lambda r: '_'.join(r.split('_')[:-1]))

Explanation:

df = pd.DataFrame({'col': ['AA_XX', 'AAA_BB_XX', 'AA_BB_XYX', 'AA_A_B_YXX']})

Creates

    col
0   AA_XX
1   AAA_BB_XX
2   AA_BB_XYX
3   AA_A_B_YXX

Use apply in order to loop through the column you want to edit.

The lambda splits the string at _ and then rejoins all parts except the last one with _:

df['col'] = df['col'].apply(lambda r: '_'.join(r.split('_')[:-1]))
print(df)

Results:

    col
0   AA
1   AAA_BB
2   AA_BB
3   AA_A_B

If your dataset contains values without an underscore, such as AA, change the lambda like this:

df = pd.DataFrame({'col': ['AA_XX', 'AAA_BB_XX', 'AA_BB_XYX', 'AA_A_B_YXX', 'AA']})
df['col'] = df['col'].apply(lambda r: '_'.join(r.split('_')[:-1]) if len(r.split('_')) > 1 else r)
print(df)
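
As an alternative sketch (not from the original answer), str.rsplit with maxsplit=1 cuts only at the last underscore and returns the string unchanged when there is none, so no length check is needed. The helper name drop_last_part is hypothetical; it could be passed to apply in place of the lambda.

```python
def drop_last_part(value):
    # rsplit with maxsplit=1 yields at most two pieces, cut at the last "_";
    # with no "_" the single piece is the original string.
    return value.rsplit("_", 1)[0]

values = ["AA_XX", "AAA_BB_XX", "AA_BB_XYX", "AA_A_B_YXX", "AA"]
print([drop_last_part(v) for v in values])
# ['AA', 'AAA_BB', 'AA_BB', 'AA_A_B', 'AA']
```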

Regex substring from last underscore occurrence till nth underscore (from the back)

This is similar to @Kat's approach in the comment but using a function to make it dynamic.

string <- "trash_trash_trash_keep_keep_keep_trash.trash" 

return_last_n_words <- function(x, n) {
  strsplit(x, '_')[[1]] |> head(-1) |> tail(n) |> paste0(collapse = "_")
}

return_last_n_words(string, 3)
#[1] "keep_keep_keep"

return_last_n_words(string, 4)
#[1] "trash_keep_keep_keep"

return_last_n_words(string, 2)
#[1] "keep_keep"

The idea is to split the string by underscore (_), drop the last part, select the last n words, and paste them into one string.
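
The same split, drop, select, and join sequence can be written in Python as a rough equivalent. This is a sketch for comparison only; it assumes n is at least 1, since a slice with -0 would keep every part.

```python
def return_last_n_words(text, n):
    parts = text.split("_")[:-1]   # drop the piece after the last "_"
    return "_".join(parts[-n:])    # keep the last n remaining pieces (n >= 1)

string = "trash_trash_trash_keep_keep_keep_trash.trash"
print(return_last_n_words(string, 3))  # keep_keep_keep
print(return_last_n_words(string, 4))  # trash_keep_keep_keep
print(return_last_n_words(string, 2))  # keep_keep
```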

How to get a substring after the last underscore (_) in unix shell script

You can do

s='this_is_test_string1_22'

In BASH:

echo "${s##*_}"
22

OR using sed:

sed 's/^.*_\([^_]*\)$/\1/' <<< 'this_is_test_string1_22'
22

EDIT for sh:

echo "$s" | sed 's/^.*_\([^_]*\)$/\1/'