Separate String After Last Underscore

Split string after last underscore in R

One option would be to use strsplit with a negative lookahead which asserts that the underscore on which to split is the final one in the input:

input <- "ABC_Something_Filename"
parts <- strsplit(input, "_(?!.*_)", perl=TRUE)[[1]]
parts

[1] "ABC_Something" "Filename"
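
For comparison, the same lookahead pattern can be tried in Python. This is a minimal sketch, not part of the original answer; the helper name split_at_last_underscore is hypothetical, and it assumes the input contains no newlines (where "." would stop matching).

```python
import re

def split_at_last_underscore(text):
    # "_(?!.*_)" matches an underscore that has no other underscore after it,
    # i.e. the final underscore in the string.
    return re.split(r"_(?!.*_)", text)

parts = split_at_last_underscore("ABC_Something_Filename")
print(parts)  # ['ABC_Something', 'Filename']
```

A string without any underscore comes back as a one-element list, because re.split finds nothing to split on.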

Obtain everything from string after last underscore in every string

Here is one possible solution, assuming the part you want follows the marker MOB_:

SELECT SUBSTRING(your_field, LOCATE('MOB_',your_field)+4)
FROM your_table;

Or with a regular expression:

SELECT REGEXP_SUBSTR(your_field,'(?<=MOB_).*$') 
FROM your_table

EDIT for Oracle:

SELECT substr(regexp_substr(your_field,'MOB_(.*)'),5) "result" FROM your_table;

If you don't want to use substr (http://sqlfiddle.com/#!4/a4021/20):

SELECT regexp_substr(your_field,'MOB_(.*)', 1, 1, NULL, 1) "result" FROM your_table;

Edit 2 for Oracle:
See more examples at http://sqlfiddle.com/#!4/a4021/31: one with regexp_replace and one with instr (Oracle's equivalent of a find function).
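
The two SQL ideas above (locate the marker and take the substring after it, or use a lookbehind) can be sketched in Python as follows. The function names and the sample value are hypothetical, and returning None when the marker is missing is a choice made for this sketch, not the behaviour of the SQL queries.

```python
import re

def after_marker(value, marker="MOB_"):
    # Same idea as SUBSTRING(field, LOCATE('MOB_', field) + 4):
    # find the marker, then skip past it.
    pos = value.find(marker)
    if pos == -1:
        return None  # assumption: signal "marker not found" with None
    return value[pos + len(marker):]

def after_marker_regex(value, marker="MOB_"):
    # Same idea as '(?<=MOB_).*$': match everything preceded by the marker.
    match = re.search("(?<=" + re.escape(marker) + ").*$", value)
    return match.group(0) if match else None

print(after_marker("ABC_MOB_12_34"))        # 12_34
print(after_marker_regex("ABC_MOB_12_34"))  # 12_34
```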

Separate string after last underscore

library(tidyr)

df <- data.frame(Name = c("A","B","C"),
                 Var_1_EVU = c(2,NA,NA),
                 Var_1_BdS = c(NA,3,4),
                 Var_2_BdS = c(NA,3,4))

df %>%
  gather("type", "value", -Name) %>%
  separate(type, into = c("type", "type_num", "var")) %>%
  unite(type, type, type_num, sep = "") %>%
  spread(type, value)

#   Name var Var1 Var2
# 1    A BdS   NA   NA
# 2    A EVU    2   NA
# 3    B BdS    3    3
# 4    B EVU   NA   NA
# 5    C BdS    4    4
# 6    C EVU   NA   NA

An example using tidyr::extract to handle variable names that contain an arbitrary number of underscores:

library(dplyr)
library(tidyr)

df <- data.frame(Name = c("A","B","C"),
                 Var_x_1_EVU = c(2,NA,NA),
                 Var_x_1_BdS = c(NA,3,4),
                 Var_x_y_2_BdS = c(NA,3,4))

df %>%
  gather("col_name", "value", -Name) %>%
  extract(col_name, c("var", "type"), "(.*)_(.*)") %>%
  spread(var, value)

#   Name type Var_x_1 Var_x_y_2
# 1    A  BdS      NA        NA
# 2    A  EVU       2        NA
# 3    B  BdS       3         3
# 4    B  EVU      NA        NA
# 5    C  BdS       4         4
# 6    C  EVU      NA        NA

You can avoid a potential problem with duplicate observations by first adding a row-number column with mutate(n = row_number()), which makes each observation unique. If magrittr is loaded after tidyr, its extract masks tidyr::extract, so call the tidyr version explicitly as tidyr::extract:

library(dplyr)
library(tidyr)
library(data.table)
library(magrittr)

dt <- data.table(Name = c("A", "A", "B", "C"),
                 Var_1_EVU = c(1, 2, NA, NA),
                 Var_1_BdS = c(1, NA, 3, 4),
                 Var_x_2_BdS = c(1, NA, 3, 4))

dt %>%
  mutate(n = row_number()) %>%
  gather("col_name", "value", -n, -Name) %>%
  tidyr::extract(col_name, c("var", "type"), "(.*)_(.*)") %>%
  spread(var, value)

#   Name n type Var_1 Var_x_2
# 1    A 1  BdS     1       1
# 2    A 1  EVU     1      NA
# 3    A 2  BdS    NA      NA
# 4    A 2  EVU     2      NA
# 5    B 3  BdS     3       3
# 6    B 3  EVU    NA      NA
# 7    C 4  BdS     4       4
# 8    C 4  EVU    NA      NA

Splitting the string with last underscore

You can use String#rpartition, which searches for a given pattern from the right end of the string and splits at the first match it finds.

'a_b_c_d_e'.rpartition(/_/)
=> ["a_b_c_d", "_", "e"]

s = 'a_b_c_d_e'
parts = s.rpartition(/_/)
[parts.first, parts.last]
=> ["a_b_c_d", "e"]

EDIT: applying advice from the comments:

'a_b_c_d_e'.rpartition('_').values_at(0,2)
=> ["a_b_c_d", "e"]
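
Python strings have a method with the same name and a similar contract, which may help readers coming from that language. This is only a comparison sketch; the helper name split_last is hypothetical.

```python
def split_last(text, sep="_"):
    # str.rpartition returns (head, separator, tail), searching from the right.
    head, _found, tail = text.rpartition(sep)
    return [head, tail]

print("a_b_c_d_e".rpartition("_"))  # ('a_b_c_d', '_', 'e')
print(split_last("a_b_c_d_e"))      # ['a_b_c_d', 'e']
```

When the separator is absent, Python's rpartition returns two empty strings followed by the whole input, so split_last("abc") gives ['', 'abc'].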

How to extract the last part of the string in Excel after the last underscore

You can find the string after the last _ using the RIGHT function. The tricky part is to find the position of the last _.

First, we need to determine how many _ there are. This can be done by removing all _ and comparing the lengths of the two strings:

LEN(A1)-LEN(SUBSTITUTE(A1;"_";""))

This count is also the index of the last _ (the second occurrence in the original example), so we can use the SUBSTITUTE function to replace that occurrence of _ with another character. This has to be a character that is not part of your original string; I have chosen #.

Now, we use the FIND function to search for the position of the # character. This position can then be passed to the RIGHT function.

Your final formula will be:

=RIGHT(A1;LEN(A1)-FIND("#";SUBSTITUTE(A1;"_";"#";LEN(A1)-LEN(SUBSTITUTE(A1;"_";"")));1))
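
To make the steps of the formula easier to follow, here is a rough Python walk-through of the same logic. The function name text_after_last is hypothetical, the sketch assumes a single-character separator and a placeholder that does not occur in the text, and returning the input unchanged when there is no separator is a choice made here rather than what the Excel formula does.

```python
def text_after_last(text, sep="_", placeholder="#"):
    # Step 1: count separators, like LEN(A1)-LEN(SUBSTITUTE(A1;"_";"")).
    count = len(text) - len(text.replace(sep, ""))
    if count == 0:
        # Choice made for this sketch: nothing to strip without a separator.
        return text
    # Step 2: replace only the last (count-th) separator with the placeholder,
    # like SUBSTITUTE(A1;"_";"#";count).
    pieces = text.split(sep)
    marked = sep.join(pieces[:count]) + placeholder + pieces[count]
    # Step 3: 1-based position of the placeholder, like FIND("#";...;1).
    position = marked.find(placeholder) + 1
    # Step 4: keep the rightmost LEN - position characters, like RIGHT(...).
    keep = len(text) - position
    return text[len(text) - keep:]

print(text_after_last("AA_BB_XYX"))  # XYX
```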

Get the last string after a character with multiple occurrences in Excel

How can I remove string after last underscore in python dataframe?

pd.DataFrame({'col': ['AA_XX', 'AAA_BB_XX', 'AA_BB_XYX', 'AA_A_B_YXX']})['col'].apply(lambda r: '_'.join(r.split('_')[:-1]))

Explanation:

import pandas as pd

df = pd.DataFrame({'col': ['AA_XX', 'AAA_BB_XX', 'AA_BB_XYX', 'AA_A_B_YXX']})

Creates

          col
0       AA_XX
1   AAA_BB_XX
2   AA_BB_XYX
3  AA_A_B_YXX

Use apply in order to loop through the column you want to edit.

The lambda splits each string at _, drops the last part, and joins the remaining parts back together with _.

df['col'] = df['col'].apply(lambda r: '_'.join(r.split('_')[:-1]))
print(df)

Results:

      col
0      AA
1  AAA_BB
2   AA_BB
3  AA_A_B

If your dataset contains values without an underscore (such as AA), change the lambda like this so that those values are left unchanged:

df = pd.DataFrame({'col': ['AA_XX', 'AAA_BB_XX', 'AA_BB_XYX', 'AA_A_B_YXX', 'AA']})
df['col'] = df['col'].apply(lambda r: '_'.join(r.split('_')[:-1]) if len(r.split('_')) > 1 else r)
print(df)
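
A possible alternative, shown here as a sketch rather than as the original answer's method, is str.rsplit with a split limit of 1. It only ever splits at the last underscore, and it returns a value without an underscore unchanged, so no extra length check is needed. The helper name drop_last_segment is hypothetical.

```python
def drop_last_segment(value, sep="_"):
    # rsplit(sep, 1) splits at most once, starting from the right, so the
    # first element is everything before the last separator. With no
    # separator present, the list has one element: the original value.
    return value.rsplit(sep, 1)[0]

values = ['AA_XX', 'AAA_BB_XX', 'AA_BB_XYX', 'AA_A_B_YXX', 'AA']
trimmed = [drop_last_segment(v) for v in values]
print(trimmed)  # ['AA', 'AAA_BB', 'AA_BB', 'AA_A_B', 'AA']
```

With a DataFrame like the one above, the same function could be passed to apply, for example df['col'].apply(drop_last_segment).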

Regex substring from last underscore occurrence till nth underscore (from the back)

This is similar to @Kat's approach in the comment but using a function to make it dynamic.

string <- "trash_trash_trash_keep_keep_keep_trash.trash" 

return_last_n_words <- function(x, n) {
  strsplit(x, '_')[[1]] |> head(-1) |> tail(n) |> paste0(collapse = "_")
}

return_last_n_words(string, 3)
#[1] "keep_keep_keep"

return_last_n_words(string, 4)
#[1] "trash_keep_keep_keep"

return_last_n_words(string, 2)
#[1] "keep_keep"

The idea is to split the string by underscore (_), drop the last part, select the last n words, and paste them back into one string.
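
The same split, drop, select, and join steps can be sketched in Python. This is a rough equivalent for comparison, not a translation endorsed by the original answer; the guard for n <= 0 is an addition made here, because a slice of [-0:] would otherwise return every word.

```python
def return_last_n_words(text, n):
    if n <= 0:
        return ""  # guard added for this sketch
    words = text.split("_")[:-1]   # split by underscore, drop the last part
    return "_".join(words[-n:])    # keep the last n words and join them

string = "trash_trash_trash_keep_keep_keep_trash.trash"
print(return_last_n_words(string, 3))  # keep_keep_keep
print(return_last_n_words(string, 4))  # trash_keep_keep_keep
```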

How to get a substring after the last underscore (_) in unix shell script

You can do

s='this_is_test_string1_22'

In BASH:

echo "${s##*_}"
22

OR using sed:

sed 's/^.*_\([^_]*\)$/\1/' <<< 'this_is_test_string1_22'
22

EDIT for sh:

echo "$s" | sed 's/^.*_\([^_]*\)$/\1/'
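
For readers who want to check the expected result outside a shell, the effect of "${s##*_}" (remove the longest prefix ending in an underscore) can be approximated in Python with rfind. This is a comparison sketch with a hypothetical function name.

```python
def after_last_underscore(s):
    # rfind returns the index of the last "_", or -1 when there is none.
    # Adding 1 then gives either the start of the tail or 0 (whole string),
    # which mirrors how "${s##*_}" leaves an underscore-free string intact.
    return s[s.rfind("_") + 1:]

print(after_last_underscore("this_is_test_string1_22"))  # 22
```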

