Grabbing Specific Sections of Text from a String

How do I extract specific parts of text from a string in R?

As @Mark Neal mentioned, you can solve this task with regular expressions. I'm not very skilled with regex, but here is one approach:

library(tidyverse)
text <- c('"identification"":""138""city"":""New-York"":COMMENT""text"":""Very good!""COMMENT""text"":""It was delicious""guests"":""2""')

city <- text %>% str_extract('(?<=city"":"").*(?="":COMMENT"")')
comment_1 <- text %>% str_extract('(?<=COMMENT""text"":"").*(?=""COMMENT"")')
comment_2 <- text %>% str_extract('(?<=COMMENT""text"":"").*(?=""guests"")') %>% str_extract('(?<=COMMENT""text"":"").*')

df <- data.frame(city=city, comment_1=comment_1, comment_2=comment_2)

What did I do?

city

str_extract('(?<=city"":"").*(?="":COMMENT"")')

I search for city"":"" and "":COMMENT"" and return everything in between:

[1] "New-York"

comment 1

comment_1 <- text %>% str_extract('(?<=COMMENT""text"":"").*(?=""COMMENT"")')

The same approach with COMMENT""text"":"" and ""COMMENT"" yields

[1] "Very good!"

comment 2

Since I couldn't figure out how to get the desired result with a single regex, I had to apply two extractions in sequence.

comment_2 <- text %>% str_extract('(?<=COMMENT""text"":"").*(?=""guests"")') %>% str_extract('(?<=COMMENT""text"":"").*')

The first step, everything between COMMENT""text"":"" and ""guests"", returns

[1] "Very good!\"\"COMMENT\"\"text\"\":\"\"It was delicious"

because the regex is greedy, i.e. starting from the first COMMENT""text"":"" it returns the longest possible string matching the pattern.
The second step runs on that intermediate result, where COMMENT""text"":"" occurs only once, so it returns just the desired last comment:

[1] "It was delicious"
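The greedy behaviour described above can be reproduced outside R. The following is a minimal sketch in Python (not the answer's own R code) that runs the same greedy pattern and then, as an alternative I am assuming is acceptable for this input, a lazy `.*?` that stops at the first closing `""` and therefore picks up each comment separately. The variable names are illustrative only.

```python
import re

text = ('"identification"":""138""city"":""New-York"":COMMENT""text"":""Very good!'
        '""COMMENT""text"":""It was delicious""guests"":""2""')

# Greedy: .* runs as far as it can, so both comments end up in one match.
greedy = re.search(r'(?<=COMMENT""text"":"").*(?=""guests"")', text).group(0)

# Lazy: .*? stops at the first closing "", so each comment is matched on its own.
comments = re.findall(r'COMMENT""text"":""(.*?)""', text)

print(greedy)
print(comments)
```

Whether a lazy quantifier fits depends on the data: it assumes a comment never contains two consecutive double quotes itself.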

How to get a specific part of a string in Powershell

One option is to use switch -regex. We can use the regex pattern below to match the text you need.

Pattern - (?<=\()\S{3,5}(?=[-\)])

  • Positive lookbehind (?<=\() asserts that the text immediately before the match is:
    • \( the literal character (
  • \S matches any non-whitespace character (roughly equivalent to [^\r\n\t\f\v ])
  • {3,5} matches the previous token between 3 and 5 times, as many times as possible, giving back as needed (greedy)
  • Positive lookahead (?=[-\)]) asserts that the text immediately after the match is:
    • [-\)] either a - or a )
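To check the pattern breakdown above independently of PowerShell, here is a small sketch in Python using the same lookbehind, quantifier, and lookahead. The helper name `extract_code` is hypothetical and only serves the illustration; the sample lines are taken from the test data used in this answer.

```python
import re

# Same pattern as above: after "(", take 3 to 5 non-whitespace characters
# that are immediately followed by "-" or ")".
PATTERN = re.compile(r'(?<=\()\S{3,5}(?=[-)])')

def extract_code(line):
    """Return the first matching token in line, or None if nothing matches."""
    match = PATTERN.search(line)
    return match.group(0) if match else None

for line in [
    "some text of any length (need-notneed)",
    "some text of any length (notneed) (12345)",
    "some text without any match",
]:
    print(repr(line), "->", extract_code(line))
```

Note how "(notneed)" is skipped: no run of 3 to 5 characters after its "(" is directly followed by "-" or ")".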

Code

# some data to work with
$text = @'
some text of any length (need)
some text of any length (need-notneed)
some text of any length (notneed) (need)
some text of any length (notneed) (need-notneed)
some text of any length (RTX)
some text of any length (EOWD-notneed)
some text of any length (notneed) (12345)
some text of any length (notneed) (D.D03-notneed)
some text without any match
'@ -split '\r?\n'

# regex pattern that matches your requirement
$pattern = '(?<=\()\S{3,5}(?=[-\)])'

$results = switch -Regex ($text) {
    $pattern {
        [PSCustomObject]@{
            String = $_
            Match  = $Matches[0]
        }
    }
    Default { }
}
PS > $results

String                                            Match
------                                            -----
some text of any length (need)                    need
some text of any length (need-notneed)            need
some text of any length (notneed) (need)          need
some text of any length (notneed) (need-notneed)  need
some text of any length (RTX)                     RTX
some text of any length (EOWD-notneed)            EOWD
some text of any length (notneed) (12345)         12345
some text of any length (notneed) (D.D03-notneed) D.D03

PS > $results.Match

need
need
need
need
RTX
EOWD
12345
D.D03

Get Specific part of a String in Javascript

myString.split("\n");

This returns an array with one element per line, so for a three-line string you'll get an array of 3 parts.

Extract substring in Bash

Use cut:

echo 'someletters_12345_moreleters.ext' | cut -d'_' -f 2

More generic:

INPUT='someletters_12345_moreleters.ext'
# quote the variables so whitespace and glob characters in the value are preserved
SUBSTRING=$(echo "$INPUT" | cut -d'_' -f 2)
echo "$SUBSTRING"

How would I get everything before a : in a string Python

Just use the split function. It returns a list, so you can keep the first element:

>>> s1 = 'Username: How are you today?'
>>> s1.split(':')
['Username', ' How are you today?']
>>> s1.split(':')[0]
'Username'
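One thing `split(':')[0]` cannot tell you is whether the string contained a colon at all, since it returns the whole string when there is none. A minimal sketch of an alternative uses `str.partition`, which also reports the separator it found. The function name `before_colon` is hypothetical, and returning `None` for a missing colon is my assumption about the desired behaviour.

```python
def before_colon(s):
    """Return the text before the first ':' in s, or None if s has no ':'."""
    head, sep, _tail = s.partition(':')
    # partition returns an empty separator when ':' is not present
    return head if sep else None

print(before_colon('Username: How are you today?'))
print(before_colon('no colon here'))
```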

Getting a substring from a string after a particular word

// indexOf("no") + 3 skips the two characters of "no" plus one character after it (e.g. a space)
yourString.substring(yourString.indexOf("no") + 3);

Java: Getting a substring from a string starting after a particular character

String example = "/abc/def/ghfj.doc";
System.out.println(example.substring(example.lastIndexOf("/") + 1));

