How do I extract specific parts of text from a string in R?
As @Mark Neal mentioned, this is a task you can solve by using regular expressions. I'm not very skilled in using regex, but perhaps I can give you some insights:
library(tidyverse)
text <- c('"identification"":""138""city"":""New-York"":COMMENT""text"":""Very good!""COMMENT""text"":""It was delicious""guests"":""2""')
city <- text %>% str_extract('(?<=city"":"").*(?="":COMMENT"")')
comment_1 <- text %>% str_extract('(?<=COMMENT""text"":"").*(?=""COMMENT"")')
comment_2 <- text %>% str_extract('(?<=COMMENT""text"":"").*(?=""guests"")') %>% str_extract('(?<=COMMENT""text"":"").*')
df <- data.frame(city=city, comment_1=comment_1, comment_2=comment_2)
What did I do?
city
str_extract('(?<=city"":"").*(?="":COMMENT"")')
I search for city"":"" and "":COMMENT"" and return everything in between:
[1] "New-York"
comment 1
comment_1 <- text %>% str_extract('(?<=COMMENT""text"":"").*(?=""COMMENT"")')
Same for COMMENT""text"":"" and ""COMMENT"", which yields
[1] "Very good!"
comment 2
Since I couldn't figure out how to get the desired result with one regex, I had to iterate.
comment_2 <- text %>% str_extract('(?<=COMMENT""text"":"").*(?=""guests"")') %>% str_extract('(?<=COMMENT""text"":"").*')
The first iteration, between COMMENT""text"":"" and ""guests"", returns
[1] "Very good!\"\"COMMENT\"\"text\"\":\"\"It was delicious"
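because the match starts at the first COMMENT""text"":"" in the string and the greedy .* then extends it as far as possible, up to ""guests"".
That intermediate result contains only one COMMENT""text"":"", so the next iteration, which keeps everything after it, returns just the desired last comment: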
[1] "It was delicious"
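For comparison, the same lookaround idea can be sketched with Python's built-in re module. This is not part of the original answer; it is a minimal sketch that assumes a comment never contains a double quote, which lets one pattern with [^"]* collect every comment without a second pass.

```python
import re

# Same sample string as in the R example above.
text = (
    '"identification"":""138""city"":""New-York"":COMMENT""text"":""Very good!""'
    'COMMENT""text"":""It was delicious""guests"":""2""'
)

# Lookbehind/lookahead pair, as in the R pattern for the city.
city = re.search(r'(?<=city"":"").*(?="":COMMENT"")', text).group(0)

# Assumption: a comment never contains a double quote, so [^"]* stops
# at the end of each comment instead of running on to the last one.
comments = re.findall(r'(?<=COMMENT""text"":"")[^"]*', text)

print(city)
print(comments)
```

Because findall returns every non-overlapping match, the number of comments does not need to be known in advance.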
How to get a specific part of a string in Powershell
One option is to use switch -regex. We can use the regex pattern below to match the text you need.
Pattern: (?<=\()\S{3,5}(?=[-\)])
- Positive lookbehind (?<=\() asserts that the regex below matches:
  - \( matches the character (
- \S matches any non-whitespace character (equivalent to [^\r\n\t\f\v ])
- {3,5} matches the previous token between 3 and 5 times, as many times as possible, giving back as needed (greedy)
- Positive lookahead (?=[-\)]) asserts that the regex below matches:
  - [-\)] matches either - or )
Code
# some data to work with
$text = @'
some text of any length (need)
some text of any length (need-notneed)
some text of any length (notneed) (need)
some text of any length (notneed) (need-notneed)
some text of any length (RTX)
some text of any length (EOWD-notneed)
some text of any length (notneed) (12345)
some text of any length (notneed) (D.D03-notneed)
some text without any match
'@ -split '\r?\n'
# regex pattern that matches your requirement
$pattern = '(?<=\()\S{3,5}(?=[-\)])'
$results = switch -Regex ($text) {
    $pattern {
        [PSCustomObject]@{
            String = $_
            Match  = $Matches[0]
        }
    }
    Default { }
}
PS > $results
String                                            Match
------                                            -----
some text of any length (need)                    need
some text of any length (need-notneed)            need
some text of any length (notneed) (need)          need
some text of any length (notneed) (need-notneed)  need
some text of any length (RTX)                     RTX
some text of any length (EOWD-notneed)            EOWD
some text of any length (notneed) (12345)         12345
some text of any length (notneed) (D.D03-notneed) D.D03
PS > $results.Match
need
need
need
need
RTX
EOWD
12345
D.D03
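The same pattern also works outside PowerShell. As a rough sketch, here it is applied line by line with Python's re module; the names lines, pattern and results are chosen only for this illustration.

```python
import re

# Sample lines copied from the PowerShell example above.
lines = [
    "some text of any length (need)",
    "some text of any length (need-notneed)",
    "some text of any length (notneed) (need)",
    "some text of any length (notneed) (need-notneed)",
    "some text of any length (RTX)",
    "some text of any length (EOWD-notneed)",
    "some text of any length (notneed) (12345)",
    "some text of any length (notneed) (D.D03-notneed)",
    "some text without any match",
]

# 3 to 5 non-whitespace characters, preceded by "(" and followed by "-" or ")".
pattern = re.compile(r"(?<=\()\S{3,5}(?=[-)])")

# Keep only the lines that match, paired with the matched text.
results = []
for line in lines:
    m = pattern.search(line)
    if m:
        results.append((line, m.group(0)))

for line, match in results:
    print(f"{line!r} -> {match}")
```

Lines without a match are skipped, much like the empty Default branch in the switch statement.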
Get Specific part of a String in Javascript
myString.split("\n");
You'll get an array of 3 parts, one for each piece of the string between newlines; pick the part you need by its index.
Extract substring in Bash
Use cut:
echo 'someletters_12345_moreleters.ext' | cut -d'_' -f 2
More generic:
INPUT='someletters_12345_moreleters.ext'
SUBSTRING=$(echo "$INPUT" | cut -d'_' -f 2)
echo "$SUBSTRING"
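For readers who want the same field extraction without a shell, here is a small Python sketch of what cut -d'_' -f 2 does. The variable names are chosen for illustration only.

```python
# Same file name as in the cut example above.
name = "someletters_12345_moreleters.ext"

# Split on the delimiter, like cut -d'_'.
fields = name.split("_")

# cut -f 2 counts fields from 1; Python lists count from 0.
substring = fields[1]

print(substring)
```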
How would I get everything before a : in a string Python
Just use the split function. It returns a list, so you can keep the first element:
>>> s1 = 'Username: How are you today?'
>>> s1.split(':')
['Username', ' How are you today?']
>>> s1.split(':')[0]
'Username'
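A possible alternative, not mentioned in the answer above, is str.partition. It splits only at the first separator and also reports whether the separator was found. A minimal sketch follows; the second string is made up for illustration.

```python
s1 = "Username: How are you today?"

# partition splits at the first ":" only and always returns three parts.
head, sep, tail = s1.partition(":")
print(head)

# Hypothetical input without a colon: sep comes back empty,
# which makes the "not found" case easy to detect.
head2, sep2, tail2 = "no colon here".partition(":")
print(repr(sep2))
```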
Getting a substring from a string after a particular word
// + 3 skips the two characters of "no" plus the one character that follows it
yourString.substring(yourString.indexOf("no") + 3, yourString.length());
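One thing to watch with this approach: indexOf returns -1 when the word is absent, so the + 3 offset would then silently start the substring at index 2. As a sketch of the same idea with an explicit check, here is a Python version. The helper text_after and the sample strings are hypothetical, and it strips all leading whitespace instead of skipping exactly one character.

```python
from typing import Optional


def text_after(haystack: str, word: str) -> Optional[str]:
    """Return the text after the first occurrence of word, or None if absent."""
    start = haystack.find(word)  # -1 when word does not occur
    if start == -1:
        return None
    # Skip the word itself, then drop the whitespace that follows it.
    return haystack[start + len(word):].lstrip()


# Hypothetical sample strings, for illustration only.
print(text_after("there is no time left", "no"))
print(text_after("plain text", "no"))
```

Note that find, like indexOf, matches the first occurrence anywhere, including inside a longer word.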
Java: Getting a substring from a string starting after a particular character
String example = "/abc/def/ghfj.doc";
System.out.println(example.substring(example.lastIndexOf("/") + 1));
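The same "everything after the last separator" step can be sketched in Python with rsplit, shown here for comparison with the Java snippet above.

```python
# Same path as in the Java example above.
example = "/abc/def/ghfj.doc"

# Split once, starting from the right, and keep the last piece.
filename = example.rsplit("/", 1)[-1]
print(filename)

# Without any "/" the whole string comes back unchanged, which matches
# the Java version (lastIndexOf returns -1, so substring starts at 0).
print("ghfj.doc".rsplit("/", 1)[-1])
```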