Regarding SQLdf Package/Regexp Function

Replace String Pattern in sqldf

1) Assuming that only the forms shown in the question are allowed replace colons with minus signs and then replace minus followed by space with space, minus, space.

library(sqldf)
sqldf("select *, replace(replace([Col1], ':', '-'), '- ', ' - ') as New from df")

giving:

      Col1 Col2 Col3       New
1 ten: end 5 10 ten - end
2 five: nb 7 11 five - nb
3 12:4 12 10 12-4
4 13:56 15 16 13-56

2) If we can assume that the only forms are number:number or character: character and that the second form contains no digits.

sqldf("select *, 
case when strFilter(Col1, '0123456789') = ''
then replace(Col1, ':', ' -')
else replace(Col1, ':', '-')
end as New
from df")

giving:

      Col1 Col2 Col3       New
1 ten: end 5 10 ten - end
2 five: nb 7 11 five - nb
3 12:4 12 10 12-4
4 13:56 15 16 13-56

3) This first checks for numbers:numbers and then checks for characters: characters where characters can only be numbers or lower case letters.

dig <- "0123456789"
diglet <- "0123456789abcdefghijklmnopqrstuvwxyz"

fn$sqldf("select *,
case when trim(Col1, '$dig') = ':'
then replace(Col1, ':', '-')
when trim(Col1, '$diglet') = ': '
then replace(Col1, ': ', ' - ')
else Col1 end as New
from df")

giving:

      Col1 Col2 Col3       New
1 ten: end 5 10 ten - end
2 five: nb 7 11 five - nb
3 12:4 12 10 12-4
4 13:56 15 16 13-56

4) This one extracts the x:y and checks whether x and y are number and if so does the appropriate replacement and if no match it extracts x:yz where y is a space and if x and z are digits or lower case then it performs the appropriate replacement and otherwise returns Col1. dig and diglet are from above.

fn$sqldf("select *, 
case when trim(substr(Col1, instr(Col1, ':')-1, 3), '$dig') = ':'
then replace(Col1, ':', '-')
when trim(substr(Col1, instr(Col1, ':') -1, 4), '$diglet') = ': '
then replace(Col1, ': ', ' - ')
else Col1 end as New
from df")

Note

The input in reproducible form is:

Lines <- "Col1,Col2,Col3
ten: end,5,10
five: nb,7,11
12:4,12,10
13:56,15,16"
df <- read.csv(text = Lines, as.is = TRUE, strip.white = TRUE)

If Column contains - at the end of a value, remove the - at the end - sqldf

I think you looking for rtrim

library(sqldf)
df_new<- sqldf("select ColB,rtrim(ColA,'-') as ColA from df")
ColB ColA
1 3 djdn
2 5 dn-dn
3 8 ndmc
4 9 nd-nd-md

Replace parts of a string in R

Using stringr:

stringr::str_replace_all(input, "keyword(\\d)", "tiab LIKE '%keyword\\1%'")

# "((tiab LIKE '%keyword1%') OR (tiab LIKE '%keyword2%') AND (tiab LIKE '%keyword3%'))"

For your other example:

input <- "((ELISA) OR (antibody test) AND (blood))"

str_replace_all(input, "\\(([^\\(\\)]+)\\)", "(tiab LIKE '%\\1%')")

# "((tiab LIKE '%ELISA%') OR (tiab LIKE '%antibody test%') AND (tiab LIKE '%blood%'))"

How do I use regex in a SQLite query?

SQLite3 supports the REGEXP operator:

WHERE x REGEXP <regex>

http://www.sqlite.org/lang_expr.html#regexp

Applying Instr function in reverse order using sqldf

1) H2 backend If you use the H2 database backend to sqldf then the select statement in the question will work as written in the question:

library(sqldf)
library(RH2) # if H2 database is loaded sqldf will notice it and use it

sqldf("SELECT
substr(Quiz_answers, instr(Quiz_answers,'`',-1) + 1) AS HistoryAnswer
FROM data")

1a) H2 also supports the regexp_replace and regexp_like functions in a select statement.

2) SQLite backend If you are using the default SQLite backend to sqldf then trim off the first backtick from the left in which case the original second backtick becomes the first and we can use instr. (Alternately we could use substr(Quiz_answers, 2) in place of ltrim(...) .)

library(sqldf)  # if no other database is loaded sqldf uses the RSQLite backend

sqldf("SELECT
substr(Quiz_answers, instr(ltrim(Quiz_answers, '`'), '`') + 2) AS HistoryAnswers
from data")

2a) An alternative if the part between the backticks contains only a, digits, comma and backtick is the following

sqldf("select ltrim(Quiz_answers, '`a1234567890,') as HistoryAnswers from data")

(If in reality it could contain any letter but not an upper case letter it would still work if you replace a with all the letters of the alphabet in the ltrim second argument.)

2b) Yet another approach with SQLite would be:

sqldf("select 
case
when Quiz_answers like '%Positive' then 'Positive'
when Quiz_answers like '%Negative' then 'Negative'
else 'Neutral'
end as HistoryAnswers
from data")

This would also work under H2.

2c) or if it is OK to return +1, 0, -1 rather than Positive, Neutral, Negative then:

sqldf("select 
(Quiz_answers like '%Positive') - (Quiz_answers like '%Negative') as HistoryAnswers
from data")

This would work in H2 if we cast the two booleans to int.

2d) Another approach is to reverse the string using reverse like this:

sqldf("select 
substr(Quiz_answers, length(Quiz_answers) - instr(reverse(Quiz_answers), '`') + 2)
as HistoryAnswers
from data")


Related Topics



Leave a reply



Submit