Encoding XPath Expressions with both single and double quotes
Wow, you all sure are making this complicated. Why not just do this?
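public static string XpathExpression(string value)
{
    // No apostrophe: a single-quoted XPath literal is safe.
    if (!value.Contains("'"))
        return '\'' + value + '\'';
    // No double quote: a double-quoted XPath literal is safe.
    else if (!value.Contains("\""))
        return '"' + value + '"';
    // Both present: split at each apostrophe and rejoin with concat().
    else
        return "concat('" + value.Replace("'", "',\"'\",'") + "')";
}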
Simultaneously escape double and single quotes in Xpath
The key here is realising that with xml2 you can write back into the parsed html with html-escaped characters. This function will do the trick. It's longer than it needs to be because I've included comments and some type checking / converting logic.
contains_text <- function(node_set, find_this)
{
  # Ensure we have a nodeset
  if(all(class(node_set) == c("xml_document", "xml_node")))
    node_set %<>% xml_children()
  if(class(node_set) != "xml_nodeset")
    stop("contains_text requires an xml_nodeset or xml_document.")
  # Get all leaf nodes
  node_set %<>% xml_nodes(xpath = "//*[not(*)]")
  # HTML escape the target string
  find_this %<>% {gsub("\"", "&quot;", .)}
  # Extract, HTML escape and replace the nodes
  lapply(node_set, function(node) xml_text(node) %<>% {gsub("\"", "&quot;", .)})
  # Now we can define the xpath and extract our target nodes
  xpath <- paste0("//*[contains(text(), \"", find_this, "\")]")
  new_nodes <- html_nodes(node_set, xpath = xpath)
  # Since the underlying xml_document is passed by pointer internally,
  # we should unescape any text to leave it unaltered
  xml_text(node_set) %<>% {gsub("&quot;", "\"", .)}
  return(new_nodes)
}
Now:
library(rvest)
library(xml2)
library(magrittr) # provides the %<>% operator used in contains_text
html %>% xml2::read_html() %>% contains_text(target)
#> {xml_nodeset (1)}
#> [1] <div>Fat"her's son</div>
html %>% xml2::read_html() %>% contains_text(target) %>% xml_text()
#> [1] "Fat\"her's son"
ADDENDUM
This is an alternative method, which is an implementation of the method suggested by @Alejandro but allows arbitrary targets. It has the merit of leaving the xml document untouched, and is a little faster than the above method, but involves the kind of string parsing that an xml library is supposed to prevent. It works by taking the target, splitting it after each " and ', then enclosing each fragment in the opposite type of quote to the one it contains before pasting them all back together with commas and inserting them into an XPath concat function.
library(stringr)
safe_xpath <- function(target)
{
  # Mark each quote with a placeholder and split after it
  target %<>%
    str_replace_all("\"", "&quot;&break;") %>%
    str_replace_all("'", "&apo;&break;") %>%
    str_split("&break;") %>%
    unlist()
  safe_pieces <- grep("(&quot;)|(&apo;)", target, invert = TRUE)
  contain_quotes <- grep("&quot;", target)
  contain_apo <- grep("&apo;", target)
  if(length(safe_pieces) > 0)
    target[safe_pieces] <- paste0("\"", target[safe_pieces], "\"")
  if(length(contain_quotes) > 0)
  {
    target[contain_quotes] <- paste0("'", target[contain_quotes], "'")
    target[contain_quotes] <- gsub("&quot;", "\"", target[contain_quotes])
  }
  if(length(contain_apo) > 0)
  {
    target[contain_apo] <- paste0("\"", target[contain_apo], "\"")
    target[contain_apo] <- gsub("&apo;", "'", target[contain_apo])
  }
  fragment <- paste0(target, collapse = ",")
  return(paste0("//*[contains(text(),concat(", fragment, "))]"))
}
Now we can generate a valid xpath like this:
safe_xpath(target)
#> [1] "//*[contains(text(),concat('Fat\"',\"her'\",\"s son\"))]"
so that
html %>% xml2::read_html() %>% html_nodes(xpath = safe_xpath(target))
#> {xml_nodeset (1)}
#> [1] <div>Fat"her's son</div>
XPath attribute quoting in JavaScript
\ is not an escape character in XPath string literals. (If it was, you could just backslash-escape one of the quotes, and never have to worry about concat!) "\" is a complete string in itself, which is then followed by 'hi..., which doesn't make sense.
So there should be no backslashes in your output, it should look something like:
concat('"', "'hi'", '"')
I suggest:
function xpathStringLiteral(s) {
    if (s.indexOf('"')===-1)
        return '"'+s+'"';
    if (s.indexOf("'")===-1)
        return "'"+s+"'";
    return 'concat("'+s.replace(/"/g, '",\'"\',"')+'")';
}
It's not quite as efficient as it might be (it'll include leading/trailing empty string segments if the first/last character is a double-quote), but that's unlikely to matter.
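If the empty segments do bother you, one possible variant splits the string into runs of double quotes and runs of everything else, so no empty pieces are produced. This is only a sketch: the name xpathStringLiteralCompact is hypothetical and not part of the answer above.

```javascript
// Hypothetical variant of xpathStringLiteral (the name is an assumption)
// that never emits empty "" segments.
function xpathStringLiteralCompact(s) {
    if (s.indexOf('"') === -1)
        return '"' + s + '"';
    if (s.indexOf("'") === -1)
        return "'" + s + "'";
    // Split into runs: one or more double quotes, or one or more other characters.
    var runs = s.match(/"+|[^"]+/g);
    var parts = runs.map(function (run) {
        // Runs of double quotes go inside single quotes; all other runs inside double quotes.
        return run.charAt(0) === '"' ? "'" + run + "'" : '"' + run + '"';
    });
    return 'concat(' + parts.join(',') + ')';
}

console.log(xpathStringLiteralCompact('"\'hi\'"')); // concat('"',"'hi'",'"')
```

Because the concat branch is only reached when the string contains both kinds of quote, there are always at least two runs, which is the minimum number of arguments concat accepts.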
(Do you really mean let in the above? When this was written, let was a non-standard, Mozilla-only language feature; one would typically use var. It has since been standardized in ECMAScript 2015.)
How to write xpath for this particular webelement which has a double quotes?
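You can escape the double quotes with a backslash. Note that the backslash is consumed by the surrounding string literal in your test code, not by XPath itself:
//*[@ng-click=\"navigateToNewCustomer('New Customer')\"].click();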
Using quotes in Xpath
Presumably this is all inside a string literal itself, so would this work?
"v:MapLink[@Entity=\"TOM'S RESTAURANT\"]"
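The two backslash-escaping answers above (the ng-click selector and the MapLink selector) rely on the same point: the backslashes belong to the host language's string literal, not to XPath. A small JavaScript sketch, with illustrative variable names, shows what the XPath engine would actually receive:

```javascript
// The backslashes are consumed by the JavaScript string literal;
// the resulting XPath text contains only plain quote characters.
var ngClickXpath = "//*[@ng-click=\"navigateToNewCustomer('New Customer')\"]";
var mapLinkXpath = "v:MapLink[@Entity=\"TOM'S RESTAURANT\"]";

console.log(ngClickXpath); // //*[@ng-click="navigateToNewCustomer('New Customer')"]
console.log(mapLinkXpath); // v:MapLink[@Entity="TOM'S RESTAURANT"]
console.log(ngClickXpath.indexOf("\\") === -1); // true
```

This only works because each value contains a single kind of quote, so the other kind can delimit the XPath literal. A value containing both kinds still needs concat.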