How can I create a Nokogiri case-insensitive XPath selector?
Wrapped for legibility:
puts page.parser.xpath("
  //meta[
    translate(
      @name,
      'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
      'abcdefghijklmnopqrstuvwxyz'
    ) = 'keywords'
  ]
").to_html
There is no "to lower case" function in XPath 1.0, so you have to use translate()
for this kind of thing. Add accented letters as necessary.
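Since the two alphabets have to be repeated in every query, it may help to build the translate() call in one place. The following is only a sketch: the XPathCase module and its method names are hypothetical helpers, not part of Nokogiri, and it assumes the value being matched contains no single quote.

```ruby
# Hypothetical helper module (not part of Nokogiri) that builds XPath 1.0 strings.
module XPathCase
  # The two alphabets must line up position by position.
  # Append accented pairs to both strings if your data needs them.
  UPPER = ('A'..'Z').to_a.join
  LOWER = ('a'..'z').to_a.join

  # Wrap an XPath expression so that its string value is lowercased by translate().
  def self.lower(expr)
    "translate(#{expr}, '#{UPPER}', '#{LOWER}')"
  end

  # Case-insensitive match on the name attribute of a <meta> element.
  # Assumes `name` contains no single quote, because it is pasted into a quoted literal.
  def self.meta_by_name(name)
    "//meta[#{lower('@name')} = '#{name.downcase}']"
  end
end

query = XPathCase.meta_by_name('Keywords')
puts query
```

The resulting string can be passed to xpath in the same way as the hand-written query above.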
How do I make my Nokogiri :contains case insensitive?
As far as I know, this is not possible with CSS selector rules. XPath 2.0 could compare text case-insensitively, either by transforming the text content with upper-case() or by using matches() with 'i' as its third parameter instead of contains(), which matches against a case-insensitive regular expression. Nokogiri internally transforms CSS selectors into an XPath query, so your example becomes //a[contains(., "MY TEXT")]. However, Nokogiri's XML features are based on libxml2 (MRI Ruby) or javax.xml.xpath (JRuby), neither of which supports XPath 2.0.
If XPath 2.0 were supported, you could simply replace the CSS selector with this XPath query:
//a[contains(upper-case(.), "MY TEXT")]
Instead, you can implement the text comparison directly in Ruby like this:
a_elt = doc.xpath('//a').detect { |node| /MY TEXT/i === node.text }
How can I create a Nokogiri case-insensitive text * search?
The lower-case XPath function is not available (it belongs to XPath 2.0), but you can use the XPath 1.0 translate function to convert your text to lowercase, e.g. for the English alphabet:
translate(text(),'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz')
I couldn't seem to use this in combination with the *= operator, but you can use contains to do a substring search instead, making the full thing:
doc.search("//*[contains(translate(text(),'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz'),'philip morris')]")
How do I write a CSS selector that looks for an element starting with text in a case-insensitive way?
Summary: It's ugly. You're better off just using Ruby:
doc.css('select#select_id > option').select{ |opt| opt.text =~ /^ABC/i }
Details
Nokogiri uses libxml2, which uses XPath to search XML and HTML documents. Nokogiri transforms ~CSS expressions into XPath. For example, for your ~CSS selector, this is what Nokogiri actually searches for:
Nokogiri::CSS.xpath_for("#select_id option:starts-with('ABC')")
#=> ["//*[@id = 'select_id']//option[starts-with(., 'ABC')]"]
The expression you wrote is not actually CSS. There is no :starts-with() pseudo-class in CSS, not even proposed in Selectors 4. What does exist is the starts-with() function in XPath, and Nokogiri is (somewhat surprisingly) allowing you to mix XPath functions into your CSS and carrying them over to the XPath it uses internally.
The libxml2 library is limited to XPath 1.0, and in XPath 1.0 case-insensitive searches are done by translating all characters to lowercase. The XPath expression you'd want is thus:
//select[@id='select_id']/option[starts-with(translate(.,'ABC','abc'),'abc')]
(Assuming you only care about those characters!)
I'm not sure that you CAN write CSS+XPath in a way that Nokogiri would produce that expression. You'd need to use the xpath method and feed it that query.
Finally, you can create your own custom CSS pseudo-classes and implement them in Ruby. For example:
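class MySearch
  # Called by Nokogiri with the candidate nodes and the pseudo-class argument.
  def insensitive_starts_with(nodes, str)
    nodes.find_all{ |n| n.text =~ /^#{Regexp.escape(str)}/i }
  end
end
# Pass an instance of the handler, so that its methods can be called.
doc.css( "select#select_id > option:insensitive_starts_with('ABC')", MySearch.new )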
...but all this gives you is re-usability of your search code.
How can I make all XML tags lowercase in Nokogiri?
If you want to transform your xml document by downcase'ing all tag names, here's one way to do it:
parsed = Nokogiri::XML.parse(xml_content)
parsed.traverse do |node|
  node.name = node.name.downcase if node.kind_of?(Nokogiri::XML::Element)
end
How to match a case insensitive value with XPath
Scrapy Selectors are built over the libxml2 library, which, AFAIK, doesn't support XPath 2.0. At least libxslt does not for sure.
You can use XPath 1.0 translate() to solve this. In general it will look like:
translate(yourString,
'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
'abcdefghijklmnopqrstuvwxyz')