lxml Error "IOError: Error Reading File" When Parsing Facebook Mobile in a Python Scraper Script

lxml error "IOError: Error reading file" when parsing Facebook mobile in a Python scraper script

This is your problem:

tree = etree.parse(body)

The documentation says that "source is a filename or file object containing XML data." You have provided a string, so lxml is taking the text of your HTTP response body as the name of the file you wish to open. No such file exists, so you get an IOError.

The error message you get even says "Error reading file" and then gives your XML string as the name of the file it's trying to read, which is a mighty big hint about what's going on.

You probably want etree.XML(), which parses markup held in a string. Alternatively, call tree = etree.parse(res) to read directly from the HTTP response into lxml: the result of opener.open() is a file-like object, and etree.parse() should be perfectly happy to consume it.
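A minimal sketch of both options, assuming lxml is installed. The sample markup is made up, and io.BytesIO stands in for the file-like object that opener.open() would return, so nothing here touches the network:

```python
import io
from lxml import etree

# Hypothetical response body; in the original script this would be the
# data read from the HTTP response.
body = b"<feed><entry id='1'>hello</entry></feed>"

# Option 1: parse markup that is already in memory.
# etree.XML() returns the root element.
root = etree.XML(body)

# Option 2: give etree.parse() a file-like object.
# io.BytesIO is only a stand-in for the object returned by opener.open().
# etree.parse() returns an ElementTree, so call getroot() for the root.
res = io.BytesIO(body)
tree = etree.parse(res)

print(root.tag)
print(tree.getroot()[0].text)
```

Note that a file-like object can only be consumed once: if the script has already called res.read() to build body, pass body to etree.XML() rather than passing the exhausted res to etree.parse().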

IOError passing requests Response.content to lxml.etree.parse()

etree.parse expects a filename, a file-like object, or a URL as its first argument (see help(etree.parse)). It does not expect an XML string. To parse an XML string use

xmlObject = etree.fromstring(r.content)

Note that etree.fromstring returns an lxml.etree._Element (the root element). In contrast, etree.parse returns an lxml.etree._ElementTree. Given the _Element, you can obtain the _ElementTree with the getroottree method:

xmlTree = xmlObject.getroottree()
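The difference between the two return types can be checked with a short sketch like the following, assuming lxml is installed. The bytes literal is a made-up stand-in for r.content, so no HTTP request is made:

```python
from lxml import etree

# Hypothetical stand-in for r.content (bytes) from a requests Response.
content = (
    b"<?xml version='1.0' encoding='UTF-8'?>"
    b"<items><item>a</item><item>b</item></items>"
)

# fromstring() parses in-memory data and returns the root element.
xml_object = etree.fromstring(content)

# getroottree() returns the ElementTree that the element belongs to.
xml_tree = xml_object.getroottree()

# Elements support searching directly; collect the text of each <item>.
items = [item.text for item in xml_object.findall("item")]

print(type(xml_object).__name__, type(xml_tree).__name__, items)
```

Passing bytes (r.content) rather than a decoded string (r.text) also matters when the document starts with an XML declaration that names an encoding: lxml refuses to parse a Python str carrying such a declaration and raises a ValueError.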

Multithreaded lxml scraper executes without any error or output

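Try changing your last if statement to

if __name__ == '__main__':

instead of comparing against '__name__'. Python sets the variable __name__ to '__main__' when a script is run directly, so a comparison against the string '__name__' is never true. The guarded code is silently skipped, which is why the script finishes without any error or output.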
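A small sketch of why the mistyped guard does nothing. The main() function is a hypothetical placeholder for the scraper's entry point, not the original script:

```python
def main():
    # Placeholder for the code that starts the scraper threads.
    return "scraper started"

# The mistyped guard compares __name__ with the literal string '__name__'.
# __name__ holds '__main__' (run directly) or the module's name (imported),
# so this comparison is False in both cases.
typo_guard_fires = (__name__ == '__name__')
print("typo guard fires:", typo_guard_fires)

# The correct guard runs main() only when the file is executed directly.
if __name__ == '__main__':
    print(main())
```

Because a false condition is not an error, Python simply skips the block, and nothing signals that the entry point was never called.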


