Scrapy - how to manage cookies/sessions
Three years later, I think this is exactly what you were looking for:
http://doc.scrapy.org/en/latest/topics/downloader-middleware.html#std:reqmeta-cookiejar
Just use something like this in your spider's start_requests method:
    for i, url in enumerate(urls):
        # Give each start URL its own cookiejar key so the sessions stay separate
        yield scrapy.Request(url, meta={'cookiejar': i},
                             callback=self.parse_page)
And remember that for subsequent requests, you need to explicitly reattach the cookiejar each time:
    def parse_page(self, response):
        # do some processing
        return scrapy.Request("http://www.example.com/otherpage",
                              meta={'cookiejar': response.meta['cookiejar']},
                              callback=self.parse_other_page)
Scrapy to manage session cookies with full webkit javascript execution
Actually, the first approach does work, but with one modification: the cookie path needs to be '/' (at least in my application), not None as in the code above. That is, the line should be:
    libsoup.soup_cookie_jar_add_cookie(cookiejar, libsoup.soup_cookie_new(cookiename, cookieval, up.hostname, '/', -1))
Unfortunately, this only pushes the question back a bit. The cookies are now saved properly, but the full page (including the frames) is still not loaded and rendered by webkit as I had expected, so the DOM is not as complete as what I see in the browser. If I simply request the frame that I want, I get the error page instead of the content that a real browser shows. I'd love to see how to use webkit to render the whole page, including frames, or how to achieve the second approach and complete the entire session in webkit.