ReactorNotRestartable Error in While Loop with Scrapy

Scrapy - Reactor not Restartable

You cannot restart the reactor, but you should be able to run it multiple times by forking a separate process for each run:

import scrapy
import scrapy.crawler as crawler
from scrapy.utils.log import configure_logging
from multiprocessing import Process, Queue
from twisted.internet import reactor

# your spider
class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ['http://quotes.toscrape.com/tag/humor/']

    def parse(self, response):
        for quote in response.css('div.quote'):
            print(quote.css('span.text::text').extract_first())

# the wrapper to make it run more times
def run_spider(spider):
    def f(q):
        try:
            runner = crawler.CrawlerRunner()
            deferred = runner.crawl(spider)
            deferred.addBoth(lambda _: reactor.stop())
            reactor.run()
            q.put(None)
        except Exception as e:
            q.put(e)

    q = Queue()
    p = Process(target=f, args=(q,))
    p.start()
    result = q.get()
    p.join()

    if result is not None:
        raise result

Run it twice:

configure_logging()

print('first run:')
run_spider(QuotesSpider)

print('\nsecond run:')
run_spider(QuotesSpider)

Result:

first run:
“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”
“A day without sunshine is like, you know, night.”
...

second run:
“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”
“A day without sunshine is like, you know, night.”
...
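
Since each call forks a fresh process with its own reactor, the wrapper can also be driven from the while loop the title mentions. A minimal sketch, assuming the run_spider and QuotesSpider definitions above:

while True:
    run_spider(QuotesSpider)
    if input("Run again? (y/n) ") == 'n':
        break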

How to run Scrapy in a while loop

You can remove the while loop and use callbacks instead.

Edit: Example added:

def callback_f():
    # stuff #
    calling_f()

def calling_f():
    answer = input("Continue? (y/n)")
    if answer != 'n':
        callback_f()

callback_f()
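
Applied to Scrapy, the same callback idea can be expressed with Twisted Deferreds: chain each new crawl as a callback of the previous one, so reactor.run() is only ever called once and ReactorNotRestartable never fires. A rough sketch, assuming the QuotesSpider defined earlier:

import scrapy.crawler as crawler
from twisted.internet import reactor

runner = crawler.CrawlerRunner()

def crawl_again(_):
    # note: input() blocks the reactor thread; acceptable for a demo,
    # but a real application would schedule this differently
    if input("Continue? (y/n) ") != 'n':
        runner.crawl(QuotesSpider).addCallback(crawl_again)
    else:
        reactor.stop()

runner.crawl(QuotesSpider).addCallback(crawl_again)
reactor.run()  # started exactly once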

Twisted Reactor not restarting in Scrapy

Okay, I finally solved my problem.

The python-telegram-bot API wrapper offers an easy way to restart the bot.

I simply put the lines:

time.sleep(0.2)  # give pending work a moment to settle
os.execl(sys.executable, sys.executable, *sys.argv)  # replace the process with a fresh interpreter (and a fresh reactor)

at the end of the doesntRun() function (which also requires importing time, os, and sys). Now whenever I call the function via the bot, it scrapes the page, stores the results, forwards them, and then restarts itself. This lets me execute the spider as many times as I want.
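
For context, here is a hypothetical sketch of how those lines slot into the handler; doesntRun and scrape_and_forward are placeholders standing in for the question's actual code:

import os
import sys
import time

def doesntRun():
    scrape_and_forward()  # placeholder: crawl the page, store and forward the results
    time.sleep(0.2)       # let the bot flush any pending messages
    # replace the current process with a fresh interpreter,
    # so the next invocation gets a brand-new, startable reactor
    os.execl(sys.executable, sys.executable, *sys.argv)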

Scrapy executing different spiders at different times with the reactor

This topic helped me solve the problem. I just needed to install crochet and add setup() at the top of my code.

Solution link
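
For reference, a minimal sketch of that crochet approach; the spider, URL, and timeout are illustrative, not taken from the linked solution:

from crochet import setup, wait_for
setup()  # must run before anything touches the reactor

import scrapy
from scrapy.crawler import CrawlerRunner

class MySpider(scrapy.Spider):
    name = "my_spider"
    start_urls = ['http://quotes.toscrape.com/']

    def parse(self, response):
        print(response.css('title::text').get())

@wait_for(timeout=60.0)
def crawl_once():
    # returns the crawl Deferred; crochet runs the reactor in a
    # background thread and blocks here until the Deferred fires
    return CrawlerRunner().crawl(MySpider)

# the reactor never stops between calls, so repeated runs work
crawl_once()
crawl_once()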


