Scrapy - Reactor not Restartable
You cannot restart the reactor, but you should be able to run it more times by forking a separate process:
import scrapy
import scrapy.crawler as crawler
from scrapy.utils.log import configure_logging
from multiprocessing import Process, Queue
from twisted.internet import reactor

# your spider
class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ['http://quotes.toscrape.com/tag/humor/']

    def parse(self, response):
        for quote in response.css('div.quote'):
            print(quote.css('span.text::text').extract_first())

# the wrapper to make it run more times
def run_spider(spider):
    def f(q):
        try:
            runner = crawler.CrawlerRunner()
            deferred = runner.crawl(spider)
            deferred.addBoth(lambda _: reactor.stop())
            reactor.run()
            q.put(None)
        except Exception as e:
            q.put(e)

    q = Queue()
    p = Process(target=f, args=(q,))
    p.start()
    result = q.get()
    p.join()

    if result is not None:
        raise result
Run it twice:
configure_logging()
print('first run:')
run_spider(QuotesSpider)
print('\nsecond run:')
run_spider(QuotesSpider)
Result:
first run:
“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”
“A day without sunshine is like, you know, night.”
...
second run:
“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”
“A day without sunshine is like, you know, night.”
...
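The core of the trick, stripped of the Scrapy specifics, is to run the one-shot work in a throwaway child process so every call starts from a pristine interpreter state. A minimal sketch of the same pattern (function names here are made up for illustration; `crawl_once` stands in for `reactor.run()`):

```python
from multiprocessing import Process, Queue

def _worker(q, fn):
    # run fn in the child and ship the outcome back to the parent
    try:
        q.put((True, fn()))
    except Exception as e:
        q.put((False, e))

def run_isolated(fn):
    """Call fn() in a fresh process; return its result or re-raise its error."""
    q = Queue()
    p = Process(target=_worker, args=(q, fn))
    p.start()
    ok, value = q.get()
    p.join()
    if not ok:
        raise value
    return value

def crawl_once():
    # stand-in for work that can only happen once per process,
    # like starting the Twisted reactor
    return "done"

# each call gets brand-new globals, so the "once per process" limit never bites;
# note that with the 'spawn' start method (Windows, recent macOS) the calling
# code must live under an `if __name__ == '__main__':` guard
print(run_isolated(crawl_once))
```

Passing a module-level function keeps `fn` picklable, which matters on platforms where `multiprocessing` spawns rather than forks.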
How to run Scrapy in a while loop
You can remove the while loop and use callbacks instead.
Edit: Example added:
def callback_f():
    # stuff #
    calling_f()

def calling_f():
    answer = input("Continue? (y/n)")
    if answer != 'n':
        callback_f()

callback_f()
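The same pattern works without user input: any condition can decide whether to schedule the next run instead of a `while` loop. A small sketch with made-up names, where `crawl_step` stands in for one spider run:

```python
def crawl_step(remaining, results):
    # stand-in for one spider run
    results.append(f"run {len(results) + 1}")
    schedule_next(remaining - 1, results)

def schedule_next(remaining, results):
    # the "should we go again?" decision lives here, not in a loop
    if remaining > 0:
        crawl_step(remaining, results)

results = []
crawl_step(3, results)
print(results)  # three runs, no while loop
```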
Twisted Reactor not restarting in scrapy
Okay, I finally solved my problem.
The python-telegram-bot API wrapper offers an easy way to restart the bot.
I simply put the lines:
time.sleep(0.2)
os.execl(sys.executable, sys.executable, *sys.argv)
at the end of the doesntRun() function. Now whenever I call the function via the bot, it scrapes the page, stores the results, forwards the result, then restarts itself. Doing so lets me execute the spider as many times as I want.
Scrapy executing different spiders at different times with reactor
This topic helped me solve the problem: I just needed to install crochet and call setup() at the top of my code.
Solution link