Can I use Watir to scrape data from a website on a linux server without monitor?
There are several ways to do this:
1. Use HtmlUnit, either Celerity or watir-webdriver (through the remote Selenium2/WebDriver server).
2. Use a real browser + a virtual X server (Xvfb). I'd recommend using watir-webdriver's Firefox driver and the Headless gem for a simple way to control this from Ruby.
This is basically a tradeoff between speed and realism. Personally I'd go with #2 if the site has any complex JavaScript or invalid HTML, but both approaches could be worth investigating.
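As a rough illustration of the real browser + Xvfb approach, here is a minimal sketch. It assumes the headless and watir-webdriver gems, Firefox, and Xvfb are all installed on the server. The method name scrape_title and the URL in the usage note are made up for the example.

```ruby
# Sketch only: drive Firefox inside a virtual X display via the Headless gem.
# scrape_title is a hypothetical helper name, not part of either gem.
def scrape_title(url)
  # Gems are loaded lazily so that merely defining the method has no dependencies.
  require 'headless'
  require 'watir-webdriver'

  headless = Headless.new
  headless.start # starts (or reuses) an Xvfb server and sets DISPLAY
  begin
    browser = Watir::Browser.new :firefox
    browser.goto url
    browser.title
  ensure
    browser.close if browser
    headless.destroy # shuts down the Xvfb server started above
  end
end
```

Calling scrape_title('http://example.com') should then return the page title as a string, with no monitor attached. The ensure block is there so the browser and the virtual display are cleaned up even if the page load raises.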
For the future, I'm keeping an eye on this project, which looks like a terrific idea.
How do I run Firefox browser headless with my Ruby script?
I would look at using Watir-Webdriver instead of plain Watir or FireWatir, especially since Watir-Webdriver is going to be the only way to work with newer versions of Firefox.
There's an earlier SO question whose answer covers just this sort of thing, so I'd suggest trying what is described there first: Can I use Watir to scrape data from a website on a linux server without monitor?
Also, since I now know you are using Mac OS, the advice in this thread from the WebDriver Google group might be more applicable to you.
any scripting language can read AJAX/Java Script? (linux)
Check out TestPlan. It can do testing without a monitor by using the HtmlUnit backend. It handles quite a lot of JavaScript, including AJAX. I use it to scrape several pages and have built several tests of AJAX with it.
You can also run TestPlan with a browser if you want. This gives you the best of both worlds: develop tests and visually see what is happening, and then switch to the display-less mode.
Watir-Webdriver EOFError and Errno::ECONNREFUSED
I had success giving each app its own Xvfb display:
On the server itself:
$ sudo /usr/bin/Xvfb :98 -screen 0 1280x1024x24 -ac &
$ sudo /usr/bin/Xvfb :99 -screen 0 1280x1024x24 -ac &
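App 1, before the browser is created:
# ~/repo1/whatever.rb
require 'headless'
# ...
h = Headless.new(:display => 98) # use the Xvfb server running on display :98
h.start # points DISPLAY at :98 for browsers launched afterwards
# ...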
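App 2, before the browser is created:
# ~/repo2/something.rb
require 'headless'
# ...
h = Headless.new(:display => 99) # use the Xvfb server running on display :99
h.start # points DISPLAY at :99 for browsers launched afterwards
# ...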
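If both apps end up sharing the same setup code, the per-app display could be passed in as a parameter. The following is only a sketch under that assumption: headless_options and with_display are hypothetical helper names, while :display, :reuse and :destroy_at_exit are options documented by the Headless gem. Setting :destroy_at_exit to false is my assumption about what you want when Xvfb is started on the server rather than by the script.

```ruby
# Hypothetical helper (not part of the Headless gem): options for attaching to
# an Xvfb server that was started outside of Ruby on a given display number.
def headless_options(display)
  {
    :display         => Integer(display), # e.g. 98 for app 1, 99 for app 2
    :reuse           => true,             # use the Xvfb already running on that display
    :destroy_at_exit => false             # do not kill that Xvfb when this script exits
  }
end

# Runs the given block with DISPLAY pointing at the app's own Xvfb display.
def with_display(display)
  require 'headless' # loaded lazily; assumes the headless gem is installed
  headless = Headless.new(headless_options(display))
  headless.start
  yield
end
```

App 1 would then wrap its browser code in with_display(98) { ... } and app 2 in with_display(99) { ... }, so neither script touches the other's display.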
@chuck-van-der-linden is probably correct, though, that using VMs or similar is a better solution. If I were starting fresh with this architecture, that would be my approach.