Perl-Mechanize Runs into Limitations - Several Debugging Attempts Started


For one thing, your input contains slashes and then you are trying to use that input to create a filename. Since your input begins with "http://www" and not "www", your substitution operation doesn't do anything, either.

my $name = "$_";        # e.g. $name <= "http://www.zug.phz.ch"
$name =~ s/^www\.//;    # $name is still "http://www.zug.phz.ch"
$name .= ".png";        # $name is "http://www.zug.phz.ch.png"
open(OUTPUT, ">$name"); # error: no directory named "./http:"
print OUTPUT $png;
sleep(5);

You'll want to do a better job of sanitizing your filename. Maybe something like

$name =~ s![:/]+!-!g; #http://foo.com/bar.html  becomes  http-foo.com-bar.html
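Putting that together, a minimal sketch of the sanitizing step (the helper name `url_to_filename` is my own, not from the original script):

```perl
use strict;
use warnings;

# Collapse colons and slashes into hyphens so the URL becomes a usable filename,
# then append the image extension.
sub url_to_filename {
    my ($url) = @_;
    my $name = $url;
    $name =~ s![:/]+!-!g;   # "http://foo.com/bar.html" -> "http-foo.com-bar.html"
    return $name . ".png";
}

print url_to_filename("http://www.zug.phz.ch"), "\n";   # http-www.zug.phz.ch.png
```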

In any case, the return value you want to check is the one from the open call inside your while loop. If you had written

open(OUTPUT,">$name") or warn "Failed to open '$name': $!";

you probably would have figured this out on your own.

storing fails: binmode() on closed filehandle $out at ... print() on closed filehandle $out

You weren't able to open the file for writing. Your path is /images, and you probably don't have permissions on that directory (if it even exists). Always check the return value of your calls to open, like you did in the first open.
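A sketch of a checked open, using the modern three-argument form with a lexical filehandle (the filename here is just a placeholder):

```perl
use strict;
use warnings;

# Three-argument open with a lexical filehandle; die with the OS error on failure.
open( my $out, '>', 'output.png' )
    or die "Failed to open 'output.png': $!";
binmode($out);                           # binary data, so no newline translation
print {$out} "png bytes would go here";  # placeholder for the real image data
close($out) or die "Failed to close 'output.png': $!";
```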

If I were you, I wouldn't use /images. I'd download everything into a directory I control, one that isn't cluttering the standard directory layout. You should almost never create new directories directly under / unless you are doing system administration.
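For example, a sketch that creates a download directory under the user's home instead (the directory name `mech-images` is my choice, not from the question):

```perl
use strict;
use warnings;
use File::Path qw(make_path);

# Pick a directory under $HOME instead of /images; make_path creates
# intermediate directories as needed and is a no-op if it already exists.
my $dir = "$ENV{HOME}/mech-images";
make_path($dir);
die "could not create $dir" unless -d $dir;
```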

WWW::Mechanize SSL connect attempt failed for https get

Works For Me™ with IO::Socket::SSL 2.052, WWW::Mechanize 1.86, and Net::SSLeay 1.80. I suspect you need to upgrade Net::SSLeay. I'd suggest upgrading all of them.
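A quick way to check which versions you currently have, and to upgrade all three from CPAN (this assumes the modules are already installed and the `cpan` client is configured; it is a sketch, not the only way):

```shell
# print the installed versions
perl -MNet::SSLeay -e 'print "Net::SSLeay $Net::SSLeay::VERSION\n"'
perl -MIO::Socket::SSL -e 'print "IO::Socket::SSL $IO::Socket::SSL::VERSION\n"'
perl -MWWW::Mechanize -e 'print "WWW::Mechanize $WWW::Mechanize::VERSION\n"'

# upgrade all of them
cpan Net::SSLeay IO::Socket::SSL WWW::Mechanize
```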

The differences start here. Yours considers the cert to not be ok.

DEBUG: .../IO/Socket/SSL.pm:2552: did not get stapled OCSP response
DEBUG: .../IO/Socket/SSL.pm:2505: ok=0 cert=102327360

But mine does. The more verbose output is because of my upgraded Net::SSLeay.

DEBUG: .../IO/Socket/SSL.pm:2722: did not get stapled OCSP response
DEBUG: .../IO/Socket/SSL.pm:2675: ok=1 [2] /C=US/O=Entrust, Inc./OU=See www.entrust.net/legal-terms/OU=(c) 2009 Entrust, Inc. - for authorized use only/CN=Entrust Root Certification Authority - G2/C=US/O=Entrust, Inc./OU=See www.entrust.net/legal-terms/OU=(c) 2009 Entrust, Inc. - for authorized use only/CN=Entrust Root Certification Authority - G2

That process is handled by Net::SSLeay. It's possible your version of Net::SSLeay is incompatible with your OpenSSL C library. There have been a lot of fixes for compatibility with OpenSSL 1.1 since ActivePerl 5.20.2 came out.

tiny runnable WWW::Mechanize examples for the beginner

Could you be a little more specific about what exactly you are after? For instance, this is a script to log into a website:

use WWW::Mechanize;

my $mech = WWW::Mechanize->new();
my $url = "http://www.test.com";

$mech->cookie_jar->set_cookie(0,"start",1,"/",".test.com");
$mech->get($url);
$mech->form_name("frmLogin");
$mech->set_fields(user=>'test',passwrd=>'test');
$mech->click();
$mech->save_content("logged_in.html");

This is a script to perform Google searches:

use strict;
use warnings;
use 5.10.0;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();

# maximum result offset, taken from the command line (default 10 if omitted)
my $option = $ARGV[-1] // 10;

# you may customize your Google search by editing this url (always end it with "q=" though)
my $google = 'http://www.google.co.uk/search?q=';

my @dork = ( "inurl:dude", "cheese" );

# start the main loop, one iteration for every Google search
for my $i ( 0 .. $#dork ) {

    # reset the result offset for each search term
    my $max = 0;

    # loop until the maximum number of results chosen is reached
    while ( $max <= $option ) {
        $mech->get( $google . $dork[$i] . "&start=" . $max );

        # print all the result links, skipping Google's own
        foreach my $link ( $mech->links() ) {
            my $google_url = $link->url;
            if ( $google_url !~ /^\// && $google_url !~ /google/ ) {
                say $google_url;
            }
        }
        $max += 10;
    }
}

A simple site crawler extracting information (HTML comments) from every page:

use strict;
use warnings;
use 5.10.0;
use WWW::Mechanize;

# create the mechanize object with autocheck switched off,
# so we don't get an error when a bad/malformed url is requested
my $mech = WWW::Mechanize->new( autocheck => 0 );
my %comments;
my %links;

my $target = "http://google.com";

# store the first target url as not checked
$links{$target} = 0;

# initiate the search
my $url = get_url();

# start the main loop
while ( $url ne "" ) {

    # get the target url
    $mech->get($url);

    # search the source for any html comments
    # (non-greedy match with /s so comments containing ">" or newlines are handled)
    my $res     = $mech->content;
    my @comment = $res =~ /<!--.*?-->/gs;

    # store comments in the 'comments' hash and print them, if any were found
    if (@comment) {
        $comments{$url} = "@comment";
        say "\n$url \n---------------->\n $comments{$url}";
    }

    # loop through all the links on the current page (urls contained in html anchors)
    foreach my $link ( $mech->links() ) {
        my $href = $link->url();

        # exclude irrelevant stuff, such as javascript functions or external links;
        # you might want to add a domain-name check so relevant links aren't excluded
        if ( $href !~ /^(#|mailto:|(f|ht)tp(s)?:|www\.|javascript:)/ ) {

            # check whether the link has a leading slash so we can build the whole url properly
            $href = $href =~ /^\// ? $target . $href : $target . "/" . $href;

            # store it in our hash of links to be searched, unless it's already present
            $links{$href} = 0 unless exists $links{$href};
        }
    }

    # mark this url as searched and start over
    $links{$url} = 1;
    $url = get_url();
}

# return the next unsearched url; return empty when all have been searched,
# which ends the main loop
sub get_url {
    while ( my ( $key, $value ) = each %links ) {
        return $key if $value == 0;
    }
    return "";
}

It really depends on what you are after, but if you want more examples I would refer you to perlmonks.org, where you can find plenty of material to get you going.

Definitely bookmark the WWW::Mechanize module's documentation, though; it is the ultimate resource.

WWW::Mechanize with SSL works but response is slow

There's a configuration issue that's causing the server's IP address to differ from the one it uses in its certificate. For that reason, I'm ignoring hostname verification for the time being.

Better to use the SSL_verifycn_name setting of IO::Socket::SSL to define which name you expect in the certificate.
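A sketch of how that setting can be passed through WWW::Mechanize (extra constructor arguments are handed to LWP::UserAgent; the hostname and IP here are placeholders):

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Keep verification enabled, but tell IO::Socket::SSL which hostname
# to expect in the certificate, independent of the address we connect to.
my $mech = WWW::Mechanize->new(
    ssl_opts => {
        verify_hostname   => 1,
        SSL_verifycn_name => 'www.example.com',   # placeholder name
    },
);

# connect by IP (placeholder), verify the cert against the name above
$mech->get('https://192.0.2.1/');
```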

That could be the reason why WWW::Mechanize is slow.

Probably not, because you are just disabling checks. Disabling them does not make it slower, but enabling them would not make it slower either, because these checks are fast and don't need any additional network activity.

I'm running on Solaris 10 (sparc), with Perl 5.20.1 and OpenSSL 0.7.9d.

I doubt that you are using 0.7.9d; you probably mean 0.9.7d. It is still a very unusual configuration: a modern Perl with a ten-year-old version of OpenSSL. I would suggest that you use a current version instead; maybe your problems will go away then.

However, there isn't any slowness when using a regular web browser to perform the same functions.

Current browsers use a modern TLS stack with more efficient ciphers, session resumption, and so on. Again, try a recent version of OpenSSL instead.

Problems authenticating through API key sent in headers WWW::Mechanize

I wrote a server-side script that shows the headers from both examples, and APIKEY was set identically in both cases. There were some differences in HTTP_ACCEPT / HTTP_ACCEPT_ENCODING, and WWW::Mechanize adds some additional headers:

'downgrade-1.0' => '1'
'force-response-1.0' => '1'
'nokeepalive' => '1'

So I would suggest the problem is somewhere else.
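For reference, setting a custom header in WWW::Mechanize looks like this (the header name APIKEY is taken from the question; the key value is a placeholder):

```perl
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();

# add_header makes the header part of every subsequent request
$mech->add_header( 'APIKEY' => 'your-key-here' );   # placeholder value
$mech->get('https://api.example.com/resource');     # placeholder URL
```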

running a perl cgi script from a webpage (html)

To use a cgi script on dreamhost, it is sufficient to

  1. give the script a .cgi extension
  2. put the script somewhere visible to the webserver
  3. give the script the right permissions (at least 0755)
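Step 3 can be sketched from the shell like this (`hello.cgi` stands in for your uploaded script):

```shell
# stand-in for uploading your script
touch hello.cgi

# 0755 = rwxr-xr-x: you can write, the web server can read and execute
chmod 0755 hello.cgi
ls -l hello.cgi
```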

You may want to see if you can get a toy script, say,

#!/usr/bin/perl
print "Content-type: text/plain\n\nHello world\n";

working before you tackle debugging your larger script.

That said, something I don't see in your script is the header. I think you'll want to say something like

print "Content-type: text/html\n\n";

before your other print call.


