PhantomJS: pipe input

You can do what you're looking for very simply (it's just not really documented) directly in PhantomJS.

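// makepdf.js: read HTML from stdin, write a PDF to stdout.
var page = require('webpage').create(),
    fs = require('fs');

page.viewportSize = { width: 600, height: 600 };
page.paperSize = { format: 'Letter', orientation: 'portrait', margin: '1cm' };

// Load the HTML piped in on stdin.
page.content = fs.read('/dev/stdin');

// Give the page a moment to lay out before rendering the PDF to stdout.
window.setTimeout(function() {
    page.render('/dev/stdout', { format: 'pdf' });
    phantom.exit();
}, 1);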

(You may need to increase the timeout if the page has images or other resources that need to load.)

HTML comes in on stdin, and the PDF binary goes out on stdout. You can test it like this:

echo "<b>test</b>" | phantomjs makepdf.js > test.pdf && open test.pdf
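If you call the script from Node.js instead of a shell, keep stdout as raw bytes so the PDF is never decoded as text. The sketch below is minimal and makes some assumptions: phantomjs is on your PATH, and the script above is saved as makepdf.js. The names runFilter, looksLikePdf and renderHtmlToPdf are hypothetical helpers, not part of any API.

```javascript
const { spawnSync } = require('child_process');

// Run `cmd args...`, feed `input` on stdin, and return stdout as a Buffer.
// No `encoding` option is passed, so stdout stays raw bytes instead of
// being decoded as text (which would corrupt a binary PDF).
function runFilter(cmd, args, input) {
  const result = spawnSync(cmd, args, {
    input: input,
    // Raise the default output limit so large PDFs don't fail with ENOBUFS.
    maxBuffer: 64 * 1024 * 1024,
  });
  if (result.error) {
    throw result.error; // e.g. ENOENT if the command is not installed
  }
  if (result.status !== 0) {
    throw new Error(cmd + ' exited with status ' + result.status);
  }
  return result.stdout;
}

// PDF files start with the ASCII bytes "%PDF-".
function looksLikePdf(buf) {
  return buf.length >= 5 && buf.subarray(0, 5).toString('latin1') === '%PDF-';
}

// Hypothetical helper: HTML in, PDF bytes out, via the makepdf.js script.
function renderHtmlToPdf(html, script = 'makepdf.js') {
  return runFilter('phantomjs', [script], Buffer.from(html, 'utf8'));
}

module.exports = { runFilter, looksLikePdf, renderHtmlToPdf };
```

For example, fs.writeFileSync('test.pdf', renderHtmlToPdf('<b>test</b>')) should do the same job as the shell test above. spawnSync blocks until the child exits, which is acceptable for a one-shot conversion.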

PhantomJS: exported PDF to stdout

As pointed out by Niko, you can use renderBase64() to render the web page to an image buffer and return the result as a base64-encoded string.
At the time of writing, this only works for PNG, JPEG and GIF.

To write something from a PhantomJS script to stdout, just use the filesystem API.

I use something like this for images:

var fs = require("fs");
// Render the page as PNG and get the result as a base64 string.
var base64image = page.renderBase64('PNG');
fs.write("/dev/stdout", base64image, "w");
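On the receiving end, the base64 text has to be decoded back into bytes. This Node.js sketch assumes the script writes only the base64 string (possibly followed by a newline) to stdout. decodeBase64Png is a hypothetical helper that also checks the PNG signature.

```javascript
// The eight bytes every PNG file starts with.
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Decode base64 text captured from stdout into image bytes.
function decodeBase64Png(text) {
  // Strip whitespace such as a trailing "\n" or "\r\n".
  const bytes = Buffer.from(text.replace(/\s+/g, ''), 'base64');
  if (!bytes.subarray(0, 8).equals(PNG_SIGNATURE)) {
    throw new Error('decoded data is not a PNG');
  }
  return bytes;
}

module.exports = { decodeBase64Png, PNG_SIGNATURE };
```

Because base64 output is plain ASCII, writing it through a text-mode stdout doesn't damage it. Line-ending translation is the only likely change, and stripping whitespace handles that.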

I don't know whether a future version of PhantomJS will support PDF in renderBase64(), but as a workaround, something along these lines may work for you:

var fs = require("fs");
// Render to a file first, copy its contents to stdout, then delete it.
page.render(output);
var pdf = fs.read(output);
fs.write("/dev/stdout", pdf, "w");
fs.remove(output);

Here, output is the path to the PDF file.

PhantomJS PDF to stdout

When writing output to /dev/stdout or /dev/stderr on Windows, PhantomJS goes through the following steps (as seen in the render method in \phantomjs\src\webpage.cpp):

  1. Because /dev/stdout and /dev/stderr don't exist, a temporary file path is allocated.
  2. Call renderPdf with the temporary file path.
  3. Render the web page to this file path.
  4. Read the contents of this file into a QByteArray.
  5. Call QString::fromAscii on the byte array and write to stdout or stderr.
  6. Delete the temporary file.

To begin with, I built PhantomJS from source with the temporary-file deletion commented out. On the next run, I could examine the temporary file it had rendered, and it turned out to be completely fine. I also tried running phantomjs.exe rasterize.js http://google.com > test.png, with the same result. This ruled out a rendering issue, or anything specific to PDFs: the problem had to be in the way the data is written to stdout.

By this stage I suspected some text-encoding shenanigans. From previous runs, I had both a valid and an invalid version of the same file (a PNG in this case).

Using some C# code, I ran the following experiment:

using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Text;

// Read the contents of the known good file.
byte[] bytesFromGoodFile = File.ReadAllBytes("valid_file.png");
// Read the contents of the known bad file.
byte[] bytesFromBadFile = File.ReadAllBytes("invalid_file.png");

// Take the bytes from the valid file and convert them to a string
// using the Latin-1 encoding.
string iso88591String = Encoding.GetEncoding("iso-8859-1").GetString(bytesFromGoodFile);
// Take the Latin-1 decoded string and retrieve its bytes using the UTF-8 encoding.
byte[] bytesFromIso88591String = Encoding.UTF8.GetBytes(iso88591String);

// If these bytes are identical to the known bad file (same length and
// same content), we have an encoding problem.
Debug.Assert(bytesFromBadFile.SequenceEqual(bytesFromIso88591String));

Note that I used the ISO-8859-1 encoding because Qt uses it as the default encoding for C strings. The point of the exercise was to see whether I could mimic the encoding steps that turned valid data into invalid data. As it turned out, the resulting bytes were identical to the bad file.

For further evidence, I investigated \phantomjs\src\system.cpp and \phantomjs\src\filesystem.cpp.

  • In system.cpp, the System class holds references to, among other things, File objects for stdout, stdin and stderr, which are set up to use UTF-8 encoding.
  • When writing to stdout, the write function of the File object is called. This function supports writing to both text and binary files, but because of the way the System class initializes them, all writing will be treated as though it were going to a text file.

So the problem boils down to this: we need a binary write to stdout, but our writes are treated as text and have an encoding applied to them, which corrupts the resulting file.
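The corruption can be reproduced without PhantomJS. This Node.js sketch is a stand-in for the Qt code path, not the Qt code itself. It decodes raw bytes one byte per character, as QString::fromAscii does with Qt's default Latin-1 codec. It then encodes the string as UTF-8, as the text-mode stdout File object does. textWrite is a hypothetical name.

```javascript
// Simulate the text-mode path: bytes -> Latin-1 string -> UTF-8 bytes.
function textWrite(bytes) {
  const str = bytes.toString('latin1'); // one char per byte, like fromAscii
  return Buffer.from(str, 'utf8');       // UTF-8 applied by the text stream
}

// The first bytes of any PNG file.
const pngHeader = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
const written = textWrite(pngHeader);

// 0x89 is >= 0x80, so UTF-8 turns it into two bytes (0xC2 0x89),
// while the ASCII bytes pass through unchanged.
console.log(written); // <Buffer c2 89 50 4e 47 0d 0a 1a 0a>
```

Every byte from 0x80 upward grows into two bytes. That is why an all-ASCII file survives while a PNG or PDF does not.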


Given the problem described above, I can't see any way to get this working the way you want on Windows without changing the PhantomJS code. So here are the changes:

The first change provides a function we can call on File objects to explicitly perform a binary write.

Add the following function prototype in \phantomjs\src\filesystem.h:

bool binaryWrite(const QString &data);

And place its definition in \phantomjs\src\filesystem.cpp (the code for this method comes from the write method in this file):

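bool File::binaryWrite(const QString &data)
{
    if ( !m_file->isWritable() ) {
        qDebug() << "File::binaryWrite - " << "Couldn't write:" << m_file->fileName();
        return false;
    }

    // Convert each character back to a single byte; no text codec is applied.
    QByteArray bytes(data.size(), Qt::Uninitialized);
    for (int i = 0; i < data.size(); ++i) {
        bytes[i] = data.at(i).toAscii();
    }
    // QIODevice::write returns the number of bytes written, or -1 on error.
    return m_file->write(bytes) == bytes.size();
}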
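binaryWrite maps each character back to a single byte instead of UTF-8 encoding it. The Node.js round trip below is an analogy, not PhantomJS code; binaryWrite here is just a JavaScript function named after the C++ method. It shows that the bytes come back intact as long as every character is below U+0100.

```javascript
// Analogy for File::binaryWrite: one byte per character, no UTF-8 step.
function binaryWrite(str) {
  const bytes = Buffer.alloc(str.length); // zero-filled
  for (let i = 0; i < str.length; i++) {
    const c = str.charCodeAt(i);
    // Like QChar::toAscii() under a Latin-1 codec: the Latin-1 byte,
    // or 0 if the character is not representable.
    bytes[i] = c <= 0xff ? c : 0;
  }
  return bytes;
}

// Simulate QString::fromAscii on the rendered file: one char per byte.
const original = Buffer.from([0x25, 0x50, 0x44, 0x46, 0x89, 0xff, 0x00, 0x0a]);
const asString = original.toString('latin1');

console.log(binaryWrite(asString).equals(original)); // true
```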

At around line 920 of \phantomjs\src\webpage.cpp you'll see a block of code that looks like this:

    if( fileName == STDOUT_FILENAME ){
#ifdef Q_OS_WIN32
        _setmode(_fileno(stdout), O_BINARY);
#endif

        ((File *)system->_stderr())->write(QString::fromAscii(name.constData(), name.size()));

#ifdef Q_OS_WIN32
        _setmode(_fileno(stdout), O_TEXT);
#endif
    }

Change it to this:

    if( fileName == STDOUT_FILENAME ){
#ifdef Q_OS_WIN32
        _setmode(_fileno(stdout), O_BINARY);
        // Windows: write the raw bytes with the new binaryWrite function.
        ((File *)system->_stdout())->binaryWrite(QString::fromAscii(ba.constData(), ba.size()));
#else
        // Other platforms: unchanged from the original block.
        ((File *)system->_stderr())->write(QString::fromAscii(name.constData(), name.size()));
#endif

#ifdef Q_OS_WIN32
        _setmode(_fileno(stdout), O_TEXT);
#endif
    }

This replacement calls our new binaryWrite function, guarded by a #ifdef Q_OS_WIN32 block. I did it this way to preserve the old behaviour on non-Windows systems, which don't seem to exhibit this problem (or do they?). Note that this fix only applies to writing to stdout; you could apply it to stderr as well, but it may not matter quite so much there.

In case you just want a pre-built binary (who wouldn't?), you can find phantomjs.exe with these fixes on my SkyDrive. My build is around 19MB, whereas the one I downloaded earlier was only about 6MB. However, I followed the instructions here, so it should be fine.

Execute a script in PhantomJS interactive (REPL) mode

Looks like REPL mode is borked, and an overhaul is underway:

https://github.com/ariya/phantomjs/issues/11180

Parse output of spawned node.js child process line by line

Try this:

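// cspr is the ChildProcess returned by spawn().
cspr.stdout.setEncoding('utf8');
cspr.stdout.on('data', function(data) {
    // Split on the line breaks themselves; a capturing group here would
    // put the "\n" separators into the array as extra elements.
    var lines = data.toString().split(/\r?\n/);
    for (var i = 0; i < lines.length; i++) {
        // Process the line, noting it might be incomplete.
    }
});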

Note that a "data" event won't necessarily end on a line boundary, so a single line might span multiple data events.
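A common way to handle this is to keep the trailing partial line in a buffer until the next chunk arrives. This is a minimal sketch: makeLineSplitter is a hypothetical helper, and the chunks in the example are made up to show one line split across two data events.

```javascript
// Returns a function to feed each chunk into; onLine is called once per
// complete line. Call the returned function's flush() when the stream ends.
function makeLineSplitter(onLine) {
  let pending = '';
  function feed(chunk) {
    pending += chunk;
    const parts = pending.split(/\r?\n/);
    // The last element is whatever follows the final newline: possibly
    // an incomplete line, so hold it back for the next chunk.
    pending = parts.pop();
    for (const line of parts) {
      onLine(line);
    }
  }
  feed.flush = function () {
    if (pending !== '') {
      onLine(pending);
      pending = '';
    }
  };
  return feed;
}

// Usage with a child process (cspr as in the snippet above):
//   const feed = makeLineSplitter(line => console.log('line:', line));
//   cspr.stdout.setEncoding('utf8');
//   cspr.stdout.on('data', feed);
//   cspr.stdout.on('end', feed.flush);

const lines = [];
const feed = makeLineSplitter(line => lines.push(line));
feed('first li');
feed('ne\r\nsecond line\nthi');
feed('rd');
feed.flush();
console.log(lines); // [ 'first line', 'second line', 'third' ]
```

Calling setEncoding('utf8') first matters here. It makes Node decode multi-byte characters that straddle chunk boundaries correctly before the text reaches the splitter.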


