Generating PDF Files with JavaScript

I've just written a library called jsPDF which generates PDFs using JavaScript alone. It's still very young, and I'll be adding features and bug fixes soon. I also have a few ideas for workarounds in browsers that do not support data URIs. It's released under the liberal MIT license.

I came across this question before I started writing it and thought I'd come back and let you know :)

Generate PDFs in Javascript

Example: create a "Hello World" PDF file.

// Default export is a4 paper, portrait, using millimeters for units

var doc = new jsPDF()

doc.text('Hello world!', 10, 10)

doc.save('a4.pdf')

<script src="https://cdnjs.cloudflare.com/ajax/libs/jspdf/1.3.5/jspdf.debug.js"></script>

Generate pdf from HTML in div using Javascript

jsPDF supports plugins. To enable it to print HTML, you have to include certain plugins, so you have to do the following:

Go to https://github.com/MrRio/jsPDF and download the latest version.
Include the following scripts in your project:
- jspdf.js
- jspdf.plugin.from_html.js
- jspdf.plugin.split_text_to_size.js
- jspdf.plugin.standard_fonts_metrics.js

If you want to ignore certain elements, you have to mark them with an ID, which you can then ignore in a special element handler of jsPDF. Therefore your HTML should look like this:

<!DOCTYPE html>
<html>
  <body>
    <p id="ignorePDF">don't print this to pdf</p>
    <div>
      <p><font size="3" color="red">print this to pdf</font></p>
    </div>
  </body>
</html>

Then you use the following JavaScript code to open the created PDF in a popup:

var doc = new jsPDF();          
var elementHandler = {
  '#ignorePDF': function (element, renderer) {
    return true;
  }
};
var source = window.document.getElementsByTagName("body")[0];
doc.fromHTML(
    source,
    15,
    15,
    {
      'width': 180,'elementHandlers': elementHandler
    });

doc.output("dataurlnewwindow");

For me this created a nice and tidy PDF that only included the line 'print this to pdf'.

Please note that the special element handlers only deal with IDs in the current version, which is also stated in a GitHub Issue. It states:

Because the matching is done against every element in the node tree, my desire was to make it as fast as possible. In that case, it meant "Only element IDs are matched" The element IDs are still done in jQuery style "#id", but it does not mean that all jQuery selectors are supported.

Therefore, replacing '#ignorePDF' with a class selector like '.ignorePDF' did not work for me. Instead, you have to add the same handler for every element you want to ignore, like this:

var elementHandler = {
  '#ignoreElement': function (element, renderer) {
    return true;
  },
  '#anotherIdToBeIgnored': function (element, renderer) {
    return true;
  }
};
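
Since every ignored element needs its own entry, the handler object can also be built from a list of IDs. This is a minimal sketch; buildIgnoreHandlers is a hypothetical helper name and not part of jsPDF:

```javascript
// Hypothetical helper (not part of jsPDF): builds an elementHandlers object
// with one '#id' entry per element that should be left out of the PDF.
function buildIgnoreHandlers(ids) {
  const handlers = {};
  for (const id of ids) {
    // Same handler as in the examples above: returning true skips the element.
    handlers['#' + id] = function (element, renderer) {
      return true;
    };
  }
  return handlers;
}

const elementHandler = buildIgnoreHandlers(['ignoreElement', 'anotherIdToBeIgnored']);
console.log(Object.keys(elementHandler));
```

The resulting object can be passed as 'elementHandlers' to doc.fromHTML in the same way as the hand-written one.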

According to the examples, it is also possible to select tags like 'a' or 'li'. That might be a little too unrestrictive for most use cases, though:

We support special element handlers. Register them with jQuery-style
ID selector for either ID or node name. ("#iAmID", "div", "span" etc.)
There is no support for any other type of selectors (class, of
compound) at this time.

One very important thing to add is that you lose all your style information (CSS). Luckily jsPDF is able to nicely format h1, h2, h3 etc., which was enough for my purposes. Additionally it will only print text within text nodes, which means that it will not print the values of textareas and the like. Example:

<body>
  <ul>
    <!-- This is printed as the element contains a textnode -->        
    <li>Print me!</li>
  </ul>
  <div>
    <!-- This is not printed because jsPDF doesn't deal with the value attribute -->
    <input type="text" value="Please print me, too!">
  </div>
</body>

How to generate PDF from HTML with JavaScript?

I think you could use Puppeteer to generate a pdf. I've not tried it but here are some links that demonstrate this approach:

advanced-pdf-generation-for-node-js-using-puppeteer
pdf-from-html-node-js-puppeteer
high-quality-pdf-generation-using-puppeteer
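
A minimal sketch of what that could look like, assuming Puppeteer's launch, newPage, setContent, and pdf calls. The function name htmlToPdf and the injected puppeteer parameter are illustrative choices, not part of Puppeteer:

```javascript
// Sketch: render an HTML string to a PDF file with Puppeteer.
// The puppeteer module is passed in as an argument (an assumption made here
// so the function has no hard dependency and is easy to test).
async function htmlToPdf(puppeteer, html, path) {
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    // Wait until the network is idle so referenced images and fonts are loaded.
    await page.setContent(html, { waitUntil: 'networkidle0' });
    await page.pdf({ path: path, format: 'A4', printBackground: true });
  } finally {
    // Always close the browser, even if rendering fails.
    await browser.close();
  }
  return path;
}

// Usage, assuming the puppeteer package is installed:
// htmlToPdf(require('puppeteer'), '<h1>Hello world!</h1>', 'hello.pdf')
//   .then(file => console.log('written', file));
```

Note that this runs in Node.js, not in the browser, so it only fits if the PDF can be generated on the server.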

How to wait until a pdf is generated Javascript

Assuming you can modify your getScanReport function, use a Promise. Then in your email function, use async / await to wait for that promise.

(You can click the links to read more about how promises and async / await work if you don't use them very often.)

For example (edited since post now includes getScanReport inner workings):

report.getScanReport = () => new Promise( 

    //announcePDFReady() is a function we call to resolve our promise

    announcePDFReady => {

        Promise.
            all( [ /* ... */ ] ).
            then( function( [ results1, results2 ] ) {

                /* ... */

                renderFile( /* ... */ ( err, data ) => {

                    /* ... */

                    pdf.create( data ).toFile( filename, ( err, data ) => {
                        
                        let fileloc = 'C:/Users/' + filename;
                        if (err) {
                            console.log(err);
                            //resolve with false so the awaiting code can detect the failure
                            announcePDFReady( false );
                        } else {
                            console.log("Successfully created pdf");
                            //announce ready right here!
                            announcePDFReady( data );
                        }

                    } )

                } )

            } )
            
    }

);

//...
//we write 'async' before the cron job function definition
async () => {
    //this is inside the cron job function

    const pdfIsReady = await report.getScanReport();
    //since report.getScanReport() returns a promise,
    //'await' here means our code will stop until the promise resolves.

    if( pdfIsReady === false ) {
        //it failed. Do something? Ignore it?
    } else {

        //pdf is ready. We can send the email
    
        //etc.

    }
}
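
A common variant of this pattern rejects the promise on error, so the caller uses try/catch instead of checking the resolved value. The sketch below is self-contained; fakeToFile, createPdf, and sendReport are hypothetical names, and fakeToFile only stands in for a callback-style PDF writer:

```javascript
// Hypothetical stand-in for a callback-style PDF writer.
// It calls back with an error when no filename is given.
function fakeToFile(filename, callback) {
  setTimeout(() => {
    if (!filename) {
      callback(new Error('no filename given'));
    } else {
      callback(null, { filename: filename });
    }
  }, 10);
}

// Wrap the callback API in a promise that settles in both cases:
// resolve on success, reject on failure.
function createPdf(filename) {
  return new Promise((resolve, reject) => {
    fakeToFile(filename, (err, data) => {
      if (err) {
        reject(err);
      } else {
        resolve(data);
      }
    });
  });
}

// The caller waits for the file and handles a failure with try/catch.
async function sendReport(filename) {
  try {
    const result = await createPdf(filename);
    return 'email sent with ' + result.filename;
  } catch (err) {
    return 'email skipped: ' + err.message;
  }
}

sendReport('report.pdf').then(console.log);
sendReport('').then(console.log);
```

Either way, the important part is that the promise always settles. Otherwise the awaiting code never continues.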

Puppeteer create PDF files from HTML data hangs Windows 10 system

Example solution (limiting parallel browsers)

I created a PdfPrinter class for you which you can integrate into your setup. It limits the number of PDF generation jobs that run in parallel and manages opening and closing the browsers for you. The class is tightly coupled to PDF printing, but the same logic could be adapted into a general-purpose queue.

You can try to integrate it into your code. This is a fully working test example with simplified PDFs (without the part that gets the actual data from the Excel file).

As far as I understand your code, you do not need to pass the page around to all your functions. First create your HTML and CSS, then hand them to the PdfPrinter and let it handle page creation and browser launching.

(I like to code stuff like this so I went straight ahead..)


var puppeteer = require('puppeteer')

const defaultPrinterOptions = {
    format: 'A4',
    printBackground: true,
    margin: {
        left: '0px',
        top: '0px',
        right: '0px',
        bottom: '0px'
    }
}

class PdfPrinter {

    maxBrowsers = 2
    enqueuedPrintJobs = []
    failedJobs = []
    browserInstances = 0

    // max browser instances in parallel 
    constructor(maxBrowsers) {
        this.maxBrowsers = maxBrowsers
    }

    /**
     * Enqueues a print job. The exact moment it ends is only known through `done`.
     * @param {string} html the html content to print
     * @param {string} css css to apply to the page
     * @param {string} path file path the pdf is written to
     * @param {Function} done callback invoked when this print job has ended
     */
    enqueuePrint = (html, css, path, done) => {
        // merge custom options with defaultOptions..
        const printOptions = {
            ...defaultPrinterOptions,

            // add the path to the options.
            path: path
        }

        // create a function which can be stored in an array
        // it will later be grabbed by tryStartPrinter() OR whenever a
        // browser frees up..
        // the function needs to be passed the browser instance it should use!
        this.enqueuedPrintJobs.push(async(browser) => {

            // catch the error which may be produced when printing something..
            try {
                // print the document
                await this.print(browser, html, css, printOptions)
            } catch (err) {
                console.error('error when printing document.. Closing browser and starting a new one!!', printOptions.path)
                console.error(err)

                // store something so you know what failed and could be retried..
                this.failedJobs.push({ html, css, path: printOptions.path })

                // puppeteer can run into errors too!!
                // so close the browser and launch a new one!
                await this.closeBrowser(browser)
                browser = await this.launchBrowser()
            }

            // after the print, call done() so the promise is resolved at the moment
            // this particular print has ended!
            done()

            // start the next job right now  if there are any left.
            const job = this.enqueuedPrintJobs.shift()

            if (!job) {
                console.log('No print jobs available anymore. Closing this browser instance.. Free browser slots now:', this.maxBrowsers - this.browserInstances + 1)
                await this.closeBrowser(browser)
                return
            }

            // job is actually this function itself! It will be executed
            // and automatically grab a new job after completion :)
            // we pass the same browser instance to the next job!.
            await job(browser)
        })

        // whenever a print job is added, make sure to start the printer
        // this starts a new browser instance if the limit is not reached yet,
        // and does nothing if the maximum browser count is reached..
        this.tryStartPrinter()
    }

    // same as enqueuePrint except it wraps it in a promise so we can know the
    // exact end moment and await it..
    enqueuePrintPromise(html, css, path) {
        return new Promise((resolve, reject) => {
            try {
                this.enqueuePrint(html, css, path, resolve)
            } catch (err) {
                console.error('unexpected error when setting up print job..', err)
                reject(err)
            }
        })

    }

    // If the browser instance limit is not reached, instantiates a new one and runs a print job with it.
    // a print job will automatically grab the next job with the created browser if there are any left.
    tryStartPrinter = async() => {

        // Max browser count in use OR no jobs left.
        if (this.browserInstances >= this.maxBrowsers || this.enqueuedPrintJobs.length === 0) {
            return
        }
        // browser instances available! 
        // create a new one 

        console.log('launching new browser. Available after launch:', this.maxBrowsers - this.browserInstances - 1)
        const browser = await this.launchBrowser()
        
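        // run job (another browser may have emptied the queue while this one was launching)
        const job = this.enqueuedPrintJobs.shift()
        if (!job) {
            await this.closeBrowser(browser)
            return
        }
        await job(browser)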

    }

    closeBrowser = async(browser) => {

        // decrement browsers in use!
        // important to call before closing browser!!
        this.browserInstances--
        await browser.close()

    }

    launchBrowser = async() => {
        // increment browsers in use!
        // important to increase before actually launching (async stuff..)
        this.browserInstances++

        // you have to adjust this code according to your environment..
        const browser = await puppeteer.launch({ headless: true })

        return browser
    }

    // The actual print function which creates a pdf.
    print = async(browser, html, css, printOptions) => {

        console.log('Converting page to pdf. path:', printOptions.path)
        // Run pdf creation in a separate page.
        const page = await browser.newPage()

        await page.setContent(html, { waitUntil: 'networkidle0' });
        await page.addStyleTag({ content: css });
        await page.pdf(printOptions);
        await page.close();

    }

}

// testing the PDFPrinter with some jobs.
// make sure to run the printer in an `async` function so you can
// use await...
const testPrinterQueue = async() => {

    // config
    const maxOpenedBrowsers = 5 // amount of browser instances which are allowed to be opened in parallel
    const testJobCount = 100 // amount of test pdf jobs to be created
    const destDir = 'C:\\somepath' // the directory to store the pdfs in..

    // create sample jobs for testing...
    const jobs = []
    for (let i = 0; i < testJobCount; i++) {
        jobs.push({
            html: `<h1>job number [${i}]</h1>`,
            css: 'h1 { background-color: red; }',
            path: require('path').join(destDir, `pdf_${i}.pdf`)
        })
    }

    // track time
    const label = 'printed a total of ' + testJobCount + ' pdfs!'
    console.time(label)

    // run the actual pdf generation..
    const printer = new PdfPrinter(maxOpenedBrowsers)

    const jobProms = []
    for (let job of jobs) {

        // run jobs in parallel. Each job runs asynchronously and therefore returns a Promise
        jobProms.push(
            printer.enqueuePrintPromise(job.html, job.css, job.path)
        )
    }

    console.log('All jobs enqueued!! Waiting for finish now.')

    // helper function which awaits all the print jobs, resp. an array of promises.
    await Promise.all(jobProms)
    console.timeEnd(label)

    // failed jobs::
    console.log('jobs failed:', printer.failedJobs)

    // as file:
    await require('fs').promises.writeFile('failed-jobs.json', JSON.stringify(printer.failedJobs))
}

testPrinterQueue().then(() => {
    console.log('done with everything..')
}).catch(err => {
    console.error('unexpected error occurred while printing all pages...', err)
})

You only need to adjust the destDir, maxOpenedBrowsers, and testJobCount variables at the beginning of testPrinterQueue() to get this to work.
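
The same "limited number of workers pulling from a shared queue" idea can be sketched without Puppeteer. The following is a simplified, general-purpose version under my own assumptions (runWithLimit is a hypothetical name, and tasks are plain functions returning promises), not a drop-in replacement for the class above:

```javascript
// Run async task functions with at most `limit` of them active at once.
// Resolves with one { ok, value } or { ok, error } entry per task, in task order.
async function runWithLimit(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;

  // Each worker keeps claiming the next unclaimed task until none are left,
  // much like each browser in PdfPrinter grabs the next print job.
  async function worker() {
    while (next < tasks.length) {
      const index = next++;
      try {
        results[index] = { ok: true, value: await tasks[index]() };
      } catch (error) {
        // A failed task is recorded and does not stop the other tasks.
        results[index] = { ok: false, error: error };
      }
    }
  }

  const workerCount = Math.min(limit, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Demo: five short tasks, at most two running at the same time.
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
let running = 0;
let peak = 0;
const demoTasks = [1, 2, 3, 4, 5].map((n) => async () => {
  running++;
  peak = Math.max(peak, running);
  await wait(10);
  running--;
  return n * 2;
});

runWithLimit(demoTasks, 2).then((results) => {
  console.log(results.map((r) => r.value), 'peak concurrency:', peak);
});
```

For PDF printing, each task would wrap one print call, and the limit plays the role of maxBrowsers.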

What caused the problem in your code

Let's have a look at this piece

(async () => {
        browser = await Puppeteer.launch({
            headless: true,
            handleSIGINT: false,
            args: args,
        });

        const page = await browser.newPage();
    
        await page.setViewport({
            width: resolution.x,
            height: resolution.y,
        })

        await computeFirstTerm(page);
        await computeSecondTerm(page);
        await computeThirdTerm(page);
        browser.close()
    })()

You created an anonymous function which is executed immediately. Within the function, all the statements are correctly awaited using await. But if you run this whole piece within a synchronous part of your application, the function starts immediately but is NOT awaited before the next code runs.

Check out this example:

//utility
function wait(ms) {
    return new Promise(resolve => {
        setTimeout(resolve, ms)
    })
}

const AsyncFunction = async() => {
    console.log('Async named function started')
    // simulate execution time of 2 seconds
    await wait(2000)

    console.log('Async named function ended')
};

function SyncFunction() {
    console.log('sync function started')

    // example of async function execution within a sync function..
    AsyncFunction();

    // what you have done in your code:
    (async() => {
        console.log('Async anonymus function started')
        await wait(3000)
        console.log('Async anonymus function ended')

    })()

    // what
    console.log('sync function ended.')
}

SyncFunction()
console.log('done')

Note the output:

sync function started
Async named function started
Async anonymus function started
sync function ended. // => sync function already ended 
done   // sync function ended and code continues execution.
Async named function ended
Async anonymus function ended

To correctly await your async stuff you need to put your whole application in async scope:

//utility
function wait(ms) {
    return new Promise(resolve => {
        setTimeout(resolve, ms)
    })
}

const AsyncFunction = async() => {
    console.log('Async named function started')
    // simulate execution time of 2 seconds
    await wait(2000)

    console.log('Async named function ended')
};

// this is now async!!
async function SyncFunction() {
    console.log('sync function started')

    // the async function is now awaited before the code continues..
    await AsyncFunction();

    // what you have done in your code:
    await (async() => {
        console.log('Async anonymus function started')
        await wait(3000)
        console.log('Async anonymus function ended')

    })()

    // what
    console.log('sync function ended.')
}

SyncFunction().then(() => {
    console.log('done')
}).catch(err => {
    console.error('unexpected error occurred..', err)
})

This output is what we want:

sync function started
Async named function started
Async named function ended
Async anonymus function started
Async anonymus function ended
sync function ended.
done
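
If a function has to stay synchronous, it can still hand the promise of the async work back to its caller, who then decides when to continue. A small sketch of that variant; startWork and the events array are names made up for this example:

```javascript
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// A synchronous function cannot use await, but it can return the promise
// created by the immediately invoked async function.
function startWork(events) {
  events.push('work started');
  return (async () => {
    await wait(20);
    events.push('work ended');
  })();
}

const events = [];
startWork(events)
  .then(() => {
    events.push('done');
    // prints "work started -> work ended -> done"
    console.log(events.join(' -> '));
  })
  .catch((err) => {
    console.error('unexpected error occurred..', err);
  });
```

The key point is the same as above: somebody has to wait for the returned promise, either with await or with then().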

Hope this helps you understand.

Feel free to leave a comment.
