Remote Software Engineer at Stripe and cellist based out of Ontario. Previously at GitLab. Fascinated with building usable, delightful software.
April 15, 2019 | 7 minutes to read
Generating PDF reports is one of those features that every enterprise developer will implement at some point in their career. I had my turn on a project with my previous employer. After exploring the available options, I settled on using Chrome’s headless mode to render HTML and save the result as a PDF.
This approach seems kind of weird and a bit overkill at first, but it has a number of pretty huge advantages:
It’s not all unicorns and rainbows, though. Below are a few of the gotchas I discovered while building a real PDF using headless Chrome.
This is the big one. If you try and place an <img> tag in your header or footer (a pretty common use case for a header or footer):
<img src="/assets/logo.jpg" />
…your image won’t show up. This is because Chrome won’t make any requests for external resources that appear in the header or footer templates.
One workaround is to encode the image into the template as a base64’d string:
<img src="data:image/png;base64, iVBORw0KGg..." />
I’ve found this site handy for converting an image into an <img>-compatible base64 string.
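If you would rather do the conversion in code, here is a minimal Node sketch of the same idea. The helper names (`toDataUri`, `inlineImageTag`) and the small MIME type table are made up for illustration and are not part of Puppeteer.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// MIME types this sketch knows about (illustrative, not exhaustive).
const MIME_TYPES = {
  '.png': 'image/png',
  '.jpg': 'image/jpeg',
  '.jpeg': 'image/jpeg',
  '.gif': 'image/gif',
  '.svg': 'image/svg+xml',
};

// Build a data URI from raw bytes and a MIME type.
function toDataUri(bytes, mimeType) {
  return `data:${mimeType};base64,${Buffer.from(bytes).toString('base64')}`;
}

// Read an image from disk and return an <img> tag with the image inlined,
// so the header/footer template never needs an external request.
function inlineImageTag(filePath) {
  const mimeType = MIME_TYPES[path.extname(filePath).toLowerCase()];
  if (!mimeType) {
    throw new Error(`Unsupported image type: ${filePath}`);
  }
  return `<img src="${toDataUri(fs.readFileSync(filePath), mimeType)}" />`;
}
```

Something like `inlineImageTag('./assets/logo.jpg')` (a hypothetical path) could then be concatenated into a header or footer template string.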
Headers and footers are specified at PDF render time by passing HTML strings to the page.pdf() method:
page.pdf({
  displayHeaderFooter: true, // in Puppeteer, the templates are only rendered when this is set
  headerTemplate: '<h1>This is the header!</h1>',
  footerTemplate: '<h1>This is the footer!</h1>',
});
These templates are rendered in a separate context from the content of the webpage. Because of this, the CSS styles that apply to the content won’t apply to the header and the footer. Any content styles that you would also like applied to your header and footer must be repeated in each of your header and footer templates. And unfortunately, you can’t just reference a common stylesheet using a <link> element - see point #1 above.
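One way to avoid typing the same rules twice is to keep them in a string and prepend an inline <style> block to each template. This is only a sketch: the `withInlineStyles` helper, the `.pdf-banner` class, and the CSS rules themselves are invented for the example.

```javascript
// Styles shared by the header and footer. The rules are only an example.
const SHARED_STYLES = `
  .pdf-banner {
    font-family: sans-serif;
    font-size: 10px;
    width: 100%;
    text-align: center;
  }
`;

// Wrap template markup with an inline <style> block, since a <link>
// to an external stylesheet would not be fetched.
function withInlineStyles(innerHtml, css = SHARED_STYLES) {
  return `<style>${css}</style><div class="pdf-banner">${innerHtml}</div>`;
}

const headerTemplate = withInlineStyles('This is the header!');
const footerTemplate = withInlineStyles('This is the footer!');
```

The two resulting strings can be passed to page.pdf() in place of the hard-coded templates.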
This one took me a while to figure out. Chrome won’t automatically resize your content to make space for the header and footer templates. You’ll need to make space for your header and footer by specifying a fixed margin at the top and bottom of your page:
page.pdf({
  displayHeaderFooter: true,
  headerTemplate: '<h1>This is the header!</h1>',
  footerTemplate: '<h1>This is the footer!</h1>',
  margin: {
    top: '100px',
    bottom: '50px',
  },
});
Without these margins, the content will be rendered on top of your header and footer, leaving you wondering why your header and footer templates aren’t showing up.
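Since a forgotten margin fails silently, it can help to build the options in one place and refuse to continue without margins. The sketch below assumes Puppeteer's `displayHeaderFooter`, `headerTemplate`, `footerTemplate`, and `margin` options. The `buildPdfOptions` helper and its parameter names are hypothetical.

```javascript
// Build the options object for page.pdf(), failing loudly when the
// margins that make room for the header and footer are missing.
function buildPdfOptions({ headerTemplate, footerTemplate, marginTop, marginBottom }) {
  if (!marginTop || !marginBottom) {
    throw new Error(
      'Set both marginTop and marginBottom so the content leaves room for the header and footer.'
    );
  }
  return {
    displayHeaderFooter: true,
    headerTemplate,
    footerTemplate,
    margin: { top: marginTop, bottom: marginBottom },
  };
}
```

A call such as `page.pdf(buildPdfOptions({ headerTemplate, footerTemplate, marginTop: '100px', marginBottom: '50px' }))` would then behave like the snippet above. The right margin sizes still depend on how tall your templates render, so expect some trial and error.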
CSS provides some rules that determine where a page break should be placed when printing, for example:
@media print {
  .page {
    page-break-after: always;
  }
}
These rules work - but they can be finicky. You may run into problems when trying to page break inside of[1]:
I also had issues using page-break-after inside of a flexbox layout.
There are a few edge cases - mostly dealing with headers/footers and page wrapping - that you simply can’t control. For example, want to place a special footer only on pages 2, 4, and 7? Not possible. (If it is, let me know how!)
If the page being screenshotted requires time to load (for example, if the page has JavaScript that makes an AJAX request for some data), you’ll need to wait for this initialization to complete before triggering the screenshot. If you simply screenshot the page right after the initial load, your PDF will be filled with loading bars and missing data.
I worked around this by setting a global flag in the webpage once all initialization work is finished:
// in the web page
async init() {
  const data = await this.dataService.getData();
  const user = await this.userService.getUserProfile();
  // ...etc...
  window.isReadyForPDF = true;
}
Then, using Puppeteer’s page.waitForFunction() method, we can wait for this global variable to be set:
// on the server
await page.waitForFunction('window.isReadyForPDF');
// now we know the page is ready for a screenshot
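If the flag is never set (say, one of the requests fails), the wait will eventually time out. A small wrapper can make that failure easier to diagnose. This is a sketch: `pdfWhenReady` and its default timeout are made up, and it assumes `page` is a Puppeteer page whose `waitForFunction()` accepts a `timeout` option in milliseconds.

```javascript
// Wait for the page's ready flag, then render the PDF.
// `page` is expected to be a Puppeteer Page (or anything with the same methods).
async function pdfWhenReady(page, pdfOptions = {}, timeoutMs = 30000) {
  try {
    await page.waitForFunction('window.isReadyForPDF === true', { timeout: timeoutMs });
  } catch (err) {
    // Re-throw with a message that points at the ready flag specifically.
    throw new Error(
      `Page did not set window.isReadyForPDF within ${timeoutMs} ms: ${err.message}`
    );
  }
  return page.pdf(pdfOptions);
}
```

On the server this would be called as `await pdfWhenReady(page, { format: 'A4' })`, or with whatever options you already pass to page.pdf().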
If the page you’re screenshotting is part of a web application, it’s likely there’s an authentication step that’s required to view the page. This can be a bit of a pain to work around, but fortunately, Puppeteer provides enough control to programmatically log in to the application:
await page.waitForSelector('#username');
await page.waitForSelector('#password');
await page.evaluate(() => {
  document.querySelector('#username').value = 'my-username';
  document.querySelector('#password').value = 'my-password';
  document.querySelector('#log-in-button').click();
});
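A variation on the same login, sketched with Puppeteer's `page.type()`, `page.click()`, and `page.waitForNavigation()` so that the script types like a user and waits for the post-login page. The `logIn` helper is hypothetical, the selectors are the ones from the snippet above, and it assumes that logging in triggers a full page navigation, which may not hold for a single-page app.

```javascript
// Fill in the login form and wait for the navigation that follows.
// `page` is expected to be a Puppeteer Page; credentials are passed in
// rather than hard-coded.
async function logIn(page, { username, password }) {
  await page.waitForSelector('#username');
  await page.waitForSelector('#password');
  await page.type('#username', username);
  await page.type('#password', password);
  // Start waiting for the navigation before clicking, so the event is not missed.
  await Promise.all([
    page.waitForNavigation(),
    page.click('#log-in-button'),
  ]);
}
```

The credentials could come from environment variables, for example `logIn(page, { username: process.env.PDF_USERNAME, password: process.env.PDF_PASSWORD })`, where both variable names are placeholders.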
There are some downsides to this approach, though:
Disclaimer: my PDF generator was written in .NET Core, so I actually used a library called Puppeteer Sharp which aims to replicate the API of the official Puppeteer library (which runs on Node). Some of the code examples above might be slightly off since I translated them from C♯ into JavaScript.
References/Attributions
[1]: https://stackoverflow.com/a/26265549/1063392
Minifigure/Chrome image from https://hackernoon.com/so-many-testing-frameworks-so-little-time-b03c707b8f90