How can I reduce the size of my PDF from the shell?

Question

After using the pdf2htmlEX command to convert my PDF to HTML, I translated the HTML and then used the wkhtmltopdf command to convert it back to PDF. The resulting PDF is correct, but it is extremely large: the original PDF was 116.4K, while the new, translated PDF is 3.4MB.

Here are my HTML and PDF files on GitHub.

Here's the command I used to convert PDF to HTML with pdf2htmlEX:

pdf2htmlEX --fit-width 1024 --space-as-offset 1 fss4.pdf fss4.html

Here's the command I used to convert HTML to PDF with wkhtmltopdf:

xvfb-run -a wkhtmltopdf --no-images --quiet --dpi 150 --disable-smart-shrinking fss4.html fss4-fr.pdf
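To put a number on the blow-up at each step of this pipeline, a small portable helper can compare byte counts. This is only a sketch: `compare_pdf_sizes` is a hypothetical name, and it uses `wc -c` rather than `stat` because `stat` flags differ between GNU and BSD systems.

```shell
# Hypothetical helper: print the byte sizes of two files and how many
# times larger the second one is than the first.
compare_pdf_sizes() {
    old=$1
    new=$2
    # Refuse to run on missing files rather than printing bogus numbers.
    [ -f "$old" ] && [ -f "$new" ] || {
        echo "usage: compare_pdf_sizes OLD NEW" >&2
        return 1
    }
    # "wc -c < file" prints only the byte count; tr strips BSD's padding.
    old_bytes=$(wc -c < "$old" | tr -d ' ')
    new_bytes=$(wc -c < "$new" | tr -d ' ')
    # awk does the floating-point division that sh arithmetic cannot.
    awk -v o="$old_bytes" -v n="$new_bytes" \
        'BEGIN { printf "%d -> %d bytes (x%.1f)\n", o, n, (o > 0 ? n / o : 0) }'
}
```

For example, `compare_pdf_sizes fss4.pdf fss4-fr.pdf` run after the wkhtmltopdf step prints both sizes and the growth factor.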

What can I do to reduce the size of this PDF file? I'm confused and don't know where to start.

Any help would be much appreciated. Does anyone have an idea how I should solve this problem?

K J · Accepted Answer · 2024-04-23 18:00:05Z

The file is a highly specialized Adobe (XFA) format with scripting, and other readers support it only in a restricted way.

Acrobat shows a warning, and third-party readers recommend opening it in Adobe Reader, because it only works there. It contains a lot of proprietary forms data and is therefore not suitable for use in any other application. It is designed purely for use with Adobe's licensed server applications. (Really these files should not carry the .PDF extension but something like .XFA; however, that is Adobe's prerogative, and they universally use .PDF for reader-based files.)

You should convert it with an XFA-to-PDF application, rather than trying to bypass the format by messy, inferior conversion means.
You should have no need to convert such a form, as it will not work in any other application anyway.

Even if you neuter the scripting and the Adobe enhancements, Acrobat will still say that you cannot use this file as a simple e-form and that it must simply be printed out, as if it were a paper record!

The only suitable way to convert such a file is to PRINT to PDF.

So the best option for filling in on paper is a printout, i.e. a flattened paper image, at the cost of a gigantic increase in size and less acceptability as an online resource.

If you want a smaller electronic file (about 33 KB), use Ghostscript to remove all the baggage and attempt to "FIX" the file into a conventional PDF. You will then need to add conventional PDF fields to the result.

NOTE Ghostscript's comment that the Adobe file does not conform to Adobe's published PDF specification. (Perfectly correct, as it is an Acrobat Designer-specific format!)

gs -sDEVICE=pdfwrite -oform.pdf fss4.pdf
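The same Ghostscript `pdfwrite` rewrite can also be pointed at the wkhtmltopdf output, optionally with one of Ghostscript's `-dPDFSETTINGS` presets (`/screen`, `/ebook`, `/printer`, `/prepress`, `/default`). A minimal sketch follows; `shrink_pdf` is a hypothetical wrapper name, and `/ebook` as the default preset is an assumption. The presets mostly change image downsampling, so whether this helps much with fss4-fr.pdf, where most of the bytes sit in content streams, is uncertain; compare the sizes before and after.

```shell
# Hypothetical wrapper around Ghostscript's pdfwrite device.
# Usage: shrink_pdf INPUT.pdf OUTPUT.pdf [PRESET]
# PRESET is a -dPDFSETTINGS value; /ebook is an assumed default.
shrink_pdf() {
    in=$1
    out=$2
    preset=${3:-/ebook}
    [ -f "$in" ] && [ -n "$out" ] || {
        echo "usage: shrink_pdf IN OUT [PRESET]" >&2
        return 1
    }
    # Never write over the input while Ghostscript is still reading it.
    [ "$in" != "$out" ] || {
        echo "shrink_pdf: OUT must differ from IN" >&2
        return 1
    }
    command -v gs >/dev/null 2>&1 || {
        echo "shrink_pdf: gs not found" >&2
        return 1
    }
    # -o sets the output file and implies -dBATCH -dNOPAUSE; -q silences banners.
    gs -q -sDEVICE=pdfwrite -dPDFSETTINGS="$preset" -o "$out" "$in"
}
```

For example, `shrink_pdf fss4-fr.pdf fss4-fr-small.pdf /screen` tries the most aggressive preset without touching the original file.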

Finally

Now you have a new file to which you can transfer the old fields. You can use any suitable PDF SDK to copy the fields across, and the final file will be much, much smaller without all that XFA nonsense.

This is what an XFA-to-PDF converter such as Apryse, Aspose, or other powerful PDF products will do, faster and better than my manual approach.

Copying the fields over produces a TrueForm.pdf of 82.47 KB (84,451 bytes).

Collateral damage: you should always test radio button features, since they have enhanced group logic, so a manual copy may not work correctly without manual grouping. In the OP's example, the copy (without additional editing) does not enforce YES or NO; it allows both to be selected!

johnwhitington · Accepted Answer · 2024-04-24 00:24:10Z

We might expect something to do with images, but no, it is the actual PDF content streams:

$ cpdf -composition fss4.pdf
Images: 0 bytes (0.00%)
Fonts: 16988 bytes (14.06%)
Content streams: 38147 bytes (31.57%)
Structure Info: 9452 bytes (7.82%)
Attached Files: 0 bytes (0.00%)
XRef Table: 12658 bytes (10.48%)
Unclassified: 43593 bytes (36.08%)

$ cpdf -composition fss4-fr.pdf
Images: 226713 bytes (7.19%)
Fonts: 12694 bytes (0.40%)
Content streams: 2908943 bytes (92.31%)
Structure Info: 0 bytes (0.00%)
Attached Files: 0 bytes (0.00%)
XRef Table: 760 bytes (0.02%)
Unclassified: 2017 bytes (0.06%)

Upon closer inspection, it's not even inline images in the content streams. Something in your process has converted most of the text in the file to shapes, and in an inefficient way, so each letter is stored separately. So you have a content stream of about 15 MB uncompressed, or about 3 MB compressed. Why, I can't tell you; that's a wkhtmltopdf problem.
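One way to check this diagnosis yourself is to compare a file's compressed and decompressed sizes and see how much real text can still be extracted from it. A sketch, assuming cpdf (for `-decompress`) and poppler's `pdftotext` are installed; `inspect_pdf` is a hypothetical name.

```shell
# Hypothetical diagnostic: how big is the PDF once decompressed, and how
# much real text (as opposed to drawn outlines) can still be extracted?
inspect_pdf() {
    pdf=$1
    [ -f "$pdf" ] || { echo "usage: inspect_pdf FILE.pdf" >&2; return 1; }
    for tool in cpdf pdftotext; do
        command -v "$tool" >/dev/null 2>&1 || {
            echo "inspect_pdf: $tool not found" >&2
            return 1
        }
    done
    tmp=$(mktemp) || return 1
    # Write an uncompressed copy so the raw stream sizes become visible.
    cpdf -decompress "$pdf" -o "$tmp" || { rm -f "$tmp"; return 1; }
    echo "compressed:   $(wc -c < "$pdf" | tr -d ' ') bytes"
    echo "decompressed: $(wc -c < "$tmp" | tr -d ' ') bytes"
    # Text drawn as shapes cannot be extracted, so this count stays low.
    echo "extractable text: $(pdftotext "$pdf" - | wc -c | tr -d ' ') bytes"
    rm -f "$tmp"
}
```

Running `inspect_pdf fss4.pdf` and `inspect_pdf fss4-fr.pdf` side by side should make the difference visible: a large decompressed size combined with little extractable text is consistent with text having been turned into outlines.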
