tl;dr: If you’re on Arch, not all hope is lost when trying to deal with PDFs. pdfunite is out there for combining PDFs, Firefox is surprisingly helpful since it uses pdf.js, pdftk is there if you’re down with downloading the dependencies, convert is available for paring down scanned images, and ultimately, any software you can run on Ubuntu can run on Arch with a little Docker.
In my case I had to sign some pages of a PDF, then return the whole thing with the signed pages included. After a few seconds of thinking, the obvious answer was to split apart the original PDF, sign the pages I needed to sign, then combine the old PDF (minus the pages that needed signing) with the newly signed pages. Hacky, yes, and if I were on a different platform, I might have been able to sign the PDF with my mouse very easily, but let’s not think about that too much.
Firefox is great for the splitting step: you can use it to open and view the PDF, and then save only certain pages to a new PDF by using the print-to-PDF option.
I found pdfunite in a relevant Stack Overflow post, and it turns out it’s a super easy command-line tool for putting two PDFs together.
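As a rough sketch of the combining step: pdfunite takes the input files in order and the output file last. The wrapper name combine_pdfs and the file names below are made up for illustration; they are not from the Stack Overflow post.

```shell
# Hypothetical helper: combine_pdfs OUTPUT INPUT1 INPUT2 [INPUT...]
# pdfunite itself expects the inputs first and the output file last.
combine_pdfs() {
  if [ "$#" -lt 3 ]; then
    echo "usage: combine_pdfs OUTPUT INPUT1 INPUT2 [INPUT...]" >&2
    return 2
  fi
  out="$1"
  shift
  pdfunite "$@" "$out"
}

# Example (file names are placeholders):
#   combine_pdfs final.pdf unsigned-pages.pdf signed-pages.pdf
```

The inputs are appended in the order given, so the signed pages land wherever you list them.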
Rotation was necessary because, of course, when scanning the signed pages, I scanned them in the wrong orientation. Surprisingly, it’s not all that easy to rotate pages (or a single page) of a PDF… Going down this rabbit hole ultimately led to a program called pdftk that seemed to be especially good. Unfortunately, I wasn’t interested in downloading the large list of dependencies that pdftk would bring into my system.
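For reference, here is roughly what rotation looks like once pdftk is available somewhere. This is a sketch, assuming a pdftk version that accepts the relative rotation keywords (left, right, down) in page ranges; the function and file names are placeholders.

```shell
# Rotate every page 90 degrees clockwise.
rotate_all_right() {
  pdftk "$1" cat 1-endright output "$2"
}

# Rotate only page 2 (assumes the document has at least three pages);
# pages 1 and 3-end are copied through unchanged.
rotate_page_two_right() {
  pdftk "$1" cat 1 2right 3-end output "$2"
}

# Example (file names are placeholders):
#   rotate_all_right scanned.pdf rotated.pdf
```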
If only there was a way to isolate pdftk when running it… Maybe I could even use a distribution that’s better suited/mentioned in all the guides I see…
A great solution to using pdftk without dirtying my own system too much is to run pdftk in a Docker container!
The first container I found, aultman-pdftk, seemed super usable, but I couldn’t get it to work properly with the entrypoint. This is probably just me not being quite used to using Docker containers for specific commands, but I’m sure a smart reader can figure it out.
Since I was in a do-anything-to-make-it-work kind of mood, what I ended up actually doing was:
1. Start an ubuntu:latest container running /bin/bash (i.e. docker run -it ubuntu /bin/bash)
2. Install pdftk inside it (apt-get update && apt-get install pdftk)
3. Use docker cp to move the files that I actually wanted to work on into the container and back

NOTE: Again, you should probably just figure out how to use some pdftk container as a command and pipe the input to it or whatever… I just didn’t feel like reading through Docker docs to figure out what I was doing wrong with the command line syntax.
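The steps above can be sketched as a non-interactive script. This is not exactly what I ran (I typed things by hand in an interactive shell); the function name, the /tmp paths inside the container, and the cleanup step are assumptions added for illustration.

```shell
# Hypothetical helper: pdftk_in_container INPUT OUTPUT PAGE_SPEC...
# Runs pdftk inside a throwaway ubuntu:latest container, copying files
# in and out with docker cp.
pdftk_in_container() {
  input="$1"
  output="$2"
  shift 2

  # Start a detached container; -it keeps /bin/bash (and the container) alive.
  cid=$(docker run -dit ubuntu:latest /bin/bash) || return 1

  # Install pdftk, copy the input in, run pdftk, copy the result back out.
  docker exec "$cid" sh -c 'apt-get update && apt-get install -y pdftk' &&
  docker cp "$input" "$cid":/tmp/in.pdf &&
  docker exec "$cid" pdftk /tmp/in.pdf cat "$@" output /tmp/out.pdf &&
  docker cp "$cid":/tmp/out.pdf "$output"
  status=$?

  # Remove the container whether or not the steps above succeeded.
  docker rm -f "$cid" >/dev/null
  return "$status"
}

# Example (file names are placeholders): rotate every page clockwise.
#   pdftk_in_container scanned.pdf rotated.pdf 1-endright
```

Installing pdftk on every run is slow; it keeps the sketch close to the manual steps rather than trying to be efficient.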
I found that when I finished creating the PDFs they were gigantic. Since my PDFs were basically just scans, I was able to use convert to pare down their size, after finding a relevant askubuntu question.
The actual command I used was:
convert -density 200x200 -quality 60 -compress jpeg input.pdf output.pdf
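To see whether the shrinking actually helped, the same command can be wrapped in a small helper that prints the sizes before and after. This is only a sketch; the function name shrink_scan is made up, and the flag values are simply the ones from the command above.

```shell
# Hypothetical helper: shrink_scan INPUT OUTPUT
shrink_scan() {
  # -density 200x200 : resolution (DPI) used when rasterizing the PDF pages
  # -quality 60      : JPEG quality; lower means smaller but blurrier
  # -compress jpeg   : store the page images in the output PDF as JPEG
  convert -density 200x200 -quality 60 -compress jpeg "$1" "$2" || return 1

  # Report file sizes in bytes; $(( )) strips any padding wc may add.
  before=$(( $(wc -c < "$1") ))
  after=$(( $(wc -c < "$2") ))
  echo "$1: $before bytes -> $2: $after bytes"
}

# Example (file names are placeholders):
#   shrink_scan input.pdf output.pdf
```

Since the pages are re-rendered as images, any selectable text in the input is lost, which is fine for scans.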