Working with PDFs on Arch Linux


tl;dr: If you’re on Arch, not all hope is lost when trying to deal with PDFs. pdfunite is out there for combining PDFs, Firefox is surprisingly helpful since it uses pdf.js, pdftk is there if you’re down with downloading the dependencies, convert is available for paring down scanned images, and ultimately, any software you can run on Ubuntu can run on Arch with a little Docker.

The problem

In my case I had to sign some pages of a PDF, then return the whole thing, including the signed pages. After a few seconds of thinking, the obvious answer was to split apart the original PDF, print and sign the pages I needed to sign, scan them, then combine the old PDF (without the unsigned originals) with the newly signed pages. Hacky, yes, and if I were on a different platform, I might have been able to very easily sign the PDF with my mouse... but let’s not think about that too much.

Saving certain pages of a PDF

Firefox is great for this – you can actually use it to open and view the PDF, and then save only certain pages to a new PDF by using the print-to-pdf option.

Combining PDFs

I found pdfunite in a relevant Stack Overflow post, and it turns out it’s a super easy command line tool for putting two (or more) PDFs together.
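
As a minimal sketch of how that looks: pdfunite takes the input files in page order and the output file as the last argument, which is easy to get backwards. The wrapper name combine_pdfs and the file names below are made up for illustration; it just moves the output to the front so it is harder to clobber an input by accident.

```shell
# Hypothetical helper: combine_pdfs OUTPUT INPUT1 INPUT2 [...]
combine_pdfs() {
  # $1 = output file; remaining arguments = input PDFs, in page order
  out="$1"
  shift
  # pdfunite expects the inputs first and the destination last
  pdfunite "$@" "$out"
}
```

Something like combine_pdfs final.pdf unsigned-pages.pdf signed-pages.pdf would then write the merged document to final.pdf.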

Rotating PDFs

This was necessary because, of course, I scanned the pages in the wrong orientation. Surprisingly, it’s not all that easy to rotate pages (or a single page) of a PDF… Going down this rabbit hole ultimately led to a program called pdftk that seemed to be especially good at it. Unfortunately, I wasn’t interested in downloading the large list of dependencies that pdftk would bring into my system.
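
For reference, here is roughly what a rotation looks like once pdftk is available somewhere. This is a sketch rather than the exact command from my session: the function name rotate_pdf and the file names are invented, and it relies on pdftk's cat operation, where a page range can be followed by north, east, south or west to set the page orientation (east being 90 degrees clockwise).

```shell
# Hypothetical helper: rotate_pdf INPUT OUTPUT [north|east|south|west]
rotate_pdf() {
  # $1 = input PDF, $2 = output PDF, $3 = orientation (defaults to east)
  # "1-end" selects every page; the suffix sets the rotation for that range
  pdftk "$1" cat "1-end${3:-east}" output "$2"
}
```

Calling rotate_pdf scan.pdf fixed.pdf south would flip every page upside down. To rotate a single page, the same cat operation accepts several ranges, e.g. pdftk scan.pdf cat 1 2south 3-end output fixed.pdf touches only page 2.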

If only there was a way to isolate pdftk when running it… Maybe I could even use a distribution that’s better suited/mentioned in all the guides I see…

Here comes… Docker

A great solution to using pdftk without dirtying my own system too much is to run pdftk in a Docker container!

The first container I found, aultman-pdftk, seemed super usable, but I couldn’t get it to work properly with the entrypoint. This is probably just me not being quite used to using Docker containers for specific commands, but I’m sure a smart reader can figure it out.

Since I was in a do-anything-to-make-it-work kind of mood, what I ended up actually doing was:

  • Using a base ubuntu:latest container
  • Getting into it using /bin/bash (i.e. docker run -it ubuntu /bin/bash)
  • Doing messy installation/update stuff (i.e. apt-get update && apt-get install pdftk)
  • Using docker cp to move the files that I actually wanted to work on into the container and back
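
The steps above can be sketched as one function. Treat it as a rough, non-interactive approximation rather than the exact commands used here: the function name rotate_in_container, the container name pdftk-scratch and the /tmp paths are all made up, it keeps the container alive with sleep instead of an interactive bash session, and it assumes the ubuntu:latest image still offers a pdftk package.

```shell
# Hypothetical helper: rotate_in_container INPUT OUTPUT
rotate_in_container() {
  # $1 = input PDF on the host, $2 = where to write the rotated copy
  name="pdftk-scratch"   # made-up container name
  # Start a throwaway Ubuntu container that stays up in the background
  docker run -d --name "$name" ubuntu:latest sleep infinity &&
  # The messy installation step, done inside the container only
  docker exec "$name" sh -c 'apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y pdftk' &&
  # Copy the file in, rotate every page 90 degrees clockwise, copy it back out
  docker cp "$1" "$name:/tmp/in.pdf" &&
  docker exec "$name" pdftk /tmp/in.pdf cat 1-endeast output /tmp/out.pdf &&
  docker cp "$name:/tmp/out.pdf" "$2"
  status=$?
  # Clean up the container whether or not the steps above succeeded
  docker rm -f "$name" >/dev/null
  return "$status"
}
```

With that defined, rotate_in_container scan.pdf rotated.pdf leaves nothing behind on the host except the rotated file.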

NOTE: Again, you should probably just figure out how to use some pdftk container as a command and pipe the input to it or whatever… I just didn’t feel like reading through Docker docs to figure out what I was doing wrong with the command line syntax.

Reducing the size of your PDFs

I found that when I finished creating the PDFs they were gigantic. Since my PDFs were basically just scans, I was able to use ImageMagick's convert to pare down their size, after finding a relevant Ask Ubuntu question.

The actual command I used was:

convert -density 200x200 -quality 60 -compress jpeg input.pdf output.pdf
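
If you want to experiment with the trade-off between size and legibility, a small wrapper makes it easier to try different quality values and see the result. This is only a sketch: the name shrink_pdf is made up, it reuses the same flags as the command above, and it assumes convert can read PDFs on your system (ImageMagick usually needs Ghostscript for that).

```shell
# Hypothetical helper: shrink_pdf INPUT OUTPUT [JPEG_QUALITY]
shrink_pdf() {
  # $1 = input PDF, $2 = output PDF, $3 = JPEG quality (defaults to 60)
  # -density sets the resolution the pages are rasterized at;
  # lower -quality means smaller files and blurrier scans
  convert -density 200x200 -quality "${3:-60}" -compress jpeg "$1" "$2" &&
  # Report the byte counts so the saving is visible at a glance
  echo "$1: $(wc -c < "$1") bytes -> $2: $(wc -c < "$2") bytes"
}
```

Running shrink_pdf input.pdf output.pdf 40 would trade a bit more image quality for a smaller file. Keep in mind that this rasterizes every page, so any selectable text in the PDF is lost, which is fine for scans.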