
Yet Another IPC in Rust Experiment


tl;dr - I wrote some Inter-Process Communication (“IPC”) experiments in Rust w/ slightly more abstraction and more realistic serialization (JSON) – Check out the code.

UPDATE (10/10/2024)

Thanks to feedback from Reddit, I've updated my references to "throughput" to "latency" which is the more accurate description of what roundtrips measures.

Rust is kinda hot right now. OK, it’s been hot for a while – and I’ve been enjoying using it, but that’s not what we’re here to talk about today.

Lots of people are annoyed by the above fact (surely it does get tiring hearing about the new most-awesome language), but one of the nice things about a language’s meteoric rise to popularity is that it encourages people to re-tread age old ground – like the fastest ways to do Inter-Process Communication (IPC).

IPC is an incredibly important topic that doesn’t get much air time – 99% of the time, you’re not deciding how you do IPC with the thing you need to integrate. The question is usually more along the lines of REST or GraphQL (yikes), OpenAPI vs gRPC and whether or not to use Websockets or something like that.

Prior Art

At least a few others have been kicking the tires on this lately, and seeing their posts encouraged and inspired me to do one of my own.

I was primarily inspired by 3tilley’s post (linked above), and the others are just recent and/or reactions to that.

Feel free to read/peruse these other resources now or later – this post can wait.

I did the thing, again – but why?

While my approach (and resulting code) is extremely similar to the prior art, there are a few differences that might be interesting.

The first possibly interesting difference is where I started – while I’ve been ruminating on IPC for a while (just “I wonder what the upper limit is?” thoughts), I am much more interested in what the boundaries are and what I can actually use, off the shelf. Unlike some people (linked above) I don’t actually want to spend the next x months/years building a bulletproof implementation for IPC – I just want to do cool things without losing much performance, with IPC.

My focus is on building code that’s somewhat closer to what you might actually ship, which means some more abstraction and practical concerns like serialization (even starting with something as inefficient as JSON).
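To make the serialization cost concrete, here’s a dependency-free sketch of the kind of JSON round-trip a message pays on every hop. The `Message` struct and its fields are my own invention for illustration (the real code would more naturally reach for serde_json):

```rust
// A minimal stand-in for the kind of payload an IPC benchmark passes around.
#[derive(Debug, PartialEq)]
struct Message {
    id: u64,
    body: String,
}

impl Message {
    // Hand-rolled JSON encoding; real code would use serde_json instead.
    fn to_json(&self) -> String {
        format!(r#"{{"id":{},"body":"{}"}}"#, self.id, self.body)
    }

    // Extremely naive parse – just enough for the happy path.
    fn from_json(s: &str) -> Option<Message> {
        let id_start = s.find(r#""id":"#)? + 5;
        let id_end = s[id_start..].find(',')? + id_start;
        let id = s[id_start..id_end].parse().ok()?;
        let body_start = s.find(r#""body":""#)? + 8;
        let body_end = s[body_start..].find('"')? + body_start;
        Some(Message { id, body: s[body_start..body_end].to_string() })
    }
}

fn main() {
    let msg = Message { id: 42, body: "ping".to_string() };
    let json = msg.to_json();
    assert_eq!(json, r#"{"id":42,"body":"ping"}"#);
    assert_eq!(Message::from_json(&json), Some(msg));
    println!("round-tripped: {json}");
}
```

Even this toy version makes the point: every roundtrip pays for formatting and parsing strings, which is exactly the kind of “realistic” overhead that raw byte-copy benchmarks leave out.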

I was mainly curious about the roundtrip latencies I would see, and exploring the highest performance choices – named pipes and shared memory.

These answers should apply to any language, and honestly this is more about OS primitives (and what’s available cross-platform) than anything else.
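For clarity on what “roundtrip latency” means here: send a request, block until the reply comes back, repeat, and divide wall time by iterations. Here’s a dependency-free sketch of that harness using a Unix socket pair as a stand-in channel (Unix-only, and the channel/message details are mine, not the repo’s):

```rust
use std::io::{Read, Write};
use std::os::unix::net::UnixStream;
use std::thread;
use std::time::Instant;

fn main() -> std::io::Result<()> {
    // A connected socket pair stands in for any bidirectional IPC channel.
    let (mut parent, mut child) = UnixStream::pair()?;

    // "Child": echo every message back, like the responder process in a benchmark.
    let echo = thread::spawn(move || {
        let mut buf = [0u8; 4];
        while child.read_exact(&mut buf).is_ok() {
            if &buf == b"stop" {
                break;
            }
            child.write_all(&buf).unwrap();
        }
    });

    // "Parent": time n request/response roundtrips.
    let n: u32 = 10_000;
    let mut buf = [0u8; 4];
    let start = Instant::now();
    for _ in 0..n {
        parent.write_all(b"ping")?;
        parent.read_exact(&mut buf)?;
    }
    let elapsed = start.elapsed();
    parent.write_all(b"stop")?;
    echo.join().unwrap();

    println!("{} roundtrips, avg latency: {:?}", n, elapsed / n);
    Ok(())
}
```

Swap the socket pair for a named pipe or a shared-memory ring and the measurement loop stays the same – that’s what makes the approaches comparable.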

Where’s the code?

Find it on GitHub (with even more context, and hopefully some nice cross-platform setup so you can run it yourself easily).

If you find bugs, do me a favor and file an issue or make a PR!

What were the results?

Plucked from the GitHub page:

Oryx Pro result

Macbook Air result

So clearly, manually managed shared memory is incredibly fast. It’s also reasonably easy to make work across Unixes (Maybe Windows too? All the libraries do build for Windows) – all crates used are multi-platform ready as far as I know.

Since I didn’t do any particularly intense performance tuning, these graphs are a better measure of what I can expect to get “out of the box” with different approaches.

ipc-channel is likely the best of both worlds, because it has great pedigree (maintained by Mozillans/Mozilla-adjacent/Servo people), and gives a reasonable amount of performance compared to shared memory via raw-sync. Paying for ease of use is often worth it, especially when you can always rewrite your way to higher performance when you absolutely need it later.

In a Parent/Child process scenario, it seems pretty clear that passing initial payloads over STDIN (or any other easy-to-use method) and then upgrading (and possibly downgrading) to a higher performance medium is likely the way to go.
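The “handshake over STDIN, then upgrade” pattern is easy to sketch with just the standard library. Here `cat` stands in for the real child binary (assumes a Unix-like system), and the handshake field names are illustrative, not the repo’s actual protocol:

```rust
use std::io::{Read, Write};
use std::process::{Command, Stdio};

fn main() -> std::io::Result<()> {
    // Spawn a child with stdin/stdout piped. `cat` is a stand-in: a real
    // child would parse the handshake instead of echoing it back.
    let mut child = Command::new("cat")
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    // Initial payload over stdin: tell the child where the fast channel lives.
    // (Field names here are hypothetical.)
    let handshake = r#"{"shmem_name":"/ipc_demo","size":4096}"#;
    child.stdin.take().unwrap().write_all(handshake.as_bytes())?;
    // Dropping the taken stdin handle closes the pipe, so `cat` sees EOF.

    let mut echoed = String::new();
    child.stdout.take().unwrap().read_to_string(&mut echoed)?;
    child.wait()?;

    // A real child would now open the named shared-memory segment, and both
    // sides would "upgrade" to it for the hot path.
    assert_eq!(echoed, handshake);
    println!("child acknowledged: {echoed}");
    Ok(())
}
```

The nice property of this pattern is that the slow, simple channel is always there to fall back to if the fast one can’t be set up.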

That said, this clearly shows that investigating an approach like raw-sync might actually be worth it. In fact, I know I’m onto something, since iceoryx exists for that very purpose.

The iceoryx benchmarks from the 2.0.4 release seem to show that the performance I’m getting with raw-sync is within an order of magnitude (no idea what their workload is, or how many nanoseconds it takes) – this is really amazing for a completely unoptimized code base, with only 2 cores under load.

Thoughts

Abstraction

The abstractions in the code base aren’t perfect, but the interesting choice was at what level the UX-improving abstractions went in.

There were what seemed like three choices:

  • Whole-grouping/program level (doesn’t make much sense, too inflexible)
  • “Object” level (parent/child) – e.g. trait Parents and trait Child (I chose this)
  • Communication level (Handles/Adapters/Senders/etc – I did this too, but maybe should have leaned into it harder)
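To make the “object level” option concrete, here’s a hypothetical sketch of what one-trait-per-process-role could look like – the names, signatures, and toy in-process pairing are all mine, not the repo’s actual API:

```rust
use std::io;

/// "Object level": one trait per process role.
/// The parent owns the channel and drives request/response cycles.
trait Parent {
    fn send(&mut self, req: &str) -> io::Result<String>;
}

/// The child's side: receive a request, produce a response.
trait Child {
    fn handle(&mut self, req: &str) -> String;
}

/// Toy in-process pairing, just to show how the roles compose; a real
/// implementation would put a pipe or shared memory between them.
struct LocalChild;

impl Child for LocalChild {
    fn handle(&mut self, req: &str) -> String {
        format!("ack:{req}")
    }
}

struct LocalParent {
    child: LocalChild,
}

impl Parent for LocalParent {
    fn send(&mut self, req: &str) -> io::Result<String> {
        Ok(self.child.handle(req))
    }
}

fn main() {
    let mut parent = LocalParent { child: LocalChild };
    let resp = parent.send("ping").unwrap();
    assert_eq!(resp, "ack:ping");
    println!("{resp}");
}
```

The appeal of this level is that the transport (pipe, socket, shared memory) becomes an implementation detail behind the role traits, which is exactly where you want the swap-for-performance seam to live.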

What would going to production actually take?

As nice as raw-sync looks, the real problem with going to production with something like that is the failure recovery story (or lack of one) – if something goes wrong with your manually shared memory, it’s completely on you to resolve issues – and debugging also becomes much more difficult.

However, as mentioned previously, it seems like a 10x boost might be worth it in very specific situations, like same-machine. 10x performance is a lot to leave on the table (depending on the problem!)

Rust’s ecosystem made it easy

This codebase wasn’t hard to write, and in typical Rust ecosystem style is very easy for me to trust.

Rust’s affordances around conditional compilation, strong typing, memory-safety-by-default, etc. help where they can, and mean that the libraries I’m depending on “just work”.

Obviously, a library like raw-sync is really a “mostly safe” wrapper on top of OS primitives (which can go wrong in all sorts of ways, but generally don’t), and my library is a shaky wrapper on top of that. But I certainly find it much easier to trust the code as written, and am confident I could make it work relatively easily across platforms and more generically at higher levels (with much more effort, of course).