A Galaxy Scale Addressing Scheme for Compute Availability

Categories
earth + servers

tl;dr - Back-of-the-napkin ideation of what a universal, planet-scale (and beyond, so that satellite compute can be included) addressing scheme for compute resources might look like. Goodbye us-west-1 (AWS)/us-west1 (GCP)/West US (Azure) and hello COUNTRY:USA-COMPASS:WEST or STATE:CA-COMPASS:NORTH? Skip to the end for some examples.

Wouldn’t it be cool if we had a cloud-provider-agnostic way of talking about regions? To my knowledge there isn’t one. With the ops kick I’ve been on for the last few years (containers, Kubernetes, easy CI/CD with GitLab), I recently spent a super small amount of time thinking about this, and I think I’ve come up with a scheme simple enough that it just might work (and maybe even work well!).

A few edge cases I’ve “designed” for:

  • Compute/storage resources that are not on earth (the ISS, satellites, other planets)
  • Regions so simple that they can be built on the fly (e.g. describing a region that spans Nigeria & Spain to serve Tunisia/Algeria)

These edge cases come from a pretty big belief that I’ve been kicking around in my head for a while, and which I’ve seen echoed around the internet a little but not a whole lot – the future of computing has to be a “fog”. My rationale is simple:

  • Compute, storage and “edge” compute demand will continue to increase as humans find more and more ways to use computers/technology
  • Barriers to entry for creating reliable data centers are dropping due to tools like Kubernetes, HashiCorp Nomad, OpenShift, and MAAS
  • More and more people will choose to utilize otherwise idle capital (land) to provide computing, especially when the natural environment is advantageous (lots of heat means some potential for solar power, lots of cold means potential for reduced cooling bills)
  • AWS/GCP/Azure have space in their margins, currently protected by the difficulty of creating a “good” cloud provider
  • If Kubernetes continues at its current rate of adoption, people will very likely be using a k8s API (or similar) to create resources across platforms
  • Companies that focus on the value added benefits provided by AWS/GCP/Azure will out-compete the cloud providers on their own platforms
  • As more new cloud providers spring up, they’ll be able to use new tooling to reduce their barrier to entry and the companies providing run-anywhere value-added solutions will start trying to deploy to a lot more clouds.

There’s a whole lot I could unpack in the predictions above, but I won’t bother; let’s just pretend lightning will strike ~5 times. Assuming things play out something like what I suspect above, a lot of things fall out. One is that we’ll likely need a better way to characterize regions, which is what this post is about (maybe I’ll write about some of the others later).

As you might have suspected by now, I use the term “design” really loosely – this isn’t a formal specification, and there are probably a myriad of holes I can’t even begin to anticipate. Still, I think I’ve written something so general and flexible that it’s either useless or useful.

This post is basically stream-of-consciousness writing and back-of-the-napkin design notes for a system that might just be flexible enough to work. I’ve decided to call the system GCAS – the General Cloud Addressing Scheme.

Napkin Side A: What levels of addressing are needed?

First up is thinking about the information we might need to represent:

  • Position in Space (absolute-ish, as we currently understand our universe)
  • Planet (on? orbiting?)
  • Lat/Lon
  • Continent
  • Country
  • Area/Region of Country?

There’s a balance between accuracy and human usability that has to be struck here – lat/lon would be great for accuracy for on-earth position, but would be atrocious for readability. I don’t keep up with the latest on where the earth is in our current understanding of space, but I get the feeling that area of science might still have some large discoveries ahead of it. Regardless, this short thought exercise leads me to believe that the only way to build something that would last the next 100 years of human exploration/discovery and scientific progress is to make it very flexible.

This led me to think that a - (hyphen) separated scheme might be a good idea – as in <PLANET>-<COUNTRY>-[term]*.

Taking a look at the way current regions are mostly named (e.g. us-east-2, East Asia), it seems like continents and/or countries with some indication of area/region are the only information really required for decent human usability (assuming of course that the status quo is at least decently human usable). This makes me think that it might make sense to split countries up into sections – maybe a scheme like EARTH-USA-WEST-1.

But what if someone didn’t know how many regions were offered by the cloud provider in the western part of the USA on earth? I think the best good-enough answer might be solving those kinds of problems with less specificity – if you don’t know which data center you want, then go with EARTH-USA-WEST, and let the data center (or other constraints) choose for you.
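The "less specificity" idea can be sketched as term-by-term prefix matching. This is only an illustration: the `matches` helper and the list of data centers are hypothetical, and it assumes a less specific address is always a prefix of the more specific ones beneath it.

```python
def matches(request: str, resource: str) -> bool:
    """Return True if `resource` falls inside the area named by `request`.

    Assumption: a less specific address is a term-by-term prefix of
    every more specific address beneath it.
    """
    wanted = request.split("-")
    actual = resource.split("-")
    return actual[: len(wanted)] == wanted


# Hypothetical data centers a provider might advertise.
data_centers = ["EARTH-USA-WEST-1", "EARTH-USA-WEST-2", "EARTH-USA-EAST-1"]

# Asking for EARTH-USA-WEST leaves the final choice to the provider.
candidates = [dc for dc in data_centers if matches("EARTH-USA-WEST", dc)]
print(candidates)  # ['EARTH-USA-WEST-1', 'EARTH-USA-WEST-2']
```

Comparing whole terms rather than raw string prefixes keeps a request like EARTH-USA-WE from accidentally matching EARTH-USA-WEST.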

Another approach might be to use lat/lon to slice up countries into vertical sections – as country land area varies wildly, it likely makes sense to put lower bounds on slice-size – let’s say France has 1 or 2 sections where the US might have 5 or more. And what about horizontal cross sections of countries? I instinctively think of vertical sectioning but maybe horizontal is just as valid? Should you be able to specify a vertical and horizontal (so now we have a grid) area?
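Here is a rough sketch of the vertical-slicing idea with a lower bound on slice width. The function names, the 10-degree minimum, and the longitude bounds are all assumptions picked for illustration (the bounds are approximate), and it ignores countries that cross the antimeridian.

```python
import math


def slice_count(west: float, east: float, min_width_deg: float) -> int:
    """Number of equal vertical slices, each at least `min_width_deg` wide."""
    return max(1, math.floor((east - west) / min_width_deg))


def slice_index(lon: float, west: float, east: float, min_width_deg: float) -> int:
    """1-based index (counting from the west) of the slice containing `lon`."""
    n = slice_count(west, east, min_width_deg)
    width = (east - west) / n
    idx = int((lon - west) // width)
    return min(max(idx, 0), n - 1) + 1  # clamp points on or past the edges


# Approximate longitude extents, for illustration only.
USA = (-125.0, -67.0)    # contiguous United States
FRANCE = (-5.0, 8.0)     # mainland France

print(slice_count(*USA, 10.0))     # 5
print(slice_count(*FRANCE, 10.0))  # 1
print(slice_index(-122.4, *USA, 10.0))  # 1 (westernmost slice)
print(slice_index(-74.0, *USA, 10.0))   # 5 (easternmost slice)
```

Horizontal slicing would be the same arithmetic on latitude, and a grid is just one index of each.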

Well the short answer to a lot of these questions is “I don’t know”, so right around this point I started thinking that surely there must be some prior art here.

Searching for prior art

Surely other, smarter people have already tried to do this? There’s probably a whole world of research on intergalactic positioning systems that I don’t know about (if you’re out there and you know about this stuff, please send me an email!).

A few hasty Google searches later, all I could find was an article on using pulsars for interstellar navigation, which was a titillating read but clearly not quite what I was looking for. Aside from that system only working in our solar system, it was a bit too heavyweight for what I want to accomplish. This did lead me to a reference for a set of standards for planetary data information that NASA developed, which was pretty cool.

Right around this point I gave up on being able to completely accurately position the universe, but decided to retain at least the requirement of being able to differentiate compute on a planet versus orbiting a planet (or in some other sort of extra-planetary location).

Napkin Side B: Starting fresh with a focus on simplicity

Alright, since I can’t accurately position the universe, let’s start with named planets and go from there, with planets as the outermost level the system addresses for now. How do we distinguish on-planet from off-planet resources? Wait a second: orbits aren’t really stationary. They change, and they change in unpredictable ways if humans intervene.

Are we on or off the planet?

The off-planet resource question seems like a really ornery one, but I can solve it by just changing the semantics (cheating). I can choose to treat off-planet resources moving as roughly equivalent to a group of machines being moved from one data center to another. We’re not measuring static pieces of infrastructure; we’re measuring areas of availability for some compute. This means that we can punt the problem of a satellite being in the right physical space to run a certain workload to the satellites! It would be the responsibility of whichever provider had the satellites to make sure that the appropriate satellites did whatever workload stealing or trading was necessary to fulfill the constraints on the given workload(s).

As for the actual scheme, <PLANET>+ORBIT should be just fine – a + could be used to identify a modifier to the preceding phrase, whereas - might mean a descendant (EARTH-USA versus EARTH+ORBIT).
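To make the two separators concrete, here is a tiny sketch that splits an address into terms on - and peels modifiers off each term on +. The `parse_terms` name and the (name, modifiers) shape are my own assumptions, not part of any spec.

```python
def parse_terms(address: str) -> list[tuple[str, list[str]]]:
    """Split an address into (name, modifiers) pairs.

    "-" introduces a descendant term; "+" attaches a modifier to the
    term that precedes it.
    """
    terms = []
    for part in address.split("-"):
        name, *modifiers = part.split("+")
        terms.append((name, modifiers))
    return terms


print(parse_terms("EARTH-USA"))    # [('EARTH', []), ('USA', [])]
print(parse_terms("EARTH+ORBIT"))  # [('EARTH', ['ORBIT'])]
```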

Which Continent?

It’s not immediately clear whether the continent designation is very important. While this is vaguely the range at which things are specified now (e.g. ap-northeast-1 is Tokyo’s region in AWS), I’m not sure it should be the dominant paradigm. It turns out we humans also disagree on how many continents there are, so that’s a landmine I don’t want to step on.

While thinking about this I did stumble upon an interesting point though – we might be able to enhance accuracy and understandability at the expense of legibility if we introduce : (colon) for tagging sections. We could have a value like PLANET:EARTH-CONTINENT:ANTARCTICA to represent Antarctica, on planet Earth.
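A minimal sketch of reading tagged sections into a mapping, assuming every term is a KEY:VALUE pair and no value contains a hyphen. The `parse_tags` helper is hypothetical.

```python
def parse_tags(address: str) -> dict[str, str]:
    """Turn 'KEY:VALUE-KEY:VALUE' into a dict, preserving term order."""
    tags = {}
    for part in address.split("-"):
        key, sep, value = part.partition(":")
        if not sep:
            raise ValueError(f"term {part!r} is not a KEY:VALUE tag")
        tags[key] = value
    return tags


print(parse_tags("PLANET:EARTH-CONTINENT:ANTARCTICA"))
# {'PLANET': 'EARTH', 'CONTINENT': 'ANTARCTICA'}
```

With tags, a consumer no longer has to guess whether the second term is a continent or a country from its position alone.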

Which Country?

We can handle countries with a similar strategy to continents, using country names (if short enough) or ISO country codes. A value like PLANET:EARTH-CONTINENT:NORTH_AMERICA-COUNTRY:USA would be a decent (if overly verbose) way of specifying the USA.

Sidetrack - It would probably be a good idea to build in some sort of elision mechanism – if there’s only one registered country called USA on PLANET:EARTH, it might make sense to allow elision of CONTINENT:NORTH_AMERICA.
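One way elision might work is a registry lookup that fills in whatever parent tags were left out. This is a sketch under assumptions: the `KNOWN_COUNTRIES` registry, the canonical tag `ORDER`, and the `expand` function are all made up for illustration.

```python
# Hypothetical registry: each unambiguous country maps to its parent tags.
KNOWN_COUNTRIES = {
    "USA": {"PLANET": "EARTH", "CONTINENT": "NORTH_AMERICA"},
    "JAPAN": {"PLANET": "EARTH", "CONTINENT": "ASIA"},
}

# Assumed canonical order for the outermost tags.
ORDER = ["PLANET", "CONTINENT", "COUNTRY"]


def expand(address: str) -> str:
    """Re-insert elided parent tags for a registered country."""
    tags = dict(part.split(":", 1) for part in address.split("-"))
    parents = KNOWN_COUNTRIES.get(tags.get("COUNTRY", ""), {})
    for key, value in parents.items():
        tags.setdefault(key, value)  # never overwrite what the user wrote
    ordered = [k for k in ORDER if k in tags] + [k for k in tags if k not in ORDER]
    return "-".join(f"{k}:{tags[k]}" for k in ordered)


print(expand("COUNTRY:USA"))
# PLANET:EARTH-CONTINENT:NORTH_AMERICA-COUNTRY:USA
print(expand("COUNTRY:USA-COMPASS:WEST"))
# PLANET:EARTH-CONTINENT:NORTH_AMERICA-COUNTRY:USA-COMPASS:WEST
```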

Areas/Regions within countries

This is where things get much trickier – as we discussed before, there are lots of ways we could usefully split up the land mass of existing countries:

  • vertical slicing (by longitude)
  • horizontal slicing (by latitude)
  • vertical & horizontal slicing
  • points on the compass (east, west, north, south, north-east? south-west?)
  • country-specific demarcations (states, provinces, prefectures)
  • predefined lat/lon with geospatial bounding boxes?

Of course, country-specific demarcations are (generally) the most human-understandable, but they’re also rife with complications. Tokyo is colloquially called a “city”, but it’s also a “prefecture” of Japan, and it’s absolutely not what Americans normally consider a city-sized landmass or population. There are also areas that countries own/control outside their main landmass (for example, U.S. territories). And of course there are the various land disputes between countries in the Asian corridor.

Also, this would be a good time to realize that we’re very likely going to require this standard to be in UTF-8 – it would be unfair to force everyone in the world to translate their country name to English (though English could be used as a fallback). The next realization is probably that states/provinces/prefectures almost always have even more categories beneath them like county/city/town, and even more beneath those like neighborhood/street/corner/junction. Luckily, I think those issues are covered decently by the tagging and specificity “features” we’ve introduced so far – as accuracy goes up readability suffers but doesn’t become impossible, just more tedious (and elision would help in some cases anyway).

Let’s say our current scheme is good enough to encompass the country-specific demarcations. How do we actually solve the problem at hand? Which sub-country addressing scheme should we pick? Luckily we’d already stumbled onto the idea of adding a tagging semantic, which means the answer I’m going to choose here is all of them, but in particular, the right one for your use case.

The easiest for me to try and start defining (and likely the most useful for those who might use this) is country-specific demarcations – so I’ll just include some examples of those:

  • COUNTRY:USA-COMPASS:EAST (we can elide PLANET:EARTH since the USA only exists on one planet for now)
  • COUNTRY:USA-STATE:TEXAS-COMPASS:SOUTH
  • LAT:40.730610-LON:-73.935242 (we don’t need to worry about difficulties parsing -73 since we have the LON: prefix)
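The negative-longitude case is worth a quick sketch. One way to make the LON: prefix do the work is to split only on hyphens that don't directly follow a colon. The `split_terms` helper and that splitting rule are my assumptions about how a parser might handle it.

```python
import re


def split_terms(address: str) -> list[str]:
    """Split on '-' separators, but not on a minus sign right after 'KEY:'."""
    return re.split(r"(?<!:)-", address)


terms = split_terms("LAT:40.730610-LON:-73.935242")
print(terms)  # ['LAT:40.730610', 'LON:-73.935242']

# Tagged coordinates can then be read as plain floats.
coords = {key: float(value) for key, value in (t.split(":", 1) for t in terms)}
print(coords)  # {'LAT': 40.73061, 'LON': -73.935242}

print(split_terms("COUNTRY:USA-STATE:TEXAS-COMPASS:SOUTH"))
# ['COUNTRY:USA', 'STATE:TEXAS', 'COMPASS:SOUTH']
```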

And for some fun here are some that might be appropriate for compute on satellites/underground bases:

  • EARTH+ORBIT-COUNTRY:USA+ABOVE - any satellite orbiting the earth that is “above” the USA
  • COUNTRY:ENG+BELOW - any underground bunkers in England
  • COUNTRY:USA-STATE:TX+MOBILE - compute that is mobile, anywhere in the state of Texas
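Since these addresses describe areas of availability rather than fixed hardware, a provider could resolve something like EARTH+ORBIT-COUNTRY:USA+ABOVE at scheduling time. The sketch below is purely illustrative: the `Satellite` type, the fleet, and the rough bounding box for the contiguous USA are all assumptions, and a real provider would use proper borders and orbital data.

```python
from dataclasses import dataclass


@dataclass
class Satellite:
    name: str
    lat: float  # latitude of the point on the ground directly below, right now
    lon: float  # longitude of that same point


# Very rough bounding box for the contiguous USA: south, north, west, east.
USA_BOX = (24.0, 50.0, -125.0, -67.0)


def is_above(box: tuple[float, float, float, float], sat: Satellite) -> bool:
    south, north, west, east = box
    return south <= sat.lat <= north and west <= sat.lon <= east


def eligible(fleet: list[Satellite]) -> list[str]:
    """Names of satellites that currently satisfy COUNTRY:USA+ABOVE."""
    return [sat.name for sat in fleet if is_above(USA_BOX, sat)]


# Hypothetical fleet snapshot; positions change as the satellites move.
fleet = [Satellite("sat-a", 39.0, -98.0), Satellite("sat-b", 48.8, 2.3)]
print(eligible(fleet))  # ['sat-a']
```

The address stays the same while the set of machines behind it changes, which is exactly the "moved from one data center to another" framing from earlier.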

Some ideas around growth/flexibility

A lot goes into making a standard easy to understand and use (for example, a good reference implementation), and for these back of the napkin thoughts I found it also interesting to consider some other points:

Versioning

Obviously, there’s going to need to be a versioning scheme, and something like an assumed version if none is specified. Maybe requiring the version to be the first tag (if specified) might be enough. It could also be a good way to implement custom specs – for example inside some organization that has codenames for regions, it might make sense to have V:MYORG-LOCATION:PANDA to denote a location. A more realistic example could be V:AWS-COUNTRY:USA-COMPASS:WEST.
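A sketch of the "version is the first tag, if specified" rule. The `V:` prefix comes from the examples above; the default version value `"1"` and the `split_version` name are assumptions.

```python
DEFAULT_VERSION = "1"  # assumed version when no V: tag is given


def split_version(address: str) -> tuple[str, str]:
    """Return (version, rest_of_address), applying the default if needed."""
    first, _, rest = address.partition("-")
    if first.startswith("V:"):
        return first[2:], rest
    return DEFAULT_VERSION, address


print(split_version("V:AWS-COUNTRY:USA-COMPASS:WEST"))
# ('AWS', 'COUNTRY:USA-COMPASS:WEST')
print(split_version("V:MYORG-LOCATION:PANDA"))
# ('MYORG', 'LOCATION:PANDA')
print(split_version("COUNTRY:USA-COMPASS:WEST"))
# ('1', 'COUNTRY:USA-COMPASS:WEST')
```

A parser could then dispatch on the version to pick which tag vocabulary applies to the rest of the address.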

UTF8

It’s probably a good idea to make sure the entire scheme is UTF-8 – as mentioned earlier, it’s a little unreasonable to require everyone to convert their names to English if it’s not absolutely necessary.
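One wrinkle with allowing non-English names is that the same visible text can be encoded in more than one way. Normalizing before comparing is one possible answer; choosing NFC here is my assumption, not something this napkin design has settled.

```python
import unicodedata


def normalize(address: str) -> str:
    """Normalize to NFC so visually identical addresses compare equal."""
    return unicodedata.normalize("NFC", address)


precomposed = "COUNTRY:CURA\u00c7AO"  # C-with-cedilla as a single code point
combining = "COUNTRY:CURAC\u0327AO"   # plain C followed by a combining cedilla

print(precomposed == combining)                        # False
print(normalize(precomposed) == normalize(combining))  # True

# Non-Latin names simply take more bytes on the wire.
print(len("COUNTRY:日本".encode("utf-8")))  # 14
```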

Shortcuts/Aliases

Obviously, some of the best-known areas should probably have aliases and shortcuts. Elision rules should help in most normal cases (for the foreseeable future “california” can probably expand to PLANET:EARTH-COUNTRY:USA-STATE:CA), but it might be nice to be able to define static terms that alias/expand to a very specific area.
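Static aliases could be as simple as a lookup table applied to the first term. The `ALIASES` table and `resolve` function below are hypothetical, and matching case-insensitively is my own choice.

```python
# Hypothetical alias table; keys are stored case-folded.
ALIASES = {
    "california": "PLANET:EARTH-COUNTRY:USA-STATE:CA",
}


def resolve(address: str) -> str:
    """Expand a leading alias, leaving any trailing terms untouched."""
    head, sep, rest = address.partition("-")
    expanded = ALIASES.get(head.casefold())
    if expanded is None:
        return address
    return expanded + sep + rest


print(resolve("california"))
# PLANET:EARTH-COUNTRY:USA-STATE:CA
print(resolve("California-COMPASS:NORTH"))
# PLANET:EARTH-COUNTRY:USA-STATE:CA-COMPASS:NORTH
print(resolve("COUNTRY:UK-COMPASS:SOUTH"))
# COUNTRY:UK-COMPASS:SOUTH
```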

Let’s take it for a spin!

Here’s a list of some AWS/GCP/Azure regions and how they match up with this scheme:

Region              | AWS            | GCP                | Azure        | GCAS
--------------------|----------------|--------------------|--------------|------------------------------------
Northern California | us-west-1      | us-west2           | West US      | STATE:CA-COMPASS:NORTH
Tokyo               | ap-northeast-1 | asia-northeast1    | Japan East   | COUNTRY:JAPAN-COMPASS:EAST
London              | eu-west-2      | europe-west2       | UK South     | COUNTRY:UK-COMPASS:SOUTH
South America       | sa-east-1      | southamerica-east1 | Brazil South | CONTINENT:SOUTHAMERICA-COMPASS:SOUTH

The examples above aren’t the most succinct possibility, but they’re likely what someone might write as a first pass, which is both relatively readable and has the possibility to be more accurate or general than the current cloud vendor hard-coded values.
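For a sense of how a tool might use this, here is a sketch that translates a GCAS address into each vendor's own region name using the rows listed above. The `REGIONS` table and `vendor_region` function are hypothetical, and a real implementation would match on area rather than on exact strings.

```python
# Lookup table built from the comparison above (vendor keys are my own naming).
REGIONS = {
    "STATE:CA-COMPASS:NORTH": {
        "aws": "us-west-1", "gcp": "us-west2", "azure": "West US"},
    "COUNTRY:JAPAN-COMPASS:EAST": {
        "aws": "ap-northeast-1", "gcp": "asia-northeast1", "azure": "Japan East"},
    "COUNTRY:UK-COMPASS:SOUTH": {
        "aws": "eu-west-2", "gcp": "europe-west2", "azure": "UK South"},
    "CONTINENT:SOUTHAMERICA-COMPASS:SOUTH": {
        "aws": "sa-east-1", "gcp": "southamerica-east1", "azure": "Brazil South"},
}


def vendor_region(gcas: str, vendor: str) -> str:
    """Translate a GCAS address into one vendor's region identifier."""
    try:
        return REGIONS[gcas][vendor]
    except KeyError:
        raise LookupError(f"no known {vendor!r} region for {gcas!r}") from None


print(vendor_region("COUNTRY:UK-COMPASS:SOUTH", "aws"))    # eu-west-2
print(vendor_region("COUNTRY:UK-COMPASS:SOUTH", "azure"))  # UK South
```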

Wrapup

Well, this was a pretty fun thought experiment for me, and from my armchair the end result even looks usable. I’d love to hear feedback if anyone’s spent time thinking about this.

If you’ve made it all the way down here, thanks for reading!