tl;dr - Back-of-the-napkin ideation of what universal, planet-scale addressability for compute resources (satellite compute included) might look like. Goodbye West US (Azure) and hello STATE:CA-COMPASS:NORTH? Skip to the end for some examples.
Wouldn’t it be cool if we had a cloud-provider-agnostic way of talking about regions? To my knowledge there isn’t one. With the ops kick I’ve been on for the last few years (containers, Kubernetes, easy CI/CD with GitLab), I recently spent a small amount of time thinking about this, and I think I’ve come up with a scheme simple enough that it just might work (and maybe even work well!).
A few edge cases I’ve “designed” for:
These edge cases come from a pretty big belief that I’ve been kicking around in my head for a while, which I’ve seen echoed around the internet a little but not a whole lot – the future of computing has to be a “fog”. My rationale is simple:
There’s a whole lot I could unpack in the predictions above but I won’t bother; let’s just pretend lightning will strike ~5 times. Assuming things play out something like what I suspect above, a lot of things fall out. One of them is that we’ll likely need a better way to characterize regions – which is what this post is about (maybe I’ll write about some of the others later).
As you might have suspected by now, I use the term “design” really loosely – this isn’t a formal specification, there are probably a myriad of holes I can’t even begin to expect, but I’ve also written something I think is so general and flexible it’s either useless or useful.
This post is basically stream-of-consciousness writing and back-of-the-napkin design notes for a system that might just be flexible enough to work. I’ve decided to call the system GCAS – the General Cloud Addressing Scheme.
First up is thinking about the information we might need to represent:
There’s a balance between accuracy and human usability that has to be struck here – lat/lon would be great for accuracy for on-earth position, but would be atrocious for readability. I don’t keep up with the latest on where the earth is in our current understanding of space, but I get the feeling that area of science might still have some large discoveries some day. Regardless, this short thought exercise leads me to believe that the only way to build something that would last the next 100 years of human exploration/discovery and scientific progression is to make it very flexible.
This led me to think that a - (hyphen) separated scheme might be a good idea – as in
Taking a look at the way current regions are mostly named (e.g. East Asia), it seems like continents and/or countries with some indication of area/region are the only information required for decent human usability (assuming of course that the status quo is at least decently human usable). This makes me think that it might make sense to split countries up into sections – maybe a scheme like
But what if someone didn’t know how many regions were offered by the cloud provider in the western part of the USA on earth? I think the best good-enough answer might be solving those kinds of problems with less specificity – if you don’t know which data center you want, then go with EARTH-USA-WEST, and let the data center (or other constraints) choose for you.
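One way to read "less specificity" is as prefix matching: a request names the levels it cares about, and anything registered beneath them qualifies. Here is a minimal sketch of that reading, assuming plain hyphen-separated addresses. The data center names (EARTH-USA-WEST-1 and friends) and the function names are made up for illustration, not part of any real provider's naming.

```python
def matches(request: str, resource: str) -> bool:
    """Return True if `resource` sits at or below `request` in the hierarchy."""
    req_parts = request.split("-")
    res_parts = resource.split("-")
    # A request matches when each of its segments lines up with the
    # leading segments of the resource address.
    return res_parts[:len(req_parts)] == req_parts


# Hypothetical data centers registered by a provider (illustrative only).
DATA_CENTERS = [
    "EARTH-USA-WEST-1",
    "EARTH-USA-WEST-2",
    "EARTH-USA-EAST-1",
    "EARTH-FRANCE-1",
]


def candidates(request: str) -> list[str]:
    """All registered data centers that satisfy a (possibly vague) request."""
    return [dc for dc in DATA_CENTERS if matches(request, dc)]
```

Asking for EARTH-USA-WEST would return both of the made-up western data centers, and the provider (or other constraints) could pick between them. Comparing whole segments rather than raw string prefixes keeps EARTH-USA-WEST from accidentally matching something like EARTH-USA-WESTERN-1.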
Another approach might be to use lat/lon to slice up countries into vertical sections – as country land area varies wildly, it likely makes sense to put lower bounds on slice-size – let’s say France has 1 or 2 sections where the US might have 5 or more. And what about horizontal cross sections of countries? I instinctively think of vertical sectioning but maybe horizontal is just as valid? Should you be able to specify a vertical and horizontal (so now we have a grid) area?
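To make the vertical-slice idea a bit more concrete, here is a rough sketch. It assumes a country can be approximated by a west/east longitude range that doesn't cross the antimeridian, and it uses a made-up minimum slice width of 10 degrees; both the constant and the function names are my own placeholders.

```python
import math

MIN_SLICE_DEGREES = 10.0  # assumed lower bound on slice width, purely illustrative


def slice_count(west: float, east: float) -> int:
    """How many vertical slices a longitude range gets (always at least one)."""
    width = east - west
    return max(1, math.floor(width / MIN_SLICE_DEGREES))


def slice_index(lon: float, west: float, east: float) -> int:
    """0-based index, counted west to east, of the slice containing `lon`."""
    if not west <= lon <= east:
        raise ValueError("longitude outside the country's range")
    n = slice_count(west, east)
    width = (east - west) / n
    # Clamp so the eastern edge lands in the last slice, not one past it.
    return min(n - 1, int((lon - west) / width))
```

With very rough bounds of -125 to -67 for the contiguous US this gives 5 slices, and roughly -5 to 9.6 for mainland France gives 1, which is about the ratio I had in mind. Horizontal slicing would be the same arithmetic on latitude, and doing both would give the grid.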
Well the short answer to a lot of these questions is “I don’t know”, so right around this point I started thinking that surely there must be some prior art here.
Surely other smarter people have already tried to do this? There’s probably a whole world of research on intergalactic positioning systems that I don’t know about (if you’re out there and you know about this stuff please send me an email!)?
A few hasty Google searches later, all I could find was an article on using pulsars for interstellar navigation, which was a titillating read but clearly not quite what I was looking for. Aside from that system only working in our solar system, it was a bit too heavyweight for what I want to accomplish. This did lead me to a reference for a set of standards for planetary data information that NASA developed, which was pretty cool.
Right around this point I gave up on being able to completely accurately position the universe, but decided to retain at least the requirement of being able to differentiate compute on a planet versus orbiting a planet (or in some other sort of extra-planetary location).
Alright, since I can’t accurately position the universe, let’s start with named planets and go from there, and have planets be the top level of the system for now. How do we distinguish on-planet from off-planet resources? Wait a second: planetary orbits aren’t really stationary. They change, and they change in unpredictable ways if human intervention happens.
The off-planet resource question seems like a really ornery one, but I can solve it by just changing the semantics (cheating). I can choose to consider off-planet resources moving as roughly equivalent to a group of machines being moved from one data center to another. We’re not measuring static pieces of infrastructure so much as areas of availability for some compute, which means we can punt the problem of a satellite being in the right physical space to run a certain workload to the satellites! It would be the responsibility of whichever provider had the satellites to make sure that the appropriate satellites did whatever workload stealing or trading was necessary to fulfill the constraints on the given workload(s).
As for the actual scheme, <PLANET>+ORBIT should be just fine – a + could be used to identify a modifier to the preceding phrase, whereas a - might mean a descendant.
It’s not immediately clear whether the continent designation is very important. While this is vaguely the range at which things are specified now (e.g. ap-northeast-1 is the Tokyo region in AWS), I’m not sure it should be the dominant paradigm. It turns out we humans also disagree on how many continents there are, so that’s a landmine I don’t want to step on.
While thinking about this I did stumble upon an interesting point though – we might be able to enhance accuracy and understandability at the expense of legibility if we introduce : (colon) for tagging sections. We could have a value like PLANET:EARTH-CONTINENT:ANTARCTICA to represent Antarctica, on planet Earth.
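Putting the three separators together, parsing might look something like the sketch below. This is just my reading of the rules so far, not a spec: - separates levels, + attaches modifiers to the level it follows, and : splits an optional tag from its value. The Segment class and the parse function are hypothetical names, and this version deliberately ignores values that contain a hyphen themselves (negative numbers, for instance).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Segment:
    value: str                         # e.g. "EARTH"
    tag: Optional[str] = None          # e.g. "PLANET"; None when untagged
    modifiers: list[str] = field(default_factory=list)  # e.g. ["ORBIT"]


def parse(address: str) -> list[Segment]:
    """Split an address into segments, ordered from least to most specific."""
    segments = []
    for part in address.split("-"):
        # Anything after a "+" modifies the segment it is attached to.
        head, *modifiers = part.split("+")
        # partition() leaves `sep` empty when there is no ":" in the head.
        tag, sep, value = head.partition(":")
        if not sep:
            tag, value = None, head
        segments.append(Segment(value=value, tag=tag, modifiers=modifiers))
    return segments
```

Parsing PLANET:EARTH-CONTINENT:ANTARCTICA yields two tagged segments with no modifiers, while EARTH+ORBIT yields a single untagged EARTH segment carrying the ORBIT modifier.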
We can handle countries with a similar strategy as continents, and use country names (if short enough) or ISO country codes. A value like PLANET:EARTH-CONTINENT:NORTH_AMERICA-COUNTRY:USA would be a decent (if overly verbose) way of specifying
Sidetrack - It would probably be a good idea to build in some sort of elision mechanism – if there’s only one registered country called
PLANET:EARTH, it might make sense to allow elision of
This is where things get much trickier – as we discussed before, there are lots of ways we could usefully split up the land mass of existing countries:
Of course, country-specific demarcations are the most human understandable (generally), but they’re also rife with complications. Tokyo is colloquially called a “city”, but it’s also a “prefecture” of Japan, and it is absolutely not what Americans normally consider a city-sized landmass or population. There are also areas that countries own/control outside their main landmass (for example U.S. territories). And of course there are the various land disputes between countries in the Asian corridor.
Also, this would be a good time to realize that we’re very likely going to require this standard to be in UTF-8 – it would be unfair to force everyone in the world to translate their country name to English (though English could be used as a fallback). The next realization is probably that states/provinces/prefectures almost always have even more categories beneath them like county/city/state/town, and even more beneath those like neighborhood/street/corner/junction. Luckily, I think those issues are covered decently by the tagging and specificity “features” we’ve introduced so far – as accuracy goes up readability suffers but doesn’t become impossible, just more tedious (and elision would help in some cases anyway).
Let’s say our current scheme is good enough to encompass the country-specific demarcations. How do we actually solve the problem at hand? Which sub-country addressing scheme should we pick? Luckily we’d already stumbled onto the idea of adding a tagging semantic, which means the answer I’m going to choose here is all of them, but in particular, the right one for your use case.
The easiest for me to try and start defining (and likely the most useful for those who might use this) is country-specific demarcations – so I’ll just include some examples of those:
COUNTRY:USA-COMPASS:EAST (we can elide PLANET:EARTH since the USA only exists on one planet for now)
LAT:40.730610-LON:-73.935242 (we don’t need to worry about difficulties parsing -73 since we have the
And for some fun here are some that might be appropriate for compute on satellites/underground bases:
EARTH+ORBIT-COUNTRY:USA+ABOVE - any satellite orbiting the earth that is “above” the USA
COUNTRY:ENG+BELOW - any underground bunkers in England
COUNTRY:USA-STATE:TX+MOBILE - compute that is mobile, anywhere in the state of Texas
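All of the examples so far, including the lat/lon one with its negative longitude, can be split with a single rule if we assume a - that comes straight after a : is a minus sign rather than a level separator. That assumption is mine, as are the function names; it's a sketch of one way the tags could keep parsing unambiguous.

```python
import re

# Split on "-" unless it directly follows ":" (where it is read as a minus sign).
_LEVEL_SEPARATOR = re.compile(r"(?<!:)-")


def split_levels(address: str) -> list[str]:
    """Break an address into its hyphen-separated levels."""
    return _LEVEL_SEPARATOR.split(address)


def coordinates(address: str) -> dict[str, float]:
    """Pull any LAT/LON values out of an address as floats."""
    coords = {}
    for level in split_levels(address):
        tag, _, value = level.partition(":")
        if tag in ("LAT", "LON"):
            coords[tag] = float(value)
    return coords
```

Here LAT:40.730610-LON:-73.935242 splits into exactly two levels, and the satellite and bunker examples split the same way they would with a naive split on hyphens.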
A lot goes into making a standard easy to understand and use (for example, a good reference implementation), and for these back of the napkin thoughts I found it also interesting to consider some other points:
Obviously, there’s going to need to be a versioning scheme, and something like an assumed version if none is specified. Maybe requiring the version to be the first tag (if specified) might be enough. It could also be a good way to implement custom specs – for example inside some organization that has codenames for regions, it might make sense to have V:MYORG-LOCATION:PANDA to denote a location. A more realistic example could be
It’s probably a good idea to make sure the entire scheme uses only UTF-8 – as mentioned earlier, it’s a little unreasonable to require everyone to convert their names to English if not absolutely necessary.
Obviously, some of the best-known areas should probably have aliases and shortcuts. Elision rules should help most normal cases (for the foreseeable future “california” can probably expand to PLANET:EARTH-COUNTRY:USA-STATE:CA), but it might be nice to be able to define static terms that alias/expand to a very specific area.
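Aliases and elision could fit together roughly like this. It's a minimal sketch, assuming a small hand-maintained alias table and a single elision rule (no PLANET tag means Earth); ALIASES, DEFAULT_PLANET and expand are hypothetical names I'm using for illustration.

```python
# Hypothetical alias table: short, well-known names mapped to full addresses.
ALIASES = {
    "california": "PLANET:EARTH-COUNTRY:USA-STATE:CA",
}

# Assumed elision rule: an address without a PLANET tag is on Earth.
DEFAULT_PLANET = "PLANET:EARTH"


def expand(address: str) -> str:
    """Expand a known alias, otherwise restore an elided planet prefix."""
    alias = ALIASES.get(address.lower())
    if alias is not None:
        return alias
    if address.startswith("PLANET:"):
        return address
    return f"{DEFAULT_PLANET}-{address}"
```

So california (in any casing) expands to the full California address, and COUNTRY:USA-COMPASS:EAST gets PLANET:EARTH put back in front. Untagged planets like EARTH+ORBIT aren't handled here; a real set of elision rules would need to say what to do with those.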
Here’s a list of some AWS/GCP/Azure regions and how they match up with this scheme:
The examples above aren’t the most succinct possibility, but they’re likely what someone might write as a first pass, which is both relatively readable and has the possibility to be more accurate or general than the current cloud vendor hard-coded values.
Well, this was a pretty fun thought experiment for me, and from my armchair the end result even looks usable. I’d love to hear feedback if anyone’s spent time thinking about this.
If you’ve made it all the way down here, thanks for reading!