Real Time Operating Systems

A big part of designing systems that are reliable and secure enough to be trusted even under the harshest conditions is the concept of timeliness. If you are doing an important task that requires constant diligence, and your answers must be timely in order to be considered correct, then it doesn't matter how perfect your algorithms are if they are operating on incoming data that's just a little bit too old.

Systems that have to adapt and respond in real time must take special care not only to react as quickly as they can, but also to ensure that the way they act, especially in the presence of faults and uncertainties, produces a reliable and stable outcome no matter what else is happening around them. You may even be surprised to learn that the number one determinant of safety in these types of systems is not how fast they react, but rather how they ensure they give the most correct response despite adverse conditions and limited time to make decisions.

You can actually think of application blockchains like Ethereum as real-time systems. They respond to complicated conditions and dynamically adjust to the presence of faults, almost entirely without human intervention. But let's focus on what makes them actually "real-time" in the sense of being a system that has to maintain liveness no matter what else is going on, because some of the best properties that make Ethereum an attractive place for serious, high-value applications are the reliability and fault tolerance that emanate from this design principle.

Soft Real-Time vs. Hard Real-Time

There are plenty of examples of real-time systems where failing to make a decision in a timely manner critically affects how safe the system can actually be. Airplanes, for instance, are some of the most critical and high-reliability systems we interact with on a regular basis. When flying, airplanes have to make quick adjustments to their flight control surfaces in response to unpredictable and often chaotic movements of the air around them, which, if left unchecked for too long, would cause the airplane to tumble out of the sky and crash. Quick, timely inputs are absolutely critical to flight safety.

To account for this, aerospace engineers design their systems with hard real-time constraints, meaning that no matter what happens with the software, it must always respond within a short time period or risk losing control of the entire aircraft. This property holds even if it means abandoning a calculation that has already started but not yet completed, because a result delivered after the expected deadline is essentially useless for maintaining control.
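To make the "abandon the calculation" idea concrete, here is a minimal sketch in Python. It only illustrates the pattern: Python and asyncio offer no hard real-time guarantees, and the names `control_step`, `fast_solver`, and `slow_solver`, along with the deadline values, are made up for this example.

```python
import asyncio


async def control_step(compute, deadline_s, fallback):
    """Run one control calculation, abandoning it if it misses the deadline."""
    try:
        # wait_for cancels the pending calculation once the timeout expires.
        return await asyncio.wait_for(compute(), timeout=deadline_s)
    except asyncio.TimeoutError:
        # A late answer is useless here, so fall back to a known-safe output.
        return fallback


async def fast_solver():
    await asyncio.sleep(0)  # hypothetical calculation that finishes in time
    return "trim +2 degrees"


async def slow_solver():
    await asyncio.sleep(5)  # hypothetical calculation that overruns
    return "trim +3 degrees"


if __name__ == "__main__":
    print(asyncio.run(control_step(fast_solver, 0.2, "hold last setting")))
    print(asyncio.run(control_step(slow_solver, 0.2, "hold last setting")))
```

The slow calculation is cancelled at the deadline rather than allowed to finish, so the caller always gets an answer on time, even if it is only the safe fallback.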

Not all real-time systems need to maintain this level of strictness, however. Sometimes we still care about timeliness in a relative sense, but we can be a little flexible on the actual deadline if it means providing a correct answer, as long as that answer doesn't take too long to compute. In fact, we can describe this as more of a spectrum: the longer it takes, the less useful the computation will be to the system, but that usefulness is not black and white like it is with hard real-time systems. We call this type of system a soft real-time system.
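One way to picture the difference is as a "usefulness" score for a result that arrives some time after it was requested. The sketch below is only an illustration: the linear decay and the `grace` parameter are assumptions chosen for simplicity, not something every soft real-time system follows.

```python
def hard_utility(elapsed: float, deadline: float) -> float:
    """Hard real-time: fully useful up to the deadline, worthless after it."""
    return 1.0 if elapsed <= deadline else 0.0


def soft_utility(elapsed: float, deadline: float, grace: float) -> float:
    """Soft real-time: usefulness fades over `grace` seconds past the deadline.

    The linear fade is an assumption made for illustration only.
    """
    if elapsed <= deadline:
        return 1.0
    overrun = elapsed - deadline
    return max(0.0, 1.0 - overrun / grace)


if __name__ == "__main__":
    for elapsed in (0.5, 1.5, 3.0):
        print(elapsed, hard_utility(elapsed, 1.0), soft_utility(elapsed, 1.0, 1.0))
```

With a hard deadline the score drops straight from 1 to 0; with a soft deadline a slightly late answer still has some value, just less of it.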

We could describe blockchains like Ethereum as soft real-time systems because, while they generally target consistent block confirmation and slot finalization times, they can occasionally fail to reach consensus for periods a little longer than their targets and still maintain adequate safety. Still, their timeliness guarantees are important, because without them the lending and borrowing systems, margin calls, liquidations, oracle price updates, etc. become a little too out of date, and the applications that depend on those system-level guarantees inch closer to catastrophic outcomes.

Tortoise and the Hare

While speed is nice because with more speed we can fit in more computations, as we said before, the primary determinant of whether a system will be successful in its goals is whether it is capable of delivering consistent, reliable, and timely results. If you have a system that produces bursts of computation at random times, and it cannot scale to meet that demand within the time period required, it could overrun the expected deadline, which for a soft real-time system may mean that the results you produce are out of date by the time the actions are taken. In the context of a smart contract application that runs on an application blockchain, if we take longer than a full block period to produce an action in response to some conditional trigger that we'd like to land in the next block, then our action may no longer be consistent with the state of the blockchain at the point where we decided to take it.
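As a rough sketch of that last point, an off-chain bot could tag every decision with the block it was based on and refuse to submit it once the chain has moved on. Everything here is hypothetical (`Decision`, `is_stale`, `submit_or_recompute`, and the `max_lag` tolerance are invented for illustration), and real code would read the current head from a node rather than take it as an argument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    observed_block: int  # block height whose state the decision was based on
    action: str          # hypothetical description of what we want to do


def is_stale(decision: Decision, current_head: int, max_lag: int = 0) -> bool:
    """True if the chain has advanced more than `max_lag` blocks since we decided."""
    return current_head - decision.observed_block > max_lag


def submit_or_recompute(decision: Decision, current_head: int) -> str:
    """Only act on a decision that still matches the state it was made against."""
    if is_stale(decision, current_head):
        return "recompute"
    return f"submit:{decision.action}"


if __name__ == "__main__":
    decision = Decision(observed_block=100, action="liquidate")
    print(submit_or_recompute(decision, current_head=100))  # still fresh
    print(submit_or_recompute(decision, current_head=101))  # a block too late
```

Throwing away a late decision and recomputing it is the soft real-time trade-off in miniature: a correct action one block later usually beats an outdated action right now.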

How we design off-chain systems that interact with on-chain ones should also reflect this principle. That doesn't mean we should always go out and pick the fastest tools for the job, because what works most of the time may not work in periods of high demand. We should build off-chain systems that scale well in response to events on-chain, so that when those times come, we are ready for them. This may mean picking algorithms and architectures that take measurements continuously, in parallel with the actions we want to trigger based on those measurements. Searching for a good opportunity to act in a real-time system is more about anticipating the best move to make next than blindly reacting to every event in an effort to keep up with others.
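Here is one hedged sketch of what "measuring continuously in parallel with acting" might look like, using plain asyncio. It is not the Silverback API; `LatestValue`, `measure`, `act`, the readings, and the intervals are all invented for illustration. The measuring task keeps a single latest reading up to date, and the acting task always decides from that latest reading instead of working through a backlog of old events.

```python
import asyncio


class LatestValue:
    """Holds only the most recent measurement; older ones are overwritten."""

    def __init__(self):
        self.value = None

    def update(self, value):
        self.value = value


async def measure(latest, readings, interval):
    # Stand-in for polling a price feed or watching new blocks.
    for reading in readings:
        latest.update(reading)
        await asyncio.sleep(interval)


async def act(latest, threshold, interval, rounds):
    # Decide on a fixed schedule, always from the freshest measurement.
    actions = []
    for _ in range(rounds):
        await asyncio.sleep(interval)
        if latest.value is not None and latest.value > threshold:
            actions.append(latest.value)
    return actions


async def main():
    latest = LatestValue()
    measurer = asyncio.create_task(measure(latest, [1, 5, 9, 2], 0.01))
    actions = await act(latest, threshold=4, interval=0.025, rounds=2)
    await measurer
    return actions


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Because the actor never queues events, a burst of measurements cannot make it fall behind. It simply skips readings it did not have time for and acts on what is true now.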

...and Gorillas, oh my!

No thought piece would be complete without a good shill, and I really started thinking about writing this short article in response to a discussion with my development team on the design principles of our upcoming Silverback Application Platform product. As people start to build out their own applications using the fully free and open-source Silverback SDK (you are trying it, aren't you anon?), we hope they will incorporate some of these design principles in their work and remember to always pick the solution that favors timeliness and correctness over speed and algorithmic complexity. Sometimes it really is the slow and steady that wins the race!

Happy hunting, apes!

Real Time Operating Systems

...for the crypto newbie

0x1C277bD41A276F87D3E92bccD50c7364aa2FFc69
