I propose that the whole idea of an operating system that must restart in order to update or recover from errors is fundamentally flawed, a holdover from the days when processor time and memory were drastically more limited and far more carefully budgeted than they need to be now. If we hold on to programming habits and traditions formed in those early days, future progress may slow or even halt when our technology begins demanding more efficient, dynamic infrastructures on which to think and perform calculations. When that day comes, we may be left scratching our heads, wondering what the big holdup is, unless we are ready.
Why do we have files in our systems that the operating system must protect from change while running? I suspect this habit originated in the same era in which the first two digits of the year were dropped to save memory. Those programmers didn't consider that the year would eventually stop starting with nineteen, or that such savings in memory would become trivial within a few short years. My guess is that, to keep the processor free to run programs, early designers arranged to read certain system files only once: by loading them at startup and deeming them "untouchable" during operation, the system avoided rereading the same information while running. Whether I have this particular detail right or not, I think the basic idea is rooted in processor usage or memory management somehow. Either way, those programmers did not foresee that computer processors would one day perform calculations far faster than most users require, and that memory would be measured in terabytes rather than kilobytes.
In order to clearly see a need for change, we can travel into the future. Imagine a robotic surgeon performing an emergency surgery in a remote area of the world. This robot may be completely autonomous, or it may be remotely guided in some capacity by a human surgeon. The operation begins, the first slice cleanly revealing the innards of our poor, doomed example subject. A few cuts later, a major organ is in jeopardy. There is a thirty-second window in which to take the precautions needed to prevent this patient from being seriously injured or even killed. Our multi-limbed machine is fully capable of performing this task in under ten seconds, but just as the thirty-second window opens, a fatal error occurs in the operating system code. A blue screen of death shines in the background while our patient begins to die in the foreground. A redundant system detects the problem and restarts the main computer. Backup processes could have carried out the instructions necessary to save this man's life, but the data has been corrupted and needs to be restored. Luckily, this is the future, and our system restarts in just under ten seconds. The local data is restored by logging into the robot's online data cloud and reconstructing the damaged areas. Finally, nearly fifteen seconds into this critical countdown, the surgeon begins saving the dying organ. Unfortunately, the process takes twenty seconds, and fatal damage has already been done. The patient dies.
Admittedly, even more redundant systems would probably exist on such an important machine, several of them fully capable of accessing uncorrupted, collectively managed data and completing the surgery without incident. But my point wouldn't be made quite so clearly if redundant systems had saved the patient. The question is: why should our systems need to restart when an error is encountered? Shouldn't proper, modular programming techniques circumvent the need to reboot the entire operating system? It is both expensive and impractical to simply give every important computer system several redundant copies of itself.
With processor speed so high and memory so freely available, I see no reason we can't build a certain amount of redundancy into every individual computer without installing physical clones of it. Systems exist that add redundancy to data storage (the Drobo, for example), but even these techniques cannot prevent your system from needing to restart after certain errors. In a modular operating system, a fatal error should require only the affected module to restart, not the entire system. If the failing module is a critical system module whose absence would cripple the operating system, then a redundant module should exist so that the user is never affected by the error.
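To make the idea concrete, here is a minimal sketch of that supervision pattern in Python. Everything in it is hypothetical (the `Module` and `Supervisor` names are mine, not any real operating system's API): when the primary module faults, only that module is restarted, and a redundant standby answers the request in the meantime, so callers never see an outage.

```python
class Module:
    """A tiny stand-in for an operating-system service module."""

    def __init__(self, name):
        self.name = name
        self.healthy = True

    def handle(self, request):
        if not self.healthy:
            raise RuntimeError(f"{self.name} has faulted")
        return f"{self.name} handled {request}"

    def restart(self):
        # Restarting repairs this one module without touching anything else.
        self.healthy = True


class Supervisor:
    """Routes requests to a primary module, failing over to a standby."""

    def __init__(self, primary, standby):
        self.primary = primary
        self.standby = standby

    def handle(self, request):
        try:
            return self.primary.handle(request)
        except RuntimeError:
            # Only the faulted module restarts; the standby keeps serving,
            # so the caller never notices the error.
            self.primary.restart()
            return self.standby.handle(request)


sup = Supervisor(Module("net-primary"), Module("net-standby"))
print(sup.handle("packet-1"))   # net-primary handled packet-1
sup.primary.healthy = False     # inject a fault into the primary
print(sup.handle("packet-2"))   # net-standby handled packet-2
print(sup.handle("packet-3"))   # net-primary handled packet-3
```

This is essentially the supervision approach Erlang systems have used for decades: isolate faults to one component, restart just that component, and let a redundant sibling cover the gap.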
For another silly example from the future, let's visit the computer system that handles traffic. This system monitors traffic needs and controls traffic flow for an entire city. It networks with the traffic computers of other cities to gather data about movement between them. It connects to individuals' calendars to help them get to appointments on time. It is a very busy system. Once a week, though, at three a.m. on Wednesday, it logs in to its manufacturer's website to check for updates. If it finds any, it stops traffic for ten seconds while it reboots.
Ridiculous? Yes. But if a computer is controlling traffic, the vehicles may be moving at tremendous speeds. Ten seconds standing still, rather than moving at five hundred miles per hour, could mean serious financial loss for businesses that rely on the quick movement of goods or people. Ten seconds without moving might seem like an eternity to a mother in labor on the way to the hospital, or to a young man who has swallowed his iPhone and can't breathe.
The traffic computer should be able to make changes to its system files without needing to shut down and restart. A modular system that isn’t afraid to reference itself dynamically should be able to make changes on the fly without exhibiting the archaic behavior of our current computers.
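What "referencing itself dynamically" could look like in practice is sketched below, in Python, with entirely hypothetical names. The system keeps its behavior behind a lookup table and resolves handlers at call time rather than at boot time, so installing an update is just an atomic swap of one entry, and requests keep flowing before, during, and after the update.

```python
# Registry of live services; handlers are looked up on every call.
services = {}

def register(name, handler):
    """Install or replace a service handler; callers are unaffected."""
    services[name] = handler

def dispatch(name, *args):
    """Resolve the handler at call time, not at boot time."""
    return services[name](*args)

# Version 1 of the traffic-signal logic.
register("signal", lambda cars: "green" if cars > 10 else "red")
print(dispatch("signal", 25))   # green

# A live update: version 2 adds a middle state. No shutdown, no
# reboot -- the very next dispatch simply finds the new handler.
def signal_v2(cars):
    if cars > 10:
        return "green"
    if cars > 3:
        return "yellow"
    return "red"

register("signal", signal_v2)
print(dispatch("signal", 5))    # yellow
```

Real systems that update without restarting (Erlang's hot code loading, the Linux kernel's live patching) are far more involved, but the core move is the same one shown here: indirection through a mutable binding instead of a reference frozen at startup.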
Why should we wait for such changes in operating system philosophy to become necessary? We are ready now, we have the processing power now, so let's do this now. I call on operating system programmers everywhere (and anyone else who wants to help) to organize and begin redesigning the operating system from the ground up: modular, redundant, resilient to errors, and capable of updating dynamically without ever needing to restart, reboot, or shut down.