Blog

Jan 20, 2012

Old UNIX/IBM control systems: Potential time bombs in Industry

Posted by in categories: cybercrime/malcode, defense, events, existential risks, military, nuclear energy

It may be a point that attracts little attention: the millennium bug came with a lot of hoo-ha and went out with a whimper, yet its impact on business was small because of all the hoo-ha, not in spite of it. It is therefore with some concern that I consider operating-system rollover dates as a potential hazard at major industrial operations such as nuclear power stations and warhead control facilities, where, in the worst-case scenario, a software malfunction in an out-dated control system could have disastrous implications.

The main dates of interest are 19 January 2038, by which time all 32-bit Unix operating systems (whose signed 32-bit count of seconds since 1970 overflows on that date) need to have been replaced by at least their 64-bit equivalents, and 17 September 2042, when IBM mainframes that use the classic 64-bit Time-of-Day clock count will see that register wrap.
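Both dates fall out of simple arithmetic: the Unix counter overflows at 2^31 − 1 seconds after 1 January 1970, and the IBM Time-of-Day clock is a 64-bit register whose bit 51 ticks once per microsecond from a 1 January 1900 epoch, so it spans 2^52 microseconds before wrapping. A quick sketch in Python (illustrative only) confirms the dates:

```python
from datetime import datetime, timedelta, timezone

# 32-bit signed time_t: seconds since the Unix epoch, overflowing at 2**31 - 1.
unix_epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
unix_rollover = unix_epoch + timedelta(seconds=2**31 - 1)
print(unix_rollover)  # 2038-01-19 03:14:07+00:00

# IBM TOD clock: 64-bit register, bit 51 ticks once per microsecond,
# epoch 1 January 1900, so the full register spans 2**52 microseconds.
tod_epoch = datetime(1900, 1, 1, tzinfo=timezone.utc)
tod_rollover = tod_epoch + timedelta(microseconds=2**52)
print(tod_rollover)  # 2042-09-17 23:53:47.370496+00:00
```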

Scare mongering? Perhaps not. While all modern facilities will have the superior time representation, I question whether facilities built in the 70s and 80s, in particular those behind the old iron curtain, were or ever will be upgraded. This raises the concern that, for example, the old Soviet nuclear arsenal could become a major global threat within a few decades through malfunction, if not decommissioned or its control systems upgraded. It is one thing for a bank statement to print the date wrong on your latest bill due to millennium-bug-type issues; it is quite another matter entirely if automated fault-tolerance procedures contain coding such as ‘if(time1 > time2+N) then initiate counter-measures’.
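To make that contrast concrete, here is a minimal sketch (in Python, emulating a signed 32-bit counter; the variable names follow the pseudocode above and are purely illustrative) of how such a guard misbehaves at the 2038 wrap:

```python
def int32(n):
    """Emulate two's-complement wrap of a signed 32-bit time counter."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n

ROLLOVER = 2**31 - 1            # 03:14:07 UTC, 19 Jan 2038, in Unix seconds

time2 = int32(ROLLOVER)         # reading taken just before the wrap
time1 = int32(ROLLOVER + 1)     # one second later: wraps to Dec 1901

# A watchdog coded as 'if(time1 > time2+N) then initiate counter-measures'
# expects the newer reading to exceed the older one only after N seconds
# of silence. Across the wrap, the newer reading instead appears roughly
# 136 years EARLIER, so any interval logic built on it produces nonsense.
print(time1 - time2)  # -4294967295, instead of the true interval of 1
```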

I believe this is a topic which warrants a higher profile lest it be forgotten. Fortunately the global community has a few decades in hand to handle this particular issue, though it takes just one un-cooperative facility to accept such a risk rather than perform the upgrades necessary to ensure no such ‘meltdowns’ occur. Tick-tock, tick-tock, tick-tock…

16 Comments — comments are now closed.

  • Niccolò Tottoli on January 20, 2012 3:51 am

    A global ban on nuclear weapons and nuclear power plants would be the very best.

  • GaryChurch on January 20, 2012 2:35 pm

    Y2K again?
    Puh-leez.
    This is NOT a conspiracy theory blog. You and Otto need to start your own website.

  • Tom Kerwick on January 21, 2012 3:54 am

    Gary — yet again you display your lack of understanding of an issue raised. To put this in context: I’ve worked in firmware engineering for almost 20 years, and find that on many occasions what’s termed ‘clock wrap’ is the cause of the most catastrophic failures in systems. I’m also very familiar with control systems, which were no small part of my bachelor’s and master’s education, and I would have to say I know a thing or two about industrial safety procurement too, it being the subject matter of my recent doctorate — check my bio and linkedin page if you have your doubts. I know what I’m talking about. Now, please don’t attempt to discredit an important topic based on your ‘shoot from the hip’ angst. Where at any point did I mention a ‘conspiracy’ of some sort? How was Y2K a conspiracy either? Think a bit before you post more drivel comments.
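For readers unfamiliar with the ‘clock wrap’ failure described here, the standard firmware defence is to compare modular differences of timestamps rather than the raw values. A minimal sketch in Python, assuming a free-running 32-bit tick counter (the names and tick width are illustrative, not from any particular system):

```python
MOD = 2**32  # a free-running 32-bit tick counter wraps modulo 2**32

def elapsed(now, then):
    """Wrap-safe elapsed ticks: the modular difference stays correct
    even when the counter rolled over between the two readings."""
    return (now - then) % MOD

then = MOD - 5   # reading taken five ticks before the wrap
now = 3          # reading taken after the counter rolled over

# The naive comparison 'now > then' fails across the wrap...
assert not (now > then)
# ...but the modular difference still yields the true interval.
assert elapsed(now, then) == 8
```

The idiom works for any interval shorter than half the counter's range, which is why well-written firmware expresses timeouts as differences and never as absolute deadline comparisons.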

  • GaryChurch on January 21, 2012 4:04 pm

    ” I know what I’m talking about. ”

    If you say so.

  • Brandon Larson on January 22, 2012 12:47 am

    I think the danger of nuclear weapons malfunctions like you describe is insignificant to nonexistent. The Soviets were pretty good at exercising safety measures over their weapons. Keep in mind they were officially atheist, so they were not in any hurry to get to the next life, unlike a lot of people in the West and Middle East. Second, they had a serious distrust of any form of automation, drawn from their Marxist ideology and the fact that Russian culture is very paranoid at every level (understandable considering their history over the past thousand years); they called automation “the whore of capitalism”. I seriously doubt that any Eastern Bloc nuke will fire without some kind of manual intervention. Considering the maintenance requirements for any kind of nuclear weapon more advanced than Little Boy, and the state of Russia today, I would bet that a large part of their nuclear force is inoperable due to maintenance issues. If anything, the kind of computer malfunction you describe would render the affected weapon inoperable rather than causing an accidental launch or detonation.

  • Tom Kerwick on January 23, 2012 11:15 am

    Brandon — thanks for the feedback. I believe you are correct in suggesting that this kind of computer malfunction would more likely render a weapon control temporarily inoperable than pose any graver risk (and yes, in the general case, weapons control would surely involve a certain level of manual control also). This owes not to such clock-wrap bugs being absent (they are most certainly present, as the scenarios are naturally untested) but to classic control-system design being always ‘fail safe’ — so there may be weaknesses in the operating systems, but the applications which run on them would have been designed to military standards. The greater risk I perceive is at nuclear power plants (albeit also built to military design standards), where for example a control system on reactor core temperature could go amok in such clock-wrap scenarios. However, I would suspect such a malfunction could be handled long before any meltdown occurred, so long as the operations engineers identified what they were dealing with as the classic clock-wrap failure quickly enough.

    I was considering writing a whitepaper on the subject. The risk is low probability, though the unlikely disaster scenario could be unprecedented. As such I would be surprised if such control systems are not fully phased out long before the key clock-wrap dates, though you never know — some in industry would lean towards not investing in such an expensive upgrade, and taking that small risk instead, perhaps just having staff on standby to handle any potential malfunction on the clock-wrap dates. A curious one to keep an eye on, I’d suggest. Maybe. Tick-tock, tick-tock, tick-tock…

  • Brandon Larson on January 23, 2012 6:22 pm

    I have some experience with nuclear reactors, and from what I have seen the only automatic controls on a reactor are the safety controls (i.e. SCRAM circuits), and all they can do is shut the thing down. Everything else is manually controlled. The latest American reactors were built on 70s technology, and I would imagine reactors in the old Soviet Union are manually controlled because of the state of technology when they were built, and for the reasons I outlined in my last post about weapons. One issue with nuclear power is that if a reactor does go down hard, it takes much longer to bring it back online than a fossil fuel plant. Without backup from conventional power plants, a system crash could bring down a large part of the grid. I think this is the biggest danger from this type of failure.

    It is possible that the newest reactors coming online in the West are dependent on computer control, and that would be worth investigating. Of course, if they are using the latest hardware and software the problems you describe do not apply.

  • Tom Kerwick on January 24, 2012 1:25 am

    Brandon — thanks for the further feedback. I do not have specific experience with nuclear reactors, so it is good to get an inside opinion. I am surprised that everything is manually controlled, but as you say, if these are built on 70s technology then that makes quite a bit of sense. Could you give me a pointer to how such SCRAM circuits operate on nuclear reactors — are these run on mainframes or largely analog based? I’d guess there is a layer of control system between human operator and the reactors — such as self-regulating circuits — but again these would not be mainframe-based control systems, so not in the risk category. I had heard that before about reactors being much more difficult to bring back online. As for newer reactors, a lot of more recent IBM mainframes still use the 64-bit count for convenience, though I doubt such shortcuts would be taken in nuclear facilities. Perhaps the risk to such old UNIX/IBM control systems lies elsewhere in industry and not in the nuclear sector after all. Thanks again.

  • Tom Kerwick on January 24, 2012 4:40 am

    Brandon — just a quick follow-on: I may have dismissed the relevance of this to the nuclear industry too readily in my last response. Although the systems are under manual control, there will always be a control system residing between the manual controls and the electricals/mechanicals. The operating technician will set the controls, but the control system applies them — in the same way that your car is manually controlled, but with power-steering you have a mini control system between you and the automobile. Not the best of examples, but you get the idea, I’m sure.

    If you have a contact in the industry who could assist me in researching the firmware which interfaces the manual controls to the nuclear plant electricals/mechanicals, please forward the details on to me, as it could assist greatly in my study of this — cheers.

  • Brandon Larson on January 24, 2012 9:05 am

    My only experience is with Navy light water reactors, but I’m sure the basic control systems are the same. I don’t remember all of the specifics, so maybe someone else with more recent experience can expand on this. Basically, everything is analog. The criticality is measured by neutron flux sensors, and the rod position is set manually based on criticality and other sensor readings. The SCRAM circuit just drops the control rods to the bottom if it detects a problem. This is also analog, based on analog sensor readings. There is no computer control anywhere in the system; it’s all analog sensors and manual controls. From what I understand they now have what is called a Partial Fast Insertion system along with the SCRAM system, which automatically lowers the rods an inch or so to bring the reactor below critical if there is a problem, but this works the same way as a SCRAM system. The newer reactors may have digital displays, and there may be some provision for automatic rod control to adjust criticality, but the failsafes still work on analog sensor readings without computer control. Like I said, my knowledge is about 20 years old, so if someone has more recent experience please correct me if I am wrong about anything.

    The biggest problem I can see is if backup systems are computer controlled without adequate monitoring by humans on site. The Fukushima disaster happened because the backup generators were knocked out by the tsunami causing the reactors to overheat. The main concern in a reactor accident is keeping coolant flow through the core. That is why backup power is so important, to keep the coolant pumps on line. You lose coolant flow and you will most likely lose the reactor.

  • Brandon Larson on January 24, 2012 5:08 pm

    To answer another of your questions, I don’t have any industry connections, but if you contacted some utilities that operate reactors, or someone in the DOE they might be able to help you.

  • AnthonyL on January 26, 2012 7:47 am

    “Think a bit before you post more drivel comments”

    @Kerwick But Tom, Gary’s comments, though they reflect no thought or information on the subject of discussion, are very helpful, because they provoke justification from you for your excellent topic, and clarify its validity for anyone else who shares the same uninformed doubt, but who might be more polite, and cautious about sharing it.

    I don’t think you should discourage drivel of this kind if it is short. It shares the same purpose as the valuable posts of Professor Rossler, to allow the issues to be justified, and carried to the forefront of the minds of people who might otherwise dismiss topics as contrary to the headlines of the New York Times, and therefore not worthy of examination.

    This is the excellent public service of Lifeboat as a whole, as well as of your interesting post: it brings to our attention imaginative speculation which deserves consideration even though the editors of the New York Times have not yet noticed it, and which serves as an early warning of possible dangers which may affect the security of humanity as a whole.

  • Tom Kerwick on January 26, 2012 10:07 am

    An interesting viewpoint Anthony — it is true that in this case a comment which reflected no thought or information from one individual actually fueled a serious debate amongst others on it which might not otherwise have happened. The unintentional public service brought to you by a ‘Puh-leez’ merchant.

    Brandon — thanks for your suggestion. I actually enquired through the media relations personnel at Sellafield Ltd for information but so far they have fallen silent.

  • AnthonyL on January 26, 2012 11:19 am

    Tom, since few here seem willing to actually read references such as your own paper would you please give us a bottom line summary of a) what you found which leads you to b) suggest the points should be cleared up in a safety conference on the LHC?

  • Tom Kerwick on January 26, 2012 11:29 am

    Anthony — the topic of this thread has nothing to do with CERN/LHC safety procurement. It is about old industrial control systems in industry — let’s discuss the LHC in a more relevant thread. As for the topic of this thread, I intend to research it a little further before deciding whether it warrants writing a whitepaper.

  • AnthonyL on January 26, 2012 9:58 pm

    Whatever thread you prefer, same request, if you wish to.