Is the railway sufficiently prepared for a cyber-attack?
Ian Maxwell, Head of Train Control Systems at the UK’s Office of Rail and Road (ORR), provides an insight into the differences between external and internal cyber incidents, and how Britain’s rail industry must adapt in order to protect itself from both.
Measured against the lifespan of Britain’s railways, cyber-attacks are a very new risk for the rail industry to consider in its risk assessments. In the days of mechanical technology, clearly there was no issue (and the term ‘cyber’ did not exist). As the rail industry moved to electrical systems and relay-based signalling, the issue still did not exist. Yet, at that stage, the beginnings of digital forms of data transmission were creeping in. The first processor-based multiplexer data systems appeared in roles that were not safety-critical, but were still critical to operations.
Solid State Interlocking (SSI) in the mid-1980s was the first big step for the industry into processor-based safety systems. Security was a recognised issue. The technician’s terminal was a useful interrogation facility, but it also allowed some data to be modified. Initially, terminal access was only via a physical connection located next to the SSI cubicle, but it was not long before the convenience of interrogating the interlocking remotely led to the introduction of a modem link.
‘External’ and ‘internal’ incidents
Whilst cyber-attacks are clearly a concern today, it is important to remember that software-based systems are also vulnerable to errors, both at the design stage and, later, when changes are introduced. The first message, therefore, is that, while cyber-security must not be taken lightly, a cyber-attack is not the only way in which serious incidents can arise. If we think of cyber-attacks as an ‘external’ event on the system, and software errors as an ‘internal’ or self-inflicted event, we can compare software (IT) system risks with those of older technologies.
External events on the railway happen all the time. They have the potential to affect all features and locations on the system. Signalling equipment is normally protected behind padlocked secure doors and palisade fencing, which is now a normal feature around any building needed to operate the system. These physical features are the equivalent of firewalls and passwords in an IT system.
Equally, in terms of self-inflicted events on the railway, there are examples of damage caused by the actions of staff, mostly in error, although occasionally a member of staff chooses to damage the railway deliberately. Luckily, this is very rare, but it is recognised as one of the most difficult forms of risk to protect against.
Information Technology (IT) and Operational Technology (OT)
Within the cyber industry, the terms IT and OT refer to Information Technology and Operational Technology respectively. For me, this separates those parts vulnerable to a cyber-attack from those which are not. As such, I find Table 1 a useful comparison of those types of risks that the railway has lived with for a long time and those which look to be new.
Table 1 shows obvious parallels between external events in the OT world and cyber-attacks, which are a significant part of external events in the IT world. There is a blurred boundary between self-inflicted and external events when it comes to a member of staff maliciously damaging the system. Whether you categorise this as internal or external, it is often the most problematic type of event to protect against. However, there is a non-malicious version of this type of risk, which arises when a member of staff leaves a security feature unlocked or open, allowing an external party to attack. This type of event would typically be a failure to follow procedure that leaves the system open to attack.
How can we reduce and mitigate these risk types?
We can write detailed procedures to cover every situation, but errors and omissions can happen. Therefore, as an industry, we must consider how to respond when an incident happens – whether it is on the OT or IT system, and whether it is caused by internal or external causes.
What are our contingency plans for when a damaging event occurs?
I suggest that a contingency plan needs to have two phases: containment and recovery. On the railway, we have plenty of experience of both. Signallers are trained to recognise failure symptoms and to know what action is needed to determine the scope of the failure. Having contained the problem, they can determine what equipment can be relied upon to work properly and what equipment must be assumed to have failed.
Recovery typically requires staff to attend, assess and repair the damage. Meanwhile, the operation of the railway is modified until the equipment is fully functional. Both technicians and operators have rules and procedures to follow which have been developed over many years. These help to ensure a safe railway during degraded operations and during investigation and repair.
When the IT system becomes defective, the same logic should apply. Containment and recovery are still the principles that need to be followed. However, we do not have years of experience on which to base the procedures. A recent example of the impact of a system failure on the network came when temporary speed restrictions were found not to be operational on the Cambrian ERTMS area. Several questions quickly became apparent. Were there appropriate training or procedures to guide staff? How do you contain a problem that appears to affect the entire ERTMS deployment? What equipment should be treated as defective and what can be relied upon? How does the technician investigate and rectify such a defect? That is why the rail industry is looking at recommendations from the Rail Accident Investigation Branch (RAIB) and will, as part of this review, implement best practice.
This example is not the result of a cyber-attack, but the way we manage the railway is the same whether the event is caused by internal or external sources. From a business risk point of view, we need to somehow compare risks from all sources to help prioritise our efforts. There is no doubt that cyber‑attacks have the potential for causing substantial damage, but so does the weather or a dewirement.
Although cyber-threats are very real, they are not always a malicious attack, and cannot all be stopped by security techniques. The rail industry needs contingency plans to help contain and recover from such events, much as it does for other railway events. From a business risk viewpoint, the ORR encourages the rail industry to judge how big this risk is compared to all of the other risks that already exist within the industry and will continue to exist, and to manage risks that have the potential to impact safety or performance.
Ian Maxwell leads the ORR’s engagement on the deployment of digital technologies, innovation and R&D within the rail industry. He provides technical advice on all signalling issues experienced by ORR staff, ranging from ETCS and CBTC to mechanical systems, and on associated skills and competences. In total, Ian has over 30 years’ experience in the rail industry as a signal engineer, with a significant period at RSSB engaged in preparing railway standards. Currently, Ian leads the engagement with Network Rail on their R&D activities, ensuring that the work remains focused on long-term benefits to the railway. He also represents the ORR at the senior industry technology coordination group and at the industry ETCS System Authority.