Explaining the EUROCONTROL Network Manager systems outage on 3 April 2018 - Summary briefing released
Much to our regret, the flight plan data in the Network Manager’s (NM) Initial Flight Plan Processing System (IFPS) and Enhanced Tactical Flow Management System (ETFMS) was accidentally deleted on 3 April 2018. As is usually the case with complex systems, the underlying reasons are not that simple! The Summary Brief on this outage is now available.
Background information: the NM systems are structured in two distinct and theoretically separate environments: the "test" environment and the "operational" environment. New software versions are installed and tested on the "test" environment.
In brief, a routinely used automated script initialises systems by uploading a new set of airspace data. Its first step is to erase the content of the existing databases, in the operational test domain only.
Through a series of interacting errors and weaknesses, the script was able to propagate to the operational system and delete the live flight data. The simultaneous deletion of all flights from the live databases of all NM operational instances was the trigger for the system outage that caused the service disruption at the EUROCONTROL Network Manager on 3 April 2018. A further complication also affected the contingency site at the Experimental Centre in Brétigny-sur-Orge, France.
An assessment confirms that the impact on traffic was kept under control and that a safe level of traffic was maintained at all times. EUROCONTROL has taken immediate measures to avoid a repetition of a similar outage: suspending testing, fixing the faulty variable and removing the backdoor.
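The briefing does not describe the script's internals, so the following is only a minimal sketch of one common safeguard against this class of failure, not EUROCONTROL's actual fix. It assumes each database carries its own environment label, and the destructive "erase" step checks that label on the target itself rather than trusting a variable set elsewhere in the script. All names here (`Database`, `erase_for_reinitialisation`, the `"test"` and `"operational"` labels) are hypothetical.

```python
from dataclasses import dataclass, field


class EnvironmentMismatchError(RuntimeError):
    """Raised when a destructive step targets a database outside the allowed environment."""


@dataclass
class Database:
    # Hypothetical stand-in for a flight plan database.
    name: str
    environment: str  # assumed labels: "test" or "operational"
    flights: list = field(default_factory=list)


def erase_for_reinitialisation(db: Database, allowed_environment: str = "test") -> int:
    """Erase the database content only if the target itself is labelled as test.

    Returns the number of records removed.
    """
    # The check reads the label stored on the target, so a wrongly set
    # script variable cannot redirect the erase to live data unnoticed.
    if db.environment != allowed_environment:
        raise EnvironmentMismatchError(
            f"refusing to erase {db.name!r}: environment is {db.environment!r}"
        )
    removed = len(db.flights)
    db.flights.clear()
    return removed
```

With this sketch, erasing a database labelled "test" succeeds, while the same call against one labelled "operational" raises `EnvironmentMismatchError` and leaves its flight data untouched.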
Communication and coordination
Throughout the incident, the Network Manager Operations Centre (NMOC) coordinated closely with the air navigation service providers (ANSPs) so as to make the best possible use of the remaining capacity.
A teleconference with over 170 participants was held to clarify the situation and answer questions from airspace users, ANSPs and airports.
NMOC also communicated directly with airspace users and airports to help them deal with the issues caused by the outage.
The ATFM Procedural Contingency Plan was activated, which included precautionary reductions in air traffic control capacities and lower departure rates from airports. Whilst this ensured a safe level of traffic throughout the European ATM Network, by design it had a negative impact on network performance.
Provisional figures show that, between 12.30 and 18.00 UTC on that day, there were additional departure delays of around ten minutes on average, compared with the average expected departure delay of three minutes per flight.
Only a few flights were cancelled.
Among the larger airports, the data show a notably lower impact on London Heathrow, Paris Charles-De-Gaulle and Frankfurt, with a higher impact on Istanbul Atatürk, Amsterdam and some Spanish and Portuguese airports.
The most pressing issue on 3 April 2018 was the additional workload for airspace users, as they had to refile their flight plans, and for the Air Traffic Service Units, which had to enter flight details into their local systems manually.
The outage timeline
The outage occurred at 10.26 UTC. The need to prepare for a contingency plan was announced to stakeholders on the NOP at 11.01. The contingency plan started at 12.26. The cause of the technical failure was identified at 13.00. Flight plans were resubmitted by airspace users and, once this was done, ETFMS returned to operational status at 18.00.
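To put the timeline in terms of elapsed time, the short sketch below computes how long after the outage each milestone occurred. The times are those given above, all assumed to be UTC on 3 April 2018. The event labels and the `minutes_since_outage` helper are illustrative only.

```python
from datetime import datetime

# Times (UTC) taken from the timeline above, all on 3 April 2018.
EVENTS = [
    ("outage occurred", "10:26"),
    ("contingency preparation announced on the NOP", "11:01"),
    ("contingency plan started", "12:26"),
    ("cause of the failure identified", "13:00"),
    ("ETFMS returned to operational status", "18:00"),
]


def minutes_since_outage(events):
    """Map each event label to whole minutes elapsed since the first event."""
    start = datetime.strptime(events[0][1], "%H:%M")
    return {
        label: int((datetime.strptime(time, "%H:%M") - start).total_seconds() // 60)
        for label, time in events
    }


elapsed = minutes_since_outage(EVENTS)
for label, minutes in elapsed.items():
    print(f"+{minutes // 60}h{minutes % 60:02d}  {label}")
```

On these figures, the contingency plan started two hours after the outage, and ETFMS returned to operational status 7 hours 34 minutes after it.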
Improving NM system resilience
The internal enquiry launched by Director General Eamonn Brennan identified further measures to prevent similar failures and to improve recovery procedures. Additionally, EASA carried out an unscheduled audit, leading to further improvements being implemented as part of a Corrective Action Plan.
Finally, work is ongoing to develop a long-term investment plan for our network technical systems to make them ready for SESAR deployments.