WHAT happened during the New Year’s Day air traffic fiasco should not happen again. In fact, it could have been prevented.
A former senior manager for information management and data services at Airservices Australia, Augusto Verzosa, weighed in on what transpired at the Ninoy Aquino International Airport.
A technical glitch caused a shutdown of operations at NAIA for hours. Schedules of local and international flights were disrupted.
Airlines postponed international flights to prioritize the hundreds of trips that had been cancelled, delayed or diverted during the air traffic management outage.
Verzosa analyzed air traffic data and eventually managed this function across Australia while working at Airservices Australia from 2004 to 2006 and again from 2017 to 2018.
“That means understanding where the data came from, what data is generated and how data is analyzed,” he said.
According to him, what happened at NAIA on New Year’s Day was due to several “fundamental flaws,” one of which was a failure of disaster recovery and contingency management that started with the uninterruptible power supply (UPS).
He said a main UPS powers the hardware to prevent it from losing power at any given time.
“There was supposed to be the main, online UPS and the backup UPS. Apparently, both failed. The fact that both failed indicated that there was no thorough and regular disaster recovery testing of the two UPSs,” he said.
Contingency plans are crucial, he said.
“When you have a mission-critical system, and much more so with a life-critical system such as the air traffic management system at NAIA, an effective disaster recovery and contingency management capability that is regularly, diligently and thoroughly tested is an absolute requirement,” he said.
“You must make sure that airplanes are properly spaced in the airspace, both horizontally and vertically, en route and when they take off and land,” he added.
“Effective CNS (communications, navigation, surveillance) is critical. That is why you need an effective air traffic management system,” he further said.
Verzosa said this system’s disaster recovery and contingency management capability must be tested every month.
“If they had been doing so, they would have known that a double failure scenario could happen, and NAIA could have mitigated this risk and prevented the double failure. They would identify that double failure potential when they do a regular test. That requires maybe half an hour to an hour,” he said.
During such a test, operators typically cut over to the backup facility, such as the backup UPS, and then switch back to the main facility to confirm that the transition would be smooth during any outage of the main facility.
By doing this, aviation officials would know whether the backup could handle a failure, he said.
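The drill Verzosa describes can be pictured with a minimal Python sketch. It is only an illustration, not NAIA's or Airservices Australia's actual procedure: the `PowerSource` class, its `healthy` flag and the `run_cutover_drill` function are hypothetical names, and a real test would check live equipment rather than a stored flag.

```python
from dataclasses import dataclass


@dataclass
class PowerSource:
    """A hypothetical power source; `healthy` stands in for a real health check."""
    name: str
    healthy: bool = True


def run_cutover_drill(main: PowerSource, backup: PowerSource) -> list[str]:
    """Cut over to the backup, then return to the main, recording any problem.

    Returns a list of findings; an empty list means the drill passed.
    """
    findings = []
    # Step 1: cut over to the backup and confirm it can carry the load.
    if not backup.healthy:
        findings.append(f"{backup.name} could not carry the load during cutover")
    # Step 2: go back to the main facility and confirm it resumes cleanly.
    if not main.healthy:
        findings.append(f"{main.name} did not resume after switching back")
    return findings


# A drill against a degraded backup surfaces the problem before a real outage.
print(run_cutover_drill(PowerSource("main UPS"), PowerSource("backup UPS", healthy=False)))
```

Run on a regular schedule, a check of this kind would flag a weak backup while the main unit is still working.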
“The probability of the main UPS failing and the backup UPS failing at the same time is extremely low, unless you have not tested it. Better yet, a double redundancy with a second back-up UPS would lower the risk of total system failure like what happened,” he said.
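The arithmetic behind that remark can be sketched in a few lines of Python. The figures are assumptions chosen only for illustration, not measurements from NAIA, and the calculation assumes the units fail independently of one another, which may not hold if they share wiring, age or maintenance history.

```python
def prob_all_fail(p_fail_each: float, units: int) -> float:
    """Probability that every unit fails together, assuming independent failures."""
    if not 0.0 <= p_fail_each <= 1.0:
        raise ValueError("p_fail_each must be between 0 and 1")
    if units < 1:
        raise ValueError("units must be at least 1")
    return p_fail_each ** units


p_tested = 0.01    # assumed chance that one well-maintained UPS is down
p_untested = 0.30  # assumed chance for a unit that has not been tested

print(prob_all_fail(p_tested, 2))    # main plus one backup
print(prob_all_fail(p_tested, 3))    # main plus two backups
print(prob_all_fail(p_untested, 2))  # two untested units
```

Under these assumed numbers, each added backup multiplies the risk of total failure by the per-unit failure probability, while untested units push that risk up sharply.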
Wear and tear
He said equipment can fail and degrade over time as parts get worn out. This is why equipment should be tested.
“The fact that the backup also failed [means] the equipment was not tested sufficiently. Otherwise, it should never have happened,” he said.
Verzosa also noted that, to remediate the double failure, the Civil Aviation Authority of the Philippines (CAAP) rerouted the power to the commercial line.
220 volts vs 380 volts
But the commercial line supplied 380 volts, higher than the 220 volts the system’s hardware could tolerate.
“The higher voltage burned some of the equipment, including the VSAT [Very Small Aperture Terminal]. Your commercial power should match the requirements of your hardware,” he said.
The problem was due to a design flaw, he said.
“If your UPS failed, you should have the capability to switch over to the commercial line automatically,” he said.
“The available commercial line should match the power needs of your air traffic management system hardware,” he added.
Verzosa said there was a “training gap,” as the engineers could have installed a step-down transformer between the commercial line and the hardware to bring the 380 volts down to 220 volts.
“If there were engineers who remembered their basics, they could have addressed the situation in a shorter period of time by putting in the right transformer. They would not have burned the equipment,” he said.
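The mismatch he describes comes down to simple arithmetic, shown here as a rough Python sketch. It assumes an ideal transformer, where the ratio of primary to secondary turns equals the ratio of the voltages, and a 10 percent tolerance band that is purely an illustrative assumption. The function names are hypothetical.

```python
def step_down_ratio(supply_volts: float, rated_volts: float) -> float:
    """Primary-to-secondary turns ratio of an ideal transformer."""
    if supply_volts <= 0 or rated_volts <= 0:
        raise ValueError("voltages must be positive")
    return supply_volts / rated_volts


def is_safe(supply_volts: float, rated_volts: float, tolerance: float = 0.10) -> bool:
    """True if the supply is within the assumed tolerance band around the rating."""
    return abs(supply_volts - rated_volts) <= tolerance * rated_volts


# Check the 380-volt line against 220-volt hardware before connecting it.
print(is_safe(380, 220))
# Turns ratio an ideal transformer would need to step 380 volts down to 220.
print(round(step_down_ratio(380, 220), 2))
```

A check like this, made before switching the load, would show that the commercial line cannot be connected directly and that a transformer with a ratio of roughly 1.73 to 1 is needed.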
Verzosa suggests that the Philippines adopt an arrangement like Australia’s, where the two functions of air safety regulation and air traffic management are organizationally separate.
Each function requires a different set of skills.
Over there Down Under
“We have the Civil Aviation Safety Authority of Australia, a regulatory agency, which manages policies and procedures in relation to air traffic safety,” he said.
“We also have Airservices Australia, a commercial organization managing air traffic all over Australia and even in a large part of the southern hemisphere.”
He said airlines pay fees to Airservices Australia, which employs air traffic controllers, engineers, implementers and people who manage data.
“They manage all the CNS (communications, navigation, surveillance) for air traffic – a whole very specialized organization. They have a very high level of expertise and are very experienced, strong specialists running air traffic,” he said.
Many of them are former pilots with a background in engineering and CNS, he noted.
Verzosa also recommends a robust system for planning and delivering disaster recovery and contingency management, a review of the system’s design, and an upgrade of staff expertise.
“Even if the air traffic management system is outdated, which is a matter of opinion, as long as it is operationally working, it should work,” he said.
“The issue is with the disaster recovery, power system design and staff training. They lacked the skills to effectively manage contingencies like these,” he added.