Tokyo Stock Exchange issues report on trading system failure on Oct 1, 2020
The report covers the sequence of events surrounding the incident, the cause of the failure, and measures the Exchange will put in place to prevent a recurrence.
At 7:04 a.m. JST on October 1, a large number of messages were detected indicating difficulties in accessing Device 1 of the Shared Disk Device 1 (Network Attached Storage or “NAS”). Following this, TSE’s in-house trading control screens became unavailable, and a portion of the market information that is normally disseminated to users after 7:00 a.m. was not disseminated.
After confirming the situation with the system development vendor, Fujitsu, the Exchange became aware at 7:55 a.m. that a memory module failure had caused the control unit of Device 1 to fail, and that the automated switchover to Device 2 had also failed, leaving the whole NAS unavailable.
Although the Exchange continued working to enable a switchover to Device 2, it was unable to guarantee a schedule for this to happen. Since the correct market data was not being disseminated, at 8:36 a.m. the Exchange decided to halt trading of all listed issues from market open (9:00 a.m.).
At 9:26 a.m., the Exchange succeeded in manually switching over to Device 2 of the NAS, which restored all functions to normal.
The Exchange then began discussions aimed at rebooting the system in order to resume trading within the day. Although the network had been shut down, orders that were received before 8:54 a.m. had been matched and execution notifications had accumulated within arrowhead without being sent to trading participants.
However, it was likely that the Exchange would be unable to guarantee the fairness and reliability of price formation in the market. Furthermore, resuming trading was expected to cause confusion, for example in how trading participants would handle orders that they had already received. In light of these factors, at 11:45 a.m. the Exchange decided to halt trading for the whole day.
Regarding the cause of the failure, TSE explains that it has a system requirement that, in the case of a NAS failure, operations should continue by switching over to another device within 30 seconds.
Usually, Fujitsu conducts testing with default settings prior to shipment to check that a product functions as described in the manual. This time, however, since the arrowhead settings were not the default settings, the production specifications were checked on paper but no actual testing was conducted at the time of shipment.
TSE did conduct NAS switchover testing, but this focused on checking business continuity after the switchover.
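The difference between checking business continuity after a switchover and checking the automatic switchover itself can be illustrated with a small simulation. The sketch below is purely illustrative and does not represent arrowhead, the NAS product, or any test actually run by TSE or Fujitsu. The names StoragePair, auto_switchover_enabled, switchover_seconds and check_automatic_switchover are hypothetical, and the enable/disable flag is an assumed stand-in for the switchover settings, whose details are not given here. Only the 30-second limit comes from the requirement described above.

```python
from dataclasses import dataclass

# Limit taken from the stated requirement: switch over within 30 seconds.
SWITCHOVER_LIMIT_SECONDS = 30.0


@dataclass
class Device:
    name: str
    healthy: bool = True


class StoragePair:
    """Hypothetical two-device storage pair (illustrative model only)."""

    def __init__(self, auto_switchover_enabled: bool, switchover_seconds: float):
        self.devices = [Device("Device 1"), Device("Device 2")]
        self.active = 0  # index of the device currently serving requests
        # Assumed setting: whether a failure triggers automatic switchover.
        self.auto_switchover_enabled = auto_switchover_enabled
        # Assumed simulated duration of a switchover, in seconds.
        self.switchover_seconds = switchover_seconds

    def fail_active(self):
        """Simulate a failure of the active device.

        Returns the simulated seconds until service resumes on the standby,
        or None if no automatic switchover takes place.
        """
        self.devices[self.active].healthy = False
        if not self.auto_switchover_enabled:
            return None
        standby = 1 - self.active
        if not self.devices[standby].healthy:
            return None
        self.active = standby
        return self.switchover_seconds

    def is_serving(self) -> bool:
        """True if the currently active device is healthy."""
        return self.devices[self.active].healthy


def check_automatic_switchover(pair: StoragePair) -> bool:
    """Inject a failure and verify the automatic path, not just the end state."""
    elapsed = pair.fail_active()
    return (
        elapsed is not None
        and elapsed <= SWITCHOVER_LIMIT_SECONDS
        and pair.is_serving()
    )
```

In this model, a pair with the automatic switchover disabled fails the check even though a manual switch to Device 2 would leave the service running normally afterwards. That is the kind of gap that a test focused only on post-switchover continuity would not reveal.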
The manual switchover of the NAS took some time to complete because the failure response procedures had been developed on the basis that the switchover would be conducted automatically.
TSE has implemented various measures to enhance the reliability of arrowhead. For instance, a correction of the NAS switchover settings was completed on October 5, 2020. A comprehensive check of NAS settings is set to be completed by the end of October. Switchover tests and drills for the NAS are to be conducted by January 2021.