The National Stock Exchange (NSE) on Monday said the failure of telecom links and of its storage area network (SAN) system led to the outage last month, and that steps are being taken to address the issues.
NSE said it has multiple telecom links from two service providers between its primary and NDR (Near Disaster Recovery) sites to ensure redundancy.
In a detailed statement on the outage that happened on February 24, the bourse said various measures have been taken and others are under implementation to address the issues.
"On February 24, 2021 we had instability in links from both service providers primarily due to digging and construction activity along the path between the two sites," the exchange said.
On that day, after the link failure, the exchange said it saw unexpected behaviour of the Storage Area Network (SAN) system, with the primary SAN becoming inaccessible to the host servers.
This resulted in the risk management system of NSE Clearing and other systems such as clearing and settlement, index and surveillance systems becoming unavailable.
The bourse noted that the SAN system at the primary data centre stopped functioning, which was completely unexpected.
Subsequent incident analysis showed that the problem was caused by failover logic implemented by the vendor which did not conform to NSE's stated design requirements, coupled with issues in the configuration done by the SAN vendor that triggered the failover logic, it added.
Further, the exchange said the specific failure logic used by the vendor was not documented, was not communicated to NSE, and was not appropriate for NSE's setup.
The resultant SAN failure led to the incident on February 24.
"While there was no impact on the trading system, given that the risk management system was unavailable, allowing trading to continue on NSE posed an unacceptable risk, and hence trading had to be halted," the exchange said.
The SAN is a fault tolerant system that was designed to function seamlessly even in the event of telecom link failures between primary and NDR copies.
One of the features of the SAN, deployed in October 2020, was designed to provide not just zero data loss but also zero downtime.
Before deployment, the system was tested against various scenarios including link failures and functioned properly, as per NSE.
Further, the bourse said various steps have already been taken and others are under implementation to address the SAN and telecom link issues.
After halting trading, the exchange said it considered all the available alternatives, including invocation of DR (Disaster Recovery), to decide on the course of action that would bring up the market at the earliest with the least disruption to market participants. After evaluation, a decision was taken to bring up the systems at the primary site.
The exchange said it communicated only after there was visibility and clarity on resumption of services, and that any prior communication would not have been appropriate.
"We reiterate that we could not have communicated sooner because we did not have the clarity and visibility that was important for making an announcement," it added.
As per NSE, it had already placed orders in January for two additional telecom provider links and has removed the SAN software that caused the incident.
"We are also exploring alternate solutions to de-risk dependency of critical applications to a single storage device," the statement said.
With regard to interoperability, the exchange said NSE Clearing Ltd's (NCL) Risk Management System (RMS) at BSE and MSEI was functioning and cleared trades executed on BSE and MSEI within the collateral levels available at the time NCL's RMS at the primary site became unavailable.
Updating of collateral was not available as part of the design, which is being addressed as part of the strengthening of certain aspects of interoperability that all market infrastructure institutions (MIIs) are collectively working on with Sebi, it added.
The interoperability framework was put in place in 2019 for providing capital and operational efficiencies by enabling market participants to consolidate their clearing and settlement under one clearing corporation while trades could be executed on any exchange.
As part of the interoperability design, all the clearing corporations have set up their slave systems of the primary systems in other exchanges to facilitate risk management.
Trading activity at NSE was halted for nearly four hours.
Photograph: Danish Siddiqui/Reuters