An erroneous bulk upload of static routes to a Telstra production network edge router was the cause of last Wednesday’s internet-wide service disruption that saw data traffic take a long detour via Australia, causing performance degradation for other providers in the process.
Telstra senior network engineer Mark Duffell apologised for the error, which meant that 500 internet protocol version 4 (IPv4) prefixes, or subnetworks, were advertised as belonging to Telstra.
The technical error occurred as part of post-verification testing to address a software bug in the Telstra Internet Direct provisioning tools.
After the incorrect configuration was deployed to a single edge router, the hundreds of IPv4 prefixes were announced to the global internet through the border gateway protocol (BGP), which carries route information between network providers.
As adjacent internet peers or autonomous systems (ASs) learned through BGP that the fastest and most efficient routes to certain networks were supposedly via Telstra, they adopted that information and announced it to other providers they connected to, which amplified the error.
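The amplification effect can be illustrated with a toy model. The Python sketch below is not how real routers choose routes: actual BGP best-path selection also weighs local policy, preference values and prefix length. It assumes, purely for illustration, that every AS keeps the shortest AS path it hears and passes it on to its neighbours. The AS numbers are reserved documentation values, and the topology and the `propagate` helper are hypothetical.

```python
from collections import deque


def propagate(links, origins):
    """Flood announcements for one prefix across a toy AS graph.

    Each AS keeps the shortest AS path it has heard (a simplifying
    assumption) and re-announces it to every neighbour.
    Returns {asn: [asn, ..., origin_asn]}.
    """
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    best = {origin: [origin] for origin in origins}
    queue = deque(origins)
    while queue:
        asn = queue.popleft()
        for peer in sorted(neighbours.get(asn, ())):
            if peer in best[asn]:
                continue  # loop prevention: peer is already on this path
            candidate = [peer] + best[asn]
            if peer not in best or len(candidate) < len(best[peer]):
                best[peer] = candidate
                queue.append(peer)  # peer passes the new route onwards
    return best


# Hypothetical topology: a chain of providers, with the erroneous
# announcer (AS 64499) attached near the far end of the chain.
LEGITIMATE, ERRONEOUS = 64496, 64499
links = [
    (64496, 64500),
    (64500, 64501),
    (64501, 64502),
    (64502, 64503),
    (64499, 64502),
]

routes = propagate(links, [LEGITIMATE, ERRONEOUS])
misdirected = sorted(
    asn for asn, path in routes.items()
    if path[-1] == ERRONEOUS and asn != ERRONEOUS
)
print(misdirected)
```

In this toy graph, AS 64502 hears the erroneous route first-hand and AS 64503 only learns it second-hand from AS 64502, which is the relaying behaviour described above.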
“It’s important to understand that the root cause of this interruption was not malicious in nature, the routes were not intentionally hijacked, and no emails or data were breached or lost,” Duffell wrote.
Telstra has temporarily disabled the provisioning testing tools until it can ensure that Wednesday’s accidental route hijacking won’t happen again.
The telco is also modifying its route validation system to prohibit the bulk upload of static routes, which was the initial cause of Wednesday’s problems.
No practical way to prevent BGP hijacking
Telstra has implemented Resource Public Key Infrastructure (RPKI) on its domestic AS 1221 network, and is working on adding the certification technology to its AS 4637 international network, a spokesperson told iTnews.
With RPKI, providers can cryptographically verify whether or not an organisation is authorised to make BGP route announcements.
If not, the announcements can be deemed invalid and filtered out automatically, which would prevent traffic from being sent down the wrong network paths.
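As a rough illustration of that check, the Python sketch below follows the general shape of RPKI route origin validation, in which an announcement is classed as valid, invalid or not-found by comparing it against published route origin authorisations (ROAs). It is a simplified sketch, not Telstra's or any vendor's implementation: the ROA list is hard-coded rather than cryptographically verified, and the prefixes, AS numbers and the `validate_origin` helper are hypothetical documentation values.

```python
import ipaddress


def validate_origin(prefix, origin_asn, roas):
    """Classify a BGP announcement against a list of ROAs.

    roas is a list of (prefix, max_length, asn) tuples, assumed to have
    already been cryptographically verified elsewhere.
    Returns "valid", "invalid" or "not-found".
    """
    route = ipaddress.ip_network(prefix)
    covered = False
    for roa_prefix, max_length, roa_asn in roas:
        roa_net = ipaddress.ip_network(roa_prefix)
        # Skip ROAs for the other IP version or for unrelated address space.
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        # The route must not be more specific than the ROA allows,
        # and the origin AS must match the authorised one.
        if route.prefixlen <= max_length and origin_asn == roa_asn:
            return "valid"
    return "invalid" if covered else "not-found"


# Hypothetical ROA: AS 64496 may originate 192.0.2.0/24, and nothing longer.
roas = [("192.0.2.0/24", 24, 64496)]

print(validate_origin("192.0.2.0/24", 64496, roas))     # authorised origin
print(validate_origin("192.0.2.0/24", 64499, roas))     # wrong origin AS
print(validate_origin("198.51.100.0/24", 64499, roas))  # no ROA covers it
```

A network that filters on this result would drop the second announcement. The third one is a different case: with no ROA covering the prefix, the check has nothing to compare the announcement against.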
But secure email provider Protonmail, which was among the service providers hit by the Telstra routing hijack, pointed out that RPKI is opt-in and will only work if every network on the internet agrees to abide by it.
Currently, only 17 percent of the internet uses RPKI validation, which means BGP hijacking has the potential to cause significant damage.
In Wednesday’s incident, approximately 30 percent of the global internet looking for Protonmail was directed to Telstra instead.
Protonmail said that while no data or messages were lost, it “incurred meaningful financial losses” as some services such as its payments system were not functioning for several hours.
The Swiss provider was able to divert all mail and web traffic to unimpacted internet routes, and the only problem its customers experienced was delays in sending and receiving messages.
This turned what Protonmail said was perhaps the most serious BGP hijacking incident ever, affecting more than 1680 IPv4 prefixes, into a minor inconvenience for its customers.
Andree Toonk, founder of BGP monitoring service BGPmon, explained that RPKI route validation only works on ingress, that is, on routes learnt from a peer network.
Currently, RPKI route validation isn’t done on egress or outgoing announced routes.