
Take nothing for granted: network resilience is a strategic imperative and competitive differentiator

On 05-01-2024
 

The world has grown used to always-on connectivity that provides high-speed, reliable internet access. Consumers have told researchers that telcos should focus on maintaining the quality, reliability, and speed of their connections. Let’s take a look at how we ensure the resilience of our wholesale backbone network, the foundation of your own service resilience.

Telecoms networks are essential in enabling communication, data transfer, and access to information in today’s world. They’re the foundation of our digital society, and connect people, businesses, and governments worldwide. 

However, many different types of events can impact networks. There are basic issues like cable damage and power outages that cause networks to fail. Then there are more significant events that can jeopardize communications: for example, a rise in natural disasters like floods, hurricanes and earthquakes that can damage physical infrastructure like cell towers, cables, and data centers. It’s worth noting that there’s been a tenfold increase in the number of natural disasters worldwide from the 1960s to today.

Geopolitical turbulence can put network stability at risk too. Ongoing wars and political tensions, trade wars, restrictions on certain countries’ manufacturers and suppliers, uncertainty around countries that produce large percentages of the world’s semiconductors, cyber-attacks – it all adds up. This increased risk for both terrestrial and submarine networks requires a greater focus on resilience.

What is resilience?

The International Telecommunication Union (ITU) defines resilience as “the robustness of the network infrastructure and should ensure the continuity of telecommunication services against any damage caused”, by disasters, attacks and other incidents. From a service delivery perspective, it speaks to the ability of the network operator to maintain an acceptable level of service in the face of challenges to normal operations. This includes recovering quickly from disruptions, adapting to changing conditions, and maintaining functionality during and after a disaster or failure.

If your network is resilient, you should be able to minimize the impact of failures, whether due to natural disasters, equipment malfunctions, or cyber-attacks. Operators should be able to reroute traffic, manage network congestion, and even self-repair by automatically identifying and resolving issues.

How do you build resilience?

Telcos can build resilience into their networks through three main techniques:

  • Redundancy: the backup systems or paths in a network that are used in the event the primary ones fail
  • Diversity: the architecture should include multiple ways of performing each function, so that if one method fails, another kicks in and takes over. This also applies to network routes
  • Modularity: designing the network so that a failure in one area stays contained instead of spreading to the entire system, avoiding any single point of failure.
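To see why redundancy and diversity pay off, here is a minimal availability calculation in Python. It assumes that paths fail independently, which is exactly what route diversity tries to guarantee; the 99.9% element availability is a hypothetical figure, not an Orange measurement.

```python
from math import prod


def series_availability(parts: list[float]) -> float:
    """Every element in the chain must be up (e.g. router, link, router)."""
    return prod(parts)


def parallel_availability(paths: list[float]) -> float:
    """Service survives while at least one redundant path is up.
    Assumes the paths fail independently (no shared duct or power feed)."""
    return 1.0 - prod(1.0 - a for a in paths)


def downtime_minutes_per_year(availability: float) -> float:
    return (1.0 - availability) * 365 * 24 * 60


if __name__ == "__main__":
    # Hypothetical figures: a path of three elements, each 99.9% available.
    single = series_availability([0.999, 0.999, 0.999])
    # Two physically diverse copies of that path.
    dual = parallel_availability([single, single])
    for label, a in (("single path", single), ("two diverse paths", dual)):
        print(f"{label}: {a:.6f} ({downtime_minutes_per_year(a):.1f} min/year)")
```

With these assumed numbers, a single path adds up to about 26 hours of downtime a year, while two diverse paths bring it under 5 minutes. If both paths share a duct or a power source, the independence assumption breaks down and the real gain is far smaller, which is where diversity and modularity come in.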

Not all networks are created equal

Different countries have different regulations relating to resilience requirements, and different telcos have different views on what constitutes sufficient resilience in their own networks. Research by Analysys Mason in August 2023 found that fewer than half of network operators (43%) carry out risk analysis, fault simulation, or network element health checks. Only 26% say they perform simulated attacks on their network, while fewer than 20% carry out fault survivability or disaster recovery analysis.

We take network resilience very seriously. Here are some examples of how we maximize the resilience of our core network backbone. 

Applying resilience by design 

As with cybersecurity, resilience is essential for telcos and should be designed into the network architecture. Orange Wholesale has made sure we have multiple points of presence (PoPs), with at least two PoPs per major city, and built-in hardware resilience within each PoP. This delivers the necessary redundancy and backup.

In data centers, the highest risk of a network outage comes from power failure. So we’ve ensured that all our core devices have redundant power feeds that terminate on independent circuit breakers and diverse power sources. We also apply stringent capacity planning to make sure networks don’t get overloaded: we plan for routers and links to be loaded to a maximum of 50% of their capacity, so that if a router or link fails and its traffic shifts onto its redundant counterpart, the survivor can still carry all of it. This is resilience by design.
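The 50% rule can be expressed as a simple check. This is a minimal sketch, assuming links are planned in redundant pairs and that a failed link’s traffic moves entirely onto its partner; the link names and loads are hypothetical.

```python
from dataclasses import dataclass

MAX_PLANNED_UTILIZATION = 0.5  # the 50% planning ceiling from the capacity rule


@dataclass
class Link:
    name: str
    capacity_gbps: float
    load_gbps: float

    @property
    def utilization(self) -> float:
        return self.load_gbps / self.capacity_gbps


def within_plan(link: Link) -> bool:
    """Normal-operation check against the planning ceiling."""
    return link.utilization <= MAX_PLANNED_UTILIZATION


def survives_single_failure(a: Link, b: Link) -> bool:
    """True if whichever link survives can carry the pair's combined load."""
    total = a.load_gbps + b.load_gbps
    return total <= a.capacity_gbps and total <= b.capacity_gbps


if __name__ == "__main__":
    # Hypothetical pair of 100 Gbit/s links between two PoPs in one city.
    first = Link("pop-a-pop-b-1", 100.0, 45.0)
    second = Link("pop-a-pop-b-2", 100.0, 48.0)
    print("within plan:", within_plan(first) and within_plan(second))
    print("survives one failure:", survives_single_failure(first, second))
```

When both links of a pair have the same capacity and each stays at or below 50%, their combined load always fits on one of them, which is why a single utilization threshold is enough in that case.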

Strong back-up and redundancy routes

Redundancy means having multiple or backup components, paths, or services that can take over in the event of a failure or disruption. It’s key to maximizing the resilience of terrestrial and submarine networks. Another sensible move is to make agreements with off-net providers for additional backup routes. Orange Wholesale has these agreements in place in addition to our own intra-Orange network routes, all intended to guarantee continued connectivity should there be a cable or network failure.

Factoring potential risks into deployment strategy

With natural disasters on the increase and geopolitical uncertainty a continuing threat, it’s essential to build pre-emptive resilience into the transmission network. This can be achieved by using transmission-level protection features that exploit the diversity of routes available in meshed networks. It can also be done, for example, by deploying submarine cables along routes that avoid potential geopolitical risk zones. The same applies to alternative routes: they should be designed to bypass high-risk areas such as the Suez Canal or potential threat areas in the South China Sea.

Satellite connectivity also plays a part in resilience, providing mobile backhaul or backup routes for critical remote sites.

In the event of a cable or network failure, these alternative routes maintain connectivity.
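The route-diversity logic can be sketched in code. This is an illustrative Python example, assuming each candidate route is tagged with the risk zones it crosses (similar in spirit to shared-risk link groups); the route names, zone labels, and latencies are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Optional


@dataclass(frozen=True)
class Route:
    name: str
    latency_ms: float
    risk_zones: frozenset[str]  # hypothetical labels for corridors a route crosses


def pick_diverse_pair(routes: list[Route], avoid: set[str]) -> Optional[tuple[Route, Route]]:
    """Lowest-latency (primary, backup) pair where the two routes share no
    risk zone and the backup also stays clear of every zone in `avoid`."""
    best = None
    for a, b in combinations(routes, 2):
        if a.risk_zones & b.risk_zones:
            continue  # one regional event could cut both routes
        for primary, backup in ((a, b), (b, a)):
            if backup.risk_zones & avoid:
                continue  # backup must bypass high-risk areas
            key = (primary.latency_ms, backup.latency_ms)
            if best is None or key < best[0]:
                best = (key, primary, backup)
    return None if best is None else (best[1], best[2])


if __name__ == "__main__":
    candidates = [
        Route("asia-europe-via-suez", 120.0, frozenset({"suez", "red-sea"})),
        Route("asia-europe-alt-red-sea", 125.0, frozenset({"red-sea"})),
        Route("asia-europe-via-cape", 160.0, frozenset({"cape"})),
    ]
    pair = pick_diverse_pair(candidates, avoid={"suez"})
    if pair:
        print("primary:", pair[0].name, "| backup:", pair[1].name)
```

Requiring the two routes to share no zone means a single regional event cannot cut both, while the `avoid` set keeps the backup clear of areas considered high risk, even when the primary route crosses them.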

Invest in hardware, software and skills

We upgrade the software in our network equipment at least once a year, and we also regularly upgrade our hardware so that customers always benefit from the latest technologies. We also prioritize staff skills: we invest in training for our operational teams so that they are up to speed with the latest technologies and features in our network.

Enabling resilience through innovation

A major network transformation to expand both footprint and capacity 

A couple of years ago, Orange Wholesale recognized that our network needed more built-in flexibility, and that we could achieve this by combining our two distinct networks. So we brought together our international B2B network, which has a very large footprint and serves multinational companies, and our wholesale network, which has a smaller footprint but massive bandwidth.

It was an innovative move that only an organization of Orange’s scale could undertake, and it improved the resilience of our merged network: we were able to align both networks to the highest requirements, those of B2B customers, who are very demanding in terms of performance and QoS. Unavailability, latency, and packet loss are three major KPIs that enterprise customers regularly cite to us for their critical applications. As a result, our wholesale network now benefits from the same performance levels and route diversity that our enterprise customers enjoy, and so do our wholesale customers’ own B2B clients. For example, we have measured a 25 ms reduction in round-trip delay on the New York to Singapore route, a significant improvement for some applications.
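To give a sense of scale for the 25 ms figure, here is a rough conversion into fiber distance. It assumes the whole gain comes from a shorter fiber path and uses the usual propagation speed in optical fiber of about 200,000 km/s (roughly 5 µs per km); it is an illustration, not how the measurement was made.

```latex
\Delta d \approx \frac{\Delta \mathrm{RTD}}{2}\, v_{\text{fiber}}
        = \frac{25\ \text{ms}}{2} \times 2\times 10^{5}\ \text{km/s}
        = 2\,500\ \text{km}
```

In other words, under that assumption the improvement is roughly equivalent to removing 2,500 km of fiber from each direction of the path.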

Segment routing and Flex-Algo for customized routing

Merging the two networks also allowed us to deploy new functionality on the merged network. Thanks to segment routing (SR-MPLS)* and Flex-Algo** technologies, we’re able to define two different logical routing plans, one for enterprise customers and one for IP Transit wholesale customers. Enterprise traffic takes the shortest routes to minimize round-trip delay (RTD) and latency, while IP Transit traffic stays on the circuits with the highest capacities. This means we can customize routing per service while maximizing resilience for all services.
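To illustrate how two logical routing plans can coexist on one physical network, here is a simplified Python model of Flex-Algo. It is a sketch under stated assumptions: the topology, metrics, and affinity tag are hypothetical, algorithm 128 minimizes delay for enterprise traffic, and algorithm 129 minimizes the IGP metric while excluding links tagged as low capacity for IP Transit. Real Flex-Algo definitions are advertised by the IGP and computed by the routers themselves.

```python
import heapq

# Hypothetical topology. Each link advertises several metrics plus
# affinity tags; a Flex-Algo definition chooses which ones to use.
LINKS = {
    ("NYC", "LON"): {"delay_us": 35000, "igp": 10, "affinities": set()},
    ("LON", "MRS"): {"delay_us": 8000, "igp": 10, "affinities": set()},
    ("MRS", "SIN"): {"delay_us": 50000, "igp": 50, "affinities": {"low-capacity"}},
    ("LON", "SIN"): {"delay_us": 75000, "igp": 20, "affinities": set()},
}

# Hypothetical definitions (128-255 is the user-defined Flex-Algo range).
FLEX_ALGOS = {
    128: {"metric": "delay_us", "exclude": set()},        # enterprise plane
    129: {"metric": "igp", "exclude": {"low-capacity"}},  # IP Transit plane
}


def flex_algo_path(src: str, dst: str, algo: int):
    """Dijkstra over the sub-topology and metric selected by `algo`.
    Returns (path, total_metric), or None if dst is unreachable."""
    definition = FLEX_ALGOS[algo]
    metric, excluded = definition["metric"], definition["exclude"]
    graph: dict[str, list[tuple[str, int]]] = {}
    for (a, b), attrs in LINKS.items():
        if attrs["affinities"] & excluded:
            continue  # link is pruned from this algorithm's topology
        graph.setdefault(a, []).append((b, attrs[metric]))
        graph.setdefault(b, []).append((a, attrs[metric]))

    dist, prev, heap = {src: 0}, {}, [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, cost in graph.get(node, []):
            if d + cost < dist.get(nbr, float("inf")):
                dist[nbr], prev[nbr] = d + cost, node
                heapq.heappush(heap, (d + cost, nbr))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1], dist[dst]


if __name__ == "__main__":
    for algo in (128, 129):
        print(algo, flex_algo_path("NYC", "SIN", algo))
```

With these assumed values, enterprise traffic from New York to Singapore takes the lower-delay path through Marseille, while IP Transit traffic stays on the high-capacity London to Singapore link.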

Fast reroute for reduced convergence times

We’re also rolling out the latest version of fast reroute, Topology Independent Loop-Free Alternate (TI-LFA), which supersedes the RSVP-TE fast reroute that had been running on our enterprise network since the early 2000s. Combined with segment routing, TI-LFA fast reroute lets us significantly reduce convergence times after a change in network topology on our merged network. Interruptions of potentially three seconds on a link between Asia and Europe can now be reduced to about 100 milliseconds, and to even less on shorter links. Convergence times this low keep packet loss minimal and virtually invisible, with no impact on critical applications.
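The core idea behind TI-LFA can be sketched as follows. This simplified Python example, on a hypothetical topology, shows a point of local repair (PLR) precomputing the path traffic would follow once the network has reconverged without a protected link, so it can switch over immediately when that link fails. It leaves out the part where real TI-LFA encodes this path as a segment list and installs it in the forwarding plane.

```python
import heapq

# Hypothetical Asia-Europe topology: undirected link -> IGP metric.
TOPOLOGY = {
    ("SIN", "MUM"): 10,
    ("MUM", "MRS"): 10,
    ("SIN", "HKG"): 5,
    ("HKG", "FRA"): 25,
    ("FRA", "MRS"): 5,
}


def shortest_path(src, dst, failed=frozenset()):
    """Dijkstra that ignores any link listed in `failed` (as frozensets)."""
    graph = {}
    for (a, b), cost in TOPOLOGY.items():
        if frozenset((a, b)) in failed:
            continue  # treat this link as down
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    dist, prev, heap = {src: 0}, {}, [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, cost in graph.get(node, []):
            if d + cost < dist.get(nbr, float("inf")):
                dist[nbr], prev[nbr] = d + cost, node
                heapq.heappush(heap, (d + cost, nbr))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1], dist[dst]


def precompute_repair(plr, dst):
    """Protect the PLR's primary outgoing link towards `dst` by computing,
    ahead of time, the post-convergence path with that link removed."""
    primary, _ = shortest_path(plr, dst)
    protected = frozenset(primary[:2])  # first hop of the primary path
    return primary, protected, shortest_path(plr, dst, failed={protected})


if __name__ == "__main__":
    primary, protected, repair = precompute_repair("SIN", "MRS")
    print("primary:", primary, "| protected:", sorted(protected), "| repair:", repair)
```

Because the repair path is computed before the failure, the switchover does not have to wait for routing to reconverge, which is what makes sub-second protection possible.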

Minimizing risk through commitment to all our customers

At Orange Wholesale, we take the resilience of our network very seriously. It’s a strategic issue that we put investment and resources into, because we know our customers trust us to keep their worlds up and running whatever happens.

Combining our enterprise and wholesale networks means we were able to align our engineering and performance requirements with the most demanding scenarios, minimizing risk for all our customers and giving maximum peace of mind to our wholesale customers and their own B2B and B2C customers. It’s an ongoing commitment: we will continue to invest in the latest technologies to ensure the resilience of our networks and keep delivering world-class service.

 


*Segment routing: a network technology that simplifies routing by letting the source node determine the path a packet will follow. In traditional routing, each router in the network independently determines the best path for a packet to reach its destination. With segment routing, the source node encodes the path in the packet as an ordered list of instructions called segments, so intermediate routers simply execute the active segment and don’t need to keep per-path state.
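A toy Python model can make the source-routing idea concrete. It assumes node segments only, with a breadth-first search standing in for the IGP shortest path that real SR-MPLS nodes use to reach each segment; the node names and topology are hypothetical.

```python
from collections import deque

# Hypothetical topology with unit link costs.
NEIGHBORS = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}


def next_hop(node: str, target: str) -> str:
    """First hop on a shortest path from node to target (BFS), standing in
    for the IGP route that SR nodes use to reach a node segment."""
    first_hop = {}
    queue, seen = deque([node]), {node}
    while queue:
        cur = queue.popleft()
        if cur == target:
            return first_hop[cur]
        for nbr in NEIGHBORS[cur]:
            if nbr not in seen:
                seen.add(nbr)
                first_hop[nbr] = nbr if cur == node else first_hop[cur]
                queue.append(nbr)
    raise ValueError(f"{target} is unreachable from {node}")


def forward(ingress: str, segments: list[str]) -> list[str]:
    """Hop-by-hop path of a packet whose segment list is set at ingress."""
    stack = deque(segments)
    node, path = ingress, [ingress]
    while stack:
        if stack[0] == node:
            stack.popleft()  # active segment reached: move on to the next one
            continue
        node = next_hop(node, stack[0])
        path.append(node)
    return path


if __name__ == "__main__":
    print(forward("A", ["E"]))       # default shortest path
    print(forward("A", ["C", "E"]))  # steered through C by the source
```

Adding the intermediate segment `C` at ingress steers the packet away from the default path through `B` without any per-path configuration on the transit nodes.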

**Flex-Algo: Flex-Algo, short for Flexible Algorithm, is a feature of link-state routing protocols such as IS-IS and OSPF that allows network operators to define multiple routing algorithms and assign them to different traffic types or network segments. Each algorithm can have its own metric, constraints, and preferences, allowing for more granular control over traffic engineering and optimization.

 
