
Building Resiliency Guardrails to Isolate Crashes in Cisco Products

Nov 10, 2021, Hi-network.com

At Heathrow Airport outside of London, more than 600 flights were disrupted or cancelled, and 42,000 pieces of luggage were temporarily lost. In Washington, D.C., a computer operated by the National Security Agency was offline for three days. In Panama, two dozen patients died after accidentally receiving an overdose of gamma radiation to treat their cancer. Ariane 5, a $7 billion rocket built by the European Space Agency to carry satellites into orbit, exploded less than a minute into its maiden voyage.

What do all of these events have in common? Software bugs and crashes. 

With 190 million lines of code, Cisco IOS XE, like any other large software stack, can never be crash-proof. But the software engineering team within Cisco Enterprise Networking has developed techniques to dramatically limit the impact of software crashes. Those techniques, written into IOS XE code, add tremendous resilience to every Cisco enterprise networking device.

From Monolithic OS to Resilient Modular Software Stack

When Cisco IOS was first developed, it was a monolithic operating system. Any fault in any module, including upgrades to different versions, could cause the software to crash. It could then take minutes, hours, or even longer to restart Cisco routers and switches.  

Moving from IOS to Cisco IOS XE, Cisco developers strove to keep the user experience the same while adding techniques to improve the fault isolation of processes running within the system. As a complete networking software stack running on a Linux kernel, IOS XE was designed with separate fault domains so that a fault in one part of the system did not take the rest of the system down. This is demonstrated in systems with separate line cards and forwarding engines such as the Cisco ASR 1000 Series Aggregation Services Routers and the Cisco 8000 Series Customer Edge Routers. The line cards, route processors, and forwarding processors can be reloaded and upgraded independently without an entire system reload. Today, if a Cisco product running IOS XE suffers a crash, the system does not go down because the faults are isolated to specific domains.
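To make the idea of separate fault domains concrete, here is a minimal toy sketch in Python. It is not IOS XE code: the `FaultDomain` and `System` classes, the domain names, and the handlers are all hypothetical, and an in-process exception stands in for a real crash. It only illustrates the principle that a fault restarts the domain it occurred in while the other domains keep running.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class FaultDomain:
    """Hypothetical isolated component, e.g. a line card or forwarding processor."""
    name: str
    handler: Callable[[str], str]
    restarts: int = 0
    handled: int = 0

    def handle(self, event: str) -> str:
        result = self.handler(event)
        self.handled += 1
        return result

    def restart(self) -> None:
        # Only this domain's state is reset; other domains are untouched.
        self.restarts += 1
        self.handled = 0


class System:
    """Routes events to domains and contains a fault to the domain that raised it."""

    def __init__(self) -> None:
        self.domains: Dict[str, FaultDomain] = {}

    def add(self, domain: FaultDomain) -> None:
        self.domains[domain.name] = domain

    def dispatch(self, name: str, event: str) -> str:
        domain = self.domains[name]
        try:
            return domain.handle(event)
        except Exception as exc:
            # The fault stops here: restart the failing domain only.
            domain.restart()
            return f"{name} restarted after fault: {exc}"


def forward(event: str) -> str:
    return f"forwarded {event}"


def flaky_line_card(event: str) -> str:
    # Simulated bug: one specific input crashes this domain.
    if event == "bad-frame":
        raise ValueError("simulated crash")
    return f"received {event}"
```

In this sketch, dispatching `"bad-frame"` to the line-card domain increments only that domain's restart counter; the forwarding domain's counters and behavior are unaffected, which mirrors the independent reloads described above.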

In the latest version of IOS XE, software resiliency is being increased by reducing the fault domains to a single process. This is achieved by creating a process runtime architecture that uses three software techniques: work units, transactions, and persistence.
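One way to picture how work units, transactions, and persistence can combine is the toy Python sketch below. It is an illustration under stated assumptions, not the IOS XE runtime: the `PersistentProcess` class, the `(name, increment)` shape of a work unit, the JSON state file, and the `crash_at` parameter are all hypothetical. Each work unit is applied to a copy of the state (the transaction), and the result is committed to disk in one step (the persistence), so a restarted process resumes from the last committed unit.

```python
import json
import os
from typing import Dict, List, Optional, Tuple

WorkUnit = Tuple[str, int]  # hypothetical work unit: (counter name, increment)


class PersistentProcess:
    """Toy process whose state survives a restart via a JSON file."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.state: Dict[str, int] = {}
        self.next_unit = 0
        if os.path.exists(path):
            # Restart path: reload the last committed state.
            with open(path) as f:
                saved = json.load(f)
            self.state = saved["state"]
            self.next_unit = saved["next_unit"]

    def _commit(self, state: Dict[str, int], next_unit: int) -> None:
        # Write to a temporary file, then rename it over the old one,
        # so the saved state is either the old version or the new one.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"state": state, "next_unit": next_unit}, f)
        os.replace(tmp, self.path)
        self.state, self.next_unit = state, next_unit

    def run(self, units: List[WorkUnit], crash_at: Optional[int] = None) -> None:
        for index in range(self.next_unit, len(units)):
            name, amount = units[index]
            draft = dict(self.state)  # transaction: work on a copy
            draft[name] = draft.get(name, 0) + amount
            if index == crash_at:
                # Simulated crash mid-unit: the draft is discarded.
                raise RuntimeError(f"simulated crash in unit {index}")
            self._commit(draft, index + 1)
```

If `run` is interrupted partway through a unit, a new `PersistentProcess` built on the same file sees only the fully committed units and picks up at the interrupted one, so the fault costs at most one work unit of progress.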

Work Units Limit the Scope of Faults 

With IOS XE, in the event of a crash or a version upgrade, processes continue operating as if the restart didn't occur. One of the key foundations is that all processes in the system are designed to operate on discrete and independent work units. Crashes

