DevOps principles are not exclusive to software development, and some of them can definitely be applied to infrastructure configuration. NetDevOps brings the culture, technical methods, strategies and best practices of DevOps to network management.
Sometimes you might find it referred to by different names, likeDevNetOps, NetOps, orSuperNetOps. But in general it is related to the more generic term Network Reliability Engineer (also coming from the DevOps counterpart Site Reliability Engineering).
Networks exist to provide connectivity for end-systems and applications, so obviously they have a critical role in any type of service.Everythingneeds connectivity, so the network is certainly a fundamental asset in any modern enterprise these days. Its functionality has become so critical that most business nowadays would not be able to survive without connectivity.
However, there is a very common perception that the network is actually fragile.
Key network engineers that have been working long enough on a certain network become gurus. They are the ones that know the why and how of multiple specific configurations: why that had to be done last year on those core routers, how many neighbors should be seen by a certain edge router, or what that propagated BGP community means. Every box has a unique configuration to accommodate whatever was required at a specific point in time: troubleshooting or debugging a certain issue, that small fix in the routing protocol weight to determine the right interface to use, or those interfaces that are down and nobody knows if they should actually be up or not. Sequential and manual provisioning leads into a situation where each network device becomes a snowflake, due to how its configuration has changed organically according to whatever was required along since it was installed.
Without these key engineers there is a fear that network changes will go wrong. So, operations teams tend to minimize the number and frequency of changes in their networks. Nobody wants to affect their precious business traffic and be pointed at by the CTO as the person responsible for that big failure. So, changes rarely happen. And when they happen, they are BIG, because there is a backlog of things to do. The bigger the change, the more possibilities that something will fail. Besides this, teams are not well practiced because changes do not happen often. Fixing an issue while operating a network live, or performing a rollback quickly, requires practice. So now any problem that happens during the maintenance window will lead to the perception that the network configuration change was a failure.
Furthermore, applying network-wide policies becomes a task proportionally tedious to how big the network is. For example, consider a possible Infosec recommendation to change SNMP strings every month. Doing it manually in a big network might require a number of engineers performing those changes simultaneously across the network, maybe during a maintenance window by night to make sure systems can be synchronized next morning. This manual process involves quite some manual interaction, which is definitely prone to errors.
This type of considerations is very similar to the ones they had in classic software development. With their monolith architectures and bi-annual software updates, they suffered from similar challenges. And then they started doing things different, with things like Agile, DevOps, CICD pipelines and automated unit testing.
Applying this same type of principles to network configuration is what we called NetDevOps, and it will provide similar benefits to the ones software developers obtained while implementing this practices in their own environment. But it will require big cultural changes, like:
What ifnetwork engineers started working with network configurations the same way software developers work with their code?
What ifwe could create automated pipelines for those network configurations, that worked like CI/CD does for software development?
What ifthe network could be continuously monitored for health and improvement?
Nowthat would be a game changer. Not only in the way we manage our networks, but also in how we scale up, how we automate repetitive tasks, how different teams collaborate, and how we improve the reliability of our networks. Let