Cisco Data Center Network Manager (DCNM) has long been considered the gold standard of SAN management because of its stability, rich features, and wide adoption. DCNM has served customers well for over two decades. It was first introduced alongside the MDS 9000 switches. Back then, it was called Fabric Manager, a name still used in conversations to this day. That was a time when applications used a monolithic design. However, times have changed, and over the last decade, software development has undergone a total makeover. Instead of one large monolith, an application today is divided into multiple microservices, each running in its own dedicated container. This microservices-based design brings many benefits, such as higher scale, improved security, higher availability, and faster development.
DCNM's monolithic design, despite serving well for two decades, cannot unlock the full potential of flow visibility, real-time analytics, and higher scale. Instead of trying to retrofit these benefits into DCNM, we rewrote the entire application from scratch using the microservices architecture. Along the way, we also gave it a new name: Nexus Dashboard Fabric Controller (NDFC).
Note: The focus of this post is on NDFC SAN Controller.
In January 2023, we launched the third major release (12.1.2) of NDFC SAN Controller. This makes it the perfect time to upgrade from DCNM to NDFC, not just because three is a magic number, but also because of the ten reasons that I am sharing in this post.
We designed NDFC to honor the existing DCNM licenses. This investment protection becomes even more rewarding given the exceptional longevity of Cisco switches, which support in-place upgrades to higher speeds. For example, customers who bought an MDS 9710 with a DCNM license in 2013 can upgrade to NDFC in 2023 using the same license. Just to be clear, any newly purchased Nexus and MDS switches still require new NDFC licenses. But NDFC will be able to manage any existing switches for which you have already purchased DCNM licenses. Essentially, it's free to upgrade from DCNM to NDFC.
The hosting platform of NDFC, called Nexus Dashboard (ND), provides native active-active clustering. You can deploy 3 nodes of Nexus Dashboard in a cluster and then install a single instance of NDFC on it. The user experience for SAN Management is the same on a 1-node ND and a 3-node ND cluster. However, with a 3-node cluster, ND automatically distributes microservices across multiple nodes. If one node fails, the remaining two nodes continue to run NDFC. This achieves unprecedented reliability for SAN Management and Insights.
Active-Active clustering does not exist in DCNM. It has a different implementation for achieving high availability, called Federation. But because of the dependency on an external Oracle RAC database, DCNM Federation increases the total cost of ownership. In contrast, Nexus Dashboard natively integrates distributed database services for achieving active-active clustering. This design makes high availability of NDFC much more affordable.
To this day, NDFC is the only SAN management software to offer a truly highly available architecture that drastically improves uptime and availability for SAN operations. At a time when downtime is simply unacceptable, this architecture delivers peace of mind even on the network management side of the business.
NDFC One View provides centralized management and visualization of multiple SAN environments that are managed by different NDFC servers. Using an executive-level dashboard of One View, you can seamlessly navigate to a participating NDFC server using Single Sign-On (SSO). This leads to faster troubleshooting and simplified Day 2 operations.
There is no extra license for NDFC One View. If you already have a DCNM/NDFC license for managing the switches, you can start using NDFC One View today at no extra cost.
DCNM does not offer a similar feature. If you have multiple DCNM servers managing different SAN environments, getting a unified view across all these DCNM servers is possible only after upgrading to NDFC.
Starting with the 12.1.2 release, NDFC can manage up to 40K ports (earlier 20K ports) and up to 160 switches (earlier 80 switches). This increased scale helps consolidate multiple DCNM servers onto fewer NDFC servers, and hence saves time, effort, and money.
Besides the fabric scale, SAN Insights can now monitor up to 1 million I/O flows per NDFC server. When SAN Insights was first added to DCNM 11 in 2018, it could monitor 20K flows. Later, DCNM 11.5 increased this scale to 60K flows. In 2021, the first NDFC release, 12.0.0, increased the number of monitored flows to 250K, which was further increased to 500K with the 12.1.1 release. Now, NDFC 12.1.2 can monitor 1 million flows, a 50x increase over the last five years.
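The growth figures above can be checked with a few lines of arithmetic. The following is only a sketch that restates the per-release flow limits quoted in this post; the `FLOW_SCALE` table and the `growth_factors` helper are illustrative names, not part of any Cisco tool.

```python
# Flow-monitoring scale per release, as quoted in this post.
FLOW_SCALE = [
    ("DCNM 11", 20_000),
    ("DCNM 11.5", 60_000),
    ("NDFC 12.0.0", 250_000),
    ("NDFC 12.1.1", 500_000),
    ("NDFC 12.1.2", 1_000_000),
]


def growth_factors(scale):
    """Return (release, multiple of the first release's flow count)."""
    baseline = scale[0][1]
    return [(release, flows / baseline) for release, flows in scale]


for release, factor in growth_factors(FLOW_SCALE):
    print(f"{release}: {factor:g}x the DCNM 11 baseline")
```

The last entry works out to 1,000,000 / 20,000 = 50, which is the 50x figure mentioned above.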
This increased scale provides two key benefits. First, it allows monitoring every flow, which is needed because users do not know when and where a problem may surface. SAN Insights proactively and continuously monitors the performance of I/O flows, so their history is available when an anomaly is detected (more on anomaly detection shortly). Second, even a fabric with only a few switches can have hundreds of thousands of flows. For example, one customer has 180K flows in a fabric of just four switches. When accounting for dual fabrics, the flow count in their environment is approximately 380K. NDFC can monitor these environments in an always-on fashion and can scale even further.
Beyond monitoring, NDFC can automatically detect anomalies in any of the million flows. You can set the policies of your choice or simply activate the ready-made policies in NDFC. For example, NDFC can create an anomaly policy to notify the admins when read or write I/O operations to a LUN (for SCSI traffic) or Namespace (for NVMe traffic) take longer than 500 microseconds. Anomaly detection helps in detecting storage performance issues well before the application is affected.
This feature of anomaly detection does not exist in DCNM. Monitoring and the ability to detect anomalies in a million flows is possible only because of the microservices-based architecture of NDFC.
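To make the 500-microsecond example concrete, here is a minimal sketch of a threshold-style check over per-flow completion times. It is not NDFC code and does not use any NDFC API; the `FlowSample` record, the `find_anomalies` helper, and the sample values are all hypothetical and serve only to illustrate the idea of a latency policy.

```python
from dataclasses import dataclass

# Threshold taken from the example policy described in this post.
THRESHOLD_US = 500


@dataclass(frozen=True)
class FlowSample:
    """Hypothetical per-flow measurement (not an NDFC data structure)."""
    initiator: str
    target: str
    lun_or_namespace: str
    read_us: float   # read I/O completion time, microseconds
    write_us: float  # write I/O completion time, microseconds


def find_anomalies(samples, threshold_us=THRESHOLD_US):
    """Return samples whose read or write time exceeds the threshold."""
    return [
        s for s in samples
        if s.read_us > threshold_us or s.write_us > threshold_us
    ]


# Made-up sample data for illustration only.
samples = [
    FlowSample("host-a", "array-1", "lun-0", read_us=220.0, write_us=310.0),
    FlowSample("host-b", "array-1", "lun-7", read_us=640.0, write_us=290.0),
    FlowSample("host-c", "array-2", "ns-3", read_us=180.0, write_us=505.0),
]

for s in find_anomalies(samples):
    print(f"anomaly: {s.initiator} -> {s.target}/{s.lun_or_namespace}")
```

A real policy would typically evaluate a time window rather than single samples, but the comparison against a configured threshold is the core of the idea.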
It is not easy, but we are trying hard to exceed the high bar set by DCNM. In the last two years, we have brought (and redesigned) most DCNM features into NDFC, such as zoning, performance monitoring, device alias management, interactive topologies, programmable reports, SAN Insights, VM integration, UCS traffic visualization, config backup and restore, switch image management, and many more. For the long-time users of DCNM, especially those who still use its Java GUI (known as thick client), the experience of using NDFC is different. I recommend starting this transition sooner rather than later. While making this transition, we are open to your ideas for adding new features and enhancing existing features.
We are not only bringing existing DCNM features into NDFC, we are also developing new features in NDFC. The key point is that these new features are going only into NDFC. For example, the new Configuration Monitor feature automatically detects any drift from a golden NX-OS config and generates alarms (Figure 1). This detects any unintentional changes and helps in preventing damage to your environment.
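The idea of drift detection can be sketched as a line-by-line comparison of a golden config against the running config. This is only an illustration using Python's standard `difflib` module, not how Configuration Monitor is implemented; the `config_drift` helper and the sample config lines are assumptions made for the example.

```python
import difflib


def config_drift(golden: str, running: str) -> list[str]:
    """Return lines removed from ("- ") or added to ("+ ") the golden config."""
    diff = difflib.ndiff(golden.splitlines(), running.splitlines())
    # ndiff prefixes unchanged lines with "  " and hint lines with "? ";
    # keep only the real additions and removals.
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Illustrative config snippets, not taken from a real switch.
golden = """feature npiv
feature fport-channel-trunk"""

running = """feature npiv
feature fport-channel-trunk
zone default-zone permit vsan 10"""

drift = config_drift(golden, running)
if drift:
    print("ALARM: running config drifted from golden config")
    for line in drift:
        print("  " + line)
```

An empty result means the running config matches the golden config; any other result is what would trigger an alarm in this sketch.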
Besides new features, the newer 64GFC MDS switches (MDS 9124V, MDS 9148V, and later) can only be managed by NDFC.
We still support DCNM and have not yet announced its end of life. But we are not adding new features or support for newer switches to DCNM. All the innovation is going only to NDFC.
Figure 1: NDFC automatically detects and notifies drift in NX-OS configuration.
Physical appliance (called pND) is a new deployment model for NDFC. Like Cisco Nexus and MDS switches where the hardware and software (NX-OS) is fully supported by Cisco, the pND hardware and software are also fully supported by Cisco. This reduces the dependency on other teams for deploying and maintaining NDFC.
NDFC SAN Controller can also be deployed on RHEL (called rND) or as an OVA on ESXi (called vND). We offer this flexibility so that you can choose the best option for your environment. If you see value in using the physical appliance (pND), this is another reason for upgrading from DCNM to NDFC.
NDFC stitches end-to-end flows between VM and LUN (for SCSI traffic). The result is a VM-Initiator-Target-LUN (VM-ITL) flow in SAN Insights. For NVMe traffic, a similar flow is called a VM-ITN flow, where N represents a Namespace.
DCNM provides I/O flows at a granularity of ITL or ITN. Stitching these flows to a VM requires manual correlation in vCenter. NDFC simplifies this step by automatically correlating ITL and ITN flows with the VMs.
This VM-ITL and VM-ITN flow visibility in NDFC leads to much faster resolution of slow performance issues.
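Conceptually, stitching a VM to an ITL flow is a join between fabric-side flow records and VM inventory. The sketch below shows one way such a join could look. It assumes, purely for illustration, that both sides can be matched on the initiator WWPN and the LUN; the record layout, the `stitch_vm_flows` helper, and the sample values are hypothetical and do not describe how NDFC performs the correlation.

```python
from collections import defaultdict


def stitch_vm_flows(itl_flows, vm_inventory):
    """Attach VM names to ITL flows that share the same (initiator, lun) key."""
    vms_by_key = defaultdict(list)
    for vm in vm_inventory:
        vms_by_key[(vm["initiator"], vm["lun"])].append(vm["name"])

    stitched = []
    for flow in itl_flows:
        key = (flow["initiator"], flow["lun"])
        # Flows with no matching VM are simply left out of the result.
        for name in vms_by_key.get(key, []):
            stitched.append({"vm": name, **flow})
    return stitched


# Made-up records for illustration only.
itl_flows = [
    {"initiator": "10:00:00:00:c9:aa:bb:01", "target": "50:00:00:00:aa:bb:cc:01", "lun": "0001"},
    {"initiator": "10:00:00:00:c9:aa:bb:02", "target": "50:00:00:00:aa:bb:cc:01", "lun": "0002"},
]
vm_inventory = [
    {"name": "vm-db-01", "initiator": "10:00:00:00:c9:aa:bb:01", "lun": "0001"},
    {"name": "vm-web-01", "initiator": "10:00:00:00:c9:aa:bb:01", "lun": "0001"},
]

for row in stitch_vm_flows(itl_flows, vm_inventory):
    print(f'{row["vm"]} -> {row["initiator"]} -> {row["target"]} -> LUN {row["lun"]}')
```

Doing this join by hand across vCenter and the SAN is the manual correlation step described above; automating it is what turns an ITL flow into a VM-ITL flow.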
The Congestion Analysis feature in NDFC (successor of DCNM Slow Drain Analysis) provides a topology visualization of the source, cause, and time of congestion. Another new feature, called Event Analytics, automatically receives congestion alerts from the Port-Monitor feature on the MDS switches. These new features in NDFC allow correlation and investigation of congestion issues using intuitive charts and trends.
Besides detecting congestion, NDFC also allows configuring Dynamic Ingress Rate Limiting (DIRL), which is a unique innovation on MDS switches for preventing congestion. NDFC also visualizes the results of DIRL, which helps in further optimizing congestion thresholds.
These new enhancements for detecting and preventing congestion are only available in NDFC. You can continue to use Slow-Drain Analysis in DCNM, but the full benefit of Congestion Analysis and the ability to configure and visualize DIRL are available only after upgrading to NDFC.
We provide a tool to simplify the upgrade from DCNM to NDFC SAN Controller. This tool takes a backup of the discovered fabrics, alarm policies, server settings, and even the performance monitoring data from DCNM and restores these to NDFC.
My personal recommendation is to run DCNM and NDFC in parallel for a while. Then decommission DCNM only after you are comfortable operating NDFC. If your automation infrastructure relies on DCNM, this approach gives you enough time to update it and verify it against the NDFC RESTful APIs.
Refer to the white paper for step-by-step guidance on upgrading from DCNM to NDFC.
We continue to support DCNM and have not yet announced its end of life. But new innovations and support for newer switches are available only within NDFC. With the third major release, scale of a million I/O flows, stability, and newer features, now is the perfect time for upgrading from DCNM to NDFC SAN Controller. Starting this transition sooner gives you enough time to upgrade gracefully, get familiar with the Nexus Dashboard platform, and verify automation.
If you would like to see any new features in NDFC, please reach out to us via your account team or leave a comment below.
To learn more about NDFC, refer to the following resources.