How Many Network Links Do You Have for MPI Traffic?

If you're a bargain-basement HPC user, you might well scoff at the idea of having more than one network interface for your MPI traffic.

"I've got (insert your favorite high bandwidth network name here)! That's plenty to serve all my cores! Why would I need more than that?"

I can think of (at least) three reasons off the top of my head.

I'll disclaim this whole blog entry by outright admitting that I'm a vendor with an obvious bias for selling more hardware. But bear with me; there is an actual engineering issue here.

Here are three reasons for more network resources in a server:

Processors are getting faster
Core counts are rising
NUMA effects and congestion within a single server

Think of it this way: MPI applications tend to be bursty with communication. They compute for a while, and then they communicate.

Since processors are getting faster, the computation time between communications is shrinking. As a direct result, that same MPI application you've been running for years is now communicating more frequently, simply because it's now running on faster processors.

Add to that the fact that you now have more and more MPI processes in a single server. Remember when four MPI processes per server seemed like a lot? 16 MPI processes per server is now commonplace. And that number is increasing.

And then add to that the fact that MPI applications have been adapted over the years to assume the availability of high-bandwidth networks. "That same MPI application you've been running for years" isn't really the same - you've upgraded it over time to newer versions that are network-hungry.

Consider this inequality in the context of MPI processes running on a single server:

num_MPI_processes * network_resources_per_MPI_process ?=
network_resources_available

Are the applications running in your HPC clusters on the left or right hand side of that inequality? Note that the inequality refers to overall network resources - not just bandwidth. This includes queue depths, completion queue separation, ingress routing capability, etc.
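To make that comparison concrete, here's a rough sketch in Python that evaluates both sides for two kinds of resources. Everything in it is an assumption for illustration: the `NetworkBudget` type, the choice of bandwidth and send queues as the tracked resources, and all of the numbers. Plug in figures from your own applications and hardware.

```python
from dataclasses import dataclass


@dataclass
class NetworkBudget:
    """Amounts of two network resources (hypothetical units, illustration only)."""
    bandwidth_gbps: float
    send_queues: int


def oversubscription(num_procs, per_proc, available):
    """Return demand / supply per resource; a value above 1.0 means demand exceeds supply."""
    return {
        "bandwidth_gbps": num_procs * per_proc.bandwidth_gbps / available.bandwidth_gbps,
        "send_queues": num_procs * per_proc.send_queues / available.send_queues,
    }


# Made-up numbers, not measurements: 16 processes sharing one 40 Gbps link.
per_proc = NetworkBudget(bandwidth_gbps=5.0, send_queues=4)
available = NetworkBudget(bandwidth_gbps=40.0, send_queues=128)

ratios = oversubscription(16, per_proc, available)
for name, ratio in ratios.items():
    status = "oversubscribed" if ratio > 1.0 else "ok"
    print(f"{name}: {ratio:.2f} ({status})")
```

With these made-up inputs, bandwidth is the oversubscribed resource while queues are not; on a real server the limiting resource may well be a different one.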

And then add in another complication: NUMA effects. If you've only got one network uplink from your fat server, it's likely NUMA-local to some of your MPI processes and NUMA-remote from other MPI processes on that server.

Remember that all MPI traffic from that remote NUMA node will need to traverse inter-processor links before it can hit the PCI bus to get to the network interface used for MPI. On Intel E5-2690-based machines ("Sandy Bridge"), traversing QPI links can add anywhere from hundreds of nanoseconds to a microsecond of short message half-roundtrip latency, for example. And we haven't even mentioned the congestion/NUMA effects inside the server, which can further degrade performance.
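If you want to check which NUMA node a network interface hangs off of, Linux exposes that for PCI devices in sysfs. Here's a minimal sketch, assuming a Linux system; the helper names and the `eth0` interface name are placeholders of mine, and the `sysfs_root` parameter exists only so the function can be pointed at a test directory.

```python
from pathlib import Path


def nic_numa_node(iface, sysfs_root="/sys/class/net"):
    """Return the NUMA node of a network interface, or None if it is unknown."""
    path = Path(sysfs_root) / iface / "device" / "numa_node"
    try:
        node = int(path.read_text().strip())
    except (OSError, ValueError):
        return None  # no such interface, not a PCI device, or unreadable
    # The kernel reports -1 when it has no locality information.
    return node if node >= 0 else None


def is_numa_local(iface, process_node, sysfs_root="/sys/class/net"):
    """True if the interface is known to sit on the same NUMA node as the process."""
    nic_node = nic_numa_node(iface, sysfs_root)
    return nic_node is not None and nic_node == process_node


# "eth0" is a placeholder; substitute the interface that carries your MPI traffic.
print(nic_numa_node("eth0"))
```

Tools such as hwloc's `lstopo` show the same locality information in a more complete form.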

My point is that you need to take a hard look at the applications you run in your HPC clusters and see if you're artificially capping your performance by:

Not having enough network resources (bandwidth is the easiest to discuss, but others exist, too!) on each server for the total number of MPI processes on that server
Not distributing network resources among each NUMA locality in each server
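As a rough illustration of the second point, the sketch below assigns each MPI process on a server to a network interface on its own NUMA node when one exists, and otherwise falls back to spreading processes across all interfaces. This is not how any particular MPI implementation picks interfaces; the `assign_nics` function and the interface names are hypothetical.

```python
from collections import defaultdict
from itertools import cycle


def assign_nics(proc_nodes, nic_nodes):
    """Pick a NIC for each process, preferring one on the same NUMA node.

    proc_nodes: list of NUMA node ids, one per MPI process on this server.
    nic_nodes:  dict mapping NIC name -> NUMA node id.
    """
    if not nic_nodes:
        raise ValueError("need at least one NIC")
    by_node = defaultdict(list)
    for nic, node in sorted(nic_nodes.items()):
        by_node[node].append(nic)
    # Round-robin among the NICs on each node, and among all NICs as a fallback.
    local = {node: cycle(nics) for node, nics in by_node.items()}
    fallback = cycle(sorted(nic_nodes))
    return [next(local[n]) if n in local else next(fallback) for n in proc_nodes]


# Four processes, two per NUMA node, with one (hypothetical) NIC per node.
print(assign_nics([0, 0, 1, 1], {"nic0": 0, "nic1": 1}))
# Same processes with a single NIC: node 1's processes end up NUMA-remote.
print(assign_nics([0, 0, 1, 1], {"nic0": 0}))
```

In the single-NIC case every process lands on `nic0`, which is exactly the situation where half of the processes pay the inter-processor penalty on every message.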
