Not all bare metal is equal when you need sustained wire-speed throughput. The spec sheet rarely tells the full story — a server with a 10Gbps NIC can still bottleneck at 4 Gbps due to PCIe lane starvation or NUMA locality mismatches. This guide explains what to look for.

1. The NIC Is Only Part of the Story

The slot the NIC sits in matters as much as the NIC itself. On paper, PCIe 3.0 x4 provides about 32 Gbps per direction, which works out to roughly 3.2–3.5 GB/s (26–28 Gbps) after protocol overhead; that budget is shared across both ports of a dual-port card and with anything else hanging off the same lanes, and a slot that trains down to x1 or x2 caps the card far below line rate. For a 10Gbps NIC operating near line rate, you want a PCIe 3.0 x8 slot at minimum, or PCIe 4.0 x4.

PCIe Config      Theoretical BW (per direction)   Usable for 10GbE
PCIe 3.0 x4      32 Gbps                          Marginal (shared)
PCIe 3.0 x8      64 Gbps                          Yes
PCIe 3.0 x16     128 Gbps                         Yes (for 25/40GbE)
PCIe 4.0 x8      128 Gbps                         Yes (for 25GbE+)
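
Spec sheets aside, it is worth checking what the card actually negotiated once it is in the slot. A minimal sketch with lspci, assuming the NIC shows up at PCI address 01:00.0 (substitute whatever address your own system reports):

    # Find the NIC's PCI address
    lspci | grep -i ethernet

    # LnkCap = what the slot/card can do; LnkSta = the width and speed
    # actually negotiated (01:00.0 is a placeholder address)
    sudo lspci -vv -s 01:00.0 | grep -E 'LnkCap:|LnkSta:'

If LnkSta shows a narrower width or lower speed than LnkCap, the link has trained down and the card will bottleneck well before its port speed.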

2. NIC Bonding for Redundancy and Throughput

Dual NICs in a bonded configuration provide both failover redundancy and, in active-active mode, doubled throughput. Linux kernel bonding modes relevant for server hosting:

  • Mode 1 (active-backup) — one NIC active, second takes over on failure. Zero throughput gain, maximum reliability.
  • Mode 4 (802.3ad LACP) — requires switch support. Distributes traffic across both NICs. Real throughput gain depends on flow hashing — a single TCP flow can still only use one NIC.
  • Mode 6 (balance-alb) — no switch requirement. TX load balances across NICs, RX uses ARP negotiation. Works without LACP-capable switch.

For streaming use cases, Mode 4 (LACP) with xmit_hash_policy=layer3+4 distributes flows across both NICs based on source/destination IP+port — maximizing actual throughput for many concurrent viewer connections.
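
A rough sketch of that setup with plain iproute2 follows; the interface names eno1/eno2, the bond name, and the address are placeholders, and the switch ports must already be configured as an LACP port-channel for traffic to flow:

    # Create an 802.3ad (LACP) bond that hashes flows on IP + port
    ip link add bond0 type bond mode 802.3ad xmit_hash_policy layer3+4 miimon 100

    # Slaves must be down before they can be enslaved
    ip link set eno1 down && ip link set eno1 master bond0
    ip link set eno2 down && ip link set eno2 master bond0

    # Bring everything up and assign an address (example address)
    ip link set bond0 up
    ip link set eno1 up
    ip link set eno2 up
    ip addr add 203.0.113.10/24 dev bond0

    # Verify: mode, hash policy, and both slaves should appear here
    cat /proc/net/bonding/bond0

These commands do not persist across reboots; carry the same settings into whatever your distribution uses for network configuration (netplan, systemd-networkd, or NetworkManager).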

3. NUMA Topology and Why It Matters

Modern multi-socket servers have Non-Uniform Memory Access (NUMA) topology — each CPU socket has local memory with low latency and remote memory with high latency. If your NIC is on PCIe lanes connected to Socket 0, and your application runs on Socket 1 cores, every packet crosses the QPI/UPI interconnect — adding 40–80ns of latency and consuming interconnect bandwidth.

Checking NUMA affinity:

On Linux: lspci -vv | grep -A2 "Ethernet" shows the NIC's NUMA node. numactl --hardware shows your topology. Pin your application to the same NUMA node as your NIC using numactl --cpunodebind=0 --membind=0.
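
Put together, a minimal sketch; eth0, the node number, and the server binary are placeholders to adapt to your own topology:

    # NUMA node the NIC's PCIe lanes hang off (-1 means no NUMA info exposed)
    cat /sys/class/net/eth0/device/numa_node

    # Nodes, CPUs per node, and memory per node
    numactl --hardware

    # Run the streaming process on the NIC's node, with memory allocated there
    numactl --cpunodebind=0 --membind=0 ./streaming-server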

4. CPU Architecture for Network-Heavy Workloads

For pure forwarding and streaming (minimal compute), core count matters more than single-core performance. For transcoding + forwarding, single-core performance matters too.

Workload                      Priority            Recommended
Pure HLS serving / proxy      Core count, I/O     AMD EPYC (many PCIe lanes)
Live transcoding              AVX-512, IPC        Intel Xeon Scalable (Ice Lake+)
Mixed transcoding + serving   Balanced            AMD EPYC 7003+
High connection count         Core count + RAM    Any modern Xeon/EPYC with 64GB+
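
If transcoding is part of the plan, it is worth confirming that the CPU actually exposes the SIMD extensions you are paying for. A quick check on any Linux box, assuming nothing beyond a readable /proc/cpuinfo:

    # CPU model plus any AVX-512 feature flags it advertises
    grep -m1 'model name' /proc/cpuinfo
    grep -o 'avx512[a-z0-9_]*' /proc/cpuinfo | sort -u

An empty second result means no AVX-512 support, which pushes the box toward the serving-only rows above.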

5. Storage: NVMe vs SATA for Segment Caching

HLS segment files are small (typically 500KB–6MB each). The bottleneck is IOPS, not sequential throughput. SATA SSDs deliver ~80K IOPS; NVMe SSDs deliver 500K–1M IOPS. For a busy edge cache serving thousands of concurrent segment requests, NVMe is not optional — it's the difference between sub-millisecond segment reads and 10ms+ queuing.
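
fio gives a reasonable approximation of segment-sized random reads before you commit to a drive. A sketch, assuming fio is installed and /var/cache/hls is the cache mount; adjust the path, block size, and job count to your segment profile and viewer concurrency:

    # Random reads of ~512KB blocks across parallel jobs, roughly mimicking
    # many viewers pulling different segments at once
    fio --name=segment-read --directory=/var/cache/hls \
        --rw=randread --bs=512k --size=2G --numjobs=8 --iodepth=16 \
        --ioengine=libaio --direct=1 --runtime=60 --time_based \
        --group_reporting

Watch the completion-latency percentiles as much as the headline IOPS figure; the queuing described above shows up there first.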

Ready to Spec Your Server?

OFFDEDI's dedicated servers are configured with LACP-bonded 10Gbps NICs in PCIe x8 slots, NUMA-optimized placement, and NVMe boot + cache drives. View our server lineup or contact us for a custom build.