Not all bare metal is equal when you need sustained wire-speed throughput. The spec sheet rarely tells the full story — a server with a 10Gbps NIC can still bottleneck at 4 Gbps due to PCIe lane starvation or NUMA locality mismatches. This guide explains what to look for.
1. The NIC Is Only Part of the Story
A 10Gbps NIC is only as fast as the PCIe link behind it. PCIe 3.0 runs at 8 GT/s per lane with 128b/130b encoding, so an x4 link carries roughly 31.5 Gbps per direction before protocol overhead — enough for a single 10GbE port on paper, but marginal once TLP overhead, interrupt traffic, and a second port share the link. Worse, a card that negotiates down to PCIe 2.0 x4 (~16 Gbps raw) or to fewer lanes can bottleneck well below line rate. For a 10Gbps NIC operating near line rate, budget a PCIe 3.0 x8 slot minimum, or PCIe 4.0 x4.
| PCIe Config | Theoretical BW (per direction) | Usable for 10GbE |
|---|---|---|
| PCIe 3.0 x4 | 32 Gbps | Marginal (shared) |
| PCIe 3.0 x8 | 64 Gbps | Yes |
| PCIe 3.0 x16 | 128 Gbps | Yes (for 25/40GbE) |
| PCIe 4.0 x8 | 128 Gbps | Yes (for 25GbE+) |
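The table's figures follow from per-lane rate × lane count; a quick awk sanity check of a few configurations (these are raw per-direction numbers after 128b/130b encoding, before TLP and protocol overhead, which is why they land slightly under the nominal values above):

```shell
awk 'BEGIN {
  eff = 128 / 130                              # 128b/130b encoding efficiency
  printf "PCIe 3.0 x4: %.1f Gbps/dir\n",  8 * eff * 4   # 8 GT/s per lane
  printf "PCIe 3.0 x8: %.1f Gbps/dir\n",  8 * eff * 8
  printf "PCIe 4.0 x4: %.1f Gbps/dir\n", 16 * eff * 4   # 4.0 doubles the rate
}'
```

Note that PCIe 4.0 x4 lands at the same per-direction bandwidth as 3.0 x8 — which is why either satisfies the "x8 minimum" rule above.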
2. NIC Bonding for Redundancy and Throughput
Dual NICs in a bonded configuration provide both failover redundancy and, in active-active mode, doubled throughput. Linux kernel bonding modes relevant for server hosting:
- Mode 1 (active-backup) — one NIC active, second takes over on failure. Zero throughput gain, maximum reliability.
- Mode 4 (802.3ad LACP) — requires switch support. Distributes traffic across both NICs. Real throughput gain depends on flow hashing — a single TCP flow can still only use one NIC.
- Mode 6 (balance-alb) — no switch requirement. TX load balances across NICs, RX uses ARP negotiation. Works without LACP-capable switch.
For streaming use cases, Mode 4 (LACP) with xmit_hash_policy=layer3+4 distributes flows across both NICs based on source/destination IP+port — maximizing actual throughput for many concurrent viewer connections.
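As a sketch, a Mode 4 bond with that hash policy can be brought up by hand with iproute2. The interface names (`eth0`, `eth1`), bond name, and address below are placeholders; this assumes root privileges and a switch port-channel already configured for LACP, and production hosts would normally persist the same settings via netplan, NetworkManager, or systemd-networkd rather than run these at a prompt:

```shell
# Create the bond in 802.3ad (LACP) mode, hashing flows on IP+port
ip link add bond0 type bond mode 802.3ad xmit_hash_policy layer3+4 miimon 100

# Enslave both NICs (links must be down before joining a bond)
ip link set eth0 down && ip link set eth0 master bond0
ip link set eth1 down && ip link set eth1 master bond0

# Bring the bond up and address it (203.0.113.10 is a documentation address)
ip link set bond0 up
ip addr add 203.0.113.10/24 dev bond0

# Verify negotiation and per-slave state
cat /proc/net/bonding/bond0
```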
3. NUMA Topology and Why It Matters
Modern multi-socket servers have Non-Uniform Memory Access (NUMA) topology — each CPU socket has local memory with low latency and remote memory with high latency. If your NIC is on PCIe lanes connected to Socket 0, and your application runs on Socket 1 cores, every packet crosses the QPI/UPI interconnect — adding 40–80ns of latency and consuming interconnect bandwidth.
Checking NUMA affinity:
On Linux: `lspci -vv` lists a `NUMA node:` field for each device (grep for your Ethernet controller), and `cat /sys/class/net/<iface>/device/numa_node` gives the same answer directly. `numactl --hardware` shows your topology. Pin your application to the same NUMA node as your NIC using `numactl --cpunodebind=0 --membind=0`.
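A minimal sequence tying those steps together (`eth0` and the server binary are placeholders; assumes a Linux host with `numactl` installed, and the `0` in the pinning step should be replaced with whatever node the first command reports):

```shell
# Which NUMA node owns the NIC's PCIe device? (-1 means single-node
# or no NUMA information exposed)
cat /sys/class/net/eth0/device/numa_node

# Inspect the node <-> core <-> memory layout
numactl --hardware

# Run the workload with CPU and memory pinned to the NIC's node
numactl --cpunodebind=0 --membind=0 ./streaming-server   # placeholder binary
```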
4. CPU Architecture for Network-Heavy Workloads
For pure forwarding and streaming (minimal compute), core count matters more than single-core performance. For transcoding + forwarding, single-core performance matters too.
| Workload | Priority | Recommended |
|---|---|---|
| Pure HLS serving / proxy | Core count, I/O | AMD EPYC (many PCIe lanes) |
| Live transcoding | AVX-512, IPC | Intel Xeon Scalable (Ice Lake+) |
| Mixed transcoding + serving | Balanced | AMD EPYC 7003+ |
| High connection count | Core count + RAM | Any modern Xeon/EPYC with 64GB+ |
5. Storage: NVMe vs SATA for Segment Caching
HLS segment files are small (typically 500KB–6MB each). The bottleneck is IOPS, not sequential throughput. SATA SSDs deliver ~80K IOPS; NVMe SSDs deliver 500K–1M IOPS. For a busy edge cache serving thousands of concurrent segment requests, NVMe is not optional — it's the difference between sub-millisecond segment reads and 10ms+ queuing.
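To verify those IOPS numbers on your own hardware rather than trust the spec sheet, a random-read benchmark with fio is a reasonable sketch (the directory is a placeholder; assumes fio is installed, and `--direct=1` bypasses the page cache so you measure the drive, not RAM):

```shell
# Random 4K reads at moderate queue depth, roughly mimicking many
# concurrent segment requests hitting a cache drive
fio --name=segment-cache --directory=/mnt/cache \
    --rw=randread --bs=4k --iodepth=32 --numjobs=4 \
    --direct=1 --size=1G --runtime=30 --time_based \
    --group_reporting
```

Watch the reported `read: IOPS=` line; a SATA SSD should land in the tens of thousands, an NVMe drive in the hundreds of thousands.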
Ready to Spec Your Server?
OFFDEDI's dedicated servers are configured with LACP-bonded 10Gbps NICs in PCIe x8 slots, NUMA-optimized placement, and NVMe boot + cache drives. View our server lineup or contact us for a custom build.