Not all bare metal is equal when you need sustained wire-speed throughput. The spec sheet rarely tells the full story — a server with a 10Gbps NIC can still bottleneck at 4 Gbps due to PCIe lane starvation or NUMA locality mismatches. This guide explains what to look for.
1. The NIC Is Only Part of the Story
A 10Gbps NIC is only as fast as the PCIe link behind it. PCIe 3.0 runs at 8 GT/s per lane with 128b/130b encoding, so an x4 link carries roughly 31.5 Gbps per direction before protocol overhead — enough for a single 10GbE port on paper, but marginal once TLP overhead, interrupt traffic, and a second port share the link. Worse, a card that negotiates down to PCIe 2.0 x4 (~16 Gbps raw) or to fewer lanes can bottleneck well below line rate. For a 10Gbps NIC operating near line rate, budget a PCIe 3.0 x8 slot minimum, or PCIe 4.0 x4.
| PCIe Config | Theoretical BW (per direction) | Usable for 10GbE |
|---|---|---|
| PCIe 3.0 x4 | 32 Gbps | Marginal (shared) |
| PCIe 3.0 x8 | 64 Gbps | Yes |
| PCIe 3.0 x16 | 128 Gbps | Yes (for 25/40GbE) |
| PCIe 4.0 x8 | 128 Gbps | Yes (for 25GbE+) |
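The table's figures follow from per-lane rate × lane count; a quick awk sanity check of a few configurations (these are raw per-direction numbers after 128b/130b encoding, before TLP and protocol overhead, which is why they land slightly under the nominal values above):

```shell
awk 'BEGIN {
  eff = 128 / 130                              # 128b/130b encoding efficiency
  printf "PCIe 3.0 x4: %.1f Gbps/dir\n",  8 * eff * 4   # 8 GT/s per lane
  printf "PCIe 3.0 x8: %.1f Gbps/dir\n",  8 * eff * 8
  printf "PCIe 4.0 x4: %.1f Gbps/dir\n", 16 * eff * 4   # 4.0 doubles the rate
}'
```

Note that PCIe 4.0 x4 lands at the same per-direction bandwidth as 3.0 x8 — which is why either satisfies the "x8 minimum" rule above.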
2. NIC Bonding for Redundancy and Throughput
Dual NICs in a bonded configuration provide both failover redundancy and, in active-active mode, doubled throughput. Linux kernel bonding modes relevant for server hosting:
- Mode 1 (active-backup) — one NIC active, second takes over on failure. Zero throughput gain, maximum reliability.
- Mode 4 (802.3ad LACP) — requires switch support. Distributes traffic across both NICs. Real throughput gain depends on flow hashing — a single TCP flow can still only use one NIC.
- Mode 6 (balance-alb) — no switch requirement. TX load balances across NICs, RX uses ARP negotiation. Works without LACP-capable switch.
For streaming use cases, Mode 4 (LACP) with xmit_hash_policy=layer3+4 distributes flows across both NICs based on source/destination IP+port — maximizing actual throughput for many concurrent viewer connections.
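As a sketch, a Mode 4 bond with that hash policy can be brought up by hand with iproute2. The interface names (`eth0`, `eth1`), bond name, and address below are placeholders; this assumes root privileges and a switch port-channel already configured for LACP, and production hosts would normally persist the same settings via netplan, NetworkManager, or systemd-networkd rather than run these at a prompt:

```shell
# Create the bond in 802.3ad (LACP) mode, hashing flows on IP+port
ip link add bond0 type bond mode 802.3ad xmit_hash_policy layer3+4 miimon 100

# Enslave both NICs (links must be down before joining a bond)
ip link set eth0 down && ip link set eth0 master bond0
ip link set eth1 down && ip link set eth1 master bond0

# Bring the bond up and address it (203.0.113.10 is a documentation address)
ip link set bond0 up
ip addr add 203.0.113.10/24 dev bond0

# Verify negotiation and per-slave state
cat /proc/net/bonding/bond0
```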
3. NUMA Topology and Why It Matters
Modern multi-socket servers have Non-Uniform Memory Access (NUMA) topology — each CPU socket has local memory with low latency and remote memory with high latency. If your NIC is on PCIe lanes connected to Socket 0, and your application runs on Socket 1 cores, every packet crosses the QPI/UPI interconnect — adding 40–80ns of latency and consuming interconnect bandwidth.
Checking NUMA affinity:
On Linux: `lspci -vv` lists a `NUMA node:` field for each device (grep for your Ethernet controller), and `cat /sys/class/net/<iface>/device/numa_node` gives the same answer directly. `numactl --hardware` shows your topology. Pin your application to the same NUMA node as your NIC using `numactl --cpunodebind=0 --membind=0`.
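A minimal sequence tying those steps together (`eth0` and the server binary are placeholders; assumes a Linux host with `numactl` installed, and the `0` in the pinning step should be replaced with whatever node the first command reports):

```shell
# Which NUMA node owns the NIC's PCIe device? (-1 means single-node
# or no NUMA information exposed)
cat /sys/class/net/eth0/device/numa_node

# Inspect the node <-> core <-> memory layout
numactl --hardware

# Run the workload with CPU and memory pinned to the NIC's node
numactl --cpunodebind=0 --membind=0 ./streaming-server   # placeholder binary
```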
4. CPU Architecture for Network-Heavy Workloads
For pure forwarding and streaming (minimal compute), core count matters more than single-core performance. For transcoding + forwarding, single-core performance matters too.
| Workload | Priority | Recommended |
|---|---|---|
| Pure HLS serving / proxy | Core count, I/O | AMD EPYC (many PCIe lanes) |
| Live transcoding | AVX-512, IPC | Intel Xeon Scalable (Ice Lake+) |
| Mixed transcoding + serving | Balanced | AMD EPYC 7003+ |
| High connection count | Core count + RAM | Any modern Xeon/EPYC with 64GB+ |
5. Storage: NVMe vs SATA for Segment Caching
HLS segment files are small (typically 500KB–6MB each). The bottleneck is IOPS, not sequential throughput. SATA SSDs deliver ~80K IOPS; NVMe SSDs deliver 500K–1M IOPS. For a busy edge cache serving thousands of concurrent segment requests, NVMe is not optional — it's the difference between sub-millisecond segment reads and 10ms+ queuing.
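To verify those IOPS numbers on your own hardware rather than trust the spec sheet, a random-read benchmark with fio is a reasonable sketch (the directory is a placeholder; assumes fio is installed, and `--direct=1` bypasses the page cache so you measure the drive, not RAM):

```shell
# Random 4K reads at moderate queue depth, roughly mimicking many
# concurrent segment requests hitting a cache drive
fio --name=segment-cache --directory=/mnt/cache \
    --rw=randread --bs=4k --iodepth=32 --numjobs=4 \
    --direct=1 --size=1G --runtime=30 --time_based \
    --group_reporting
```

Watch the reported `read: IOPS=` line; a SATA SSD should land in the tens of thousands, an NVMe drive in the hundreds of thousands.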
Ready to Spec Your Server?
OFFDEDI's dedicated servers are configured with LACP-bonded 10Gbps NICs in PCIe x8 slots, NUMA-optimized placement, and NVMe boot + cache drives. View our server lineup or contact us for a custom build.