What the research is:
Millisampler is one of Meta’s latest characterization tools. It lets us observe, characterize, and debug network performance efficiently at high-granularity timescales. This lightweight network traffic characterization tool for continual monitoring operates at fine, configurable timescales. It collects time series of ingress and egress traffic volumes, number of active flows, incoming ECN marks, and ingress and egress retransmissions. In addition, Millisampler can distinguish in-region traffic from cross-region traffic (longer RTT). Millisampler runs on our server fleet, collecting short, periodic snapshots of this information at 100μs, 1ms, and 10ms time granularities. It stores the snapshots on local disk and makes them available for several days for on-demand analysis. Because the data is only aggregated flow-level header information, it does not contain any personally identifiable information (PII). Even with the minimal amount of information it collects, Millisampler data has proven very useful in practice, particularly when combined with existing coarser-grained data: we can see clearly how switch buffers or host NICs, for example, might be unable to handle the ingress traffic pattern.
How it works:
Millisampler comprises userspace code to schedule runs, store data, and serve data, and an eBPF-based tc filter that runs in the kernel to collect fine-timescale data. The user code attaches the tc filter and enables data collection. A tc filter is among the first programmable steps on receipt of a packet and near the last step on transmission. On ingress, this means that the eBPF code executes on the CPU core that is processing the soft irq (bottom half) as the packet is directed toward the owning socket. Because processing happens on many CPU cores, we avoid locks by using per-CPU variables, which increase the memory requirement but eliminate the risk of contention. To minimize overhead, we sample periodically and for short periods of time. Userspace therefore configures two parameters in Millisampler: the sampling interval and the number of samples. We schedule runs with three sampling intervals (10ms, 1ms, and 100μs), with the number of samples fixed at 2,000 for all sampling intervals. This means that our observation periods range from 200ms (100μs sampling interval) to 20s (10ms sampling interval), allowing us to observe events at sub-RTT to cross-region RTT time scales while fixing the memory footprint of each run at 2,000 64-bit counters per CPU core for each value we measure.
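The bucketing scheme described above can be modeled in a few lines of Python. This is only an illustrative sketch, not Millisampler's eBPF code: the class name `PerCpuSeries` and its methods are hypothetical, and the only values taken from the text are the 2,000 samples per run, the 64-bit counters, and the three sampling intervals.

```python
NUM_SAMPLES = 2000   # fixed number of samples per run, per the text
COUNTER_BYTES = 8    # 64-bit counters


class PerCpuSeries:
    """Hypothetical model of one metric's per-CPU counters for a single run."""

    def __init__(self, interval_us, num_cpus, start_us=0):
        self.interval_us = interval_us
        self.num_cpus = num_cpus
        self.start_us = start_us
        # One fixed-size array per CPU: a core only ever writes to its own
        # array, so no lock is needed on the packet path.
        self.buckets = [[0] * NUM_SAMPLES for _ in range(num_cpus)]

    def record(self, cpu, timestamp_us, nbytes):
        """Add a packet's bytes to the bucket that covers its timestamp."""
        index = (timestamp_us - self.start_us) // self.interval_us
        if 0 <= index < NUM_SAMPLES:
            self.buckets[cpu][index] += nbytes
            return True
        return False  # outside the observation window

    def aggregate(self):
        """Userspace step: sum the per-CPU arrays into one time series."""
        return [sum(column) for column in zip(*self.buckets)]

    def window_us(self):
        """Length of the observation period in microseconds."""
        return self.interval_us * NUM_SAMPLES

    def footprint_bytes(self):
        """Memory used by this one metric across all CPUs."""
        return NUM_SAMPLES * COUNTER_BYTES * self.num_cpus
```

With `interval_us=100` the window is 200,000μs (200ms), and with `interval_us=10_000` it is 20,000,000μs (20s), matching the observation periods quoted above. The footprint depends only on the number of CPUs and metrics, not on the sampling interval.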
Millisampler collects a variety of metrics. It computes ingress and egress total bytes and ingress ECN-marked bytes from the lengths and CE bits of the packets. Millisampler also counts TTLd-marked retransmits. Millisampler uses a 128-bit sketch to estimate the number of active (incoming and outgoing) connections. Using the sketch results in an approximation of the connection count that is precise up to a dozen connections and saturates at around 500 connections per sampling interval. Although there is room for more precision, in practice the exact number of connections matters less than the qualitative difference between a few connections and dozens or hundreds of connections. That difference has helped us distinguish traffic patterns with more connections (heavy incast) from patterns with more traffic over fewer connections.
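To make the behavior of a small sketch concrete, here is a minimal Python model of a 128-bit bitmap with a linear-counting estimator. The text does not say which estimator or hash function Millisampler uses, so both are assumptions here, as are the function names `flow_bit`, `add_flow`, and `estimate_flows`.

```python
import hashlib
import math

SKETCH_BITS = 128  # sketch size stated in the text


def flow_bit(flow):
    """Map a flow identifier to one of 128 bit positions (hash is an assumption)."""
    digest = hashlib.blake2b(repr(flow).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % SKETCH_BITS


def add_flow(sketch, flow):
    """Return the sketch with this flow's bit set; repeated flows change nothing."""
    return sketch | (1 << flow_bit(flow))


def estimate_flows(sketch):
    """Linear-counting estimate of distinct flows; None once every bit is set."""
    zeros = SKETCH_BITS - bin(sketch).count("1")
    if zeros == 0:
        return None  # saturated: the sketch can no longer tell counts apart
    return -SKETCH_BITS * math.log(zeros / SKETCH_BITS)
```

With only a handful of flows, hash collisions are rare and the estimate tracks the true count closely. As the bitmap fills, each set bit stands for more and more flows, and the largest finite estimate under this estimator is 128·ln(128) ≈ 621. That is the same order of magnitude as the saturation point described above, though the real implementation may differ.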
Why it matters:
Millisampler is a powerful tool for troubleshooting and performance analysis. Two contrasting network performance faults that we solved at Meta in the last few years required a fine-grained view of traffic. The first problem featured synchronized traffic bursts at fine time scales, and seeing this motivated us to build and deploy Millisampler to catch it quickly if it happened again. The second, which an early Millisampler prototype helped root-cause, featured a NIC driver bug that caused the NIC to stop delivering packets for milliseconds at a time, thereby proving the value of Millisampler in complex investigations. While Millisampler (or Millisampler-like data) played an important role in these investigations, it did so only as part of our rich ecosystem of data collection tools that track a dizzying array of metrics across hosts and the network.
Beyond such incidents, Millisampler data has also proven useful in characterizing and analyzing the traffic characteristics of services, allowing us to design and deploy a range of solutions to help improve their performance. For example, we have been able to characterize the nature of bursts across a number of services in order to understand the intensity of incast and tune transport performance accordingly. We have also been able to look at complex interactions between short-RTT and long-RTT flows and understand how bursts of either affect fairness for the other. In a following post, we will look at an extension of Millisampler, Syncmillisampler, where we run Millisampler synchronously across all hosts in a rack and use that data to identify buffer contention in the top-of-rack ASICs.
Read the full paper:
Ehab Ghabashneh, Cristian Lumezanu, Raghu Nallamothu, and Rob Sherwood also contributed to the design and implementation of Millisampler.