Pingmesh

Chuanxiong Guo, Hua Chen, Zhi-Wei Lin, Varugis Kurien, Lihua Yuan, Dong Xiang, Yingnong Dang, Ray Huang, Dave Maltz, Zhaoyi Liu, Vin Wang, Bin Pang
2015 Computer Communication Review
Can we get network latency between any two servers at any time in large-scale data center networks? The collected latency data can then be used to address a series of challenges: determining whether an application-perceived latency issue is caused by the network, defining and tracking network service level agreements (SLAs), and automatic network troubleshooting. We have developed the Pingmesh system for large-scale data center network latency measurement and analysis to answer the above question affirmatively. Pingmesh has been running in Microsoft data centers for more than four years, and it collects tens of terabytes of latency data per day. Pingmesh is widely used not only by network software developers and engineers, but also by application and service developers and operators.
In this paper we use the term "network latency" from the application's point of view. When an application A at a server sends a message to an application B at a peer server, the network latency is defined as the time interval from the time A sends the message to the time B receives the message. In practice we measure round-trip time (RTT), since RTT measurement does not require the server clocks to be synchronized.
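To illustrate why RTT avoids clock synchronization, here is a minimal sketch in Python. It is not Pingmesh's implementation: the function names `measure_rtt` and `start_echo_server`, the TCP echo exchange, and the loopback address are all assumptions made for illustration. The point is that both timestamps are read from the sender's own monotonic clock, so the peer's clock never enters the calculation.

```python
import socket
import threading
import time


def measure_rtt(host: str, port: int, payload: bytes = b"ping",
                timeout: float = 2.0) -> float:
    """Return the round-trip time, in seconds, of one TCP echo exchange.

    Hypothetical helper: both timestamps come from the local monotonic
    clock, so no clock synchronization with the peer is needed.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        start = time.monotonic()          # A sends the message
        sock.sendall(payload)
        received = b""
        while len(received) < len(payload):
            chunk = sock.recv(len(payload) - len(received))
            if not chunk:
                raise ConnectionError("peer closed before echoing payload")
            received += chunk
        return time.monotonic() - start   # A receives B's echo


def start_echo_server() -> tuple[str, int]:
    """Start a throwaway echo server on loopback (assumed stand-in for B)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))         # port 0: let the OS pick a free port
    server.listen()

    def serve() -> None:
        while True:
            conn, _ = server.accept()
            with conn:
                while data := conn.recv(4096):
                    conn.sendall(data)    # echo every byte back to the sender

    threading.Thread(target=serve, daemon=True).start()
    return server.getsockname()


if __name__ == "__main__":
    host, port = start_echo_server()
    samples = sorted(measure_rtt(host, port) for _ in range(5))
    print(f"median RTT: {samples[len(samples) // 2] * 1e6:.0f} microseconds")
```

A one-way latency measurement would instead subtract a timestamp taken on A from one taken on B, which is only meaningful if the two clocks agree to well below the latency being measured.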
doi:10.1145/2829988.2787496 fatcat:tsfgh5c2jvf2bd3hc4chc7etpu