Characterising a grid site's traffic

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published
Publication date: 2010
Host publication: HPDC '10 Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Place of Publication: New York
Publisher: ACM
Pages: 707-716
Number of pages: 10
ISBN (print): 978-1-60558-942-8
Original language: English
Event: The Third International Workshop on Data Intensive Distributed Computing (DIDC'10) - Chicago, IL, USA
Duration: 1/01/1900 → …

Conference

Conference: The Third International Workshop on Data Intensive Distributed Computing (DIDC'10)
City: Chicago, IL, USA
Period: 1/01/00 → …

Abstract

Grid computing has been widely adopted for intensive high performance computing. Since grid resources are distributed over complex large-scale infrastructures, understanding grid site data traffic behaviour is important for efficient resource utilisation, performance optimisation, and the design of future grid sites as well as traffic-aware grid applications. In this paper, we study and analyse the traffic generated at a grid site in the Large Hadron Collider (LHC) Computing Grid (LCG). We find that most of the generated traffic is TCP-based and that a small set of grid applications generate significant amounts of the data. Upon analysing the different traffic metrics, we also find that the traffic exhibits long-range dependence and self-similarity. We also investigate packet-level metrics such as throughput, packet rate, round trip time (RTT) and packet loss. Our study establishes that these metrics can be well represented by Gaussian mixture models. The findings we present in this paper will enable accurate grid site traffic monitoring and potentially on-the-fly traffic modelling and prediction. They will also lead to a better understanding of grid site traffic behaviour and contribute to more efficient grid site planning, traffic management, data transmission protocol optimisation, and data-aware grid application design.
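
The abstract does not say which estimator was used to detect long-range dependence, so the following is only an illustrative sketch, not the authors' method. It assumes the aggregated-variance technique, a common way to estimate the Hurst exponent H of a traffic time series: values of H between 0.5 and 1 indicate long-range dependence, while uncorrelated noise gives H close to 0.5. The function name, the block sizes, and the synthetic input are all hypothetical and stand in for real per-interval byte or packet counts.

```python
import math
import random
import statistics


def aggregated_variance_hurst(series, block_sizes):
    """Estimate the Hurst exponent H with the aggregated-variance method.

    For a self-similar process, the variance of block means of size m
    scales as m ** (2H - 2), so the slope of log(variance) against
    log(m) gives H = 1 + slope / 2.
    """
    log_m, log_var = [], []
    for m in block_sizes:
        n_blocks = len(series) // m
        if n_blocks < 2:
            continue  # not enough blocks to compute a variance
        means = [
            sum(series[i * m:(i + 1) * m]) / m
            for i in range(n_blocks)
        ]
        var = statistics.pvariance(means)
        if var <= 0:
            continue  # a zero variance has no logarithm
        log_m.append(math.log(m))
        log_var.append(math.log(var))
    if len(log_m) < 2:
        raise ValueError("need at least two usable block sizes")
    # Ordinary least-squares slope of log_var on log_m.
    mean_x = sum(log_m) / len(log_m)
    mean_y = sum(log_var) / len(log_var)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_m, log_var))
    sxx = sum((x - mean_x) ** 2 for x in log_m)
    return 1.0 + (sxy / sxx) / 2.0


# Synthetic white noise as a stand-in for measured traffic (NOT data from the paper).
rng = random.Random(0)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(20000)]
h_estimate = aggregated_variance_hurst(white_noise, [1, 2, 4, 8, 16, 32, 64])
```

On the synthetic white noise the estimate should sit near 0.5, which is the baseline against which a long-range dependent trace would show a noticeably higher value. The aggregated-variance estimator is simple but known to be biased on short or non-stationary traces, so it is best treated as a first check.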