The Need for Online and Offline Speed When Analyzing the Biggest Scientific Dataset: Particle Physics at the Large Hadron Collider
Multi-thousand-person, multi-billion-dollar experiments at the Large Hadron Collider (LHC) are generating enormous datasets to investigate the smallest distance scales of nature. We are currently preparing to increase the data rate by a factor of 10 over the next 10 years to fully exploit the power of the LHC. This poses significant challenges at all stages of data taking and analysis. Online algorithms must be fast, simple, and robust due to hardware and radiation constraints. Enormous simulated datasets are also required to perform inference on the data; their generation speed is becoming a limiting factor. Our current computing and simulation models will not scale to this high-luminosity LHC era and will thus require heavy use of high-performance computing (HPC) resources and other solutions. Modern machine learning will continue to play a growing and critical role in all aspects of this work, including accelerating simulation with techniques like Generative Adversarial Networks (GANs). Integrating modern tools into aging workflows is a significant challenge. This talk will review the timescales involved in the various parts of the LHC workflow and how we are trying to cope with the big data (rate) challenge.
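To give a flavor of the GAN idea mentioned above, here is a minimal, self-contained toy sketch (not the actual LHC fast-simulation setup, and all names and parameters here are illustrative assumptions): a one-dimensional generator learns to mimic samples from a Gaussian stand-in for an expensive full-simulation output, with the adversarial gradients derived by hand so only NumPy is needed.

```python
# Toy GAN sketch: a linear generator g(z) = a*z + b learns to match the
# mean of a Gaussian "full simulation" output. Hypothetical example only.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

# "Real" data: stand-in for expensive detector-simulation output.
REAL_MEAN, REAL_STD = 4.0, 1.25
def sample_real(n):
    return rng.normal(REAL_MEAN, REAL_STD, n)

# Generator g(z) = a*z + b with z ~ N(0,1); discriminator D(x) = sigmoid(w*x + c).
a, b = 1.0, 0.0          # generator parameters
w, c = 0.1, 0.0          # discriminator parameters
lr, batch = 0.05, 64

for step in range(3000):
    # Discriminator step: ascend log D(real) + log(1 - D(fake)).
    x_real = sample_real(batch)
    z = rng.normal(size=batch)
    x_fake = a * z + b
    d_real = sigmoid(w * x_real + c)
    d_fake = sigmoid(w * x_fake + c)
    # dLoss/d(pre-activation): (D - 1) for the real term, D for the fake term.
    gs_real = d_real - 1.0
    gs_fake = d_fake
    w -= lr * np.mean(gs_real * x_real + gs_fake * x_fake)
    c -= lr * np.mean(gs_real + gs_fake)

    # Generator step: descend the non-saturating loss -log D(fake).
    z = rng.normal(size=batch)
    x_fake = a * z + b
    d_fake = sigmoid(w * x_fake + c)
    gs = (d_fake - 1.0) * w   # chain rule through the discriminator
    a -= lr * np.mean(gs * z)
    b -= lr * np.mean(gs)

# Drawing from the trained generator is now just cheap arithmetic.
fake_mean = np.mean(a * rng.normal(size=10_000) + b)
print(f"generated mean ~ {fake_mean:.2f} (target {REAL_MEAN})")
```

The payoff mirrors the LHC use case: once trained, sampling from the generator costs a few arithmetic operations per event, whereas the "real" simulator it imitates may be orders of magnitude slower.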
Anyone who would like to give one of the weekly seminars on the RTDM program can fill in the survey at https://goo.gl/forms/Li5jQ0jm01DeYZVC3.