Abstract

At Yahoo, our journey in using sketches in production at large scale started in 2012 with a single sketch. We open-sourced our sketching library in 2015, joined the Apache Software Foundation (ASF) in 2018, and became a Top-Level Project in December 2020. Over this time we have released about 15 different sketches, many with multiple variations, in three languages: Java, C++, and Python. All of these have been engineered to be "binary compatible" across languages, built with minimal dependencies to enable straightforward integration across multiple platforms, and comprehensively characterized for performance, accuracy, and size. Our mission has been to translate the theoretical work represented by this audience and published in numerous theoretical papers into practical implementations that can be used in production at very large scales. In this talk I would like to share some of our insights and what we have learned about what it takes to engineer a sketch for production use at scale.

Video Recording