Abstract

Random sampling can greatly enhance the scalability of complex data analysis tasks. Samples serve as versatile summaries that can be used directly or integrated as a component of the data analysis process. In this talk, I will review some of my favorite big ideas in the design and application of weighted and coordinated sampling schemes. I will particularly emphasize algorithmic simplicity, practicality, and the setting of streamed or distributed data.