Mikkel Thorup (University of Copenhagen)
Calvin Lab Auditorium
Hash functions are used everywhere in computing. Originally, hashing was invented to store and look up elements in computer memory, but more recently it has become a crucial component in the analysis of Big Data applications.
We think of a hash function as a random function, assigning random hash values to elements; thus, if we use hash values to place elements in computer memory, we expect them to spread out nicely. At the same time, the hash function has to be fixed as soon as we start using it; otherwise we can't find the elements again. In the context of Big Data, we use hash functions to create small sketches of large sets, allowing us to quickly measure how similar they are and, if they are similar, exactly what the differences are. The sets can be processed independently as long as we use the same hash function to create each sketch.
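As a rough illustration of sketching with a shared, fixed hash function, here is a minimal Python sketch. The abstract does not name a particular scheme, so the bottom-k sketch used here is an assumption chosen for brevity, and keyed BLAKE2b merely stands in for a "random" hash function. The names `make_hash`, `bottom_k_sketch`, and `estimate_jaccard` are hypothetical.

```python
import hashlib


def make_hash(seed: int):
    """Return a fixed hash function from strings to 64-bit integers.

    Assumption: keyed BLAKE2b stands in for a random hash function.
    The seed fixes the function, so separate runs agree on every value.
    """
    key = seed.to_bytes(8, "big")

    def h(item: str) -> int:
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8, key=key).digest()
        return int.from_bytes(digest, "big")

    return h


def bottom_k_sketch(items, h, k):
    """Sketch a set by keeping its k smallest distinct hash values (sorted)."""
    return sorted({h(x) for x in items})[:k]


def estimate_jaccard(sketch_a, sketch_b, k):
    """Estimate |A ∩ B| / |A ∪ B| from two bottom-k sketches.

    The k smallest values of the merged sketches are the bottom-k sketch of
    the union; the fraction of them present in both sketches is the estimate.
    """
    a, b = set(sketch_a), set(sketch_b)
    union_k = sorted(a | b)[:k]
    if not union_k:
        return 0.0
    shared = sum(1 for v in union_k if v in a and v in b)
    return shared / len(union_k)


if __name__ == "__main__":
    h = make_hash(seed=42)  # the same function must be used for both sets
    set_a = {f"item{i}" for i in range(1000)}
    set_b = {f"item{i}" for i in range(500, 1500)}
    # Each sketch can be computed independently, even on different machines.
    sk_a = bottom_k_sketch(set_a, h, k=64)
    sk_b = bottom_k_sketch(set_b, h, k=64)
    print(estimate_jaccard(sk_a, sk_b, k=64))  # true similarity is 1/3
```

The estimate is only approximate, and its accuracy depends on both `k` and the probabilistic properties of the hash function.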
I will talk about some of these applications of random hash functions, about the probabilistic properties they are required to have, and about how they can be generated to work efficiently on huge data sets.
Light refreshments will be served before the lecture at 3:30 p.m.