Abstract
I will discuss our experience in learning vector representations of sentences that reflect paraphrastic similarity. That is, if two sentences have similar meanings, their vectors should have high cosine similarity. Our goal is to produce a function that can be used to embed any sentence for use in downstream tasks, analogous to how pretrained word embeddings are currently used by practitioners in a broad range of applications. I'll describe our experiments, in which we train on large datasets of noisy paraphrase pairs and test our models on standard semantic similarity benchmarks. We consider a variety of architectures for the embedding function, including those based on word averaging, long short-term memory, and convolutional networks. We find that the simpler architectures are easier to train and more stable when transferred to new domains, though they can be beaten by more powerful architectures given sufficient tuning and aggressive regularization.
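As a rough illustration of the word-averaging model and the cosine-similarity notion of paraphrastic similarity mentioned above, here is a minimal sketch; it is not the authors' implementation, and the small `embeddings` table is a hypothetical stand-in for pretrained word vectors that would normally be loaded from disk.

```python
import numpy as np

# Hypothetical pretrained word embeddings (word -> vector); in practice these
# would be loaded from a file of pretrained vectors.
embeddings = {
    "the": np.array([0.1, 0.3, -0.2]),
    "cat": np.array([0.5, -0.1, 0.4]),
    "sat": np.array([0.0, 0.2, 0.1]),
    "dog": np.array([0.4, -0.2, 0.5]),
    "rested": np.array([0.1, 0.3, 0.2]),
}

def embed_sentence(sentence, embeddings):
    """Embed a sentence by averaging the vectors of its in-vocabulary words."""
    vectors = [embeddings[w] for w in sentence.lower().split() if w in embeddings]
    if not vectors:
        return np.zeros(next(iter(embeddings.values())).shape)
    return np.mean(vectors, axis=0)

def cosine_similarity(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Paraphrastic similarity scored as cosine similarity of sentence embeddings.
s1 = embed_sentence("the cat sat", embeddings)
s2 = embed_sentence("the dog rested", embeddings)
print(cosine_similarity(s1, s2))
```

More powerful encoders (LSTMs, convolutional networks) would replace the averaging step with a learned composition function, while the cosine-similarity scoring stays the same.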