Andrés Felipe Barrientos (Duke University)
Many data producers seek to provide users access to confidential data without unduly compromising data subjects' privacy and confidentiality. One general strategy requires users to perform analyses without seeing the confidential data; for example, analysts get access only to synthetic data or to query systems that return disclosure-protected outputs of statistical models. With synthetic data or redacted outputs, however, the analyst never really knows how much to trust the resulting findings. If the same analysis were performed on both the confidential and the synthetic data, would the regression coefficients of interest be statistically significant in both analyses? Would the regression coefficients from both analyses fall within given intervals? We present algorithms for addressing these questions while satisfying differential privacy, and we describe conditions under which some of the algorithms provide adequate answers. We illustrate the properties of the proposed methods using artificial and genuine data.