Abstract
We consider transfer learning for nonparametric least squares estimators under covariate shift. Although convergence properties of empirical risk minimizers are conveniently expressed in terms of the associated population risk, bounding the performance under covariate shift requires pointwise convergence rates. Under weak assumptions on the design distribution, we show that the nonparametric least squares estimator (LSE) over 1-Lipschitz functions is also minimax rate optimal with respect to a weighted uniform norm, where the weighting accounts in a natural way for the non-uniformity of the design distribution. This implies that, although least squares is a global criterion, the LSE adapts locally to the size of the design density. The result has several consequences for transfer learning, and explicit convergence rates can be obtained for a number of benchmark pairs of source and target densities.