We demonstrate an equivalence between reproducing kernel Hilbert space (RKHS) embeddings of conditional distributions and vector-valued regressors.
This connection introduces a natural regularized loss function which the RKHS embeddings minimise, providing an intuitive understanding of the embeddings and a justification for their use. Furthermore, the equivalence allows the application of vector-valued regression methods and results to the problem of learning conditional distributions. Using this link we derive a sparse version of the embedding by considering alternative formulations. Further, by applying
convergence results for vector-valued regression to the embedding problem we derive minimax convergence rates which are O(log(n)/n), compared to current state-of-the-art rates of O(n^{-1/4}), and are valid under milder and more
intuitive assumptions. These minimax upper rates coincide with lower rates up to a logarithmic factor, showing that the embedding method achieves nearly optimal rates. We study our sparse embedding algorithm in a reinforcement
learning task where the algorithm shows significant improvement in sparsity over an incomplete Cholesky decomposition.
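The equivalence can be illustrated numerically. The sketch below is a minimal example, not the authors' implementation. It assumes the standard empirical estimator mu_{Y|x} = sum_i beta_i(x) k_Y(y_i, .) with weights beta(x) = (K + n*lam*I)^{-1} k_X(x). The Gaussian kernel, the scaling of the regulariser by n, and all function names (`gaussian_gram`, `embedding_weights`, `ridge_predict`) are hypothetical choices made for illustration. For a test function g in the RKHS on Y, evaluating the embedding gives sum_i beta_i(x) g(y_i). This should coincide with a scalar kernel ridge regressor fitted to the targets g(y_i) with the same regulariser, which is the regression view of the embedding.

```python
import numpy as np

def gaussian_gram(a, b, bandwidth=1.0):
    """Gaussian kernel matrix with entries exp(-|a_i - b_j|^2 / (2 * bandwidth^2))."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-sq / (2.0 * bandwidth**2))

def embedding_weights(x_train, x_query, lam=1e-2, bandwidth=1.0):
    """Weights beta(x) = (K + n*lam*I)^{-1} k_X(x) of the empirical conditional
    mean embedding (assumed estimator; lam scaled by n is an assumption)."""
    n = x_train.shape[0]
    K = gaussian_gram(x_train, x_train, bandwidth)
    k_q = gaussian_gram(x_train, x_query, bandwidth)        # shape (n, m)
    return np.linalg.solve(K + n * lam * np.eye(n), k_q)    # shape (n, m)

def ridge_predict(x_train, targets, x_query, lam=1e-2, bandwidth=1.0):
    """Scalar kernel ridge regression of `targets` on x, using the same regulariser."""
    n = x_train.shape[0]
    K = gaussian_gram(x_train, x_train, bandwidth)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), targets)
    return gaussian_gram(x_query, x_train, bandwidth) @ alpha

# Synthetic data: Y depends on X through a noisy sine (illustrative only).
rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, size=(50, 1))
y = np.sin(x[:, 0]) + 0.1 * rng.standard_normal(50)
x_new = np.linspace(-2.0, 2.0, 5)[:, None]

# Test function g(t) = k_Y(0.5, t), which lies in the Gaussian RKHS on Y.
g = lambda t: np.exp(-((t - 0.5) ** 2) / 2.0)

beta = embedding_weights(x, x_new)
via_embedding = beta.T @ g(y)                 # <g, mu_{Y|x}> for each query x
via_regression = ridge_predict(x, g(y), x_new)
print(np.allclose(via_embedding, via_regression))  # True
```

The two routes agree because K + n*lam*I is symmetric, so k_X(x)^T (K + n*lam*I)^{-1} g(y) can be read either as embedding weights applied to g(y) or as a ridge regressor applied to k_X(x). The regression route computes one scalar regressor per test function g, while the embedding weights handle every g at once, which is the vector-valued view.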