This paper studies optimal motion planning for multiple mobile robots with collision avoidance. We develop a distributed reinforcement learning algorithm that simultaneously ensures suboptimal goal reaching and anytime collision avoidance. We establish theoretical results on the convergence of the neural network weights, the uniform ultimate boundedness of the closed-loop system states, and anytime collision avoidance. Numerical simulations for single-integrator and unicycle robots illustrate the theoretical results.