We study the performance of four magnetohydrodynamic models (BATS-R-US, GUMICS, LFM, OpenGGCM) in the Earth's magnetosphere. Using the Community Coordinated Modeling Center's Run-on-Request system, we compare model predictions with magnetic field measurements from the Cluster, Geotail and Wind spacecraft during a multiple substorm event. We also compare the modeled cross polar cap potential with that obtained from the Super Dual Auroral Radar Network (SuperDARN), and the modeled magnetopause standoff distances with those of an empirical magnetopause model. The correlation coefficient (CC) and prediction efficiency (PE) metrics are used to evaluate model performance quantitatively. For all four models, the best performance outside geosynchronous orbit is found on the dayside. Generally, model performance decreases steadily downstream from the Earth. On the dayside most CCs are above 0.5, with CCs for Bx and Bz close to 0.9 for three out of four models. In the magnetotail at about −130 Earth radii from the Earth, the prediction efficiency of all models is below that obtained by using an average value as the prediction, with the exception of Bz. Bx is most often the best predicted and best correlated component, both on the dayside and on the nightside close to the Earth, whereas in the far tail the CC and PE for Bz are substantially higher than those of the other components in all models. We also find that increasing the resolution or coupling an additional physics module does not automatically improve model performance in the magnetosphere.
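
The two metrics named above can be illustrated with a minimal sketch. The abstract does not state the formulas, so the code assumes the commonly used definitions: CC as the Pearson correlation coefficient, and PE as one minus the ratio of the mean squared error to the variance of the observations. Under that assumed definition, PE = 0 corresponds to predicting the observed average, which matches the comparison made for the far tail. The function names and sample values are hypothetical and are not taken from the study.

```python
from math import sqrt


def correlation_coefficient(obs, pred):
    """Pearson correlation between observed and predicted series (assumed CC definition)."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_pred = sum(pred) / n
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(obs, pred))
    norm_obs = sqrt(sum((o - mean_obs) ** 2 for o in obs))
    norm_pred = sqrt(sum((p - mean_pred) ** 2 for p in pred))
    return cov / (norm_obs * norm_pred)


def prediction_efficiency(obs, pred):
    """Assumed PE definition: 1 - (sum of squared errors) / (sum of squared deviations of obs).

    PE = 1 is a perfect prediction; PE = 0 is no better than predicting
    the mean of the observations; PE < 0 is worse than that.
    """
    n = len(obs)
    mean_obs = sum(obs) / n
    squared_error = sum((o - p) ** 2 for o, p in zip(obs, pred))
    squared_deviation = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - squared_error / squared_deviation


# Illustrative values only (not spacecraft data).
observed = [1.0, 2.0, 3.0, 4.0]
scaled = [2.0, 4.0, 6.0, 8.0]       # right shape, wrong amplitude
mean_only = [2.5, 2.5, 2.5, 2.5]    # constant prediction at the observed mean

cc_scaled = correlation_coefficient(observed, scaled)
pe_scaled = prediction_efficiency(observed, scaled)
pe_mean = prediction_efficiency(observed, mean_only)
```

With these sample values the scaled prediction is perfectly correlated with the observations yet has a negative PE, which shows why the two metrics are reported together: CC is insensitive to amplitude and offset errors, while PE penalizes them.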