Quantitative modeling of flood risk and hazard involves decisions about the choice of model and the goal of the modeling exercise, the latter expressed through some measure of performance. This paper shows how subjectivity in the choice of performance measures and of the observation sets used for model calibration inevitably leads to variability in the estimation of flood hazard. We compare the predictions of a 2D flood inundation model obtained using different global and local evaluation criteria. We show that traditional area-averaging performance measures are inadequate in the face of model imperfection, especially when such models are calibrated for flood hazard studies. In this study we incorporate flood risk weighting into the model's performance measure. This allows us to calibrate the model to the places that matter most, e.g. the locations of houses. Quantifying the importance of places requires engaging stakeholders in the model calibration process.
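
To make the idea of risk weighting concrete, the following is a minimal sketch and not the formulation used in the paper. It assumes binary wet/dry maps on a regular grid and a fit measure of the form hits / (hits + false alarms + misses), a measure often used for inundation extent; the abstract does not state which measure the study adopts. The function name `weighted_fit`, the example grids, and the weight of 5 assigned to a cell containing a house are all hypothetical.

```python
def weighted_fit(observed, predicted, weights=None):
    """Fit between binary wet/dry maps given as nested lists of 0/1.

    With weights=None every cell counts equally (area-averaging measure).
    With a weight grid, each cell contributes its weight instead of 1,
    so errors in high-importance cells are penalised more heavily.
    """
    hits = false_alarms = misses = 0.0
    for i, (obs_row, pred_row) in enumerate(zip(observed, predicted)):
        for j, (obs, pred) in enumerate(zip(obs_row, pred_row)):
            w = 1.0 if weights is None else float(weights[i][j])
            if obs and pred:
                hits += w            # wet in both maps
            elif pred:
                false_alarms += w    # predicted wet, observed dry
            elif obs:
                misses += w          # observed wet, predicted dry
    total = hits + false_alarms + misses
    return hits / total if total > 0 else float("nan")


# Hypothetical 2 x 3 example: cell (1, 0) contains a house.
observed = [[1, 1, 0],
            [1, 0, 0]]
pred_a = [[1, 1, 0],   # misses the house cell
          [0, 0, 0]]
pred_b = [[1, 0, 0],   # misses an unimportant cell instead
          [1, 0, 0]]
weights = [[1, 1, 1],  # assumed importance weights
           [5, 1, 1]]

for name, pred in (("A", pred_a), ("B", pred_b)):
    print(name, weighted_fit(observed, pred), weighted_fit(observed, pred, weights))
```

In this toy case both predictions score 2/3 under the unweighted measure, so area averaging cannot tell them apart. With the assumed weights, prediction A (which misses the house) drops to 2/7 while prediction B rises to 6/7. In practice the weight grid is exactly where stakeholder input on the importance of places would enter.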