We examine different conceptions of land surface model benchmarking and illustrate the importance of internationally standardized evaluation experiments that specify data sets, variables, metrics and model resolutions. We also show how essential it can be to define a priori expectations of model performance, based on the complexity of a model and the amount of information provided to it, and give an example of how these expectations might be quantified. Finally, we introduce the Protocol for the Analysis of Land Surface models (PALS), a free, online land surface model benchmarking application, and show how it is structured to meet both of these goals.