## Which Factors are Risk Factors in Asset Pricing? A Model Scan Framework

A key question for understanding the cross-section of expected returns of equities is the following: which factors, from a given collection of factors, are risk factors, equivalently, which factors are in the stochastic discount factor (SDF)? Though the SDF is unobserved, assumptions about which factors (from the available set of factors) are in the SDF restricts the joint distribution of factors in specific ways, as a consequence of the economic theory of asset pricing. A different starting collection of factors that go into the SDF leads to a different set of restrictions on the joint distribution of factors. The conditional distribution of equity returns has the same restricted form, regardless of what is assumed about the factors in the SDF, as long as the factors are traded, and hence the distribution of asset returns is irrelevant for isolating the risk-factors. The restricted factors models are distinct (nonnested) and do not arise by omitting or including a variable from a full model, thus precluding analysis by standard statistical variable selection methods, such as those based on the lasso and its variants. Instead, we develop what we call a *Bayesian model scan* strategy in which each factor is allowed to enter or not enter the SDF and the resulting restricted models (of which there are 114,674 in our empirical study) are simultaneously confronted with the data. We use a Student-t distribution for the factors, and model-specific independent Student-t distribution for the location parameters, a training sample to fix prior locations, and a creative way to arrive at the joint distribution of several other model-specific parameters from a single prior distribution. This allows our method to be essentially a scaleable and tuned-black-box method that can be applied across our large model space with little to no user-intervention. The model marginal likelihoods, and implied posterior model probabilities, are compared with the prior probability of 1/114,674 of each model to find the best-supported model, and thus the factors most likely to be in the SDF. We provide detailed simulation evidence about the high finite-sample accuracy of the method. Our empirical study with 13 leading factors reveals that the highest marginal likelihood model is a Student-t distributed factor model with 5 degrees of freedom and 8 risk factors.