
Causal Factor Investing

Updated: Sep 10

Shih-Hung Wang & Lorena Rodriguez Pineda [with guidance from Milind Sharma]*


  • Both authors are MSCF candidates at Carnegie Mellon and would like to thank Matvei Lukianov for editorial input.


Introduction


Marcos López de Prado presents a comprehensive critique of the current state of factor investing. In "Causal Factor Investing: Can Factor Investing Become Scientific?", first published online in October 2023, he argues that the field largely operates at a phenomenological stage rather than as a truly scientific discipline. His primary criticism is that the literature denies or ignores the causal content inherent in factor models, leading to findings that are likely false due to various methodological errors. The key points of his critique are:


  1. Conflation of Association and Causation

    1. Associational Claims vs. Causal Content

      Factor investing literature predominantly makes associational claims, focusing on correlations, rather than identifying the underlying causal mechanisms. However, factor models inherently possess causal content, as their specifications and estimation methods imply directed relationships (e.g., X causes Y) rather than just co-dependence.

    2. Misunderstanding of Econometric Models: Many econometric textbooks and practitioners, including factor researchers, conflate association with causation and misinterpret the meaning of regression coefficients.

    3. Misuse of Granger Causality: Granger causality is a widely cited concept in econometrics for investigating causal relations in time series, but it rests on a prediction-error-reduction criterion. López de Prado notes that it is often misused as a definitive causal statement when it is primarily a tool for detecting directional dependence between unconfounded variables, and that more sophisticated causal discovery methods exist.

  2. Absence of Causal Theory and Falsifiable Mechanisms

    1. Lack of "Why": Authors typically do not identify the causal graph consistent with observed phenomena, justify their model specifications on grounds beyond observed correlations, or propose experiments that could falsify the implied causal mechanisms. This means factor investing studies fail to explain why observed phenomena occur, leaving investors without a scientific basis for understanding performance. While most factor-investing studies rely on correlations and lack causal justification, several notable exceptions challenge this critique. For example, Lettau and Ludvigson (2001) link factor returns to macroeconomic risk through the consumption–wealth ratio, offering a structural explanation for time-varying risk premia. Behavioral models by Barberis et al. (1998) and Daniel et al. (1998) propose testable psychological mechanisms behind momentum and reversal effects. More recently, Gu, Kelly, and Xiu (2020) apply machine learning in ways that can support causal interpretation when combined with economic theory.

    2. Unfalsifiable Explanations: Proposed economic rationales for factors often lack the rigor of scientific theories because they do not declare causal relationships, elucidate ideal interventional studies, or propose methods to estimate causal effects from observational data, rendering them experimentally unfalsifiable. Finance is not an exact science and markets are shaped by complex, adaptive behavior; nevertheless, the examples cited above show that it is still possible to build theories that go beyond descriptive correlations, offering clear predictions that can be evaluated and potentially refuted through observation.

  3. Proliferation of Spurious Claims (False Discoveries)

    López de Prado categorizes spurious claims into two types, which prevent factor investing from advancing scientifically.

    1. Type-A Spuriosity (False Association / Noise mistaken for Signal)

      1. P-Hacking

        Researchers make numerous subjective decisions when building models (e.g., data cleaning, variable choice, dates) and often run multiple regressions, reporting minimal p-values without adjusting for selection bias. This practice is compounded by publication bias, leading to many claimed findings being likely false and the results not being replicable out-of-sample.

      2. Backtest Overfitting: Historical simulations (backtests) are presented as evidence of causal effects but are neither controlled experiments nor natural experiments. It is trivial to overfit a backtest through selection bias, making it hard to distinguish signal from noise.

    2. Type-B Spuriosity (True but Noncausal Association / Incorrect Specification Choices)

      1. Under-Controlling (Missing Confounders): Omitting relevant confounding variables biases factor estimates, leading to incorrect performance attribution, poor risk management, and the appearance of time-varying risk premia, misleading investors.

      2. Over-Controlling (Controlling for Mediators or Colliders): Including variables that should not be controlled for (such as mediators or colliders) distorts total effects or mistakenly opens noncausal paths of association, biasing estimates and producing apparent time-varying risk premia. López de Prado calls this the "deadliest sin in causal inference".

      3. Specification-Searching: Choosing model specifications based on explanatory power (e.g., a higher R-squared, an associational concept) rather than on a pre-specified causal graph leads to factor mirages.

  4. Poor Performance and Lack of Transparency

    1. Disappointing Out-of-Sample Performance: Despite academic claims, broad multi-factor indices have, in López de Prado's assessment, shown statistically insignificant Sharpe ratios over long periods, especially when considering transaction costs and fees. The QUMN TMX index is a counterexample to this claim: it delivers extraordinary performance (gross of fees), finishing at just over three times the initial value, while benefiting from interest earned on the short proceeds and despite incurring transaction costs. Broad, rules-based multi-factor indexes (e.g., MSCI Diversified Multiple-Factor, S&P Quality-Value-Momentum) have also delivered positive long-run Sharpe ratios in multiple regions. It is therefore too strong to state that such indexes have "statistically insignificant Sharpe ratios over long periods."

    2. "Black-Box" Nature: The lack of declared causal graphs and mechanisms makes factor investing strategies "black-boxes". This contradicts scientific principles and fiduciary duties. A direct counterexample would be vanilla Multi-Factor-Models, like QMIT’s Val+Mom (GARP). These are clearly not black boxes since the ingredients are completely transparent and the performance rather easy to explain based on the underperformance of value. One straightforward explanation is that the zero-interest rate policy (ZIRP) and subsequent aggressive fiscal stimulus during the COVID period encouraged investors to seek higher yields thereby fueling speculative behavior and risk-taking in markets. Indeed, vanilla multi-factor models (e.g., value + momentum + quality) implemented by index providers which publish transparent rules and weights are interpretable even without explicit causal graphs. Further, U.S. fiduciary standards require advisers to act in clients’ best interest and provide full and fair disclosure. They do not mandate causal diagrams per se.

Overall, although interesting perspectives have been presented in favor of a causal framework, there is limited practical value and insufficient justification for requiring factor investing to adopt a strictly causal approach. Given the probabilistic, adaptive nature of financial markets, predictive usefulness and economic intuition often carry more weight than formal causal identification. For many investors, the ability to generate robust, repeatable signals is more important than establishing cause-and-effect relationships.


Pearl’s Causality


López de Prado's perspective aligns with Pearl's causal framework. Granger causality and Pearl's causality represent fundamentally different theories of causation. Granger causality is inherently predictive and model-dependent, focusing on temporal precedence rather than on any deep causal mechanism. In contrast, Pearl's approach uses structural causal models (SCMs) and directed acyclic graphs (DAGs) to represent objective causal relationships and supports counterfactual reasoning. Unlike Granger causality, Pearl's causality is interventionist and realist, concerned with identifying the effect of deliberately changing a variable rather than merely observing statistical associations over time.


A variable X is considered a cause of a variable Y if Y in any way relies on X for its value, implying a form of "listening" in which Y decides its value based on what it "hears" from X. More formally, X is a direct cause of Y if X appears in the function that assigns Y's value; X is a cause of Y if it is a direct cause of Y or of any cause of Y. The formal device for rigorously stating causal assumptions and modeling how nature assigns values to variables, and how those variables interact, is the Structural Causal Model (SCM). Every SCM is linked to a graphical causal model (typically a directed acyclic graph, or DAG), in which nodes represent variables and directed edges represent functional dependencies. For example, if Y is a child of X in the graph, X is a direct cause of Y; if Y is a descendant of X, X is a potential cause of Y.
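
To make the SCM idea concrete, here is a minimal Python sketch (our own illustration; the variable names, coefficients, and noise terms are assumptions, not anything estimated in this study) of a three-variable SCM whose DAG is X → Z → Y with an additional direct edge X → Y:

import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Illustrative SCM (all coefficients are assumptions). Each variable "listens"
# only to its direct causes plus an independent noise term.
X = rng.normal(size=n)                      # exogenous variable
Z = 0.8 * X + rng.normal(size=n)            # Z is a child of X, so X is a direct cause of Z
Y = 0.5 * Z + 0.3 * X + rng.normal(size=n)  # Y listens to both Z and X

# X is a direct cause of Y (it appears in Y's assignment) and also an indirect
# cause through the path X -> Z -> Y; Y is a descendant of X in the DAG.
print(np.corrcoef(X, Y)[0, 1])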


Confounders and Colliders


Two main motivations for incorporating causal inference into traditional factor investing are to address confounding biases and collider biases, both of which can lead, according to López de Prado, to incorrect inferences about the relationship between factors and returns when relying solely on association.


In this research, we focus on causal discovery on different Enhanced Smart Betas (ESB) used by QMIT. Causal discovery, also referred to as causal learning or structure learning, is the process of inferring causal structures from observational data.


To further explore causal relationships among the ESBs and other variables, we utilized the online tool DAGitty (Textor et al., 2016). This tool requires selecting exposure and outcome variables, which we varied according to the model specifications we wanted to test. We ran the Peter-Clark (PC) algorithm on 11 variables and then used DAGitty to estimate the causal effect of ten variables on the outcome val_mom_C.
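
For readers who wish to reproduce this kind of structure learning in code, a minimal sketch using the open-source causal-learn package is shown below; the file name, column layout, and significance level are hypothetical stand-ins rather than the exact pipeline used here:

# Hedged sketch: run the PC algorithm on a factor panel with causal-learn.
# "esb_panel.csv" and its columns (10 covariates plus val_mom_C) are hypothetical.
import pandas as pd
from causallearn.search.ConstraintBased.PC import pc

df = pd.read_csv("esb_panel.csv")
cg = pc(df.to_numpy(), alpha=0.05)   # Fisher-z conditional-independence test by default
print(cg.G)                          # estimated CPDAG; edges can then be examined in DAGitty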


A confounder is a variable that is a cause of both an independent variable (treatment) and the dependent variable (outcome). In the resulting causal graph (Figure 1), the confounders are Analyst Ratings & Targets (ART), Price Momentum (PMOM), and Relative Value (RV). These also have a direct causal path to val_mom_C.


Confounding bias refers to the error or distortion in estimating the true causal effect between two variables, typically a treatment (X) and an outcome (Y), when their observed statistical association is influenced by other extraneous factors. This bias tends to "confound our reading" and distort the estimate of the effect being studied. In our case the paths are correctly accounted for, so there are no "spurious associations" or "backdoor paths" between covariates and val_mom_C.
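
The following minimal simulation (our own illustration with assumed coefficients, not QMIT data) shows how omitting a confounder distorts the naive estimate of a factor's effect on returns, and how adjusting for it removes the bias:

import numpy as np

rng = np.random.default_rng(1)
n = 100_000
C = rng.normal(size=n)                       # confounder (e.g., a common risk driver)
X = 0.9 * C + rng.normal(size=n)             # "treatment" factor exposure
Y = 0.0 * X + 1.0 * C + rng.normal(size=n)   # true causal effect of X on Y is zero

naive = np.polyfit(X, Y, 1)[0]               # Y ~ X only: the backdoor path through C is open
design = np.column_stack([X, C, np.ones(n)])
adjusted = np.linalg.lstsq(design, Y, rcond=None)[0][0]   # Y ~ X + C: backdoor path blocked
print(f"naive slope {naive:.2f}, adjusted slope {adjusted:.2f}")  # roughly 0.50 vs 0.00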


Generally, a collider is a node with two arrows pointing into it (e.g., X→Z←Y). Conditioning on a collider or its descendants opens a non-causal path and induces bias. In our graph, nodes such as ARS, ART, SIZE, DIV, and DV were flagged as colliders; avoiding adjustment for these nodes (and their descendants) prevents collider bias. Note that collider parents are independent only in the absence of other open paths, a property that must be verified in the full DAG. None of these colliders is downstream of the dependent variable, the Value Momentum composite (val_mom_C, which has no descendants), so there are no spurious paths or collider bias in our example.


Collider bias arises when one conditions on this collider or its descendants. This conditioning can create a spurious (non-causal) association between two variables that were otherwise independent, leading to biased estimates of causal effects.


Typically, the parents of a collider are unconditionally independent of each other (i.e., X and Y are independent in X → Z ← Y if there are no other paths between them); in our example, PMOM and RV are the parents and DV is the collider. This means that knowing the value of one parent (PMOM) tells you nothing about the value of the other parent (RV).
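
The sketch below (simulated data with assumed coefficients, mirroring the PMOM → DV ← RV pattern described above) illustrates that the parents of a collider are uncorrelated overall, yet conditioning on the collider, here by selecting observations with large DV, induces a spurious negative association:

import numpy as np

rng = np.random.default_rng(2)
n = 200_000
pmom = rng.normal(size=n)                        # parent 1
rv = rng.normal(size=n)                          # parent 2, generated independently of pmom
dv = 0.7 * pmom + 0.7 * rv + rng.normal(size=n)  # collider: pmom -> dv <- rv

print(np.corrcoef(pmom, rv)[0, 1])               # close to 0: parents marginally independent

mask = dv > 1.0                                  # condition on (select by) the collider
print(np.corrcoef(pmom[mask], rv[mask])[0, 1])   # clearly negative: spurious association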


We can also identify several mediators in our example, as shown in Figure 1 below. A variable Z is a mediator when it lies on the causal path between X and Y, meaning X causes Z and Z causes Y (X → Z → Y). For example, ART is a mediator in the causal paths Historical Growth (GROH) → ART → val_mom_C and RISK → ART → Value Momentum composite (val_mom_C).


Additionally, there are common causes in the system. For instance, PMOM is a confounder for DV, ART, ARS, and the Value Momentum composite (val_mom_C). Identifying and adjusting for such common causes is crucial to avoiding bias in causal estimates.


While effective, the standard PC algorithm can fail in the presence of hidden (unobserved) variables, such as SIRF and QUAL. In light of the multicollinearity documented in Chapter 3.8, we deliberately omitted the six ESB components of QUAL, which may well be confounders in this picture, in order to obtain a cleaner DAG. In the next section, we provide a technical description of both the PC and LiNGAM algorithms.


Figure 1. Estimating causal graph with PC algorithm. Ten covariates and outcome val_mom_C

PC Algorithm and LiNGAM Model


We consider two widely used approaches to causal discovery: the Peter-Clark (PC) algorithm, an independence-based method, and the Linear Non-Gaussian Acyclic Model (LiNGAM), a more recent approach that exploits information in the joint distribution beyond conditional independencies. Both methods rest on key assumptions about the relationship between graphical structures and probability distributions.


PC Algorithm


The primary goal of the PC algorithm is to estimate the correct Markov equivalence class of the underlying directed acyclic graph (DAG), that is, the partially directed acyclic graph (PDAG) compatible with the observed conditional independencies. In other words, it aims to identify the set of graphical structures that encode the same conditional independencies.


Assumptions


It assumes that the observed data distribution is Markovian and faithful with respect to the underlying DAG.


  1. The Markov condition relates conditional independencies in the distribution to graphical separation statements (d-separation) in the graph.

  2. Faithfulness implies a one-to-one correspondence between d-separation statements in the graph and conditional independence statements in the distribution.

Mechanism


  1. Start with a fully connected undirected graph (Figure 2)

  2. Remove edges step by step by checking for conditional independence between pairs of variables. For each pair of nodes (X, Y), the algorithm searches for a set of other variables (A) that d-separates them (i.e., renders them conditionally independent given A). If such a set is found, the edge between X and Y is removed. The size of the conditioning set A is incrementally increased, starting from zero (order-0).

    By applying the Fisher z-test we get:

    1. Order-0 (Marginal independence)

      1. mom_C and val_C — p-value = 0.131 → Fail to reject H₀ → likely independent, so the edge is removed. Even though mom_C and val_C are negatively correlated, the p-value is not small enough to reject the null hypothesis of independence.

      2. QUMN and val_C — p-value ≈ 1e-118 → Reject H₀ → dependent

      3. QUMN and mom_C — p-value ≈ 1e-49 → Reject H₀ → dependent

    2. Order-1 (Conditional independence)

      1. val_C ⊥ mom_C | QUMN — p-value ≈ 1e-31 → Reject H₀ → dependent

      2. QUMN ⊥ mom_C | val_C — p-value ≈ 1e-78 → Reject H₀ →  dependent

      3. QUMN ⊥ val_C | mom_C — p-value ≈ 1e-148 → Reject H₀ → dependent

  3. After identifying adjacencies based on conditional independencies, the algorithm orients edges where possible to form a partially directed graph, as shown in Figure 2 below. PC examines the triple {mom_C, QUMN, val_C}, in which mom_C and val_C are not adjacent, and checks whether the middle node QUMN was in the separating set used to remove the edge between mom_C and val_C. Since mom_C and val_C were found to be independent without conditioning (the empty set), the separator did not include QUMN, so the triple is oriented as a collider: mom_C → QUMN ← val_C. (A minimal sketch of these Fisher z-tests on simulated data follows Figure 2.)

Figure 2. A simple example using the PC algorithm
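
The sketch below reproduces the flavor of these order-0 and order-1 Fisher z-tests on simulated stand-ins for mom_C, val_C, and QUMN; the data-generating process and sample size are assumptions, so the p-values will not match those reported above:

import numpy as np
from scipy import stats

def fisher_z_pvalue(x, y, z=None):
    # p-value for H0: x independent of y given z, via the Fisher z-transform
    # of the (partial) correlation coefficient.
    if z is None:
        r = np.corrcoef(x, y)[0, 1]
        k = 0
    else:
        zc = np.column_stack([z, np.ones(len(x))])
        rx = x - zc @ np.linalg.lstsq(zc, x, rcond=None)[0]   # residual of x given z
        ry = y - zc @ np.linalg.lstsq(zc, y, rcond=None)[0]   # residual of y given z
        r = np.corrcoef(rx, ry)[0, 1]
        k = z.shape[1]
    zstat = np.sqrt(len(x) - k - 3) * np.arctanh(r)
    return 2 * (1 - stats.norm.cdf(abs(zstat)))

rng = np.random.default_rng(3)
n = 5_000
mom_c = rng.normal(size=n)
val_c = rng.normal(size=n)                              # generated independently of mom_c
qumn = 0.6 * mom_c + 0.6 * val_c + rng.normal(size=n)   # QUMN as a common effect (collider)

print(fisher_z_pvalue(mom_c, val_c))                        # order-0: expect a large p-value, edge removed
print(fisher_z_pvalue(mom_c, val_c, qumn.reshape(-1, 1)))   # order-1: tiny p-value, dependence induced by the collider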

LiNGAM Model


LiNGAM stands for Linear Non-Gaussian Acyclic Model. It is a model class and an associated algorithm for causal discovery that specifically exploits the non-Gaussian characteristics of the noise terms. Importantly, LiNGAM always outputs directed edges, identifying the direction of causality between variables. The model aims to distinguish between cause and effect (causal directionality) directly from observational data, even in the bivariate case where traditional conditional independence tests are not useful.


Assumptions


The algorithm is built on three core assumptions:


  1. Linearity: Each variable is a linear function of its direct causes. While this is rarely exactly true in practice, it is also an assumption of the linear multi-factor models (MFMs) used in factor investing.

  2. Non-Gaussian, independent noise: The error terms are mutually independent and follow non-Gaussian distributions. This assumption is critical for identifying the true causal direction.

  3. Causal sufficiency: All common causes of the observed variables are included in the dataset (i.e., there are no hidden confounders).

Additionally, these core assumptions imply the following:

  1. The Causal Markov condition is automatically satisfied under the assumption of independent noise, which implies that each variable is conditionally independent of its non-descendants given its direct causes. This condition is fundamental for applying do-calculus, as it allows the joint distribution to be factorized into local causal mechanisms. In the context of our LiNGAM model, satisfying the Markov condition means that causal relationships among the selected variables can be represented and manipulated algebraically, providing the foundation for interventional reasoning.

  2. The faithfulness assumption is automatically satisfied in LiNGAM because non-Gaussian noise prevents path cancellations, avoiding coincidental independencies and ensuring that observed statistical dependencies reflect true causal relationships.

These properties make LiNGAM identifiable from the joint distribution and robust in distinguishing X → Y from Y → X.


Mechanism


  1. Core principle – exploiting non-Gaussianity for identifiability: Consider a simple linear causal model X → Y with an additive noise term: Y = aX + ε, where ε is an independent noise term. If ε follows a non-Gaussian distribution and is independent of X, then the model is identifiable, meaning we can determine the correct causal direction from observational data alone. In contrast, with Gaussian noise, X → Y and Y → X produce identical statistical dependencies, making the direction unidentifiable without further assumptions.

  2. Independent Component Analysis (ICA) decomposition: For multivariate data, LiNGAM assumes x = Bx + ε, where B encodes the causal structure and ε is a vector of independent non-Gaussian noise terms. Rearranging gives x = (I − B)⁻¹ε. This is a linear ICA model, in which the goal is to recover both the mixing matrix (I − B)⁻¹ and the independent components ε. ICA finds a decomposition such that the recovered components are statistically independent, which is crucial for identifying the ordering of variables and the causal edges.

  3. Testing residual independence for causal direction: In the bivariate case, LiNGAM compares two regression models:

    1. Regress Y on X, obtain residuals εY, and check if εY ⊥ X

    2. Regress X on Y, obtain residuals εX, and check if εX ⊥ Y

    The correct causal direction is the one where the residuals are independent of the predictor. In the multivariate setting, LiNGAM finds an ordering of variables that satisfies the independence condition for all regressions, producing a fully directed acyclic graph.

  4. Why it works: The key is that non-Gaussian independent errors break the symmetry of linear regression, allowing one direction to produce independent residuals while the reverse does not. This property holds even with small samples or just two variables, making LiNGAM effective in domains like factor investing, where datasets may exhibit high correlations and limited sample sizes. (A minimal bivariate sketch using the open-source lingam package follows this list.)
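
As a minimal bivariate illustration, the sketch below fits DirectLiNGAM from the open-source lingam package to simulated data with uniform (hence non-Gaussian) noise; the coefficients and sample size are assumptions, but under them the true direction x0 → x1 should be recovered:

import numpy as np
import lingam

rng = np.random.default_rng(4)
n = 2_000
x0 = rng.uniform(-1, 1, size=n)                 # cause, with non-Gaussian marginal
x1 = 1.5 * x0 + rng.uniform(-1, 1, size=n)      # effect = linear function of x0 + non-Gaussian noise
X = np.column_stack([x0, x1])

model = lingam.DirectLiNGAM()
model.fit(X)
print(model.causal_order_)      # expected: [0, 1], i.e. x0 precedes x1 in the causal ordering
print(model.adjacency_matrix_)  # entry [1, 0] close to 1.5, encoding the edge x0 -> x1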

Identifiability


Under the assumptions of linearity and non-Gaussian additive noise, the causal graph structure is identifiable from the joint distribution.


Randomness


While LiNGAM provides a principled framework for causal discovery, its output may vary across runs due to the randomness introduced by Independent Component Analysis (ICA). ICA algorithms typically involve iterative optimization methods that rely on random initializations and can converge to different local optima, especially when the data is high-dimensional or weakly non-Gaussian. Consequently, the inferred causal ordering or edge directions may differ across runs, even on the same dataset. To address this, it is common practice to use multiple random seeds, bootstrapping (resampling with replacement), or ensemble methods to evaluate the robustness of the estimated causal structure and identify consistently appearing edges.


LiNGAM Model Results


We first applied the LiNGAM model to the 18 ESBs along with the QUMN EMN hedge fund index as the outcome, as shown in Figure 3. The resulting causal graph looks plausible, suggesting that QUMN was the outcome of a series of causal influences from those ESBs. However, this result was based on a single run and is sensitive to the randomness inherent in ICA. Moreover, the presence of highly collinear variables - such as enmom and mom - violates the model’s assumptions and may compromise the reliability of the inferred causal structure.

To address the issues of randomness and collinearity, we applied a bootstrap approach and narrowed our analysis to QUMN and a subset of eight variables (sirf, rev, size, mom_c, div, val_c, risk, qual_c) for which collinearity was reasonably under control. We performed 1,000 bootstrap trials, generating 1,000 causal graphs with the LiNGAM model. To ensure robustness, we retained only those edges that appeared in more than 70% of the runs, forming the final causal graph.
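
A minimal sketch of this bootstrap stability filter is shown below; the panel DataFrame (eight ESBs plus QUMN), the choice of ICALiNGAM, and the helper name are our assumptions rather than the exact implementation used in this study:

import numpy as np
import pandas as pd
import lingam

def stable_edges(panel: pd.DataFrame, n_trials: int = 1000, threshold: float = 0.70) -> pd.DataFrame:
    rng = np.random.default_rng(0)
    X = panel.to_numpy()
    counts = np.zeros((X.shape[1], X.shape[1]))
    for _ in range(n_trials):
        idx = rng.integers(0, len(X), size=len(X))   # resample rows with replacement
        model = lingam.ICALiNGAM()                   # ICA-based LiNGAM, as described above
        model.fit(X[idx])
        counts += (model.adjacency_matrix_ != 0)     # entry [i, j] != 0 encodes an edge j -> i
    freq = counts / n_trials
    rows, cols = np.where(freq > threshold)          # keep edges appearing in >70% of trials
    return pd.DataFrame({"cause": panel.columns[cols],
                         "effect": panel.columns[rows],
                         "frequency": freq[rows, cols]})

# edges = stable_edges(panel)   # retained edges form the final causal graph (cf. Figure 5)

The lingam package also ships its own bootstrap utilities, which can serve the same purpose as the explicit loop above.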

Figure 4 presents four representative causal graphs generated during the bootstrap procedure, illustrating the randomness introduced by the ICA method. Notably, some trials produced opposite causal directions for the same pair of variables–for example, edges between size and risk appeared in both directions. This highlights the importance of using the bootstrap approach to filter out unstable edges and retain only statistically robust causal relationships.

Figure 3. Example of a LiNGAM causal graph from a single run

Figure 4. Four representative LiNGAM causal graphs from bootstrap trials

Figure 5 summarizes the 1,000 bootstrap trials, retaining only edges that appeared with a frequency greater than 70%. The graph suggests that QUMN is directly influenced by rev, mom_c, val_c, and qual_c, while val_c itself is a common effect (collider) of div and risk. Additionally, qual_c is directly influenced by risk, indicating that risk-related factors may have both direct and indirect effects on QUMN through multiple pathways. In this structure, rev may act as a potential confounder for the relationship between size and QUMN, as it causally affects both. Notably, sirf does not appear in the final graph, indicating that its connections were either weak or unstable across bootstrap trials. Recognizing such colliders and confounders is essential for correctly interpreting the causal pathways and avoiding biased inferences.

Figure 5. LiNGAM causal graph retaining edges with ≥70% bootstrap frequency

Conclusion

In summary, unlike the PC algorithm, which infers causal structure from conditional-independence patterns, LiNGAM exploits the non-Gaussianity of the noise in linear models to directly identify causal directions. This property makes it especially powerful for determining cause–effect relationships, even in settings with only two variables or relatively small sample sizes, as demonstrated in Figure 5. In the context of factor investing, such a capability is particularly valuable for uncovering directional relationships among factors, helping distinguish true predictive signals from spurious correlations.


References


  1. Pearl, J., Glymour, M., & Jewell, N. P. (2016). Causal inference in statistics: A primer. John Wiley & Sons.

  2. Peters, J., Janzing, D., & Schölkopf, B. (2017). Elements of causal inference: Foundations and learning algorithms. The MIT Press.

  3. Pearl, J. (2009). Causality: Models, reasoning, and inference (2nd ed.). Cambridge University Press.

  4. Wiedermann, W., & von Eye, A. (Eds.). (2016). Statistics and causality: Methods for applied empirical research. John Wiley & Sons.

  5. López de Prado, M. M. (2023). Causal factor investing: Can factor investing become scientific? Cambridge University Press.

  6. López de Prado, M. M., & Zoonekynd, V. (2025, May). A protocol for causal factor investing. ADIA Lab.




 
 
 
