Causality on longitudinal data: Stable specification search in constrained structural equation modeling

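the Alzheimer’s Disease Neuroimaging Initiative, Ridho Rahmadi, Perry Groot, Marieke H. C. van Rijn, Jan A. JG van den Brand, Marianne Heins, Hans Knoop, Tom Heskes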

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

A typical problem in causal modeling is the instability of model structure learning: small changes in finite data can result in completely different optimal models. The present work introduces a novel causal modeling algorithm for longitudinal data that is robust for finite samples, building on recent advances in stability selection using subsampling and selection algorithms. Our approach uses exploratory search but allows incorporation of prior knowledge, e.g., the absence of a particular causal relationship between two specific variables. We represent causal relationships using structural equation models. Models are scored along two objectives: model fit and model complexity. Since these objectives often conflict, we apply a multi-objective evolutionary algorithm to search for Pareto optimal models. To handle the instability of small finite data samples, we repeatedly subsample the data and select those substructures (from the optimal models) that are both stable and parsimonious. These substructures can be visualized through a causal graph. On a simulated data set with a known ground truth, our more exploratory approach performs at least comparably to, and often significantly better than, state-of-the-art alternative approaches. We also present the results of our method on three real-world longitudinal data sets on chronic fatigue syndrome, Alzheimer’s disease, and chronic kidney disease. The findings obtained with our approach are generally in line with results from more hypothesis-driven analyses in earlier studies and suggest some novel relationships that deserve further research.
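The two ideas the abstract combines, Pareto selection over fit and complexity and stability under repeated subsampling, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The model representation (a dict with `edges`, `misfit`, and `complexity`), the `toy_search` stand-in for the evolutionary search, the subsample fraction, and the 0.6 stability threshold are all assumptions made for illustration only.

```python
import random
from collections import Counter


def pareto_front(models):
    """Keep models that no other model dominates (lower misfit and complexity are better)."""
    front = []
    for m in models:
        dominated = any(
            o["misfit"] <= m["misfit"]
            and o["complexity"] <= m["complexity"]
            and (o["misfit"] < m["misfit"] or o["complexity"] < m["complexity"])
            for o in models
        )
        if not dominated:
            front.append(m)
    return front


def edge_stability(data, search, n_subsets=50, fraction=0.5, seed=0):
    """Fraction of subsamples in which each edge occurs in some Pareto-optimal model."""
    rng = random.Random(seed)
    counts = Counter()
    size = max(1, int(len(data) * fraction))
    for _ in range(n_subsets):
        subset = rng.sample(data, size)  # subsample without replacement
        edges = set()
        for m in pareto_front(search(subset)):
            edges.update(m["edges"])
        counts.update(edges)  # count each edge at most once per subsample
    return {edge: n / n_subsets for edge, n in counts.items()}


def toy_search(subset):
    # Hypothetical stand-in for the model search: ignores the data and
    # returns fixed candidate models with made-up scores.
    return [
        {"edges": [("x", "y")], "misfit": 2.0, "complexity": 1},
        {"edges": [("x", "y"), ("y", "z")], "misfit": 1.0, "complexity": 2},
        {"edges": [("z", "x"), ("x", "y")], "misfit": 3.0, "complexity": 2},
    ]


stability = edge_stability(list(range(100)), toy_search, n_subsets=10)
# Assumed threshold of 0.6; the value used in the paper is not stated in this record.
stable_edges = sorted(e for e, s in stability.items() if s >= 0.6)
```

In this toy setup the third candidate is dominated, so only edges from the first two models are counted. A real search would return different candidates for each subsample, and the stability scores would then spread between 0 and 1.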
Original language: English
Pages (from-to): 3814-3834
Journal: Statistical Methods in Medical Research
Volume: 27
Issue number: 12
DOI: 10.1177/0962280217713347
Publication status: Published - 2018
Externally published: Yes


the Alzheimer’s Disease Neuroimaging Initiative. / Causality on longitudinal data: Stable specification search in constrained structural equation modeling. In: Statistical Methods in Medical Research. 2018; Vol. 27, No. 12. pp. 3814-3834.
@article{993137e354544f968b8c1e8d7a5cc334,
title = "Causality on longitudinal data: Stable specification search in constrained structural equation modeling",
abstract = "A typical problem in causal modeling is the instability of model structure learning: small changes in finite data can result in completely different optimal models. The present work introduces a novel causal modeling algorithm for longitudinal data that is robust for finite samples, building on recent advances in stability selection using subsampling and selection algorithms. Our approach uses exploratory search but allows incorporation of prior knowledge, e.g., the absence of a particular causal relationship between two specific variables. We represent causal relationships using structural equation models. Models are scored along two objectives: model fit and model complexity. Since these objectives often conflict, we apply a multi-objective evolutionary algorithm to search for Pareto optimal models. To handle the instability of small finite data samples, we repeatedly subsample the data and select those substructures (from the optimal models) that are both stable and parsimonious. These substructures can be visualized through a causal graph. On a simulated data set with a known ground truth, our more exploratory approach performs at least comparably to, and often significantly better than, state-of-the-art alternative approaches. We also present the results of our method on three real-world longitudinal data sets on chronic fatigue syndrome, Alzheimer’s disease, and chronic kidney disease. The findings obtained with our approach are generally in line with results from more hypothesis-driven analyses in earlier studies and suggest some novel relationships that deserve further research.",
author = "{the Alzheimer’s Disease Neuroimaging Initiative} and Ridho Rahmadi and Perry Groot and {van Rijn}, {Marieke H. C.} and {van den Brand}, {Jan A. JG} and Marianne Heins and Hans Knoop and Tom Heskes",
year = "2018",
doi = "10.1177/0962280217713347",
language = "English",
volume = "27",
pages = "3814--3834",
journal = "Statistical Methods in Medical Research",
issn = "0962-2802",
publisher = "SAGE Publications Ltd",
number = "12",
}

Causality on longitudinal data: Stable specification search in constrained structural equation modeling. / the Alzheimer’s Disease Neuroimaging Initiative.

In: Statistical Methods in Medical Research, Vol. 27, No. 12, 2018, p. 3814-3834.

TY - JOUR
T1 - Causality on longitudinal data: Stable specification search in constrained structural equation modeling
AU - the Alzheimer’s Disease Neuroimaging Initiative
AU - Rahmadi, Ridho
AU - Groot, Perry
AU - van Rijn, Marieke H. C.
AU - van den Brand, Jan A. JG
AU - Heins, Marianne
AU - Knoop, Hans
AU - Heskes, Tom
PY - 2018
Y1 - 2018
AB - A typical problem in causal modeling is the instability of model structure learning: small changes in finite data can result in completely different optimal models. The present work introduces a novel causal modeling algorithm for longitudinal data that is robust for finite samples, building on recent advances in stability selection using subsampling and selection algorithms. Our approach uses exploratory search but allows incorporation of prior knowledge, e.g., the absence of a particular causal relationship between two specific variables. We represent causal relationships using structural equation models. Models are scored along two objectives: model fit and model complexity. Since these objectives often conflict, we apply a multi-objective evolutionary algorithm to search for Pareto optimal models. To handle the instability of small finite data samples, we repeatedly subsample the data and select those substructures (from the optimal models) that are both stable and parsimonious. These substructures can be visualized through a causal graph. On a simulated data set with a known ground truth, our more exploratory approach performs at least comparably to, and often significantly better than, state-of-the-art alternative approaches. We also present the results of our method on three real-world longitudinal data sets on chronic fatigue syndrome, Alzheimer’s disease, and chronic kidney disease. The findings obtained with our approach are generally in line with results from more hypothesis-driven analyses in earlier studies and suggest some novel relationships that deserve further research.
UR - https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85040708221&origin=inward
UR - https://www.ncbi.nlm.nih.gov/pubmed/28657454
U2 - 10.1177/0962280217713347
DO - 10.1177/0962280217713347
M3 - Article
VL - 27
SP - 3814
EP - 3834
JO - Statistical Methods in Medical Research
JF - Statistical Methods in Medical Research
SN - 0962-2802
IS - 12
ER -