\PassOptionsToPackage{svgnames,x11names}{xcolor}
\documentclass[twoside, 11pt]{article}
% general styling
\usepackage[utf8]{inputenc}
\usepackage{setspace}
\usepackage{palatino}
\usepackage{graphicx}
\usepackage{float}
\usepackage{titling} % drop vertical space before the title
\usepackage{multirow}
\usepackage{lscape}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{subcaption}
\usepackage{scalefnt} % for smaller font in matrices
\usepackage{mathtools}
\usepackage{blkarray, bigstrut}
% specify margin (make it symmetric for both even/odd pages)
\usepackage[
top=1.3in,
bottom=1in,
left=1.2in,
right=1.2in,
bindingoffset=0.1in, % note: check that left/right margins stay equal with the binding offset
heightrounded,
]{geometry}
\fontfamily{ppl}\selectfont
\usepackage[american]{babel}
% prevent footnotes from being split across pages
\interfootnotelinepenalty=10000
% linespace
\renewcommand{\baselinestretch}{1.5}
% BIBLIOGRAPHY
\usepackage{natbib}
% fix "&" to "and" in citet
\usepackage{xstring}
\makeatletter
\let\origNAT@nmfmt\NAT@nmfmt
\def\NAT@nmfmt#1{%
\unless \ifNAT@swa
\exploregroups\expandarg
\StrSubstitute{#1}{ \& }{ and }[\mynatfmttemp]%
\else
\def\mynatfmttemp{#1}%
\fi
\origNAT@nmfmt{\mynatfmttemp}%
}
\makeatother
% \bibliographystyle{apalike}
% \bibliographystyle{abbrvnat} %doi shows but the style is messed up
% \setcitestyle{authoryear}
\setlength{\bibsep}{0pt} % no extra space between entries
\setlength\bibhang{0.5in} % indent
\bibliographystyle{kyuriapa}
% \usepackage[natbibapa]{apacite} % to enable '\citet' and '\citep' macros
% for reference list doi styling
% \newcommand*{\doi}{}
% \makeatletter
% \newcommand{\doi@}[1]{\urlstyle{same}\url{https://doi.org/#1}}
% \DeclareRobustCommand{\doi}{\hyper@normalise\doi@}
% \makeatother
% for header in title page
\usepackage{fancyhdr}
\pagestyle{fancy}
% set even/odd page header
\fancyhf{}
\setlength{\headheight}{14pt}
\fancyhf[EHR]{PARK, WALDORP, AND RYAN}
\fancyhf[OHL]{DISCOVERING CYCLIC CAUSAL MODELS}
\fancyhf[EHL,OHR]{\thepage}
% first page specific
\renewcommand{\headrulewidth}{0.7pt}
\fancypagestyle{firstpage}{%
\lhead{\large{SPECIAL ISSUE ON NETWORK PSYCHOMETRICS}}
\setlength{\headheight}{14pt}
\renewcommand{\headrulewidth}{1.2pt}
\rhead{}
}
\usepackage{txfonts} %for edge-endpoints
\usepackage{tikz}
\usetikzlibrary{arrows.meta}
% \usepackage[capposition=bottom]{floatrow}
\usepackage[format=plain,
labelfont={it}, singlelinecheck=false,
justification = raggedright,
labelsep = period,
figureposition=top]{caption}
\usepackage{algorithm}
\usepackage{algcompatible}
\usepackage{algpseudocode}
% for appendix title
\usepackage[title]{appendix}
% to remove space between enumerate items
\usepackage{enumitem}
% to specify a different margin for a page
\usepackage{afterpage}
% for table
\usepackage{tabularx, booktabs} % (multirow and setspace already loaded above)
% for captions
\usepackage{ragged2e} % (caption already loaded above)
\captionsetup{justification=RaggedRight}
\usepackage{tabularray}
%\usepackage[hmargin=1cm]{geometry}
% for adding affiliation
\usepackage{authblk}
% for the adjustwidth environment
\usepackage{changepage}
% for space between text and footnote
\setlength{\skip\footins}{1.2pc plus 5pt minus 2pt}
% for better spacing in reference
\usepackage[hyphens,spaces,obeyspaces]{url}
% hyperlink
\usepackage{hyperref}
\hypersetup{
colorlinks=true,
linkcolor= MidnightBlue,
filecolor= MidnightBlue,
urlcolor= MidnightBlue,
pdftitle={Research Report Kyuri Park},
citecolor= MidnightBlue,
pdfpagemode=FullScreen
}
\raggedbottom
%% for (in)dependence symbols
\makeatletter
\newcommand\multiline[1]{\parbox[t]{\dimexpr\linewidth-\ALG@thistlm}{#1}}
\makeatother
%% (in)dependence symbol
\usepackage{unicode-math}
\makeatletter
\newcommand*{\indep}{%
\mathbin{%
\mathpalette{\@indep}{}%
}%
}
\newcommand*{\nindep}{%
\mathbin{% % The final symbol is a binary math operator
%\mathpalette{\@indep}{\not}% \mathpalette helps for the adaptation
\mathpalette{\@indep}{/}%
% of the symbol to the different math styles.
}%
}
\newcommand*{\@indep}[2]{%
% #1: math style
% #2: empty or \not
\sbox0{$#1\perp\m@th$}% box 0 contains \perp symbol
\sbox2{$#1=$}% box 2 for the height of =
\sbox4{$#1\vcenter{}$}% box 4 for the height of the math axis
\rlap{\copy0}% first \perp
\dimen@=\dimexpr\ht2-\ht4-.2pt\relax
% The equals symbol is centered around the math axis.
% The following equations are used to calculate the
% right shift of the second \perp:
% [1] ht(equals) - ht(math_axis) = line_width + 0.5 gap
% [2] right_shift(second_perp) = line_width + gap
% The line width is approximated by the default line width of 0.4pt
\kern\dimen@
\ifx\\#2\\%
\else
\hbox to \wd2{\hss$#1#2\m@th$\hss}%
\kern-\wd2 %
\fi
\kern\dimen@
\copy0 % second \perp
}
\makeatother
%%%%%%%%%%%%%
%% dotted underline
\newcommand{\udensdot}[1]{%
\tikz[baseline=(todotted.base)]{
\node[inner sep=1pt,outer sep=1pt] (todotted) {#1};
\draw[densely dotted, thick] (todotted.south west) -- (todotted.south east);
}%
}%
%% dotted underline
\newcommand{\udot}[1]{%
\tikz[baseline=(todotted.base)]{
\node[inner sep=1pt,outer sep=0pt] (todotted) {#1};
\draw[dotted, thick] (todotted.south west) -- (todotted.south east);
}%
}%
%% make cross sign for the table
\newcommand{\tikzxmark}{%
\tikz[scale=0.23] {
\draw[line width=0.7,line cap=round] (0,0) to [bend left=6] (1,1);
\draw[line width=0.7,line cap=round] (0.2,0.95) to [bend right=3] (0.8,0.05);
}}
% %% for circle-arrow edge-endpoints
% \def\leftcirc#1{% Parameter, arrow style
% \xdef\y{0}
% \foreach \left in {white} {
% {
% \draw[fill=\left] (0,\y) circle (2pt);
% \draw[shorten <=2pt, shorten >=2pt, arrows=#1] (0,\y) -- (0.7, \y);
% \pgfmathparse{\y+1}
% \global\let\y\pgfmathresult
% }
% }
% }
% \def\rightcirc#1{% Parameter, arrow style
% \xdef\y{0}
% {
% \foreach \right in {white} {
% \draw[fill=\right] (0.7,\y) circle (2pt);
% \draw[shorten <=2pt, shorten >=2pt, arrows=#1] (0,\y) -- (0.7, \y);
% \pgfmathparse{\y+1}
% \global\let\y\pgfmathresult
% }
% }
% }
%% various edge end-points representation
\newcommand{\starstar}{%
\begin{tikzpicture}[baseline=-3pt]
\draw [{Rays[n=6]}-{Rays[n=6]}] (0,0) -- (0.55,0);
\end{tikzpicture}
}
\newcommand{\circstar}{%
\begin{tikzpicture}
\draw [{Circle[open]}-{Rays[n=6]}] (0,0) -- (0.55, 0);
\end{tikzpicture}
}
\newcommand{\starcirc}{%
\begin{tikzpicture}
\draw [{Rays[n=6]}-{Circle[open]}] (0,0) -- (0.55, 0);
\end{tikzpicture}
}
\newcommand{\stararrow}{%
\begin{tikzpicture}
\draw [{Rays[n=6]}-{Straight Barb[length=2.5pt]}] (0,0) -- (0.5, 0);
\end{tikzpicture}
}
\newcommand{\arrowstar}{%
\begin{tikzpicture}
\draw [{Straight Barb[length=2.5pt]}-{Rays[n=6]}] (0,0) -- (0.5, 0);
\end{tikzpicture}
}
\newcommand{\circarrow}{%
\begin{tikzpicture}
\draw [{Circle[open]}-{Straight Barb[length=2.5pt]}] (0,0) -- (0.5, 0);
\end{tikzpicture}
}
\newcommand{\tailcirc}{%
\begin{tikzpicture}[baseline=-3pt]
\draw [-{Circle[open]}] (0,0) -- (0.4, 0);
\end{tikzpicture}
}
\newcommand{\circtail}{%
\begin{tikzpicture}[baseline=-3pt]
\draw [{Circle[open]}-] (0,0) -- (0.4, 0);
\end{tikzpicture}
}
\newcommand{\circirc}{%
\begin{tikzpicture}[baseline=-3pt]
\draw [{Circle[open]}-{Circle[open]}] (0,0) -- (0.5, 0);
\end{tikzpicture}
}
\newcommand{\tailstar}{%
\begin{tikzpicture}[baseline=-3pt]
\draw [-{Rays[n=6]}] (0,0) -- (0.5, 0);
\end{tikzpicture}
}
\newcommand{\startail}{%
\begin{tikzpicture}
\draw [{Rays[n=6]}-] (0,0) -- (0.5, 0);
\end{tikzpicture}
}
\newcommand{\tailarrow}{%
\begin{tikzpicture}
\draw [-{Straight Barb[length=2.5pt]}](0,0) -- (0.4, 0);
\end{tikzpicture}
}
\newcommand{\arrowtail}{%
\begin{tikzpicture}
\draw [{Straight Barb[length=2.5pt]}-](0,0) -- (0.4, 0);
\end{tikzpicture}
}
\newcommand{\arrowarrow}{%
\begin{tikzpicture}
\draw [{Straight Barb[length=2.5pt]}-{Straight Barb[length=2.5pt]}](0,0) -- (0.4, 0);
\end{tikzpicture}
}
\newcommand\stackedarrows{%
\mathrel{\vcenter{\mathsurround0pt
\ialign{##\crcr
\noalign{\nointerlineskip}$\arrowtail$\crcr
\noalign{\nointerlineskip}$\tailarrow$\crcr
}
}}%
}
%% thick dashed line
\newcommand{\thickdash}{%
\begin{tikzpicture}
[baseline=-0.7mm]
\draw [line width= 0.6pt, dash pattern={on 6pt off 2pt}] (0,0) -- (0.5,0);
\end{tikzpicture}
}
% extend hyperref links to cover the Figure number plus panel labels (a, b, c)
\newcommand*{\figref}[2][]{%
\hyperref[{fig:#2}]{%
Figure~\ref*{fig:#2}%
\ifx\\#1\\%
\else
#1%
\fi
}%
}
% \theoremstyle{definition}
\newtheorem{definition}{Definition}
%use next two lines instead for non-italic alternative
%\newtheorem{preremark}{Definition}
%\newenvironment{mydef}{\begin{preremark}\upshape}{\end{preremark}}
% for the footnotes without numbering
\newcommand\blfootnote[1]{%
\begingroup
\renewcommand\thefootnote{}\footnote{#1}%
\addtocounter{footnote}{-1}%
\endgroup
}
\onehalfspacing
\setlength{\droptitle}{-4em} %% Don't touch
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% SET THE TITLE
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% TITLE:
\title{Discovering Cyclic Causal Models in Psychological Research
}
% AUTHOR:
\author[1*]{Kyuri Park}
\author[2]{Lourens J. Waldorp}
\author[3]{Ois\'{i}n Ryan}
\affil[1]{Computational Science Lab, Informatics Institute, University of Amsterdam}%
\affil[2]{Department of Psychology, University of Amsterdam}%
\affil[3]{Department of Data Science and Biostatistics, Julius Centre, University Medical Centre Utrecht}%
% \author[{\Large{Kyuri Park\textsuperscript{1}}\thanks{Correspondence concerning this paper should be addressed to: Kyuri Park, Department of Methodology and Statistics, Utrecht University, Padualaan 14, 3584 CH Utrecht, The Netherlands. e-mail: \href{mailto:kyurheep@gmail.com}{kyurheep@gmail.com}} \\
% }
% DATE:
\date{ }
% \date{May 8, 2023}
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{document}
\maketitle
\thispagestyle{firstpage}
\begin{adjustwidth}{1.5cm}{1.5cm}
\noindent \textbf{\textit{Abstract }} \\
\noindent Statistical network models have become popular tools for analyzing multivariate psychological data. In empirical practice, network parameters are often interpreted as reflecting causal relationships --- an approach that can be characterized as a form of causal discovery. Recent research has shown that undirected network models are likely to perform poorly as causal discovery tools in the context of discovering acyclic causal structures, a task for which many alternative methods are available. However, acyclic causal models are likely unsuitable for many psychological phenomena, such as psychopathologies, which are often characterized by cycles or feedback loops between symptoms. A number of cyclic causal discovery methods have been developed, largely in the computer science literature, but they are not as well studied or widely applied in empirical practice. In this paper, we provide an accessible introduction to the basics of cyclic causal discovery for empirical researchers. We examine three different cyclic causal discovery methods and investigate their performance in typical psychological research contexts by means of a simulation study. We also demonstrate the practical applicability of these methods using an empirical example and conclude the paper with a discussion of how the insights we gain from cyclic causal discovery relate to statistical network analysis.
\vspace{1.5mm}
\noindent\textbf{{\textit{Keywords: }}}%
Constraint-based, cyclic causal discovery, directed cyclic graph (DCG), partial ancestral graph (PAG), statistical network model\\
\end{adjustwidth}
\blfootnote{\textsuperscript{\scriptsize{*}}Correspondence concerning this paper should be addressed to: Kyuri Park, Computational Science Lab, Informatics Institute, Faculty of Science, University of Amsterdam, PO Box 94323, 1090 GH Amsterdam, The Netherlands. e-mail: \href{mailto:kyurheep@gmail.com}{kyurheep@gmail.com}}
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% BODY OF THE DOCUMENT
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% --------------------
\section{Introduction}
% --------------------
A fundamental task in various disciplines of science is to understand the mechanisms or causal relations underlying the phenomena of interest. In psychology, one of the core questions is how psychopathology comes about, with network theory positing that mental disorders are produced by a system of direct causal interactions between symptoms \citep{BorsboomCramer2013}. Given this theoretical framework, statistical network models have become popular tools for analyzing multivariate observational data \citep{robinaugh2020, epskamp_estimating_2018}. In practice, empirical researchers often interpret the conditional statistical relationships estimated in a network model as reflecting the causal relationships between symptoms of mental disorder and/or other psychological variables --- an approach that can be characterized as a form of causal discovery \citep{spirtes2000, peters_elements_2017, Ryan2022}. However, it has been shown that network models are likely to perform poorly as causal discovery tools; relations in the network may not reflect the direct causal effects that researchers aim to discover, as they may be produced by, amongst other inferential issues, unwittingly conditioning on common effects \citep{dablander2019node, Ryan2022}.
In the field of causal discovery, one class of methods that utilizes patterns of statistical (in)dependence estimated from observational data to infer causal structures is known as \textit{constraint-based} causal discovery \citep{spirtes_algorithm_1991}. \cite{Ryan2022} suggest that network models could be replaced by purpose-built constraint-based causal discovery methods. However, the most popular and well-studied constraint-based methods assume that causal relationships are \textit{acyclic}; that is, if X causes Y, then Y does not cause X \citep{Glymour2019}. Although so-called Directed Acyclic Graphs (DAGs) \citep{pearl2009causality} are popular tools for causal modeling \citep{tennant2021use}, the acyclicity assumption is problematic when studying phenomena such as mental disorders, since \textit{cyclic} relationships or \textit{feedback loops} are critical to the theoretical understanding of psychopathology \citep{borsboom_network_2017, haslbeck_modeling_2021}. For example, \cite{wittenborn_2016} suggest that several different causal feedback loops, such as \textit{perceived stress $\tailarrow$ negative affect $\tailarrow$ rumination $\tailarrow$ perceived stress}, play a key role in sustaining depression. Such theoretical expectations necessitate the use of \textit{cyclic causal discovery} methods.
Although some constraint-based cyclic causal discovery algorithms have been developed \citep{mooij_classen2020, richardson1996, strobl2019}, they have not been as well studied as their acyclic counterparts. In part, this is due to the conceptual and practical difficulties in fitting and interpreting cyclic causal models. Conceptually, a number of researchers in the causal modeling literature have shown that, under certain conditions, cyclic causal models fitted to cross-sectional data may be interpreted as reflecting causal relations between \textit{equilibria} or resting states of a dynamic system \citep{iwasaki1994causality, dash2005restructuring, strotz1960recursive, spirtes1995directed, mooij2013ordinary, weinberger2021intervening, bongers2022causal}. From this perspective, cyclic causal relations should be interpreted as a kind of coarse-grained or time-averaged representation of (reciprocal) causal relations between processes that evolve over time; for a detailed treatment of cyclic equilibrium causal models in the context of psychological modeling, we refer readers to \cite{ryan2022equilibrium}. On the practical side, in the context of structural equation modeling, it is well known that all acyclic path models (i.e., containing independent error terms and no latent variables) are statistically identified, whereas this is not generally the case for path models that contain cycles \citep{Bongers2021, bollen_structural_1989}.
% Unlike acyclic models, cyclic models are not always identified --- they can have either too many solutions or no solutions at all \citep{Bongers2021}.
Furthermore, interpreting the output of cyclic causal discovery algorithms is significantly more challenging than for their acyclic counterparts: Even in ideal scenarios, the same pattern of statistical dependence that can be used to deduce a direct causal link in a DAG may not reflect a direct causal link in a cyclic graph \citep{bongers2018theoretical, hyttinen_discovering_2013}.
Recent work has explored the application of \textit{invariance-based} algorithms --- another class of causal discovery methods capable of estimating cycles --- to psychological data \citep{Kossakowski2021}. However, these methods require multiple datasets that measure the same variables but in different settings, such as a mix of observational and experimental data, which is a disadvantage compared to constraint-based methods that can be applied using only a single observational dataset \citep{peters_causal_2016, Glymour2019}.
% Although recent work has applied so-called \textit{invariance-based} cyclic causal discovery algorithms to psychological research \citep{Kossakowski2021}, these methods require a mix of experimental data from different settings, which is a disadvantage when compared to constraint-based methods.
% Recently, other types of cyclic causal discovery techniques have also been applied in psychological research \citep{Kossakowski2021}. However, these methods require a mix of experimental data from different settings, a disadvantage when compared to constraint-based methods.
To our knowledge, little research has been done on the applicability of constraint-based cyclic causal discovery methods in psychology, and much remains unknown about their performance.
Therefore, in this paper, we aim to address the following question: How well do constraint-based cyclic causal discovery methods perform in typical psychological research contexts? First, we will provide an accessible overview of the different cyclic causal discovery methods, including the assumptions under which they are expected to work and how the output of these methods should be interpreted. Second, we investigate, by means of a simulation study, how well these different methods perform under various circumstances. In the simulation study, we study more and less ideal situations, by varying the sample size, the size and density of the underlying network, and the presence or absence of unobserved confounding variables. Third, we demonstrate the practical applicability of these methods by applying them to empirical data \citep{mcnally_co-morbid_2017} and discussing how the insights we gain relate to the statistical network analysis of the same data.
% The goal of this study is threefold. First, we provide an accessible overview of different cyclic causal discovery methods. Second, we investigate how well each of these methods works by means of a simulation study. Third, we demonstrate their applicability in practice by testing them on empirical data.
% Hence, in this paper, we aim to examine the suitability of cyclic causal discovery methods for psychological research. The goal of this study is twofold. First, we provide an accessible overview of different cyclic causal discovery methods. Second, we investigate how well each of these methods works by means of a simulation study.
% \vspace{-3mm}
% In this paper, we investigate the suitability of cyclic causal discovery methods for psychological research. The goal of this study is twofold; First, we will provide an accessible overview of different cyclic causal discovery methods, including the assumptions under which they are expected to work. Second, we will investigate how well each of these methods works by means of a simulation study, where we vary the sample size, network size, network density, and presence of unobserved confounding.
% Such theoretical expectations necessitate the use of more complex \textit{cyclic causal discovery} algorithms. Although a number of cyclic causal discovery algorithms have been developed \citep{mooij_classen2020}, they have not been as well studied as their acyclic counterparts. In part this is due to practical difficulties in fitting and interpreting cyclic causal models, since the properties, which map statistical to causal (in)dependencies are more complex than for acyclic models. While other cyclic causal discovery techniques have recently been applied in psychological settings \citep{Kossakowski2021}, these methods require a mix of experimental data from different settings, a disadvantage when compared to constraint-based methods. To our knowledge, little to no research has been done on the applicability of constraint-based cyclic causal discovery algorithms in psychological research.
% In this paper, we investigate the suitability of cyclic causal discovery methods for psychological research. The goal of this study is twofold; First, we will provide an accessible overview of different cyclic causal discovery methods, including the assumptions under which they are expected to work. Second, we will investigate how well each of these methods works by means of a simulation study, where we vary sample size, network size, density of the causal relations, and the presence of unobserved confounding.
% Third, we will demonstrate their applicability in practice by testing them on empirical psychological data.
% We choose three algorithms that can handle cyclic structure: cyclic causal discovery (CCD), cyclic causal inference (CCI), fast causal inference (FCI).
\section{Background}
In this section, we will establish the basic concepts of graphical models and examine how constraint-based causal discovery methods operate. We will first introduce different types of graphical models, while demonstrating the differences between statistical and causal graphical models using example graphs. Then, we will explain the assumptions that underlie constraint-based causal discovery methods, as well as the technical difficulties that arise in the presence of cycles. Lastly, we will illustrate each step of the constraint-based causal discovery procedure using a simple example of a directed graph.
\subsection{Graphical Models}
% A graphical model is a set of multivariate joint distributions that show certain conditional independencies among random variables. Each model is associated with a graph $G = (V, E)$, where $V$ represents a set of vertices (nodes) and $E$ represents a set of edges that encode the conditional independencies between each pair of vertices (ref).
% We use the term causal graphical model to denote a graphical model that describes the causal mechanisms of a system.
% Causal graphical models can capture the probabilistic and causal properties of multivariate distributions, under the two following assumptions.
A graph is a pair $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$ is a set of vertices and $\mathcal{E}$ is a set of edges, describing the connections between the vertices. A probabilistic graphical model uses a graph to express the conditional (in)dependencies between random variables, where the vertices represent random variables, and the edges encode the conditional dependencies that hold among the set of variables \citep{lauritzen1996graphical}.
There exist various graphical models that differ in the types of edges they allow (e.g., directed vs. undirected) and how edges are interpreted in terms of statistical relationships. One commonly known graphical model in psychology is the \textit{Pairwise Markov Random Field} (PMRF) --- an undirected graph where edges indicate a statistical association between variables after controlling for all other variables --- which forms the basis of statistical network models such as the Gaussian Graphical Model (GGM) \citep{epskamp_gaussian_2018, epskamp_tutorial_2018}. In PMRFs, edges are strictly undirected, and the presence of an edge $A-B$ indicates that $A$ and $B$ are statistically dependent, conditional on the set of all other variables in the network \citep{borsboom_network_2021}. Models like the GGM and Ising network model can be seen as parameterizations of the PMRF. These models assume particular distributions for the variables involved, and, in the case of the GGM, assume that conditional (in)dependence relations can be captured by linear dependence parameters such as the partial correlation.
In a \textit{causal} graphical model, on the other hand, the edges describe \textit{causal} relationships between variables; the edges are typically directed, with $A \tailarrow B$ denoting that intervening on $A$ results in a change in the probability distribution of $B$ \citep{geiger_logic_1990}. The simplest and most commonly used causal graph is a \textit{directed acyclic graph} (DAG), which is typically used to represent a \textit{Bayesian network}. A DAG consists of directed edges without any cycles \citep{pearl_probabilistic_1988}. When a causal graph contains cycles, it is referred to as a \textit{directed cyclic graph} (DCG). Two examples of causal graphical models are shown in \figref{1}. \figref[(a)]{1} does not contain any cycles, whereas \figref[(b)]{1} does; hence they are called a DAG and a DCG, respectively. In \figref[(b)]{1}, the cycle $X_2 \rightleftarrows X_3$ denotes that $X_2$ is a direct cause of $X_3$ and $X_3$ is a direct cause of $X_2$.
\begin{figure}[!t]
\centering
\caption{Example causal graphical models and corresponding PMRF models.}
\includegraphics[width=0.65\textwidth]{figures/Fig1.pdf}
\vspace{3mm}
\caption*{\small{\textit{Note.} (a) is the example directed acyclic graph (DAG). (b) is the example directed cyclic graph (DCG). (c) is the PMRF (Pairwise Markov Random Field) corresponding to the DAG in (a). (d) is the PMRF corresponding to the DCG in (b). }}
\label{fig:1}
\end{figure}
Causal graphical models also describe patterns of statistical independencies, which can be read off from the structure of the graph using Pearl's \textit{d-separation criterion} \citep{geiger_d-separation_1990}.
The idea of this criterion is to associate \textit{dependence} with \textit{connectedness} (i.e., the path between $A$ and $B$ is activated by $C$; \textit{d-connected} by $C$) and \textit{independence} with \textit{separation} \citep[i.e., the path between $A$ and $B$ is blocked by $C$, and so these variables are \textit{d-separated} by $C$; for a detailed formal treatment of d-separation, see][]{tian1998finding}. Formally, two variables $A$ and $B$ are said to be d-separated given $C$ if and only if all paths between $A$ and $B$ are \textit{blocked} when conditioning on $C$ \citep{pearl2009causality}. Different types of directed paths in a graph are either blocked or unblocked by conditioning on variables along them. For instance, in \figref[(a)]{1}, we see a \textit{chain} structure $X_1 \tailarrow X_2 \tailarrow X_3$, which implies that $X_1$ and $X_3$ are marginally dependent ($X_1 \nindep X_3$) but independent conditional on $X_2$ ($X_1 \indep X_3 \mid X_2$). More formally, we would say $X_1$ and $X_3$ are \textit{d-connected} given the empty set but \textit{d-separated} by $X_2$. A \textit{fork} structure $X_3 \arrowtail X_2 \tailarrow X_4$ implies the same pattern of independencies; $X_3$ and $X_4$ are marginally dependent ($X_3 \nindep X_4$) but independent conditional on $X_2$ ($X_3 \indep X_4 \mid X_2$). However, a \textit{collider} structure $X_2 \tailarrow X_4 \arrowtail X_5$ implies a contrasting pattern; here $X_2$ and $X_5$ are marginally independent (i.e. d-separated when conditioning on the empty set, $X_2 \indep X_5$) but dependent conditional on $X_4$ ($X_2 \nindep X_5 \mid X_4$). This distinguishing characteristic of colliders is crucial when identifying the directions of causal relationships, as will be shown in Section \ref{primer}. In principle, the d-separation criterion can be applied to both acyclic and cyclic causal graphs, as long as certain conditions, discussed in Section \ref{acyclicvscyclic}, are met.
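To make these patterns concrete, the following sketch simulates data from a linear-Gaussian parameterization of the DAG in \figref[(a)]{1} and checks the chain, fork, and collider patterns via sample partial correlations. This is a minimal illustration in Python; the coefficients (all set to $0.7$) are arbitrary choices of ours, not values implied by the figure.
\begin{verbatim}
# Simulate the DAG of Figure 1(a): X1 -> X2 -> X3, X3 <- X2 -> X4,
# and the collider X2 -> X4 <- X5 (illustrative coefficients).
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x1 = rng.normal(size=n)
x5 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(size=n)
x3 = 0.7 * x2 + rng.normal(size=n)
x4 = 0.7 * x2 + 0.7 * x5 + rng.normal(size=n)

def pcor(a, b, cond=()):
    """Sample partial correlation of a and b given the variables in cond."""
    Z = np.column_stack((np.ones(len(a)),) + tuple(cond))
    ra = a - Z @ np.linalg.lstsq(Z, a, rcond=None)[0]  # residualize a on Z
    rb = b - Z @ np.linalg.lstsq(Z, b, rcond=None)[0]  # residualize b on Z
    return np.corrcoef(ra, rb)[0, 1]

print(pcor(x1, x3))         # chain, marginal: clearly nonzero
print(pcor(x1, x3, (x2,)))  # chain given X2: approximately zero
print(pcor(x3, x4, (x2,)))  # fork given X2: approximately zero
print(pcor(x2, x5))         # collider, marginal: approximately zero
print(pcor(x2, x5, (x4,)))  # collider given X4: clearly nonzero
\end{verbatim}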
Having established the basics, we can now examine how PMRF-based statistical network models relate to different types of directed causal models. In \figref[(c)]{1}, we show the PMRF model that corresponds to the DAG in \figref[(a)]{1}, where an additional edge is introduced between $X_2 - X_5$ due to conditioning on the common effect (i.e., collider), $X_4$. In \figref[(d)]{1}, the PMRF model corresponding to the DCG in \figref[(b)]{1} is shown. Two spurious edges are induced in the PMRF network (namely $X_1 - X_3$ and $X_2 - X_4$) because of conditioning on the colliders $X_2$ and $X_3$. These examples illustrate the limitations of statistical network models in inferring patterns of directed causal relationships. PMRFs can contain spurious edges resulting from conditioning on common effects, and the possibility of producing collider structures is likely to be higher in the presence of cycles, exacerbating this issue. Notably, while the mapping from a causal graph to the statistical network we show here is unique, the two statistical networks presented may have been generated by various distinct causal graphs, including those with or without cycles. For more details on the relationship between PMRF-based networks and causal graphs, we refer readers to \cite{Ryan2022}.
% These examples serve to highlight the inherent limitations of using statistical network models to make inferences about patterns of directed causal relationships. That is, PMRFs may contain spurious edges induced by conditioning on common effects, and the possibility of producing collider structures is likely to be higher with the presence of cycles, exacerbating the problem.
% Despite this limitation, in practice, PMRFs have often been interpreted as a \textit{causal skeleton} --- the undirected version of a causal graph --- which can be highly misleading, especially in cyclic graphs where numerous colliders may exist \citep{Ryan2022}.
% Therefore, statistical network models are not ideal for discovering the underlying causal structure.
Despite this limitation, in practice, PMRFs have often been interpreted as a \textit{causal skeleton} --- the undirected version of a causal graph \citep{haslbeck_how_2018}.
Further elaborating on this, \cite{Ryan2022} demonstrated how PMRF-based network models can, in fact, be used to identify a so-called \textit{equivalence class} of causal graphs under certain assumptions. However, these models are prone to suboptimal performance, as the equivalence class they identify is likely much larger than that of custom-built causal discovery methods. Consequently, the authors suggest that causal discovery methods specifically designed for this task are likely to outperform statistical network models in learning the underlying causal structure, indicating that network models may not be desirable tools for discovering causal relationships.
% Further elaborating on this, \cite{Ryan2022} suggest that causal discovery methods specifically designed for this task are likely to perform better than statistical network models.
In the following sections, we will focus on how constraint-based causal discovery methods recover the causal structure while looking into the assumptions they require. Additionally, we will explore the practical and conceptual difficulties involved in performing causal discovery in the presence of causal cycles.
% Therefore, statistical network models are not desirable tools for discovering the underlying causal structure. Further elaborating on this, \cite{Ryan2022} suggest that causal discovery methods specifically designed for this task are likely to perform better than statistical network models. In the following sections, we will focus on how constraint-based causal discovery methods recover the causal structure, while looking into the assumptions they require. Additionally, we will explore the practical and conceptual difficulties involved in performing causal discovery in the presence of causal cycles.
% Similar problems also emerge when there are unobserved common causes (i.e., latent confounders); not accounting for the latent confounders can produce spurious associations (ref).
% The spurious edges induced by conditioning on common effects (i.e., colliders) are well-known problems of using PMRF-based statistical network models to infer acyclic causal structures \citep{dablander2019node}. The same problem carries over to the cyclic case, as shown in \figref[d]{1}; two spurious edges are induced in the PMRF network (e.g., $X_1 - X_3$ and $X_2 - X_4$) when conditioning on the colliders $X_2$ and $X_3$. This clearly manifests the limitation of using statistical network models for causal discovery. \cite{Ryan2022} discussed this issue in detail and conclude that purpose-built causal discovery methods are likely to outperform statistical network models as causal discovery tools. Hence, in the remainder of the paper, we focus on constraint-based causal discovery methods while examining how they recover the underlying causal structure and under what assumptions they can be expected to work.
\subsection{Acyclic vs. Cyclic Causal Graphs} \label{acyclicvscyclic}
The d-separation criterion described above applies to all acyclic graphs, but for graphs with cycles, it applies only under certain conditions.
To understand these conditions, we first need to introduce some graph terminology. In the field of graphical models, we use kinship terminology to describe a graph structure, as follows:
\begin{equation*}
\text{if }
\begin{rcases}
\begin{dcases}
A \rightarrow B \\
A \leftarrow B \\
A \rightarrow \cdots \rightarrow B \,\,\text{or}\,\, A = B \\
A \leftarrow \cdots \leftarrow B \,\,\text{or}\,\, A = B
\end{dcases}
\end{rcases}
\text{ in } \mathcal{G} \text{ then } A \text{ is a }
\begin{rcases}
\begin{dcases}
\text{parent} \\
\text{child} \\
\text{ancestor} \\
\text{descendant}
\end{dcases}
\end{rcases}
\text{ of } B \text{ and}
\begin{rcases}
\begin{dcases}
A \in pa_{\mathcal{G}}(B) \\
A \in ch_{\mathcal{G}}(B) \\
A \in an_{\mathcal{G}}(B) \\
A \in de_{\mathcal{G}}(B)
\end{dcases}
\end{rcases}
\text{.}
\end{equation*}
\noindent Also, when there exists an edge between two vertices $A - B$, $A$ and $B$ are said to be \textit{adjacent}. For example, in \figref[(b)]{1}, $X_1 \in pa_{\mathcal{G}} (X_2)$, $\,X_2 \in ch_{\mathcal{G}} (X_1)$, $\,\{X_1, X_2, X_3, X_4\} \subseteq an_{\mathcal{G}}(X_3)$, $\,\{X_1, X_2, X_3\} \subseteq de_{\mathcal{G}} (X_1)$, and $X_2$ is adjacent to $X_1$ and $X_3$. With this terminology in place, we can define the conditions that relate patterns of causal dependency in a causal graph to patterns of statistical dependency between random variables. First, we introduce the \textit{global Markov} condition, which states that d-separation relations represented in causal graphs can be used to read off statistical independence relations such that:
% Second, we need to define the condition under which the d-separation criterion holds. This is known as the \textit{global Markov} condition, which states that the d-separation relations represented in causal graphs correspond to the statistical (in)dependencies such that:
$$ X_A \indep_{\mathcal{G}} X_B \mid X_C \Longrightarrow X_{A} \indep X_{B} \mid X_{C} \text{ for all disjoint subsets } X_A, X_B, X_C, $$
\noindent where $\indep_{\mathcal{G}}$ refers to d-separation in graphs, and $\indep$ refers to statistical independence between random variables. If causal graphs are \textit{acyclic}, such as DAGs, then the \textit{global Markov} condition holds regardless of the functional forms of causal relations and the distributions of variables involved \citep{lauritzen1996graphical}. In addition, in DAGs, the \textit{global Markov} condition entails the \textit{local Markov} condition, which states that a variable is independent of its non-descendants given its parents \citep{lauritzen2000graphical}. The fact that one Markov property implies the other comes in handy when reading off conditional independencies from a graph.
In contrast to the acyclic case, the situation is not as straightforward in \textit{cyclic} graphs. In DCGs, the global Markov property does not always hold. \cite{spirtes1994} showed that this property does hold when causal relations are \textit{linear} and the error terms are \textit{independent}. However, even in this case, the global Markov property does not imply the local Markov property. For example, in \figref[(b)]{1}, the global Markov property is preserved ($ X_1 \indep_{\mathcal{G}} X_4 \mid \{X_2, X_3\} \Longrightarrow X_{1} \indep X_{4} \mid \{X_2, X_3\}$), but the local Markov property is violated, as $X_2 \nindep_\mathcal{G} X_4 \mid \{X_1, X_3\}$ (i.e., $X_2$ is \textit{not} independent of its non-descendant $X_4$ given its parents $X_1$ and $X_3$). This is because $X_3$ serves as both a parent of $X_2$ and a collider on the path $X_2 \tailarrow X_3 \arrowtail X_4$ at the same time. In the current paper, we limit the scope of our study to cyclic causal graphs that represent \textit{linear} causal relations with jointly \textit{independent} error terms, so that the global Markov condition is satisfied. This type of assumption is common in many statistical modeling traditions in psychology and the social sciences. For example, structural equation models and popular statistical network models, such as the GGM, also rely on similar assumptions about the linearity of statistical relations \citep{epskamp_gaussian_2018, bollen1993testing}.
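The local Markov violation described above can be verified analytically in a linear parameterization of \figref[(b)]{1}, consistent with the linear, independent-errors setting we just adopted. As a minimal sketch (the coefficients below are illustrative choices of ours), we solve the linear cyclic system $X = BX + \varepsilon$ with independent standard-normal errors at equilibrium, giving covariance $\Sigma = (I-B)^{-1}(I-B)^{-\top}$, and read partial correlations given all remaining variables off the precision matrix $\Sigma^{-1}$:
\begin{verbatim}
# Linear cyclic model behind Figure 1(b): X1 -> X2, X2 <-> X3 (cycle),
# X4 -> X3. With eps ~ N(0, I): Sigma = (I - B)^{-1} (I - B)^{-T}.
import numpy as np

B = np.zeros((4, 4))
B[1, 0] = 0.6  # X1 -> X2
B[1, 2] = 0.4  # X3 -> X2 (one direction of the cycle)
B[2, 1] = 0.4  # X2 -> X3 (other direction of the cycle)
B[2, 3] = 0.6  # X4 -> X3
A = np.eye(4) - B
Sigma = np.linalg.inv(A) @ np.linalg.inv(A).T
Omega = np.linalg.inv(Sigma)  # precision matrix (= A.T @ A here)

def pcor_given_rest(i, j):
    """Partial correlation of X_i, X_j given all other variables."""
    return -Omega[i, j] / np.sqrt(Omega[i, i] * Omega[j, j])

print(pcor_given_rest(0, 3))  # X1, X4 | {X2, X3}: exactly 0 (global Markov)
print(pcor_given_rest(1, 3))  # X2, X4 | {X1, X3}: nonzero (local fails)
\end{verbatim}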
In addition to the above, constraint-based causal discovery methods typically make use of two additional assumptions \citep{pearl2009causality, spirtes2000}. The first is known as \textit{faithfulness}, which is essentially the reverse of the global Markov condition, stating that statistical independencies map onto the structure of causal graphs such that:
$$ X_{A} \indep X_{B} \mid X_{C} \Longrightarrow X_A \indep_{\mathcal{G}} X_B \mid X_C.$$
In other words, it postulates that if two variables are (conditionally) statistically independent of each other, then they are also causally independent of each other, ruling out, for instance, the possibility that different paths in a causal system exactly cancel one another out.\footnote{For a discussion of this assumption in the context of psychological network analysis, see \cite{Ryan2022}.}
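As a minimal illustration of such a cancellation, suppose $Z = aX + \varepsilon_Z$ and $Y = bZ + cX + \varepsilon_Y$ with independent errors. Then
$$ \mathrm{Cov}(X, Y) = (ab + c)\,\mathrm{Var}(X), $$
so choosing $c = -ab$ makes $X$ and $Y$ marginally independent even though $X$ is a direct cause of $Y$; faithfulness rules out such exact cancellations.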
The second assumption, known as \textit{causal sufficiency}, relates to the absence of \textit{unobserved} (i.e., \textit{latent}) confounding variables --- that is, any unobserved common causes of the variables within the causal graph. This assumption ensures that the statistical dependence between two variables can be explained by the patterns of causal dependence among the observed variables. Without this assumption, unobserved confounders can induce an edge between variables even when there is no direct causal relation between them \citep{lauritzen1996graphical, spirtes2000}. Crucially, not all causal discovery algorithms require the assumption of causal sufficiency; we return to such methods in Section \ref{cdalgo}, but in the example below we assume that sufficiency holds.
% \subsection{Causal Modeling Assumptions}
% In order to map the relations represented in causal graphical models $\mathcal{G}$ to the structural functions in SCMs and vice versa, a couple of assumptions need to be satisfied; causal Markov assumption and faithfulness assumption.
% \subsubsection{Causal Markov Assumption} \label{markov}
% The causal Markov assumption is required when inferring a set of conditional independencies from a causal graphical model. It entails three Markov properties in specifics \citep{lauritzen1996graphical}. Given a directed graph $\mathcal{G}$ and a joint distribution $\mathcal{P}$, this distribution is said to satisfy:
% \begin{enumerate}[nolistsep]
% \item \textit{global Markov} property if
% $$ A \indep_{\mathcal{G}} B \mid C \Longrightarrow X_{A} \indep_{\mathcal{P}} X_{B} \mid X_{C}$$ for all subsets of $A$, $B$, $C$ (also known as the \textit{d-separation criterion}).
% \item \textit{local Markov} property if each variable is independent of its non-descendants given its parents.
% \item \textit{Markov factorization} property if $$P(X_1, X_2, ..., X_n) = \prod_{i}^{n}P(\,X_i \,\,|\,\, pa_{i}^{\mathcal{G}}\,),$$ where $pa_{i}^{\mathcal{G}}$ denotes the \textit{parents} of node $i$ (factorization is equivalent to global Markov property when the distribution over $X$ has a density).
% \end{enumerate}
% \noindent The elegance of DAG is rooted in the equivalence of these various versions of Markov properties. When a DAG satisfies one of these Markov properties, it instantly implies that it satisfies the rest \citep{lauritzen2000graphical}. However, in directed cyclic graphs that is not the case. In fact, \cite{spirtes1994} showed that both global and local Markov properties may fail in directed graph with cycles.
% In the directed cyclic graph from Figure \ref{fig:1} (b) for example, it can be observed that global Markov property holds ($ X_1 \indep_{\mathcal{G}} X_4 \mid \{X_2, X_3\} \Longrightarrow X_{1} \indep_{\mathcal{P}} X_{4} \mid \{X_2, X_3\}$), but the
% local Markov property is violated as $X_1 \nindep_\mathcal{G} X_4 \mid X_2$ (i.e., $X_1$ is \textit{not} d-separated from its non-descendant $X_4$ given its parent $X_2$). On the contrary, it can be easily seen that all three properties hold in the DAG from Figure \ref{fig:1} (a).
% \subsubsection{Faithfulness Assumption}
% It states the reverse of causal Markov assumption. That is, a distribution $\mathcal{P}$ is faithful to a graph $\mathcal{G}$ if the conditional independencies imply the associated graph structure such that:
% $$ X_{A} \indep_{\mathcal{P}} X_{B} \mid X_{C} \Longrightarrow A \indep_{\mathcal{G}} B \mid C.$$
% Together with the global Markov property (i.e., d-separation criterion), faithfulness assumption is of key importance to infer causal relations from graphical models \citep{Bongers2021}.
% \subsubsection{Causal Models in Current Study} \label{currentscope}
% As the distribution $\mathcal{P}$ may not obey the global Markov property when $\mathcal{G}$ contains cycles, there has to be additional constraints imposed on $\mathcal{P}$ such that we can ensure that the property holds.
% Spirtes (1995) showed that if $\mathcal{P}$ obeys a linear SCM with jointly independent error terms ($\varepsilon$), then it satisfies the global Markov property with respect to the directed cyclic graph $\mathcal{G}$.
% Accordingly, here we limit the scope of our study to the subclass of cyclic SCMs, where the relationships between the variables are linear and the error terms are independent of each other. In addition, we assume that the error terms are normally distributed, which is one of common assumptions in psychological research \citep{zhang_statistical_2022}.
\subsection{A Primer on Constraint-Based Causal Discovery} \label{primer}
Under the aforementioned assumptions, constraint-based methods seek to recover the underlying causal structure by testing for conditional independence relations between variables from observational data \citep{scheines_tetrad_1998, peters_elements_2017, pearl_probabilistic_1988}. Assuming linear relations with additive Gaussian errors, conditional independence can be tested using partial correlations \citep{lawrance_conditional_1976}, although notably a number of non-parametric conditional independence tests can also be used under less stringent assumptions \citep{li_fan2020, huang_sun_white_2016}. Constraint-based methods typically employ a two-step procedure; first, establishing the \textit{skeleton} --- an undirected version of the underlying graph --- and second, attempting to assign directions to the edges. In general, constraint-based techniques, much like any methods relying on observational datasets, are unable to uniquely identify the underlying causal graph, but instead return a set of causal graphs that imply the same statistical independence relations \citep{spirtes2000}.
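Regarding the conditional independence tests mentioned above, a standard choice in the linear-Gaussian setting is to apply Fisher's $z$-transformation to the sample partial correlation. The sketch below illustrates the idea; the function and variable names are our own, not part of any particular package:
\begin{verbatim}
# A minimal partial-correlation conditional independence test using
# Fisher's z transform (assumes linear-Gaussian data).
import numpy as np
from scipy import stats

def ci_test(data, i, j, S, alpha=0.01):
    """Test X_i _||_ X_j | X_S from an (n x p) data matrix."""
    idx = [i, j] + sorted(S)
    P = np.linalg.inv(np.cov(data[:, idx], rowvar=False))
    r = -P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])  # partial correlation
    n = data.shape[0]
    z = 0.5 * np.log((1 + r) / (1 - r))        # Fisher z transform
    stat = np.sqrt(n - len(S) - 3) * abs(z)
    pval = 2 * (1 - stats.norm.cdf(stat))
    return pval > alpha  # True: treat X_i, X_j as independent given X_S
\end{verbatim}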
% By constraint-based, we mean a two-step procedure in which observed patterns of conditional independencies between variables are (1) first used to establish the \textit{skeleton} --- an undirected version of the underlying graph --- and (2) second, more detailed patterns are used to assign directions to edges.
\begin{figure}[t]
\centering
\caption{Steps of a constraint-based method.}
\includegraphics[width=1.0\textwidth]{figures/Fig2.pdf}
\vspace{0.1mm}
\caption*{\small{\textit{Note.} (a) shows the fully-connected graph for the example DAG from \figref[(a)]{1}, which is the initial starting point of the algorithm. (b) shows the estimated \textit{skeleton} --- an undirected graph of the underlying causal structure --- after the first step. (c) shows the resulting graph after the second step, which represents the \textit{Markov equivalence class} of DAGs (i.e., a set of DAGs that entail the same set of conditional independencies).}}
\label{fig:2}
\end{figure}
To develop an intuition for how constraint-based methods work, we will examine how they operate on data generated by a system for which the causal graph is represented by the relatively simple DAG shown in \figref[(a)]{1}. The method begins with a fully-connected graph, as shown in \figref[(a)]{2}.
In the first step, the \textit{skeleton} is estimated by testing for conditional independence; if two variables are independent when conditioning on \textit{any} subset of the remaining variables (e.g., $X_1 \indep X_3 \mid X_2$, $X_1 \indep X_4 \mid X_2$, $X_1 \indep X_3 \mid \{X_2, X_4\}$, etc.), then the edge between those two variables is removed (see \figref[(b)]{2}). This is based on the principle that, in acyclic graphs, two variables remain statistically dependent under every possible conditioning set if and only if a direct causal relation exists between them. In other words, we can identify the presence of direct causal paths between variables by testing whether they are statistically dependent given any subset of the remaining variables. However, while this principle allows us to detect the presence of an edge, it does not tell us the direction of that edge (e.g., $X_2 \tailarrow X_4$ or $X_2 \arrowtail X_4$). In the second step, we attempt to orient the edges in the skeleton by searching for \textit{colliders} that induce distinctive patterns of independencies (e.g., $X_4$ is identified as a collider since $ X_2 \indep X_5 \text{ and } X_2 \nindep X_5 \mid X_4$, and thus $X_2 \tailarrow X_4 \arrowtail X_5$ is oriented; see \figref[(c)]{2}). The procedure we have just described is essentially the PC algorithm \citep{spirtes2000}, and the output of the PC algorithm (\figref[(c)]{2}) is called a \textit{complete partially directed acyclic graph} (CPDAG).
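The two-step logic just described can be summarized in code. The sketch below is a deliberately simplified rendition of the skeleton and collider-orientation phases, not a full PC implementation; \texttt{indep} stands for any conditional independence oracle or test, such as the one sketched earlier:
\begin{verbatim}
# Simplified PC-style sketch (assumes causal sufficiency): estimate the
# skeleton, record separating sets, then orient colliders.
from itertools import combinations

def pc_sketch(nodes, indep):
    adj = {v: set(nodes) - {v} for v in nodes}  # start fully connected
    sepset = {}
    depth = 0
    while any(len(adj[v]) - 1 >= depth for v in nodes):
        for x in nodes:
            for y in list(adj[x]):
                # test against conditioning sets among x's other neighbours
                for S in combinations(adj[x] - {y}, depth):
                    if indep(x, y, set(S)):
                        adj[x].discard(y); adj[y].discard(x)
                        sepset[frozenset({x, y})] = set(S)
                        break
        depth += 1
    # orient x -> y <- z for nonadjacent x, z with common neighbour y,
    # whenever y was not needed to separate x and z
    arrows = set()
    for y in nodes:
        for x, z in combinations(sorted(adj[y]), 2):
            if z not in adj[x] and y not in sepset.get(frozenset({x, z}), {y}):
                arrows.add((x, y)); arrows.add((z, y))
    return adj, arrows
\end{verbatim}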
% the algorithm searches for \textit{colliders} that induce distinctive patterns of (in)dependencies (e.g., $ X_2 \indep X_5 \text{ and } X_2 \nindep X_5 \mid X_4$) to orient edges (e.g., $X_2 \rightarrow X_4 \leftarrow X_5$; see \figref[c]{2}).
Note that the resulting CPDAG is not identical to the original true graph $\mathcal{G}$, as the two edges $X_1 - X_2$ and $X_2 - X_3$ remain undirected. There are, in fact, three DAGs implied by the CPDAG, which are obtained by assigning directions to the undirected edges, excluding the combinations that introduce new colliders (or directed cycles). These DAGs are called \textit{Markov equivalent}, meaning that they encode the same conditional (in)dependencies (i.e., the same d-separation relations hold), and we call such a set of equivalent graphs a \textit{Markov equivalence class}, denoted by $Equiv(\mathcal{G})$ \citep{spirtes2000}. These Markov equivalent DAGs are shown on the right-hand side of \figref{3}, which summarizes the constraint-based causal discovery procedure that we just described. This highlights a general difficulty in constraint-based causal discovery that relies solely on observational data, namely that there are typically multiple graphs that are consistent with an observed set of statistical independencies. Notice that the Markov equivalence class contains both the true graph shown in \figref[(a)]{1} and two other distinct causal graphs; the algorithm is correct, in that the true graph is captured in this equivalence class; however, we cannot distinguish it from the other, equally plausible, members of the equivalence class.
\begin{figure}[t]
\centering
\caption{Summary of the constraint-based causal discovery procedure.}
\includegraphics[width=1.0\textwidth]{figures/Fig3.pdf}
\vspace{0.1mm}
\caption*{\small{\textit{Note.} A constraint-based algorithm starts with performing a series of conditional independence tests on observational (\textit{i.i.d.}: independent and identically distributed) data. Under the faithfulness and global Markov assumption, the algorithm estimates a graph structure based on the observed statistical independence patterns. The output is a \textit{partially directed} graph (as some edges remain undirected). It can represent multiple graphs that are \textit{Markov equivalent}, meaning that they imply the same statistical independence relations. This set of equivalent graphs is referred to as a \textit{Markov equivalence class}, and in this example, it consists of three different DAGs.}}
\label{fig:3}
\end{figure}
Constraint-based methods for cyclic causal discovery operate under principles similar to those described earlier. However, cyclic causal discovery is in general more challenging, and the problem of having multiple Markov equivalent graphs is often exacerbated when cycles are allowed \citep{richardson1996automated}. Consider how, in the DAG example above, we could identify the presence of direct causal relations between variables when they are statistically dependent given any subset of the remaining variables. Now suppose we apply the same rule to the DCG shown in \figref[(b)]{1}. In this cyclic graph, there is no direct causal path between $X_1$ and $X_3$, but instead there is the path $X_1 \,\tailarrow\, X_2 \stackedarrows X_3$. When testing for conditional independencies, we find that $X_1$ and $X_3$ are not marginally independent because of the causal chain $X_1 \tailarrow X_2 \tailarrow X_3$. However, unlike in the acyclic case, we also find that $X_1$ and $X_3$ are not conditionally independent given $X_2$, since $X_2$ also acts as a collider on the path $X_1 \tailarrow X_2 \arrowtail X_3$. In cyclic graphs, two variables can thus be statistically dependent conditional on every subset of the remaining variables, even when there is no direct causal relation between them. Therefore, when a cycle exists, a constraint-based method often cannot directly identify parental relations but recovers only up to \textit{ancestral} relations, typically leading to a larger Markov equivalence class. This also means that the estimated skeleton of a cyclic graph represents ancestral relations, and is thus called the \textit{ancestral skeleton}.
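This, too, can be checked numerically in a linear parameterization of \figref[(b)]{1}. Reusing the equilibrium covariance construction from before (coefficients again illustrative), the sketch below computes the partial correlation of $X_1$ and $X_3$ given every subset of the remaining variables; all are nonzero despite the absence of a direct edge:
\begin{verbatim}
# X1 and X3 in Figure 1(b) are dependent under every conditioning set,
# although they are not adjacent (illustrative coefficients).
import numpy as np

B = np.zeros((4, 4))
B[1, 0], B[1, 2], B[2, 1], B[2, 3] = 0.6, 0.4, 0.4, 0.6
Sigma = np.linalg.inv(np.eye(4) - B) @ np.linalg.inv(np.eye(4) - B).T

def pcor(i, j, S):
    """Partial correlation of X_i and X_j given X_S, from Sigma."""
    idx = [i, j] + list(S)
    P = np.linalg.inv(Sigma[np.ix_(idx, idx)])
    return -P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])

for S in [(), (1,), (3,), (1, 3)]:  # subsets of {X2, X4} (0-indexed)
    print(S, pcor(0, 2, S))         # X1 vs X3: nonzero in every case
\end{verbatim}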
Although the same principles of causal discovery that we described for DAGs can be adapted and used for DCGs, some additional constraints and orientation rules are required to address the complexities arising from the presence of cycles. In the next section, we will further elaborate on these constraints and rules while introducing several constraint-based cyclic causal discovery algorithms.
\section{Causal Discovery Algorithms} \label{cdalgo}
In the previous section, we introduced the key concepts of graphical models and the fundamental principles of constraint-based causal discovery. Building on those concepts, we now provide a detailed description of three constraint-based algorithms for cyclic graphs: \textit{cyclic causal discovery} (CCD) \citep{Richardson1996a}, \textit{fast causal inference} (FCI) \citep{spirtes_causal_1995}, and \textit{cyclic causal inference} (CCI) \citep{strobl2019}. We discuss their assumptions, the steps involved, and their output graphs along with their interpretation.
\subsection{Assumptions of Algorithms} \label{assumptions}
All three algorithms build on the same principles and can hence be seen as extensions of the PC algorithm described in Section \ref{primer}, but they make slightly different assumptions. The CCD algorithm assumes \textit{causal sufficiency}, described in Section \ref{acyclicvscyclic}.
The other two algorithms, FCI and CCI, relax this sufficiency assumption and account for the possibility of latent confounders. In practice, this means that the output of these algorithms will often be more conservative than that of the CCD or PC algorithm, as statistical dependence between two variables, conditional on every possible subset of the observed variables, may be induced by an unobserved confounder. However, similar to how the PC algorithm can use collider structures to determine the direction of causal relations, these algorithms can sometimes use particular patterns of multivariate dependencies to identify that some statistical dependence relations must be induced by direct or ancestral causal relations; for more detail on the general principles of causal discovery without sufficiency, we refer readers to \cite{spirtes2000}. That these algorithms do not rely on causal sufficiency makes them potentially more promising for psychological research, where the assumption of no unobserved confounding is rarely warranted \citep{rohrer_thinking_2018}.
A concept closely related to sufficiency is \textit{selection bias}. Selection bias occurs when one conditions on an unobserved collider, for example, by selectively excluding a particular subgroup of samples, which leads to a similar problem of inducing spurious causal relations \citep{versteeg_local_2022, haslbeck_sum_2022}. While the CCD algorithm assumes no selection bias, the FCI and CCI algorithms account for the possibility of selection bias. Although the FCI algorithm was initially designed for acyclic causal structures \citep{spirtes_causal_1995}, it has been shown to perform well in cyclic settings under a more generalized Markov condition, provided that selection bias is absent \citep{mooij_classen2020}. Thus, we consider FCI as one of the cyclic causal discovery algorithms, but note that the conditions suggested by \cite{mooij_classen2020} hold only in restricted settings, which exclude the linear and discrete cases. \autoref{tab:1} summarizes the set of assumptions made by each of the algorithms, including the fundamental assumptions of the global Markov condition, faithfulness, and acyclicity, as well as those related to the functional forms of causal relations and error terms.
\renewcommand{\tabularxcolumn}[1]{>{\centering\arraybackslash}p{#1}}
\renewcommand{\arraystretch}{1.3}
\begin{table}[!t]
\caption{Assumptions of cyclic causal discovery algorithms.}
\label{tab:1}
\begin{tabularx}{\textwidth}{p{5cm}*{3}{X}}
\toprule
& CCD & FCI & CCI \\
\midrule
Global Markov condition & \checkmark & \checkmark & \checkmark \\
Faithfulness & \checkmark & \checkmark & \checkmark \\
Acyclicity & \tikzxmark & \textendash $^a$ & \tikzxmark\\
Causal sufficiency & \checkmark & \tikzxmark & \tikzxmark \\
Absence of selection bias & \checkmark & \textendash $^a$ & \tikzxmark\\
Linearity & \checkmark & \textendash $^a$ & \checkmark \\
\bottomrule
\end{tabularx}
\bigskip
\small\textit{Note}. $^a$ FCI was originally designed to infer causal structure in the presence of selection bias, assuming acyclicity, but a recent study has shown that it performs comparably well in cyclic settings under certain conditions \citep{mooij_classen2020}. Specifically, these conditions require that selection bias is \textit{absent} and that the causal relations are \textit{non-linear}.
\end{table}
In Section \ref{primer}, we learned that constraint-based causal discovery works by testing for patterns of conditional (in)dependence, resulting in an equivalence class of causal graphs that convey the same statistical (in)dependencies. The PC algorithm was used to illustrate this process, which involves two steps and yields a CPDAG as the output. For cyclic causal graphs, constraint-based algorithms follow a similar approach, but with some caveats. First, the specific steps taken are more complex than those of the PC algorithm. Second, the output of these algorithms is not a CPDAG, but a different representation of equivalent cyclic graphs known as a \textit{partial ancestral graph} (PAG).
\subsection{CCD Algorithm}
The CCD algorithm is the simplest of the three, as it assumes that there is no unobserved latent confounding (i.e., \textit{causal sufficiency}). Its basic operation, summarized in \figref[]{4}, follows the same fundamental principles as the PC algorithm illustrated in \figref[]{3}. However, the output of CCD is a PAG, which represents the common features of equivalent directed cyclic graphs (DCGs).
In what follows, we will explain how to interpret a PAG using the example PAG shown in \figref[]{4}.
\begin{figure}[!t]
\centering
\caption{Summary of CCD algorithm operation.}
\includegraphics[width=0.9\textwidth]{figures/Fig4.pdf}
\vspace{3mm}
\caption*{\small{\textit{Note.} Given the observed statistical independencies, CCD constructs a partial ancestral graph (PAG), which represents the \textit{ancestral} features that are common to every directed cyclic graph (DCG) in a Markov equivalence class. In this particular example, the Markov equivalence class consists of two different DCGs.}}
\label{fig:4}
\end{figure}
\subsubsection{CCD Output Representation: Partial Ancestral Graph (PAG)} \label{CCDPAG}
As was the case for DAGs, there typically exist multiple DCGs that imply the same statistical independencies and so are statistically indistinguishable from one another. To represent a set of equivalent DCGs, the CCD algorithm uses a PAG that characterizes the common features shared by all equivalent DCGs, $Equiv(\mathcal{G})$. As discussed in Section \ref{primer}, the causal semantics of edges in PAGs become more complicated due to the presence of cyclic relations.
In a CPDAG, an edge represents a direct causal relation between the corresponding vertices, while no edge implies its absence. In a PAG, the absence of an edge still indicates the absence of a direct causal relation, but the presence of an edge indicates causal \textit{ancestry}, with $A \tailarrow B$ meaning that $A$ is an \textit{ancestor} of $B$.
In the causal graphs we have looked at so far, the types of edges are limited to directed ($\tailarrow$) and undirected edges (\textemdash). In PAGs, however, three different types of edge-endpoints $\{ \circ, > , - \}$ are utilized to represent the ancestral relations in $Equiv(\mathcal{G})$. The interpretation of each edge-endpoint in a PAG is as follows:\footnote{In the description of the semantics for PAGs \citep{Richardson1996a}, $*$ is used as a \textit{meta-symbol}, indicating one of the three possible edge-endpoints. For instance, $A \tailstar B$ indicates any of the following edges: $A$ \textemdash\,$B$, $A\tailarrow B$, or $A \tailcirc B$.}
\begin{enumerate}[nolistsep]
\item $A \stararrow B$ is interpreted as $B$ is \textit{not an ancestor} of $A$ in every graph in $Equiv(\mathcal{G})$.
\item $A \startail B$ is interpreted as $B$ \textit{is an ancestor} of $A$ in every graph in $Equiv(\mathcal{G})$.
\item $A \starcirc B$ is interpreted as: the ancestral relation of $B$ with regard to $A$ is undetermined or varies across graphs in $Equiv(\mathcal{G})$.
\end{enumerate}
\noindent The PAG output of the CCD algorithm can also include a solid or dotted underlining to provide additional information about the causal relations in triplets. If there is a solid underlining $A \starstar \underline{B} \starstar C$, it indicates that $B$ is an ancestor of (at least one of) $A$ or $C$ in every graph in $Equiv(\mathcal{G})$. If there is a dotted underlining added to a collider structure such as $A \tailarrow \udot{B} \arrowtail C$, it indicates that $B$ is \textit{not} a descendant of a common child of $A$ and $C$ in every graph in $Equiv(\mathcal{G})$. For example, from the PAG shown in \figref[]{4}, we can read off the following:
\begin{enumerate}[nolistsep]
\item $X_2$ and $X_3$ are not ancestors of $X_1$ and $X_4$ in every graph in $Equiv(\mathcal{G})$.
\item $X_1$ and $X_4$ are both ancestors of $X_2$ and $X_3$ in every graph in $Equiv(\mathcal{G})$.
\item $X_2$ is an ancestor of $X_3$ and $X_3$ is an ancestor of $X_2$ in every graph in $Equiv(\mathcal{G})$, indicating the presence of a cyclic relationship between them.
\item $X_2$ and $X_3$ are not descendants of a common child of $X_1$ and $X_4$ in every graph in $Equiv(\mathcal{G})$.
This means that $X_1 \tailarrow X_2$ and $X_4 \tailarrow X_2$ cannot both be present in any graph in $Equiv(\mathcal{G})$, and likewise for $X_1 \tailarrow X_3$ and $X_4 \tailarrow X_3$. For instance, if $X_1$ were a parent of $X_2$, then, given that $X_2$ and $X_3$ are ancestors/descendants of each other, $X_4$ cannot also be a parent of $X_2$; otherwise this condition would be violated.
\end{enumerate}
\noindent Given the causal ancestral relations represented by the example PAG described above, we can derive the corresponding Markov-equivalent DCGs, which are shown on the right-hand side of \figref[]{4}.
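For readers who prefer to manipulate these semantics programmatically, the sketch below encodes the example PAG of \figref[]{4} as one endpoint mark per edge direction (a hypothetical encoding of our own, loosely analogous to the adjacency-matrix conventions used by causal discovery software; the underlinings are omitted for brevity) and mechanically reads off the ancestral statements listed above:
\begin{verbatim}
# Hypothetical endpoint encoding of the PAG in Figure 4.
# marks[(a, b)] is the mark at b on the edge a *-* b:
# '>' = arrowhead, '-' = tail, 'o' = circle.
marks = {
    ('X1', 'X2'): '>', ('X2', 'X1'): '-',   # X1 -> X2
    ('X1', 'X3'): '>', ('X3', 'X1'): '-',   # X1 -> X3
    ('X4', 'X2'): '>', ('X2', 'X4'): '-',   # X4 -> X2
    ('X4', 'X3'): '>', ('X3', 'X4'): '-',   # X4 -> X3
    ('X2', 'X3'): '-', ('X3', 'X2'): '-',   # X2 --- X3
}

for (a, b), mark in sorted(marks.items()):
    if mark == '>':
        print(f"{b} is not an ancestor of {a} in any graph in Equiv(G)")
    elif mark == '-':
        print(f"{b} is an ancestor of {a} in every graph in Equiv(G)")
    else:  # 'o'
        print(f"the ancestral relation of {b} to {a} is undetermined")
\end{verbatim}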
\subsubsection{Steps of CCD Algorithm}
In this section, we describe the CCD algorithm, the first theoretically well-founded constraint-based method that can be applied in a cyclic setting. The FCI and CCI algorithms share essentially the same structure, but differ in the specific orientation rules applied in the latter part of the algorithm. We therefore provide a high-level overview of the CCD algorithm insofar as it shares features with the other two, more complex algorithms. For simplicity, we omit some of the more technical details and refer readers to \hyperref[algCCD]{Appendix A} for a more in-depth and comprehensive description of the CCD algorithm.
The CCD algorithm consists of six steps. We illustrate each step using the example DCG from \figref[(a)]{5}, which is the same example DCG that we previously introduced in \figref[(b)]{1}. The algorithm starts with a fully-connected PAG with circle endpoints, as shown in \figref[(b)]{5}, indicating that no orientations have been determined yet. As the algorithm proceeds, (some) circles will be replaced by either an arrowhead or a tail.
\textbf{Step 1.} This step is identical to the first step of the PC algorithm described in Section \ref{primer}; for every pair of vertices $A$ and $B$, the algorithm tests whether they are statistically independent given some subset of the remaining variables. When such a subset is found, the algorithm removes $A \circirc B$ and records the subset as the \textit{separating set} of $A$ and $B$. Since $X_1$ and $X_4$ are marginally independent in our example DCG, $X_1 \circirc X_4$ is removed, resulting in \figref[(c)]{5}.\footnote{The resulting graph is referred to as an \textit{ancestral} skeleton --- an undirected graph of ancestral relations implied by the underlying structure.}
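As a rough illustration, the sketch below implements this first step for linear-Gaussian data, using partial correlations with Fisher's $z$-transform as the conditional independence test (one common choice; the algorithm itself is agnostic about the test). It also records the separating sets consulted by later steps. All function names are ours and not part of any reference implementation:
\begin{verbatim}
import numpy as np
from itertools import combinations
from scipy import stats

def is_independent(i, j, cond, X, alpha=0.01):
    """Fisher-z test of Xi _||_ Xj given X_cond via the partial correlation."""
    n = X.shape[0]
    sub = np.corrcoef(X[:, [i, j, *cond]], rowvar=False)
    P = np.linalg.inv(sub)                       # precision of the submatrix
    r = -P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])    # partial corr. of Xi and Xj
    z = np.sqrt(n - len(cond) - 3) * np.arctanh(r)
    return 2 * stats.norm.sf(abs(z)) > alpha     # True = "independent"

def step1_skeleton(X, alpha=0.01):
    """Step 1: start fully connected; drop an edge once a separating set is found."""
    p = X.shape[1]
    adj = {frozenset(pair) for pair in combinations(range(p), 2)}
    sepset = {}
    for i, j in combinations(range(p), 2):
        others = [k for k in range(p) if k not in (i, j)]
        found = None
        for size in range(len(others) + 1):
            for S in combinations(others, size):
                if is_independent(i, j, list(S), X, alpha):
                    found = set(S)
                    break
            if found is not None:
                break
        if found is not None:
            adj.discard(frozenset((i, j)))
            sepset[(i, j)] = sepset[(j, i)] = found
    return adj, sepset
\end{verbatim}
On the simulated data \texttt{X} from the earlier sketch, we would expect \texttt{step1\_skeleton} to remove only the $X_1 \circirc X_4$ edge, with an empty separating set.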
\textbf{Step 2.} Again, the algorithm proceeds in a similar manner to the PC algorithm by searching for collider structures in triplets $A \starstar B \starstar C$ in which $A$ and $C$ are not adjacent and $B$ is not in their separating set. Once the algorithm identifies $B$ as a collider, the triplet is oriented as $A \tailarrow B \arrowtail C$. Given that $X_2$ and $X_3$ are colliders in our example, $X_1 \circirc X_2 \circirc X_4$ and $X_1 \circirc X_3 \circirc X_4$ are oriented as $X_1 \tailarrow X_2 \arrowtail X_4$ and $X_1 \tailarrow X_3 \arrowtail X_4$, respectively, resulting in \figref[(d)]{5}.
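Continuing the sketch, collider orientation needs only the skeleton and the recorded separating sets. For simplicity, and so that the helper can be reused later for FCI, the sketch places only the arrowheads at the collider and leaves the remaining endpoints as circles; under causal sufficiency, CCD additionally places tails at the other ends. Again, the endpoint encoding and names are ours:
\begin{verbatim}
from itertools import combinations

def step2_colliders(adj, sepset, p):
    """Step 2: for each unshielded triple A *-* B *-* C (A, C non-adjacent),
    orient A *-> B <-* C whenever B is not in the separating set of A and C."""
    marks = {}                      # marks[(i, j)]: endpoint at j on edge i *-* j
    for e in adj:
        i, j = tuple(e)
        marks[(i, j)] = marks[(j, i)] = 'o'
    for b in range(p):
        neighbours = [k for k in range(p) if frozenset((k, b)) in adj]
        for a, c in combinations(neighbours, 2):
            if frozenset((a, c)) not in adj and b not in sepset[(a, c)]:
                marks[(a, b)] = marks[(c, b)] = '>'   # arrowheads at collider B
    return marks
\end{verbatim}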
\textbf{Step 3.} The algorithm then checks for a different pattern of d-separating relations in triplets to perform additional orientations. It seeks adjacent variables $A \starstar B$ for which it can find a third variable $C$ that (a) is not directly connected to either $A$ or $B$, and (b) is not d-separated from $B$ given $A$. In our example, no such structure can be found, since $X_1$ and $X_4$ are the only non-adjacent pair at this point. Hence, no further orientations are performed in step 3.
\textbf{Step 4.} In this step, the algorithm tries to refine the causal graph by adding underlinings to collider structures. To do this, it searches for a $\mathbf{Supset}$ (super separation set): a set of variables containing the collider that d-separates the two endpoint vertices of the collider structure. For each collider structure $A \tailarrow B \arrowtail C$, the algorithm checks whether such a $\mathbf{Supset}$ exists, and if one is found, a dotted underlining $A \tailarrow \udot{B} \arrowtail C$ is added. Since $\{X_2, X_3\}$ is identified as a $\mathbf{Supset}$ for both colliders in our example, they are dotted-underlined as $X_1 \tailarrow \udot{$X_2$} \arrowtail X_4$ and $X_1 \tailarrow \udot{$X_3$} \arrowtail X_4$, resulting in \figref[(e)]{5}.\footnote{Recall that underlinings in PAGs can convey additional information on causal relations in triplets, as mentioned in Section \ref{CCDPAG}. In this example, $X_1 \tailarrow \udot{$X_2$} \arrowtail X_4$ and $X_1 \tailarrow \udot{$X_3$} \arrowtail X_4$ together indicate that $X_2$ and $X_3$ are not descendants of a common child of $X_1$ and $X_4$.}
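The $\mathbf{Supset}$ search can be sketched in the same style. The helper below simply scans conditioning sets that contain the collider for one that renders the endpoints independent, reusing the hypothetical \texttt{is\_independent} test from step 1:
\begin{verbatim}
from itertools import combinations

def step4_supset(a, b, c, X, alpha=0.01):
    """Step 4: for a collider A *-> B <-* C, search for a Supset<A, B, C>,
    i.e. a conditioning set containing B that makes A and C independent.
    If one is found, the collider is dotted-underlined in the PAG."""
    p = X.shape[1]
    others = [k for k in range(p) if k not in (a, c)]
    for size in range(1, len(others) + 1):
        for T in combinations(others, size):
            if b in T and is_independent(a, c, list(T), X, alpha):
                return set(T)
    return None   # no Supset found: the collider is left as is
\end{verbatim}
In the running example, \texttt{step4\_supset} would be expected to return $\{X_2, X_3\}$ for both colliders.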
\begin{figure}[!t]
\centering
\caption{Trace of CCD algorithm.}
\includegraphics[width=.7\textwidth]{figures/Fig5.pdf}
\vspace{3mm}
\caption*{\small{\textit{Note.} (a) shows the true directed cyclic graph. (b) shows the fully-connected PAG, which is the starting point of the algorithm. (c) shows the \textit{ancestral} skeleton (i.e., an undirected version of the PAG) estimated in step 1. (d) shows the state of the PAG after step 2, where some of the edges are oriented given the identified colliders. (e) shows the state of the PAG after step 4, where the \textit{Supsets} are identified and the corresponding colliders are dotted-underlined. (f) shows the final state of the PAG after step 5, where an additional edge between $X_2$ and $X_3$ is oriented.}}
\label{fig:5}
\end{figure}
\textbf{Step 5.} The last two steps of CCD concern additional orientations of the remaining undirected edges, based on examining $\mathbf{Supsets}$ in the context of quadruplets $\langle A, B, C, D \rangle$. In this step, the algorithm identifies quadruplets in which both $B$ and $D$ serve as colliders for $A$ and $C$, while each is part of the $\mathbf{Supsets}$ of the triplets involving $A$ and $C$. When such a structure exists and $B$ and $D$ are adjacent, the algorithm orients the edge $B \starstar D$ as $B \startail D$. In our example, a quadruplet matching these criteria is found, resulting in the orientation of $X_2 \circirc X_3$ as $X_2$\textemdash $X_3$, as depicted in \figref[(f)]{5}.\footnote{This indicates that $X_2$ and $X_3$ are ancestors of each other, implying a cyclic causal relationship between them.}
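A schematic rendering of this rule in the same hypothetical encoding, where \texttt{dotted\_colliders} holds the triples $(A, B, C)$ dotted-underlined in step 4 and \texttt{supset} their recorded $\mathbf{Supsets}$; the \texttt{else} branch reflects the fuller version of the rule, which places an arrowhead at $D$ when $D$ is not in the $\mathbf{Supset}$:
\begin{verbatim}
def step5_orient(marks, adj, supset, dotted_colliders):
    """Step 5: for quadruples <A, B, C, D> where B and D are both
    dotted-underlined colliders between the same A and C, and B and D are
    adjacent: if D is in Supset<A, B, C>, put a tail at D on B *-* D."""
    for (a, b, c) in dotted_colliders:
        for (a2, d, c2) in dotted_colliders:
            if (a2, c2) == (a, c) and d != b and frozenset((b, d)) in adj:
                if d in supset[(a, b, c)]:
                    marks[(b, d)] = '-'   # tail at D: D is an ancestor of B
                else:
                    marks[(b, d)] = '>'   # otherwise, arrowhead at D
    return marks
\end{verbatim}
In the example, the two passes over the pair of dotted-underlined colliders put a tail on each end of the $X_2 \circirc X_3$ edge, yielding $X_2$\textemdash $X_3$.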
\textbf{Step 6.} In the final step, the algorithm searches for a different pattern in quadruplets $\langle A, B, C, D \rangle$, where $B$ remains a dotted-underlined collider and $D$ is adjacent to neither $A$ nor $C$. If, in this case, $A$ and $C$ are d-connected given $\mathbf{Supset} \langle A, B, C \rangle$ together with $D$, the algorithm orients the edge $B \starstar D$ as $B \tailarrow D$. In our example, no such quadruplets exist, so no additional orientation occurs, leaving \figref[(f)]{5} as the final PAG. With the final PAG in hand, we can determine the Markov equivalence class of DCGs by reading off all the ancestral relationships represented by the PAG, as discussed in Section \ref{CCDPAG}.
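Step 6 can be sketched analogously; it reuses the hypothetical independence test to probe d-connection given the enlarged conditioning set (all encodings and names remain our own):
\begin{verbatim}
def step6_orient(marks, adj, supset, dotted_colliders, X, alpha=0.01):
    """Step 6: for a dotted-underlined collider A *-> B <-* C and a vertex D
    that is adjacent to B but to neither A nor C: if A and C are dependent
    (d-connected) given Supset<A, B, C> plus D, then D cannot be an ancestor
    of B, so orient B *-* D as B -> D."""
    p = X.shape[1]
    for (a, b, c) in dotted_colliders:
        for d in range(p):
            if d in (a, b, c) or frozenset((b, d)) not in adj:
                continue
            if frozenset((a, d)) in adj or frozenset((c, d)) in adj:
                continue
            cond = sorted(supset[(a, b, c)] | {d})
            if not is_independent(a, c, cond, X, alpha):
                marks[(b, d)] = '>'   # arrowhead at D
                marks[(d, b)] = '-'   # tail at B
    return marks
\end{verbatim}
In the example, every candidate for $D$ is adjacent to $X_1$ or $X_4$, so nothing is oriented, in line with the description above.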
\subsection{FCI Algorithm}
The FCI algorithm, originally proposed by \cite{spirtes_causal_1995}, is a constraint-based causal discovery method for directed acyclic graphs (DAGs) that accounts for the presence of latent confounding and possible selection bias. Recently, \citet{mooij_classen2020} demonstrated that the FCI algorithm can also be applied to cyclic causal discovery in the presence of latent confounding under more general faithfulness and Markov conditions, provided that the causal relations are \textit{non-linear}. For details on these conditions, we refer readers to \citet{forre_markov_2017}.
\subsubsection{FCI Output Representation: Partial Ancestral Graph (PAG)}\label{fcipag}
Like the CCD algorithm, the FCI algorithm aims to identify the underlying causal graph up to its Markov equivalence class and likewise employs a PAG to represent the common ancestral features among the equivalent graphs.
However, allowing latent confounders adds a complication: DCGs are not closed under marginalization over latent confounders, meaning that there exist infinitely many DCGs over observed variables ($O$) and latent confounders ($L$) that entail the same set of independencies \citep{Richardson2002}. This problem arises because we do not know how many latent confounders are involved, and the algorithm has to account for the possibility of arbitrarily many of them \citep{zhang_characterization_2005}.
To represent the presence of latent confounders in this infinite space of causal graphs, we need to introduce a new type of edge into the PAG representation. Specifically, bidirected edges ($\arrowarrow$) reflect the presence of latent confounders, with $A \arrowarrow B$ denoting a confounding variable between $A$ and $B$.\footnote{The class of graphs that use bidirected edges ($\arrowarrow$) to represent latent confounding is called \textit{directed mixed graphs} (DMGs), which can be seen as extensions of DCGs \citep{richardson_markov_2003}.} The interpretation of edges in the PAGs estimated by the FCI algorithm is otherwise the same as in the CCD algorithm, except that in FCI PAGs, vertices connected by an edge with circle endpoints at both ends ($\circirc$) may indicate a possible cyclic structure \citep{mooij_classen2020}.
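In the hypothetical endpoint encoding sketched earlier, such a bidirected edge would simply carry arrowheads at both ends:
\begin{verbatim}
# A latent-confounder edge A <-> B in the hypothetical encoding from above:
marks[('A', 'B')] = '>'   # arrowhead at B: B is not an ancestor of A
marks[('B', 'A')] = '>'   # arrowhead at A: A is not an ancestor of B
\end{verbatim}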
\subsubsection{Steps of FCI Algorithm}
We now walk through the steps of the FCI algorithm using the same example DCG as for the CCD algorithm (\figref[(b)]{1}). The algorithm consists of three main steps: skeleton discovery, collider orientation, and the application of further orientation rules. The first two steps are analogous to the CCD procedure.
As with the CCD algorithm, the FCI algorithm begins with a fully-connected PAG with $\circirc$ edges between every pair of variables (\figref[(a)]{7}). It then estimates the ancestral skeleton (\figref[(b)]{7}) by testing for statistical independence (see step 1 of CCD). Subsequently, the FCI algorithm searches for colliders in the same way as the CCD algorithm (see step 2 of CCD); when a collider ($B$) is identified, $A \starstar B \starstar C$ is oriented as $A \stararrow B \arrowstar C$, resulting in \figref[(c)]{7}. Lastly, the FCI algorithm executes a set of orientation rules to orient further edges; for the complete list of orientation rules \citep{zhang_completeness_2008}, see \hyperref[algFCI]{Appendix B}. In this case, no additional endpoints are oriented, leaving \figref[(c)]{7} as the final resulting PAG.
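Since these first two steps coincide with CCD's, the hypothetical helpers sketched earlier can be chained directly (with the simulated data \texttt{X} from before; FCI's additional orientation rules from \hyperref[algFCI]{Appendix B} are not implemented here):
\begin{verbatim}
adj, sepset = step1_skeleton(X, alpha=0.01)       # shared skeleton phase
marks = step2_colliders(adj, sepset, X.shape[1])  # shared collider phase

# Print each edge with its two endpoint marks, e.g. "X1 o-> X2".
for i, j in sorted(tuple(sorted(e)) for e in adj):
    at_i = {'o': 'o', '>': '<', '-': '-'}[marks[(j, i)]]
    print(f"X{i+1} {at_i}-{marks[(i, j)]} X{j+1}")
\end{verbatim}
The expected printout corresponds to the PAG in \figref[(c)]{7}: $X_1$ and $X_4$ receive arrowheads into $X_2$ and $X_3$, while $X_2 \circirc X_3$ keeps circles at both ends.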
Given this PAG, we can read off that:
\begin{enumerate}[nolistsep]
\item $X_2$ and $X_3$ are not ancestors of $X_1$ and $X_4$ in every graph in $Equiv(\mathcal{G})$.
\item $X_2$ and $X_3$ might be part of a cycle in $\mathcal{G}$, as they are connected by an edge with circle endpoints at both ends.
\end{enumerate}
\noindent Notice that the PAG produced by FCI has more circle endpoints ($\circ$) than the PAG produced by CCD, indicating a greater degree of uncertainty about causal ancestral relationships. This is because the FCI algorithm accounts for the possibility of latent confounding, which enlarges the space of possible graphs. Consequently, many more equivalent graphs conform to the relational structure implied by the FCI PAG, resulting in a larger Markov equivalence class, as illustrated on the right-hand side of \figref[]{6}.
\begin{figure}[!t]
\centering
\caption{Trace of FCI algorithm.}
\includegraphics[width=.75\textwidth]{figures/Fig6.pdf}
\vspace{3mm}
\caption*{\small{\textit{Note.} (a) shows the fully-connected PAG, which is the starting point of the algorithm. (b) shows the \textit{ancestral} skeleton estimated in the same manner as the CCD algorithm. (c) shows the state of the PAG after the orientation step using the collider structures identified in step 2.}}
\label{fig:7}
\end{figure}
\begin{figure}[!t]
\centering
\caption{Summary of FCI algorithm operation.}
\includegraphics[width=0.9\textwidth]{figures/Fig7.pdf}
\vspace{3mm}
\caption*{\small{\textit{Note.} Given the observed statistical independencies, FCI constructs a partial ancestral graph (PAG) that captures the common \textit{ancestral} features of every directed mixed graph (DMG) in a Markov equivalence class.
The PAG estimated by FCI has more circle endpoints than the one estimated by CCD in \figref[]{4}, indicating a higher level of uncertainty about the causal relationships. This is because FCI accounts for the presence of latent confounders, which is represented by bidirected edges ($\arrowarrow$) in the graph. As a result, the Markov equivalence class tends to be relatively large.
}}
\label{fig:6}
\end{figure}
\subsection{CCI Algorithm}
The CCI algorithm combines features of both the CCD and FCI algorithms: like CCD, it can identify cyclic causal structures, and like FCI, it can handle latent confounding \citep{strobl2019}. This combined approach, however, comes at a cost; the algorithm requires more complex edge-endpoint inferences and lengthier orientation rules. For a detailed explanation of each step of the CCI algorithm, see \hyperref[algCCI]{Appendix C}.
\subsubsection{CCI Output Representation: Partial Ancestral Graph (PAG)}
As with the other two algorithms, CCI generates a PAG that captures the common ancestral relationships among equivalent graphs. To represent latent confounding, which, as described in Section \ref{fcipag}, renders the space of candidate causal graphs infinite, CCI also uses bidirected edges ($\arrowarrow$). Apart from that, the interpretation of edges in PAGs estimated by CCI is the same as described in Section \ref{CCDPAG} for the CCD output. In the following section, we briefly outline the steps of CCI using the same example DCG (from \figref[(b)]{1}) used throughout the paper and examine the interpretation of the resulting PAG.