The objective of this project is to conduct a survival analysis on marriage dissolution in the U.S. The dataset comprises various factors related to couples and their marriages, with divorce being the primary event of interest. Key variables include:
- Education of the Husband (heduc): Categorized into three groups: less than 12 years, 12 to 15 years, and 16 or more years.
- Ethnicity of the Husband (heblack): Indicates whether the husband is black or not.
- Mixed Ethnicity of the Couple (mixed): Indicates whether the couple has different ethnicities.
- Years of Marriage (years): Duration of marriage until divorce or censoring.
- Divorce Indicator (div): Coded as 1 for divorce and 0 for censoring (due to widowhood or interview).
The analysis explores how these variables affect the risk of divorce. It uses Kaplan-Meier and Nelson-Aalen estimators, log-rank and Peto-Peto tests, and Cox proportional hazards models. Plots are used to illustrate the survival curves and to identify any outliers in the dataset.
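As a rough illustration of how these techniques fit together, here is a minimal sketch using the `survival` package. It runs on simulated stand-in data, not on divorce_dataset.csv. The column names follow the variable list above, but the value coding, the simulation parameters, and the choice of the `survival` package are assumptions made for this example; the actual analysis lives in the R Markdown file.

```r
library(survival)

# Simulated stand-in data, NOT the project's dataset. Column names follow the
# variable list above; the value coding and distributions are assumptions.
set.seed(42)
n <- 200
toy <- data.frame(
  heduc   = factor(sample(c("< 12 years", "12-15 years", "16+ years"),
                          n, replace = TRUE)),
  heblack = rbinom(n, 1, 0.2),
  mixed   = rbinom(n, 1, 0.15)
)
time_to_divorce <- rexp(n, rate = 0.05)          # hypothetical event times
time_to_censor  <- runif(n, min = 1, max = 40)   # hypothetical censoring times
toy$years <- pmin(time_to_divorce, time_to_censor)
toy$div   <- as.integer(time_to_divorce <= time_to_censor)  # 1 = divorce, 0 = censored

# Kaplan-Meier survival curves, one per education group
km <- survfit(Surv(years, div) ~ heduc, data = toy)
plot(km, col = 1:3, xlab = "Years of marriage", ylab = "Proportion still married")

# Nelson-Aalen cumulative hazard for the whole sample:
# running sum of (events / number at risk) over the observed times
km_all <- survfit(Surv(years, div) ~ 1, data = toy)
na_hat <- cumsum(km_all$n.event / km_all$n.risk)

# Log-rank test (rho = 0) and Peto-Peto test (rho = 1) across education groups
logrank  <- survdiff(Surv(years, div) ~ heduc, data = toy, rho = 0)
petopeto <- survdiff(Surv(years, div) ~ heduc, data = toy, rho = 1)

# Cox proportional hazards model with all three covariates
cox_fit <- coxph(Surv(years, div) ~ heduc + heblack + mixed, data = toy)
summary(cox_fit)
```

To try the same calls on the real data, replace `toy` with the data frame read from divorce_dataset.csv, after checking how each column is actually coded.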
The project directory contains the following files:
- divorce_dataset.csv: Contains the raw data used for analysis.
- survival_analysis_divorce.Rmd: R Markdown file containing the R code for data analysis.
- survival_analysis_divorce.html: HTML output of the R Markdown file, showcasing the analysis results.
To reproduce the analysis:
- Ensure you have R installed on your system.
- Clone this repository to your local machine.
- Open the survival_analysis_divorce.Rmd file in RStudio or any compatible IDE.
- Run the R code chunks in the file to perform the analysis.
- To generate the output report (survival_analysis_divorce.html), click the Knit button in RStudio.
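The report can also be rendered from the R console instead of the Knit button. The sketch below assumes the working directory is the repository root and that the analysis needs the `survival` and `rmarkdown` packages. That package list is a guess; check the `library()` calls in the .Rmd file for the real one. Outside RStudio, `rmarkdown` also needs a Pandoc installation.

```r
# Assumed package list; adjust it to match the library() calls in the .Rmd file.
required <- c("survival", "rmarkdown")
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

rmd_file <- "survival_analysis_divorce.Rmd"

if (length(missing_pkgs) > 0) {
  # Report what is missing rather than installing anything automatically
  message("Install these packages first: ", paste(missing_pkgs, collapse = ", "))
} else if (!file.exists(rmd_file)) {
  message("Could not find ", rmd_file, "; set the working directory to the repository root.")
} else {
  # Writes the rendered report next to the .Rmd file
  rmarkdown::render(rmd_file)
}
```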
This project is licensed under the MIT License. See the LICENSE file for details.