Skip to content

Categorical Trait Analysis

Tianjing Zhao edited this page Aug 12, 2022 · 5 revisions

Categorical Trait Analysis

Categorical traits should be identified in the argument categorical_trait in build_model().

Categorical phenotypes should be coded as 1,2,3...

It is assumed that the ordered categories are determined by the unobserved continuous liability with thresholds. The MCMC samples of thresholds and liability are in the output folder.

Single-trait Linear Mixed Model (Genomic data)

Data Simulation

Below we will convert a continuous trait "y1" to a categorical trait by setting phenotypic values below -1.7 as category "1", values between -1.7 and 0.0 as category "2", and values larger than 0.0 as category "3". This is used to demonstrate categorical trait analysis using JWAS.

using JWAS,DataFrames,CSV,Statistics,JWAS.Datasets,Random
Random.seed!(1)

phenofile  = dataset("phenotypes.csv")
phenotypes = CSV.read(phenofile,DataFrame,delim = ',',header=true,missingstrings=["NA"])

#y1 -  categorical trait
category1_sel = phenotypes[!,:y1] .< -1.7
category2_sel = -1.7 .< phenotypes[!,:y1] .< 0.0
category3_sel = phenotypes[!,:y1] .> 0.0
phenotypes[category1_sel, :y1] .= 1
phenotypes[category2_sel, :y1] .= 2
phenotypes[category3_sel, :y1] .= 3

JWAS analysis

# Step 1: Load packages
using JWAS,DataFrames,CSV,Statistics,JWAS.Datasets,Random
Random.seed!(1)

# Step 2: Read data 
pedfile    = dataset("pedigree.csv")
genofile   = dataset("genotypes.csv")
pedigree   = get_pedigree(pedfile,separator=",",header=true);
genotypes  = get_genotypes(genofile,separator=',',method="BayesC");

# Step 3: Build Model Equations
model_equation  ="y1 = intercept + x1 + x2 + x2*x3 + ID + dam + genotypes"
model = build_model(model_equation,categorical_trait=["y1"]);

# Step 4: Set Factors or Covariates
set_covariate(model,"x1");

# Step 5: Set Random or Fixed Effects
set_random(model,"x2");
set_random(model,"ID dam",pedigree);

# Step 6: Run Analysis
out=runMCMC(model,phenotypes,chain_length=5000);

# Step 7: Check Accuruacy
results    = innerjoin(out["EBV_y1"], phenotypes, on = :ID) 
accuruacy  = cor(results[!,:EBV],results[!,:bv1])

Multiple categorical traits Linear Mixed Model (Genomic data)

Data Simulation

using JWAS,DataFrames,CSV,Statistics,JWAS.Datasets,Random
Random.seed!(1)

phenofile  = dataset("phenotypes.csv")
phenotypes = CSV.read(phenofile,DataFrame,delim = ',',header=true,missingstrings=["NA"])

#y1 -  categorical trait
category1_sel = phenotypes[!,:y1] .< -1.7
category2_sel = -1.7 .< phenotypes[!,:y1] .< 0.0
category3_sel = phenotypes[!,:y1] .> 0.0
phenotypes[category1_sel, :y1] .= 1
phenotypes[category2_sel, :y1] .= 2
phenotypes[category3_sel, :y1] .= 3

#y2 -  categorical trait
category1_sel = phenotypes[!,:y2] .< -1.7
category2_sel = -1.7 .< phenotypes[!,:y2] .< 0.0
category3_sel = phenotypes[!,:y2] .> 0.0
phenotypes[category1_sel, :y2] .= 1
phenotypes[category2_sel, :y2] .= 2
phenotypes[category3_sel, :y2] .= 3

JWAS analysis

# Step 1: Load packages
using JWAS,DataFrames,CSV,Statistics,JWAS.Datasets,Random
Random.seed!(1)

#Step 2: Read data
pedfile    = dataset("pedigree.csv")
genofile   = dataset("genotypes.csv")
pedigree   = get_pedigree(pedfile,separator=",",header=true);
genotypes  = get_genotypes(genofile,separator=',',method="BayesC");

# Step 3: Build Model Equations
model_equation  ="y1 = intercept + x1 + x2 + x2*x3 + ID + dam + genotypes
                  y2 = intercept + x1 + x2 + ID + genotypes"
model = build_model(model_equation,categorical_trait=["y1","y2"]);

# Step 4: Set Factors or Covariates
set_covariate(model,"x1");

# Step 5: Set Random or Fixed Effects
set_random(model,"x2");
set_random(model,"ID dam",pedigree);

# Step 6: Run Analysis
out=runMCMC(model,phenotypes, chain_length=5000);

#Step 7: Check Accuruacy
results    = innerjoin(out["EBV_y1"], phenotypes, on = :ID)
accuruacy  = cor(results[!,:EBV],results[!,:bv1])  

results    = innerjoin(out["EBV_y2"], phenotypes, on = :ID)
accuruacy  = cor(results[!,:EBV],results[!,:bv2])  

Single-trait/Multi-trait Linear Additive Genetic Model

To run analysis without genomic data, just remove "genotypes" in the model_equation from the script above.

Clone this wiki locally