-
Notifications
You must be signed in to change notification settings - Fork 44
Categorical Trait Analysis
Tianjing Zhao edited this page Aug 12, 2022
·
5 revisions
- Single-trait Linear Mixed Model (Genomic data)
- Multiple categorical traits Linear Mixed Model (Genomic data)
- Single-trait/Multi-trait Linear Additive Genetic Model
Categorical traits should be identified in the argument
categorical_trait
in build_model().
Categorical phenotypes should be coded as 1,2,3...
It is assumed that the ordered categories are determined by the unobserved continuous liability with thresholds. The MCMC samples of thresholds and liability are in the output folder.
Below we will convert a continuous trait "y1" to a categorical trait by setting phenotypic values below -1.7 as category "1", values between -1.7 and 0.0 as category "2", and values larger than 0.0 as category "3". This is used to demonstrate categorical trait analysis using JWAS.
using JWAS,DataFrames,CSV,Statistics,JWAS.Datasets,Random
Random.seed!(1)
phenofile = dataset("phenotypes.csv")
phenotypes = CSV.read(phenofile,DataFrame,delim = ',',header=true,missingstrings=["NA"])
#y1 - categorical trait
category1_sel = phenotypes[!,:y1] .< -1.7
category2_sel = -1.7 .< phenotypes[!,:y1] .< 0.0
category3_sel = phenotypes[!,:y1] .> 0.0
phenotypes[category1_sel, :y1] .= 1
phenotypes[category2_sel, :y1] .= 2
phenotypes[category3_sel, :y1] .= 3
# Step 1: Load packages
using JWAS,DataFrames,CSV,Statistics,JWAS.Datasets,Random
Random.seed!(1)
# Step 2: Read data
pedfile = dataset("pedigree.csv")
genofile = dataset("genotypes.csv")
pedigree = get_pedigree(pedfile,separator=",",header=true);
genotypes = get_genotypes(genofile,separator=',',method="BayesC");
# Step 3: Build Model Equations
model_equation ="y1 = intercept + x1 + x2 + x2*x3 + ID + dam + genotypes"
model = build_model(model_equation,categorical_trait=["y1"]);
# Step 4: Set Factors or Covariates
set_covariate(model,"x1");
# Step 5: Set Random or Fixed Effects
set_random(model,"x2");
set_random(model,"ID dam",pedigree);
# Step 6: Run Analysis
out=runMCMC(model,phenotypes,chain_length=5000);
# Step 7: Check Accuruacy
results = innerjoin(out["EBV_y1"], phenotypes, on = :ID)
accuruacy = cor(results[!,:EBV],results[!,:bv1])
using JWAS,DataFrames,CSV,Statistics,JWAS.Datasets,Random
Random.seed!(1)
phenofile = dataset("phenotypes.csv")
phenotypes = CSV.read(phenofile,DataFrame,delim = ',',header=true,missingstrings=["NA"])
#y1 - categorical trait
category1_sel = phenotypes[!,:y1] .< -1.7
category2_sel = -1.7 .< phenotypes[!,:y1] .< 0.0
category3_sel = phenotypes[!,:y1] .> 0.0
phenotypes[category1_sel, :y1] .= 1
phenotypes[category2_sel, :y1] .= 2
phenotypes[category3_sel, :y1] .= 3
#y2 - categorical trait
category1_sel = phenotypes[!,:y2] .< -1.7
category2_sel = -1.7 .< phenotypes[!,:y2] .< 0.0
category3_sel = phenotypes[!,:y2] .> 0.0
phenotypes[category1_sel, :y2] .= 1
phenotypes[category2_sel, :y2] .= 2
phenotypes[category3_sel, :y2] .= 3
# Step 1: Load packages
using JWAS,DataFrames,CSV,Statistics,JWAS.Datasets,Random
Random.seed!(1)
#Step 2: Read data
pedfile = dataset("pedigree.csv")
genofile = dataset("genotypes.csv")
pedigree = get_pedigree(pedfile,separator=",",header=true);
genotypes = get_genotypes(genofile,separator=',',method="BayesC");
# Step 3: Build Model Equations
model_equation ="y1 = intercept + x1 + x2 + x2*x3 + ID + dam + genotypes
y2 = intercept + x1 + x2 + ID + genotypes"
model = build_model(model_equation,categorical_trait=["y1","y2"]);
# Step 4: Set Factors or Covariates
set_covariate(model,"x1");
# Step 5: Set Random or Fixed Effects
set_random(model,"x2");
set_random(model,"ID dam",pedigree);
# Step 6: Run Analysis
out=runMCMC(model,phenotypes, chain_length=5000);
#Step 7: Check Accuruacy
results = innerjoin(out["EBV_y1"], phenotypes, on = :ID)
accuruacy = cor(results[!,:EBV],results[!,:bv1])
results = innerjoin(out["EBV_y2"], phenotypes, on = :ID)
accuruacy = cor(results[!,:EBV],results[!,:bv2])
To run analysis without genomic data, just remove "genotypes" in the model_equation from the script above.