Skip to content

Multiclass Bayesian analysis

Hao Cheng edited this page Jun 9, 2021 · 6 revisions

Data Simulation

Multi-class Bayesian Alphabet model are used when multiple genotype data are included in the model construction.

Below we will split the dataset "genofile" into two classes. The marker m1 to m500 are allocated into class "1", markers m501 to m1000 are allocated into class "2". This is used to demonstrate the multi-class anlaysis using JWAS.

using JWAS,DataFrames,CSV,Statistics,JWAS.Datasets

geno_data = CSV.read(genofile,DataFrame);
number_snp  = size(geno_data,2) - 1                                # -1 since the ID column
ID          = geno_data[!,:ID];                                    # get the vector of ID
geno_class1 = geno_data[!, 2:Int(number_snp/2 + 1)];               # allocate the marker m1 to m500 to class 1
geno_class2 = geno_data[!, Int(number_snp/2 + 2):number_snp + 1 ]; # allocate the marker m501 to m1000 to class 2

Multi-class Bayesian Alphabet model

# Step 1: Load packages
using JWAS,DataFrames,CSV,Statistics,JWAS.Datasets


# Step 2: Read data 
phenofile  = dataset("phenotypes.csv")
pedfile    = dataset("pedigree.csv")
genofile   = dataset("genotypes.csv")
phenotypes = CSV.read(phenofile,DataFrame,delim = ',',header=true,missingstrings=["NA"])
pedigree   = get_pedigree(pedfile,separator=",",header=true);
genotypes1 = get_genotypes(geno_class1,rowID = ID, method="BayesC");  # read the genotype data in class 1
genotypes2 = get_genotypes(geno_class2,rowID = ID, method="BayesC");  # read the genotype data in class 2
first(phenotypes,5)

# Step 3: Build Model Equations

model_equation  ="y1 = intercept + x1 + x2 + x2*x3 + ID + dam + genotypes1 + genotypes2
                  y2 = intercept + x1 + x2 + ID + genotypes1 + genotypes2
                  y3 = intercept + x1 + ID + genotypes1 + genotypes2";
model = build_model(model_equation);

# Step 4: Set Factors or Covariates
set_covariate(model,"x1");

# Step 5: Set Random or Fixed Effects
set_random(model,"x2");
set_random(model,"ID dam",pedigree);

# Step 6: Run Analysis
out=runMCMC(model,phenotypes);

# Step 7: Check Accuruacy
results    = innerjoin(out["EBV_y3"], phenotypes, on = :ID) 
accuruacy  = cor(results[!,:EBV],results[!,:bv3])
Clone this wiki locally