Multiple chromosomes with pedigree builder #2081
I'm successfully using msprime's pedigree-aware simulations when I use chromosome 1 only. However, simulating first cousins (specifically, first cousins to eighth cousins in a 10-generation pedigree) is important for my project. Second cousins share 212.50 cM on average, so I could make do with simulating chromosome 1 alone, but that does not reflect my real-life sampling plan, which samples first cousins to reduce uncertainty in a 10-generation inferred pedigree. First cousins share 850.00 cM on average, so I would need to simulate multiple chromosomes. I run into errors when I try to do so and would like your advice on how to proceed.

UPDATE: I now have this part of the question working. See #2081 (comment). I'm now only trying to figure out the second part of this question, about the missing regions error.

First, I confirmed that my code works using chromosome 1 and the NIH HapMap 37 genetic map. Everything works, from building the pedigree to running the simulation, calling ibd_segments, and writing a VCF file. (Pedigree building includes assigning specific populations among AFR, ADMIX, or EUR to specific individuals as per the pedigree definition.)

Then, I followed your example here, adapting it to simulate five chromosomes. This runs, but when I used keep_intervals post-simulation based on the intervals in chrom_positions, only the positions for the first interval were present. I get errors on the subsequent intervals stating that they do not exist. The VCF file produced from this attempt also had positions for the first interval only.
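For context, the expected-sharing figures quoted in this post can be reproduced with a short calculation. This is only a sketch: it assumes full k-th cousins (two shared ancestors) and a total of 6800 cM counted across both copies of the genome, which is the value the 850.00 and 212.50 figures imply. The names TOTAL_CM and expected_shared_cm are made up for illustration.

```python
# Assumed total genetic length counted over both genome copies (cM).
# This value is an assumption chosen to match the figures in the post.
TOTAL_CM = 6800.0


def expected_shared_cm(k, total_cm=TOTAL_CM):
    """Expected cM shared by full k-th cousins.

    Each of the two shared ancestors is separated from the cousins by
    2 * (k + 1) meioses in total, giving 2 * (1/2)**(2k + 2) = (1/2)**(2k + 1).
    """
    return total_cm * 0.5 ** (2 * k + 1)


# First through eighth cousins, matching the range described in the post.
sharing = {k: expected_shared_cm(k) for k in range(1, 9)}
for k, cm in sharing.items():
    print(f"cousin degree {k}: {cm:.4f} cM")
```

Under these assumptions the expectation drops by a factor of four with each additional cousin degree.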
Missing regions error: I also tried this strategy with the genetic maps. Pre-simulation, I renumber the positions gathered from the maps as if I'm working with one extra-long chromosome. Post-simulation, I select the appropriate intervals using keep_intervals and renumber them back according to the chromosome those positions originally came from. I don't know if this renumbering scheme makes sense. In any event, when I try to run the simulation, I'm getting the error
Could you help me understand what I'm not getting quite right?
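To make the renumbering scheme concrete, here is a minimal plain-Python sketch of the coordinate bookkeeping only. It is not the code from my actual script: the chromosome lengths are toy values, and the helpers build_offsets, to_global, and to_local are hypothetical names used for illustration.

```python
from bisect import bisect_right

# Toy chromosome lengths in bp (illustrative values, not real ones).
chrom_lengths = {"1": 1000, "2": 800, "3": 600}


def build_offsets(lengths):
    """Return chromosome names, their start offsets on the joined axis, and the total length."""
    names, starts, total = [], [], 0
    for name, length in lengths.items():
        names.append(name)
        starts.append(total)
        total += length
    return names, starts, total


def to_global(chrom, pos, names, starts):
    """Map a per-chromosome position onto the single extra-long chromosome."""
    return starts[names.index(chrom)] + pos


def to_local(gpos, names, starts):
    """Map a joined-axis position back to (chromosome, per-chromosome position)."""
    i = bisect_right(starts, gpos) - 1
    return names[i], gpos - starts[i]


names, starts, total = build_offsets(chrom_lengths)
g = to_global("2", 10, names, starts)
print(g, to_local(g, names, starts))
```

The round trip only works if every position is shifted by the same per-chromosome offset before the simulation and shifted back by that offset afterwards.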
Hi @lakishadavid -- I'm not really sure what's going wrong with setting up the multi-chromosome map in the first code block. Nothing pops out to me as wrong.

However, since you were able to get the simulation working with the With independent simulations through the same fixed pedigree, those simulations will ensure that you have free recombination between those simulated chromosomes. You would only need to stitch together multiple chromosomes into a single map if your simulations are not conditioned on a pedigree (if you used the

So I would suggest running each chromosome with its own recombination map independently under the same input pedigree. You might need to be careful about how the simulation is finished off above the pedigree, if you need trees to be fully coalesced. But parallelizing the simulations like this should come with the added bonus of being more efficient and simplifying your code.

I hope this is helpful!
Update: I now have the multiple chromosomes from the adapted example working. I was missing the trim() call after keep_intervals. With it, I get the appropriate sequence length for chromosome 5, etc.
I'm now left to figure out how to correct the "Missing regions of the genome other than the flanks are currently not supported." error from the code in my original question when I try to use the HapMap genetic maps instead.
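Since the message refers to missing regions away from the flanks, one way to narrow this down is to look for interior intervals with a missing (NaN) rate in the combined map. The following is a minimal sketch, assuming the map is available as plain position and rate lists like the arrays passed to msprime.RateMap. The helper name interior_missing and the toy values are illustrative and are not taken from the code in the original question.

```python
import math


def interior_missing(position, rate):
    """Return (left, right) intervals whose rate is NaN and that are not flanks.

    `position` has one more entry than `rate`; rate[i] applies to the
    interval [position[i], position[i + 1]).
    """
    missing = [math.isnan(r) for r in rate]
    n = len(rate)
    # Index of the first and last intervals that have a known rate.
    first = next((i for i in range(n) if not missing[i]), n)
    last = next((i for i in range(n - 1, -1, -1) if not missing[i]), -1)
    return [
        (position[i], position[i + 1])
        for i in range(first, last + 1)
        if missing[i]
    ]


# Toy joined map: a missing left flank plus one missing stretch in the
# middle, as could arise where a second chromosome's map was appended.
position = [0, 100, 1000, 1100, 2000]
rate = [math.nan, 1e-8, math.nan, 1e-8]
print(interior_missing(position, rate))
```

Any interval this reports lies between mapped regions rather than at either end of the joined map. How such a gap should be filled is a modeling decision that this sketch does not settle.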