Skip to content

Module: Alignments

rrahn edited this page Mar 13, 2017 · 40 revisions

Module Structure

alignment/detail/...                 // The core implementation of the alignment code, which is not prevent.

alignment/representation.hpp         // Required only if output format is not score only.
alignment/representation/...         // Data structures and utility functions to represent and work with alignments.

alignment/pairwise.hpp
alignment/pairwise/...             // Defining the public API for calling the standard alignment algorithms.

alignment/msa.hpp
alignment/msa/...                  // Defining the public API for calling the msa algorithms.

alignment/seed_extend.hpp
alignment/seed_extend/...          

alignment/split.hpp
alignment/split/...

alignment/banded_chain.hpp
alignment/banded_chain/...

Alignment Representation

In SeqAn 2.x we have several possibilities to represent an alignment. The most unpractical on is the Align object, which represents multiple alignment rows of the same types.

Gap Sequence

template <typename t>
concept bool aligned_sequence_view_concept(t g)
{
   requires view_concept<t>;

   typename t::underlying_type;

   // Element access
   { g[] const; } -> typename t::value_type;

   // Manipulation
   { g.insert_gap(0, 4) } -> bool;
   { g.remove_gap(0, 2) } -> bool;

   { g.find_view_position(0)       } -> typename t::size_type;
   { g.find_underlying_position(0) } -> typename t::size_type;
};

Concrete implementation:

Array Gaps Uses simple interleaved blocks structure, where the first value gives the number of gaps and the second the number of sequence chars that follow. Requires linear scan of the block structure for random access.

template <typename underlying_t> 
  requires container_concept<underlying_t>
class aligned_sequence_view_linear_access
{
public:
protected:
private:   
};

Anchor Gaps Using sorted sequence anchors, which allow binary search for random access.

template <typename host_t>
class aligned_sequence_view_logarithmic_access
{
public:
protected:
private:
};

Bitvector Gaps

  • using rank-support query data structure to lookup a position in view space.
    • sdsl::rank_support_v5? (+0.0625n bits)
    • sdsl::rank_support_il? (+128 bits)
template <typename host_t>
class aligned_sequence_view_constant_access
{
public:
protected:
private:
};

Alignment View

template <typename first_t, typename ...remaining_ts>
    requires aligned_sequence_view_concept<first_t> &&
             (gap_sequence_view_concept<remaining_ts> && ...)
class alignment : public std::vector<std::conditional_t<sizeof...(remaining_ts) == 0, 
                                                        first_t,
                                                        std::variant<first_t, remaining_ts...>>
{
    // TODO: How is the interface defined.
};

What else do we want to do with the alignment view? Utility functions:

  • merge two alignments.
  • print/stream/serialize the alignment.
  • compute statistics.

Alignment Algorithms

Alignment configuration:

...

Compile-time configuration

template <typename exec_policy_t, typename seq1_t, typename seq2_t, 
          typename align_config_t,
          typename scoring_scheme_t>
   requires is_execution_policy<exec_policy_t>::value &&
            container_concept<seq1_t> &&
            container_concept<seq2_t> &&
            align_config_static_concept<align_config_static_t>
inline auto 
align(exec_policy_t const & exec_policy,
      seq1_t const & seq1,
      seq2_t const & seq2,
      align_config_t const & config,
      align_parameters<scoring_scheme_t> const & params)
{
   // delegate to corresponding functions.
   // depending on configuration we can return several different things, like the alignment and score or just the score or the trace matrix or what ever we want.
   return {score, std::optional<output_format_t>{}};
}

template <typename exec_policy_t, typename seq1_t, typename seq2_t, 
          typename align_config_t,
          typename scoring_scheme_t,
          typename callback_f>
   requires is_execution_policy<exec_policy_t>::value &&
            container_concept<seq1_t> &&
            container_concept<seq2_t> &&
            align_config_static_concept<align_config_t>
inline void
align_and_then(exec_policy_t const & exec_policy,
               seq1_t const & seq1,
               seq2_t const & seq2,
               align_config_t const & config,
               align_parameters<scoring_scheme_t> const & params)
               callback_f && callback)
{
   // delegate to corresponding functions.
   // depending on configuration we can callback several different things, like the alignment and score or just the score or the trace matrix or what ever we want.
   callback(score, std::optional<output_format_t>{});
}

Runtime configuration

The runtime configuration creates a chain of continuators to create a static config object.

template <typename exec_policy_t, 
          typename seq1_t, 
          typename seq2_t, 
          typename align_config_t,
          typename scoring_scheme_t>
   requires is_execution_policy<exec_policy_t>::value &&
            container_concept<seq1_t> &&
            container_concept<seq2_t> &&   
            align_config_dynamic_concept<align_config_t>
inline auto 
align(exec_policy_t const & exec_policy, seq1_t const & seq1, seq2_t const & seq2,
      align_config_t const & config,
      align_parameters<scoring_scheme_t> const & params)
{
   // delegate to corresponding functions.
   // depending on configuration we can return several different things, like the alignment and score or just the score or the trace matrix or what ever we want.
   return {score, std::optional<aligned_sequence_view_constant_access>{}};
}

template <typename exec_policy_t, typename seq1_t, typename seq2_t, 
          typename align_config_t,
          typename scoring_scheme_t,
          typename callback_f>
   requires is_execution_policy<exec_policy_t>::value &&
            container_concept<seq1_t> &&
            container_concept<seq2_t> &&
            align_config_dynamic_concept<align_config_t>
inline void
align_and_then(exec_policy_t const & exec_policy,
               seq1_t const & seq1,
               seq2_t const & seq2,
               align_config_t const & config,
               align_parameters<scoring_scheme_t> const & params)
               callback_f && callback)
{
   // delegate to corresponding functions.
   // depending on configuration we can callback several different things, like the alignment and score or just the score or the trace matrix or what ever we want.
   callback(score, std::optional<aligned_sequence_view_constant_access>{});
}

Configuration of static and dynamic configs

https://gist.github.com/rrahn/aec7f42dfdb5c0dfc574b3a060628d9b

Clone this wiki locally