fastq_to_fasta
A template for creation of SeqAn3 apps, with a FASTQ to FASTA example app.
The HIBF binning directory. A data structure that efficiently answers set-membership queries for multiple bins.
#include <raptor/hierarchical_interleaved_bloom_filter.hpp>
Classes
class | counting_agent_type
Manages counting ranges of values for the raptor::hierarchical_interleaved_bloom_filter.
class | membership_agent
Manages membership queries for the raptor::hierarchical_interleaved_bloom_filter.
class | user_bins
Bookkeeping for user and technical bins.
Public Types
using | ibf_t = seqan3::interleaved_bloom_filter< data_layout_mode_ >
The type of an individual Bloom filter.
Public Member Functions
membership_agent | membership_agent () const
Returns a membership_agent to be used for membership queries.
Constructors, destructor and assignment
hierarchical_interleaved_bloom_filter ()=default
Defaulted.
hierarchical_interleaved_bloom_filter (hierarchical_interleaved_bloom_filter const &)=default
Defaulted.
hierarchical_interleaved_bloom_filter & | operator= (hierarchical_interleaved_bloom_filter const &)=default
Defaulted.
hierarchical_interleaved_bloom_filter (hierarchical_interleaved_bloom_filter &&)=default
Defaulted.
hierarchical_interleaved_bloom_filter & | operator= (hierarchical_interleaved_bloom_filter &&)=default
Defaulted.
~hierarchical_interleaved_bloom_filter ()=default
Defaulted.
Public Attributes
std::vector< ibf_t > | ibf_vector
The individual interleaved Bloom filters.
std::vector< std::vector< int64_t > > | next_ibf_id
Stores for each bin in each IBF of the HIBF the ID of the next IBF.
user_bins | user_bins
The underlying user bins.
Static Public Attributes
static constexpr seqan3::data_layout | data_layout_mode = data_layout_mode_
Indicates whether the Interleaved Bloom Filter is compressed.
data_layout_mode_ | Indicates whether the underlying data type is compressed. See seqan3::data_layout. |
This class extends the seqan3::interleaved_bloom_filter with additional bookkeeping that makes it possible to establish a hierarchical structure. This structure can then be used to split or merge user bins and distribute them over a variable number of technical bins. In the seqan3::interleaved_bloom_filter, the number of user bins and technical bins is always the same, which degrades performance when there are many user bins or when the user bins are unevenly distributed.
A Technical Bin represents an actual bin in the binning directory. In the IBF, it stores its k-mers in a single Bloom Filter (which is interleaved with all the other BFs).
The user may impose a structure on their sequence data in the form of logical groups (e.g. species). Each such group is a User Bin. When querying the IBF, the user is interested in an answer that differentiates between these groups.
In contrast to the seqan3::interleaved_bloom_filter, a user bin may be split across multiple technical bins, or multiple user bins may be merged into one technical bin. When merging multiple user bins, the HIBF stores another IBF that is built over the user bins constituting the merged bin. This lower-level IBF can then be used to distinguish between the user bins that were merged.
In this example, user bin 1 was split into two technical bins. Bins 3, 4, and 5 were merged into a single technical bin, and another IBF was added for the merged bin.
The individual IBFs may have a different number of technical bins and differ in their sizes, allowing an efficient distribution of the user bins.
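The split and merge relationships described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the library's data structure: `toy_layout`, `is_merged`, and `technical_bins_of` are hypothetical names, and the layout simply lists which user bins each technical bin of one IBF holds.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical toy model: one entry per technical bin of a single IBF;
// each entry lists the user bins stored in that technical bin.
using toy_layout = std::vector<std::vector<int>>;

// A technical bin counts as "merged" if it holds more than one user bin.
bool is_merged(toy_layout const & layout, std::size_t const technical_bin)
{
    return layout.at(technical_bin).size() > 1;
}

// Returns the technical bins that contain the given user bin.
// More than one result means that the user bin was split.
std::vector<std::size_t> technical_bins_of(toy_layout const & layout, int const user_bin)
{
    std::vector<std::size_t> result;
    for (std::size_t technical_bin = 0; technical_bin < layout.size(); ++technical_bin)
        for (int const stored : layout[technical_bin])
            if (stored == user_bin)
                result.push_back(technical_bin);
    return result;
}
```

With a layout such as `{{1}, {1}, {2}, {3, 4, 5}}`, user bin 1 occupies two technical bins (it was split), while the last technical bin is merged and would need a lower-level IBF to tell user bins 3, 4, and 5 apart.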
To query the Hierarchical Interleaved Bloom Filter for values, call raptor::hierarchical_interleaved_bloom_filter::membership_agent() and use the returned raptor::hierarchical_interleaved_bloom_filter::membership_agent. In contrast to the seqan3::interleaved_bloom_filter, the result will consist of indices of user bins.
To count the occurrences in each user bin of a range of values in the Hierarchical Interleaved Bloom Filter, call raptor::hierarchical_interleaved_bloom_filter::counting_agent() and use the returned raptor::hierarchical_interleaved_bloom_filter::counting_agent_type.
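What "counting the occurrences in each user bin of a range of values" means can be sketched with a flat toy model. The code below does not use the library's agents. `toy_bin` and `count_per_user_bin` are hypothetical names, and each user bin is an exact set rather than a Bloom filter, so there are no false positives and no hierarchy.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a user bin: an exact set of values
// instead of a Bloom filter.
using toy_bin = std::unordered_set<std::uint64_t>;

// For every user bin, counts how many of the queried values it contains.
// Repeated values in the query are counted once per repetition.
std::vector<std::size_t> count_per_user_bin(std::vector<toy_bin> const & user_bins,
                                            std::vector<std::uint64_t> const & values)
{
    std::vector<std::size_t> counts(user_bins.size(), 0);
    for (std::uint64_t const value : values)
        for (std::size_t bin = 0; bin < user_bins.size(); ++bin)
            if (user_bins[bin].count(value) != 0)
                ++counts[bin];
    return counts;
}
```

The result has one counter per user bin, indexed by user bin, which mirrors the statement above that results refer to user bins rather than technical bins.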
The Hierarchical Interleaved Bloom Filter promises the basic thread-safety of the STL: all calls to const member functions are safe from multiple threads, as long as no thread calls a non-const member function at the same time.
using raptor::hierarchical_interleaved_bloom_filter< data_layout_mode_ >::ibf_t = seqan3::interleaved_bloom_filter<data_layout_mode_> |
The type of an individual Bloom filter.
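raptor::hierarchical_interleaved_bloom_filter< data_layout_mode_ >::hierarchical_interleaved_bloom_filter () | default |
Defaulted.
raptor::hierarchical_interleaved_bloom_filter< data_layout_mode_ >::hierarchical_interleaved_bloom_filter (hierarchical_interleaved_bloom_filter const &) | default |
Defaulted.
raptor::hierarchical_interleaved_bloom_filter< data_layout_mode_ >::hierarchical_interleaved_bloom_filter (hierarchical_interleaved_bloom_filter &&) | default |
Defaulted.
raptor::hierarchical_interleaved_bloom_filter< data_layout_mode_ >::~hierarchical_interleaved_bloom_filter () | default |
Defaulted.
membership_agent raptor::hierarchical_interleaved_bloom_filter< data_layout_mode_ >::membership_agent () const | inline |
Returns a membership_agent to be used for membership queries.
hierarchical_interleaved_bloom_filter & raptor::hierarchical_interleaved_bloom_filter< data_layout_mode_ >::operator= (hierarchical_interleaved_bloom_filter const &) | default |
Defaulted.
hierarchical_interleaved_bloom_filter & raptor::hierarchical_interleaved_bloom_filter< data_layout_mode_ >::operator= (hierarchical_interleaved_bloom_filter &&) | default |
Defaulted.
constexpr seqan3::data_layout raptor::hierarchical_interleaved_bloom_filter< data_layout_mode_ >::data_layout_mode = data_layout_mode_ | static constexpr |
Indicates whether the Interleaved Bloom Filter is compressed.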
std::vector<ibf_t> raptor::hierarchical_interleaved_bloom_filter< data_layout_mode_ >::ibf_vector |
The individual interleaved Bloom filters.
std::vector<std::vector<int64_t> > raptor::hierarchical_interleaved_bloom_filter< data_layout_mode_ >::next_ibf_id |
Stores for each bin in each IBF of the HIBF the ID of the next IBF.
Assume we look up a bin b in IBF i, i.e. next_ibf_id[i][b]. If i is returned, there is no lower-level IBF, and bin b is hence not a merged bin. If j != i is returned, there is a lower-level IBF, bin b is a merged bin, and j is the ID of the lower-level IBF in ibf_vector.
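The lookup rule above can be sketched as a small recursive traversal. This is a toy model under stated assumptions, not Raptor's implementation: `toy_hibf`, `user_bin_id`, `hit`, and `collect_user_bins` are hypothetical names, and the `hit` flags stand in for the per-bin answers that a real Bloom filter query would produce. Only the meaning of `next_ibf_id[i][b]` is taken from the description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy model: only the bookkeeping, no Bloom filters.
struct toy_hibf
{
    // next_ibf_id[i][b] == i: bin b of IBF i is not a merged bin.
    // next_ibf_id[i][b] == j (j != i): bin b is merged and IBF j resolves it.
    std::vector<std::vector<std::int64_t>> next_ibf_id;
    // Assumption for this sketch: the user bin stored in each technical bin
    // (the value is ignored for merged bins).
    std::vector<std::vector<std::int64_t>> user_bin_id;
};

// Appends to `result` the user bins reached from IBF `ibf`.
// hit[i][b] is a hypothetical flag: the query matched bin b of IBF i.
void collect_user_bins(toy_hibf const & hibf,
                       std::vector<std::vector<bool>> const & hit,
                       std::size_t const ibf,
                       std::vector<std::int64_t> & result)
{
    assert(ibf < hibf.next_ibf_id.size());
    for (std::size_t bin = 0; bin < hibf.next_ibf_id[ibf].size(); ++bin)
    {
        if (!hit[ibf][bin])
            continue; // No match in this technical bin, nothing to report.

        std::int64_t const next = hibf.next_ibf_id[ibf][bin];
        if (next == static_cast<std::int64_t>(ibf))
            result.push_back(hibf.user_bin_id[ibf][bin]); // Not merged: report the user bin.
        else
            collect_user_bins(hibf, hit, static_cast<std::size_t>(next), result); // Merged: descend.
    }
}
```

Starting the traversal at IBF 0 (assumed here to be the top level) yields user bin indices only, because a merged bin is never reported itself and is always resolved through its lower-level IBF.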
user_bins raptor::hierarchical_interleaved_bloom_filter< data_layout_mode_ >::user_bins |
The underlying user bins.