Downstream Analysis

For notes on how to implement these features, refer to the Tutorial and the API Reference

Filtering Variants

Four different pre-made filters that allow for the retrieval of specific variants have been implemented. These allow the user to query in an easy and efficient manner variants of interest

1. Rare Deleterious Variants

  • criteria 1: 1000 Genomes (ALL) allele frequency (Annovar) < 0.05 or info not available
  • criteria 2: ESP6500 allele frequency ( - CADD) < 0.05 or info not available
  • criteria 3: cosmic70 ( information is present
  • criteria 4: Func_knownGene (Annovar) is exonic, splicing, or both
  • criteria 5: ExonicFunc_knownGene (Annovar) is not “synonymous SNV”

2. Known Disease Variants

  • criteria: cosmic70 ( information is present or ClinVar data is present and clinical significance is not Benign or Likely Benign

3. Deleterious Compound Heterozygous Variants

  • criteria 1: genotype_subclass_by_class (VAPr) is compound heterozygous
  • criteria 2: CADD phred score ( - CADD) > 10

4. De novo Variants

  • criteria 1: Variant present in proband
  • criteria 2: Variant not present in either ancestor-1 or ancestor-2

Create your own filter

As long as you have a MongoDB instance running and an annotation job ran successfully, filtering can be performed through pymongo as shown by the code below. Running the query will return a cursor object, which can be iterated upon.

If instead a list is intended to be created from it, simply add: filter2 = list(filter2).


If the number of variants in the database is large and the filtering is not set up correctly, returning a list will be probably crash your computer since lists are kept in memory. Iterating over the cursor object perform lazy evaluations (i.e., one item is returned at a time instead of in bulk) which are much more memory efficient.

Further, if you’d like to customize your filters, a good idea would be to look at the available fields to be filtered. Looking at the documentation, you can see what are all the fields available and can be used for filtering.

from pymongo import MongoClient

client = MongoClient()
db = getattr(client, mongodb_name)
collection = getattr(db, mongo_collection_name)

filtered = collection.find({"$and": [
                                   {"$or": [{"func_knowngene": "exonic"},
                                            {"func_knowngene": "splicing"}]},
                                   {"cosmic70": {"$exists": True}},
                                   {"1000g2015aug_all": {"$lt": 0.05}}

# filtered = list(filtered) Uncomment this if you'd like to return them as a list
for var in filtered:

Output Files

Although iterating over variants can be interesting for cursory analyses, we provide functionality to retrieve as well csv files for downstream analysis. A few options are available:

Unfiltered Variants CSV


  • All variants will be written to a CSV file.

Filtered Variants CSV

write_filtered_annotated_csv(variant_list, out_file_path)

  • A list of filtered variants will be written to a CSV file.

Unfiltered Variants VCF


  • All variants will be written to a VCF file.

Filtered Variants VCF

write_filtered_annotated_vcf(variant_list, vcf_out_path)

  • A List of filtered variants will be written to a VCF file.