Beaconplus Data / Query Model

The Progenetix / Beaconplus query model uses the GA4GH core data model for genomic, biomedical and procedural queries, and for data delivery.

The GA4GH data model for genomics recommends the use of a principal object hierarchy, consisting of variants, callsets, biosamples and individuals.

In the Progenetix backend we mirror the GA4GH data model in the storage system, consisting of the corresponding collections of MongoDB databases. These collections are addressed by scoped queries. Since the current Beacon query model only supports variant queries (“BeaconAlleleRequest”) and filters, we apply pre-parsing steps for mapping the filter values to the correct attributes and collections (see below).

Variant query (VarQ)

The Variant Query is a standard Beacon v1.1 BeaconAlleleRequest, including support for ranges, wildcards and structural variants (DUP, DEL, BND).
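
As a rough illustration, a ranged query for a structural variant could be assembled as URL parameters. The parameter names follow the Beacon v1 BeaconAlleleRequest; the endpoint URL, dataset identifier and coordinates are placeholders assumed for this sketch and are not taken from the Progenetix service.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the Beacon you are actually querying.
BEACON_URL = "https://beacon.example.org/query"


def build_allele_request(reference_name, start_range, end_range, variant_type,
                         assembly_id="GRCh38", dataset_ids=("example_dataset",)):
    """Return the parameters of a ranged structural variant request."""
    return {
        "assemblyId": assembly_id,
        "referenceName": reference_name,
        "referenceBases": "N",          # N acts as a wildcard base
        "variantType": variant_type,    # e.g. DUP, DEL or BND
        "startMin": start_range[0],
        "startMax": start_range[1],
        "endMin": end_range[0],
        "endMax": end_range[1],
        "datasetIds": list(dataset_ids),
    }


# Example values only: a deletion on chromosome 9 with fuzzy start and end.
params = build_allele_request("9", (19000000, 21975098), (21967753, 24000000), "DEL")
# doseq=True expands the datasetIds list into repeated query parameters.
request_url = f"{BEACON_URL}?{urlencode(params, doseq=True)}"
```

The start and end ranges express uncertainty about the breakpoints: any variant of the given type whose start and end fall inside the respective intervals is considered a match.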

Callset Query (CsQ)

Callsets are queried only indirectly, e.g. as a data aggregation target using the variants.callset_id values, or through callsets.biosample_id matches.
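
The indirect use of callsets can be sketched with plain Python data structures. The fields callset_id and biosample_id are the ones named above; the callset "id" field and all sample records are assumptions made up for this example, and a real backend would run the equivalent lookups against its MongoDB collections.

```python
def biosample_ids_for_variants(variants, callsets):
    """Aggregate matched variants to biosample ids by way of their callsets."""
    # Step 1: collect the callsets referenced by the matched variants.
    callset_ids = {variant["callset_id"] for variant in variants}
    # Step 2: follow each of those callsets to its biosample.
    return sorted(
        {cs["biosample_id"] for cs in callsets if cs["id"] in callset_ids}
    )


# Made-up records for illustration only.
matched_variants = [
    {"callset_id": "cs-1", "variant_type": "DEL"},
    {"callset_id": "cs-1", "variant_type": "DUP"},
    {"callset_id": "cs-3", "variant_type": "DEL"},
]
all_callsets = [
    {"id": "cs-1", "biosample_id": "bs-1"},
    {"id": "cs-2", "biosample_id": "bs-2"},
    {"id": "cs-3", "biosample_id": "bs-3"},
]

biosample_ids = biosample_ids_for_variants(matched_variants, all_callsets)
```

Two variants from the same callset collapse into a single biosample, which is the point of aggregating at this level.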

Biosample Query (BiosQ)

Biosamples contain information about biological parameters (e.g. histology, organ site), procedural parameters (e.g. external identifiers, geographic origin) and clinical data specific to the sample (e.g. age at sample collection).

Individual Query (IndQ)

In the GA4GH core data model, Individuals (or Subjects) as data objects contain information pertaining to the whole organism. Typical attributes for use in Beaconplus queries are, for example, genotypic sex and phenotypic information.


Filters allow the resource provider to direct “self-scoped” query values to the corresponding attributes in their backend resource. In the Progenetix implementation, a lookup table followed by scope assignment is used to map prefixed filter values to the correct attributes and collections:

  1. Use the prefix to determine the full attribute
    * filters=NCIT:C4033 - query attribute for value NCIT:C4033
    * filters=PMID:28966033 - query attribute for value PMID:28966033
  2. Match the full attribute to the correct scope (i.e. collection, query domain)
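
The two steps above can be sketched as a small lookup. The attribute names in the table are placeholders, since the real ones are defined in the Progenetix configuration; the assignment of both prefixes to the biosamples scope is likewise an assumption of this example.

```python
# Hypothetical lookup table: prefix -> full attribute and scope.
# The attribute names are placeholders, not the Progenetix ones.
PREFIX_MAP = {
    "NCIT": {"attribute": "example_histology_attribute", "scope": "biosamples"},
    "PMID": {"attribute": "example_reference_attribute", "scope": "biosamples"},
}


def resolve_filter(filter_value):
    """Map a prefixed filter value to its scope, attribute and value."""
    prefix, separator, _ = filter_value.partition(":")
    if not separator or prefix not in PREFIX_MAP:
        raise ValueError(f"no mapping for filter value {filter_value!r}")
    entry = PREFIX_MAP[prefix]          # step 1: prefix -> full attribute
    return {
        "scope": entry["scope"],        # step 2: attribute -> scope
        "attribute": entry["attribute"],
        "value": filter_value,
    }


resolved = resolve_filter("NCIT:C4033")
```

Keeping the table as data rather than code means a resource provider can support a new prefix by adding one entry, without touching the resolution logic.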

The list below shows a selection from the configuration file (YAML):

    parameter: ''
    parameter: ''
    parameter: ''
    parameter: ''
    parameter: ''
    parameter: ''
    remove_prefix: true

The different scopes (i.e. collections) have pre-defined attributes that can be queried. For example, the filter filters=NCIT:C4033 is first resolved to its full attribute; this attribute then matches a parameter (or one of its aliases) in the biosamples scope, generating a query of { "<attribute>" : "NCIT:C4033" } against the biosamples collection.

    scope: biosamples
        paramkey: ''
        dbkey: 'id'
          - 'biosamples-id'
          - 'id'
        pattern: '^.+?\w+?.+?$'
        type: array
        paramkey: ''
        dbkey: ''
          - 'biosamples-biocharacteristics-id'
          - ''
          - 'biocharacteristics-id'
        pattern: '^(\w+[\:\-$])?\w*?\d(?:[\w\-\.]+?)?'
        type: array
        paramkey: ''
          - 'biosamples-external_references-id'
          - ''
          - 'external_references-id'
        pattern: '^(\w+[\:\-$])?\w.?(?:[\w\-\.]+?)?'
        type: array
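
A minimal sketch of how such a scope definition might be applied: the parameter name is matched against the aliases, the value is checked against the pattern, and a MongoDB-style query document is returned. The aliases and the pattern are copied from the excerpt above; the key name "aliases" and the dbkey value are placeholders, because the excerpt does not show them.

```python
import re

# Scope definitions modelled loosely on the configuration excerpt.
# "example_dbkey" is a placeholder for the database key.
SCOPE_PARAMETERS = {
    "biosamples": [
        {
            "dbkey": "example_dbkey",
            "aliases": ["biosamples-biocharacteristics-id", "biocharacteristics-id"],
            "pattern": r"^(\w+[\:\-$])?\w*?\d(?:[\w\-\.]+?)?",
        },
    ],
}


def build_query(scope, parameter_name, value):
    """Return a query document for one parameter value in the given scope."""
    for definition in SCOPE_PARAMETERS.get(scope, []):
        if parameter_name in definition["aliases"]:
            # Reject values that do not fit the parameter's pattern.
            if not re.match(definition["pattern"], value):
                raise ValueError(f"{value!r} does not match the pattern")
            return {definition["dbkey"]: value}
    raise KeyError(f"{parameter_name!r} is not defined in scope {scope!r}")


query = build_query("biosamples", "biocharacteristics-id", "NCIT:C4033")
```

The resulting dictionary has the shape that a MongoDB find call accepts as its filter argument, so it could be passed on to the collection named by the scope.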