Coordinate use for Beacon Queries

Precise Variants

Precise variants are such with annotated sequences of 1….n length, and bases of [ATCGN] for the referenceBases and alternateBases attributes (N for “not specified”).

Range Queries and Structural Variants

The adoption of

into the Beacon v0.4 release enabled the execution of

Introduction

While, based on reference genome coordinates, base specific genome variations can be described by a combination of a single genome position together with reference and alternate base(s), structural genome variants are described through the coordinates of the variants’ start and end positions, together with the variants’ types.

Additionally to their description by start and end positions and omitting of reporting the interspersed sequence (which in the case of CNVs may be of considerable length, up to a complete reference/chromosome), functionally equivalent structural variants usually vary with respect to their exact positions. In contrast to specific & recurring SNVs (think BRAF V600E), structural variants (think TP53 deletions in cancer) tend to be selected due to their functional effect (think all deletions rendering TP53 non-functional) rather than at events of a base-specific size. Therefore, Query Ranges are needed allow a “fuzzy” matching of all variants with a specific type, but varying genomic extent.

To summarise, the concept of using intervals for querying variant start and end positions is based on two main scenarios:

Genome Range Matches

Matching genome ranges when querying for variants requires the logical definition of how overlaps of the Query Ranges with variants are handled.

The positions in startMin,startMax and endMin,endMax each bracket the intervals for VPOS and VEND (start and end position of a variant mapped to a reference genome, respectively).

The server will return all matches for variant V, where

Legend for the sequence graphs:
Deletion which starts somewhere between positions 15 and 21, and ends between 45 and 53

Beacon request

{
  "startMin": 15,
  "startMax": 21,
  "endMin": 45,
  "endMax": 53,
  "variantType": "DEL"
}
-------------------xxxxxxxxxxxxxxxxxxxxxxxxx--------------------=>
              ???????                      ?????????

Interpretation Direct submission of ranges in which the start and end of the variants have to occur for creating a match. Matched variants

----------------****************************--------------------=>
---------------*************************************------------=>
--------------------************************--------------------=>
Logical ranges and their interpolation into query ranges

In practical applications, sometimes “match types” are being used. Understanding of these logical types and their application is helpful for understanding range match queries and their interpolation into specific query attributes.

For continuous ranges, a match can for example have one of the following “matchtype” options. This is just for demonstarting the use in an interface, which then will send start and end parameters to the server:

Query Ranges can be used to accommodate for all “matchtype” options, including such not specified above (e.g. combination one side exact, other side any etc.). The following examples will demonstrate the interpolation of logical matchtypes into start and end ranges. No “matchtype” parameter is being sent to the server; the interpolation has to be provided through the interface logic. Generally, “matchtype” represents an abstract - but frequently used - logical construct which is only applied here to demonstrate different query types.

Match Examples (against a genome of length _MAXBASE_):
Right-open query for any match between 20 and 45, but not including a position below 20

Beacon request

{
  "startMin" : 0,
  "startMax": 45,
  "endMin": 20,
  "endMax": _MAXBASE_,
  "variantType": "DEL"
}
-------------------xxxxxxxxxxxxxxxxxxxxxxxxx--------------------=>
????????????????????????????????????????????
                   ?????????????????????????????????????????????=>

Matched variants

-------------*************--------------------------------------=>
**************************************************************--=>
-------------------*--------------------------------------------=>
----------------------------------------**----------------------=>

Interpretation Any type of overlap with the basic DEL 20-45 is returned. This condition becomes true for any deletion DEL that starts between position 0 and the end of the region (45), and ends anywhere between the start of the region (20) and the end of the genome (1000). These conditions are emulated through providing

Complete coverage of a region (20-45)

Beacon request

{
  "startMin" : 0,
  "startMax": 20,
  "endMin": 45,
  "endMax": _MAXBASE_,
  "variantType": "DEL"
}
-------------------xxxxxxxxxxxxxxxxxxxxxxxxx--------------------=>
????????????????????                       ?????????????????????=>

** Matched variants**

**************************************************************--=>
-------------------****************************-----------------=>

Interpretation We allow deletions to start before or including the start position and end including or after the end position. Note that the deletion needs to be completely within the defined window [20, 45], and therefore needs to have a minimal length of 25.

Exact query for a duplication

Beacon request

{
  "startMin" : 20,
  "startMax": 20,
  "endMin": 45,
  "endMax": 45,
  "variantType": "DUP"
}

OR

  { start: 20, end: 45, variantType: DUP }
-------------------xxxxxxxxxxxxxxxxxxxxxxxxx--------------------=>
                   ?                       ?

** Matched variants** (only one option)

-------------------*************************--------------------=>

Interpretation We want an exact match, i.e. the service should only return data that falls exactly into the range [20, 45]. This would be the default interpretation (however, for structural variants the main query interists will be for the “any” or “complete” types)..

Right open, left-anchored query

Beacon request

{
  "startMin" : 20,
  "startMax": 20,
  "endMin": 45,
  "endMax": _MAXBASE_,
  "variantType": "DEL"
}
-------------------xxxxxxxxxxxxxxxxxxxxxxxxx--------------------=>
                   ?                       ?????????????????????=>

Matched variants

-------------------*********************************************=>
-------------------*************************--------------------=>

Interpretation We require deletions to start at the start position and cover the whole range until the end position. They can end at or after the end position.

Right open, left-anchored query with “at least one base” match

Beacon request

{
  "startMin" : 20,
  "startMax": 20,
  "endMin": 20,
  "endMax": _MAXBASE_,
  "variantType": "DEL"
}
-------------------xxxxxxxxxxxxxxxxxxxxxxxxx--------------------=>
                   ?????????????????????????????????????????????=>

Matched variants

-------------------*********************************************=>
-------------------******---------------------------------------=>
Left open query, right base specific anchor

Beacon request

{
  "startMin" : 0,
  "startMax": 20,
  "endMin": 45,
  "endMax": 45,
  "variantType": "DEL"
}
-------------------xxxxxxxxxxxxxxxxxxxxxxxxx--------------------=>
????????????????????                       ?

Matched variants

-------------------*************************--------------------=>
********************************************--------------------=>

Interpretation We allow deletions to start at or before the start position. They end at the end position.

Interpretation of CIPOS,CIEND parameters provided in the variant set:

Annotated (structural) variant positions may not always be base exact but represent an approximate position of high probability (e.g. due to mapping problems, methods w/o complete genome coverage such as WES, arrays …). In the VCF specification, these Variant Position Intervals are denoted through the “CIPOS, CIEND” intervals for start and end of a structural variant, respecyively.

The interpolation of match types into query-side start,end parameters is independent of cipos,ciend (in VCF: CIPOS,CIEND) values provided as part of the variant annotations. The recommendation here is to use the interpolated ranges for the start and end positions, and match them against the variant side intervals.


2018-04-04