Progenetix File Formats



Progenetix Segment Files .pgxseg

Progenetix uses a variant of the standard tab-separated columnar text format produced by array or sequencing CNV software, with an optional metadata header, e.g. for plot parameters or grouping instructions.
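The header lines in the examples below share a "#prefix=>key=value;key=value" layout. As a minimal sketch (a hypothetical helper, not part of any Progenetix library), such a line can be split into its prefix and a key/value dictionary. It assumes that values containing spaces or separators are wrapped in double quotes and that only the first "=" in each pair separates key from value.

```python
import re

# Hypothetical helper, not a Progenetix API. Assumes header lines follow the
# "#prefix=>key=value;key=value" layout seen in the examples on this page.
HEADER_RE = re.compile(r"^#(\w+)=>(.*)$")


def split_outside_quotes(text, sep=";"):
    """Split text on sep, but not inside double-quoted values."""
    parts, buf, in_quotes = [], [], False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == sep and not in_quotes:
            parts.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
    parts.append("".join(buf))
    return parts


def parse_header_line(line):
    """Return (prefix, {key: value}) for a header line, or None otherwise."""
    match = HEADER_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    prefix, body = match.groups()
    fields = {}
    for pair in split_outside_quotes(body):
        if not pair:
            continue
        # Split on the first "=" only, so keys such as "NCIT::id" and
        # values such as "NCIT:C6650" are kept intact.
        key, _, value = pair.partition("=")
        fields[key.strip()] = value.strip().strip('"')
    return prefix, fields


line = '#sample=>biosample_id=GSM253303;group_id=NCIT:C4028;group_label="Cervical Squamous Cell Carcinoma"'
print(parse_header_line(line))
# ('sample', {'biosample_id': 'GSM253303', 'group_id': 'NCIT:C4028', 'group_label': 'Cervical Squamous Cell Carcinoma'})
```

Data lines (including the column-name line) return None, so the same function can be used to separate metadata from the tabular body.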

While the first edition was geared only towards sample-linked segment annotations, a variant is now also provided for CNV frequencies.

Sample Segment Files

For example, for 78 samples from three NCIt cancer types, an excerpt of the segment file would look like this:

#meta=>biosample_count=78
#plotpars=>title="Testing Custom Plot Parameters"
#plotpars=>subtitle="Some Chromosomes, Colors etc."
#plotpars=>chr2plot="3,5,7,8,11,13,16"
#plotpars=>color_var_dup_hex=#EE4500;color_var_del_hex=#09F911
#plotpars=>size_title_left_px=300
#plotpars=>size_text_title_left_px=10
#sample=>biosample_id=GSM253303;group_id=NCIT:C4028;group_label="Cervical Squamous Cell Carcinoma"
#sample=>biosample_id=GSM388959;group_id=NCIT:C4024;group_label="Esophageal Squamous Cell Carcinoma"
#sample=>biosample_id=GSM252886;group_id=NCIT:C6958;group_label="Astrocytic Tumor"
biosample_id	chro	start	stop	mean	variant_type	probes
GSM252886	1	911484	11993973	-0.4486	DEL	.
GSM252886	1	12158755	22246766	0.2859	DUP	.
GSM252886	1	22346353	24149880	-0.5713	DEL	.
GSM252886	1	24160170	33603123	0.0812	.	.
GSM252886	1	33683474	37248987	-0.6478	DEL	.
GSM252886	1	37391587	248655165	0.0342	.	.
GSM252886	2	110819	240942225	-0.0007	.	.
GSM252886	3	119131	4655519	-0.0122	.	.
GSM252886	3	4662952	4857477	0.9273	DUP	.
...
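A minimal sketch of reading the tabular part of such a file with the Python standard library follows. The function name is hypothetical; it assumes that all "#" lines are metadata, that the first remaining line holds the column names shown above, and that "." marks an empty value.

```python
import csv
import io


def read_segments(handle):
    """Read the data lines of a .pgxseg sample segment file into dicts.

    Assumes: lines starting with "#" are metadata, the first other line holds
    the column names, and "." marks an empty value, as in the excerpt above.
    """
    data_lines = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t")

    def optional(value, convert):
        # "." is treated as a missing value.
        return None if value == "." else convert(value)

    return [
        {
            "biosample_id": row["biosample_id"],
            "chro": row["chro"],
            "start": int(row["start"]),
            "stop": int(row["stop"]),
            "mean": optional(row["mean"], float),
            "variant_type": optional(row["variant_type"], str),
            "probes": optional(row["probes"], int),
        }
        for row in reader
    ]


example = (
    "#meta=>biosample_count=78\n"
    "biosample_id\tchro\tstart\tstop\tmean\tvariant_type\tprobes\n"
    "GSM252886\t1\t911484\t11993973\t-0.4486\tDEL\t.\n"
    "GSM252886\t1\t24160170\t33603123\t0.0812\t.\t.\n"
)
segments = read_segments(io.StringIO(example))
deletions = [s for s in segments if s["variant_type"] == "DEL"]
print(len(segments), len(deletions))  # 2 1
```

For a file on disk, the same function can be called with open(path, newline="") as the handle.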

Segment CNV Frequencies

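In the frequency file, each data line holds the gain and loss frequencies of one genomic interval for one group (identified by group_id), followed by the interval's genome-wide index; the groups themselves are described in #group=> header lines.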

Future options are under evaluation.

Examples can be derived from the Progenetix “Services” API:

#meta=>genome_binning=1Mb;interval_number=3102
#group=>group_id=icdom-81403;label=Adenocarcinoma, NOS;dataset_id=progenetix;sample_count=18559
group_id	chro	start	end	gain_frequency	loss_frequency	index
icdom-81403	1	0	1000000	8.8	9.12	0
icdom-81403	1	1000000	2000000	8.49	8.68	1
icdom-81403	1	2000000	3000000	9.81	13.19	2
icdom-81403	1	3000000	4000000	10.02	15.84	3
icdom-81403	1	4000000	5000000	7.94	15.91	4
...
icdom-81403	2	228000000	229000000	7.37	6.62	477
icdom-81403	2	229000000	230000000	7.39	6.89	478
icdom-81403	2	230000000	231000000	8.3	7.0	479
icdom-81403	2	231000000	232000000	8.24	6.86	480
icdom-81403	2	232000000	233000000	9.1	7.89	481
...
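As a sketch of how such a file might be used, the following reads the frequency lines and lists intervals whose loss frequency reaches a threshold. The function names and the 15.0 cut-off are hypothetical, and the code assumes the frequency values are percentages, which the excerpt suggests but does not state.

```python
import csv
import io


def read_frequencies(handle):
    """Load the data lines of a segment CNV frequency file (skips "#" lines)."""
    rows = csv.DictReader(
        (line for line in handle if not line.startswith("#")), delimiter="\t"
    )
    return [
        {
            "group_id": r["group_id"],
            "chro": r["chro"],
            "start": int(r["start"]),
            "end": int(r["end"]),
            "gain_frequency": float(r["gain_frequency"]),
            "loss_frequency": float(r["loss_frequency"]),
            "index": int(r["index"]),
        }
        for r in rows
    ]


def frequent_losses(intervals, min_loss=15.0):
    """Return (chro, start, end) for intervals with loss_frequency >= min_loss."""
    return [
        (i["chro"], i["start"], i["end"])
        for i in intervals
        if i["loss_frequency"] >= min_loss
    ]


example = (
    "#meta=>genome_binning=1Mb;interval_number=3102\n"
    "group_id\tchro\tstart\tend\tgain_frequency\tloss_frequency\tindex\n"
    "icdom-81403\t1\t2000000\t3000000\t9.81\t13.19\t2\n"
    "icdom-81403\t1\t3000000\t4000000\t10.02\t15.84\t3\n"
    "icdom-81403\t1\t4000000\t5000000\t7.94\t15.91\t4\n"
)
intervals = read_frequencies(io.StringIO(example))
print(frequent_losses(intervals))  # [('1', 3000000, 4000000), ('1', 4000000, 5000000)]
```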

Data Matrix Files

CNV Frequency Matrix

The CNV frequency matrix contains interval CNV frequencies for genomic bins, with separate columns for gain and loss frequencies:

#meta=>genome_binning=1Mb;interval_number=3102
#group=>group_id=NCIT:C7376;label=Pleural Malignant Mesothelioma;dataset_id=progenetix;sample_count=240
#group=>group_id=PMID:22824167;label=Beleut M et al. (2012)...;dataset_id=progenetix;sample_count=159
group_id	1:0-1000000:gainF	1:1000000-2000000:gainF	...	1:0-1000000:lossF	1:1000000-2000000:lossF	...
NCIT:C7376	9.58	7.92	...	1.89	1.89	...
PMID:22824167	6.29	0.0	...	8.18	4.4	...
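The matrix column labels encode the bin and the value type. A minimal sketch for decoding them is shown below; the "chro:start-end:type" pattern is inferred from the example header and is an assumption rather than a documented grammar, and the helper names are hypothetical. The same pattern also fits the DUP/DEL columns of the status matrix.

```python
import re

# Assumed label pattern, inferred from the example header: "chro:start-end:type".
COLUMN_RE = re.compile(r"^(?P<chro>[^:]+):(?P<start>\d+)-(?P<end>\d+):(?P<kind>\w+)$")


def parse_column(label):
    """Split a label such as "1:0-1000000:gainF" into (chro, start, end, kind)."""
    m = COLUMN_RE.match(label)
    if m is None:
        raise ValueError(f"unexpected column label: {label!r}")
    return m["chro"], int(m["start"]), int(m["end"]), m["kind"]


def split_matrix_row(header, row):
    """Map one matrix row to {(chro, start, end, kind): value}, skipping group_id."""
    columns = header.split("\t")
    cells = row.split("\t")
    return {
        parse_column(col): float(val)
        for col, val in zip(columns[1:], cells[1:])
    }


header = "group_id\t1:0-1000000:gainF\t1:1000000-2000000:gainF\t1:0-1000000:lossF"
row = "NCIT:C7376\t9.58\t7.92\t1.89"
values = split_matrix_row(header, row)
print(values[("1", 0, 1000000, "lossF")])  # 1.89
```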

CNV Status Matrix

For endpoints delivering data per biosample or per callset/analysis, the Progenetix API can also deliver a binned CNV status matrix. This matrix can, for example, be used directly for clustering CNV patterns.

The header contains sample-specific information.

#meta=>id=progenetix
#meta=>assemblyId=GRCh38
#meta=>filters=NCIT:C4443
#meta=>genome_binning=1Mb;interval_number=3102
#meta=>no_info_columns=3;no_interval_columns=6204
#sample=>biosample_id=pgxbs-kftvktaz;analysis_ids=pgxcs-kftwu9ca;group_id=NCIT:C6650;group_label=Ampulla of Vater adenocarcinoma;NCIT::id=NCIT:C6650;NCIT::label=Ampulla of Vater adenocarcinoma
#sample=>biosample_id=pgxbs-kftvkyeq;analysis_ids=pgxcs-kftwvv3p;group_id=NCIT:C3908;group_label=Ampulla of Vater Carcinoma;NCIT::id=NCIT:C3908;NCIT::label=Ampulla of Vater Carcinoma
...
#meta=>biosampleCount=26;analysisCount=26
analysis_id	biosample_id	group_id	1:0-1000000:DUP	1:1000000-2000000:DUP	1:2000000-3000000:DUP	1:3000000-4000000:DUP	...
pgxcs-kftwu9ca	pgxbs-kftvktaz	NCIT:C6650	0	0.3434	1.0	1.0	...
pgxcs-kftwwbry	pgxbs-kftvkzwp	NCIT:C3908	0.5801	0	0.6415	1.0	...
...
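To illustrate the clustering use case, here is a minimal sketch that loads a status matrix into one numeric vector per analysis and computes pairwise Euclidean distances, a common input for hierarchical clustering. The function names are hypothetical; the code assumes that no_info_columns in a #meta=> line gives the number of leading label columns (3 in the example above) and that all remaining columns are numeric. The toy input uses made-up IDs and values purely to exercise the functions.

```python
import io
import math


def read_status_matrix(handle):
    """Return {analysis_id: [interval values]} from a CNV status matrix.

    Assumes: "#" lines are metadata; "#meta=>...no_info_columns=N" gives the
    number of leading label columns (default 3); the first non-"#" line holds
    the column names; all later columns are numeric.
    """
    info_columns = 3
    header_seen = False
    rows = {}
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("#meta=>"):
            for pair in line[len("#meta=>"):].split(";"):
                key, _, value = pair.partition("=")
                if key == "no_info_columns":
                    info_columns = int(value)
            continue
        if line.startswith("#") or not line:
            continue
        if not header_seen:
            header_seen = True  # skip the column-name line
            continue
        cells = line.split("\t")
        rows[cells[0]] = [float(v) for v in cells[info_columns:]]
    return rows


def pairwise_distances(rows):
    """Euclidean distance for every pair of analyses (math.dist needs Python 3.8+)."""
    ids = sorted(rows)
    return {
        (a, b): math.dist(rows[a], rows[b])
        for i, a in enumerate(ids)
        for b in ids[i + 1:]
    }


# Toy input with made-up IDs and values, only to exercise the functions.
example = (
    "#meta=>no_info_columns=3;no_interval_columns=3\n"
    "analysis_id\tbiosample_id\tgroup_id\t1:0-1000000:DUP\t1:1000000-2000000:DUP\t1:0-1000000:DEL\n"
    "toy-1\tbs-1\tG1\t0\t0\t1\n"
    "toy-2\tbs-2\tG1\t0\t0\t1\n"
    "toy-3\tbs-3\tG2\t1\t0\t1\n"
)
distances = pairwise_distances(read_status_matrix(io.StringIO(example)))
print(distances[("toy-1", "toy-3")])  # 1.0
```

Euclidean distance is used here only for simplicity; any other metric suited to fractional per-bin values could be substituted before clustering.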

Links

@mbaudis 2021-04-16