Data delivery

Samples are pooled and sequenced in batches on Illumina flowcells, and the results from your samples are delivered per flowcell batch. The number of samples run on a flowcell varies with the type of analysis and the required sequencing depth.
When sequencing is finished, the resulting sequence files are delivered to the INBOX folder of your customer directory on our dedicated computer cluster. The results include one fastq file per sample, lane and read orientation. In our current setup each flowcell has two lanes, so you will get four fastq files per sample: one forward (R1) and one reverse (R2) file per lane.
The files will be named similarly to these:
$> ls customer/INBOX/H7CAABDXX/
1_131031_H7CAABDXX_Sample196_GAGTGG_R1.fastq.gz
1_131031_H7CAABDXX_Sample719_GAGTGG_R2.fastq.gz
1_131031_H7CAABDXX_Sample277_CTTGTA_R1.fastq.gz
2_131031_H7CAABDXX_Sample622_ATCACG_R2.fastq.gz
2_131031_H7CAABDXX_Sample226_AGTCAA_R1.fastq.gz
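If you want to check programmatically that every sample has its four files, the names can be split into their fields. The sketch below assumes the pattern <lane>_<date>_<flowcell>_<sample>_<barcode>_R<1|2>.fastq.gz, which is only inferred from the example listing above; the function names parse_fastq_name and incomplete_samples are hypothetical.

```python
import re
from collections import defaultdict

# Assumed naming pattern, inferred from the example listing (not a documented specification):
# <lane>_<date>_<flowcell>_<sample>_<barcode>_R<1|2>.fastq.gz
FASTQ_RE = re.compile(
    r"^(?P<lane>\d+)_(?P<date>\d{6})_(?P<flowcell>[A-Z0-9]+)_"
    r"(?P<sample>.+)_(?P<barcode>[ACGTN]+)_R(?P<read>[12])\.fastq\.gz$"
)


def parse_fastq_name(name):
    """Split a delivered fastq file name into its fields, or return None if it does not match."""
    m = FASTQ_RE.match(name)
    if m is None:
        return None
    fields = m.groupdict()
    fields["lane"] = int(fields["lane"])
    fields["read"] = int(fields["read"])
    return fields


def incomplete_samples(names, lanes=(1, 2)):
    """Map each sample that lacks a (lane, read) file to the sorted list of missing combinations.

    With two lanes and two orientations, four files are expected per sample.
    """
    seen = defaultdict(set)
    for name in names:
        fields = parse_fastq_name(name)
        if fields is not None:  # ignore meta.txt, stats.txt and other non-fastq files
            seen[fields["sample"]].add((fields["lane"], fields["read"]))
    expected = {(lane, read) for lane in lanes for read in (1, 2)}
    return {s: sorted(expected - got) for s, got in seen.items() if got != expected}
```

For example, passing the output of os.listdir on the flowcell directory to incomplete_samples returns an empty dict when every sample has both orientations for both lanes. The short listing above is only an excerpt, so running the check on it would report missing files.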
The directory also contains two descriptive text files. The first, meta.txt, is a tab-separated file that describes the content of the fastq files. It contains one row per lane and sample, with the columns sample name, flowcell id, lane, barcode, fastq file name for orientation 1 (R1) and fastq file name for orientation 2 (R2). For example:
S622 H7CAABDXX 1 ATCACG 1_1310CG_R1.fastq.gz 1_1310CG_R2.fastq.gz
S135 H7CAABDXX 1 AGTCAA 1_1310AA_R1.fastq.gz 1_1310AA_R2.fastq.gz
S252 H7CAABDXX 1 CGATGT 1_1310GT_R1.fastq.gz 1_1310GT_R2.fastq.gz
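A minimal sketch for loading meta.txt with Python's standard csv module is shown below. It assumes the file has no header row (skip the first line if yours does); the column labels in META_COLUMNS and the function name read_meta are our own, not names used in the file.

```python
import csv

# Hypothetical labels for the six columns described above, in file order.
META_COLUMNS = ["sample", "flowcell", "lane", "barcode", "fastq_r1", "fastq_r2"]


def read_meta(lines):
    """Read meta.txt rows (any iterable of text lines, e.g. an open file) into a list of dicts.

    Assumes tab-separated values and no header row.
    """
    rows = []
    for record in csv.reader(lines, delimiter="\t"):
        if not record:
            continue  # skip blank lines
        row = dict(zip(META_COLUMNS, record))
        row["lane"] = int(row["lane"])
        rows.append(row)
    return rows
```

Typical use would be opening customer/INBOX/<flowcell>/meta.txt with newline="" and passing the file handle to read_meta, then looking up row["fastq_r1"] and row["fastq_r2"] for each sample and lane.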

The second file, stats.txt, is a tab-separated file with fragment/read statistics. It has one row per sample, each containing: sample name, flowcell id, lanes, read counts per lane, total read count for the sample on the flowcell, yield in MB per lane, total yield for the sample, percentage of bases > Q30 per lane and mean sequencing quality score per lane. Per-lane values are comma-separated. For example:
S999 H7CAABDXX 1,2 9307512,9163534 18471046 940,926 1866 92.99,93.33 36.15,36.24
S613 H7CAABDXX 1,2 15248988,14967782 30216770 1540,1512 3052 92.85,93.21 36.11,36.21
S522 H7CAABDXX 1,2 9566956,9397640 18964596 966,949 1915 92.63,92.98 36.04,36.14
S214 H7CAABDXX 1,2 17890790,17573754 35464544 1807,1775 3582 92.82,93.17 36.09,36.19
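In the rows above, the per-lane read counts and yields add up to the sample totals (for S999, 9307512 + 9163534 = 18471046 reads and 940 + 926 = 1866 MB). The sketch below parses one stats.txt row and runs these consistency checks. The field names, the function names and the 80% Q30 threshold are illustrative assumptions, not values defined by Clinical Genomics.

```python
def parse_stats_line(line):
    """Parse one tab-separated stats.txt row; comma-separated per-lane values become lists."""
    (sample, flowcell, lanes, reads, total_reads,
     yields, total_yield, q30, mean_q) = line.rstrip("\n").split("\t")
    return {
        "sample": sample,
        "flowcell": flowcell,
        "lanes": [int(x) for x in lanes.split(",")],
        "reads": [int(x) for x in reads.split(",")],
        "total_reads": int(total_reads),
        "yield_mb": [float(x) for x in yields.split(",")],
        "total_yield_mb": float(total_yield),
        "pct_q30": [float(x) for x in q30.split(",")],
        "mean_q": [float(x) for x in mean_q.split(",")],
    }


def check_stats(row, min_q30=80.0):
    """Return a list of problems found in a parsed row (empty list if none).

    min_q30 is an arbitrary example threshold, not a facility specification.
    """
    problems = []
    if sum(row["reads"]) != row["total_reads"]:
        problems.append("read counts do not sum to total")
    # Yields are assumed to be rounded to whole MB, so allow up to 1 MB of rounding per lane.
    if abs(sum(row["yield_mb"]) - row["total_yield_mb"]) > len(row["lanes"]):
        problems.append("yields do not sum to total")
    for lane, pct in zip(row["lanes"], row["pct_q30"]):
        if pct < min_q30:
            problems.append(f"lane {lane}: {pct}% bases > Q30")
    return problems
```

To check a whole file, apply parse_stats_line to each non-blank line and collect the rows for which check_stats returns a non-empty list.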

This summary information will also be emailed to you upon delivery.

For security reasons, the cluster can only be accessed from Clinical Genomics' premises. Please contact us for more information about delivery options.


Limitations of the methods

Exome sequencing

The current method uses enrichment via hybridisation, a technology that by its nature cannot capture every targeted region of the exome (100% selection). Some regions will therefore inevitably be missing from the final data. In addition, the capture kit does not target the entire exome (partly depending on how the exome is defined). Capture efficiency, combined with amplification-induced differences in the relative copy number of individual molecules, makes the representation of the exome variable. To assess the completeness of the coverage, we can calculate completeness measures for predefined gene sets. Contact us for further information.
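One common way to express completeness for a gene set is the fraction of each gene's targeted bases covered by at least a minimum read depth. The exact measure used by Clinical Genomics is not described here, so the sketch below is only an illustration: the 10x threshold, the input layout and the function name gene_completeness are assumptions.

```python
def gene_completeness(depth, regions, min_depth=10):
    """Fraction of bases in each gene's regions with depth >= min_depth.

    depth:   dict mapping (chrom, pos) -> read depth; positions missing from the dict count as 0.
    regions: dict mapping gene name -> list of (chrom, start, end) intervals,
             0-based and end-exclusive, assumed not to overlap within a gene.
    min_depth: illustrative threshold (10x), not a facility specification.
    """
    result = {}
    for gene, intervals in regions.items():
        total = covered = 0
        for chrom, start, end in intervals:
            for pos in range(start, end):
                total += 1
                if depth.get((chrom, pos), 0) >= min_depth:
                    covered += 1
        # A gene with no target bases gets 0.0 rather than a division error.
        result[gene] = covered / total if total else 0.0
    return result
```

In practice the per-base depths could come from a tool such as samtools depth, whose positions are 1-based and would need converting to match the 0-based intervals assumed here.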

Whole-genome sequencing

The current method for whole-genome sequencing is based on mechanical (random) fragmentation, followed by adapter ligation and purification to remove non-ligated adapters. The protocol contains no amplification steps and therefore represents the genome significantly more uniformly than the exome sequencing protocol does. Because of the limited read length, reads originating from highly repetitive regions cannot be mapped reliably. To assess the completeness of the coverage, we can calculate completeness measures for predefined gene sets. Contact us for further information.