File Formats¶
Genotypes File¶
This module supports loading of three types of genotype files:
1. VCF¶
##fileformat=VCFv4.0
##fileDate=20090805
##source=myImputationProgramV3.1
##reference=1000GenomesPilot-NCBI36
##phasing=partial
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency">
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">
##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">
##FILTER=<ID=q10,Description="Quality below 10">
##FILTER=<ID=s50,Description="Less than 50% of samples have data">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Ross Prado Ash
1A 14370 . G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,.
1A 17330 . T A 3 q10 NS=3;DP=11;AF=0.017 GT:GQ:DP:HQ 0|0:49:3:58,50 0|1:3:5:65,3 0/0:41:3
1A 1110696 rs6040355 A G,T 67 PASS NS=2;DP=10;AF=0.333,0.667;AA=T;DB GT:GQ:DP:HQ 1|2:21:6:23,27 2|1:2:0:18,2 2/2:35:4
1A 1230237 . T . 47 PASS NS=3;DP=13;AA=T GT:GQ:DP:HQ 0|0:54:7:56,60 0|0:48:4:51,51 0/0:61:2
1A 11111 1subfield C A 50 PASS A=1;B=2;C=3 GT 0/1 ./. 1/1
2. Genotype Matrix¶
A tab-delimited data file where each line corresponds to a SNP and columns correspond to germplasm assayed. Expected columns: (1) Marker Name, (2) Chromosome Name, (3) Position on Chromosome, (4+) Sample Genotype Calls.
Marker name Chromosome Position 1048-8R 964a-46 Giftgi
FcChr1Ap11111 1A 11111 CC AC AA
FcChr1Ap22222 1A 22222 GG GC GG
FcChr1Ap33333 1A 33333 TA AA GA
3. Genotype Flat-file¶
A tab delimited data file where each line is a genotypic call. Expected columns: (1) Marker name, (2) Chromosome Name, (3) Position on Chromosome, (4) Sample Name, (5) Genotype call.
Marker name Chromosome Position Sample name Genotype call
FcChr1Ap11111 1A 11111 Ross CC
FcChr1Ap11111 1A 11111 Prado CC
FcChr1Ap11111 1A 11111 Ash CC
FcChr1Ap11111 1A 11111 Piero CT
FcChr1Ap11111 1A 11111 Tai CC
FcChr1Ap11111 1A 11111 Beverly TC
FcChr1Ap11111 1A 11111 Argent CC
FcChr1Ap11111 1A 11111 Trenus TT
FcChr1Ap11111 1A 11111 Zapelli CC
FcChr1Ap11111 1A 11111 Amato CG
Samples File¶
All formats require a separate samples file describing the germplasm assayed. This file is expected to be a tab-delimited file with the following columns: (1) Sample name in the genotypes file, (2) Sample name, (3) Sample accession, (4) Germplasm name, (5) Germplasm accession.
The next two columns are optional: (6) Germplasm type (otherwise it is currently assumed to be of type ‘Individual’ from the stock_type cv) and (7) Organism (this allows multiple organisms in your genotypes file, assuming they have all been aligned to the same genome. Otherwise, the default value is the organism you specified as an option).
Sample name Sample_name Sample_Accession Germplasm_name Germplasm_Accession Germplasm_Type Organism
Ross Ross_110201 Catsam1 Ross Catgerm1 Individual Felis catus
Prado Prado_110201 Catsam2 Prado Catgerm2 Individual Felis catus
Ash Ash_110201 Catsam3 Ash Catgerm3 Individual Felis catus
Piero Piero_110201 Catsam4 Piero Catgerm4 Individual Felis catus
Tai Tai_110201 Catsam5 Tai Catgerm5 Individual Felis catus
Beverly Beverly_110201 Catsam6 Beverly Catgerm6 Individual Felis catus
Argent Argent_110201 Catsam7 Argent Catgerm7 Individual Felis catus
Trenus Trenus_110201 Catsam8 Trenus Catgerm8 Individual Felis catus
Zapelli Zapelli_110201 Catsam9 Zapelli Catgerm9 Individual Felis catus
Amato Amato_110201 Catsam10 Amato Catgerm10 Individual Felis catus