For simple tasks, we may just go to NCBI: NCBI BLAST.
BatchPrimer3
CRAN
Dia diagram editor
GIMP
GitHub docs
GitHub
Pages
Helium
Pedigree visualization
Inkscape
LibreOffice
NCBI BLAST.
Phytozome
RStudio
Cheatsheets
Scribus desktop publishing
A common format for storing biological coordinate data, such as the chromosome, start, and end position of a feature, is the GTF/GFF format. The General Feature Format (GFF) and the General Transfer Format (GTF) are similar formats that differ in some technical details. These details can be found online, for example at GFF on Wikipedia and GFF/GTF File Format at Ensembl. Table 1 is an example GTF file produced by the BRAKER pipeline.
V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 |
---|---|---|---|---|---|---|---|---|
000000F\|arrow | AUGUSTUS | gene | 13924 | 18566 | . | - | . | W103_g1 |
000000F\|arrow | AUGUSTUS | transcript | 13924 | 18566 | . | - | . | W103_g1.t1 |
000000F\|arrow | AUGUSTUS | stop_codon | 13924 | 13926 | . | - | 0 | transcript_id W103_g1.t1; gene_id W103_g1; |
000000F\|arrow | AUGUSTUS | CDS | 13924 | 13960 | 1 | - | 1 | transcript_id W103_g1.t1; gene_id W103_g1; |
000000F\|arrow | AUGUSTUS | exon | 13924 | 13960 | . | - | . | transcript_id W103_g1.t1; gene_id W103_g1; |
000000F\|arrow | AUGUSTUS | intron | 13961 | 14096 | 1 | - | . | transcript_id W103_g1.t1; gene_id W103_g1; |
000000F\|arrow | AUGUSTUS | CDS | 14097 | 14168 | 1 | - | 1 | transcript_id W103_g1.t1; gene_id W103_g1; |
000000F\|arrow | AUGUSTUS | exon | 14097 | 14168 | . | - | . | transcript_id W103_g1.t1; gene_id W103_g1; |
000000F\|arrow | AUGUSTUS | intron | 14169 | 14974 | 1 | - | . | transcript_id W103_g1.t1; gene_id W103_g1; |
000000F\|arrow | AUGUSTUS | CDS | 14975 | 15055 | 1 | - | 1 | transcript_id W103_g1.t1; gene_id W103_g1; |
000000F\|arrow | AUGUSTUS | exon | 14975 | 15055 | . | - | . | transcript_id W103_g1.t1; gene_id W103_g1; |
000000F\|arrow | AUGUSTUS | intron | 15056 | 15272 | 1 | - | . | transcript_id W103_g1.t1; gene_id W103_g1; |
000000F\|arrow | AUGUSTUS | CDS | 15273 | 15323 | 1 | - | 1 | transcript_id W103_g1.t1; gene_id W103_g1; |
000000F\|arrow | AUGUSTUS | exon | 15273 | 15323 | . | - | . | transcript_id W103_g1.t1; gene_id W103_g1; |
000000F\|arrow | AUGUSTUS | intron | 15324 | 15486 | 1 | - | . | transcript_id W103_g1.t1; gene_id W103_g1; |
000000F\|arrow | AUGUSTUS | CDS | 15487 | 15581 | 1 | - | 0 | transcript_id W103_g1.t1; gene_id W103_g1; |
000000F\|arrow | AUGUSTUS | exon | 15487 | 15581 | . | - | . | transcript_id W103_g1.t1; gene_id W103_g1; |
000000F\|arrow | AUGUSTUS | intron | 15582 | 16447 | 1 | - | . | transcript_id W103_g1.t1; gene_id W103_g1; |
000000F\|arrow | AUGUSTUS | CDS | 16448 | 16528 | 1 | - | 0 | transcript_id W103_g1.t1; gene_id W103_g1; |
000000F\|arrow | AUGUSTUS | exon | 16448 | 16528 | . | - | . | transcript_id W103_g1.t1; gene_id W103_g1; |
000000F\|arrow | AUGUSTUS | intron | 16529 | 16623 | 1 | - | . | transcript_id W103_g1.t1; gene_id W103_g1; |
000000F\|arrow | AUGUSTUS | CDS | 16624 | 16708 | 1 | - | 1 | transcript_id W103_g1.t1; gene_id W103_g1; |
000000F\|arrow | AUGUSTUS | exon | 16624 | 16708 | . | - | . | transcript_id W103_g1.t1; gene_id W103_g1; |
000000F\|arrow | AUGUSTUS | intron | 16709 | 16943 | 1 | - | . | transcript_id W103_g1.t1; gene_id W103_g1; |
000000F\|arrow | AUGUSTUS | CDS | 16944 | 17023 | 1 | - | 0 | transcript_id W103_g1.t1; gene_id W103_g1; |
000000F\|arrow | AUGUSTUS | exon | 16944 | 17023 | . | - | . | transcript_id W103_g1.t1; gene_id W103_g1; |
000000F\|arrow | AUGUSTUS | intron | 17024 | 17138 | 1 | - | . | transcript_id W103_g1.t1; gene_id W103_g1; |
000000F\|arrow | AUGUSTUS | CDS | 17139 | 17253 | 1 | - | 1 | transcript_id W103_g1.t1; gene_id W103_g1; |
000000F\|arrow | AUGUSTUS | exon | 17139 | 17253 | . | - | . | transcript_id W103_g1.t1; gene_id W103_g1; |
000000F\|arrow | AUGUSTUS | intron | 17254 | 17849 | 1 | - | . | transcript_id W103_g1.t1; gene_id W103_g1; |
000000F\|arrow | AUGUSTUS | CDS | 17850 | 17920 | 1 | - | 0 | transcript_id W103_g1.t1; gene_id W103_g1; |
000000F\|arrow | AUGUSTUS | exon | 17850 | 17920 | . | - | . | transcript_id W103_g1.t1; gene_id W103_g1; |
000000F\|arrow | AUGUSTUS | intron | 17921 | 18012 | 1 | - | . | transcript_id W103_g1.t1; gene_id W103_g1; |
000000F\|arrow | AUGUSTUS | CDS | 18013 | 18100 | 0.97 | - | 1 | transcript_id W103_g1.t1; gene_id W103_g1; |
000000F\|arrow | AUGUSTUS | exon | 18013 | 18100 | . | - | . | transcript_id W103_g1.t1; gene_id W103_g1; |
000000F\|arrow | AUGUSTUS | intron | 18101 | 18323 | 0.97 | - | . | transcript_id W103_g1.t1; gene_id W103_g1; |
000000F\|arrow | AUGUSTUS | CDS | 18324 | 18396 | 0.54 | - | 2 | transcript_id W103_g1.t1; gene_id W103_g1; |
000000F\|arrow | AUGUSTUS | exon | 18324 | 18396 | . | - | . | transcript_id W103_g1.t1; gene_id W103_g1; |
000000F\|arrow | AUGUSTUS | intron | 18397 | 18559 | 0.45 | - | . | transcript_id W103_g1.t1; gene_id W103_g1; |
000000F\|arrow | AUGUSTUS | CDS | 18560 | 18566 | 0.68 | - | 0 | transcript_id W103_g1.t1; gene_id W103_g1; |
000000F\|arrow | AUGUSTUS | exon | 18560 | 18566 | . | - | . | transcript_id W103_g1.t1; gene_id W103_g1; |
000000F\|arrow | AUGUSTUS | start_codon | 18564 | 18566 | . | - | 0 | transcript_id W103_g1.t1; gene_id W103_g1; |
The format does not include a ‘header’, but the specification does include names for the columns. For example, column 1 is the ‘seqid’, which in our case is the name of the sequence (contig, chromosome, etc.). Columns 4 and 5 are the start and end positions. They are always ordered so that column 4 is no greater than column 5, which is why column 7 is needed to specify the strand. Column 3 is the ‘type’ and is important when extracting information from the file. The contents of this column are part of a ‘controlled vocabulary’ that we can reference at the Sequence Ontology. This ‘gene’ consists of 1 ‘gene’ record, 1 ‘transcript’ record, 1 ‘start_codon’ record, and 1 ‘stop_codon’ record. The gene also consists of 13 ‘CDS’, 13 ‘exon’, and 12 ‘intron’ records.
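
As a rough illustration of how the ‘type’ column can be used to summarize a file, the following is a minimal Python sketch. It assumes the records are tab-separated and uses a handful of lines copied from Table 1. The column names follow the GFF specification, while the names `GTF_COLUMNS`, `EXAMPLE_GTF`, and `read_gtf` are hypothetical helpers introduced here for illustration only. The strand is written as ‘-’, an assumption inferred from the start codon lying at higher coordinates than the stop codon.

```python
import io
from collections import Counter

# Column names taken from the GFF specification; the file itself has no header.
GTF_COLUMNS = ["seqid", "source", "type", "start", "end",
               "score", "strand", "phase", "attributes"]

# A few records copied from Table 1, written as tab-separated text.
# The strand "-" is an assumption (start codon lies after the stop codon).
ATTR = "transcript_id W103_g1.t1; gene_id W103_g1;"
EXAMPLE_GTF = (
    "000000F|arrow\tAUGUSTUS\tgene\t13924\t18566\t.\t-\t.\tW103_g1\n"
    "000000F|arrow\tAUGUSTUS\ttranscript\t13924\t18566\t.\t-\t.\tW103_g1.t1\n"
    f"000000F|arrow\tAUGUSTUS\tstop_codon\t13924\t13926\t.\t-\t0\t{ATTR}\n"
    f"000000F|arrow\tAUGUSTUS\tCDS\t13924\t13960\t1\t-\t1\t{ATTR}\n"
    f"000000F|arrow\tAUGUSTUS\texon\t13924\t13960\t.\t-\t.\t{ATTR}\n"
    f"000000F|arrow\tAUGUSTUS\tintron\t13961\t14096\t1\t-\t.\t{ATTR}\n"
)


def read_gtf(handle):
    """Yield one dict per record, skipping blank lines and '#' comments."""
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        record = dict(zip(GTF_COLUMNS, line.split("\t")))
        # Coordinates are stored as text, so convert them to integers.
        record["start"] = int(record["start"])
        record["end"] = int(record["end"])
        yield record


records = list(read_gtf(io.StringIO(EXAMPLE_GTF)))

# Count how many records there are of each feature type (column 3).
type_counts = Counter(record["type"] for record in records)
for feature_type, count in sorted(type_counts.items()):
    print(feature_type, count)
```

To process a real file, the `io.StringIO` object could be swapped for an opened file handle. Splitting only on tabs matters here because the sequence name in this example contains a ‘|’ character and the attributes column contains spaces.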