The General Feature Format (GFF) is a textual format for describing annotations on biological sequences. The format is specified by the Sanger Centre in the GFF (General Feature Format) Specifications Document.
We assume that the reader is familiar with the GFF record format as described in that document, and we do not repeat the information here. A compliant GFF file for Caryoscope consists of a list of GFF records organized into three consecutive sections: chromosomes, centromeres, and features.
The sections may not be interspersed or reordered. For each section, we define the interpretation of each record belonging to that section below.
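Chromosome records:
<seqname>: The name of a chromosome
<source>: Ignored
<feature>: Should always be equal to "chromosome"
<start>: Should always be equal to 1
<end>: The length of the chromosome
<score>: Ignored
<strand>: Ignored
<frame>: Ignored
<attributes>: Ignored
<comments>: Ignored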
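Centromere records:
<seqname>: The name of the chromosome on which the centromere resides
<source>: Ignored
<feature>: Should always be equal to "centromere"
<start>: The 1-based starting location of the centromere
<end>: The 1-based ending location of the centromere
<score>: Ignored
<strand>: Ignored
<frame>: Ignored
<attributes>: Ignored
<comments>: Ignored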
Feature records:
<seqname>: The name of the chromosome on which the feature resides
<source>: Ignored
<feature>: A string identifying the "type" of the feature, such as exon
<start>: The 1-based starting location of the feature
<end>: The 1-based ending location of the feature
<score>: Ignored
<strand>: Ignored
<frame>: Ignored
<attributes>: Parsed as described below
<comments>: Ignored
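The section ordering rule can be illustrated with a short Python sketch. This is not Caryoscope's own parser: the function names are hypothetical, and the sketch assumes that a record's section can be inferred from its feature field (chromosome, centromere, or anything else), that fields are tab-separated, and that lines starting with "#" are comments.

```python
from typing import Iterable, List, Tuple

# Sections must appear in this order and may not be interleaved.
SECTION_ORDER = ["chromosome", "centromere", "feature"]


def section_of(feature_field: str) -> str:
    """Assumption: the feature field alone tells us which section a record is in."""
    if feature_field in ("chromosome", "centromere"):
        return feature_field
    return "feature"


def check_section_order(lines: Iterable[str]) -> List[Tuple[str, str, int, int]]:
    """Return (section, seqname, start, end) for each record.

    Raises ValueError if a record belongs to a section that precedes
    the section of an earlier record, or if a record is malformed.
    """
    records = []
    current_rank = 0
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 8:
            raise ValueError(f"line {lineno}: expected at least 8 fields")
        seqname, _source, feature, start, end = fields[:5]
        section = section_of(feature)
        rank = SECTION_ORDER.index(section)
        if rank < current_rank:
            raise ValueError(f"line {lineno}: {section} record is out of order")
        current_rank = rank
        start_pos, end_pos = int(start), int(end)
        if section == "chromosome" and start_pos != 1:
            raise ValueError(f"line {lineno}: chromosome start must be 1")
        records.append((section, seqname, start_pos, end_pos))
    return records
```

The sketch accepts a file in which a section is empty; whether Caryoscope itself does so is not stated here.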
The "<attributes>" data is a set of keywords, each of which may be associated with one or more values. These are parsed as follows:
All keyword values are parsed as annotations on the data, and may be used in the Feature tooltip expression and Feature URL expression.
Any value which parses correctly as a numerical value is added to a dataset named after the keyword, and values associated with it are available for display (see Section 2.5, “Choose a dataset”).
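The two parsing rules above can be sketched in Python as follows. This is an illustration only, not Caryoscope's implementation: the function name is hypothetical, the sketch assumes keyword groups are separated by semicolons with values either bare or double-quoted, and it treats a value as numerical whenever Python's float() accepts it.

```python
import re
from typing import Dict, List, Tuple

# A token is either a double-quoted string or a run of non-space characters.
TOKEN = re.compile(r'"([^"]*)"|(\S+)')


def parse_attributes(text: str) -> Tuple[Dict[str, List[str]], Dict[str, List[float]]]:
    """Split an attributes field into annotations and numeric datasets.

    Returns (annotations, datasets). Every value is kept as an annotation;
    values that parse as numbers are also added to a dataset named after
    the keyword. Limitation: a semicolon inside a quoted value is not handled.
    """
    annotations: Dict[str, List[str]] = {}
    datasets: Dict[str, List[float]] = {}
    for chunk in text.split(";"):
        tokens = [
            m.group(1) if m.group(1) is not None else m.group(2)
            for m in TOKEN.finditer(chunk)
        ]
        if not tokens:
            continue
        keyword, values = tokens[0], tokens[1:]
        annotations.setdefault(keyword, []).extend(values)
        for value in values:
            try:
                number = float(value)  # assumption: float() defines "numerical"
            except ValueError:
                continue
            datasets.setdefault(keyword, []).append(number)
    return annotations, datasets
```

For example, an attributes field containing a hypothetical keyword ratio with the values 1.5 and 2.0 would yield both an annotation and a dataset named ratio, while a keyword whose only value is a gene name would yield an annotation alone.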
Caryoscope accepts text files in comma- or tab-delimited format, such as those exported or imported by popular spreadsheet software. Caryoscope expects the columns in these files to follow a layout very similar to that of the GFF file described in Section 3.4.1, “General Feature Format”.
A compatible text file has three sections, as in the GFF case, for chromosomes, centromeres and features, respectively. Some columns are mandatory, as described below, while further columns to the right are considered annotations, and are treated in the same way as the "<attributes>" field in the GFF case.
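Feature rows (columns listed from left to right):
Column 1: The name of the chromosome on which the feature resides
Column 2: Ignored
Column 3: A string identifying the "type" of the feature, such as exon
Column 4: The 1-based starting location of the feature
Column 5: The 1-based ending location of the feature
Column 6: Ignored
Column 7: Ignored
Column 8: Ignored
Column 9: Ignored
Remaining columns: Annotation columns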
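Reading feature rows from such a file might look like the Python sketch below. The names are hypothetical, and several points are assumptions rather than documented behaviour: that the delimiter can be guessed from the first line, that nine fixed columns precede the annotation columns, and that there is no header row.

```python
import csv
import io
from typing import List, NamedTuple


class FeatureRow(NamedTuple):
    chromosome: str
    kind: str
    start: int
    end: int
    annotations: List[str]


# Assumption: nine fixed columns, mirroring the first nine GFF fields.
FIXED_COLUMNS = 9


def read_feature_rows(text: str) -> List[FeatureRow]:
    """Parse comma- or tab-delimited feature rows from a string."""
    first_line = text.split("\n", 1)[0]
    # Assumption: a tab anywhere in the first line means tab-delimited.
    delimiter = "\t" if "\t" in first_line else ","
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter=delimiter):
        if len(fields) < 5:
            continue  # skip blank lines and rows missing mandatory columns
        rows.append(
            FeatureRow(
                chromosome=fields[0],
                kind=fields[2],
                start=int(fields[3]),
                end=int(fields[4]),
                annotations=fields[FIXED_COLUMNS:],
            )
        )
    return rows
```

The annotation values are returned as plain strings; they could then be handled in the same way as the values of the "<attributes>" field in the GFF case.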