Genome

Formatting and uploading annotated assemblies from GenBank files or paired GFF and FASTA files.

In KBase, a Genome object is an annotated Assembly or a set of annotated predicted coding sequences, and it can encompass several types of feature calls. To upload only the DNA sequence from a FASTA file (without annotations), go to the Assembly page.

The Genome importer supports only GenBank and GFF formatted files.

A GenBank-formatted input file should include sequence contig(s), feature calls (annotations), and taxonomy information for the organism. KBase parses the input file into two data objects: an assembly object with the sequence and a genome object containing the original feature calls and annotations.
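Before uploading, it can help to confirm that a file actually carries the three parts described above. The following is a minimal sketch, not KBase's own parser: the function name `summarize_genbank` and the embedded `SAMPLE` record are illustrative assumptions, and the checks rely only on the standard GenBank flat-file layout (LOCUS, ORGANISM, FEATURES, and ORIGIN lines).

```python
import re

# Feature keys start after exactly five spaces; qualifier lines are indented further.
FEATURE_KEY = re.compile(r"^ {5}(\S+)\s+\S")


def summarize_genbank(text: str) -> dict:
    """Roughly count contigs and feature calls and flag taxonomy and sequence sections."""
    lines = text.splitlines()
    contigs = sum(1 for ln in lines if ln.startswith("LOCUS"))
    features = 0
    in_features = False
    for ln in lines:
        if ln.startswith("FEATURES"):
            in_features = True
        elif ln.startswith(("ORIGIN", "CONTIG", "//")):
            in_features = False
        elif in_features:
            match = FEATURE_KEY.match(ln)
            # "source" describes the whole record, so it is not counted as a feature call.
            if match and match.group(1) != "source":
                features += 1
    return {
        "contigs": contigs,
        "features": features,
        "has_taxonomy": any(ln.startswith("  ORGANISM") for ln in lines),
        "has_sequence": any(ln.startswith("ORIGIN") for ln in lines),
    }


# Hypothetical single-record file used only to exercise the function.
SAMPLE = "\n".join([
    "LOCUS       TEST0001                  12 bp    DNA     linear   BCT 01-JAN-2000",
    "DEFINITION  Example record.",
    "SOURCE      Escherichia coli",
    "  ORGANISM  Escherichia coli",
    "            Bacteria.",
    "FEATURES             Location/Qualifiers",
    "     source          1..12",
    '                     /organism="Escherichia coli"',
    "     gene            1..9",
    '                     /gene="abc"',
    "     CDS             1..9",
    '                     /gene="abc"',
    "ORIGIN",
    "        1 atgaaataat ga",
    "//",
])

if __name__ == "__main__":
    print(summarize_genbank(SAMPLE))
```

A file that reports zero contigs or no sequence section is probably not in GenBank format. A feature count of zero is not an error by itself.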

GenBank-formatted files with no features can be uploaded as Genomes.

A GenBank-formatted file can be uploaded into the Staging Area from your local computer (files with .gb, .gbff, or .gbk extensions) or directly from an FTP or HTTP URL.
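If you are preparing many files, a short script can flag names that do not end in one of the extensions listed above. This is a sketch under stated assumptions: the helper name `has_genbank_extension` is hypothetical, matching is case-insensitive by choice, and compressed files (for example, a name ending in .gz) are treated as non-matching because this page does not say how they are handled.

```python
from pathlib import Path

# Extensions named on this page for GenBank uploads from a local computer.
GENBANK_EXTENSIONS = {".gb", ".gbff", ".gbk"}


def has_genbank_extension(filename: str) -> bool:
    """Return True when the final extension is one of the listed GenBank extensions."""
    return Path(filename).suffix.lower() in GENBANK_EXTENSIONS
```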

A GFF-formatted file must be paired with a corresponding FASTA file of the DNA sequence. These will be parsed into two data objects: an assembly object with the sequence and a genome object containing the original feature calls and annotations.
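One way to check that a GFF file and a FASTA file belong together is to compare sequence identifiers. The sketch below assumes that "corresponding" means every sequence ID in the first GFF column also appears as the first word of a FASTA header; that interpretation, and the function names, are assumptions for illustration and not a description of the KBase importer.

```python
from typing import Set


def fasta_ids(fasta_text: str) -> Set[str]:
    """Collect the first word of every FASTA header line."""
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                ids.add(words[0])
    return ids


def gff_seqids(gff_text: str) -> Set[str]:
    """Collect the sequence IDs from column 1 of every GFF feature line."""
    ids = set()
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        ids.add(line.split("\t", 1)[0])
    return ids


def unmatched_seqids(gff_text: str, fasta_text: str) -> Set[str]:
    """Return GFF sequence IDs that have no matching FASTA record."""
    return gff_seqids(gff_text) - fasta_ids(fasta_text)
```

An empty result suggests the two files describe the same contigs. Any IDs returned point to features whose sequence is missing from the FASTA file.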

Further instructions for adding data to your Staging Area can be found here.

Importing a GenBank-formatted Genome

To import a file from your computer, open the Import tab within the Data Browser. Then drag & drop the genome file into your Staging Area.

Open the Import As... pulldown menu to the right of the filename in your Staging Area and select “GenBank Genome.”

Make sure the correct file type is selected and the checkbox is active, then click the “Import Selected” button.

The Data Browser will close and the “Import GenBank File as Genome from Staging Area” App will be added to your Narrative.

The name of the Genome file is filled in automatically, along with suggested names for the Genome and Assembly data objects that the import will create. You can change these suggested names. Adjust the Genome Type, the source of the GenBank file, or any advanced parameters if needed.

Click the green "Run" button to start the import. When the import is finished, your Data Panel will update to show the new Genome and Assembly objects, and a report will appear in the Import App.

Importing a GFF-formatted Genome

Open the Import tab in the Data Browser and drag and drop both the GFF file and its corresponding FASTA file into your Staging Area. In your Staging Area, open the Import As... pulldown menu to the right of the GFF filename and select “GFF Genome.”

Note the name of the corresponding FASTA file; you will need to select it in the import App.

Make sure the correct file type is selected and the checkbox is active, then click the “Import Selected” button. The Data Browser will close and the “Import GFF/FASTA File as Genome from Staging Area” App will be added to your Narrative. The GFF File Path will already be filled in.

You will need to fill in the name of the FASTA file. Using the dropdown for the “FASTA File Path”, select the FASTA file in the Staging Area. Make sure the file you select is a FASTA file.

The Scientific Name may be filled in automatically, along with a suggested name for the Genome data object that the import will create. You can edit the Genome Object Name, the Scientific Name, and any advanced options as needed. Click the green "Run" button. When the import is finished, your Data Panel will update to show the new Genome object, and a report will appear in the Import App.

Uploading a Genome from other sources

You can upload data into your KBase Staging Area using Globus, or by supplying a URL for a publicly accessible FTP location, Google Drive, Dropbox, or a direct HTTP link to import into the Narrative. Options for adding data to your Staging Area are described here.

Drag & Drop Limitations

The drag & drop option from your local computer works for many files, but there is a size limit that depends on your computer and browser. For larger files (around 20 gigabases), use the Globus Online transfer.

Bulk Import

Both GenBank and GFF genomes are supported bulk import types. You can select multiple genome files in the Staging Area and import them at once. See the bulk import section of the guide to importing data into the Narrative.

Uploading a Genome using a URL

Open the Import tab in the Data Browser and click on the Upload with URL button (below the drag & drop area) to open an Upload App Cell.

The Data Browser will close and the “Upload a File to Staging from Web” App will appear in your Narrative. Alternatively, you can open the app directly from the Apps Panel. In the app, open the URL Type dropdown and select the type of URL you are using.

When uploading a GenBank Genome, you will only need to use one link. When uploading a GFF Genome, you will need to use two links for the GFF file and the FASTA file.

In the App, click the "+" button for the URLs and paste in the URL of the Genome file (GenBank or GFF). For a GFF Genome, click the "+" button again and paste in the URL of the FASTA file.
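The link requirements above (one link for GenBank, two for GFF plus FASTA) can be checked before you paste anything into the App. This is only a sketch: the function name `check_upload_urls` and the format labels "genbank" and "gff" are hypothetical, and the scheme check assumes that acceptable links begin with ftp, http, or https.

```python
from typing import List
from urllib.parse import urlparse

# Number of links each genome format needs, as described on this page.
REQUIRED_URLS = {"genbank": 1, "gff": 2}


def check_upload_urls(genome_format: str, urls: List[str]) -> List[str]:
    """Return a list of problems; an empty list means the links look usable."""
    problems = []
    expected = REQUIRED_URLS[genome_format]
    if len(urls) != expected:
        problems.append(
            f"expected {expected} URL(s) for {genome_format}, got {len(urls)}"
        )
    for url in urls:
        parsed = urlparse(url)
        # A bare filename has no scheme or host, so it is reported here.
        if parsed.scheme not in {"ftp", "http", "https"} or not parsed.netloc:
            problems.append(f"not an FTP or HTTP(S) URL: {url}")
    return problems
```

For example, passing "gff" with a single link reports the missing FASTA link, and passing a bare filename instead of a full link reports that it is not a URL.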

Then click the green "Run" button to start the upload. After the App completes, the files will appear in your Staging Area, which you can access via the Import tab in the Data Browser.

With the genome file(s) in your Staging Area, the next step is to import them into your Narrative so you can use them in analyses.
