FASTQ/SRA Reads
Formatting and uploading FASTQ and SRA reads files.
In KBase, reads from FASTQ and SRA files can be imported to create reads library data objects. Each object will be either a SingleEndLibrary or a PairedEndLibrary. The tools in KBase can then be used to assemble reads into an “Assembly” data object or to align reads to an “Assembly.” After uploading and importing reads data, you may want to refer to the documentation about Assembly and Annotation. Reads can also be used in RNA-seq and expression analysis.
Single-end and paired-end reads can be uploaded in FASTQ or SRA format. For FASTQ files, please ensure that your filename ends with the .fastq, .fnq, or .fq file extension. SRA files should have the .sra file extension. The uploader also accepts compressed files in the .zip, .gz, .bz2, .tar.gz, and .tar.bz2 formats.
Files can be uploaded into your KBase Staging Area from your local computer or directly from a publicly accessible FTP or HTTP URL.
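The filename rules described above can be checked locally before you upload. The following is a minimal sketch, not part of KBase: the function name classify_upload and its three return labels are hypothetical, and it assumes that extensions are matched exactly as written (lowercase), which may be stricter than the uploader itself.

```python
# Extensions listed in this guide for uncompressed reads files.
READS_EXTENSIONS = (".fastq", ".fnq", ".fq", ".sra")
# Compressed formats listed in this guide as accepted by the uploader.
COMPRESSED_EXTENSIONS = (".zip", ".gz", ".bz2", ".tar.gz", ".tar.bz2")


def classify_upload(filename: str) -> str:
    """Classify a filename by its extension (hypothetical helper).

    Returns "reads" for a FASTQ/SRA extension, "compressed" for a
    compressed file (its contents cannot be judged from the name alone),
    and "unrecognized" for anything else.
    """
    if filename.endswith(READS_EXTENSIONS):
        return "reads"
    if filename.endswith(COMPRESSED_EXTENSIONS):
        return "compressed"
    return "unrecognized"


if __name__ == "__main__":
    for name in ("SRR228087.fastq", "ERR760546_1.fastq.gz", "notes.txt"):
        print(name, "->", classify_upload(name))
```

A file reported as "unrecognized" is worth renaming before upload so that it ends with one of the listed extensions.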

Bulk Import

FASTQ and SRA reads can be imported as one of the supported bulk import types. You can select multiple reads files simultaneously from the Staging Area to import them at once. See the bulk import section of the guide to importing data into the Narrative.

Importing reads files from your computer

Drag & Drop Limitations

Drag & drop from your local computer works for many files, but there is a size limit that depends on your computer and browser. Some users have reported problems with files around 20GB. For larger files, use Globus Online transfer.

Single-end library in FASTQ format

For this example, we will assume that you have a local copy of the RNA transcripts of the sample SRR228087 from GenBank. This is a single-end library from Illumina sequencing. Follow the instructions for obtaining a local copy of data from the GenBank SRA with their sratoolkit. Methods for obtaining the data will vary from one data provider to the next.
Once the file is on your computer, open the Import tab in the Data Browser and drag the single-end library into your Staging Area.
Open the pulldown menu to the right of the filename in your Staging Area and select “FASTQ Reads.”
Now click the import icon (up arrow) to the right of “FASTQ Reads”. The slide-out Data Browser will close and an app called “Import FASTQ/SRA File as Reads from Staging Area” will be added to your Narrative.
Notice that the name of the FASTA/FASTQ file is already filled in, as is a suggested Reads Object Name for the object that the import will create (you can change it if you like). Adjust the Sequencing Technology and any of the advanced options if needed. Note that if this were a metagenomic sample, we would uncheck the box next to Single Genome. When ready, click the green "Run" button to start the import. When the import is finished, your Data Panel will update to show the new SingleEndLibrary object, and a report will appear in the import app cell.

Paired-end library in FASTQ format

There are two ways that KBase and GenBank SRA recognize a paired-end library. In the legacy format, a paired-end library is two files that typically share the same name except for a _1 or _2 suffix, for example ERR760546_1.fastq and ERR760546_2.fastq. The other recognized format is called Interleaved. It is an 8-line format in which forward and reverse reads alternate. The single-end example above was imported as a SingleEndLibrary object because there was a single input file and the Interleaved box was unchecked.
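To make the relationship between the two layouts concrete, here is a minimal sketch that combines a two-file pair into one interleaved stream, where each forward record (4 lines) is followed by its reverse mate (4 lines), giving 8 lines per pair. This is an illustration only and not a KBase tool: the function names are hypothetical, and it assumes plain 4-line FASTQ records (no wrapped sequence lines, no trailing blank lines) with both files listing reads in the same order.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def fastq_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield 4-line FASTQ records: header, sequence, '+', qualities."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if not record:
            return  # input exhausted
        if len(record) != 4 or not record[0].startswith("@"):
            raise ValueError(f"malformed FASTQ record: {record!r}")
        yield record


def interleave(forward: Iterable[str], reverse: Iterable[str]) -> Iterator[str]:
    """Alternate forward and reverse records, 8 lines per read pair."""
    reverse_records = fastq_records(reverse)
    for fwd in fastq_records(forward):
        rev = next(reverse_records, None)
        if rev is None:
            raise ValueError("reverse input has fewer reads than forward input")
        yield from fwd
        yield from rev
    if next(reverse_records, None) is not None:
        raise ValueError("reverse input has more reads than forward input")


if __name__ == "__main__":
    # Tiny made-up reads, for illustration only.
    fwd = ["@r1/1\n", "ACGT\n", "+\n", "IIII\n"]
    rev = ["@r1/2\n", "TTGA\n", "+\n", "IIII\n"]
    print("\n".join(interleave(fwd, rev)))
```

The same functions accept open file handles, for example the handles for ERR760546_1.fastq and ERR760546_2.fastq, since a text file handle iterates over its lines.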
In this example, we will upload and import a paired-end library for ERR760546 in the 2-file legacy format. Open the Import tab in the Data Browser and drag the two files into the Staging Area.
Copy the reverse reads file name to paste in a later stage.
Open the pulldown menu to the right of the filename under the Import As... column in your Staging Area and select “FASTQ Reads” for the first file in the pair. Then click the import icon (up arrow) to the right of “FASTQ Reads”. The Data Browser slide-out will close and an app called “Import FASTQ/SRA File as Reads from Staging Area” will be added.
Notice that the name of the FASTA/FASTQ file is filled in. A suggested Reads Object Name for the object that the import will create is also filled in and can be changed.
You now need to fill in the name of the second file. In the line for “Reverse/Right FASTA/FASTQ File Path”, either paste or type in the name of the second file. The reverse file name is usually a slight variation of the forward file name.
As with the single-end library example, adjust the Sequencing Technology and any of the advanced options if needed. If this were a single interleaved paired-end file, we would check the box to the right of Interleaved.
When ready, click the green "Run" button to start the import. When the import is finished, your Data Panel will update to show the new PairedEndLibrary object, and a report will appear in the import app cell.
If you get an error due to a typo in the name of the second file, it is easy to correct. Click the Reset button in the App Cell to allow you to change fields in the app, make the correction, and click the green "Run" button again.
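Because the reverse file name is usually a slight variation of the forward file name, it can be derived rather than typed, which avoids the typo described above. The following is a minimal sketch assuming the _1/_2 legacy naming convention shown in this guide; the function name expected_reverse_name is hypothetical and is not part of KBase.

```python
import re
from typing import Optional

# Matches names such as ERR760546_1.fastq or ERR760546_1.fastq.gz:
# a stem, the "_1" forward marker, then one or more dotted extensions.
_FORWARD_NAME = re.compile(r"^(?P<stem>.+)_1(?P<ext>(?:\.[A-Za-z0-9]+)+)$")


def expected_reverse_name(forward_name: str) -> Optional[str]:
    """Return the conventional reverse-reads name, or None if the
    forward name does not follow the _1 convention."""
    match = _FORWARD_NAME.match(forward_name)
    if match is None:
        return None
    return f"{match.group('stem')}_2{match.group('ext')}"


if __name__ == "__main__":
    print(expected_reverse_name("ERR760546_1.fastq"))
```

Files from other providers may follow a different convention (the source only says the names "usually" differ slightly), so treat a None result as a prompt to check the name by hand.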

Import a reads library from other sources

In the Staging Area, beneath the box for Drag and Drop, there are other options for adding data to your staging area. You can import reads into KBase using Globus Online, or by supplying a URL for a publicly accessible FTP location, Google Drive, Dropbox, or a direct HTTP link.
There is also an icon with two arrows in a clockwise circle that refreshes the list of files that have been uploaded to the Staging Area.
If your reads are in a publicly accessible URL, you can bypass the Staging Area and directly import reads into your Narrative using one of these three apps (which you can find in the Apps panel or the App Catalog):

Transfer reads from JGI

If you are a JGI user, you can transfer public genome reads and assemblies (as well as your private data and annotated genomes) from JGI to your KBase account. See this page for instructions.
