Formatting and uploading FASTQ and SRA reads files.
In KBase, reads from FASTQ and SRA files can be imported to create reads library data objects. Each object will be either a SingleEndLibrary or a PairedEndLibrary. KBase tools can then be used to assemble the reads into an “Assembly” data object or to align the reads to an “Assembly.” After uploading and importing reads data, you may want to refer to the documentation about Assembly and Annotation. Reads can also be used in RNA-seq and expression analysis.
Single-end and paired-end reads can be uploaded in FASTQ or SRA format. For FASTQ files, please ensure that your filename ends with the .fastq, .fnq, or .fq file extension. SRA files should have the .sra file extension. The uploader also accepts compressed files in .zip, .gz, .bz2, .tar.gz, and .tar.bz2 formats.
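The naming rules above can be checked before uploading. The following is a minimal sketch, not part of KBase: the helper name `looks_uploadable` is hypothetical, and it assumes that a compressed file keeps its reads extension before the compression suffix (for example, sample.fastq.gz). Archives whose own name does not carry a reads extension would have to be inspected separately.

```python
from pathlib import Path

# Extensions listed in the paragraph above.
READ_EXTENSIONS = (".fastq", ".fnq", ".fq", ".sra")
# Longer suffixes come first so ".tar.gz" is stripped before ".gz".
COMPRESSION_EXTENSIONS = (".tar.gz", ".tar.bz2", ".zip", ".gz", ".bz2")


def looks_uploadable(filename: str) -> bool:
    """Return True if the name ends in a reads extension,
    optionally followed by one compression suffix (assumed convention)."""
    name = Path(filename).name.lower()
    for suffix in COMPRESSION_EXTENSIONS:
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    return name.endswith(READ_EXTENSIONS)


if __name__ == "__main__":
    for candidate in ["sample.fastq", "sample.fq.gz", "sample.txt"]:
        print(candidate, looks_uploadable(candidate))
```

The check is case-insensitive and looks only at the filename, so it cannot confirm that the file contents are valid FASTQ or SRA data.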
Files can be uploaded into your KBase Staging Area from your local computer or directly from a publicly accessible FTP or HTTP URL.
To upload a file from your computer, open the Import tab within the Data Browser. Then drag and drop the single-end library file into your Staging Area. Open the pulldown menu to the right of the filename in the Staging Area and select “FASTQ Reads NonInterleaved” or “SRA Reads.”
FASTQ Single-end Library
SRA Single-end Library
Make sure the correct file type is selected and the checkbox is active, then click "Import Selected". The slide-out Data Browser will close and an app called “Import FASTQ/SRA File as Reads from Staging Area” will be added to your Narrative.
Notice that the name of the Reads Library file is already filled in, as is a suggested name for the Reads Object that the import will create (you can change it if you like). Adjust the Sequencing Technology and any of the advanced options if needed. Note that if this were a metagenomic sample, we would uncheck the box next to Single Genome. When ready, click the green "Run" button to start the import. When the import is finished, your Data Panel will update to show the new SingleEndLibrary object, and a report will appear in the import app cell.
There are two ways that KBase and GenBank SRA recognize a paired-end library. A paired-end library can be either two files, which typically share the same name and are designated as forward and reverse, or a single interleaved file. Interleaved files use an 8-line format in which forward and reverse reads alternate: each 4-line forward record is followed by the 4-line record of its reverse mate.
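To make the 8-line layout concrete, here is a minimal sketch of how two paired files could be merged into one interleaved stream. It is an illustration, not KBase's own code. The function names are hypothetical, and it assumes every FASTQ record occupies exactly four lines (no wrapped sequences) and that both files list the reads in the same order.

```python
from itertools import islice, zip_longest


def read_records(lines):
    """Yield FASTQ records as lists of four lines each."""
    iterator = iter(lines)
    while True:
        record = list(islice(iterator, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def interleave(forward, reverse):
    """Yield lines alternating one forward record, then its reverse mate."""
    pairs = zip_longest(read_records(forward), read_records(reverse))
    for forward_record, reverse_record in pairs:
        if forward_record is None or reverse_record is None:
            raise ValueError("forward and reverse inputs differ in read count")
        yield from forward_record
        yield from reverse_record


if __name__ == "__main__":
    forward = ["@r1/1\n", "ACGT\n", "+\n", "IIII\n"]
    reverse = ["@r1/2\n", "GGCC\n", "+\n", "IIII\n"]
    print("".join(interleave(forward, reverse)), end="")
```

Both arguments can be open file handles or any iterable of lines, so large files can be processed without loading them into memory.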
Open the Import tab in the Data Browser and drag either one interleaved file or two paired files into the Staging Area.
Open the pulldown menu to the right of the filename under the Import As... column in your Staging Area and select “FASTQ Reads NonInterleaved” for the first file in the pair. Make sure the correct file type is selected and the checkbox is active, then click "Import Selected". The Data Browser slide-out will close and the “Import FASTQ/SRA File as Reads from Staging Area” App will open.
Notice that the name of the FASTA/FASTQ file is filled in. A name is also suggested for the Reads Object that the import will create; you can change it.
If the library consists of two paired files, you will need to select a file for “Reverse/Right FASTA/FASTQ File Path” from the dropdown.
Paired-end or non-interleaved FASTQ Import
As with the single-end library example, you can make adjustments to the available parameters. Adjust the Sequencing Technology and any of the advanced options if needed. If the file is an interleaved paired-end library, check the box to the right of Interleaved.
Interleaved FASTQ Import
When ready, click the green "Run" button to start the import. When the import is finished, your Data Panel will update to show the new PairedEndLibrary object, and a report will appear in the Import App.
In the Staging Area, beneath the box for Drag and Drop, there are other options for adding data to your Staging Area. You can import reads into KBase using Globus Online, or by supplying a URL for a publicly accessible FTP location, Google Drive, Dropbox, or a direct HTTP link.
If your reads are in a publicly accessible URL, you can bypass the Staging Area and directly import reads into your Narrative using one of these three apps (which you can find in the Apps panel or the App Catalog):
Note that directly importing these files into KBase from the web behaves essentially the same as uploading to the Staging Area and then importing to a Narrative, except that the transfer is carried out by the Importer App. Use the link that directly downloads the file, as if you were going to save it to your computer.
For SRA reads from NCBI, this generally means navigating to the Data access tab of the Run Browser in the NCBI Sequence Read Archive (SRA) for the reads to import, as seen here:
For this example (SRR18272216), navigate to the Data access tab, copy the SRA-download link under Name, and paste the full link into the Import from Web App.
Copy the SRA-download link located under the Name heading and paste the link into the URL input within the Importer App.
For how to search and download or locate download links for SRA sequences, see the NCBI Search and Download documentation.
Drag and drop from your local computer works for many files, but there is a size limit that depends on your computer and browser. Some users have reported problems with files of around 20 GB. For larger files, use Globus Online transfer.
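As a rough pre-upload check, the guidance above can be expressed as a small helper. This is only a sketch. The function names are hypothetical, and the 20 GB figure is the approximate size at which problems were reported, not a documented hard limit; the real limit depends on your computer and browser.

```python
import os

# Assumption: roughly 20 GB, the size at which some users reported problems.
REPORTED_PROBLEM_SIZE_BYTES = 20 * 10**9


def suggest_transfer_method(size_bytes, threshold=REPORTED_PROBLEM_SIZE_BYTES):
    """Suggest Globus for files at or above the threshold, else drag and drop."""
    return "globus" if size_bytes >= threshold else "drag-and-drop"


def suggest_for_file(path):
    """Apply the same rule to a file on disk."""
    return suggest_transfer_method(os.path.getsize(path))
```

The threshold is a parameter so that it can be lowered if drag and drop fails at smaller sizes in your setup.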
FASTQ and SRA reads can be imported as one of the supported bulk import types. You can select multiple reads files simultaneously from the Staging Area to import them at once. See the bulk import section of the guide to importing data into the Narrative.
If you are a JGI user, you can transfer public genome reads and assemblies (as well as your private data and annotated genomes) from JGI to your KBase account—see this page for instructions.