Generating and handling Amplicon Matrices to link to Samples
Amplicon matrices created by any method should be usable in KBase, although some may require additional curation. Amplicon matrices, also called taxonomic abundance matrices, include OTU (operational taxonomic unit) and ASV (amplicon sequence variant) tables.
We recommend uploading raw counts, because analysis steps performed in KBase (rarefaction, standardization, and normalization) are tracked and their provenance is maintained on the system. KBase provides a number of tools for analyzing Amplicon Matrices and their metadata.
Amplicon matrices can be uploaded from a TSV (tab-separated values) file with a .tsv file extension.
When uploading, ensure that rows are taxa and columns are samples. A column for taxonomy is optional.
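Before uploading, it can help to check the matrix layout locally. The Python sketch below is not part of KBase; it assumes the first column holds taxon IDs, every other column is a sample, and an optional column named `taxonomy` holds lineage strings. Both the column name and the function `read_amplicon_tsv` are hypothetical. Because raw counts are recommended, it also rejects non-integer and negative values.

```python
import csv
import io


def read_amplicon_tsv(text, taxonomy_col="taxonomy"):
    """Parse an amplicon matrix TSV with taxa as rows and samples as columns.

    Hypothetical helper: assumes column 1 holds taxon IDs and that an optional
    column named `taxonomy_col` holds lineage strings rather than counts.
    Returns (sample_names, {taxon: [counts]}, {taxon: taxonomy}).
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    cols = header[1:]  # everything after the taxon ID column
    tax_idx = cols.index(taxonomy_col) if taxonomy_col in cols else None
    samples = [c for i, c in enumerate(cols) if i != tax_idx]

    counts, taxonomy = {}, {}
    for row in reader:
        if not row:  # skip blank lines
            continue
        taxon, values = row[0], row[1:]
        if len(values) != len(cols):
            raise ValueError(f"{taxon!r}: {len(values)} fields, expected {len(cols)}")
        if taxon in counts:
            raise ValueError(f"duplicate taxon ID {taxon!r}")
        if tax_idx is not None:
            taxonomy[taxon] = values[tax_idx]
        # Raw counts are expected, so int() rejects values such as "0.25"
        nums = [int(v) for i, v in enumerate(values) if i != tax_idx]
        if any(n < 0 for n in nums):
            raise ValueError(f"{taxon!r}: negative count")
        counts[taxon] = nums
    return samples, counts, taxonomy
```

For example, a header of `OTU_ID`, `S1`, `S2`, `taxonomy` yields the sample list `["S1", "S2"]`. The sketch cannot detect a transposed file on its own, so if sample names show up as keys of the counts dictionary, the rows and columns are swapped.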
Consensus sequences must also be included as a FASTA file with a .fa or .fasta extension. Sequence names must exactly match the taxon names in the amplicon matrix.
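Because sequence names must match the matrix's taxon names exactly, a quick local comparison can catch mismatches such as case differences or extra header text before upload. This sketch is hypothetical (`check_names_match` is not a KBase tool) and takes the strictest reading of "exactly": the whole FASTA header after `>`, with only trailing whitespace removed, is treated as the name, and the comparison is case-sensitive.

```python
def check_names_match(taxon_ids, fasta_text):
    """Compare matrix taxon IDs with FASTA sequence names (hypothetical helper).

    The full header after '>' (trailing whitespace stripped) is treated as
    the sequence name; comparison is exact and case-sensitive.
    """
    names = [line[1:].rstrip() for line in fasta_text.splitlines()
             if line.startswith(">")]
    seen, dupes = set(), set()
    for n in names:
        if n in seen:
            dupes.add(n)
        seen.add(n)
    ids = set(taxon_ids)
    return {
        "missing": sorted(ids - seen),    # matrix rows with no sequence
        "extra": sorted(seen - ids),      # sequences with no matrix row
        "duplicates": sorted(dupes),      # names appearing more than once
    }
```

Empty lists for all three keys mean every taxon has exactly one sequence and every sequence belongs to a taxon. For example, `check_names_match(["OTU1"], ">OTU1 size=12\nACGT\n")` reports `"OTU1"` as missing and `"OTU1 size=12"` as extra.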
Two files are needed for uploading amplicon data:
- FASTA file with consensus sequences
- Taxonomic abundance matrix (OTU or ASV)
  - row = taxon
  - column = sample
  - optional column = taxonomy
Additional information to have on hand for the import parameters includes:
- Sample processing metadata (SampleSet), e.g., sequencing technology or platform, target gene or region
- Bioinformatic processing metadata, e.g., clustering method, quality filtering steps
Amplicon Matrices can be uploaded into KBase without linked metadata. To get the full functionality of analysis tools that use metadata, first upload and import the SampleSet.
To upload files from your computer, open the Import tab in the Data Browser, then drag and drop the amplicon matrix file and the FASTA file of consensus sequences into the Staging Area.
Once the files are in your Staging Area, you can import the data into your Narrative.
Go to the APPS panel and open the Upload category to locate the "Import Amplicon Matrix from TSV/FASTA File in Staging Area" App. Click the App to add it to your Narrative.
Import Amplicon Matrix from TSV/FASTA File in Staging Area.
In the Input Objects section, select the previously imported SampleSet from the dropdown menu to link its metadata.
Under Parameters, select the Amplicon Matrix or Taxonomic Abundance TSV file from the dropdown, then select the corresponding FASTA file of consensus sequences.
The required parameters are the target gene, the target subfragment, the taxon calling method (denoising or clustering), and the sequence error cutoff.
Additional processing metadata, such as primer sequences, library kit, sequencing center, and the specific denoising or clustering method, can be entered by clicking "show advanced" under Parameters. These parameters are not required, but we recommend filling them in to improve workflow documentation and reproducibility.
Fill in the Amplicon Matrix Object Name and click the "Run" button.
Once your amplicon matrix is imported into the Narrative, you can use the "View Matrix as Table" App to view your new AmpliconMatrix object.
Example of test amplicon matrix with consensus FASTA
The amplicon matrix can be linked to an existing SampleSet and Attribute Mapping already in KBase, improving interoperability between KBase objects.
Importing Samples and Amplicon