KBase Documentation

FAQ: Metagenomics & Community Analysis


Last updated 3 years ago


I am very new to shotgun metagenomics-based assembly and annotation, and there are many apps listed in KBase. Does KBase have a pre-developed workflow for shotgun metagenomics, covering assembly, annotation, and metabolic pathway mining?

KBase tries to be as flexible as possible, so there are many options. One App to consider is the JGI Metagenome Assembly App. It is a beta App with a complete workflow, optimized by the JGI, that goes from raw reads to an assembly, using BFC and BBTools for read QC and metaSPAdes for assembly. The assembled sequence can then be binned using the available binning tools, and the individual bins can be annotated using the standard prokaryotic annotation Apps (Prokka, RAST).

For annotating the complete metagenome assembly, the Prokka App is being updated to allow this, but it is still in beta. For metabolic modeling of an entire community, there is currently one App, Build Community Metabolic Model, which requires a set of genomes or bins as input. It doesn't take the entire metagenome annotation as input, since it attempts to model the individual members and the transfer between them. It is possible to make a mixed-bag model using the entire metagenome annotation, which can be useful for seeing whether entire pathways are present in the metagenome.

Why extract genome sequences from metagenomes rather than working with unassembled genome sequences?

When you have a contiguous fragment of a genome, you get 1) full-length genes and their protein products, 2) the genomic context of the genes, which gives a better chance of understanding which genes are used as part of the same system or pathway, especially if they are polycistronic (operons), and 3) more accurate phylogenetic placement, based on consensus placement from multiple genes and possibly even a clade-specific phylogenetic marker.

What is the best way to assemble 100-150bp sequencing data in order to recover MAGs?

Effective MAG recovery is highly dependent on your sample. If there is a lot of diversity in the sample and/or low read coverage, MAGs are more challenging to recover, regardless of the tools used. This is partly why we recommend evaluating your data prior to assembly, so you can get some idea of what your data look like.

What causes the contamination in the bins? And what is considered a high quality bin for filtering them out?

Contamination in the bins can occur for a variety of reasons, including but not limited to contig mis-assembly, limited diversity in k-mer space, and horizontal gene transfer. In general, a genome that is 90% complete and <5% contaminated is considered high-quality. A rough guide to the quality of MAGs and SAGs can be found here.
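As an illustration of these thresholds, here is a minimal Python sketch. It assumes that per-bin completeness and contamination estimates (for example, as reported by a tool such as CheckM) have already been collected into simple records. The record layout, the function name, and the example values are hypothetical and are not part of any KBase API.

```python
from typing import NamedTuple


class BinQuality(NamedTuple):
    """Quality estimates for one bin (hypothetical record layout)."""
    name: str
    completeness: float   # percent, 0-100
    contamination: float  # percent, 0-100


def high_quality_bins(bins, min_completeness=90.0, max_contamination=5.0):
    """Return bins that are at least min_completeness complete and
    strictly less than max_contamination contaminated."""
    return [
        b for b in bins
        if b.completeness >= min_completeness
        and b.contamination < max_contamination
    ]


# Made-up example values, for illustration only.
example_bins = [
    BinQuality("bin.001", 97.2, 1.3),
    BinQuality("bin.002", 91.0, 7.8),
    BinQuality("bin.003", 62.5, 0.4),
]

kept_bins = high_quality_bins(example_bins)
print([b.name for b in kept_bins])  # prints ['bin.001']
```

The cutoffs are parameters rather than constants, so a more permissive filter (for example, for medium-quality bins) only needs different arguments.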

How would I exclude eukaryotes and viruses from a marine metagenome?

Note that this is hard to do, not just in KBase but in general. The steps would be to 1) annotate the entire metagenome assembly, 2) identify contigs that fall into the different categories of interest (bacteria/archaea/viruses/eukaryotes), 3) filter out contigs belonging to eukaryotes and viruses, and 4) summarize the remaining results. A major challenge will be the unambiguous identification of the different domains of life, which is sometimes tricky (e.g., prophages). Also note that file manipulation outside of KBase would be required, as there are currently no KBase Apps that complete this task.
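Steps 2 to 4 could be sketched outside KBase roughly as follows. This is only an illustration under stated assumptions: it presumes that a per-contig domain assignment is already available from an annotation or classification step, and the mapping, the domain labels, and the function names are all hypothetical. Contigs without an assignment are counted as "Unclassified" and dropped, a choice a real analysis may want to revisit.

```python
from collections import Counter


def parse_fasta(text):
    """Yield (contig_id, sequence) pairs from FASTA-formatted text."""
    contig_id, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if contig_id is not None:
                yield contig_id, "".join(chunks)
            # The contig ID is the first whitespace-delimited token of the header.
            fields = line[1:].split()
            contig_id, chunks = (fields[0] if fields else ""), []
        else:
            chunks.append(line)
    if contig_id is not None:
        yield contig_id, "".join(chunks)


def filter_contigs(contigs, domain_of, keep=("Bacteria", "Archaea")):
    """Keep contigs whose assigned domain is in `keep`.

    Returns the kept (contig_id, sequence) pairs and a Counter of how many
    contigs were seen per domain, including the ones that were dropped.
    """
    kept, summary = [], Counter()
    for contig_id, seq in contigs:
        domain = domain_of.get(contig_id, "Unclassified")
        summary[domain] += 1
        if domain in keep:
            kept.append((contig_id, seq))
    return kept, summary


# Made-up assembly and assignments, for illustration only.
fasta = ">c1 len=8\nACGTACGT\n>c2\nGGCC\nTTAA\n>c3\nATAT\n>c4\nCCCC\n"
domains = {"c1": "Bacteria", "c2": "Viruses", "c3": "Eukaryota"}

kept, summary = filter_contigs(parse_fasta(fasta), domains)
print([contig_id for contig_id, _ in kept])  # prints ['c1']
print(dict(summary))
```

The per-domain counts make it easy to see how much of the assembly was removed and how much could not be assigned at all. This matters because ambiguous cases such as prophages will land in whichever category the upstream classifier chose.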
