KBase Documentation

Assembly & Annotation

Some of the tools in KBase available for Assembly and Annotation

Last updated 7 months ago

KBase provides multiple Apps for de novo assembly of prokaryotic Next-Generation Sequencing (NGS) reads from various sequencing platforms. These assemblies can then be annotated to explore the structural and functional features of a Genome, or used in other analyses. The interactive tutorials are a good way to learn about these workflows.

Read Processing

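  • Trim Reads with Trimmomatic – Read trimming and adaptor removal

  • Filter Out Low-Complexity Reads with PRINSEQ – Filter low-complexity reads

  • Assess Read Quality with FastQC – Quality assessment and reporting

  • Cutadapt – Custom adapter removal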
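
To make "low-complexity" concrete, the sketch below scores each read by the Shannon entropy of its k-mers and drops reads that fall under a cutoff. It is only an illustration of the idea, not PRINSEQ's implementation; the function names, the k-mer size of 3, and the 1.5-bit threshold are assumptions chosen for the example.

```python
import math
from collections import Counter


def kmer_entropy(seq: str, k: int = 3) -> float:
    """Return the Shannon entropy (in bits) of the k-mer distribution of seq."""
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    counts = Counter(kmers)
    total = len(kmers)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def filter_low_complexity(reads, min_entropy=1.5, k=3):
    """Keep only reads whose k-mer entropy is at least min_entropy.

    min_entropy and k are illustrative values, not PRINSEQ defaults.
    """
    return [r for r in reads if kmer_entropy(r, k) >= min_entropy]


# Hypothetical reads: a homopolymer, a mixed sequence, and a dinucleotide repeat.
reads = [
    "AAAAAAAAAAAAAAAAAAAA",
    "ACGTTGCAAGCTTAGGCATC",
    "ATATATATATATATATATAT",
]
kept = filter_low_complexity(reads)
```

With these inputs only the mixed sequence is kept, because the homopolymer and the dinucleotide repeat each contain at most two distinct 3-mers.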

Assembly

De novo assembly of Illumina and Ion Torrent next-generation sequencing reads. Supports single-end and paired-end read libraries.

  • Assemble with HipMer – HipMer is a highly parallelized port of JGI’s Meraculous assembler. Meraculous is a de Bruijn graph-based assembler that gains speed by not performing error correction. Instead, it builds contigs from bases that already have high quality scores and fills the gaps using localized assemblies of the reads. HipMer further increases the speed of Meraculous.

  • Assemble with IDBA-UD – IDBA-UD is an iterative graph-based assembler for single-cell and standard short-read data, and it works well for data with highly uneven sequencing depth. It iterates over a range of k-mer sizes, which compensates for the information loss associated with single k-mer de Bruijn graphs and makes IDBA-UD one of the more accurate microbial assemblers.

  • Assemble with MaSuRCA – MaSuRCA is a short-read assembler that combines the benefits of the de Bruijn graph and overlap-layout-consensus assembly approaches. The main concept is the creation of super-reads that contain the sequence information present in the original reads; these super-reads are then extended in both directions using an efficient k-mer lookup table. MaSuRCA is one of a small set of assemblers that biologists use for eukaryotic assembly.

  • Assemble with MEGAHIT – MEGAHIT is a single-node assembler for large and complex metagenomic NGS reads. It uses a succinct de Bruijn graph (SdBG) to achieve low-memory assembly, making it fast and especially suitable for assembly of small metagenomes, metatranscriptomes, or low-coverage data in general.

  • Assemble with SPAdes – SPAdes is a single-cell and standard assembler based on paired de Bruijn graphs, considered to be one of the most accurate microbial assemblers. SPAdes employs a multisized de Bruijn graph, which it uses to detect and remove bubbles and chimeric reads, estimates insert distances from paired k-mers, and computes contigs from the paired assembly graph.

  • Assemble with Velvet – Velvet is a classic de Bruijn graph-based assembler that works by efficiently manipulating de Bruijn graphs through simplification and compression. It eliminates errors and resolves repeats by first using an error correction algorithm that merges sequences together. Repeats are then removed from the sequence by a repeat solver that separates paths which share local overlaps.

  • Compare assemblies with QUAST – Assess the output assemblies from different configurations of the same assembler, or compare assemblies from multiple assemblers to determine which one is optimal for downstream analysis.
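
One widely used statistic when comparing assemblies is N50: the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the total assembly size. The sketch below computes it for two hypothetical sets of contig lengths. The assembly names and lengths are made up for illustration, and N50 is only one of the metrics such a comparison would consider.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Stop at the first contig where the running sum covers half the assembly.
        if running * 2 >= total:
            return length
    return 0


def summarize(contig_lengths):
    """Collect a few basic assembly statistics in a dictionary."""
    return {
        "contigs": len(contig_lengths),
        "total_length": sum(contig_lengths),
        "largest_contig": max(contig_lengths, default=0),
        "n50": n50(contig_lengths),
    }


# Hypothetical contig lengths for two assemblies of the same read library.
assemblies = {
    "assembly_a": [80, 70, 50, 40, 30, 20],
    "assembly_b": [60, 50, 40, 40, 30, 30, 20, 20],
}
stats = {name: summarize(lengths) for name, lengths in assemblies.items()}
best_by_n50 = max(stats, key=lambda name: stats[name]["n50"])
```

Both example assemblies have the same total length, so the difference in N50 reflects how fragmented each one is rather than how much sequence it recovered.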

Annotation

The output of the annotation apps is a Genome, which is displayed in a tabular genome viewer (see below) that shows information about the Genome as well as a list of contigs and the genes that were called on each contig.

Genomes can be annotated with Prokka or RAST.

  • Annotate Domains in a Genome – identifies protein domains from widely used domain libraries (COGs, TIGRfams, Pfam).

  • Annotate Assembly with Prokka – combines multiple open-source annotation tools in a quick and thorough annotation pipeline for prokaryotic sequences, including genomes, plasmids, and metagenomes.

  • Annotate Microbial Assembly – uses components from the RAST (Rapid Annotations using Subsystems Technology) toolkit to annotate an assembled bacterial or archaeal genome.

  • Annotate Microbial Genome – uses RAST to annotate a prokaryotic genome, to update the annotations of a genome, or to perform computations on a set of genomes so that they are consistent.

  • Annotate Plant Coding Sequences with Metabolic Functions – performs functional annotation of plant cDNA or protein sequences.

  • Bulk Annotate Genomes/Assemblies – uses components from the RAST (Rapid Annotations using Subsystems Technology) toolkit to annotate a set of genomes or assemblies.
