FAQ: Assembly and Annotation
This page on transferring data from JGI to KBase takes you through the process. You can also start from the KBase search and use the JGI tab.
In general, a genome that is >90% complete and <5% contaminated is considered high quality. A rough guide to the quality of MAGs and SAGs is also available.
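The rule of thumb above can be expressed as a small check. This is a minimal sketch, not a KBase API: the function name `is_high_quality` and the example bins are hypothetical, and the completeness and contamination values are assumed to be percentages you have already estimated with a quality-assessment tool.

```python
def is_high_quality(completeness: float, contamination: float) -> bool:
    """Return True when a genome meets the >90% complete, <5% contaminated rule.

    Both arguments are percentages in the range 0-100.
    """
    return completeness > 90.0 and contamination < 5.0


# Hypothetical bins with made-up estimates, for illustration only.
example_bins = {
    "bin_001": (97.2, 1.3),
    "bin_002": (88.5, 2.0),
    "bin_003": (95.0, 7.4),
}

for name, (completeness, contamination) in example_bins.items():
    if is_high_quality(completeness, contamination):
        label = "high quality"
    else:
        label = "below the high-quality threshold"
    print(f"{name}: {label}")
```

Only the first bin passes here: the second falls short on completeness and the third on contamination.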
Assemblers currently have an upper limit of between 180,263,840 paired reads and 240,351,788 reads, depending on the complexity of the data. If the job has been run twice, has exceeded the 7-day limit, and your data is in this size range, it may be too large for KBase at this time.
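Before resubmitting a long-running job, it may help to compare your library size against the figures quoted above. The sketch below is only an illustration: the function name `assembly_size_risk` is hypothetical, and it assumes your read count is expressed in the same units as those figures.

```python
# Figures quoted above; the real limit varies with data complexity.
LOWER_LIMIT = 180_263_840
UPPER_LIMIT = 240_351_788


def assembly_size_risk(read_count: int) -> str:
    """Describe where a read count falls relative to the reported limit range."""
    if read_count < LOWER_LIMIT:
        return "below the reported limit range"
    if read_count <= UPPER_LIMIT:
        return "within the reported limit range"
    return "above the reported limit range"


# Hypothetical library sizes, for illustration only.
for reads in (50_000_000, 200_000_000, 300_000_000):
    print(f"{reads:,} reads: {assembly_size_risk(reads)}")
```

A count within or above the range does not guarantee failure; it only suggests that size is a likely cause if the job has already timed out.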
The “best” assembler often depends on the user. Sometimes the user may want the most contiguous metagenome, or an assembly with minimized assembly artifacts. Users can use multiple assemblers and choose whichever results in the best assembly for their purpose.
A KBase app is available to compare the contig distributions of Assembly objects.
In addition to offering different options in their apps, RAST and Prokka assign annotations by different methods. Which one is better is in the eye of the beholder. The primary advantage of RAST is its link to our metabolic modeling. The RAST functional roles are treated as a controlled vocabulary, and specific RAST annotations are mapped to biochemical reactions in the model, so if you plan to build metabolic models, you should annotate with RAST. Because RAST tends to assign more hypothetical proteins, some people run Prokka first and then reannotate with RAST using the "Retain old annotation for hypotheticals" option.
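The idea behind running Prokka first and then RAST while retaining old annotations for hypotheticals can be illustrated with a toy merge. This is a sketch of the rule as described above, not KBase's implementation: the function names, the dictionary layout, and the simple substring test for "hypothetical" are all assumptions made for illustration.

```python
from typing import Dict


def is_hypothetical(function: str) -> bool:
    """Assumption: an annotation counts as hypothetical if it contains that word."""
    return "hypothetical" in function.lower()


def retain_old_for_hypotheticals(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, str]:
    """Prefer the new annotation, but keep the old one where the new one is hypothetical."""
    merged = {}
    for gene_id, new_function in new.items():
        old_function = old.get(gene_id, "")
        keep_old = (
            is_hypothetical(new_function)
            and old_function != ""
            and not is_hypothetical(old_function)
        )
        merged[gene_id] = old_function if keep_old else new_function
    return merged


# Made-up annotations, for illustration only.
first_pass = {
    "gene_1": "DNA polymerase III subunit beta",
    "gene_2": "hypothetical protein",
}
second_pass = {
    "gene_1": "hypothetical protein",
    "gene_2": "hypothetical protein",
    "gene_3": "Chromosomal replication initiator protein DnaA",
}

for gene_id, function in retain_old_for_hypotheticals(first_pass, second_pass).items():
    print(f"{gene_id}: {function}")
```

In this example `gene_1` keeps its first-pass annotation, `gene_2` stays hypothetical because neither pass named it, and `gene_3` takes the second-pass annotation.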
Manual curation of annotations is not supported on-system. RAST and Prokka are likely sufficient for many applications, but for difficult-to-annotate or highly divergent metabolic genes you may need additional tools. Beyond RAST and Prokka, on-system tools are available for feature annotation using pre-generated hidden Markov models, which could be useful for higher-resolution annotation.
For fungi, it is recommended to use external tools and then import the annotated genomes into KBase.
Yes, but results are currently unstable above 200 M reads (Illumina 2 × 150 bp). Use the appropriate KBase app to get a combined Reads Library object.
These take you through the process. You can also start from the KBase search and use the JGI tab.
One app takes an Assembly object, calls genes using algorithms from Prodigal and Glimmer, and then performs functional annotation.
The other takes a Genome object and does not call genes, preserving the original gene calls instead. It then re-annotates the genes, overwriting previous annotations.
This tool primarily annotates bacteria and archaea, and may be of limited use for protists.
The tools available for plant annotation use the curated PlantSEED database.
Currently there is a tool for constructing draft metabolic models of fungal species that uses highly curated, published fungal models as the underlying biochemistry data.