Import Job Errors

This page lists error messages found in the Job Log for Import Jobs, explains what they mean, and describes how to fix them or when to submit a ticket.

Types of Import Job Errors

UE=User fixable or possible user error
KE=KBase error

Use your browser's search tool to find your error message on this page and locate possible fixes or next steps.

You can always submit a ticket for help, questions, or follow-up to the KBase Help Board.

Common Import Job Error Messages

`Cannot connect to URL: ftp://ftp.imicrobe.us. ...`

UE: The provided URL cannot be accessed from within KBase.

Recheck the URL and permissions. Then try resubmitting the job.

`Invalid FTP Link: ...`

UE: The provided URL cannot be accessed from within KBase. Perhaps the ‘Direct’ download option should be specified instead of ‘FTP’ (e.g., when downloading from the SRA).

Recheck the URL and permissions. Then try resubmitting the job.

`Invalid Google Drive Link ...`

UE: The provided URL cannot be accessed from within KBase.

Recheck the URL and permissions. Then try resubmitting the job.

Bulk or Batch Imports

`(2, No such file or directory)`

UE: The file is not in the Staging Area.

Verify that the name is correct and upload is complete. Then try resubmitting the job.

Import FASTQ/SRA File as Reads from Staging Area

`(2, No such file or directory)`

KE: fastq-dump ran, but the output file names are not the expected names.

Use the long workaround here.

`SRA input file type selected. But missing SRA file`

UE: The format of the file is not recognized.

Recheck the file and try resubmitting the job.

`Invalid FASTQ file ...`

KE/UE: Sometimes the user has specified the wrong file name. It can also happen because the importer has problems with file names that end in ".1".

Use the long workaround here.

`Error running command: /kb/deployment/bin/fastq-dump ...`

UE: The file does not appear to be in the expected SRA format.

Recheck the file and try resubmitting the job.

`Error running command:pigz ...`

UE: The file could not be unzipped by KBase and most likely couldn’t be unzipped by the user either.

Verify that the file can be unzipped locally.
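
One way to run that local check for gzip-compressed files is sketched below with Python's standard `gzip` module. The function name `can_gunzip` is hypothetical, and the sketch only covers gzip files, not other compression formats.

```python
import gzip
import zlib


def can_gunzip(path, chunk_size=1 << 20):
    """Return True if the whole gzip file at `path` decompresses without error."""
    try:
        with gzip.open(path, "rb") as handle:
            # Read to the end so truncated or corrupt data is detected.
            while handle.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # OSError covers "not a gzip file"; EOFError covers truncated files.
        return False
```

If this returns False for the file on your own machine, re-create or re-download the file before uploading it to the Staging Area again.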

`Both SRA and FASTQ/FASTA file given.`

UE: The inputs should be either all FASTQ/FASTA or all SRA.

Modify the inputs, then try resubmitting the job.

`Same file [XXX.XXXX.gz] is used for forward and reverse. Please select different files and try again.`

UE: There are names for both a forward and reverse strand and they are identical.

A Single-end read library only needs one name. A Paired-end read library needs two files with different names.

`File /kb/XXX.fasta is not a FASTQ file`

UE: Either the file is not in fastq format or the file extension is not recognized.

Recheck that the file is in the right format. Change the extension to .fastq if needed, then try resubmitting the job.

`Invalid FASTQ file`

UE: Possible issues:

- The fastq file includes one or more sequences that are shorter than 10 bases. Short reads are a problem for some tools.
- The fastq file doesn't have the right number of lines. For example, the number of lines in a single-end file needs to be a multiple of four, and in an interleaved paired-end library it should be a multiple of eight.
- The options haven't been selected correctly. For example, using an interleaved fastq file but failing to check the Interleaved box. The documentation on FASTQ/SRA Reads may be helpful.
- The file might not have the right filename to be recognized.
- The file is an SRA file and not FASTQ.
- The file contains DOS-style carriage returns along with newlines. Our validation doesn't handle this properly. To remove the carriage-return characters, use this Unix command: `tr -d '\015' < 1.fastq > cleaned_1.fastq`
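
Several of the issues listed above can be checked locally before resubmitting. The following is a minimal sketch, not the validation KBase itself runs. The function name `check_fastq_lines` and its parameters are hypothetical, and it assumes the usual four-lines-per-record FASTQ layout.

```python
def check_fastq_lines(lines, interleaved=False, min_length=10):
    """Return a list of problems found in an iterable of FASTQ text lines."""
    lines_per_unit = 8 if interleaved else 4
    non_blank = 0
    short_reads = 0
    has_carriage_return = False
    for line in lines:
        if "\r" in line:
            has_carriage_return = True
        stripped = line.strip()
        if not stripped:
            continue
        non_blank += 1
        # In a four-line record, the sequence is the second line.
        if non_blank % 4 == 2 and len(stripped) < min_length:
            short_reads += 1
    problems = []
    if non_blank % lines_per_unit != 0:
        problems.append(
            f"{non_blank} non-blank lines is not a multiple of {lines_per_unit}"
        )
    if short_reads:
        problems.append(f"{short_reads} read(s) shorter than {min_length} bases")
    if has_carriage_return:
        problems.append("file contains carriage-return characters")
    return problems
```

When reading from a file, open it with `open(path, newline="")` so that Python does not silently translate carriage returns before the check sees them.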

`Reading FASTQ record failed - non-blank lines are not a multiple of four.`

UE: The number of lines in the FASTQ file is not a multiple of four.

Recheck the file and try resubmitting the job.

`Interleave failed - reads files do not have an equal number of records….`

UE: Something went wrong trying to interleave the Paired-end files.

Recheck the line count of the files. Hidden carriage returns or linefeeds in the file could contribute to the problem.
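
To compare the two paired-end files locally, a sketch along these lines may help. The function names are hypothetical, and a record is assumed to be four non-blank lines.

```python
def count_fastq_records(lines):
    """Count FASTQ records, assuming four non-blank lines per record."""
    non_blank = sum(1 for line in lines if line.strip())
    if non_blank % 4 != 0:
        raise ValueError(f"{non_blank} non-blank lines is not a multiple of four")
    return non_blank // 4


def same_record_count(forward_lines, reverse_lines):
    """Return True if both inputs hold the same number of FASTQ records."""
    return count_fastq_records(forward_lines) == count_fastq_records(reverse_lines)
```

For files on disk, pass open file handles, for example `with open(fwd) as f, open(rev) as r: same_record_count(f, r)`, where `fwd` and `rev` are your own file paths.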

`Deinterleave failed - line count is not divisible by 8`

UE: The interleaved file does not appear to be in the correct format.

Recheck the file and try resubmitting the job.

`Object 1: Illegal character in object name`

UE: The name of the output reads object can’t have spaces or special characters.

Rename the output file and then try resubmitting the job.
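
As an illustration, a name can be screened before submission. The exact set of characters KBase accepts is not stated here, so the sketch below assumes a deliberately conservative rule (letters, digits, underscore, period, and hyphen). Both function names are hypothetical.

```python
import re

# Assumed conservative rule, not KBase's official one:
# letters, digits, underscore, period, and hyphen only.
SAFE_NAME = re.compile(r"[A-Za-z0-9_.-]+")
UNSAFE_RUN = re.compile(r"[^A-Za-z0-9_.-]+")


def is_safe_object_name(name):
    """Return True if `name` uses only the assumed safe characters."""
    return SAFE_NAME.fullmatch(name) is not None


def make_safe_object_name(name):
    """Replace each run of other characters (including spaces) with '_'."""
    return UNSAFE_RUN.sub("_", name.strip())
```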

Import FASTA File as Assembly from Staging Area

`There are no contigs to save, thus there is no valid assembly.`

UE: There are no contigs that passed the minimum contig size.

Adjust the minimum contig size or other optional parameters. Then try resubmitting the job.

`The FASTA header key XXX appears more than once in the file`

UE: The FASTA header lines may not be unique.

Recheck the format of the header lines and try resubmitting the job.
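
A quick way to find repeated headers locally is sketched below. It assumes the header key is the first whitespace-delimited token after `>`, which is a common convention but is not confirmed here as the importer's exact rule. The function name is hypothetical.

```python
from collections import Counter


def duplicate_fasta_keys(lines):
    """Return the sorted header keys that occur more than once."""
    counts = Counter()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            # Assumption: the key is the first token of the header line.
            counts[fields[0] if fields else ""] += 1
    return sorted(key for key, n in counts.items() if n > 1)
```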

`This FASTA file has non nucleic acid characters`

UE: The file appears to contain protein sequences or special characters instead of DNA.

Recheck the file contents, and then try resubmitting the job.

`This FASTA file may have amino acids in it instead of the required nucleotides.`

UE: The file appears to contain protein sequences instead of DNA.

Recheck the file contents, and then try resubmitting the job.
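
For either of the two preceding messages, the sketch below locates unexpected characters in a FASTA file. It assumes that the IUPAC nucleotide codes are the acceptable alphabet; the set the importer actually accepts is not stated here. The function name is hypothetical.

```python
# Assumption: IUPAC nucleotide codes (including U and ambiguity codes).
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")


def non_nucleotide_characters(lines):
    """Map each unexpected character to the first line number where it occurs."""
    found = {}
    for line_number, line in enumerate(lines, start=1):
        if line.startswith(">"):
            continue  # header lines are not sequence data
        for char in line.strip().upper():
            if char not in IUPAC_NUCLEOTIDES and char not in found:
                found[char] = line_number
    return found
```

An empty result does not prove the file is DNA, since a short protein sequence could consist only of letters that are also nucleotide codes.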

`FASTQ/FASTA input file type selected. But missing FASTQ/FASTA file`

UE: Selected file does not match the import selected.

Select a valid combination and try resubmitting the job.

`(\utf-8\, b\PK\x03\x04\x14\x00\x08…….`

UE: Attempt to import a zip file with multiple files as a single data object.

Run the App 'Unpack a Compressed File in Staging Area' on the file, then try resubmitting the job.
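
The bytes `PK\x03\x04` in the message are the signature that begins a zip archive. Before uploading, a file can be checked locally with Python's standard `zipfile` module, as in this sketch (the function name is hypothetical):

```python
import zipfile


def zip_members(path):
    """Return the member names if `path` is a zip archive, otherwise None."""
    if not zipfile.is_zipfile(path):
        return None
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()
```

If more than one name is returned, the archive holds multiple files and needs to be unpacked before any single file is imported.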

Import GenBank File as Genome from Staging Area

`Duplicate gene ID: XXXX_xxxx`

UE: Gene IDs within the input file are not unique.

Edit gene IDs and try resubmitting the job.

`The input directory does not have any files with one of the following extensions .gbff,.gbk,.gb,.genbank,.dat,.gbf`

UE: The app only recognizes the listed file extensions as valid GenBank files.

Change the file extension and try resubmitting the job.

`XXX is not a valid KBase taxon ID`

UE: The Taxonomy ID in the advanced parameters is optional and needs to be an integer when specified. The user provided the text string ‘XXX’.

Use an integer taxon ID or leave it blank. The information will be picked up from the GenBank file or from the scientific name.

Import GFF3/FASTA file as Genome from Staging Area

`Every feature sequence id must match a fasta sequence id`

UE: The IDs in the ‘sequence source’ lines of the GFF file must match the header lines in the FASTA file.

Correct the GFF format and try resubmitting the job.
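
To see which IDs are mismatched, the sequence IDs in the first column of the GFF file can be compared with the FASTA headers. This is a sketch under two assumptions: the FASTA ID is the first token after `>`, and the GFF file is tab-delimited as in GFF3. The function names are hypothetical.

```python
def fasta_ids(fasta_lines):
    """Collect the first token of each FASTA header line."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def gff_ids_missing_from_fasta(gff_lines, fasta_lines):
    """Return the sorted GFF sequence IDs that have no matching FASTA header."""
    known = fasta_ids(fasta_lines)
    missing = set()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section, no more feature lines
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        seqid = line.split("\t", 1)[0]
        if seqid not in known:
            missing.add(seqid)
    return sorted(missing)
```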

`unable to parse >.....`

UE: The file may not be in GFF format.

Recheck the file format and try resubmitting the job.

`Features must be completely contained within the Contig in the Fasta file.`

UE: The coordinates for the feature are outside the bounds of the contig.

Recheck the file where indicated and try resubmitting the job. In rare instances, the GFF file contains a feature that wraps around the 0 position and the coordinates look like the feature goes off the end of the sequence. The options are to 1) remove the feature from the GFF file, 2) edit the feature so that it is in two parts, or 3) find a GenBank formatted version of the file and resubmit.
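
The offending features, including any that wrap around the 0 position, can be located locally by comparing GFF coordinates with contig lengths. The sketch below assumes GFF3 column order (start in column 4, end in column 5, both 1-based) and uses hypothetical function names.

```python
def contig_lengths(fasta_lines):
    """Map each FASTA ID (first header token) to its sequence length."""
    lengths = {}
    current = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            fields = line[1:].split()
            current = fields[0] if fields else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def out_of_bounds_features(gff_lines, lengths):
    """Return (line number, seqid, start, end, contig length) for bad features."""
    bad = []
    for number, line in enumerate(gff_lines, start=1):
        if line.startswith("##FASTA"):
            break
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            continue  # not a complete feature line
        seqid, start, end = cols[0], int(cols[3]), int(cols[4])
        length = lengths.get(seqid)
        if length is not None and (start < 1 or end > length):
            bad.append((number, seqid, start, end, length))
    return bad
```

Features on sequence IDs that are absent from the FASTA file are skipped here, since that is a separate error.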