A semi-automated pipeline is available for running NCBI BLAST on the RCC rcluster.
It splits a large query file containing many sequences into smaller input files and runs NCBI blastall on each of them.
rccbatchblast - given sequences in FASTA format, find similar sequences in a BLAST database on rcluster. It splits the input file into chunks and submits all chunks to the queue. It accepts all standard NCBI blastall options, plus two more:
* -s The number of sequences in each unit. The input sequence file is split into many smaller files; this option sets how many sequences go into each one.
* -q The name of the queue to which the jobs are submitted. For more detail about queues, please refer to rcc queue.
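The splitting step can be pictured with a small sketch. This is not the actual rccbatchblast implementation, only an illustration of cutting a FASTA file into units of N sequences each. The function name split_fasta and the output naming scheme are assumptions made for this example.

```shell
# Sketch only: split a FASTA file into chunks of at most N sequences.
# split_fasta and the <prefix>NNNN.fasta naming are hypothetical,
# not part of rccbatchblast.
split_fasta() {
    input=$1      # path to the FASTA file
    per_chunk=$2  # sequences per chunk (the role -s plays above)
    prefix=$3     # output prefix, e.g. chunk_
    awk -v n="$per_chunk" -v prefix="$prefix" '
        /^>/ {
            # A header line starts a new sequence; open a new
            # chunk file after every n sequences.
            if (count % n == 0) {
                if (out != "") close(out)
                out = sprintf("%s%04d.fasta", prefix, count / n)
            }
            count++
        }
        # Write header and sequence lines to the current chunk.
        out != "" { print > out }
    ' "$input"
}

# Example (hypothetical file name):
#   split_fasta queries.fasta 100 chunk_
```

With 100 sequences per chunk, a 250-sequence input would yield three files under this sketch, the last holding the remaining 50 sequences.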
Search Result utilities
rccbatchblast-check - check the results of rccbatchblast
* After submitting your job, check whether your jobs are done.
* If all jobs succeed, the BLAST results are merged into check.blast, and the number of input sequences, the number of result queries, and the total CPU time are summarized.
* If jobs failed, or there are duplicated results in units, the suspicious folders are backed up with the prefix e + original folder name; commands for cleanup and resubmission are given in the report.
* Please check and analyze the errors, then resubmit. All results are written to check.report.
* In: original FASTA file
* Out: check.report, check.blast
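To cross-check the number of input sequences in the summary, the sequences in the original FASTA file can be counted independently. The helper below is a minimal sketch. The name count_fasta_seqs is an assumption for this example, and it says nothing about how rccbatchblast-check does its own counting.

```shell
# Sketch only: count sequences in a FASTA file by counting header lines.
# count_fasta_seqs is a hypothetical helper, not part of the RCC tools.
count_fasta_seqs() {
    # Every FASTA record begins with a line starting with ">".
    grep -c '^>' "$1"
}

# Example (hypothetical file name):
#   count_fasta_seqs queries.fasta
```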
Note: DO NOT use "submit job to the queue".
rccbatchblast is a script that already takes care of submitting to the queue.
Apart from the command name "rccbatchblast", the options are the same as for NCBI BLAST, plus the options for queue name and chunk size. Please refer to Blast.
rccbatchblast -i inputfile -o outputfile -d targetdatabase -p program-name -b bValue -v vValue -s number-of-sequences-in-split-unit -q queue-name -m mValue
where the defaults are bValue=1; vValue=1; number-of-sequences-in-split-unit=100; queue-name=r1-96h; mValue is N/A. Refer to queue at rcluster for more queue-name options.
bjobs -u your-user-name
where your-user-name is the user who ran rccbatchblast above.
rccbatchblast-check infile
where infile is the original input FASTA file that was blasted.
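Because rccbatchblast submits jobs itself, it can help to assemble and review the command line before running it. The sketch below only prints a command and runs nothing. The function name rccbatchblast_cmd is hypothetical, it uses only the -i, -o, -d, -p, -s and -q options listed above, and the file and database names in the example are placeholders.

```shell
# Sketch only: print (do not run) an rccbatchblast command line for review.
# rccbatchblast_cmd is a hypothetical helper; all file and database names
# passed to it are placeholders chosen by the caller.
rccbatchblast_cmd() {
    infile=$1
    outfile=$2
    db=$3
    program=$4
    chunk=${5:-100}      # -s default stated above
    queue=${6:-r1-96h}   # -q default stated above
    printf 'rccbatchblast -i %s -o %s -d %s -p %s -s %s -q %s\n' \
        "$infile" "$outfile" "$db" "$program" "$chunk" "$queue"
}

# Example (hypothetical names):
#   rccbatchblast_cmd queries.fasta queries.blast mydb blastn 200
```

Once the printed line looks right, it can be copied to the shell on rcluster and run as is.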