Programming languages: C / Bash shell script / AWK

Execution environment (recommended):
    Platform: Linux x86-64 (kernel 2.6.32 or later; recommended distribution: Bio-Linux 6 or later)
    Memory: at least 2 GB
    BLAT must be installed and executable from any directory.

Installation/Uninstallation:
    Type "sh INSTALL.sh" or "./INSTALL.sh" to install this tool.
    Type "sh UNINSTALL.sh" or "./UNINSTALL.sh" to uninstall this tool.

Initialization (needed only once, as long as the target species and its annotation remain unchanged):
    Init_ENSTs.sh [INPUT_ENSTS] [INPUT_ENSTS_CDS_PHASE]
    Init_Genomes.sh [GENOME] [DIR_CHROMOSOMES]

    [INPUT_ENSTS]: Gene annotation (csv/tsv) of the target species downloaded from Ensembl BioMart.
        Required 19 attributes (must follow the order below):
        Ensembl Gene ID, Ensembl Transcript ID, Chromosome Name, Gene Start (bp), Gene End (bp), Strand,
        Transcript Start (bp), Transcript End (bp), 5' UTR Start, 5' UTR End, 3' UTR Start, 3' UTR End,
        CDS Start, CDS End, Exon Chr Start (bp), Exon Chr End (bp), Gene Biotype, Transcript Biotype,
        Status (transcript).
    [INPUT_ENSTS_CDS_PHASE]: Another gene annotation (csv/tsv) of the target species downloaded from Ensembl BioMart.
        Required 11 attributes (must follow the order below):
        Ensembl Gene ID, Ensembl Transcript ID, Chromosome Name, Strand, Exon Chr Start (bp), Exon Chr End (bp),
        Exon Rank in Transcript, phase, CDS Start, CDS End, Transcript Biotype.
    [GENOME]: The whole genome (fasta) of the target species downloaded from the UCSC or Ensembl database.
    [DIR_CHROMOSOMES]: A user-specified directory for storing the target genome, which will be split into
        one file per chromosome name.

*****************************************************************************************************************************************************
NE-Extractor.sh: The core procedure of ExonFinder.sh.
Execution (usage):
    NE-Extractor.sh [BLAT_OUTPUT] [INPUT_ENSTS] [INPUT_ENSTS_CDS_PHASE] [DIR_CHROMOSOMES] \
                    [EXPRESSED_SEQUENCES] [CDNA_LIBRARY] [MAX_GAP_LEN] [TASK_NAME] [IS_INTER_SPECIES]

    [BLAT_OUTPUT]: An output file of BLAT (psl format).
    [INPUT_ENSTS]: As described in the initialization steps.
        Note: Be sure to run "Init_ENSTs.sh [INPUT_ENSTS] [INPUT_ENSTS_CDS_PHASE]" first.
    [INPUT_ENSTS_CDS_PHASE]: As described in the initialization steps.
        Note: Be sure to run "Init_ENSTs.sh [INPUT_ENSTS] [INPUT_ENSTS_CDS_PHASE]" first.
    [DIR_CHROMOSOMES]: As described in the initialization steps.
        Note: Be sure to run "Init_Genomes.sh [GENOME] [DIR_CHROMOSOMES]" first.
    [EXPRESSED_SEQUENCES]: The expressed sequences (e.g., 454/EST reads in fasta format) that were used to obtain [BLAT_OUTPUT].
    [CDNA_LIBRARY]: The cDNA (fasta) of the target species downloaded from the Ensembl database.
    [MAX_GAP_LEN]: Maximum gap length between contiguous segments aligned by BLAT.
        Two contiguous segments separated by a gap of no more than [MAX_GAP_LEN] are regarded as one segment.
    [TASK_NAME]: A label for the current task. Spaces are not allowed.
    [IS_INTER_SPECIES]: Logical value. Enter "1" if [BLAT_OUTPUT] was obtained from a cross-species BLAT alignment;
        otherwise, enter "0".

Output file: [TASK_NAME]_identified_candidates.tsv
    Columns in order:
        chr, start (1-based), end (1-based), strand, transcript ID, novel exonic length,
        AS type (CASSETTE or RETAIN), splicing site motifs, genomic type (3'UTR/5'UTR/CDS),
        coordinates of flanking exons, #supporting reads, supporting reads.

    For "start", "end", and "splicing site motifs":
        Two or more numbers/motifs separated by semicolons stand for events of multiple cassette-on exons.
    For "coordinates of flanking exons" (1-based):
        strand "+": upstream flanking exon 5' end, upstream flanking exon 3' end;
                    downstream flanking exon 5' end, downstream flanking exon 3' end
        strand "-": downstream flanking exon 3' end, downstream flanking exon 5' end;
                    upstream flanking exon 3' end, upstream flanking exon 5' end

*****************************************************************************************************************************************************
ExonFinder: The pipeline for deriving novel cassette exons and retained introns with cross-species support.

Step 1: Obtain candidates supported by cross-species ESTs.
    NE-Extractor.sh [NON-TARGET_BLAT_OUTPUT] [TARGET_ENSTS] [TARGET_ENSTS_CDS_PHASE] [TARGET_DIR_CHROMOSOMES] \
                    [NON-TARGET_EXPRESSED_SEQUENCES] [TARGET_CDNA_LIBRARY] [MAX_GAP_LEN] [NON-TARGET_TASK] 1

Step 2: Obtain candidates supported by target-species ESTs.
    NE-Extractor.sh [TARGET_BLAT_OUTPUT] [TARGET_ENSTS] [TARGET_ENSTS_CDS_PHASE] [TARGET_DIR_CHROMOSOMES] \
                    [TARGET_EXPRESSED_SEQUENCES] [TARGET_CDNA_LIBRARY] [MAX_GAP_LEN] [TARGET_TASK_NAME] 0

Step 3: Merge the candidates identified in Steps 1 and 2.
    Merge_Results.sh [NON-TARGET_TASK] [TARGET_TASK_NAME] [OPTION]

    [OPTION]:
        0: Only with cross-species support
        1: With target species support
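
Example (illustrative sketch, not part of the distributed tool):
    The three ExonFinder steps above can be chained in a small wrapper script. The sketch below only shows the
    order of the calls and the position of each argument. All file names, task labels, and the MAX_GAP_LEN value
    of 10 are hypothetical placeholders, and the helper functions "run" and "exonfinder_pipeline" are invented
    for illustration. By default it only prints the commands (dry run); setting DRY_RUN=0 executes them, which
    assumes that NE-Extractor.sh and Merge_Results.sh are on the PATH after installation and that the
    initialization steps have already been run.

```shell
#!/usr/bin/env bash
# Sketch only: every file name, task label, and numeric value below is a
# hypothetical placeholder, not something shipped with the tool.

# Print a command; execute it only when DRY_RUN=0 (the default is a dry run).
run() {
    printf '%s\n' "$*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

exonfinder_pipeline() {
    # Target-species inputs shared by Step 1 and Step 2.
    local ensts="target_ENSTs.tsv"            # [TARGET_ENSTS]
    local phase="target_ENSTs_CDS_phase.tsv"  # [TARGET_ENSTS_CDS_PHASE]
    local chr_dir="target_chromosomes"        # [TARGET_DIR_CHROMOSOMES]
    local cdna="target_cdna.fa"               # [TARGET_CDNA_LIBRARY]
    local max_gap=10                          # [MAX_GAP_LEN], arbitrary example value
    local cross_task="nontarget_task"         # [NON-TARGET_TASK], no spaces
    local target_task="target_task"           # [TARGET_TASK_NAME], no spaces
    local option=1                            # [OPTION]: 0 or 1, see Step 3

    # Step 1: cross-species alignment, so IS_INTER_SPECIES is 1.
    run NE-Extractor.sh nontarget_vs_target.psl "$ensts" "$phase" "$chr_dir" \
        nontarget_ESTs.fa "$cdna" "$max_gap" "$cross_task" 1 || return 1

    # Step 2: target-species alignment, so IS_INTER_SPECIES is 0.
    run NE-Extractor.sh target_vs_target.psl "$ensts" "$phase" "$chr_dir" \
        target_ESTs.fa "$cdna" "$max_gap" "$target_task" 0 || return 1

    # Step 3: merge the two candidate sets by task label.
    run Merge_Results.sh "$cross_task" "$target_task" "$option" || return 1
}

exonfinder_pipeline
```

    Printing each command before running it makes it easy to check the argument order against the usage lines
    above, since both scripts take purely positional arguments.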