Differences

This shows you the differences between two versions of the page.

tech [2019/02/15 19:53]
fplmarques
tech [2019/03/29 13:34] (current)
fplmarques [Extract features from NCBI/GenBank annotation]
Line 6: Line 6:
 </menu>
  
-======= Tools for Molecular Sistematics =======
+======= Tools for Molecular Systematics =======
 -------
 +===== Useful scripts =====
 +==== Extract features from NCBI/​GenBank annotation ====
 +
 +The code below extracts the nucleotide sequence of every CDS annotated in a *.gb file and writes the sequences in FASTA format.
 +
 +<code python | extract_CDS.py>
 +
 +## Script modified from https://www.biostars.org/p/230441/
 +# This script extracts CDS nucleotide sequences from mitochondrial RefSeq genomes
 +# usage: python extract_CDS.py input_file.gb output_file.fas
 +#
 +from Bio import SeqIO
 +import sys
 +
 +with open(sys.argv[2], 'w') as nfh:
 +    for rec in SeqIO.parse(sys.argv[1], "genbank"):
 +        for feature in rec.features:
 +            if feature.type == "CDS":
 +                # header: gene name and record name; assumes every CDS has a /gene qualifier
 +                nfh.write(">%s from %s\n%s\n" % (
 +                    feature.qualifiers['gene'][0],
 +                    rec.name,
 +                    feature.location.extract(rec).seq))
 +
 + </code>
 +
 +
 +The code below extracts the amino acid sequence of every CDS annotated in a *.gb file and writes the sequences in FASTA format.
 +
 +<code python | extract_CDSaa.py>
 +
 +## Script modified from https://www.biostars.org/p/230441/
 +# This script extracts CDS amino acid sequences from mitochondrial RefSeq genomes
 +# usage: python extract_CDSaa.py input_file.gb output_file.fas
 +#
 +from Bio import SeqIO
 +import sys
 +
 +with open(sys.argv[2], 'w') as ofh:
 +    for seq_record in SeqIO.parse(sys.argv[1], 'genbank'):
 +        for seq_feature in seq_record.features:
 +            if seq_feature.type == "CDS":
 +                # the sequence is read from the /translation qualifier of the annotation;
 +                # the script stops if a CDS does not carry exactly one translation
 +                assert len(seq_feature.qualifiers['translation']) == 1
 +                ofh.write(">%s from %s\n%s\n" % (
 +                    seq_feature.qualifiers['gene'][0],
 +                    seq_record.name,
 +                    seq_feature.qualifiers['translation'][0]))
 +
 + </code>
 +
 +The code below extracts the tRNA sequences annotated in a *.gb file and writes them in FASTA format.
 +
 +<code python | extract_tRNA.py>
 +
 +## Script modified from https://www.biostars.org/p/230441/
 +# This script extracts tRNA sequences from mitochondrial RefSeq genomes
 +# usage: python extract_tRNA.py input_file.gb output_file.fas
 +#
 +from Bio import SeqIO
 +import sys
 +
 +with open(sys.argv[2], 'w') as nfh:
 +    for rec in SeqIO.parse(sys.argv[1], "genbank"):
 +        for feature in rec.features:
 +            if feature.type == "tRNA":
 +                # header: product name and record name; assumes every tRNA has a /product qualifier
 +                nfh.write(">%s from %s\n%s\n" % (
 +                    feature.qualifiers['product'][0],
 +                    rec.name,
 +                    feature.location.extract(rec).seq))
 +
 + </code>
 +
 +The code below extracts the rRNA sequences annotated in a *.gb file and writes them in FASTA format.
 +
 +<code python | extract_rRNA.py>
 +
 +## Script modified from https://www.biostars.org/p/230441/
 +# This script extracts rRNA sequences from mitochondrial RefSeq genomes
 +# usage: python extract_rRNA.py input_file.gb output_file.fas
 +#
 +from Bio import SeqIO
 +import sys
 +
 +with open(sys.argv[2], 'w') as nfh:
 +    for rec in SeqIO.parse(sys.argv[1], "genbank"):
 +        for feature in rec.features:
 +            if feature.type == "rRNA":
 +                # header: product name and record name; assumes every rRNA has a /product qualifier
 +                nfh.write(">%s from %s\n%s\n" % (
 +                    feature.qualifiers['product'][0],
 +                    rec.name,
 +                    feature.location.extract(rec).seq))
 +
 + </code>
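
The four scripts above differ only in the feature type they select and in the qualifier used for the FASTA header, so they could be folded into one. The following is a minimal sketch of that idea, not part of the original page: `LABEL_KEYS`, `pick_label`, `extract_features`, the `unknown` fallback label, and the optional third command-line argument are all hypothetical names and choices introduced here for illustration.

```python
import sys

# Qualifiers tried, in order, for the FASTA header of each feature type.
# CDS is labelled by /gene and tRNA/rRNA by /product, as in the scripts above.
LABEL_KEYS = {
    "CDS": ("gene", "product"),
    "tRNA": ("product", "gene"),
    "rRNA": ("product", "gene"),
}


def pick_label(qualifiers, keys=("gene", "product")):
    """Return the first value of the first qualifier in `keys` that is present."""
    for key in keys:
        values = qualifiers.get(key)
        if values:
            return values[0]
    # Hypothetical fallback for features that carry none of the qualifiers.
    return "unknown"


def extract_features(gb_path, fasta_path, feature_type="CDS"):
    """Write every feature of `feature_type` found in a GenBank file as FASTA."""
    # Imported here so that pick_label can be used without Biopython installed.
    from Bio import SeqIO

    keys = LABEL_KEYS.get(feature_type, ("gene", "product"))
    count = 0
    with open(fasta_path, "w") as out:
        for rec in SeqIO.parse(gb_path, "genbank"):
            for feature in rec.features:
                if feature.type != feature_type:
                    continue
                label = pick_label(feature.qualifiers, keys)
                seq = feature.location.extract(rec).seq
                out.write(">%s from %s\n%s\n" % (label, rec.name, seq))
                count += 1
    return count


if __name__ == "__main__" and len(sys.argv) >= 3:
    # usage: python extract_features.py input_file.gb output_file.fas [CDS|tRNA|rRNA]
    wanted = sys.argv[3] if len(sys.argv) > 3 else "CDS"
    written = extract_features(sys.argv[1], sys.argv[2], wanted)
    print("wrote %d %s features" % (written, wanted))
```

Unlike the individual scripts, this sketch does not stop with a `KeyError` when a feature lacks the expected qualifier. It tries the alternative qualifier and then falls back to the placeholder label, which may or may not be what you want for your data.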
 +
 +Because each script processes a single file, you can automate sequential runs with the following shell loop (replace extract_CDS.py with the script you need):
 +
 + <code>for file in *.gb ; do python extract_CDS.py "${file}" "${file%.*}.fas" ; done </code>
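
The same batch run can be sketched in Python when a shell loop is not convenient. This is only an illustration under stated assumptions: `build_commands` and `run_all` are hypothetical helpers introduced here, and `extract_CDS.py` is used merely as an example script name.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(directory, script="extract_CDS.py"):
    """Return one command per *.gb file, naming the output <basename>.fas."""
    commands = []
    for gb_file in sorted(Path(directory).glob("*.gb")):
        # Swap the final extension for .fas, like ${file%.*}.fas in the shell.
        fasta_file = gb_file.with_suffix(".fas")
        commands.append([sys.executable, script, str(gb_file), str(fasta_file)])
    return commands


def run_all(directory, script="extract_CDS.py"):
    """Run the script once per GenBank file, stopping at the first failure."""
    for command in build_commands(directory, script):
        subprocess.run(command, check=True)
```

Calling `run_all(".")` from the directory that holds the *.gb files and the chosen script would process each file in alphabetical order.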
 +
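 +The script below splits a file of concatenated genomes in GenBank (*.gb) format into one file per genome, each named after the identifier on its LOCUS line.
 +
 +<code perl | get_individual_genomes.pl>
 +#!/usr/bin/perl
 +#
 +# usage:
 +# perl get_individual_genomes.pl input_file.gb
 +#
 +use strict;
 +use warnings;
 +
 +my $file = $ARGV[0] or die "usage: perl get_individual_genomes.pl input_file.gb\n";
 +open(my $in, '<', $file) or die "Cannot open $file: $!\n";
 +my $out;
 +while (my $line = <$in>) {
 +    # a LOCUS line starts a new record; the output file is named after its identifier
 +    if ($line =~ m/^LOCUS\s+(\w+)/) {
 +        open($out, '>', "$1.gb") or die "Cannot write $1.gb: $!\n";
 +    }
 +    # copy every line of the current record, including LOCUS and the closing //
 +    print {$out} $line if defined $out;
 +    # the // terminator ends the record
 +    if (defined $out && $line =~ m{^//}) {
 +        close($out);
 +        undef $out;
 +    }
 +}
 +close($in);
 + </code>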
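
For readers who prefer to stay in Python, the same splitting logic can be sketched with the standard library alone. This is an illustrative equivalent, not the script from the page: `split_genbank` and `write_records` are hypothetical names, and the sketch assumes that every record starts with a LOCUS line and ends with a `//` line.

```python
import re
from pathlib import Path

# The identifier is the first word after LOCUS, as in the Perl script.
LOCUS_RE = re.compile(r"^LOCUS\s+(\w+)")


def split_genbank(text):
    """Map each LOCUS identifier to the full text of its record."""
    records = {}
    name, lines = None, []
    for line in text.splitlines(keepends=True):
        match = LOCUS_RE.match(line)
        if match:
            name, lines = match.group(1), []
        if name is None:
            continue  # skip anything outside a record, e.g. blank separators
        lines.append(line)
        if line.startswith("//"):
            # The terminator closes the record; unterminated records are dropped.
            records[name] = "".join(lines)
            name = None
    return records


def write_records(gb_path, out_dir="."):
    """Write one <LOCUS>.gb file per record and return the paths written."""
    records = split_genbank(Path(gb_path).read_text())
    paths = []
    for name, record in records.items():
        path = Path(out_dir) / (name + ".gb")
        path.write_text(record)
        paths.append(path)
    return paths
```

Keeping the parsing in `split_genbank` separate from the file writing makes it easy to check the split on a small string before touching any files.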
 ===== Sequence submission to the NCBI/GenBank =====
  
Line 280: Line 407:
 --------
  
-==== Wormbox v. 1.0 ====
+===== Wormbox v. 1.0 =====
  
 WormBox (Vellutini & Marques, 2011-14) is a plugin for FIJI/ImageJ written to help you automate the collection of measurements from sets of images. The macro is based on plotting landmarks on images, from which a table of linear distances is generated. You can also use it to count structures (//i.e.//, meristic variables). We have not tested how time-efficient this plugin is compared with the traditional methods most people use to obtain measurements from specimens. We expect it will vary a lot among cases. Ultimately, it will depend on how easily (and quickly) you can produce images from which you can extract the measurements you need. If that is not an issue, we believe you will find that this application speeds up data gathering. You should also consider that, even if the application does not speed up your work, using it will provide you with full documentation of your measurements, which is usually not the case with traditional methods. Be that as it may, give it a try. We will be happy if it turns out to be a good tool for your research. To download the most recent version of the plugin, [[https://github.com/nelas/WormBox|press here]].