The method described above is implemented as C programs and C-shell scripts.
Some limitations of the current implementation are listed below:
- Processing RNA sequences longer than 200 nucleotides is time-consuming (as with all combinatorial approaches based on a branch and bound algorithm).
- This release cannot handle consensus secondary structures made of more than 6 helices (a new version will overcome this limit).
- It is designed to run in a UNIX environment (checked on Solaris 2.5.1).
Download
Download rna.tar, then execute the command tar xvf rna.tar. This will create an rna directory that contains the software to install.
Install
Run the script rnainstall, which is in the rna directory, then log out and log in again. Note that rnainstall will modify your .cshrc file.
Compile
Run the script compile, which is in the rna directory. You need to have the gcc compiler.
Run
The program should be executed from the directory containing your sequences (as an example, the rna directory contains a data subdirectory that contains the file rnaSeq). The program takes RNA sequences (inputs) and marks the occurrences of the candidate consensus structures (outputs) in the sequences. It is made of 2 scripts, alea and rna, that you must run in this order (although they are located in the bin directory, you can run them from anywhere).
- alea is used to create a random model of the RNA sequences: it takes a set of unaligned RNA sequences, randomly shuffles them, and outputs a file containing the frequencies of all possible secondary structures (having at most 6 helices) of these random sequences.
syntax: alea N Mod < Seq
o N is the number of RNA sequences taken from Seq (from the beginning)
o Mod is the name of the output file that will contain the random model of the RNA sequences
o Seq is the name of the input file containing the RNA sequences (see the file rnaSeq in the data directory for the format of the sequences)
example: alea 100 rnaMod < rnaSeq
This creates the random model of 100 RNA sequences contained in the file rnaSeq; rnaMod is the name of the file that will contain the random model.
- rna locates the occurrences of candidate consensus secondary structures in the set of unaligned RNA sequences, using a random model of these sequences.
syntax: rna N P Mod < Seq > Res
o N (the same N as in alea) is the number of RNA sequences
o P is the percentage of RNA sequences in which any candidate consensus structure must be present
o Mod is the random model computed by alea
o Seq (the same Seq as in alea) is the file containing the RNA sequences
o Res is the name of the file that will contain the results
example: rna 100 85 rnaMod < rnaSeq > rnaRes
This computes the candidate secondary structures that are present in at least 85% of the 100 RNA sequences contained in rnaSeq and whose frequencies in the random model rnaMod are low. The results are written to rnaRes (see the file rnaRes in the data directory for the format of the outputs).
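As a worked illustration of the parameter P, the sketch below computes how many sequences "at least P% of N" corresponds to. It assumes the threshold is rounded up to a whole number of sequences; that rounding rule is an assumption and is not taken from the program itself. The helper name min_sequences is hypothetical, and the snippet is written for a Bourne-style shell (sh or bash), not for csh.

```shell
# Hypothetical helper: smallest whole number of sequences that reaches
# P percent of N, i.e. ceil(N * P / 100), using integer arithmetic only.
# Assumption: the threshold is rounded up; rna itself may round differently.
min_sequences() {
    n=$1    # number of RNA sequences (the N given to alea and rna)
    p=$2    # required percentage (the P given to rna)
    echo $(( (n * p + 99) / 100 ))
}

min_sequences 100 85   # prints 85
min_sequences 7 85     # prints 6
```

With N = 100 and P = 85, as in the example above, a candidate structure would have to occur in at least 85 sequences under this assumption.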
rna is itself made of 2 scripts, sol and struct, that can be run separately (in this order):
- sol is used to form the sets of (possible) helices having the best free energies for each RNA sequence in a set. This computation uses a branch and bound algorithm and is very time-consuming, so we advise against processing sequences longer than 200 nucleotides.
syntax: sol N < Seq > fSol
o N (the same N as in alea) is the number of RNA sequences
o Seq (the same Seq as in alea) is the file containing the RNA sequences
o fSol is the name of the file that will contain the sets of possible helices
example: sol 100 < rnaSeq > rnaSol
This computes the sets of possible helices for each of the 100 RNA sequences contained in rnaSeq. The results are written to rnaSol.
- struct is used to locate the occurrences of candidate consensus secondary structures from the sets of possible helices of each RNA sequence in a set and from a random model of these sequences.
syntax: struct P Mod < fSol > Res
o P is the percentage of RNA sequences that must contain any candidate consensus structure
o Mod is the random model computed by alea
o fSol (the same fSol as in sol) is the file containing the sets of possible helices of the RNA sequences used in sol (note that fSol also contains the RNA sequences)
o Res is the name of the file that will contain the results
example: struct 85 rnaMod < rnaSol > rnaRes
This computes the candidate secondary structures, from the sets of possible helices contained in rnaSol, that are present in at least 85% of the corresponding RNA sequences and whose frequencies in the random model rnaMod are low. The results are written to rnaRes (see the file rnaRes in the data directory for the format of the outputs).
Running struct after sol is worthwhile because struct is based on a fast algorithm, so you can experiment with different values of the parameter P without rerunning sol.
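Trying several values of P can be sketched as a small loop that reuses one sol output. This is only a sketch under stated assumptions: it is written for a Bourne-style shell (sh or bash) instead of csh, it assumes struct is on your PATH after installation, and the function name sweep_p, its argument order, and the prefix.P naming of the result files are hypothetical choices made for illustration.

```shell
# Sketch only: run struct once per value of P, reusing the same helix sets.
# sweep_p and the "<prefix>.<P>" result file naming are hypothetical.
sweep_p() {
    mod=$1      # random model file computed by alea
    fsol=$2     # helix sets file computed by sol
    prefix=$3   # prefix for the result files
    shift 3     # remaining arguments are the values of P to try
    for p in "$@"; do
        # Same call as in the struct syntax: struct P Mod < fSol > Res
        struct "$p" "$mod" < "$fsol" > "$prefix.$p" || return 1
    done
}

# Typical use with the example file names from this page:
#   alea 100 rnaMod < rnaSeq
#   sol 100 < rnaSeq > rnaSol
#   sweep_p rnaMod rnaSol rnaRes 75 85 95
```

The slow step (sol) runs once, and each value of P only costs one struct run; the results for P = 85 would end up in rnaRes.85 under this naming.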