After properly installing the BRIE Python package, two executables can be run
directly from the command line: brie and brie-diff. From version 0.2.0, all
preprocessing has been split out and moved into the BRIE-kit package, which is
intended to be used with Python 2 only.
brie is the main program, which quantifies the exon inclusion fraction (Psi). To learn the informative prior automatically, predictive features are required. There are two ways to get the annotation and sequence features.
Once you have the feature file, pass it to brie and run it like this:
brie -a AS_events/SE.gold.gtf -s Cell1.sorted.bam -f mouse_features.csv.gz -o out_dir -p 15
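To run the same command over several cells, the argument list can be assembled programmatically. The following is a minimal Python sketch, not part of BRIE itself: the helper brie_command, the cell names and the per-cell output directories are assumptions made for illustration, and only the flags shown in the command above are used.

```python
import shlex


def brie_command(bam, out_dir, anno="AS_events/SE.gold.gtf",
                 features="mouse_features.csv.gz", nproc=15):
    """Build the brie argument list for one cell (hypothetical helper)."""
    return ["brie", "-a", anno, "-s", bam, "-f", features,
            "-o", out_dir, "-p", str(nproc)]


# Hypothetical cell names; each cell gets its own output directory here.
cells = ["Cell1", "Cell2"]
commands = [brie_command(f"{cell}.sorted.bam", f"{cell}_out") for cell in cells]

for cmd in commands:
    # Print a shell-safe version of each command for inspection.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Once brie is installed, each list could be handed to subprocess.run(cmd, check=True) instead of being printed.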
By default, you will have three output files in the out_dir: fractions.tsv,
weights.tsv and samples.csv.gz.
In fractions.tsv, there are 8 columns.
In weights.tsv, there are the weights for the Bayesian regression, with
#Feature+2 lines: one for each feature, plus the intercept and sigma (a
hyperparameter). Each line has two columns: the label and the value.
In samples.csv.gz, there are the MCMC samples of the posterior distribution of
Psi. These samples are used to detect differential splicing.
There are more parameters for setting (brie -h always gives the version you
are using):
Usage: brie [options]
Options:
-h, --help show this help message and exit
-a ANNO_FILE, --anno_file=ANNO_FILE
Annotation file for genes and transcripts in GTF or
GFF3
-s SAM_FILE, --sam_file=SAM_FILE
Sorted and indexed bam/sam files, use ',' for
replicates e.g., rep1.sorted.bam,sam1_rep2.sorted.bam
-o OUT_FILE, --out_file=OUT_FILE
Prefix of the output files with full path
-f FACTOR_FILE, --factor_file=FACTOR_FILE
Features in csv.gz file to predict isoform expression.
Optional arguments:
-p NPROC, --nproc=NPROC
Number of subprocesses [default: 4]
-w WEIGHT_FILE, --weight_file=WEIGHT_FILE
File with weights, an output of Brie.
-y FTYPE, --ftype=FTYPE
Type of function target: FPKM, Y, Psi [default: Y].
--fLen=FRAG_LENG Two arguments for fragment length: mean and standard
diveation, default: auto-detected
--bias=BIAS_ARGS Three argments for bias correction:
BIAS_MODE,REF_FILE,BIAS_FILE(s). BIAS_MODE: unif,
end5, end3, both. REF_FILE: the genome reference file
in fasta format. BIAS_FILE(s): bias files from dice-
bias, use '---' for time specific files, [default:
unif None None]
--sigma=_SIGMA Sigma in Bayesian regression: the Gaussian standard
deviation of residues [default: Auto].
--lambda=_LAMBDA Lambda in Bayesian regression: the coeffiecient of L2
constrain on weights [default: 0.1].
--mcmc=MCMC_RUN Four arguments for in MCMC iterations:
save_sample,max_run,min_run,gap_run. Required:
save_sample =< 3/4*mim_run. [default: 500 5000 1000 50]
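The weights.tsv output described earlier can be inspected before it is reused through -w. This is a minimal Python sketch, assuming the file is tab-separated with a label and a numeric value on each line. The labels in the demo data are hypothetical, and any line whose second field is not numeric is assumed to be a header and skipped.

```python
import io


def read_weights(handle):
    """Return {label: value} from a two-column, tab-separated weights file."""
    weights = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        label, value = fields
        try:
            weights[label] = float(value)
        except ValueError:
            continue  # assumed to be a header line
    return weights


# Hypothetical content: two features, then the intercept and sigma.
demo = io.StringIO(
    "feature_1\t0.25\n"
    "feature_2\t-1.5\n"
    "intercept\t0.1\n"
    "sigma\t3.0\n"
)
weights = read_weights(demo)
print(len(weights), weights["sigma"])
```

For a real run, open the file instead of the in-memory demo, for example with open("out_dir/weights.tsv") as handle. With #Feature features, the returned dictionary should hold #Feature+2 entries.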
Hyperparameters
sigma is the standard deviation (the square root of the variance) of the
Gaussian noise in the Bayesian regression. By default, it is learned
automatically. Alternatively, you could set it from experience; for example,
3 might be a good option.
lambda is the constraint on the weights of the Bayesian regression. 0.1 is a
good option for ENCODE data.
weight_file gives fixed weights for the Bayesian regression. In that case, the
prior is predicted from the input weight file and the sequence features.
The brie-diff command detects differential splicing between many cells in a pairwise manner, including just two cells, by calculating the Bayes factor. You could run it as follows:
For two cells (-p 1 --minBF 0 gives all events in the same order. Speed:
10-20 seconds with 1 CPU)
brie-diff -i cell1/samples.csv.gz,cell2/samples.csv.gz -o c1_c2.diff.tsv -p 1 --minBF 0
For many cells (gives events with BF>10. Speed: 100 cells in ~10min with 30
CPUs)
fileList=cell1/samples.csv.gz,cell2/samples.csv.gz,cell3/samples.csv.gz,cell4/samples.csv.gz
brie-diff -i $fileList -o c1_c4.diff.tsv
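When there are many cells, the comma-separated input list can also be built in Python. This sketch assumes each cell has its own directory containing samples.csv.gz, as in the example above, and that every unordered pair of cells is compared once, so n cells give n(n-1)/2 pairs. The cell names are placeholders.

```python
from itertools import combinations

# Placeholder cell directories, matching the example above.
cells = ["cell1", "cell2", "cell3", "cell4"]

# One samples.csv.gz per cell, joined with commas for the -i option.
sample_files = [f"{cell}/samples.csv.gz" for cell in cells]
file_list = ",".join(sample_files)

# Unordered cell pairs (assumption: each pair is compared once).
pairs = list(combinations(cells, 2))

command = ["brie-diff", "-i", file_list, "-o", "c1_c4.diff.tsv"]
print(" ".join(command))
print(f"{len(cells)} cells -> {len(pairs)} pairs")
```

The number of pairs grows quadratically with the number of cells, which is worth keeping in mind when choosing -p.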
Then you will have two output files. The first one (named like xxx.diff.tsv) contains all Bayes factors passing the threshold, and it has 15 columns.
Note
The Bayes factor is different from the p value in a hypothesis test. A good
threshold for calling a differential splicing event between two cells could be
Bayes factor > 10.
The other file ranks these splicing events by the number of cell pairs with
differential splicing. It has 4 columns: gene_id, cell_pairs, mean_BF and
median_BF.
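The ranking file can be sorted to list the events that differ in the most cell pairs. This is a minimal Python sketch under stated assumptions: the file is taken to be tab-separated with a header row carrying the four column names given above, and the gene identifiers and values in the demo data are made up.

```python
import csv
import io


def top_events(handle, n=3):
    """Return the n events with the most differential cell pairs."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = sorted(reader, key=lambda row: int(row["cell_pairs"]), reverse=True)
    return [
        (row["gene_id"], int(row["cell_pairs"]), float(row["median_BF"]))
        for row in rows[:n]
    ]


# Made-up demo content with the four documented columns.
demo = io.StringIO(
    "gene_id\tcell_pairs\tmean_BF\tmedian_BF\n"
    "geneA\t2\t15.0\t14.0\n"
    "geneB\t5\t40.5\t22.0\n"
    "geneC\t1\t11.2\t11.2\n"
)
ranked = top_events(demo, n=2)
for gene_id, cell_pairs, median_bf in ranked:
    print(gene_id, cell_pairs, median_bf)
```

For a real run, replace the in-memory demo with an open file handle on the ranking file written by brie-diff.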
There are more parameters for setting (brie-diff -h always gives the version
you are using):
Usage: brie-diff [options]
Options:
-h, --help show this help message and exit
-i IN_FILES, --inFiles=IN_FILES
Input files of Brie samples for multiple cells, comma
separated for each cell, e.g., cell1,cell2,cell3
-o OUT_FILE, --outFile=OUT_FILE
Output file with full path
Optional arguments:
-p NPROC, --nproc=NPROC
Number of subprocesses [default: 4]
-n BOOTSTRAP, --bootstrap=BOOTSTRAP
Number of bootstrap [default: 1000]
--minBF=MINBF Minimum BF for saving out, e.g., 3 or 10. If it is 0,
save all events [default: 10]
A typical example on 130 mouse cells during gastrulation is in this folder. In it, you quantify splicing with BRIE, identify the highly variable splicing events and visualise them with sashimi plots: https://github.com/huangyh09/brie/tree/master/example/gastrulation
There are some earlier examples: https://sourceforge.net/projects/brie-rna/files/examples/