======
Manual
======

After properly installed BRIE Python package, two excutable binary files could 
be run from command line directly: ``brie``, ``brie-diff``. From version 0.2.0, 
all preprocessing are divided and moved into BRIE-kit_ package, which is aimed
to be used in Python2 only. 

.. _BRIE-kit: https://github.com/huangyh09/briekit/wiki

1. BRIE isoform estimate
========================

This is the main program to quitify the fraction of exon inclusion level. In 
order to automatically learn the informative prior, the predictive features are 
required. There are two ways to get the annotation and sequence features: 

1. use our processed annotation file and according sequence features, which you 
   can download from here_. Currently, we produced data for human_ and mouse_. 
   We suggest align RNA-seq reads to the according version of reference genome.

2. generate the annotation and fetch the sequence features with the help of 
   brie-event_ and brie-factor_ by yourself

.. _here: https://sourceforge.net/projects/brie-rna/files/annotation/
.. _human: https://sourceforge.net/projects/brie-rna/files/annotation/human/gencode.v25/
.. _mouse: https://sourceforge.net/projects/brie-rna/files/annotation/mouse/gencode.vM12/
.. _brie-event: https://brie-rna.sourceforge.io/manual.html#splicing-events
.. _brie-factor: https://brie-rna.sourceforge.io/manual.html#sequence-features


Then you could input the feature file obtained above, and run it like this:

::

  brie -a AS_events/SE.gold.gtf -s Cell1.sorted.bam -f mouse_features.csv.gz -o out_dir -p 15

By default, you will have three output files in the out_dir: ``fractions.tsv``, 
``weights.tsv`` and ``samples.csv.gz``. 

- In ``fractions.tsv``, there are 8 columns:

  * column 1: transcript id
  * column 2: gene id
  * column 3: transcript length
  * column 4: reads counts for whole events
  * column 5: FPKM for each isoform
  * column 6: fraction for each isoform, called Psi
  * column 7: lower bound of 95% confidence interval of isoform fraction
  * column 8: higher bound of 95% confidence interval of isoform fraction

- In ``weights.tsv``, there are the weights for the Bayesian regression, with 
  `#Feature+2` lines, involving each features, interpret and sigma (a hyperparameter). 
  There are two columns each line, including the label and the value.

- In ``sample.csv.gz``, there are the MCMC_ samples of posterior distribution of 
  Psi. These samples are used to detect the differential splicing.

.. _MCMC: https://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo

There are more parameters for setting (``brie -h`` always give the version you 
are using)

.. code-block:: html

  Usage: brie [options]

  Options:
    -h, --help            show this help message and exit
    -a ANNO_FILE, --anno_file=ANNO_FILE
                          Annotation file for genes and transcripts in GTF or
                          GFF3
    -s SAM_FILE, --sam_file=SAM_FILE
                          Sorted and indexed bam/sam files, use ',' for
                          replicates e.g., rep1.sorted.bam,sam1_rep2.sorted.bam
    -o OUT_FILE, --out_file=OUT_FILE
                          Prefix of the output files with full path
    -f FACTOR_FILE, --factor_file=FACTOR_FILE
                          Features in csv.gz file to predict isoform expression.

    Optional arguments:
      -p NPROC, --nproc=NPROC
                          Number of subprocesses [default: 4]
      -w WEIGHT_FILE, --weight_file=WEIGHT_FILE
                          File with weights, an output of Brie.
      -y FTYPE, --ftype=FTYPE
                          Type of function target: FPKM, Y, Psi [default: Y].
      --fLen=FRAG_LENG    Two arguments for fragment length: mean and standard
                          diveation, default: auto-detected
      --bias=BIAS_ARGS    Three argments for bias correction:
                          BIAS_MODE,REF_FILE,BIAS_FILE(s). BIAS_MODE: unif,
                          end5, end3, both. REF_FILE: the genome reference file
                          in fasta format. BIAS_FILE(s): bias files from dice-
                          bias, use '---' for time specific files, [default:
                          unif None None]
      --sigma=_SIGMA      Sigma in Bayesian regression: the Gaussian standard
                          deviation of residues [default: Auto].
      --lambda=_LAMBDA    Lambda in Bayesian regression: the coeffiecient of L2
                          constrain on weights [default: 0.1].
      --mcmc=MCMC_RUN     Four arguments for in MCMC iterations:
                          save_sample,max_run,min_run,gap_run. Required:
                          save_sample =< 3/4*mim_run. [default: 500 5000 1000 50]

**Hyperparamers**

* ``sigma`` is the square rooted variance of Gaussian noise in Bayesian 
  regression. By default, it will learn it automatically. Alternatively, you 
  could set it with your experience, for example, 3 might be a good option. 
* ``lambda`` is the constrain on weights of Bayesian regression. 0.1 is good 
  option in ENCODE data.
* ``weight_file`` is fixed weights for Bayesian regression. Therefore, the 
  prior is predicted from the input weight file and its sequence features.
  

2. Differential splicing
========================

This command allows to detect differential splicing between many cells 
pair-wisely, including just two cells, by calculating Bayes factor. You could 
run it as follows:

For two cells (``-p 1 --minBF 0`` gives all events in the same order. Speed: 
10-20 second with 1 CPU)

::

  brie-diff -i cell1/samples.csv.gz,cell2/samples.csv.gz -o c1_c2.diff.tsv -p 1 --minBF 0


For many cells (gives events with ``BF>10``. Speed: 100 cells in ~10min with 30 
CPUs)

::

  fileList=cell1/samples.csv.gz,cell2/samples.csv.gz,cell3/samples.csv.gz,cell4/samples.csv.gz

  brie-diff -i $fileList -o c1_c4.diff.tsv

Then you will have two output files. The first one (in the format of xxx.diff.tsv) 
contains all Bayes factor passing the threshold, and it has with 15 columns:

* column1-2: transcript id and gene id
* column3-4: cell 1 and cell 2 names (the folder names)
* column5-6: prior of exon inclusion fraction for cell 1 and cell 2
* column7-8: posterior of exon inclusion fraction for cell 1 and cell 2
* column9-12: counts for inclusion and exclusion for cell1, and then cell 2
* column13-14: probability of prior and posterior diff<0.05
* column 15: Bayes factor

.. note::
  Bayes factor is different from p value in hypothesis test. A good threshold 
  could be ``Bayes factor > 10`` as differential splicing event between two 
  cells.

Also another file ranks these splicing events by the number of cell paris with
differential splicing. It has 4 columns: ``gene_id``, ``cell_pairs``, 
``mean_BF``, ``median_BF``.

There are more parameters for setting (``brie-diff -h`` always give the version 
you are using):

.. code-block:: html

  Usage: brie-diff [options]

  Options:
  -h, --help            show this help message and exit
  -i IN_FILES, --inFiles=IN_FILES
                        Input files of Brie samples for multiple cells, comma
                        separated for each cell, e.g., cell1,cell2,cell3
  -o OUT_FILE, --outFile=OUT_FILE
                        Output file with full path

  Optional arguments:
    -p NPROC, --nproc=NPROC
                        Number of subprocesses [default: 4]
    -n BOOTSTRAP, --bootstrap=BOOTSTRAP
                        Number of bootstrap [default: 1000]
    --minBF=MINBF       Minimum BF for saving out, e.g., 3 or 10. If it is 0,
                        save all events [default: 10]


3. Examples
===========

One typical example on 130 mouse cells during gastrulation is in this folder, 
from which you will quantify the splicing with BRIE, identify the highly 
variable splicing events and visualise them with sashimi plot.
https://github.com/huangyh09/brie/tree/master/example/gastrulation


There are some earlier examples: 
https://sourceforge.net/projects/brie-rna/files/examples/

- Example to quantify splicing with provided annotation (bash code and data): 
  brie-examples.zip_

- Example to quantify splicing with provided annotation (bash code): 
  brie_demo.sh_

- Example to generate splicing events and fetch sequence factors (bash codes): 
  anno_maker.sh_

.. _brie-examples.zip: http://ufpr.dl.sourceforge.net/project/brie-rna/examples/brie_quantify/brie-examples.zip
.. _brie_demo.sh: https://github.com/huangyh09/brie/blob/master/example/brie_demo.sh
.. _anno_maker.sh: https://github.com/huangyh09/brie/blob/master/example/anno_maker.sh