BRIE-kit manual

In this manual, we will introduce how to install, run briekit-event, briekit-event-filter, and briekit-factor. You can find examples below or in bash file anno_human.sh and anno_mouse.sh in the example folder of the BRIE-kit repository.

Installation

We recommend using Python in Anaconda platform, which gives everything you need in one folder, and you have the permission to change files even you’re not root.

BRIE-kit is developed under Python 2.7 environment, and not full compatible with Python 3, so please use it in Python2 environment. We recommend you to create conda environment to get Python2.7 as following command lines. Of course, you can install Ananconda2 to get a default Python 2 environment, but we recommend the conda environment, as you only do this preprossing once.

conda create -n briekit python=2.7 numpy=1.13.0

source activate briekit

Once you are in a Python 2 environment, there are usually two ways to isntall a package:

  • Opt 1: Type in terminal: pip install briekit. Add -U if you want to upgrade your earlier installation.
  • Opt 2: Download the GitHub repository, and type python setup.py install

Note, if you don’t use Anaconda and don’t have root permission, add --user, so you can install it in your folder.

Sometimes, you may need to install pysam separatly (hopefully not). In our test pysam=0.10-0.14 works fine.

1. Splicing events generation

Note

This function is not compatible with Python 3.

briekit-event for generating from full annotation. This program is modified from Yarden Katz’s Python package rnaseqlib, with supporting different input annotation formats, e.g., gtf, gff3 and ucsc table. For example, you could download a full annotation file for mouse from GENCODE. Then, you can download the gene annotation and generate the splicing event by the following command:

cd $DATA_DIR

# download gene annotation
wget ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_mouse/release_M12/gencode.vM12.annotation.gff3.gz

briekit-event -a $anno_ref -o $DATA_DIR/AS_events

Then in the $DATA_DIR/AS_events folder, the skipping-exon events, i.e., SE.gff3.gz will be generated.

Check more arguments bu briekit-event -h.

Note

If the directory (BIN_DIR) of executable briekit-event is not in PATH variable, use $BIN_DIR/briekit-event rather than briekit-event. The same for following functions.

2. Splicing events filtering

As the annotation file is not perfect, there may be false splicing events generated from above command line. Therefore, it can be useful to add some quality control on these splicing events. Here, we provide another function briekit-event-filter to only keep high-quality events, and also add informative ids (gene id / transcript id). Based on above SE.gff3.gz, we could select the gold-quality splicing event by following command line. Note, the reference genome sequence is also required, for example, mouse genome sequence here.

Here are examples to generate default filtering and lenient filtering events.

cd $DATA_DIR

# download genome reference
wget ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_mouse/release_M12/GRCm38.p5.genome.fa.gz
gzip -d GRCm38.p5.genome.fa.gz

# default filtering
briekit-event-filter -a AS_events/SE.gff3.gz --anno_ref=gencode.vM12.annotation.gtf.gz -r GRCm38.p5.genome.fa

# lenient filtering
briekit-event-filter -a AS_events/SE.gff3.gz --anno_ref=gencode.vM12.annotation.gtf.gz \
-r GRCm38.p5.genome.fa -o AS_events/SE.lenient.gtf --add_chrom chrX,chrY --as_exon_min 10 \
--as_exon_max 100000000 --as_exon_tss 10 --as_exon_tts 10 --no_splice_site #--keep_overlap

Then you will find an output file as AS_events/SE.filtered.gff3.gz, which only contains splicing events passing the following (default) constrains:

  • located on autosome and input chromosome
  • not overlapped by any other AS-exon
  • surrounding introns are no shorter than a fixed length, e.g., 100bp
  • length of alternative exon regions, say, between 50 and 450bp
  • with a minimum distance, say 500bp, from TSS or TTS
  • surrounded by AG-GT, i.e., AG-AS.exon-GT

Check more arguments for events filtering by briekit-event -h.

3. Sequence features

With the splicing annotation file, a set of short sequence feature can be calculated by command line briekit-factor. Besides the annotation file, it also requires genome sequence file (the same as above), and a phast conservation file in bigWig format. For human and mouse, you could download it directly from UCSC browser: mm10.60way.phastCons.bw and hg38.phastCons100way.bw.

Note

In order to fetch data from the bigWig file, we use a utility bigWigSummary that is provided from UCSC. You could download the binary file for linux from here: http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/bigWigSummary

Here is example to download the executable bigWigSummary to /usr/local/bin. Of course, you can download it to anywhere you want, e.g., the same directory to the data.

cd /usr/local/bin
wget http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/bigWigSummary
chmod +x bigWigSummary

In order to tell briekit-factor this directory, you could either add this directory into PATH variable, or use it as an arguments of briekit-factor by --bigWigSummary /usr/local/bin/bigWigSummary. If you prefer to add it to PATH, add this line to export PATH="~/ucsc:$PATH" into the .profile or .bashrc file.

Then, you could get the sequence features by briekit-factor, for example,

cd $DATA_DIR

#download phastCon file
wget http://hgdownload.cse.ucsc.edu/goldenPath/mm10/phastCons60way/mm10.60way.phastCons.bw

briekit-factor -a AS_events/SE.filtered.gff3.gz -r GRCm38.p5.genome.fa -c mm10.60way.phastCons.bw -o mouse_features.csv -p 10 --bigWigSummary ./bigWigSummary

Then you will have the features stored in a mouse_features.csv.gz file, where #`factors` * #`gene_ids` features values are saved.

Check more arguments for fetch sequence features by briekit-factor -h.