Star Solo

Latest commit 7707e77 (alexdobin, Mar 1, 2021): Added extras/scripts/calcUMIperCell.awk: a script to calculate total number of UMIs per cell and filtering status.

STARsolo is a turnkey solution for analyzing droplet single cell RNA sequencing data (e.g. 10X Genomics Chromium System) built directly into STAR code.
STARsolo inputs the raw FASTQ reads files, and performs the following operations
STARsolo output is designed to be a drop-in replacement for 10X CellRanger gene quantification output.
It follows CellRanger logic for cell barcode whitelisting and UMI deduplication, and produces nearly identical gene counts in the same format.
At the same time, STARsolo is ~10 times faster than CellRanger.
STARsolo is run the same way as a normal STAR run, with the addition of several STARsolo parameters:
The genome index is the same as for normal STAR runs.
The parameters required to run STARsolo on 10X Chromium data are described below:
The STARsolo algorithm is turned on with:
or, since 2.7.3a, with more descriptive:
The CellBarcode whitelist has to be provided with:
The 10X Chromium whitelist files can be found either inside the CellRanger distribution or on GitHub/10XGenomics.
Please make sure that the whitelist is compatible with the specific version of the 10X chemistry: V2 or V3. For instance, in CellRanger 3.1.0, the V2 whitelist is:
and V3 whitelist (gunzip it for STARsolo):
The default barcode lengths (CB=16b, UMI=10b) work for 10X Chromium V2. For V3, specify:
Importantly, in the --readFilesIn option, the 1st file has to be cDNA read, and the 2nd file has to be the barcode (cell+UMI) read, i.e.
For instance, standard 10X runs have cDNA as Read2 and barcode as Read1:
For multiple lanes, use comma-separated lists for Read2 and Read1, with the lanes in the same order in both lists:
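The options above can be combined into one command line. The following is a minimal sketch that assembles the arguments as a Python list rather than running STAR. The helper name `build_starsolo_10x_args` and all paths are hypothetical. The flag names `--soloCBwhitelist` and `--soloUMIlen`, and the V3 UMI length of 12, are taken from the STAR manual and 10X documentation rather than from the text above, so verify them against your STAR version.

```python
def build_starsolo_10x_args(genome_dir, whitelist, cdna_fastqs, barcode_fastqs,
                            chemistry="V2"):
    """Assemble a STARsolo argument list for 10X Chromium reads (nothing is executed)."""
    if chemistry not in ("V2", "V3"):
        raise ValueError("chemistry must be 'V2' or 'V3'")
    if len(cdna_fastqs) != len(barcode_fastqs):
        raise ValueError("each lane needs one cDNA file and one barcode file")
    args = [
        "STAR",
        "--genomeDir", genome_dir,
        "--soloType", "CB_UMI_Simple",
        "--soloCBwhitelist", whitelist,
        # cDNA read(s) first, barcode (cell+UMI) read(s) second;
        # multiple lanes are joined with commas, in matching order
        "--readFilesIn", ",".join(cdna_fastqs), ",".join(barcode_fastqs),
    ]
    if chemistry == "V3":
        # assumption: V3 UMIs are 12 bases instead of the default 10
        args += ["--soloUMIlen", "12"]
    return args


# Hypothetical two-lane V3 run: Read2 carries cDNA, Read1 carries the barcode
args = build_starsolo_10x_args(
    "/path/to/genome/dir/",
    "/path/to/whitelist.txt",
    ["lane1_R2.fastq", "lane2_R2.fastq"],
    ["lane1_R1.fastq", "lane2_R1.fastq"],
    chemistry="V3",
)
print(" ".join(args))
```

The resulting list can be passed to `subprocess.run` once the paths point at real files. Compressed FASTQ files would additionally need a decompression command, which this sketch leaves out.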
CellRanger uses its own "filtered" version of annotations (GTF file) which is a subset of ENSEMBL annotations, with several gene biotypes removed (mostly small non-coding RNA). Annotations affect the counts, and to match CellRanger counts CellRanger annotations have to be used.
10X provides several versions of the CellRanger annotations:
https://support.10xgenomics.com/single-cell-gene-expression/software/downloads/latest
For the best match, the annotations in CellRanger run and STARsolo run should be exactly the same.
The FASTA and GTF files, for one of the older releases:
have to be used in STAR genome index generation step before mapping:
If you want to use your own GTF (e.g. newer version of ENSEMBL or GENCODE), you can generate the "filtered" GTF file using 10X's mkref tool:
https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/advanced/references
To bring STARsolo and CellRanger counts into even closer agreement, you can add
to the genome generation options, which is used by CellRanger to generate STAR genomes. It will generate a sparse suffix array, which has the additional benefit of fitting into 16GB of RAM. However, it also reduces mapping speed by 30-50%.
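The genome generation step with the optional sparse suffix array can be sketched as follows. The helper name `build_genome_generate_args` and the paths are hypothetical. The option `--genomeSAsparseD 3` is the sparse suffix array setting as documented in the STAR manual; it is not named in the text above, so treat it as an assumption to verify.

```python
def build_genome_generate_args(genome_dir, fasta, gtf, sparse_sa=False, threads=4):
    """Assemble a STAR genome-generation argument list (nothing is executed)."""
    args = [
        "STAR",
        "--runMode", "genomeGenerate",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", fasta,
        "--sjdbGTFfile", gtf,
    ]
    if sparse_sa:
        # assumption: sparsity 3 mirrors the CellRanger-built STAR genomes;
        # smaller index in RAM, slower mapping
        args += ["--genomeSAsparseD", "3"]
    return args


print(" ".join(build_genome_generate_args(
    "/path/to/genome/dir/", "/path/to/genome.fa", "/path/to/genes.gtf",
    sparse_sa=True)))
```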
The considerations above are for raw counts, i.e. when cell filtering (calling) is not performed. To get filtered cells, refer to Cell filtering (calling) section.
The adapter clipping utilizes vectorized Smith-Waterman algorithm from Opal package by Martin Šošić: https://github.com/Martinsos/opal
Simple barcode lengths and start positions on barcode reads are described with
By default, it is assumed that the barcode is located on one of the mates of paired-end read, while cDNA is on the other mate. However, in some scRNA-seq protocols the barcode and cDNA sequences are located on the same mate. In this case, we can specify the mate on which the barcode is located (1 or 2) with --soloBarcodeMate . Also, the barcode/adapter sequences have to be clipped (leaving only cDNA) with --clip5pNbases or --clip3pNbases .
For instance, for the 10X 5' protocol, the 1st mate contains the barcode at the 5' end, with a 16b CB, 10b UMI and 13b adapter (39b total). If the 1st mate is sequenced longer than 39b, the remaining bases are cDNA that can be mapped together with the 2nd mate (which contains only cDNA):
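The arithmetic for the 10X 5' layout can be written out explicitly. This is a sketch only: the helper name `five_prime_barcode_args` is hypothetical, and the flags `--soloCBstart`, `--soloCBlen`, `--soloUMIstart` and `--soloUMIlen` are assumed from the STAR manual rather than quoted from the text above. It also assumes that `--clip5pNbases` takes one value per mate.

```python
def five_prime_barcode_args(cb_len=16, umi_len=10, adapter_len=13, barcode_mate=1):
    """Derive barcode-position and clipping arguments for a barcode that
    sits at the 5' end of one mate (nothing is executed)."""
    if barcode_mate not in (1, 2):
        raise ValueError("barcode_mate must be 1 or 2")
    # everything before the cDNA on the barcode mate must be clipped
    clip = cb_len + umi_len + adapter_len
    # assumption: one clip value per mate, 0 for the cDNA-only mate
    clip_per_mate = [str(clip), "0"] if barcode_mate == 1 else ["0", str(clip)]
    return [
        "--soloType", "CB_UMI_Simple",
        "--soloBarcodeMate", str(barcode_mate),
        "--clip5pNbases", *clip_per_mate,
        "--soloCBstart", "1",
        "--soloCBlen", str(cb_len),
        # the UMI starts right after the cell barcode (1-based positions)
        "--soloUMIstart", str(cb_len + 1),
        "--soloUMIlen", str(umi_len),
    ]


print(" ".join(five_prime_barcode_args()))
```

With the default lengths, the clip value is 16 + 10 + 13 = 39 and the UMI starts at position 17.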
More complex barcodes are activated with --soloType CB_UMI_Complex and are described with the following parameters
In addition to raw, unfiltered output of gene/cell counts, STARsolo performs cell filtering (a.k.a. cell calling), which aims to select a subset of cells that are likely to be "real" cells as opposed to empty droplets (containing ambient RNA).
Two types of filtering are presently implemented: simple (knee-like) and advanced EmptyDrop-like. The selected filtering is also used to produce summary statistics for filtered cells in the Summary.csv file, which is similar to CellRanger's summary and is useful for Quality Control.
Knee filtering is similar to the method used by CellRanger 2.2.x. This is turned on by default and is controlled by:
You can also add three numbers for this option (default values are given in parentheses): the number of expected cells (3000), robust maximum percentile for UMI count (0.99), maximum to minimum ratio for UMI count (10).
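To illustrate how the three numbers interact, here is a sketch of a knee-style threshold. It is one reading of the parameters (take a robust maximum among the expected cells, then keep every barcode within the allowed ratio of it) and is not STAR's actual implementation; the function name `knee_filter` and the rounding choice are assumptions.

```python
def knee_filter(umi_counts, n_expected_cells=3000, max_percentile=0.99,
                max_min_ratio=10):
    """Return (threshold, indices of barcodes whose UMI count passes it)."""
    ranked = sorted(umi_counts, reverse=True)
    if not ranked:
        return 0, []
    # robust maximum: skip the top (1 - max_percentile) fraction of the
    # expected cells so a few outlier barcodes do not set the scale
    idx = min(len(ranked) - 1, int(n_expected_cells * (1 - max_percentile)))
    robust_max = ranked[idx]
    # smallest accepted count is the robust maximum divided by the ratio
    threshold = max(1, int(robust_max / max_min_ratio + 0.5))
    passing = [i for i, n in enumerate(umi_counts) if n >= threshold]
    return threshold, passing


# Toy data: three high-count barcodes and three low-count ones
threshold, cells = knee_filter([1000, 900, 800, 50, 5, 1], n_expected_cells=100)
print(threshold, cells)
```

In the toy call the robust maximum is the second-highest count (900), so the threshold is 90 and only the first three barcodes pass.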
CellRanger 3.0.0 uses advanced filtering based on the EmptyDrop algorithm developed by Lun et al. This algorithm calls extra cells compared to the knee filtering, allowing for cells that have relatively fewer UMIs but are transcriptionally different from the ambient RNA. In STARsolo, this filtering can be activated by:
It can be followed by 10 numeric parameters: nExpectedCells (3000), maxPercentile (0.99), maxMinRatio (10), indMin (45000), indMax (90000), umiMin (500), umiMinFracMedian (0.01), candMaxN (20000), FDR (0.01), simN (10000).
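Because the ten parameters are positional, it is easy to misplace one. The sketch below keeps them by name and always emits all ten in the order listed above. The helper name `emptydrops_filter_args` is hypothetical, and the option value `EmptyDrops_CR` is assumed from the STAR manual, not taken from this text.

```python
# Names and defaults in the order listed in the text
EMPTYDROPS_DEFAULTS = {
    "nExpectedCells": 3000,
    "maxPercentile": 0.99,
    "maxMinRatio": 10,
    "indMin": 45000,
    "indMax": 90000,
    "umiMin": 500,
    "umiMinFracMedian": 0.01,
    "candMaxN": 20000,
    "FDR": 0.01,
    "simN": 10000,
}


def emptydrops_filter_args(**overrides):
    """Build the cell-filter arguments with all ten values in positional order."""
    unknown = set(overrides) - set(EMPTYDROPS_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    params = {**EMPTYDROPS_DEFAULTS, **overrides}
    # iterate over the defaults dict so the documented order is preserved
    values = [str(params[name]) for name in EMPTYDROPS_DEFAULTS]
    # assumption: EmptyDrops_CR is the option value that selects this filter
    return ["--soloCellFilter", "EmptyDrops_CR"] + values


print(" ".join(emptydrops_filter_args(umiMin=200)))
```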
It is possible to run only the filtering algorithm (without the need to re-map) inputting the previously generated raw matrix:
The /path/to/count/dir/raw/ directory should contain the "raw" barcodes.tsv, features.tsv, and matrix.mtx files generated in a previous STARsolo run.
The output will contain the filtered files.
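Before re-running the filtering, it can help to confirm that the raw directory holds the three expected files. The sketch below does that check and assembles the command. The helper names are hypothetical, and the `--runMode soloCellFiltering` syntax (raw directory followed by an output prefix) and the `EmptyDrops_CR` value are assumed from the STAR manual.

```python
from pathlib import Path

REQUIRED_RAW_FILES = ("barcodes.tsv", "features.tsv", "matrix.mtx")


def missing_raw_files(raw_dir):
    """Return the names of required raw-matrix files absent from raw_dir."""
    return [name for name in REQUIRED_RAW_FILES
            if not (Path(raw_dir) / name).is_file()]


def build_refilter_args(raw_dir, out_prefix, cell_filter=("EmptyDrops_CR",)):
    """Assemble a filtering-only STAR argument list (nothing is executed)."""
    return [
        "STAR",
        # assumption: the run mode is followed by the raw count directory
        # and the prefix for the filtered output
        "--runMode", "soloCellFiltering", str(raw_dir), str(out_prefix),
        "--soloCellFilter", *cell_filter,
    ]


raw_dir = "/path/to/count/dir/raw/"
missing = missing_raw_files(raw_dir)
if missing:
    print("missing files:", ", ".join(missing))
else:
    print(" ".join(build_refilter_args(raw_dir, "/path/to/output/prefix")))
```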
The read sequences and barcodes can be input from SAM (or BAM) files, both unmapped (uBAM) and previously mapped (e.g. Cellranger's BAM):
In case of BAM files, use the samtools view command to convert to SAM:
The file should contain one line for each read. For previously mapped file it can be achieved by filtering out non-primary alignments as shown above. Note that unmapped reads have to be included in the file to be remapped.
We need to specify which SAM attributes correspond to sequences/qualities of cell barcodes (CR/CY) and UMIs (UR/UY):
To keep all, some, or none of the input SAM attributes in the output BAM file (if you requested one), use the --readFilesSAMattrKeep option. For previously mapped files, --readFilesSAMattrKeep None is often the best option to avoid duplicated SAM attributes in the BAM output.
If you request coordinate-sorted BAM output and use a coordinate-sorted mapped BAM input (such as CellRanger's possorted BAM), sorting may be slow and require large amounts of RAM. In this case, it is recommended to shuffle the alignments before mapping with the samtools bamshuf command (named samtools collate in recent samtools releases).
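The BAM-input options discussed above can be collected in one place. This is a sketch under several assumptions: the helper name `build_bam_input_args` is hypothetical, and the flags `--readFilesType`, `--readFilesCommand`, `--soloInputSAMattrBarcodeSeq` and `--soloInputSAMattrBarcodeQual` are taken from the STAR manual rather than from this text. The `-F 0x100` filter drops secondary (non-primary) alignments so that each read appears once.

```python
def build_bam_input_args(bam_path, paired_end=False, keep_sam_attrs="None"):
    """Assemble the input-related arguments for remapping reads from a BAM file
    (nothing is executed)."""
    return [
        "--readFilesIn", bam_path,
        # assumption: SAM input is declared with its layout, SE or PE
        "--readFilesType", "SAM", "PE" if paired_end else "SE",
        # convert BAM to SAM on the fly, skipping non-primary alignments
        "--readFilesCommand", "samtools", "view", "-F", "0x100",
        # cell barcode and UMI sequences, then their qualities
        "--soloInputSAMattrBarcodeSeq", "CR", "UR",
        "--soloInputSAMattrBarcodeQual", "CY", "UY",
        # None avoids duplicated attributes when the input was already mapped
        "--readFilesSAMattrKeep", keep_sam_attrs,
    ]


print(" ".join(build_bam_input_args("/path/to/possorted.bam")))
```

These arguments would be appended to the usual `--genomeDir`, `--soloType` and whitelist options.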
Plate-based (Smart-seq) scRNA-seq technologies produce separate FASTQ files for each cell. Cell barcodes are not incorporated in the read sequences, and there are no UMIs. A typical STAR command for mapping and quantification of these files will look like:
The STARsolo --soloType SmartSeq option produces cell/gene (and other features) count matrices, using rules similar to the droplet-based technologies. The differences are (i) individual cells correspond to different FASTQ files, there are no Cell Barcode sequences, and "Cell IDs" have to be provided as input; (ii) there are no UMI sequences, but reads can be deduplicated if they have identical start/end coordinates.
A convenient way to list all the FASTQ files and Cell IDs is to create a file manifest and supply it in --readFilesManifest /path/to/manifest.tsv . The manifest file should contain 3 tab-separated columns. For paired-end reads:
For single-end reads, the 2nd column should contain the dash - :
Cell-id can be any string without spaces. Cell-id will be added as ReadGroup tag ( RG:Z: ) for each read in the SAM/BAM output. If Cell-id starts with ID: , it can contain several fields separated by tab, and all the fields will be copied verbatim into SAM @RG header line.
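A manifest in this layout can be generated programmatically. The sketch below assumes the column order Read1 file, Read2 file (or a dash), Cell-id, which matches the single-end rule above but is not spelled out in this text. The helper name `manifest_lines` and the file names are hypothetical, and the multi-field `ID:` form is not handled.

```python
def manifest_lines(cells):
    """Build manifest rows from (cell_id, read1, read2) tuples.

    read2 may be None for single-end data, in which case the 2nd column
    holds a dash.
    """
    lines = []
    for cell_id, read1, read2 in cells:
        if any(ch.isspace() for ch in cell_id):
            raise ValueError(f"cell id must not contain whitespace: {cell_id!r}")
        # assumed column order: Read1, Read2 (or "-"), Cell-id
        lines.append("\t".join([read1, read2 if read2 else "-", cell_id]))
    return lines


rows = manifest_lines([
    ("cellA", "cellA_1.fastq", "cellA_2.fastq"),  # paired-end
    ("cellB", "cellB.fastq", None),               # single-end
])
print("\n".join(rows))
```

Writing `"\n".join(rows)` to a file gives a manifest that can be supplied with `--readFilesManifest`.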
