30k Mouse E18 Combined Cortex, Hippocampus and Subventricular Zone Nuclei Multiplexed
Dataset: 30k Mouse E18 Combined Cortex, Hippocampus and Subventricular Zone Nuclei Multiplexed, 12 CMOs
The detailed description of this dataset can be found here.
Note
QC, barcode extraction, and matrix generation follow the same steps as in 10k 1:1 Mixture of Raji and Jurkat Cells Multiplexed, 2 CMOs.
Demultiplexing
In summary, 799,808,703 of the 1,238,424,843 total read pairs have a
valid structure (-cb_m 2, -fb_m 1). The median number of UMIs per cell
is 22,962.0 for this feature barcode library.
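As a quick sanity check, the fraction of read pairs with a valid structure can be worked out from the two counts quoted above. The variable names below are illustrative only.

```python
# Read-pair counts quoted in the summary above.
valid_read_pairs = 799_808_703
total_read_pairs = 1_238_424_843

# Fraction of read pairs whose cell and feature barcodes passed the
# mismatch thresholds used during extraction.
fraction_valid = valid_read_pairs / total_read_pairs
print(f"{fraction_valid:.1%} of read pairs have a valid structure")
```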
Inspect the feature count matrix.
In [1]: import pandas as pd
In [2]: m = pd.read_csv("matrix_featurecount.csv.gz", index_col=0)
In [3]: m.shape
Out[3]: (12, 59342)
In [4]: m.sum(axis=1)
Out[4]:
CMO301_ATGAGGAATTCCTGC 90849728
CMO302_CATGCCAATAGAGCG 73859662
CMO303_CCGTCGTCCAAGCAT 94388625
CMO304_AACGTTAATCACTCA 112458502
CMO305_CGCGATATGGTCGGA 57147183
CMO306_AAGATGAGGTCTGTG 67979145
CMO307_AAGCTCGTTGGAAGA 191026854
CMO308_CGGATTCCACATCAT 97445798
CMO309_GTTGATCTATAACAG 189936250
CMO310_GCAGGAGGTATCAAT 130990984
CMO311_GAATCGTGATTCTTC 131131183
CMO312_ACATGGTCAACGCTG 55103315
dtype: int64
In [5]: (m > 0).sum(axis=1)
Out[5]:
CMO301_ATGAGGAATTCCTGC 59128
CMO302_CATGCCAATAGAGCG 59088
CMO303_CCGTCGTCCAAGCAT 59096
CMO304_AACGTTAATCACTCA 59194
CMO305_CGCGATATGGTCGGA 58794
CMO306_AAGATGAGGTCTGTG 58932
CMO307_AAGCTCGTTGGAAGA 59306
CMO308_CGGATTCCACATCAT 59078
CMO309_GTTGATCTATAACAG 59292
CMO310_GCAGGAGGTATCAAT 59202
CMO311_GAATCGTGATTCTTC 59204
CMO312_ACATGGTCAACGCTG 58836
dtype: int64
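The same kind of inspection can be extended to per-cell statistics. The sketch below uses a small made-up matrix (features as rows, cell barcodes as columns, matching the layout above) to show how UMIs per cell, their median, and the fraction of cells in which each CMO is detected could be computed with pandas. Whether fba derives its reported median in exactly this way is an assumption.

```python
import pandas as pd

# Toy feature count matrix: rows are CMOs, columns are cell barcodes.
# The values are invented for illustration.
m = pd.DataFrame(
    {
        "cell_a": [120, 3, 0],
        "cell_b": [2, 95, 1],
        "cell_c": [60, 58, 0],
        "cell_d": [0, 1, 0],
    },
    index=["CMO301", "CMO302", "CMO303"],
)

# Total UMIs per cell (sum over features) and the median across cells.
umis_per_cell = m.sum(axis=0)
median_umis = umis_per_cell.median()

# Fraction of cells in which each CMO has at least one UMI.
detected_fraction = (m > 0).mean(axis=1)

print(umis_per_cell)
print(median_umis)
print(detected_fraction)
```

On the real matrix, the same expressions apply to the DataFrame loaded from matrix_featurecount.csv.gz.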
Gaussian mixture model
Cells are demultiplexed based on the feature count matrix (CMO
abundance). Demultiplexing method 2 (set by -dm) is inspired by the
method described on 10x Genomics' website. A cell identity matrix is
generated in the output directory: 0 means negative, 1 means positive.
To set the normalization method, use -nm (default clr). To set the
probability threshold for demultiplexing, use -p (default 0.9). To
generate visualization plots, set -v. To choose the visualization
method, use -vm (default tsne).
$ fba demultiplex \
-i matrix_featurecount.csv.gz \
--output_directory demultiplexed \
-dm 2 \
-v
2021-10-05 20:45:58,069 - fba.__main__ - INFO - fba version: 0.0.x
2021-10-05 20:45:58,069 - fba.__main__ - INFO - Initiating logging ...
2021-10-05 20:45:58,069 - fba.__main__ - INFO - Python version: 3.8
2021-10-05 20:45:58,069 - fba.__main__ - INFO - Using demultiplex subcommand ...
2021-10-05 20:46:17,903 - fba.__main__ - INFO - Skipping arguments: "-q/--quantile", "-cm/--clustering_method"
2021-10-05 20:46:17,903 - fba.demultiplex - INFO - Output directory: demultiplexed
2021-10-05 20:46:17,903 - fba.demultiplex - INFO - Demultiplexing method: 2
2021-10-05 20:46:17,903 - fba.demultiplex - INFO - UMI normalization method: clr
2021-10-05 20:46:17,903 - fba.demultiplex - INFO - Visualization: On
2021-10-05 20:46:17,903 - fba.demultiplex - INFO - Visualization method: tsne
2021-10-05 20:46:17,903 - fba.demultiplex - INFO - Loading feature count matrix: matrix_featurecount.csv.gz ...
2021-10-05 20:46:27,051 - fba.demultiplex - INFO - Number of cells: 31,171
2021-10-05 20:46:27,052 - fba.demultiplex - INFO - Number of positive cells for a feature to be included: 200
2021-10-05 20:46:27,163 - fba.demultiplex - INFO - Number of features: 12 / 12 (after filtering / original in the matrix)
2021-10-05 20:46:27,163 - fba.demultiplex - INFO - Features: CMO301 CMO302 CMO303 CMO304 CMO305 CMO306 CMO307 CMO308 CMO309 CMO310 CMO311 CMO312
2021-10-05 20:46:27,164 - fba.demultiplex - INFO - Total UMIs: 713,913,321 / 713,913,321
2021-10-05 20:46:27,218 - fba.demultiplex - INFO - Median number of UMIs per cell: 22,962.0 / 22,962.0
2021-10-05 20:46:27,218 - fba.demultiplex - INFO - Demultiplexing ...
2021-10-05 20:46:29,001 - fba.demultiplex - INFO - Generating heatmap ...
2021-10-05 20:47:17,305 - fba.demultiplex - INFO - Embedding ...
2021-10-05 20:49:27,083 - fba.__main__ - INFO - Done.
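To make the idea behind this method more concrete, here is a minimal sketch of mixture-model demultiplexing on simulated counts. It is not fba's implementation. It assumes one common definition of CLR (log1p counts minus the per-cell mean of log1p counts), fits a two-component 1-D Gaussian mixture per feature with a small hand-written EM loop, and calls a cell positive when its posterior probability for the higher-mean component exceeds 0.9, mirroring the default of -p. All function names and the simulated data are hypothetical.

```python
import numpy as np


def clr(counts):
    """Centered log-ratio per cell (column): log1p counts minus the cell's mean."""
    logged = np.log1p(np.asarray(counts, dtype=float))
    return logged - logged.mean(axis=0, keepdims=True)


def component_densities(x, mu, sd, w):
    """Weighted Gaussian densities of both components, shape (n_cells, 2)."""
    z = (x[:, None] - mu) / sd
    return w * np.exp(-0.5 * z**2) / (sd * np.sqrt(2.0 * np.pi))


def positive_probability(x, n_iter=200):
    """Fit a two-component 1-D Gaussian mixture by EM.

    Returns, for each cell, the posterior probability of belonging to the
    component with the higher mean (taken here as the "positive" one).
    """
    x = np.asarray(x, dtype=float)
    mu = np.percentile(x, [10, 90]).astype(float)  # crude low/high start
    sd = np.full(2, x.std() + 1e-6)
    w = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each cell.
        dens = component_densities(x, mu, sd, w)
        resp = dens / np.maximum(dens.sum(axis=1, keepdims=True), 1e-300)
        # M-step: update weights, means and standard deviations.
        nk = resp.sum(axis=0) + 1e-12
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sd = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk) + 1e-6
        w = nk / x.size
    dens = component_densities(x, mu, sd, w)
    resp = dens / np.maximum(dens.sum(axis=1, keepdims=True), 1e-300)
    return resp[:, np.argmax(mu)]


# Simulated counts for two hypothetical CMOs across 300 cells.
rng = np.random.default_rng(0)
n_cells = 300
counts = rng.poisson(5, size=(2, n_cells))      # background for every cell
counts[0, :150] = rng.poisson(500, size=150)    # first half labeled with CMO "A"
counts[1, 150:] = rng.poisson(500, size=150)    # second half labeled with CMO "B"

normalized = clr(counts)
# Cell identity matrix: 1 where the positive-component probability exceeds 0.9.
identity = np.vstack(
    [positive_probability(row) > 0.9 for row in normalized]
).astype(int)
print(identity.sum(axis=1))
```

Fitting each feature independently is what allows a cell to be positive for several CMOs at once, which is how multiplets show up in the identity matrix.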
According to the description of this dataset:
The four E18 mouse nuclei samples were multiplexed at equal proportions with 3 CMOs per nuclei sample, resulting in a pooled sample labeled with 12 CMOs. Nuclei from the non-multiplexed sample were used as one of the four sample types composing the multiplexed sample.
Heatmap of the relative abundance of features (CMOs) across all cells. Each column represents a single cell. Multiplets are positive for more than one CMO.
t-SNE embedding of cells based on the abundance of features (CMOs, no transcriptome information used). Colors indicate the CMO status for each cell, as called by FBA. Twelve singlet clusters and cross-oligo multiplet clusters are clearly present.
Preview the demultiplexing result: the number of singlets per CMO.
In [1]: import pandas as pd
In [2]: m = pd.read_csv("demultiplexed/matrix_cell_identity.csv.gz", index_col=0)
In [3]: m.loc[:, m.sum(axis=0) == 1].sum(axis=1)
Out[3]:
CMO301 1078
CMO302 824
CMO303 1085
CMO304 1575
CMO305 959
CMO306 1362
CMO307 2912
CMO308 2144
CMO309 2841
CMO310 2675
CMO311 2292
CMO312 951
dtype: int64
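Beyond counting singlets, the cell identity matrix can be used to label every cell as negative, singlet, or multiplet. The sketch below uses a small made-up identity matrix with the same layout (CMOs as rows, cells as columns, entries 0 or 1). The cell names and values are illustrative.

```python
import pandas as pd

# Toy cell identity matrix: 1 means the cell is positive for that CMO.
identity = pd.DataFrame(
    {
        "cell_a": [1, 0, 0],
        "cell_b": [0, 1, 0],
        "cell_c": [1, 1, 0],
        "cell_d": [0, 0, 0],
        "cell_e": [0, 0, 1],
    },
    index=["CMO301", "CMO302", "CMO303"],
)

# Number of CMOs each cell is positive for.
n_positive = identity.sum(axis=0)

# 0 positives: negative, exactly 1: singlet, more than 1: multiplet.
status = n_positive.map(
    lambda n: "negative" if n == 0 else "singlet" if n == 1 else "multiplet"
)

# For singlets only, the single CMO the cell is positive for.
singlet_label = identity.loc[:, n_positive == 1].idxmax(axis=0)

print(status.value_counts())
print(singlet_label)
```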
Kernel density estimation
Cells are demultiplexed based on the feature count matrix (CMO
abundance) using demultiplexing method 4, which is implemented with
modifications to the method described in McGinnis, C., et al. (2019).
A cell identity matrix is generated in the output directory: 0 means
negative, 1 means positive. To generate visualization plots, set -v.
$ fba demultiplex \
-i matrix_featurecount.csv.gz \
-dm 4 \
-v
2021-12-27 12:03:15,693 - fba.__main__ - INFO - fba version: 0.0.x
2021-12-27 12:03:15,693 - fba.__main__ - INFO - Initiating logging ...
2021-12-27 12:03:15,693 - fba.__main__ - INFO - Python version: 3.9
2021-12-27 12:03:15,693 - fba.__main__ - INFO - Using demultiplex subcommand ...
2021-12-27 12:03:18,145 - fba.__main__ - INFO - Skipping arguments: "-q/--quantile", "-cm/--clustering_method", "-p/--prob"
2021-12-27 12:03:18,145 - fba.demultiplex - INFO - Output directory: demultiplexed
2021-12-27 12:03:18,145 - fba.demultiplex - INFO - Demultiplexing method: 4
2021-12-27 12:03:18,145 - fba.demultiplex - INFO - UMI normalization method: clr
2021-12-27 12:03:18,145 - fba.demultiplex - INFO - Visualization: On
2021-12-27 12:03:18,145 - fba.demultiplex - INFO - Visualization method: tsne
2021-12-27 12:03:18,145 - fba.demultiplex - INFO - Loading feature count matrix: matrix_featurecount.csv.gz ...
2021-12-27 12:03:18,453 - fba.demultiplex - INFO - Number of cells: 31,171
2021-12-27 12:03:18,453 - fba.demultiplex - INFO - Number of positive cells for a feature to be included: 200
2021-12-27 12:03:18,499 - fba.demultiplex - INFO - Number of features: 12 / 12 (after filtering / original in the matrix)
2021-12-27 12:03:18,499 - fba.demultiplex - INFO - Features: CMO301 CMO302 CMO303 CMO304 CMO305 CMO306 CMO307 CMO308 CMO309 CMO310 CMO311 CMO312
2021-12-27 12:03:18,499 - fba.demultiplex - INFO - Total UMIs: 713,913,321 / 713,913,321
2021-12-27 12:03:18,523 - fba.demultiplex - INFO - Median number of UMIs per cell: 22,962.0 / 22,962.0
2021-12-27 12:03:18,523 - fba.demultiplex - INFO - Demultiplexing ...
2021-12-27 12:03:39,128 - fba.demultiplex - INFO - Quantile cutoff: 49
2021-12-27 12:03:51,501 - fba.demultiplex - INFO - Generating heatmap ...
2021-12-27 12:04:07,664 - fba.demultiplex - INFO - Embedding ...
2021-12-27 12:04:56,977 - fba.__main__ - INFO - Done.
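The general idea of density-based thresholding can be sketched as follows. This is a simplified illustration, not fba's method 4 (which the text describes as a modified version of the published approach). It assumes a Gaussian kernel with a rule-of-thumb bandwidth, treats the tallest density peak as the negative population and the right-most peak as the positive one, and places the threshold at a fraction q of the distance between the two modes. The function names, the simulated values, and q = 0.5 are all hypothetical choices.

```python
import numpy as np


def gaussian_kde_1d(x, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of x on grid."""
    z = (grid[:, None] - x[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).sum(axis=1) / (
        x.size * bandwidth * np.sqrt(2.0 * np.pi)
    )


def kde_threshold(x, q, n_grid=512):
    """Place a positive/negative threshold between the two density modes."""
    x = np.asarray(x, dtype=float)
    # Rule-of-thumb bandwidth (normal reference rule); an assumption here.
    bandwidth = 1.06 * x.std() * x.size ** (-1 / 5)
    grid = np.linspace(x.min(), x.max(), n_grid)
    dens = gaussian_kde_1d(x, grid, bandwidth)
    # Local maxima: interior grid points higher than both neighbours.
    is_peak = (dens[1:-1] > dens[:-2]) & (dens[1:-1] > dens[2:])
    peaks = np.flatnonzero(is_peak) + 1
    if peaks.size < 2:
        raise ValueError("fewer than two density modes found")
    low_mode = grid[peaks[np.argmax(dens[peaks])]]  # tallest peak: negatives
    high_mode = grid[peaks[-1]]                     # right-most peak: positives
    return low_mode + q * (high_mode - low_mode)


# Simulated normalized abundance of one CMO: mostly negative cells.
rng = np.random.default_rng(0)
values = np.concatenate(
    [rng.normal(0.0, 0.3, 900), rng.normal(4.0, 0.3, 100)]
)

threshold = kde_threshold(values, q=0.5)
calls = values > threshold
print(round(float(threshold), 2), int(calls.sum()))
```

Treating the tallest peak as the negative population relies on most cells being negative for any single CMO, which is plausible when a pool is labeled with many CMOs but would not hold for a two-sample mixture.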
Heatmap of the relative abundance of features (CMOs) across all cells. Each column represents a single cell.
t-SNE embedding of cells based on the abundance of features (CMOs, no transcriptome information used). Colors indicate the CMO status for each cell, as called by FBA.
Preview the demultiplexing result: the number of singlets per CMO.
In [1]: import pandas as pd
In [2]: m = pd.read_csv("demultiplexed/matrix_cell_identity.csv.gz", index_col=0)
In [3]: m.loc[:, m.sum(axis=0) == 1].sum(axis=1)
Out[3]:
CMO301 1127
CMO302 872
CMO303 1124
CMO304 1562
CMO305 950
CMO306 1386
CMO307 3085
CMO308 2187
CMO309 2914
CMO310 2452
CMO311 2248
CMO312 950
dtype: int64
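The singlet counts from the two methods can be put side by side. The snippet below copies the per-CMO numbers printed above into two pandas Series and computes the per-CMO difference and the overall singlet fraction, using the 31,171 cells reported in the logs. The column names are illustrative.

```python
import pandas as pd

cmos = [f"CMO{n}" for n in range(301, 313)]

# Singlet counts copied from the two previews above.
gmm = pd.Series(
    [1078, 824, 1085, 1575, 959, 1362, 2912, 2144, 2841, 2675, 2292, 951],
    index=cmos,
)
kde = pd.Series(
    [1127, 872, 1124, 1562, 950, 1386, 3085, 2187, 2914, 2452, 2248, 950],
    index=cmos,
)

comparison = pd.DataFrame({"gmm": gmm, "kde": kde, "difference": kde - gmm})

# Share of all cells called as singlets by each method.
total_cells = 31_171
singlet_fraction = comparison[["gmm", "kde"]].sum() / total_cells

print(comparison)
print(singlet_fraction.round(3))
```

Both methods call roughly two thirds of the cells as singlets (20,698 and 20,857, respectively). The largest per-CMO differences are for CMO307 and CMO310.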