10k 1:1 Mixture of Raji and Jurkat Cells Multiplexed#
Dataset: 10k 1:1 Mixture of Raji and Jurkat Cells Multiplexed, 2 CMOs
The detailed description of this dataset can be found here.
Preparation#
Fastq files and feature barcodes are prepared as described here.
QC#
Threshold: one mismatch#
When running the qc
subcommand, omitting the -1
(read 1) option
activates bulk mode, which is useful for designing and testing feature
barcoding assays prior to conducting single cell experiments. For
example, bulk mode can be used to estimate: 1) the number of reads with
valid feature barcodes, which can indicate primer specificity and
suggest the number of reads required for sequencing; and 2) the
distribution of feature barcodes, which reflects the biological aspect
of the assay design.
To specify read 2 and feature barcodes, use -2
and -f
options,
respectively. The search range for read 2 can be controlled with
-r2_c
. In this example, a single mismatch is allowed for feature
barcode matching, set by -fb_m
. Use -n
to specify the number of
reads to analyze, with None
indicating that all reads in the fastq
file should be analyzed. By default, the distribution of detected
feature barcodes is summarized in qc/feature_barcode_frequency.csv
.
$ fba qc \
-2 SC3_v3_NextGem_DI_CellPlex_Jurkat_Raji_10K_1_multiplexing_capture_S1_combined_R2_001.fastq.gz \
-f SC3_v3_NextGem_DI_CRISPR_10K_feature_ref.tsv \
-r2_c 0,15 \
-fb_m 1 \
-n None
The content of qc/feature_barcode_frequency.csv
.
feature barcode |
num_reads |
percentage |
CMO301_ATGAGGAATTCCTGC |
132435325 |
0.62351149 |
CMO302_CATGCCAATAGAGCG |
79628216 |
0.37489323 |
CMO308_CGGATTCCACATCAT |
320078 |
0.00150694 |
CMO309_GTTGATCTATAACAG |
6445 |
3.03E-05 |
CMO303_CCGTCGTCCAAGCAT |
4047 |
1.91E-05 |
CMO304_AACGTTAATCACTCA |
2199 |
1.04E-05 |
CMO307_AAGCTCGTTGGAAGA |
1735 |
8.17E-06 |
CMO312_ACATGGTCAACGCTG |
1508 |
7.10E-06 |
CMO306_AAGATGAGGTCTGTG |
1323 |
6.23E-06 |
CMO310_GCAGGAGGTATCAAT |
532 |
2.50E-06 |
CMO311_GAATCGTGATTCTTC |
502 |
2.36E-06 |
CMO305_CGCGATATGGTCGGA |
472 |
2.22E-06 |
Result summary.
98.3% (212,402,382 / 216,070,514) of reads have valid feature barcodes. CMO301_ATGAGGAATTCCTGC and CMO302_CATGCCAATAGAGCG are the most abundant CMOs. They account for all most all of the valid reads. Although the valid read ratio is 1.663171 (132,435,325 / 79,628,216), cells labeled with them separately are mixed at 1: 1 ratio. See here for more details.
2021-10-02 02:02:31,092 - fba.__main__ - INFO - fba version: 0.0.11
2021-10-02 02:02:31,092 - fba.__main__ - INFO - Initiating logging ...
2021-10-02 02:02:31,092 - fba.__main__ - INFO - Python version: 3.7
2021-10-02 02:02:31,092 - fba.__main__ - INFO - Using qc subcommand ...
2021-10-02 02:02:31,873 - fba.__main__ - INFO - Bulk mode enabled: only feature barcodes on reads 2 are analyzed
2021-10-02 02:02:31,873 - fba.__main__ - INFO - Skipping arguments: "-w/--whitelist", "-cb_m/--cb_mismatches", "-r1_c/--read1_coordinate"
2021-10-02 02:02:31,875 - fba.qc - INFO - Number of reference feature barcodes: 12
2021-10-02 02:02:31,875 - fba.qc - INFO - Read 2 coordinates to search: [0, 15)
2021-10-02 02:02:31,875 - fba.qc - INFO - Feature barcode maximum number of mismatches: 1
2021-10-02 02:02:31,875 - fba.qc - INFO - Read 2 maximum number of N allowed: inf
2021-10-02 02:02:31,875 - fba.qc - INFO - Number of read pairs to analyze: all
2021-10-02 02:02:31,875 - fba.qc - INFO - Matching ...
2021-10-02 02:04:19,871 - fba.qc - INFO - Reads processed: 10,000,000
2021-10-02 02:06:06,844 - fba.qc - INFO - Reads processed: 20,000,000
2021-10-02 02:07:53,987 - fba.qc - INFO - Reads processed: 30,000,000
2021-10-02 02:09:40,854 - fba.qc - INFO - Reads processed: 40,000,000
2021-10-02 02:11:27,502 - fba.qc - INFO - Reads processed: 50,000,000
2021-10-02 02:13:14,277 - fba.qc - INFO - Reads processed: 60,000,000
2021-10-02 02:15:02,641 - fba.qc - INFO - Reads processed: 70,000,000
2021-10-02 02:16:51,149 - fba.qc - INFO - Reads processed: 80,000,000
2021-10-02 02:18:40,463 - fba.qc - INFO - Reads processed: 90,000,000
2021-10-02 02:20:30,099 - fba.qc - INFO - Reads processed: 100,000,000
2021-10-02 02:22:19,651 - fba.qc - INFO - Reads processed: 110,000,000
2021-10-02 02:24:09,364 - fba.qc - INFO - Reads processed: 120,000,000
2021-10-02 02:25:59,016 - fba.qc - INFO - Reads processed: 130,000,000
2021-10-02 02:27:48,634 - fba.qc - INFO - Reads processed: 140,000,000
2021-10-02 02:29:38,323 - fba.qc - INFO - Reads processed: 150,000,000
2021-10-02 02:31:28,018 - fba.qc - INFO - Reads processed: 160,000,000
2021-10-02 02:33:17,585 - fba.qc - INFO - Reads processed: 170,000,000
2021-10-02 02:35:07,168 - fba.qc - INFO - Reads processed: 180,000,000
2021-10-02 02:36:56,770 - fba.qc - INFO - Reads processed: 190,000,000
2021-10-02 02:38:46,487 - fba.qc - INFO - Reads processed: 200,000,000
2021-10-02 02:40:36,129 - fba.qc - INFO - Reads processed: 210,000,000
2021-10-02 02:41:42,628 - fba.qc - INFO - Number of reads processed: 216,070,514
2021-10-02 02:41:42,628 - fba.qc - INFO - Number of reads w/ valid feature barcodes: 212,402,382
2021-10-02 02:41:42,629 - fba.__main__ - INFO - Output file: qc/feature_barcode_frequency.csv
2021-10-02 02:41:42,645 - fba.__main__ - INFO - Done.
Threshold: two mismatches#
Let’s relax the threshold to allow 2 mismatches for feature barcode
matching (set by -fb_m
).
$ fba qc \
-2 SC3_v3_NextGem_DI_CellPlex_Jurkat_Raji_10K_1_multiplexing_capture_S1_combined_R2_001.fastq.gz \
-f SC3_v3_NextGem_DI_CRISPR_10K_feature_ref.tsv \
-r2_c 0,15 \
-fb_m 2 \
-n None
The content of qc/feature_barcode_frequency.csv
.
feature barcode |
num_reads |
percentage |
CMO301_ATGAGGAATTCCTGC |
133957542 |
0.624153341 |
CMO302_CATGCCAATAGAGCG |
80322629 |
0.374250203 |
CMO308_CGGATTCCACATCAT |
323662 |
0.00150805 |
CMO309_GTTGATCTATAACAG |
6498 |
3.03E-05 |
CMO303_CCGTCGTCCAAGCAT |
4091 |
1.91E-05 |
CMO304_AACGTTAATCACTCA |
2225 |
1.04E-05 |
CMO307_AAGCTCGTTGGAAGA |
1751 |
8.16E-06 |
CMO312_ACATGGTCAACGCTG |
1535 |
7.15E-06 |
CMO306_AAGATGAGGTCTGTG |
1351 |
6.29E-06 |
CMO310_GCAGGAGGTATCAAT |
539 |
2.51E-06 |
CMO311_GAATCGTGATTCTTC |
507 |
2.36E-06 |
CMO305_CGCGATATGGTCGGA |
477 |
2.22E-06 |
Result summary.
99.33% (214,622,807 / 216,070,514) of reads have valid feature barcodes.
2021-10-02 02:02:31,268 - fba.__main__ - INFO - fba version: 0.0.11
2021-10-02 02:02:31,268 - fba.__main__ - INFO - Initiating logging ...
2021-10-02 02:02:31,268 - fba.__main__ - INFO - Python version: 3.7
2021-10-02 02:02:31,268 - fba.__main__ - INFO - Using qc subcommand ...
2021-10-02 02:02:32,021 - fba.__main__ - INFO - Bulk mode enabled: only feature barcodes on reads 2 are analyzed
2021-10-02 02:02:32,021 - fba.__main__ - INFO - Skipping arguments: "-w/--whitelist", "-cb_m/--cb_mismatches", "-r1_c/--read1_coordinate"
2021-10-02 02:02:32,025 - fba.qc - INFO - Number of reference feature barcodes: 12
2021-10-02 02:02:32,025 - fba.qc - INFO - Read 2 coordinates to search: [0, 15)
2021-10-02 02:02:32,026 - fba.qc - INFO - Feature barcode maximum number of mismatches: 2
2021-10-02 02:02:32,026 - fba.qc - INFO - Read 2 maximum number of N allowed: inf
2021-10-02 02:02:32,026 - fba.qc - INFO - Number of read pairs to analyze: all
2021-10-02 02:02:32,026 - fba.qc - INFO - Matching ...
2021-10-02 02:13:36,407 - fba.qc - INFO - Reads processed: 10,000,000
2021-10-02 02:24:40,718 - fba.qc - INFO - Reads processed: 20,000,000
2021-10-02 02:35:43,572 - fba.qc - INFO - Reads processed: 30,000,000
2021-10-02 02:46:45,598 - fba.qc - INFO - Reads processed: 40,000,000
2021-10-02 02:57:47,743 - fba.qc - INFO - Reads processed: 50,000,000
2021-10-02 03:08:49,904 - fba.qc - INFO - Reads processed: 60,000,000
2021-10-02 03:19:52,124 - fba.qc - INFO - Reads processed: 70,000,000
2021-10-02 03:30:54,289 - fba.qc - INFO - Reads processed: 80,000,000
2021-10-02 03:41:56,459 - fba.qc - INFO - Reads processed: 90,000,000
2021-10-02 03:53:01,896 - fba.qc - INFO - Reads processed: 100,000,000
2021-10-02 04:04:07,940 - fba.qc - INFO - Reads processed: 110,000,000
2021-10-02 04:15:13,882 - fba.qc - INFO - Reads processed: 120,000,000
2021-10-02 04:26:19,716 - fba.qc - INFO - Reads processed: 130,000,000
2021-10-02 04:37:25,780 - fba.qc - INFO - Reads processed: 140,000,000
2021-10-02 04:48:31,630 - fba.qc - INFO - Reads processed: 150,000,000
2021-10-02 04:59:36,756 - fba.qc - INFO - Reads processed: 160,000,000
2021-10-02 05:10:42,247 - fba.qc - INFO - Reads processed: 170,000,000
2021-10-02 05:21:47,635 - fba.qc - INFO - Reads processed: 180,000,000
2021-10-02 05:32:53,151 - fba.qc - INFO - Reads processed: 190,000,000
2021-10-02 05:43:58,739 - fba.qc - INFO - Reads processed: 200,000,000
2021-10-02 05:55:04,397 - fba.qc - INFO - Reads processed: 210,000,000
2021-10-02 06:01:48,423 - fba.qc - INFO - Number of reads processed: 216,070,514
2021-10-02 06:01:48,424 - fba.qc - INFO - Number of reads w/ valid feature barcodes: 214,622,807
2021-10-02 06:01:48,425 - fba.__main__ - INFO - Output file: qc/feature_barcode_frequency.csv
2021-10-02 06:01:48,442 - fba.__main__ - INFO - Done.