10k A375 Cells Transduced with (1) Non-Target and (1) Target sgRNA

Dataset: 10k A375 Cells Transduced with (1) Non-Target and (1) Target sgRNA, Dual Indexed

The detailed description of this dataset can be found here.


Preparation

Fastq files and feature barcodes are prepared as described here.


QC

Threshold: one mismatch

When running the qc subcommand, omitting the -1 (read 1) option activates bulk mode, which is useful for designing and testing feature barcoding assays prior to conducting single cell experiments. For example, bulk mode can be used to estimate: 1) the number of reads with valid feature barcodes, which can indicate primer specificity and suggest the number of reads required for sequencing; and 2) the distribution of feature barcodes, which reflects the biological aspect of the assay design.

To specify read 2 and feature barcodes, use -2 and -f options, respectively. The search range for read 2 can be controlled with -r2_c. In this example, a single mismatch is allowed for feature barcode matching, set by -fb_m. Use -n to specify the number of reads to analyze, with None indicating that all reads in the fastq file should be analyzed. By default, the distribution of detected feature barcodes is summarized in qc/feature_barcode_frequency.csv.

$ fba qc \
    -2 SC3_v3_NextGem_DI_CRISPR_10K_crispr_S1_combined_R2_001.fastq.gz \
    -f SC3_v3_NextGem_DI_CRISPR_10K_feature_ref_edited.tsv \
    -r2_c 31,51 \
    -fb_m 1 \
    -n None
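To make the roles of -r2_c and -fb_m concrete, here is a minimal sketch of what matching a feature barcode inside a read 2 window with a mismatch tolerance could look like. It is an illustration only, not fba's actual implementation: the helper names `hamming` and `match_feature_barcode` are hypothetical, mismatches are assumed to be substitutions only, and the coordinates are treated as a 0-based half-open interval, as the log's `[31, 51)` notation suggests. The read is synthetic.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def match_feature_barcode(read2, barcodes, start=31, end=51, max_mismatches=1):
    """Return the single barcode name within max_mismatches of read2[start:end].

    Returns None when no barcode, or more than one barcode, qualifies.
    Hypothetical helper for illustration; fba may match differently.
    """
    window = read2[start:end]
    hits = [
        name
        for name, seq in barcodes.items()
        if len(seq) == len(window) and hamming(window, seq) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


# The two 20-nt sgRNA sequences from the feature reference used on this page.
BARCODES = {
    "NON_TARGET-1": "AACGTGCTGACGATGCGGGC",
    "RAB1A-2": "GCCGGCGAACCAGGAAATAG",
}

# Synthetic read 2: 31 filler bases, then RAB1A-2 with its last base changed (G -> C).
read2 = "N" * 31 + "GCCGGCGAACCAGGAAATAC" + "GTTT"

one_mismatch = match_feature_barcode(read2, BARCODES, max_mismatches=1)
exact_only = match_feature_barcode(read2, BARCODES, max_mismatches=0)
print(one_mismatch)  # RAB1A-2
print(exact_only)    # None
```

With one mismatch allowed the synthetic read is assigned to RAB1A-2; with exact matching it is not counted, which is the kind of read a stricter threshold would discard.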

The content of qc/feature_barcode_frequency.csv.

| feature barcode | num_reads | percentage |
|---|---|---|
| NON_TARGET-1_AACGTGCTGACGATGCGGGC | 59,310,228 | 0.6979885931379053 |
| RAB1A-2_GCCGGCGAACCAGGAAATAG | 25,662,834 | 0.3020114068620947 |

Result summary.

58.59% (84,973,062 / 145,032,428) of reads have valid feature barcodes (sgRNAs). The ratio of valid reads between the two sgRNAs is approximately 2.31 (NON_TARGET-1_AACGTGCTGACGATGCGGGC / RAB1A-2_GCCGGCGAACCAGGAAATAG; 59,310,228 / 25,662,834), even though the cells transduced separately with each sgRNA were mixed at a 1:1 ratio. See here for more details.
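The summary figures can be rechecked directly from the read counts reported for this run. The short sketch below only redoes the arithmetic; the variable names are arbitrary and the counts are copied from the table and log on this page.

```python
# Read counts from the one-mismatch run.
non_target = 59_310_228   # NON_TARGET-1_AACGTGCTGACGATGCGGGC
rab1a = 25_662_834        # RAB1A-2_GCCGGCGAACCAGGAAATAG
total_reads = 145_032_428

valid = non_target + rab1a
valid_fraction = valid / total_reads
read_ratio = non_target / rab1a

print(f"valid reads: {valid:,}")                      # valid reads: 84,973,062
print(f"valid fraction: {valid_fraction:.2%}")        # valid fraction: 58.59%
print(f"NON_TARGET-1 / RAB1A-2: {read_ratio:.2f}")    # NON_TARGET-1 / RAB1A-2: 2.31
```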

2021-02-17 16:12:35,393 - fba.__main__ - INFO - fba version: 0.0.7
2021-02-17 16:12:35,393 - fba.__main__ - INFO - Initiating logging ...
2021-02-17 16:12:35,393 - fba.__main__ - INFO - Python version: 3.7
2021-02-17 16:12:35,393 - fba.__main__ - INFO - Using qc subcommand ...
2021-02-17 16:12:35,394 - fba.__main__ - INFO - Bulk mode enabled: only feature barcodes on reads 2 are analyzed
2021-02-17 16:12:35,394 - fba.__main__ - INFO - Skipping arguments: "-w/--whitelist", "-cb_m/--cb_mismatches", "-r1_c/--read1_coordinate"
2021-02-17 16:12:35,395 - fba.qc - INFO - Number of reference feature barcodes: 2
2021-02-17 16:12:35,395 - fba.qc - INFO - Read 2 coordinates to search: [31, 51)
2021-02-17 16:12:35,395 - fba.qc - INFO - Feature barcode maximum number of mismatches: 1
2021-02-17 16:12:35,395 - fba.qc - INFO - Read 2 maximum number of N allowed: inf
2021-02-17 16:12:35,395 - fba.qc - INFO - Number of read pairs to analyze: all
2021-02-17 16:12:35,395 - fba.qc - INFO - Matching ...
2021-02-17 16:15:07,684 - fba.qc - INFO - Reads processed: 10,000,000
2021-02-17 16:17:39,083 - fba.qc - INFO - Reads processed: 20,000,000
2021-02-17 16:20:09,116 - fba.qc - INFO - Reads processed: 30,000,000
2021-02-17 16:22:38,981 - fba.qc - INFO - Reads processed: 40,000,000
2021-02-17 16:25:11,671 - fba.qc - INFO - Reads processed: 50,000,000
2021-02-17 16:27:44,790 - fba.qc - INFO - Reads processed: 60,000,000
2021-02-17 16:30:18,110 - fba.qc - INFO - Reads processed: 70,000,000
2021-02-17 16:32:51,391 - fba.qc - INFO - Reads processed: 80,000,000
2021-02-17 16:35:24,625 - fba.qc - INFO - Reads processed: 90,000,000
2021-02-17 16:37:57,678 - fba.qc - INFO - Reads processed: 100,000,000
2021-02-17 16:40:30,706 - fba.qc - INFO - Reads processed: 110,000,000
2021-02-17 16:43:03,867 - fba.qc - INFO - Reads processed: 120,000,000
2021-02-17 16:45:37,197 - fba.qc - INFO - Reads processed: 130,000,000
2021-02-17 16:48:10,511 - fba.qc - INFO - Reads processed: 140,000,000
2021-02-17 16:49:27,662 - fba.qc - INFO - Number of reads processed: 145,032,428
2021-02-17 16:49:27,663 - fba.qc - INFO - Number of reads w/ valid feature barcodes: 84,973,062
2021-02-17 16:49:27,664 - fba.__main__ - INFO - Output file: qc/feature_barcode_frequency.csv
2021-02-17 16:49:27,689 - fba.__main__ - INFO - Done.

Threshold: two mismatches

Let’s relax the threshold to allow 2 mismatches for feature barcode matching (set by -fb_m).

$ fba qc \
    -2 SC3_v3_NextGem_DI_CRISPR_10K_crispr_S1_combined_R2_001.fastq.gz \
    -f SC3_v3_NextGem_DI_CRISPR_10K_feature_ref_edited.tsv \
    -r2_c 31,51 \
    -fb_m 2 \
    -n None

The content of qc/feature_barcode_frequency.csv.

| feature barcode | num_reads | percentage |
|---|---|---|
| NON_TARGET-1_AACGTGCTGACGATGCGGGC | 66,334,740 | 0.6613115326075217 |
| RAB1A-2_GCCGGCGAACCAGGAAATAG | 33,973,113 | 0.3386884673924782 |

Result summary.

69.16% (100,307,853 / 145,032,428) of reads have valid feature barcodes.
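To see what relaxing the threshold changes, the counts from the one-mismatch and two-mismatch runs can be compared side by side. This is a plain arithmetic sketch using the numbers reported on this page; the dictionary layout and the shortened barcode names are just for illustration.

```python
total_reads = 145_032_428

# Reads per feature barcode, keyed by the -fb_m value used for the run.
counts = {
    1: {"NON_TARGET-1": 59_310_228, "RAB1A-2": 25_662_834},
    2: {"NON_TARGET-1": 66_334_740, "RAB1A-2": 33_973_113},
}

valid_by_threshold = {m: sum(by_barcode.values()) for m, by_barcode in counts.items()}
for m, valid in valid_by_threshold.items():
    print(f"-fb_m {m}: {valid:,} valid reads ({valid / total_reads:.2%})")

# Additional reads recovered per barcode when going from 1 to 2 mismatches.
gained = {name: counts[2][name] - counts[1][name] for name in counts[1]}
for name, extra in gained.items():
    print(f"{name}: +{extra:,} reads")
```

Under these numbers, allowing a second mismatch recovers 15,334,791 additional reads, slightly more of them for RAB1A-2 (8,310,279) than for NON_TARGET-1 (7,024,512).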

2021-02-17 16:12:00,407 - fba.__main__ - INFO - fba version: 0.0.7
2021-02-17 16:12:00,407 - fba.__main__ - INFO - Initiating logging ...
2021-02-17 16:12:00,408 - fba.__main__ - INFO - Python version: 3.7
2021-02-17 16:12:00,408 - fba.__main__ - INFO - Using qc subcommand ...
2021-02-17 16:12:00,408 - fba.__main__ - INFO - Bulk mode enabled: only feature barcodes on reads 2 are analyzed
2021-02-17 16:12:00,408 - fba.__main__ - INFO - Skipping arguments: "-w/--whitelist", "-cb_m/--cb_mismatches", "-r1_c/--read1_coordinate"
2021-02-17 16:12:00,426 - fba.qc - INFO - Number of reference feature barcodes: 2
2021-02-17 16:12:00,426 - fba.qc - INFO - Read 2 coordinates to search: [31, 51)
2021-02-17 16:12:00,426 - fba.qc - INFO - Feature barcode maximum number of mismatches: 2
2021-02-17 16:12:00,426 - fba.qc - INFO - Read 2 maximum number of N allowed: inf
2021-02-17 16:12:00,426 - fba.qc - INFO - Number of read pairs to analyze: all
2021-02-17 16:12:00,426 - fba.qc - INFO - Matching ...
2021-02-17 16:28:02,710 - fba.qc - INFO - Reads processed: 10,000,000
2021-02-17 16:44:07,554 - fba.qc - INFO - Reads processed: 20,000,000
2021-02-17 17:00:13,431 - fba.qc - INFO - Reads processed: 30,000,000
2021-02-17 17:16:17,034 - fba.qc - INFO - Reads processed: 40,000,000
2021-02-17 17:32:21,635 - fba.qc - INFO - Reads processed: 50,000,000
2021-02-17 17:48:26,948 - fba.qc - INFO - Reads processed: 60,000,000
2021-02-17 18:04:31,050 - fba.qc - INFO - Reads processed: 70,000,000
2021-02-17 18:20:34,413 - fba.qc - INFO - Reads processed: 80,000,000
2021-02-17 18:36:38,778 - fba.qc - INFO - Reads processed: 90,000,000
2021-02-17 18:52:44,033 - fba.qc - INFO - Reads processed: 100,000,000
2021-02-17 19:08:49,500 - fba.qc - INFO - Reads processed: 110,000,000
2021-02-17 19:24:56,356 - fba.qc - INFO - Reads processed: 120,000,000
2021-02-17 19:41:02,072 - fba.qc - INFO - Reads processed: 130,000,000
2021-02-17 19:57:09,967 - fba.qc - INFO - Reads processed: 140,000,000
2021-02-17 20:05:15,665 - fba.qc - INFO - Number of reads processed: 145,032,428
2021-02-17 20:05:15,666 - fba.qc - INFO - Number of reads w/ valid feature barcodes: 100,307,853
2021-02-17 20:05:15,667 - fba.__main__ - INFO - Output file: qc/feature_barcode_frequency.csv
2021-02-17 20:05:15,701 - fba.__main__ - INFO - Done.
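The two logs also show that the relaxed threshold costs considerably more run time. A small sketch for reading the elapsed time off the first and last timestamps of each log follows; the `elapsed` helper is hypothetical, and the timings are specific to the machine these runs were made on.

```python
from datetime import datetime, timedelta

LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed(first_timestamp: str, last_timestamp: str) -> timedelta:
    """Return the time between two log timestamps such as '2021-02-17 16:12:35,393'."""
    start = datetime.strptime(first_timestamp, LOG_TIME_FORMAT)
    end = datetime.strptime(last_timestamp, LOG_TIME_FORMAT)
    return end - start


# First and last timestamps copied from the two logs on this page.
one_mismatch = elapsed("2021-02-17 16:12:35,393", "2021-02-17 16:49:27,689")
two_mismatches = elapsed("2021-02-17 16:12:00,407", "2021-02-17 20:05:15,701")

print(one_mismatch)    # 0:36:52.296000
print(two_mismatches)  # 3:53:15.294000
print(f"slowdown: {two_mismatches / one_mismatch:.1f}x")  # slowdown: 6.3x
```

In these runs, allowing two mismatches took roughly six times longer than allowing one, so it may be worth testing a relaxed threshold on a subset of reads (via -n) first.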