(WO2017093400) METHOD FOR DETERMINING CELL CLONALITY
Note: Text based on automatic Optical Character Recognition processes. Please use the PDF version for legal matters

CLAIMS

1. A method for determining the clonality of a Master Cell Bank (MCB), said MCB resulting from predictable or not predictable insertion of a transgene of known sequence into a host progenitor cell (HPC) genome of known sequence, said method comprising the steps of:

a) Identifying one or more transgene insertion regions (TIRs) in the genome of a reference subclone cell (RSC), wherein the RSC has been isolated from the MCB for which clonality is to be determined, and wherein said identifying is achieved by

i. paired-end sequencing of said RSC genome to obtain an RSC genome sequence or RSC genome sequences; and

ii. alignment of said RSC genome sequence or sequences to said known HPC genome sequence and said known transgene sequence,

thereby yielding one or more transgene insertion regions (TIRs);

b) Determining, among the TIRs identified in step (a), the one or more TIRs with the highest degree of sequence coverage,

wherein said sequence coverage refers to the number of times a given nucleic acid sequence containing a given TIR is read during the sequencing process by partially overlapping reads;

wherein said one or more TIRs with the highest degree of sequence coverage are assigned as reference TIRs (RTIRs);

c) Identifying one or more transgene insertion regions (TIRs) in the respective genomes of one or more subclones (SCs);

wherein each of the SCs has been isolated from the MCB for which clonality is to be determined but is independent of said RSC,

wherein said identifying is achieved by

i. paired-end sequencing of each respective SC genome to obtain an SC genome sequence or SC genome sequences; and

ii. alignment of each respective SC genome sequence or sequences to said known HPC genome sequence and said known transgene sequence,

thereby yielding one or more comparative transgene insertion regions (CTIRs);

d) Comparing said one or more RTIRs determined in step (b) with the respective CTIRs determined in step (c);

e) Evaluating the correspondence between each of said one or more CTIRs present in a respective SC and corresponding RTIRs present in said RSC; and

f) Determining clonality of said MCB based on said correspondence evaluated in step (e), wherein said MCB is considered to be monoclonal if said RSC and said one or more SCs are grouped into the same cluster.

2. The method of claim 1, wherein paired-end sequencing involves sequencing of a given nucleic acid molecule from both ends of said nucleic acid molecule, thereby generating pairs of reads for a given nucleic acid molecule representing a fragment of the genome to be sequenced.

3. The method of claim 1 or 2, wherein said RSC is sequenced with a higher sequence coverage compared to said one or more SCs.

4. The method of any of the above claims, wherein said MCB results from the random insertion of said transgene at multiple positions into said HPC genome, wherein said random insertion is preferably effected using a retroviral vector.

5. The method of any of the previous claims, wherein the determination of TIRs comprises classification of paired-end read 1 sequences and paired-end read 2 sequences derived from paired-end libraries into 4 classes, wherein

• class 1 comprises read 1 sequences mapping to said transgene;

• class 2 comprises read 1 sequences mapping to said HPC genome;

• class 3 comprises read 2 sequences mapping to said transgene; and

• class 4 comprises read 2 sequences mapping to said HPC genome;

wherein said read 1 and said read 2 represent respective forward and reverse reads corresponding to the 5' and 3' ends of a given nucleic acid molecule within a nucleic acid cluster generated in sequencing of a nucleic acid library of said RSC or said one or more SCs.
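The four-way classification of claim 5 can be sketched as a lookup on (mate, alignment target). This is a minimal illustration, not the applicant's implementation: it assumes an upstream aligner has already resolved each read to a single hypothetical label, "transgene" or "host", or to None when the read is unmapped.

```python
from enum import IntEnum
from typing import Optional


class ReadClass(IntEnum):
    """The four read classes of claim 5."""
    R1_TRANSGENE = 1  # class 1: read 1 maps to the transgene
    R1_HOST = 2       # class 2: read 1 maps to the HPC genome
    R2_TRANSGENE = 3  # class 3: read 2 maps to the transgene
    R2_HOST = 4       # class 4: read 2 maps to the HPC genome


# (mate, target) -> class; "transgene"/"host" are hypothetical aligner labels.
_CLASS_TABLE = {
    (1, "transgene"): ReadClass.R1_TRANSGENE,
    (1, "host"): ReadClass.R1_HOST,
    (2, "transgene"): ReadClass.R2_TRANSGENE,
    (2, "host"): ReadClass.R2_HOST,
}


def classify_read(mate: int, target: Optional[str]) -> Optional[ReadClass]:
    """Assign one read to a class.

    mate: 1 for read 1 (forward), 2 for read 2 (reverse).
    target: "transgene", "host", or None if the read did not map.
    Unmapped reads fall into no class and yield None.
    """
    if mate not in (1, 2):
        raise ValueError("mate must be 1 or 2")
    return _CLASS_TABLE.get((mate, target))
```

Reads that align to both references would need a tie-breaking rule before this step; the claim does not specify one.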

6. The method of claim 5, wherein read 1 sequences are combined with the corresponding read 2 sequences using a flow cell sequence identifier, wherein said sequence identifier comprises information of the flow cell lane, the tile number within the flow cell, the "x" coordinate of the nucleic acid cluster within a tile, and the "y" coordinate of the nucleic acid cluster within a tile, thereby assigning each sequence pair corresponding to read 1 and read 2 sequences a unique position within the flow cell.
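Joining read 1 and read 2 on the flow cell position can be sketched as follows. The sketch assumes Illumina FASTQ headers in the CASAVA 1.8+ layout `@instrument:run:flowcell:lane:tile:x:y read:filter:control:index`, where the last four colon-separated fields of the read name are lane, tile, x and y. The function names are hypothetical.

```python
from typing import Dict, Tuple

ClusterKey = Tuple[int, int, int, int]  # (lane, tile, x, y)


def cluster_key(header: str) -> ClusterKey:
    """Extract (lane, tile, x, y) from an Illumina-style FASTQ header."""
    name = header.lstrip("@").split()[0]  # drop the " 1:N:0:..." comment part
    lane, tile, x, y = (int(f) for f in name.split(":")[-4:])
    return lane, tile, x, y


def pair_reads(read1: Dict[str, str],
               read2: Dict[str, str]) -> Dict[ClusterKey, Tuple[str, str]]:
    """Combine read 1 and read 2 sequences originating from the same cluster.

    read1, read2: FASTQ header -> sequence for each mate.
    Returns a unique flow cell position -> (read 1 sequence, read 2 sequence).
    """
    r2_by_key = {cluster_key(h): seq for h, seq in read2.items()}
    pairs = {}
    for header, seq in read1.items():
        key = cluster_key(header)
        if key in r2_by_key:  # keep only clusters with both mates present
            pairs[key] = (seq, r2_by_key[key])
    return pairs
```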

7. The method of claims 5 and 6, wherein the respective read 1 and read 2 sequences of a respective read pair are separately aligned to the known sequences of the transgene and the HPC genome.

8. The method of any of claims 5 to 7, wherein only the read pairs comprising class 1 and 4 sequences and the read pairs comprising class 2 and class 3 sequences are selected for further analysis.
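The selection in claim 8 keeps only pairs in which one mate lies in the transgene and the other in the host genome, i.e. pairs that span a transgene/host junction. A minimal sketch, assuming each pair is given as a hypothetical `(pair_id, class_of_read1, class_of_read2)` tuple using the class numbers of claim 5:

```python
from typing import Iterable, List, Tuple

# Class numbers as defined in claim 5.
R1_TRANSGENE, R1_HOST, R2_TRANSGENE, R2_HOST = 1, 2, 3, 4

# Junction-spanning combinations retained by claim 8.
INFORMATIVE = {(R1_TRANSGENE, R2_HOST), (R1_HOST, R2_TRANSGENE)}


def select_junction_pairs(pairs: Iterable[Tuple[str, int, int]]) -> List[str]:
    """Return the IDs of read pairs whose classes are (1, 4) or (2, 3)."""
    return [pid for pid, c1, c2 in pairs if (c1, c2) in INFORMATIVE]
```

Pairs with both mates in the transgene (1, 3) or both in the host genome (2, 4) carry no junction information and are discarded.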

9. The method of any of claims 5 to 8, wherein said TIRs are identified by aligning the paired-end read sequences corresponding to class 2 and class 4 to the HPC genome, thereby defining a 2 kb region for each of said TIRs in the HPC genome.
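Claim 9 states that each TIR becomes a 2 kb region of the HPC genome but does not say how reads are grouped. The sketch below uses an assumed greedy rule: a class 2 or class 4 read joins the current group if it lies less than 2 kb downstream of the group's first read, and each group is reported as a 2 kb window centred on its reads. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

REGION_SIZE = 2_000  # 2 kb, as stated in claim 9


def define_tirs(hits: Iterable[Tuple[str, int]]) -> List[Tuple[str, int, int, int]]:
    """Group host-genome positions of class 2/4 reads into TIRs.

    hits: (chromosome, position) of each class 2 or class 4 read on the HPC genome.
    Returns (chromosome, start, end, read_count) per TIR, with [start, end)
    a 2 kb window centred on the supporting reads (grouping rule assumed).
    """
    by_chrom: Dict[str, List[int]] = defaultdict(list)
    for chrom, pos in hits:
        by_chrom[chrom].append(pos)

    tirs = []
    for chrom in sorted(by_chrom):
        positions = sorted(by_chrom[chrom])
        groups: List[List[int]] = [[positions[0]]]
        for pos in positions[1:]:
            if pos - groups[-1][0] < REGION_SIZE:
                groups[-1].append(pos)  # still within 2 kb of the group start
            else:
                groups.append([pos])    # open a new candidate TIR
        for group in groups:
            centre = (group[0] + group[-1]) // 2
            start = max(0, centre - REGION_SIZE // 2)
            tirs.append((chrom, start, start + REGION_SIZE, len(group)))
    return tirs
```

The read count returned per region is a simple proxy for the sequence coverage used later to rank TIRs.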

10. The method of any of the previous claims, comprising determining n RTIRs with the highest sequence coverage in the paired-end NGS library; wherein n is an integer from 5 to 50, preferably 5, 10, 15, 20, 25, 30, 35, 40, 45 or 50.

11. The method of claim 10, wherein the first n RTIRs with highest sequence coverage are determined based on

a) the number of reads of a respective paired-end read sequence corresponding to class 2 and class 4 mapping to the HPC genome, wherein a higher number of reads indicates inclusion as an RTIR; and

b) the partial overlap of the reads of a respective paired-end read sequence corresponding to class 2 and class 4, wherein a lower partial overlap of reads indicates inclusion as an RTIR.
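A ranking under claims 10 and 11 can be sketched as a sort on two keys: more supporting reads first, then lower partial overlap. The claim does not define how the overlap is scored or how the two criteria are weighed, so the `overlap` field and the choice of read count as the primary key are assumptions; the `TIR` type and `select_rtirs` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TIR:
    name: str       # e.g. "chr1:9200-11200"
    reads: int      # class 2 + class 4 reads mapping to the HPC genome here
    overlap: float  # hypothetical partial-overlap score; lower is preferred


def select_rtirs(tirs: List[TIR], n: int) -> List[TIR]:
    """Return the first n TIRs, to be assigned as RTIRs.

    Primary key: higher read count (criterion a).
    Secondary key: lower partial overlap (criterion b).
    """
    if not 5 <= n <= 50:
        raise ValueError("claim 10 gives n as an integer from 5 to 50")
    return sorted(tirs, key=lambda t: (-t.reads, t.overlap))[:n]
```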

12. The method of claim 10 or 11, wherein each of the first n RTIRs in said RSC genome is compared with the corresponding genomic location of said CTIRs in each of said one or more SC genomes.

13. The method of claim 12, wherein comparison of said RTIRs in said RSC and said CTIRs in said one or more SCs is achieved by generating a presence/absence matrix of insertion regions, wherein one matrix dimension represents said n RTIRs of said transgene in said RSC genome and another, preferably orthogonal, matrix dimension represents said RSC and each of said one or more SCs.
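The presence/absence matrix of claim 13 can be sketched with RTIRs as rows and clones (the RSC first, then each SC) as columns. Treating a CTIR as "corresponding" to an RTIR when the two regions overlap on the same chromosome is an assumption made for illustration; `presence_matrix` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), half-open


def overlaps(a: Region, b: Region) -> bool:
    """True if two regions share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def presence_matrix(rtirs: List[Region],
                    clones: Dict[str, List[Region]]) -> List[List[int]]:
    """Rows: the n RTIRs. Columns: clones in insertion order of `clones`.

    A cell is 1 if that clone has an insertion region overlapping the RTIR,
    otherwise 0.
    """
    return [
        [int(any(overlaps(rtir, tir) for tir in tirs)) for tirs in clones.values()]
        for rtir in rtirs
    ]
```

Rendering the 1 and 0 entries in two colours gives the binary colour code described in claim 14.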

14. The method of claim 13, wherein the presence or absence of a respective CTIR in said one or more SCs relative to a respective RTIR in said RSC is represented in the matrix as a binary color code, wherein a first color represents the presence of a respective RTIR in said RSC or of a respective CTIR in said one or more SCs and a second color represents the respective absence thereof, or vice versa.

15. The method of any of the preceding claims, wherein the relationship between said RSC and each of the said one or more SCs is evaluated by calculation of a distance matrix.

16. The method of claim 15, wherein the distance matrix is calculated based on the following formula (I),

$$D_d(\mathrm{RSC}, \mathrm{SC}_m) = 1 - \frac{2\,N(\mathrm{total})}{N(\mathrm{CTIR}) + N(\mathrm{RTIR})} \qquad \text{(I)}$$

wherein Dd(RSC, SCm) represents the distance function between said RSC genome and a respective SCm genome; N(total) is the number of insertion regions present in both said RSC genome and said SCm genome; N(CTIR) is the total number of insertion regions present in said SCm genome; and N(RTIR) is the total number of insertion regions present in said RSC genome; wherein Dd(RSC, SCm) represents the distance on a scale of 0 to 1, a distance of 0 representing clonal identity between said RSC and a respective SCm and a distance of 1 representing clonal difference.
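Formula (I) is one minus the Sørensen–Dice coefficient of the two sets of insertion regions. The sketch below evaluates it on two presence/absence columns; it assumes both columns cover every insertion region to be counted, and treats two clones without any insertion as identical, a case the claim does not address. For example, an RSC with 4 regions and an SC sharing 2 of them and no others gives 1 - 2·2/(2 + 4) = 1/3.

```python
from typing import Sequence


def dice_distance(rsc: Sequence[int], sc: Sequence[int]) -> float:
    """Distance of formula (I) between two 0/1 presence/absence columns."""
    if len(rsc) != len(sc):
        raise ValueError("columns must cover the same insertion regions")
    n_total = sum(1 for a, b in zip(rsc, sc) if a and b)  # present in both
    n_rtir, n_ctir = sum(rsc), sum(sc)                     # present in each
    if n_rtir + n_ctir == 0:
        return 0.0  # assumption: no insertions in either clone -> identical
    return 1 - 2 * n_total / (n_ctir + n_rtir)
```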

17. The method of claim 16, wherein the parameters N(total), N(CTIR) and/or N(RTIR) are calculated based on the presence/absence matrix of insertion regions generated according to either of claims 14 or 15.

18. The method of claim 16 or 17, wherein the method comprises representing said one or more SCs relative to the RSC on a common distance matrix.

19. The method of claim 18, wherein two respective genomes are considered to belong to a common cluster if the distance between them as calculated according to Formula (I) is 0.
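Claims 15, 18 and 19 together with step (f) of claim 1 can be sketched end to end: compute formula (I) for every pair of clones on one common matrix, group clones at distance 0, and call the MCB monoclonal when a single cluster remains. Because a distance of 0 means the two columns are identical, it is transitive and grouping each clone against one representative per cluster is sufficient. The function names are hypothetical.

```python
from typing import Dict, List, Sequence


def dice_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """Formula (I) on two 0/1 presence/absence columns."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    return 0.0 if total == 0 else 1 - 2 * both / total


def distance_matrix(columns: Dict[str, Sequence[int]]) -> Dict[str, Dict[str, float]]:
    """Pairwise distances between all clones (RSC and SCs) on a common matrix."""
    return {p: {q: dice_distance(columns[p], columns[q]) for q in columns}
            for p in columns}


def clusters(dist: Dict[str, Dict[str, float]]) -> List[List[str]]:
    """Group clones whose mutual distance is 0 (claim 19)."""
    groups: List[List[str]] = []
    for clone in dist:
        for group in groups:
            if dist[group[0]][clone] == 0:  # same cluster as the representative
                group.append(clone)
                break
        else:
            groups.append([clone])          # no match: start a new cluster
    return groups


def is_monoclonal(columns: Dict[str, Sequence[int]]) -> bool:
    """Monoclonal if the RSC and all SCs fall into one cluster (claim 1, step f)."""
    return len(clusters(distance_matrix(columns))) == 1
```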