
TAPIR: a T-cell receptor language model for predicting rare and novel targets.

Fast et al. (bioRxiv) DOI: 10.1101/2023.09.12.557285

●  TCR

●  pMHC

●  Machine learning

Main Findings

Understanding how T cells detect peptide antigens is a fundamental question in immunology. The general rules are well established: T cell receptors (TCRs) detect peptide–MHC (pMHC) complexes through specific interactions mediated by their complementarity-determining regions (CDR1–3). ‘Orphan’ TCRs, those with no defined ligand, are easily identified from tissues or tumours, but defining their ligands is an empirical process with limited throughput. Computational tools that predict TCR–pMHC interactions would greatly facilitate this process.

In a recent preprint, Fast, Dhar and Chen, from the biotech company VCreate, present TAPIR (TCR And Peptide Interaction Recognizer), a language model that predicts TCR–pMHC interactions. TAPIR is a deep convolutional neural network, a type of machine learning algorithm, trained on the protein sequences of known TCR–pMHC pairs. Fast et al. combined publicly available data in the VDJ database with their own proprietary dataset, totalling more than 2,000 TCR–pMHC pairs. The authors developed their own novel high-throughput method for identifying TCR–pMHC pairs, but details of this method are absent from the preprint. TAPIR was trained flexibly: individual input components were masked during training so that predictions can still be made when components are missing, for example when only TCRβ sequences are provided.
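The idea of masking input components during training can be illustrated with a minimal Python sketch. It is not the authors' implementation: the field names, the `<MASK>` token, the masking probability and the `keep` argument are all assumptions made for illustration, and the sequences in the usage note are placeholder strings.

```python
import random

# Hypothetical mask token and field names; the preprint's real input
# schema is not described in this review.
MASK = "<MASK>"
FIELDS = ("tcr_alpha", "tcr_beta", "peptide", "mhc")


def mask_components(example, p_mask=0.3, rng=None, keep=("tcr_beta",)):
    """Return a copy of `example` with some components replaced by MASK.

    Components missing from `example` are always masked. Components named
    in `keep` are never randomly masked, so at least one real input
    survives. Every other component is masked with probability `p_mask`.
    """
    rng = rng or random.Random()
    masked = {}
    for field in FIELDS:
        value = example.get(field)
        if value is None:
            masked[field] = MASK      # component absent at input time
        elif field not in keep and rng.random() < p_mask:
            masked[field] = MASK      # randomly hidden during training
        else:
            masked[field] = value     # component shown to the model
    return masked
```

For example, `mask_components({"tcr_beta": "CASSPLACEHOLDERF"})` returns a record in which only the TCRβ field carries a sequence and the other three fields hold the mask token. A model trained on such randomly masked records sees the same kind of partial input that it later receives when only a TCRβ sequence is available.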

After validation against known TCR–pMHC pairs held out of the training data, two potential applications of TAPIR were explored. The first is to facilitate TCR deorphanisation by predicting the MHC restriction of a specific TCR. For any TCR sequence, TAPIR ranked potential MHC ligands, but whether TAPIR can also predict specific peptide epitopes was not clear. The second was identifying TCRs that recognise a known pMHC target, which was demonstrated in two ways. In the first, T cells from cancer patients carrying the same shared neoantigen in PI3KA were stimulated in vitro with this neoantigen and subjected to TCR sequencing. TAPIR then ranked these TCRs by predicted binding to the stimulating neoantigen. TAPIR ranked a previously identified TCR specific for this antigen highly, and the top-ranked TCR showed strong antigen-specific functional responses in vitro. Thus, TAPIR facilitated the identification of TCRs against a known target.
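Both applications come down to ranking candidates by a predicted score, either MHC ligands for one TCR or sequenced TCRs for one pMHC target. The Python sketch below shows only that ranking step. The scoring function stands in for a trained model and is an assumption, as are the function name `rank_candidates` and the scores in the usage note; none of it is taken from the preprint.

```python
from typing import Callable, Hashable, Iterable, List, Optional, Tuple


def rank_candidates(
    candidates: Iterable[Hashable],
    score_fn: Callable[[Hashable], float],
    top_k: Optional[int] = None,
) -> List[Tuple[Hashable, float]]:
    """Score every candidate and return (candidate, score) pairs, best first.

    `score_fn` is a placeholder for a model's predicted interaction score.
    Higher scores are treated as stronger predicted binding.
    """
    scored = [(candidate, score_fn(candidate)) for candidate in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored if top_k is None else scored[:top_k]
```

With made-up scores `{"TCR_A": 0.2, "TCR_B": 0.9, "TCR_C": 0.5}` and `score_fn` set to that dictionary's `get` method, `rank_candidates(scores, scores.get, top_k=2)` returns `TCR_B` followed by `TCR_C`. In practice the top-ranked candidates would then be tested experimentally, as was done for the top-ranked TCR in the preprint.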

In their second approach, the authors show that TAPIR can generate de novo TCRs, referred to as Artificial Intelligence TCRs (AITCRs), against a known pMHC target of interest. Remarkably, some of these AITCRs displayed functional recognition of the target pMHC, an incredible proof-of-concept that provides a vision of what machine learning could bring to T cell immunology in the future.


The dataset of TCR–pMHC pairs generated by VCreate is proprietary. Furthermore, the novel method developed to identify these pairs is described in minimal detail. Whether preprint deposition is ... However, TAPIR is available as an online tool to non-profit organizations, although its full capabilities do not appear to be available.


Reviewed by Malcolm J. W. Sim as part of a cross-institutional journal club between the Icahn School of Medicine at Mount Sinai, the University of Oxford, the Karolinska Institute and the University of Toronto. The author declares no conflict of interests in relation to their involvement in the review. You can follow him on Twitter.
