The present invention provides a method for identifying peptides that contain features positively associated with natural endogenous or exogenous cellular processing, transport and major histocompatibility complex (MHC) presentation. In particular, the method controls for the influence of protein abundance, protein stability and HLA/MHC binding on processing and presentation, so that a machine-learning algorithm or statistical inference model trained using the method can be applied to any test peptide regardless of its HLA/MHC restriction; that is, the algorithm operates in an HLA/MHC-agnostic manner. This is achieved by building positive and negative data sets of peptide sequences: the positive data set contains peptides identified or inferred from surface-bound or secreted MHC/peptide complexes reported in the literature, and the negative data set contains peptides that have not been so identified or inferred. Specifically, the positive and negative data sets comprise a multiplicity of pairings between individual entries, in which both sequences of a pair are of equal or similar length and are derived from the same source protein, and/or have similar binding affinities with respect to the HLA/MHC molecule to which the peptide of the positive data set is restricted.
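By way of illustration only, the pairing of positive and negative entries described above might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the names `PeptidePair`, `build_pairs`, `toy_affinity` and `max_fold` are hypothetical, the example protein sequence and peptide are invented, and `toy_affinity` is a placeholder standing in for whatever binding-affinity predictor is actually used. The sketch takes one reading of the pairing criteria (equal length, same source protein, and predicted affinity within a fold-change tolerance).

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class PeptidePair:
    positive: str        # peptide reported as MHC-presented
    negative: str        # matched peptide not reported as presented
    source_protein: str  # identifier of the shared source protein
    allele: str          # HLA/MHC restriction of the positive peptide


def toy_affinity(peptide: str, allele: str) -> float:
    """Placeholder predictor (NOT a real model): returns a pseudo-IC50 in nM.

    It ignores the allele and simply penalises charged residues, so that
    the example below is deterministic.
    """
    charged = sum(1 for aa in peptide if aa in "KRDE")
    return 50.0 * (1 + charged)


def build_pairs(
    positives: Iterable[Tuple[str, str, str]],  # (peptide, protein_id, allele)
    proteins: Dict[str, str],                   # protein_id -> sequence
    observed: Set[str],                         # every peptide reported as presented
    predict_affinity: Callable[[str, str], float],
    max_fold: float = 2.0,                      # assumed similarity tolerance
) -> List[PeptidePair]:
    """Pair each positive with an equal-length, same-protein negative whose
    predicted affinity for the same allele is within `max_fold` of it."""
    pairs: List[PeptidePair] = []
    for peptide, protein_id, allele in positives:
        sequence = proteins[protein_id]
        k = len(peptide)
        # Equal-length windows of the same source protein, excluding the
        # positive itself and anything else reported as presented.
        candidates = [
            sequence[i:i + k]
            for i in range(len(sequence) - k + 1)
            if sequence[i:i + k] != peptide and sequence[i:i + k] not in observed
        ]
        if not candidates:
            continue
        target = predict_affinity(peptide, allele)

        def log_fold(candidate: str) -> float:
            # Absolute log fold-change; assumes strictly positive affinities.
            return abs(math.log(predict_affinity(candidate, allele) / target))

        best = min(candidates, key=log_fold)
        if log_fold(best) <= math.log(max_fold):
            pairs.append(PeptidePair(peptide, best, protein_id, allele))
    return pairs


# Invented example data, for illustration only.
proteins = {"P1": "MKALLVAGLSTEVRALLPGAVLSAG"}
positives = [("ALLVAGLST", "P1", "HLA-A*02:01")]
observed = {"ALLVAGLST"}
pairs = build_pairs(positives, proteins, observed, toy_affinity)
```

Because each negative is drawn from the same protein and matched on length and predicted affinity, a model trained to separate the two members of each pair cannot rely on protein abundance, stability or allele-specific binding, which is the control the method is intended to provide. In this sketch, windows overlapping the positive peptide are not excluded and ties are broken by position in the protein; both are arbitrary choices of the example.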