A representation of a nucleic acid sequence encodes a particular gene having at least one intron. An intron signature value corresponding to the at least one intron is determined based on a first computational function applied to at least one portion of the representation of the nucleic acid sequence corresponding to the at least one intron. A protein signature value is determined, being based on a second computational function applied to a representation of a protein. In a database, an association is formed between the intron and protein signature values. This process is repeated for each of a plurality of nucleic acid sequences. Nucleic acid sequences in the database are ordered based on a sort of corresponding intron signature values. An ordering determined by the sort is used to determine or confirm a role or function of a portion of a given nucleic acid sequence.