(EN)
In a speech processing apparatus, an acquisition unit is configured to acquire speech. A separation unit is configured to separate the speech into a plurality of sections in accordance with a prescribed rule. A calculation unit is configured to calculate a degree of similarity for each combination of the sections. An estimation unit is configured to estimate, for each section, a direction of arrival of the speech. A correction unit is configured to group sections whose directions of arrival are mutually similar into the same group and to correct the degree of similarity for each combination of sections in the same group. A clustering unit is configured to cluster the sections by using the corrected degree of similarity.
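
The flow from similarity calculation through clustering can be illustrated with a minimal sketch. It is not the apparatus's actual method: the abstract does not specify the similarity measure, the grouping rule, the form of the correction, or the clustering algorithm. The sketch assumes that each section is already represented by a feature vector and an estimated direction of arrival in degrees, that similarity is cosine similarity, that the correction raises the similarity of same-group pairs by a fixed amount, and that clustering links any pair whose corrected similarity reaches a threshold. All function names and parameters (`group_by_direction`, `corrected_similarity`, `cluster`, `tol_deg`, `boost`, `threshold`) are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two feature vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def group_by_direction(doas, tol_deg=10.0):
    """Give each section a group id; a section joins the first group whose
    reference angle lies within tol_deg of its own (angles wrap at 360)."""
    groups = []  # (reference angle, group id)
    ids = []
    for angle in doas:
        for ref, gid in groups:
            diff = abs(angle - ref) % 360.0
            if min(diff, 360.0 - diff) <= tol_deg:
                ids.append(gid)
                break
        else:
            gid = len(groups)
            groups.append((angle, gid))
            ids.append(gid)
    return ids


def corrected_similarity(features, doas, boost=0.3, tol_deg=10.0):
    """Pairwise similarity, raised by `boost` (capped at 1.0) for pairs of
    sections in the same direction group. The additive boost is an assumption."""
    gids = group_by_direction(doas, tol_deg)
    sim = {}
    for i, j in combinations(range(len(features)), 2):
        s = cosine(features[i], features[j])
        if gids[i] == gids[j]:
            s = min(1.0, s + boost)
        sim[(i, j)] = s
    return sim


def cluster(n, sim, threshold=0.9):
    """Link sections whose similarity reaches the threshold (union-find) and
    return one cluster label per section, numbered by first appearance."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), s in sim.items():
        if s >= threshold:
            parent[find(i)] = find(j)
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


# Four illustrative sections: two arriving from about 30 degrees, two from about 120.
features = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [0.1, 0.9]]
doas = [30.0, 35.0, 120.0, 118.0]
print(cluster(4, corrected_similarity(features, doas)))  # [0, 0, 1, 1]
```

In this toy input, sections 0 and 1 have an uncorrected similarity of about 0.8, which is below the threshold of 0.9. Because their directions of arrival are close, the correction lifts the pair above the threshold and they fall into one cluster; with `boost=0.0` they would remain separate.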