An extended field of view image is formed by combining a plurality of spatially offset ultrasonic images which have been spatially aligned. Images which are to be aligned are processed to produce two sets of corresponding images of progressively lower resolution. The spatial alignment is performed by comparing images of the same resolution level from each set, progressing from comparison of the lowest resolution images to comparison of the highest. Only prominent feature areas of the images are used in the comparison, which improves alignment reliability and reduces computational needs. As each pair of images is compared, the result of the comparison is used to pre-align the images of the next higher resolution level. Results are checked against boundary conditions and refined by gradient refinement.
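The coarse-to-fine comparison described above can be illustrated with a minimal sketch. It is not the claimed implementation. It assumes a pure translation between two equally sized images, 2x2 block averaging to lower the resolution, and a mean absolute difference as the comparison measure. All function names and parameters (`downsample`, `build_pyramid`, `mean_abs_diff`, `align`, `synthetic_pair`, `levels`, `radius`) are hypothetical. The prominent-feature selection, boundary-condition checks, and gradient refinement are omitted.

```python
import math


def downsample(img):
    """Halve the resolution by averaging 2x2 blocks (an assumed reduction)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]


def build_pyramid(img, levels):
    """Return [finest, ..., coarsest] images of progressively lower resolution."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


def mean_abs_diff(a, b, dy, dx):
    """Mean |a[y][x] - b[y+dy][x+dx]| over the region where both are defined."""
    h, w = len(a), len(a[0])
    total, count = 0.0, 0
    for y in range(max(0, -dy), min(h, h - dy)):
        for x in range(max(0, -dx), min(w, w - dx)):
            total += abs(a[y][x] - b[y + dy][x + dx])
            count += 1
    return total / count if count else float("inf")


def align(a, b, levels=3, radius=2):
    """Estimate (dy, dx) such that a[y][x] is close to b[y+dy][x+dx]."""
    pyr_a, pyr_b = build_pyramid(a, levels), build_pyramid(b, levels)
    dy = dx = 0
    # Start at the lowest resolution and work toward the highest.
    for level in range(levels - 1, -1, -1):
        if level != levels - 1:
            # Pre-align this level with the previous, coarser result.
            dy, dx = 2 * dy, 2 * dx
        candidates = [(dy + i, dx + j)
                      for i in range(-radius, radius + 1)
                      for j in range(-radius, radius + 1)]
        dy, dx = min(candidates,
                     key=lambda s: mean_abs_diff(pyr_a[level], pyr_b[level],
                                                 s[0], s[1]))
    return dy, dx


def synthetic_pair(shift_y, shift_x, size=32):
    """Build two test images whose content is offset by a known shift."""
    def base(y, x):
        return math.sin(0.37 * y) + math.cos(0.23 * x) + 0.5 * math.sin(0.05 * x * y)
    a = [[base(y, x) for x in range(size)] for y in range(size)]
    b = [[base(y - shift_y, x - shift_x) for x in range(size)] for y in range(size)]
    return a, b


if __name__ == "__main__":
    image_a, image_b = synthetic_pair(4, 8)
    print(align(image_a, image_b))
```

Because each level searches only a small window around the doubled estimate from the coarser level, a large offset is recovered with far fewer comparisons than an exhaustive search at full resolution. This reflects the motivation stated for pre-aligning each level with the result of the previous one.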