Description:
While ultrasound provides a remarkable tool for tracking the tongue's movements during speech, it has yet to become the powerful research tool it could be. A major roadblock is that labeling the images appropriately is laborious and time-consuming. In earlier work, Fasel and Berry (2010) introduced a "translational" deep belief network (tDBN) approach to automatically labeling ultrasound images of the tongue and tested it on a single-speaker set of 3,209 images. This study tests the same methodology on a much larger data set (about 40,000 images) collected for different studies with multiple speakers and multiple languages. Retraining a "generic" network with a small set of the most erroneously labeled images from language-specific development sets resulted in an almost threefold increase in precision in the three test cases examined.