It is now widely recognized that a crucial test for speaker-independent recognition systems is their success in dealing with dialect diversity. Sociolinguistic research on linguistic change in progress has found rapid development of sound changes in most urbanized areas of North America, leading to increased dialect diversity. It appears that the dialects of New York, Philadelphia, Detroit, Chicago, Saint Louis, Dallas and Los Angeles are now more different from each other than they were 50 or 100 years ago.
Most efforts to establish dialect regions in the U.S. are based on dialectological studies of vocabulary, begun in the 1930's by Kurath (1939), and completed for the U.S. by the Dictionary of American Regional English (Carver 1987). For most regions, it is not clear how these dialect divisions correspond to differences in pronunciation that would be important for speech recognition. It seems evident to us that any dialect divisions that would be relevant to speech recognition must be based on the distribution of differences in phonological organization, as determined by acoustic analysis.
For the past three years, the Linguistics Laboratory of the University of Pennsylvania has been engaged in an effort to supply this information through Telsur, a telephone survey of North America, supported by NSF, NEH and Bell Northern Research. The survey is based on a sampling method that is sensitive to both population and geography. The end product of the survey is the Phonological Atlas of North America, which maps the phonological organization of all English dialects and the extent of ongoing sound changes. The current results of the survey show 500 speakers representing the urbanized areas with populations above 200,000 and a number of smaller localities.
There are two major types of sound changes which affect success rates in speech recognition: mergers and chain shifts. The Telsur interview includes a number of direct inquiries aimed at the presence or absence of phonological mergers, in both perception and production. The maps to follow are based on the phonetic transcription of the 500 interviews, including both spontaneous speech and more formal elicitations by minimal pairs and other methods. The study of urbanized areas is not quite complete, but enough has been done to give a clear picture of national trends.
The first consideration in estimating the extent of dialect diversity is the phonological inventory: how many different segments there are to be modeled, and what distinctions should be shown in the dictionary representations. North American dialects are differentiated by unconditioned mergers, which affect the phonemes wherever they appear, and by conditioned mergers, which occur in a particular environment. There is only one unconditioned merger in the vowel system: the collapse of the distinction between short /o/ and long open /oh/, which distinguishes cot and caught, hock and hawk, Don and dawn. In about half of the geographic United States and all of Canada, these pairs are pronounced the same.
Map 1 shows the extent of this merger in speech production before /t/, as in cot vs. caught. The symbols labelled "same" show speakers who are merged; the symbols labelled "close" show pairs that are produced phonetically in close approximation as judged by the analyst; and the symbols labelled "distinct" show speakers who produce a clear distinction. It is immediately apparent that the geographic extent of the merger is quite broad; it covers half the geographic area of the United States (and Canada), though most of the heavily populated areas in the North, North Midland and mid-Atlantic States keep these word classes separate.
The four areas where the merger predominates are delineated by the isoglosses on Map 1a. Area (1) is Northeastern New England, here shown as including Maine, New Hampshire, Vermont and the northeastern portion of Massachusetts, stopping not far south of the Boston area. Area (2) is Western Pennsylvania. In the Linguistic Atlas maps reflecting the situation in the 1940's (Kurath and McDavid 1961), this region was confined to the area around Pittsburgh. Here it extends northward to include Erie, Pennsylvania on Lake Erie, westward to include a number of cities in Ohio, and southward to West Virginia and northern Kentucky. Though not shown here, the merger is quite solid in Canada, and area (3) appears to be a southward extension of the Canadian merger. Finally, most of the American West is included in area (4), though some variation remains in certain large cities: Los Angeles, the Bay area, and Denver. This merger is continuing to expand and is stronger with younger speakers. Some of the variation in the West reflects the differential ages of the speakers, and some the influence of Southern dialect from settlement patterns of the late 19th century (as in Montana). However, there is no indication that this merger is progressing so rapidly that speech recognition programs can ignore the distinction: on the contrary, the inland North, most of the South, and the mid-Atlantic states show considerable resistance to the spread of this merger. In fact, the transition zone in the Midwest, through the Dakotas, Nebraska and Kansas, is just about where it was in the only other national map available, a survey of long distance telephone operators in 1968 (Labov 1991, Figure 1.12).
One of the most widespread conditioned mergers concerns the distinction between /i/ and /e/ before /m/ and /n/, as in pin vs. pen, him vs. hem. This is usually reflected in a high front vowel for both, so that both pin and pen sound like pin to speakers of other dialects. As a result, the two words are normally distinguished as ink pen and safety pin in these areas. On Map 2, the "same" symbols indicate the loss of the distinction; "close" designates speakers who produce the two vowels very close to each other; and the "distinct" tokens are speakers who make a plain distinction. This merger has long been known to be characteristic of Southern States dialects (Kurath and McDavid 1961). Brown 1990 and Bailey and Ross 1992 showed that it was still in the process of expansion in the South. Map 2 indicates that it is quite widespread throughout the South Midland (southern Ohio, central Indiana, Illinois, Missouri and Kansas), Texas, and a scattering of points in the West. The areas where the pin ~ pen distinction is solidly maintained include the inland North and much of the South Midland, all of Pennsylvania and the Middle Atlantic States. On the other hand, the merger is spreading northward and westward from its base in the South, with a broad zone of variation in the South Midland.
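The consequence of the two mergers for dictionary representations can be illustrated with a minimal sketch. The toy lexicon, the function names, and the choice of /o/ as the label for the merged low back class are assumptions made for illustration only; the transcriptions follow the broad notation used in this paper. Applying a merger rule to each entry and grouping words by the resulting transcription shows which pairs a recognizer can no longer separate on phonological grounds.

```python
from collections import defaultdict

NASALS = {"m", "n"}

# Hypothetical toy lexicon in the paper's broad notation (not Telsur data).
LEXICON = {
    "cot": ("k", "o", "t"), "caught": ("k", "oh", "t"),
    "hock": ("h", "o", "k"), "hawk": ("h", "oh", "k"),
    "Don": ("d", "o", "n"), "dawn": ("d", "oh", "n"),
    "pin": ("p", "i", "n"), "pen": ("p", "e", "n"),
    "him": ("h", "i", "m"), "hem": ("h", "e", "m"),
    "pit": ("p", "i", "t"), "pet": ("p", "e", "t"),
}

def merge_low_back(phones):
    """Unconditioned merger: /oh/ collapses with /o/ in every environment."""
    return tuple("o" if p == "oh" else p for p in phones)

def merge_pin_pen(phones):
    """Conditioned merger: /e/ is realized as /i/ only before /m/ or /n/."""
    merged = []
    for idx, phone in enumerate(phones):
        following = phones[idx + 1] if idx + 1 < len(phones) else None
        merged.append("i" if phone == "e" and following in NASALS else phone)
    return tuple(merged)

def homophone_sets(lexicon, *mergers):
    """Group words whose transcriptions coincide after applying the mergers."""
    groups = defaultdict(list)
    for word, phones in lexicon.items():
        for merger in mergers:
            phones = merger(phones)
        groups[phones].append(word)
    return sorted(sorted(words) for words in groups.values() if len(words) > 1)

print(homophone_sets(LEXICON, merge_low_back))
print(homophone_sets(LEXICON, merge_pin_pen))
```

Because the second rule is conditioned, pit and pet remain distinct while pin and pen fall together, which is the pattern described above.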
The greatest difficulties for speech recognition are posed not by mergers but by chain shifts of vowels. Over the past two decades, two major patterns of chain shifting have been identified, which rotate the vowels of English in opposite directions (Labov, Yaeger & Steiner 1972, Labov 1990, Labov 1994). To examine these, I will draw upon the 220 speakers for whom complete acoustic analyses are now available. The first chain shift is the Northern Cities Shift, shown in Figure 1 below.
The Northern Cities Shift is found throughout the industrial inland North and most strongly advanced in the largest cities: Syracuse, Rochester, Buffalo, Cleveland, Toledo, Detroit, Flint, Gary, Chicago, Rockford. The shift begins when /æ/, the vowel of cad, moves to the position of the vowel of idea /i'/ (1). The vowel /o/ in cod then shifts forward so that it sounds like cad to speakers of other dialects (2). /oh/ in cawed moves down to the position formerly occupied by cod (3), /e/ in Ked moves down and back to sound like the vowel of cud (4), wedge, the vowel of cud, moves back to the position formerly occupied by cawed (5), and /i/ in kid moves back in parallel to the movement of /e/ (6).
A more concrete view of the Northern Cities Shift can be seen on Chart 1. The chart shows the F1, F2 measurements of the six vowels involved for Sharon K., a 35-year-old woman from Rochester, NY. The mean value for each vowel is displayed in a white circle. In Sharon K.'s speech, stages 1-5 of the shift have gone to completion. (1) Short /æ/ has moved from lower front to high front position, while (2) short /o/ has moved forward to the position vacated by /æ/, and (3) long open /oh/ has moved down to the position vacated by /o/. At the same time, short /e/ has moved down and back until it is positioned in low central position, directly above /o/, while wedge, the vowel of cut, has moved back to the position vacated by /oh/.
To assess the effects of the Northern Cities Shift, it may be helpful to compare the relative positions of the vowels involved for two advanced representatives of the shift with two speakers who are quite remote from its influence. Figure 2 displays the normalized means of five short vowels and /oh/. The two Southern speakers, from Springfield, Missouri and Greenville, South Carolina, show the short front vowels vertically aligned, and wedge somewhat front of short /o/, which is in low back position, not far from /oh/. Below them are two Detroit speakers, mother and daughter, who show a developed and advanced stage of the Northern Cities Shift. Here /æ/ has risen to upper mid front position, considerably fronter than short /i/, and /e/ has fallen to mid central position. For Leslie R., /e/ is still to the front of /o/, but for the daughter, Janice, the fronting of /o/ and the further backing of /e/ have led to what is almost a vertical alignment. At the same time, wedge has moved back and /oh/ down, so that wedge is now directly above /oh/.
On the other hand, the Southern Shift, found throughout the Southern States, South Midland, and many other areas, moves vowels in an opposite direction. The shift begins when /ay/ becomes monophthongized and shifts slightly to the front (1). The nucleus of the diphthong /ey/ then falls along a non-peripheral track until it becomes the lowest vowel in the system (2). The nucleus of the diphthong /iy/ follows a parallel path towards mid-center position (3). The short front vowels /i, e/ shift forward and up until they reach the front peripheral positions formerly occupied by /iy/ and /ey/, and /æ/ moves in parallel (4). The nuclei of /uw/ and /ow/ then shift forward to front and center positions (5,6). /ohr/ (now most often merged with /owr/ ) moves up to high back position (7), and /ahr/ shifts up and back to the position that /ohr/ vacated (8).
The operation of the Southern Shift is illustrated in detail in Chart 2, which shows the mean values of the relevant vowels in the normalized system of Thelma M. from Birmingham, Alabama. In this doubly linear plot, the vertical dimension is the first formant, which corresponds roughly to phonetic height, and the horizontal dimension is the second formant, which corresponds roughly to fronting and backing of the vowel. The arrows indicate the path of each vowel class from the initial position for American English to the current location of the means for Thelma M. In contrast to the Northern Cities Shift, Chart 2 shows that the three short front vowels, /i/, /e/, /æ/, have retained their vertical organization, and have moved up and to the front, in a peripheral position. On the other hand, the long vowel /iy/ in be and the long vowel /ey/ in made have shifted down and to the center along a centralized, non-peripheral track. The /ey/ vowel now has a low nucleus, extending down to overlap to a considerable extent with /ay/, which is monophthongized to [a:]. The back vowels /uw/ and /ow/ have moved strongly to the front, while /ahr/ and /ohr/ show an upward chain shift in the back.
The Northern Cities Shift and the Southern Shift are both complex relations of 6 to 10 vowels. One of the goals of the Telsur project is to derive a small set of numerical parameters which can place each speaker's system within the overall configuration of the regional dialects of North America in a way that reflects both geographic and linguistic regularities. Two such parameters will be presented here: æ/e reversal and e/o alignment. They are designed primarily as measures of participation in the Northern Cities Shift, but they also isolate Southern systems, since the movements of the Southern Shift are diametrically opposed to movements of the Northern Cities Shift.
The first parameter is a discrete transformation of the quantitative relations of short /æ/ in man, bad, cat, etc. and short /e/ in pen, bet, bed, etc. Step 1 of Figure 1 begins with /æ/ in low front position, further back and lower than /e/. As /æ/ moves forward, their relative positions change. First /æ/ shifts to a position where it is fronter than /e/ but still lower, and eventually reaches a position where it is both higher and fronter. This reversal is accelerated as /e/ moves down and back in step 4 of Figure 1.
Map 3 shows the geographic distribution of these relative positions of short /æ/ and short /e/. As indicated by the legend, the red circles show the situation characteristic of conservative and Southern dialects, where /æ/ is lower and backer than /e/. The yellow circles show speakers where /æ/ has moved to the front, but is not yet higher than /e/; the blue circles are all those systems where the relative positions are now reversed.
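The discrete transformation behind the æ/e reversal parameter can be sketched as a comparison of mean formant values. This is only an illustration under stated assumptions: the function name, the category labels, and the formant values in the usage lines are hypothetical, and the sketch presupposes means that have already been normalized. It relies on the correspondences noted earlier: a lower F1 indicates a higher vowel, and a higher F2 indicates a fronter vowel.

```python
def ae_e_relation(ae_mean, e_mean):
    """Classify the relative position of short /ae/ and short /e/.

    Each argument is a (F1, F2) pair of mean formant values in Hz.
    The labels are hypothetical names for the three stages described
    in the text, plus a catch-all for any other configuration.
    """
    ae_f1, ae_f2 = ae_mean
    e_f1, e_f2 = e_mean
    ae_is_fronter = ae_f2 > e_f2  # higher F2 means fronter
    ae_is_higher = ae_f1 < e_f1   # lower F1 means higher in vowel space
    if ae_is_fronter and ae_is_higher:
        return "reversed"       # /ae/ both higher and fronter than /e/
    if ae_is_fronter:
        return "fronted"        # /ae/ fronter than /e/ but not yet higher
    if not ae_is_higher:
        return "conservative"   # /ae/ lower and backer than /e/
    return "unclassified"       # /ae/ higher but not fronter

# Hypothetical mean values, for illustration only.
print(ae_e_relation((750, 1700), (600, 1900)))
print(ae_e_relation((700, 2000), (600, 1900)))
print(ae_e_relation((500, 2200), (650, 1800)))
```

The fourth outcome is included only so that the function is total; the text describes three stages and does not say how a speaker with a raised but unfronted /æ/ would be mapped.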
It is immediately clear that the blue circles are confined to the area of the Northern dialect region, ranging from Chicago in the western portion to Grand Rapids, Detroit, Cleveland, Buffalo, Rochester and Syracuse. The grey line through Pennsylvania and the Midwest is a useful point of reference here. This is the line separating the Northern dialect area from the North Midland, drawn by Carver (1987) on the basis of the distribution of words in the Dictionary of American Regional English. The lexical isogloss drawn from the DARE data, separating North from North Midland, sharply delimits this aspect of the Northern Cities Shift. On the other hand, the most conservative relation between /æ/ and /e/ dominates the Southern and Western dialects, with the intermediate stage characteristic of the North Midland.
A second parameter of the Northern Cities Shift concerns the front/back alignment of short /e/ and short /o/. The starting point for these sound changes, indicated by the position of the ellipses in Figure 1, shows short /e/ well in front of and higher than short /o/. In step 2 of Figure 1, /o/ moves forward, and in step 4, /e/ moves to the back. Each step in the Northern Cities Shift reduces the difference between the F2 of /e/ and the F2 of /o/, so that for the most advanced stage of the shift, seen in Figure 2, /e/ is directly aligned with /o/ on the F2 axis.
Map 4 shows six categories in the F2 differentiation of short /e/ and /o/. Red circles show speakers whose mean F2 difference between /e/ and /o/ is over 800 Hz. These are clearly confined to the Southern dialect area. Orange circles, with 600-800 Hz differences are characteristic of the South Midland and the West extending from Ohio to Kansas. Closer positions of /e/ and /o/ are shown by green and light blue symbols; the dark blue symbols show the speakers for whom /e/ and /o/ are vertically aligned, characteristic of the advanced stages of the Northern Cities Shift. Again, this pattern is found only in the Northern dialect region, and appears in all of the large cities of that area.
A third index of the North/South axis is independent of the Northern Cities Shift: it concerns the relative fronting of /aw/ and /ay/. In the North, most speakers have a nucleus of /aw/ back of center, while /ay/ is center or front of center. In the North Midland and the South, and most of the West, these relations are reversed, as shown in Map 5.
On Map 5, the darkest circles labelled "< 50" in the legend indicate speakers for whom the difference between the mean F2 of /aw/ and the mean F2 of /ay/ is less than 50 Hz; the medium circles labelled "50-300" show all those speakers with a difference between 50 and 300 Hz; the circles labelled "> 300" show strong fronting of /aw/, with a difference of over 300 Hz. It is evident that the blue circles are confined to an area north of the North/Midland isogloss, but here this northern trait extends westward through the North Central states. The green circles are characteristic of the South, but are also well represented in the South Midland portions of Ohio, Indiana and Illinois, and dominate the more western extension of the South Midland in Missouri and Kansas.
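Both Map 4 and Map 5 rest on the same operation: the difference between the mean F2 values of two vowel classes is assigned to a category by fixed cutpoints. A minimal sketch follows. The function names and the token values are hypothetical. Only the cutpoints stated in the text are used (600 and 800 Hz for /e/ minus /o/, 50 and 300 Hz for /aw/ minus /ay/); the finer subdivisions of Map 4 below 600 Hz are not specified here and are left as a single category. Assigning a value that falls exactly on a cutpoint to the upper category is also an assumption.

```python
from bisect import bisect_right
from statistics import mean

def f2_difference(first_tokens, second_tokens):
    """Mean F2 of the first vowel class minus mean F2 of the second, in Hz."""
    return mean(first_tokens) - mean(second_tokens)

def bin_difference(diff_hz, cutpoints, labels):
    """Return the label of the interval containing diff_hz.

    cutpoints must be ascending, with len(labels) == len(cutpoints) + 1.
    A value equal to a cutpoint goes to the upper interval (an assumption).
    """
    return labels[bisect_right(cutpoints, diff_hz)]

# Cutpoints stated in the text; label strings are illustrative.
MAP4_CUTS, MAP4_LABELS = (600, 800), ("under 600", "600-800", "over 800")
MAP5_CUTS, MAP5_LABELS = (50, 300), ("< 50", "50-300", "> 300")

# Hypothetical F2 tokens (Hz) for one speaker.
e_tokens, o_tokens = [1900, 2000], [1000, 1100]
aw_tokens, ay_tokens = [1400, 1500], [1600, 1700]

print(bin_difference(f2_difference(e_tokens, o_tokens), MAP4_CUTS, MAP4_LABELS))
print(bin_difference(f2_difference(aw_tokens, ay_tokens), MAP5_CUTS, MAP5_LABELS))
```

In this invented example the speaker combines a wide /e/-/o/ separation with an /aw/ that is backer than /ay/, so the /aw/-/ay/ difference is negative and falls in the "< 50" category.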
Given the number of parameters involved in comparing two-dimensional plots of sixteen vowels, it seems reasonable to submit the matrix to a principal components analysis and see whether the geographic areas identified on the map will emerge from the overall pattern of the data.
The various points, each corresponding to a speaker, are colored according to the dialect: dark blue is the Northern Cities and blue-green is North Central. These clearly cluster at the high end of U1. Red circles show Southern speakers, and purple South Midland, which clearly cluster at the low end of U1. It is quite clear that U1 is the North/South axis which several of the earlier figures have shown to be an organizing principle of dialect differentiation. It also represents the advancement of the two sets of sound changes. Note that the high point, labeled Detroit 14, is the 14-year-old daughter of the speaker labeled Detroit 42. The lowest outlier is 39-year-old Thelma M. from Birmingham, whom we saw earlier as the most advanced representative of the Southern Shift in the set of acoustic analyses done so far.
Along the U2 axis, we see the separation of the eastern from the western sections of the South Midland: Four Kansas speakers, who aligned strongly with the South on the /aw/-/ay/ parameter, are clustered at right with high values of U2, while the Southern mountain speakers are grouped at left, with low U2 values.
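The principal components analysis described above can be illustrated on a very small scale. The sketch below is not the Telsur computation: the data matrix is invented, it uses three difference measures per speaker rather than the full set of vowel means, and it finds the first two components by power iteration with deflation using only the standard library, where a linear algebra package would normally be used. The component names U1 and U2 follow the text; all function names are hypothetical.

```python
import math

def center(rows):
    """Subtract each column mean so that every measure has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1)
             for j in range(d)] for i in range(d)]

def mat_vec(matrix, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]

def dominant_eigenpair(matrix, iterations=1000):
    """Power iteration: largest eigenvalue and unit eigenvector."""
    vec = [float(i + 1) for i in range(len(matrix))]
    for _ in range(iterations):
        product = mat_vec(matrix, vec)
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            break
        vec = [x / norm for x in product]
    length = math.sqrt(sum(x * x for x in vec))
    vec = [x / length for x in vec]
    value = sum(a * b for a, b in zip(vec, mat_vec(matrix, vec)))
    return value, vec

def principal_components(rows, k=2):
    """Return [(variance, direction)] for k components, plus speaker scores."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    components = []
    for _ in range(k):
        value, vec = dominant_eigenpair(cov)
        components.append((value, vec))
        # Deflation: remove the component just found from the covariance.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    scores = [[sum(x * c for x, c in zip(row, vec)) for _, vec in components]
              for row in centered]
    return components, scores

# Invented values in Hz: F2(e)-F2(o), F2(aw)-F2(ay), F1(ae)-F1(e).
SPEAKERS = [
    [50.0, -120.0, -150.0],   # hypothetical Northern Cities speaker
    [120.0, -60.0, -100.0],   # hypothetical Northern Cities speaker
    [850.0, 350.0, 180.0],    # hypothetical Southern speaker
    [920.0, 420.0, 200.0],    # hypothetical Southern speaker
    [650.0, 200.0, 100.0],    # hypothetical South Midland speaker
]

components, scores = principal_components(SPEAKERS)
for (u1, u2) in scores:
    print(round(u1, 1), round(u2, 1))
```

The sign of each component is arbitrary, so which end of U1 counts as "high" depends on the orientation chosen; what matters is that the invented Northern and Southern speakers fall at opposite ends of the first component.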
Our data collection in the South and Canada is still incomplete, and there is much more to be said about the dialectal situation in the West. Yet as a first result, these acoustic analyses yield new quantitative methods for aligning dialects along the North/South axis which we believe will be quite useful in speech recognition. It must be noted that the sharpest North/South division is far from the Mason-Dixon line: it is the great divide between the North and North Midland dialects. On one side of this line are the large cities of the inland North from Madison in the west to Syracuse in the east, which follow the Northern Cities Shift with extraordinary homogeneity. South of this line are the large cities of the North Midland, which follow other and more diverse principles: Pittsburgh, Columbus, Dayton, Cincinnati, Indianapolis, Evansville, Saint Louis. South of the Ohio River, we find once again uniform patterns of vowel shifting: the Southern Shift governs the vowel development of Appalachia as well as the coastal South.
The Phonological Atlas provides the first national view of American phonology, more systematic and considerably simpler than the information that has been gleaned from dialectology over the past sixty years. In place of the gradual decay of rural dialects, we see the rapid development of new sound changes in all the urbanized areas. Instead of a long list of dialect features, the Telsur telephone survey provides a small set of parameters that can locate any given dialect in relation to others. At the same time, the Telsur project has verified the revolutionary conclusions that Kurath and McDavid derived from the Linguistic Atlas: that there are three coequal regions of American dialects, the North, the Midland and the South. The boundary between the North and the North Midland, which is hidden from popular awareness and seemed obscure to many scholars, emerges as the most important feature on the landscape of American English.
Bailey, Guy and Garry Ross. 1992. The evolution of a vernacular. In M. Rissanen et al. (eds.), History of Englishes: New Methods and Interpretations in Historical Linguistics. Berlin: Mouton de Gruyter. Pp. 519-531.
Brown, Vivian. 1990. The social and linguistic history of a merger: /i/ and /e/ before nasals in Southern American English. Texas A & M University dissertation.
Carver, Craig M. 1987. American Regional Dialects: A Word Geography. Ann Arbor: U. of Michigan Press.
Kurath, Hans and Raven I. McDavid, Jr. 1961. The Pronunciation of English in the Atlantic States. Ann Arbor: U. of Michigan Press.
Labov, William. 1991. The three dialects of English. In P. Eckert (ed.), New Ways of Analyzing Sound Change. New York: Academic Press. Pp. 1-44.
Labov, William. 1994. Principles of Linguistic Change. Volume 1: Internal Factors. Oxford: Blackwell Publishers.
Labov, William, Malcah Yaeger and Richard Steiner. 1972. A Quantitative Study of Sound Change in Progress. Philadelphia: U.S. Regional Survey.