In Homework #4, each student received a sample of 100 words chosen at random without replacement from the list of 78,984 headwords from the 8th edition of the Merriam-Webster Collegiate Dictionary. Participants rated the words in their samples as yes ("I definitely know this word"), no ("I have no clue whatsoever about this word"), or maybe ("I have some sort of idea about this word, and could try to use it, but I might be at least partly wrong").
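As a rough illustration of the sampling step, here is a minimal Python sketch of drawing 100 headwords at random without replacement. The function name draw_sample, the seed, and the placeholder word list are assumptions made for the example; the homework's actual sampling script is not shown in this post.

```python
import random

def draw_sample(headwords, k=100, seed=None):
    """Return k distinct headwords chosen at random without replacement."""
    rng = random.Random(seed)        # a seeded generator makes the draw repeatable
    return rng.sample(headwords, k)  # random.sample never repeats an element

# Placeholder list standing in for the 78,984 dictionary headwords (hypothetical data).
headwords = [f"headword_{i}" for i in range(78_984)]

sample = draw_sample(headwords, k=100, seed=4)
print(len(sample), len(set(sample)))
```

Because the draw is without replacement, no headword can appear twice in a student's sample.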

For the yes category, the percentiles of class answers were:

                                     0%       25%      50%      75%      100%
Proportion known in sample of 100    .33      .5475    .6       .66      .79
Estimated count of headwords         26,065   43,244   47,390   52,129   62,397

For the yes and maybe categories combined, the percentiles were:

                                     0%       25%      50%      75%      100%
Proportion known in sample of 100    .45      .68      .72      .77      .88
Estimated count of headwords         35,543   53,709   56,868   60,818   69,506

In other words, we can estimate that the median class member is confident of knowing about 47,400 of the MW headwords (0.60 × 78,984), and has some knowledge of almost 57,000 of them.
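The estimated counts appear to be each proportion multiplied by the 78,984 headwords on the list and rounded to the nearest whole word. The sketch below reproduces that arithmetic for the percentile proportions reported above. The function and variable names are illustrative assumptions, not taken from the homework.

```python
TOTAL_HEADWORDS = 78_984  # size of the headword list given in the text

def estimated_headwords(proportion, total=TOTAL_HEADWORDS):
    """Scale a proportion known in the sample up to the whole headword list."""
    return round(proportion * total)

percentiles = ["0%", "25%", "50%", "75%", "100%"]
yes_proportions = [0.33, 0.5475, 0.60, 0.66, 0.79]
yes_or_maybe_proportions = [0.45, 0.68, 0.72, 0.77, 0.88]

for label, props in [("yes", yes_proportions),
                     ("yes + maybe", yes_or_maybe_proportions)]:
    print(label)
    for pct, p in zip(percentiles, props):
        print(f"  {pct:>4}  {p:<6}  {estimated_headwords(p):,}")
```

For the median yes proportion of .6 this gives 47,390, and for the median yes + maybe proportion of .72 it gives 56,868.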

Here's the histogram of the yes + maybe categories, in terms of estimated headwords known to some degree:

The figure of almost 57,000 is a lower bound on the median class member's passive vocabulary, since the dictionary list lacks most proper names as well as some other categories of words. For example, a glance at today's headlines turns up (among many others) these well-known words not on the list: Afghanistan, BBC, blog, Intel, Iraq, Liberia, Microsoft, Germany, GOP, Pakistan.