Missing Words in villturdrekkin's 5000 Most Frequent

(Lcwright1964) #1

The course author has done an excellent job on this list, which is taken from the Routledge Frequency Dictionary by Mark Davies. But the actual source contains EXACTLY 5000 entries, yet the course is missing 22 of them. The OCD in me wants to know why. Any ideas?

This is the course in question:


(Ian) #2

Hi, the Davies dictionary (1st ed., 2004) contains some “repeated” entries because the same Spanish word can be used as more than one part of speech. For example:

complejo adj complex, complicated 1278
complejo nm complex 2948

When the Memrise software totals up the number of words in a course, it does not count the items it detects as duplicates.
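
To make the counting concrete, here is a minimal Python sketch. I don't know how Memrise actually detects duplicates, so this assumes it matches on the Spanish headword alone and ignores part of speech; the function name `count_summary` and the tuple layout are made up for illustration. The only data is the two `complejo` entries quoted above.

```python
from collections import Counter

# (headword, part of speech, gloss, frequency rank), as quoted in this thread
entries = [
    ("complejo", "adj", "complex, complicated", 1278),
    ("complejo", "nm", "complex", 2948),
]


def count_summary(entries):
    """Return (total entries, unique headwords, headwords that repeat).

    Assumption: two entries count as duplicates when their headword is
    identical, regardless of part of speech. Memrise's real rule may differ.
    """
    counts = Counter(headword for headword, _pos, _gloss, _rank in entries)
    repeated = {word: n for word, n in counts.items() if n > 1}
    return len(entries), len(counts), repeated


total, unique, repeated = count_summary(entries)
missing = total - unique  # entries that would not be counted in the course total
print(f"{total} entries, {unique} unique headwords, {missing} not counted")
print("repeated headwords:", repeated)
```

Under that assumption, the 22 “missing” words would simply be 22 dictionary entries whose headword had already appeared under another part of speech.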

(Lcwright1964) #3

Thanks, Ian. I did indeed notice the duplication myself. FWIW, I just got a copy of the second edition of the dictionary. It is different still! It uses a different and much larger corpus (actually, combined corpora), with 500 new entries displacing former ones. Very helpful work. Just learning the first 2000 words of the Memrise course helped me muddle through mainstream Spanish texts with relative ease, which makes sense since Davies’ own research found that the first 2000 words covered between 84 and 86 percent of written texts and nearly 93 percent of oral ones. I will do some research to see if he has reported similar rates for the second edition.

(Ian) #4

The corpus used for the 2nd ed. of the Davies dictionary includes a ~2 billion word web corpus. I’m not sure such a large corpus is really necessary, but it doesn’t do any harm. I’m guessing that most of the 500 new items in the 2nd edition come from the web corpus. I think that learning all the vocab from both the 1st and 2nd editions (presumably ~5,500 words) would be a very useful exercise. Maybe someone will create a Memrise course that covers the 500 new items in the 2nd edition.