Littérature Courses

I downloaded the epub file from that site, converted it to text with Calibre, counted the frequencies of the words with a word frequency counter program (Word Frequency Counter), and pasted that into Excel (list 1).

I pasted the text of the book into Notepad++, replaced all non-word characters with newlines (\W replaced with \n; first set the search mode to “Regular expression”), pasted the result into Excel, deleted the duplicates (list 2), and then looked up the frequency of each word in this list with =vlookup on list 1. This was very quick and dirty, just to show what’s possible. You can also associate the words with the paragraphs or lines they are used in, and do cloze deletion.
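If you’d rather script this than go through Notepad++ and Excel, here’s a minimal Python sketch of the same idea (not the exact steps above): split the text on runs of non-word characters (the \W step), keep each word once in order of first appearance (list 2), and look up its count (the =vlookup against list 1). Lowercasing everything is my own choice so that “Är” and “är” are counted together; the function names are made up for the example.

```python
import re
from collections import Counter

def tokens(text):
    """Split on runs of non-word characters, like replacing \\W with \\n in Notepad++.
    Python's \\w is Unicode-aware, so å, ä and ö count as letters."""
    return [w.lower() for w in re.split(r"\W+", text) if w]

def frequency_table(text):
    """(word, count) pairs, one per distinct word, in order of first appearance."""
    words = tokens(text)
    counts = Counter(words)                  # list 1: word -> frequency
    unique = dict.fromkeys(words)            # list 2: duplicates removed, order kept
    return [(w, counts[w]) for w in unique]  # the "vlookup" step

for word, count in frequency_table("Jag är här. Är du här? Jag vet inte."):
    print(word, count)
```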

To make this better, it helps to have a list of conjugated forms mapped to their infinitives, to get a better idea of the true frequency of each base word, and perhaps to compare the frequencies of the words in different corpora, to exclude the most common and least common words.
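Once you have such a list, folding the counts together is the easy part. A minimal Python sketch, assuming the list can be turned into a simple form-to-base-word table; the tiny LEMMAS table below is hypothetical and only there for illustration. One caveat: a single form can belong to more than one base word (“var” can be “was” or “where”), so a plain lookup like this will misfile some words.

```python
from collections import Counter

# Hypothetical mini lemma table (inflected form -> base word). A real one would
# come from a lemmatised word list or a corpus that records lemmas.
LEMMAS = {
    "är": "vara", "varit": "vara",
    "går": "gå", "gick": "gå", "gått": "gå",
}

def lemma_frequencies(word_counts, lemmas):
    """Add up the counts of all inflected forms under their base word.
    Forms missing from the table are kept as their own entry."""
    totals = Counter()
    for form, n in word_counts.items():
        totals[lemmas.get(form, form)] += n
    return totals

counts = Counter({"är": 5, "varit": 3, "gick": 2, "går": 1, "katt": 1})
print(lemma_frequencies(counts, LEMMAS).most_common())
```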


I stand by my statement of Aniara needing to be read.

amanda, maybe you want to make the course with me? I might wait until I finish the first 2 of your courses.


This was very quick and will have mistakes. Would you like to search for a list of conjugations and corpora? After you’ve decided on a few texts, and the format you would like, it’s easy enough to make something.

If you give me a list of links like this!/bibliotek?title=Aniara and the format you would like the output to have, I could do those books as well. I associate each word with the paragraphs it appears in using an Excel macro; that takes slightly longer, especially if I don’t first filter out the most frequent words.
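For anyone curious what “associating a word with its paragraph” plus cloze deletion could look like outside Excel, here is a minimal Python sketch (not the macro described above). It assumes paragraphs in the text file are separated by blank lines; the function name is made up.

```python
import re

def first_context_cloze(text, word):
    """Find the first paragraph (blank-line separated) containing word and return
    (paragraph_number, paragraph with the word blanked out), or None if absent."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    # \b keeps "sov" from matching inside "sova"; IGNORECASE also covers å/ä/ö.
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    for number, para in enumerate(paragraphs, start=1):
        if pattern.search(para):
            return number, pattern.sub("____", para)
    return None

book = "Ronja sov.\n\nBirk skrattade åt Ronja.\n\nSkogen var tyst."
print(first_context_cloze(book, "skrattade"))
```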

I don’t read Swedish :no_mouth:

I use this method with Chinese. I import the text into Pleco, along with a user dictionary like this of the text I am reading. Then, when I am reading the book or watching the program, I can just tap on a word and see how often it’s used in different corpora and the context it appears in within this book. Pleco also lets you switch between dictionaries, so you can see the definition of the word too. If you then decide to learn the word, you can just tap a button to add it to your flashcard deck. Pleco only works for Chinese; perhaps there is a different program that works with European languages.


If you could do that for a selection of books, that’d be incredible. I’ll have to seriously peruse the site tonight to see which texts would be most beneficial with the least overlap. Maybe @amanda-norrsken and @Xephers have some thoughts?
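One rough way to put a number on “overlap” once you have the text files: compare the sets of distinct word forms. A minimal Python sketch using Jaccard similarity (shared words divided by all distinct words across both texts); this ignores lemmas and frequencies, so it’s only a first approximation, and the function names are hypothetical.

```python
import re

def vocabulary(text):
    """Distinct lower-cased word forms in a text."""
    return {w.lower() for w in re.findall(r"\w+", text)}

def overlap(text_a, text_b):
    """Jaccard similarity of the two vocabularies: shared / combined distinct words (0 to 1)."""
    a, b = vocabulary(text_a), vocabulary(text_b)
    if not (a or b):
        return 0.0
    return len(a & b) / len(a | b)

def new_words(candidate, covered):
    """Word forms a candidate text adds on top of the vocabulary already covered."""
    return vocabulary(candidate) - covered

first = "Jag ser en katt."
second = "Du ser en hund."
print(round(overlap(first, second), 2), sorted(new_words(second, vocabulary(first))))
```

When picking several books, `new_words` is handy for ranking candidates by how much they add to what the already chosen books cover.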

You people are amazing! I only understood a fraction of @Arete_Hime’s posts, LOL, but it sounds impressive :smiley:


I happened to find a PDF of “Män som hatar kvinnor” on the net somewhere. I think it was a Russian file, but I was able to use it. I’ll post the link if I can find where I stashed it :blush:


Oh, and I have also found my original Google Docs files with the levels I created. I can share them with you, if you like. I have a horrible feeling that I was unhappy with my attempts to make a Memrise course and at some point deleted what I had done… I think I was sure that no-one would ever try to do the course :frowning:

Little did I know that there was four-gated Danzig out there :smiley:

At any rate, just PM me and give me your email and then I can share those docs and we could build a new “Män som hatar kvinnor” course which consists of “lexical chunks” and not just individual words.

Let me know what you think!

This takes you to page 54, which must have been where I was when I was trying to make my course.


I advocate choosing either graded readers (boring) or children’s books. You should try to find books where your comprehension is already over 95%, ideally 98%. Then you can actually read them for pleasure, and relatively quickly, so you get far more exposure to the language than if you have to spend hours on a single page.
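If you have a list of words you already know, you can estimate that percentage before committing to a book. A minimal Python sketch, assuming “comprehension” is approximated by lexical coverage: the share of running words (tokens, not distinct words) that are on your known list. That’s only a proxy for understanding. For scale: at 95% coverage you hit roughly one unknown word in every 20 running words; at 98%, one in 50.

```python
import re

def known_word_coverage(text, known_words):
    """Share of running words (tokens, not distinct words) that appear in known_words."""
    tokens = [w.lower() for w in re.findall(r"\w+", text)]
    if not tokens:
        return 0.0
    known = sum(1 for t in tokens if t in known_words)
    return known / len(tokens)

known = {"jag", "ser", "en", "katt", "och"}
text = "Jag ser en katt och en hund."
print(f"{known_word_coverage(text, known):.0%}")
```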

That said, my previous recommendation of using movies or tv series still stands.

Also, please try to find a list, perhaps a corpus, that allows you to associate the word with its lemma (base-word). Now the list includes things like this:

If my guess is correct that most of these are variations of a single word, it’s helpful to group them together. I tried to quickly find a corpus and found this, but I don’t think it shows the lemma.

In my own dictionaries, I include the title of the book and the paragraph number (I include the first several paragraphs the word is used in) to help me remember where I saw the word first. You could also decide to make a course based on a series of books (the work involved, after making the text file from the books, is identical).
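Generating entries like that for a series of books could be sketched in Python like this. It assumes each book is a plain-text file with paragraphs separated by blank lines, and the “Title ¶n” layout is just an invented example format.

```python
import re

def build_entries(books, max_refs=3):
    """books: {title: text}, paragraphs separated by blank lines.
    Returns {word: "Title ¶1, ¶4; Other title ¶2"}, listing at most max_refs
    paragraph numbers per book (the first ones the word appears in)."""
    refs = {}  # word -> {title: [paragraph numbers]}
    for title, text in books.items():
        paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
        for number, para in enumerate(paragraphs, start=1):
            for word in {w.lower() for w in re.findall(r"\w+", para)}:
                numbers = refs.setdefault(word, {}).setdefault(title, [])
                if len(numbers) < max_refs:
                    numbers.append(number)
    return {
        word: "; ".join(
            title + " " + ", ".join(f"¶{n}" for n in numbers)
            for title, numbers in by_title.items()
        )
        for word, by_title in refs.items()
    }

entries = build_entries({
    "Nils": "Pojken sov.\n\nGåsen flög högt.",
    "Ronja": "Ronja sov.\n\nSkogen var mörk.",
})
print(entries["sov"])
```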

If you used, say, Nils and Ronja (Rövardotter), an entry for a word might then look like this:

Take your time to think about what source material to use and how to make the course. You’ll probably also want to find dictionaries.

Thanks for all of the advice. I’m a huge children’s literature buff and currently read it in five languages for sheer pleasure, so I completely agree with everything in that article (which I’ve already read! ha, ha). Sweden has a great tradition of rewarding its CL (children’s literature) authors, with prizes all the way up to the international ones, not to mention the wealth of pan-Scandinavian prizes.

ANYWHO. Finding the books in epub/pdf is the issue. I believe the website I linked you has no Astrid Lindgren, so no Ronja. There is the complete Nils, one book by Elsa Beskow, and the Swedish versions of The Secret Garden and Anne of Green Gables. Maybe those, coupled with the vocabulary from the first Stieg Larsson, which is written in plain language and is all useful, would combine to create a fantastic course.

I’ll take the time to consider it all and gather resources and such.

When I was looking for epubs, I came across this tremendous website, which has extracts from books and even some full ones.

For example, I read this one in German, and I’m so glad to see the original! @amanda-norrsken, I think you’d love it. A Pixar short waiting to happen.

It would be wonderful to have more literature courses. I’ve been contemplating creating a course for the ballads for a while, but I still have a couple hundred more words to add to the Harry Potter course. Now that I have gotten so far with it, I have to see it through to the end. I’m also interested in courses for detective fiction and children’s lit.

My method of course creation is probably not the most efficient: I add words to a spreadsheet as I am reading and then use bulk import to Memrise after I have added the definitions. I also check them individually against my database to avoid duplicates.
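The duplicate check and the import file could be automated along these lines. A minimal Python sketch, assuming your existing database can be exported as a plain list of words and that the bulk importer takes one tab-separated word/definition pair per line (an assumption; check the import screen for the layout it actually expects). The function names are hypothetical.

```python
import csv
import io

def new_rows(candidates, existing_words):
    """Keep (word, definition) pairs whose word is neither in existing_words nor
    repeated earlier in candidates. Matching ignores case and surrounding spaces."""
    seen = {w.strip().lower() for w in existing_words}
    rows = []
    for word, definition in candidates:
        key = word.strip().lower()
        if key and key not in seen:
            seen.add(key)
            rows.append((word.strip(), definition.strip()))
    return rows

def to_tsv(rows):
    """One 'word<TAB>definition' line per row; the layout is an assumption about the importer."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)
    return buffer.getvalue()

reading_notes = [("trollkarl", "wizard"), ("Kvast", "broom"),
                 ("trollkarl", "sorcerer"), ("uggla", "owl")]
already_in_course = ["uggla"]
print(to_tsv(new_rows(reading_notes, already_in_course)), end="")
```

Note that the first definition wins when a word repeats; if you’d rather merge definitions, that’s a small change in `new_rows`.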


That’s how I read in my languages, except that I swap the spreadsheet for a yellow legal pad: I handwrite whilst reading, then look up the definitions when I’m adding the words to my private courses.

The thing to think about with Arete_Hime’s method is weeding out the obvious words that’ll pop up when you aggregate a whole book, like “jag” or “är”, etc.


I hadn’t deleted my courses! Here they are (for what they are worth):

Have a look at what I have put together so far and let me know if this is the kind of thing you like!

You can do that either manually; automatically, by comparing the list with the words you’ve already learned on, say, Memrise (for example with a tool that exports your words from Memrise); or automatically, by filtering out the most frequent words according to a corpus, if you use one, or according to the frequency count.
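The two automatic options can be combined in a few lines. A minimal Python sketch, assuming the book’s word counts are already lowercased, your known words are available as a plain list (e.g. from an export), and a corpus frequency list, if you have one, has been turned into a word-to-rank mapping. The parameter names and the cut-off are illustrative.

```python
from collections import Counter

def study_candidates(counts, known_words, corpus_rank=None, skip_top=100):
    """Words from a book worth studying, most frequent first.

    counts: Counter of words in the book.
    known_words: words you already know (e.g. exported from Memrise).
    corpus_rank: optional {word: rank} from a frequency list, 1 = commonest.
    skip_top: how many of the commonest words to drop, ranked by the corpus if
    one is given, otherwise by the book's own frequency count.
    """
    known = {w.lower() for w in known_words}
    if corpus_rank is not None:
        too_common = {w for w, rank in corpus_rank.items() if rank <= skip_top}
    else:
        too_common = {w for w, _ in counts.most_common(skip_top)}
    return [w for w, _ in counts.most_common() if w not in known and w not in too_common]

book = Counter({"jag": 50, "är": 40, "katt": 5, "blossa": 3, "hund": 2})
print(study_candidates(book, known_words=["Katt"], skip_top=2))
```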

If you can’t find a list of all verbs and their conjugations, comparing the list against a list of regular verbs in the infinitive (and adjectives in their base form, etc.) with =vlookup(…,TRUE) might also get you most of the way there, with more mistakes. With TRUE the lookup is approximate: it expects the first column to be sorted and returns the closest entry that sorts at or before the word you look up. You’d still need a list of irregular verbs and the like.
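Here’s a minimal Python sketch of what that approximate lookup does, plus a prefix check to reject obviously wrong matches. The prefix check is my own addition, not something VLOOKUP does. Also, Python compares strings by code point, which is not Excel’s collation (in Swedish order å comes before ä, but by code point ä sorts first), so results can differ from Excel’s for words containing å, ä or ö.

```python
import bisect

def approximate_lookup(word, sorted_keys):
    """Mimic VLOOKUP(word, range, 1, TRUE): return the largest key that sorts at or
    before word, or None (Excel's #N/A) if word sorts before every key.
    sorted_keys must be sorted ascending, as VLOOKUP's approximate mode requires."""
    i = bisect.bisect_right(sorted_keys, word)
    return sorted_keys[i - 1] if i else None

def guess_base_form(word, sorted_keys):
    """Accept the approximate match only if the inflected word starts with it,
    e.g. 'hoppade' -> 'hoppa'; otherwise report no match."""
    match = approximate_lookup(word, sorted_keys)
    return match if match and word.startswith(match) else None

infinitives = sorted(["hoppa", "kasta", "prata"])
for w in ["hoppade", "pratar", "hund"]:
    print(w, approximate_lookup(w, infinitives), guess_base_form(w, infinitives))
```

Without the prefix check, “hund” comes back as “hoppa”, which is the kind of mistake the approximate match makes. Even with it, “kastanj” (chestnut) starts with “kasta” and gets misfiled, and irregular forms such as “gick” (from “gå”) never start with their infinitive, hence the separate list of irregular verbs.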

Another problem I realized is probably expressions that contain spaces and can be split, like phrasal verbs. The program won’t be able to count those accurately. One way to mitigate that is to also keep a list of such expressions and, before doing the frequency count, replace them with a single token, e.g. blossa upp with something like blossa-upp. I think I know of a userscript that lets you do that.
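A minimal Python sketch of that pre-processing step, assuming you have the list of expressions as plain strings. Only adjacent words are matched, so each inflected form needs its own entry (“blossade upp”), and a split form such as “blossade inte upp” is still missed.

```python
import re

def join_phrases(text, phrases, joiner="-"):
    """Rewrite each multi-word expression as one token, e.g. 'blossa upp' -> 'blossa-upp',
    so a word counter sees a single word. Only adjacent words are matched."""
    # Longer phrases first, so a long expression isn't broken up by a shorter one inside it.
    for phrase in sorted(phrases, key=len, reverse=True):
        words = phrase.split()
        pattern = r"\b" + r"\s+".join(re.escape(w) for w in words) + r"\b"
        # Keep the original spelling/case of the match; only swap the whitespace for the joiner.
        text = re.sub(pattern, lambda m: joiner.join(m.group(0).split()), text,
                      flags=re.IGNORECASE)
    return text

print(join_phrases("Elden blossade upp. Sedan fick den blossa upp igen.",
                   ["blossa upp", "blossade upp"]))
```

One thing to watch: if you then split the text on \W as in the Notepad++ step, a hyphen is itself a non-word character and would split the expression again. Passing joiner="_" avoids that, since the underscore counts as a word character for \w.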

The more processing you do, the more accurate the result will probably be, but the more time you’ll spend finding and manipulating the data. It’s up to you to decide when it’s good enough.

If you find a source and, after tinkering with it a bit, can’t work out how to get it into a usable format quickly, please post it here so we can look at it.

I’ll try to get all of that this week. Thanks again for the tips and walkthroughs.

@amanda-norrsken, your courses are great! Learning the vocabulary in context is obviously superior (hint: mems…), but I normally just make my courses with the word and recall the context from the novel.


I went ahead and downloaded this dictionary:

Looking through it, if I were making the course, I would definitely add the German translations, since many of the words look so similar, and probably the etymology too if I could find it quickly, as I’ve read that many words are also derived from French and English.

I managed to get one entry per line by converting the document to HTMLZ in Calibre and doing a few replace-alls:

That gets you a first list of 37k words, which can be useful in some ways (the project’s own page probably already has this in a better format).


I’m working on a course for Liftarens Guide till Galaxen / The Hitchhiker’s Guide to the Galaxy (current status: 5 chapters done) and may do some other popular books in the future.

Sneak peek:

Note that these courses presume that you are already at around B1 level; not all vocabulary is covered. Only things that I (~C1) and my friend (~A2-B1) felt were necessary are included.

There’s also this course, which is for a short children’s book. You can find the book here (note: I’m pretty sure the books on this website are out of copyright, which is why they haven’t gotten into any trouble).

I quite like reading and I have a LOT of Swedish books. I also have a habit of underlining new vocabulary whenever I read a book, which makes collecting said vocabulary much easier. Additionally, I quite like making courses… so feel free to hit me up with suggestions. I’d like to do some classic literature next, personally.


Sounds awesome. Loving the sample sentences. Can’t wait to see the finished product. Maybe you’d like to make a course together with @miaomiaopurr? I believe they’re going to make one on Ronja Rövardotter.

Like I said above, you have to read Aniara. Even if you’re not a poetry reader, you’ll like it.


I’ve only read Bröderna Lejonhjärta so far (yes, it also has underlined vocabulary). My only exposure to other Astrid Lindgren is via audiobooks.

I love poetry, so maybe I should look into this “Aniara” business.


Thanks for sharing the spreadsheet of your course in progress. I really look forward to trying them out when they are ready. :slight_smile:

I’m curious if you can recommend any sources for audiobooks in Swedish? I tend to rely on (US version) and while they have a fair selection of several languages, their Swedish offerings are really disappointing. Any ideas would be appreciated!

This may be of interest.

I’m looking forward to the Hitchhiker’s Guide course. That’ll be a fun one to do.