Get pinyin, zhuyin, simplified and traditional form on Google Sheets directly

Hi all,

I made this Google Sheets add-on to get pinyin, zhuyin, simplified, traditional form and definitions easily. You can easily look up 50 words at once with a single drag.

Here is how it works:

You can install the add-on easily:

  1. Open a Google Sheets document (https://sheets.google.com)
  2. Go to Add-ons > Get add-ons > search " Dictionary Functions " and install it
  3. click " Allow " on Permission dialog.

Supported functions:
=pinyin(term)
=zhuyin(term)
=simplified(term)
=traditional(term)
=def(term, “zh”)

or you can this function to retrieve multiple fields at once like in the video:
=dict(term, “zh”, fields)

2 Likes

I’d love to give this a try but… where’s the link?

forget it, I’m dumb lol

Trying it right now :slight_smile:

Thanks. There are many known issues. Please let me know what doesn’t work for you? :grinning:

I tested your addon using the entire set of Simplified Chinese HSK words

I attempted to retrieve the pinyin for each word and found the following issues:

  1. After roughly 1000 requests it throws you the following error:
    “Error: Forbidden access. You have reached the daily quota, please try again tomorrow. (line 42).”
  • Looks like Google sheets is fundamentally not the best place to run scrapers from.
  1. In the 1000 word sample, it only managed to find the pinyin in 40% of the cases
  • It is probably because wiktionary prioritises traditional characters and has empty entries for simplified ones. If your program fails to find the info on the original entry page, it must attempt to request the ‘see [traditional word]’ url and try again on the other page.
  1. Cannot use the function in a cell that has content on the adjacent right, as it is trying to copy the wiktionary message.
  • If I were you I would remove the wiktionary message

,

Hi, thank you for your feedback. All issues are expected.

  1. This is not the limit of Wiktionary. In fact, the add-on does not connect to Wiktionary servers at all. I placed the daily limit (1000 requests) because I expected that this limit is good for normal use cases. Sending 1000 daily requests per user to my current database is a lot compared to other normal websites (it may crashes my server due to database bottleneck if users keep sending requests). As I get enough fund from Patreon, I’ll upgrade the database so I can increase the limit.

  2. This is one of the known issues I’m going to fix soon. You’re right. You cannot look up anything for simplified form yet because Wiktionary tries to reduce duplicate information by referring to traditional form. But this is an easy fix.

  3. The last column is attribution. It’s required when sharing copyrighted data under CC-BY-SA 3.0. You can delete the column for personal use. If you distribute the data like I did, you just need to include the attribution (or the copyright notice) somewhere. Otherwise, you may get sued by Wiktionary contributors/authors.

Oh I was assuming it was a limitation from Google’s side.
You mean you downloaded the entire wiktionary onto your server? Why not just send direct requests from the user to wiktionary to avoid these issues? (I just wrote a Python scraper for pinyin that does that but for Baidu dictionary)

Yes. Wiktionary makes it easy to download their whole databases. There are 3 reasons why I did that

  1. I’m not parsing their HTML pages. I’m parsing their page contents which come from their databases.
  2. It’s more performant. The data are already parsed and stored on my database. So it just needs to query my database instead of parsing the pages all over again for each requests.
  3. Do analysis. When I have those data, I can connect them. For example, group all words by their categories.

Previously, you couldn’t look up simplified words. This issue has been fixed :grinning: