I want to talk more about what I’ve been calling “vigilante revitalization”. I recently wrote a series of posts about dying languages, discussing the importance of revitalization and why it’s difficult. Now, a new obstacle has come up: AI-generated books.
If you’re new here, or didn’t see those posts, the most relevant one for understanding what I’m talking about is this one about why revitalization is difficult. To understand the full scope of the issue, I’d recommend checking out my other posts, too. (See why it’s so important, and how you can help!)
As a quick summary, language revitalization is the process of bringing back a dying language. This often involves close work with the community that speaks that language.
I’m here to talk about a recent development, though, one that popped up after I wrote those posts: this article from the Montreal Gazette about several books that appeared on Amazon claiming to teach people dying or endangered indigenous languages.
In short, these books appear to have been AI-generated. There is no definitive proof of this, but native speakers of several of these languages find it likely.
Also, these books were credited to researchers or scholars of those languages who did not write them. This gives the books false credibility that could damage the reputations of people working in that field.
I speak in the past tense here because they were, thankfully, removed from Amazon. But this speaks to a larger problem, one that I’ve mentioned before but want to discuss in more detail.
Computers and Language
Human language is heavily reliant on inferences, figures of speech, and hidden meanings. Language can take so many forms that direct translations aren’t even always possible.
Now, every language is capable of expressing any possible concept. It’s one of the defining features of language. But languages don’t always express these concepts in the same way.
Here’s a quick, easy example. In Irish Gaelic (Gaeilge), emotions and some physical states are expressed using the word for “on”. If I’m hungry, I would say the equivalent of “There is hunger on me.”
Now obviously, with a proper understanding of figures of speech and metaphor, we can understand the intent behind this phrase even if we don’t know the full context. But with computers, it’s easy for something like this to be entirely mistranslated.
Even English phrases like “give up”, “break down”, or “show up” are prone to misinterpretation by those who don’t know their meaning. Computers have a hard time working around these phrases. (Just put a sentence through Google Translate a few times and you’ll see what I mean.)
There are so many different translations of books like The Odyssey, Beowulf, or even the Bible because there is a lot of room for interpretation when going from one language to another.
Old English words can have loose meanings, which forces translators to make some creative inferences when translating the works. On top of that, Old English poetry focuses more on alliteration and meter than on rhyme, which gives translators one more thing to weigh!
This is why it’s so important for language revitalization (or any language-learning program) to involve a native or proficient speaker, especially when making educational resources. Going into language revitalization without an understanding of both the proper procedures and the language itself will cause a lot of problems.
The AI Problem
The information in these books was incorrect in many other ways, too. (One of the books gave a translation for “animal” in a language that doesn’t have that word!) This is what led many to believe they had been generated by AI.
In my Old English class last spring, we discussed why AI was a bad tool for translating, especially with less-documented languages. The AI has very little to draw on, no way of fact-checking itself, and often just makes things up to fill in gaps. It always assumes that it knows what it’s talking about even if it has no sources.
Asking an AI to write this book is no different than asking someone who has barely even heard of the language to write it. A human might at least put in the effort to research the language and find missing information.
AI is like a con artist claiming to know everything about everything, and just making up anything it doesn’t know.
Good-Intentioned Vigilante Revitalization
My main goal with that article is just to use it as a jumping-off point to talk about a deeper topic.
These books are a good example of why I’ve spoken out against this vigilantism. I don’t think it’s a huge or widespread issue. But even small groups or individuals without experience seeking to get into this field can do real harm.
Let’s assume for a minute that the person publishing these books had good intentions. They wanted to help people learn these languages in an accessible format: an attempt at vigilante revitalization.
Accessible, easy learning doesn’t always make for accurate learning. As the article described, these books were generally split between nouns, adjectives, and verbs. But one of the languages doesn’t even have standalone adjectives (meaning descriptive qualities are likely attached to nouns, much like we add -s to make something plural)!
Someone making these books through AI generation (or just a bad understanding of the language) could have the best intentions. But that doesn’t make the end result any better. People hoping to learn from these books will only find it harder to learn the correct information.
Picture a learner who finds one of these books, hoping to learn about their heritage through it. As they study and learn, they start to engage with other speakers. It’s here that they find out they’ve learned a lot of words and even grammar incorrectly.
To draw on the adjective example above again, someone learning the language would essentially be speaking English-with-different-words. Forcing English grammar rules and categories onto another language isn’t much different from forcing religious customs onto a different religion.
Bad-Intentioned Vigilante Revitalization
Now that we’ve gone over the harm that can be caused by good intentions, let’s talk about bad intentions.
These language books were put on Amazon with a price attached; whoever published these wanted to make money. There’s nothing inherently wrong with that. People need to be paid for their work.
But when a book is published with inaccurate information and a false name, you start to question the publisher’s motivations. In similar circumstances, people would wonder if it’s a scam.
And, if I’m being honest, this is most likely what happened here. Somebody saw a possibility: a group of vulnerable languages in need of help. So they put together some books with as little time investment as possible via AI, and published them to profit off of those issues.
Most likely, the person responsible for this expected the books to be taken down, but just planned to get the money and vanish.
It sucks, but it happens. Scams come and go constantly. I’m glad Amazon removed the books, and I hope something like this never happens again.
Conclusion
Like I said above, I knew when I wrote my previous posts that vigilante revitalization wasn’t really a super common thing. I mentioned it because I wanted to make sure people didn’t get the wrong idea from my posts here. I knew it did happen occasionally, and wanted to dissuade anyone from making it worse.
Then this Montreal Gazette article appeared. It provided a perfect example of why this can become a problem. Hopefully it can further help to discourage people from attempting this on their own. If you’re truly committed to helping dying languages, then you should be committed to doing it properly.
Learn the procedures, engage with the communities, learn the language, and build rapport. Become an activist or just help spread awareness. That’s how you can make a difference here.
Questions, comments, or concerns? Let me know below, or contact me directly!
Hi Tristan,
I am learning to look past AI to humans. For example, I read this post to drop a human-generated comment for a human and their human blogging tribe.
Filling my day with one-to-one, human interactions makes it easier to spot and look past AI. Or to see how even as the machine pushes it to the world through greed and/or desperation we do not need to use or consume it. Not easy for us to grasp at times but the consumer winds up being the determiner in the end.
Like with the revitalizing issue above, AI becomes a can of worms eventually. Always. Simply because it is artificial, or, not real, and the artificial always falls apart.
Ryan
I couldn’t agree more, Ryan! I try to avoid talking about AI too much on here (other than to say that I don’t use it) but it popped up as a useful example of how even good intentions can cause problems with something like this.
Often what causes cultures and languages to disappear is intervention from other humans. Even now, AI is a tool used by other people. It isn’t malicious by its own nature; it’s only because of how we choose to use it.