The case against DeepL

3 Sept

As a freelance translator, I hear pretty regularly: "Enjoy it while it lasts; you'll be replaced by DeepL any day now!"

While I understand their point, I think DeepL has some fundamental problems that are quite hard to solve, and I'd like to lay some of them out today. I use DeepL here as a stand-in for automatic translation services in general: all of the criticisms apply equally to Google Translate, Reverso, and even AIs like ChatGPT.

While the last of these might sound strange to include, I believe the comparison is apt because DeepL is a ChatGPT in disguise: the company has recently started using a large language model (LLM) to predict the best translation, which is the same technology that AI programs like ChatGPT use to make their translations (and to generate answers to all other queries as well).

Furthermore, DeepL claims that in blind comparisons, language experts rate its translations far better than those produced by either AI or Google Translate. While the company has not shown the research to back up this claim, it does feel intuitively correct from my experience reviewing texts from both tools, so I will take their word for it.

Translating is more than just finding the correct words

If you have ever put a technical text through DeepL, you will have seen the program start to flail hard. These days, it has (mostly but not always) figured out the difference between terms with different meanings, like a birthday party and a political party. Still, it quite quickly starts hallucinating technical terms and phrases, or simply leaves them untranslated.

However, we can go a level deeper than this. DeepL assumes that the way the source text is built works just as well in the target language, which is certainly incorrect. Restructuring a text so that it reads naturally in the other language is one of the critical distinctions between good translators and bots, and indeed between good translators and those of poor(er) quality. As a result, texts translated by DeepL sound mechanical and are sometimes hard to read.

Anchoring bias

Anchoring bias is a phenomenon described in psychology research: people rely too heavily on the first piece of information they receive (hence, the anchor) when making decisions. This is precisely what happens when using DeepL. Reviewing a text translated by DeepL, rather than producing one from scratch, weakens the translator's ability to create a good translation, because the subconscious assumption is that the text is sound on a fundamental level, even when this is not the case.

So, is DeepL of no use at all?

While I will argue that a (good) translator will always do it better than a bot, that is not to say DeepL is useless. Even if you agree with my arguments, there are plenty of cases in which they do not matter! If you are on a street in France, for example, and using DeepL to translate an image of a sign or a roadmap, getting a broad understanding is all that matters.

Similarly, if you are translating terms and conditions or privacy policies—which is not only a lot of work but also dull as dishwater—chances are that DeepL will produce a usable first draft since those are purposefully unreadable and written to be boring anyway.

The difference between 'good' and 'good enough' is key in this debate, and so far, I am unconvinced that DeepL will ever produce something genuinely good. This is partly because it has already started polluting its own database: the company trains its machine on the Internet, which is already filled with translations of its own creation. A programme that is trained on what it has itself already produced will find it hard to improve.

Bias and self-interest

As a final point, I would like to acknowledge that I am, of course, biased against the program! However, this might not be to the degree you might think. From what I have seen so far, clients on the lower end of the requirements (and budget) spectrum have started using DeepL and other LLMs to translate, whereas the higher-end companies do not have faith in its quality. This is the work that would otherwise have been done by content mills and similar content factories. Since I've been around for a while now (my company is coming up on its 5th anniversary next year, and I did some translation work before that), I no longer worked for content mills and similar clients anyway.

Perversely, therefore, it might be in my self-interest, at least in the short to medium term, that DeepL performs well enough. For many freelancers, content mills are the way to get started in the industry, and removing them from the playing field substantially raises the barrier to entry and reduces my direct competition. This is not how I think about the world, and I believe good and persistent translators will always find a way to get started, but it does mean my self-interest is less clear-cut than it might seem.

Moving forward

I am confident that DeepL and other AI tools are well behind good translators for the time being. Good translators will find ways to leverage these tools where appropriate - in other words, sparingly - while continuing to provide the human touch that quality translations demand. As for DeepL? I am sure it will keep getting better, at least from a six out of ten to a seven in the next few years, perhaps to an eight if they're lucky. But replacing translators entirely? I wouldn't bet on it - at least not anytime soon.

Let's wrap up this discussion for now, since this post is already a bit lengthier than a regular blog post. Thanks for reading if you have gotten this far. As always, if you have any thoughts or are looking for your next DeepL-beating translator, feel free to reach out.

Thijmen Zuiderwijk