We show that content on the web is often translated into many languages, and the low quality of these multi-way translations indicates they were likely created using Machine Translation (MT). Multi-way parallel, machine generated content not only dominates the translations in lower resource languages; it also constitutes a large fraction of the total web content in those languages. We also find evidence of a selection bias in the type of content which is translated into many languages, consistent with low quality English content being translated en masse into many lower resource languages, via MT. Our work raises serious concerns about training models such as multilingual large language models on both monolingual and bilingual data scraped from the web.
Brian Thompson, Mehak Preet Dhaliwal, Peter Frisch, Tobias Domhan, Marcello Federico
As a translator myself, this is entirely unsurprising. Translating is a craft, a skill, and much like with any other craft, you get what you pay for. If you pay your translator(s) a good rate, you get a good translation. If you pay your translator(s) a shit rate, you get a shit translation. If you pay nothing, you get nothing.
I’m definitely seeing more and more people in my industry integrate machine translations, but so far, it’s not been an actual issue – I have no qualms about accepting a job where I take a machine-translated text and whip it into shape and turn it into a human-readable, quality translation… As long as people pay me a reasonable rate for it. Working from a machine translation is often quicker and easier, so the going rate obviously reflects that.
The quality of machine translations is absolutely atrocious, however, and the idea of relying on it for texts other people – customers, clients, employees, etc. – are actually supposed to read and work from is terrifying. Google Translate is an effective tool for personal use, but throwing, I don’t know, your product’s manual at it and dumping the unedited result onto your customers is borderline criminal.
Pay nothing, get nothing.
#foss?
Isn’t the reason for this that we fully accept/expect developers to provide their time and skills for free but don’t expect the same from other professionals?
Why aren’t language professionals offering their services for free to improve translations in software?
P. S. I am being a bit clickbaity on this to make the point as there is a clear double standard here.
Adurbe,
Interesting point, anyone think there’s a good reason for this?
I don’t know about “expectations”, but I believe there’s a reason indeed that free software (in both senses, for this occurrence) exists.
It started with people wanting to solve their own problems and/or show off their craftsmanship, learning new things in the process (from research and/or from others). Since it turned into something functional, users (personal and corporate) got interested and it led to the most skilled devs being employed to improve and maintain said software.
There have been free amateur translations floating about for some high profile books (I remember this was a thing for latest Harry Potter releases around the time they were published, for example), that were pretty low quality, so that answers the “problem solving” part, but not the “showing off craftsmanship” part. There are also a fair number of collaborative translation projects, but none as high profile and visible as the most trendy software projects out there, I believe.
The reason might be that it’s harder to collaborate on a “minimum viable product” for translations than for software. Or that the userbase is less visible, generating less interest. Or that translators are actually a rarer breed than developers. Or, more likely, that their skills are less celebrated than those of developers. Add to that the absence of a high profile copyright-compliant translation project out there in the wild, and the incentive for a good pro translator to spend free time on such an endeavour is just nowhere to be found, I guess.
What’s interesting is that despite developers and contributors coming from multiple native languages (but de facto working in English because of its dominance in programming), those same people don’t contribute to translations.
E.g. the recent malicious translations in 23.10: it means not a single Ukrainian speaker checked the translations. Knowing how big the IT sector is in Ukraine, it strikes me as crazy.
I wonder if FOSS is too good/focused on getting dev contributions and is neglecting the complementary skills within its same contribution community?
I half agree (I’m a developer who requires a paycheck to eat). Open source was meant to solve one very serious problem with code (which is that when you buy software, you can’t know what it’s actually doing without source (even then, you still have to trust the binary was compiled from the source you have)).
It didn’t really address any of the rest of the problems with it, including power imbalances, and the issue of people getting paid for the work.
That said, while open source does undercut the value of “finished software” (as far as that goes), it doesn’t really undercut the value of “we still need someone to make this work”. That second part is arguably always true – you always need someone who can take the code and make it work – that work is never “finished”.
Case in point, there are a lot of hardware companies trying to do what Valve did, but without matching the software side. They have this proprietary mindset where they think, “this software is *finished* so we can just use that”. That’s not true – it was not even true for proprietary software, and it’s even less true for open source software like Steam OS. The value Valve brings is that they have a bunch of avenues going in the same direction – hardware, software, and sales channel, working together to deliver a great product. Even Intel seems to only have recently understood this, as they struggle to make up a multi-decade gap in software development for their Arc GPUs.
Valve’s software and sales channel don’t translate to other hardware company initiatives, but those companies could add them. A company with strong software chops and an extant sales channel – say Epic – could easily follow Steam’s path. Others that don’t have a sales channel could at least add the software part to their teams and deliver something better. But it’ll never be as profitable as it is for Valve, so they’ll still have the advantage. They never seem to figure all this out. I’d say it’s baffling, but I just explained it. It’s because they have a proprietary mindset.
Anyway, most open source software is written by people on a payroll. It’s a myth that it’s all unpaid hobby stuff. Open source disrupts the value of proprietary “finished” software products, and also provides a better cooperative development method that in the longer run is much, much cheaper for individual companies to participate in. Proprietary software is one company’s liability. Open software is a shared liability. There’s still plenty of paid work to be done. We just need to solve the proprietary mindset.
BTW, this all holds true for hardware, and hardware companies. The value is not that Intel, AMD, ARM, MSI or Gigabyte has a particular SKU, it’s that they can keep making new SKUs. EVGA wasn’t the best GPU maker because they had some good SKUs. They were the best because they had the best people, and were willing to take risks. Notably, most of these hardware companies don’t own the chips they sell. Their work might as well be open source. It’s the same process. We’re starting to see open source penetrate even in hardware with stuff like RISC-V. It will likely face the same perception problem (not exactly a real problem) of hardware designers all of a sudden doing all the work for free, but it won’t be true in hardware either.
CaptainN–,
Your post is well thought out and well said, however I do feel it only covered one side of the argument when there are developers and projects who struggle because of FOSS. There’s kind of a duality here. It’s easy to use somebody else’s FOSS in your projects without giving anything back, but it can be very hard on the other side of this equation when everyone takes, takes, takes but doesn’t contribute back.
As much as I like FOSS, it’s very easy to have one’s work exploited, especially if you’re just some random guy supporting FOSS and nobody is helping you pay your bills. That’s a real drag. Sometimes even a much bigger company (e.g. Red Hat, Ubuntu, etc.) will take your FOSS software and profit off it themselves.
Here are two off the top of my head I’ve been impacted by.
https://nerdvittles.com/some-asterisk-resolutions-for-the-new-year/
https://www.phoronix.com/news/IT87-Linux-Driver-Axing
Thom Holwerda,
I see some similarity between what you are saying and what I experience as a developer. Tons of local businesses have been offshoring skills like web development. Since I am unable to match the prices of cheaper foreign labor, I’m seeing quite a lot of “sorry, we went with a cheaper Indian company”. Many of those offshored projects do go bad. I would laugh at their cheap follies if it weren’t impacting my profession so seriously. Sometimes even after a bad experience businesses still prefer to go back to the cheap offshore option rather than hire a local dev. I’ve seen this so many times that it irks me.
Anyway, here is the point that might apply to you as a translator, just like it does to me. If a business is adamant about paying less, these arguments for why ML translations are worse, even if entirely true, don’t necessarily mean you won’t lose business to it. It’s not our opinion that matters, but the opinion of those in the company who actually make the call.
I earnestly hope the best for your career, Thom, just be mindful that your points about ML don’t necessarily mean human translators are safe from automation.
https://www.baselinemag.com/artificial-intelligence-ai/duolingo-embraces-ai-the-impact-on-language-learning/
@Alfman
The problem is that often those secondary languages aren’t the primary marketing targets, so they get little budget support and are bunched in the “nice if we get some” category.
Until some local agency arises, the quality isn’t even on the radar. I recently had a debate about this with a Chinese company. They were adamant the machine translation to English was already good enough, in that it conveys the meaning. They were so convinced the MT is good enough that they can’t even be bothered making foreign language versions of their product brochures; they are happy to leave it up to Google Translate. It’s what they don’t know that hurts them, so I pointed out the MT won’t make the sale no matter how clear the result; it’s the impression it leaves that is far more important.
cpcf,
Most companies have no idea how to judge the quality of translations. To them, anything that’s returned as “Done” may equal “Good enough”. To really get the Q/A right, they not only have to commission the original translations, but even more translators to check those translations across all languages. That’s a lot of people involved!
That’s a good point. I’ve relied on Google Translate to help me read Chinese documentation that wasn’t available in English. It got me the information I needed, but it was by no means good.
Another thing I sometimes notice is when documentation is written in valid English but isn’t written by a technically savvy writer. Having translators who are competent in both translating as well as the technical subject matter is probably too much to ask for though.
I think Thom’s approach might be helpful to you. Many companies are doing the same: selling at just above the rate for offshore devs, then offshoring the work and touching it up before passing it on to the end user. I’ve known many companies here in the States that do similar work. You won’t get paid as much per project, but you’ll be able to do more projects.
Bill Shooter of Bul,
I’m not a fan of the middle man model in general. It would be hard for me to swallow that pill, but you may be right. Don’t fight it, just accept this is the way the world works.
I think this is where the translation and development analogy breaks down. The quality question for translation seems to be defined (correct me if I’m wrong) on the level of a sentence or paragraph of text, independently of others.
For software, ensuring quality with a last-minute check is impossible more often than not. There are many aspects beyond literal code style / readability at play that you can’t simply solve with a PR comment a day before the testing phase.
The result is making a lot of compromises and taking responsibility for something you don’t really control. It’s not a good place to be.
Google is very bad at automatic translating, that’s true, however there are much better translation services out there such as DeepL. It may occasionally spit out sentences with awkward grammar, but that can be fixed by anyone who has a decent handle on the language being translated to, without needing to know the language being translated from.
Sidenote: I would appreciate if you didn’t insert ads in the middle of an article. I realize I could block them, but I’d rather not, especially since I’ve found some of the ads here useful. The ads in the middle of a post like here are just annoying.
What you don’t understand, Thom, is that some jurisdictions require some kind of local-language manual to be provided, and machine translation or cheap human translation is the preferred way to achieve minimal compliance with the law.
(before such laws were enacted, there were cases of no local language manual at all being provided, for example, my dad bought a Panasonic HiFi in the early 2000s in Greece and it didn’t ship with a Greek manual at all)