YouTube is automatically deleting comments that contain certain Chinese-language phrases related to criticism of China’s ruling Communist Party (CCP). The company confirmed to The Verge this was happening in error and that it was looking into the issue.
“This appears to be an error in our enforcement systems and we are investigating,” said a YouTube spokesperson. The company did not elaborate on how or why this error came to be, but said it was not the result of any change in its moderation policy.
But if the deletions are the result of a simple mistake, then it’s one that’s gone unnoticed for six months. The Verge found evidence that comments were being deleted as early as October 2019, when the issue was raised on YouTube’s official help pages and multiple users confirmed that they had experienced the same problem.
Sure, an “error in our enforcement systems” that was first reported six months ago. I just don’t believe that something this specific happens by accident; the trigger phrases are very particular and require contextual knowledge even to identify.
I happen to work for Google, but not for YouTube, and do not know what exactly happened. So all of this is speculation, and don’t take it 100% seriously.
Modern systems are increasingly automated. This produces good results 99% of the time. The other 1% is not so easy.
Remember the so-called Google bomb for “weapons of mass destruction”?
https://en.wikipedia.org/wiki/Google_bombing
The ranking algorithm could be fooled by coordinated external links from blogs (even from blog comments, a loophole that was later “fixed”): https://www.searchenginejournal.com/blog-comments-link-building/350748/
Before that there was AltaVista, which had a much larger catalog than the Yahoo Directory. However, it too was fooled, by excessive keyword spam placed on the pages themselves.
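Neither of those old ranking schemes is publicly documented in detail, but the failure mode is easy to sketch. Here is a toy scorer (completely made up, not any real engine’s code) that naively adds up on-page keyword hits and matching inbound anchor text; both signals can be inflated by whoever produces the most spam:

# Toy illustration only: a ranker that sums on-page keyword hits and
# inbound anchor-text matches. Keyword stuffing inflates the first signal
# (the AltaVista-era trick), coordinated links inflate the second (the
# classic "Google bomb").
def score(query, page_text, inbound_anchors):
    query = query.lower()
    on_page = page_text.lower().count(query)
    anchor_hits = sum(1 for a in inbound_anchors if a.lower() == query)
    return on_page + 5 * anchor_hits  # arbitrary weighting

# Hypothetical pages and links, purely for illustration.
pages = {
    "encyclopedia.example/wmd": {
        "text": "Weapons of mass destruction include nuclear, chemical and biological arms.",
        "anchors": ["weapons of mass destruction"] * 3,    # a few organic links
    },
    "joke-page.example/error": {
        "text": "These weapons of mass destruction cannot be displayed.",
        "anchors": ["weapons of mass destruction"] * 500,  # coordinated blog/comment links
    },
}

query = "weapons of mass destruction"
ranked = sorted(pages, key=lambda p: score(query, pages[p]["text"], pages[p]["anchors"]), reverse=True)
print(ranked)  # the "bombed" joke page outranks the legitimate one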
Comment and content moderation is similar. The algorithms are usually very good, but when they fail, they make national news.
sukru,
It’s easy to blame the algorithms, but google’s lack of transparency about the algorithms hurts its credibility. I don’t buy for a second that the algorithms spontaneously started censoring these phrases without some form of input instructing them to do it. That’s the thing about neural nets and machine learning: they need to be trained, and I suspect there was no algorithmic fault. I concede this is speculation, but short of a convincing explanation from google (which they failed to provide), the most probable cause is that the training data censored these phrases as well.
I wouldn’t expect you to know what happened, but you gotta admit that it’s fair to be somewhat skeptical of google’s official explanation that machine learning is to blame.
> It’s easy to blame the algorithms, but google’s lack of transparency
> about the algorithms hurts its credibility
How? Does any other company publish their code and algorithms, and does it affect their credibility? Does MS? Apple? FB? IBM? Etc.?
> That’s the thing about neural nets and machine learning, it needs to be
> trained and I suspect there was no algorithmic fault.
> the most probable cause is that the training data censored these
> phrases also.
No, the most common issues with NNs are 1) training data that doesn’t cover a particular case (in which case the output can’t be predicted), and 2) training data that isn’t normalized enough (in which case the NN reproduces bias present in the training data, and in society).
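To make point 2 concrete, here is a toy sketch (not YouTube’s system; the training examples and the placeholder phrase_a are invented) of how a perfectly ordinary classifier ends up consistently flagging specific phrases when the labels it was trained on already treat them as removable:

# Toy sketch: the bias lives in the labels, not in the learning code.
# The classifier just reproduces whatever pattern the labels contain.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_texts = [
    "great video thanks for sharing",   # keep
    "first love this channel",          # keep
    "buy cheap watches at my site",     # remove (spam)
    "click here for free followers",    # remove (spam)
    "phrase_a is a disgrace",           # remove: phrase_a only ever appears in "remove" examples
    "down with phrase_a",               # remove
]
train_labels = ["keep", "keep", "remove", "remove", "remove", "remove"]

vec = CountVectorizer()
clf = MultinomialNB().fit(vec.fit_transform(train_texts), train_labels)

# Any new comment containing phrase_a now gets flagged, every single time,
# because the model learned exactly what the labels taught it.
print(clf.predict(vec.transform(["I think phrase_a has some good points"])))  # ['remove']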
ChodaSly,
That’s not really my problem. I could have told you that they don’t want to publish their algorithms, but as the expression goes: they cannot have their cake and eat it too. By refusing to be transparent, they leave themselves open to rational skepticism.
While it would be possible to coincidentally flag some posts, it’s not very plausible that the algorithm would consistently censor two exact phrases every time unless those specific phrases held significance in the training data. In all likelihood an independent investigation would reveal that there was no bias in the machine learning algorithms, only in the input training data. This is the simplest and most likely explanation. Alas, factually proving or disproving this would require transparency from google, which as you noted is probably not forthcoming, so we’re pretty much stuck with speculation.
I wouldn’t be surprised by something as simple as some sort of report-based training system with logic like this (roughly sketched in the code further below):
1. A lot of reports come in for messages
2. The system finds something in common between the reported messages, and trains itself to see that common thing as bad
3. The system then matches on that common thing, lowering the threshold for reports to remove a message (or even not requiring any reports)
I could easily see someone implementing that as a way to get the community to moderate things for you, without directly giving them moderator powers, and with the ability to adapt to new situations (this kind of system might pick up on, say, a new racist dogwhistle much faster than human moderators could be trained).
…of course, if you have bots mass-reporting maliciously, that same logic can be used to train the system to censor things automatically for you, against the system owner’s will (or potentially even without their knowledge, depending on how it’s set up), which is why this kind of thing probably shouldn’t be hooked up to moderation decisions directly (or should at least require manual human review before a final decision is made).
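For what it’s worth, the loop in steps 1-3 could look roughly like this. It’s entirely hypothetical (the threshold, names and all), not based on any knowledge of YouTube’s actual pipeline, and the last few lines show the mass-reporting failure mode:

# Hypothetical sketch of a report-driven auto-moderation loop (steps 1-3 above).
# Not any real platform's code; the threshold is made up.
from collections import Counter

REPORTS_TO_LEARN = 50      # reports needed before a token is treated as "bad"
report_counts = Counter()  # token -> number of reports on messages containing it
auto_flag_tokens = set()   # tokens the system has "learned" to remove on sight

def handle_report(message_text):
    """Steps 1-2: tally which tokens keep showing up in reported messages."""
    for token in set(message_text.lower().split()):
        report_counts[token] += 1
        if report_counts[token] >= REPORTS_TO_LEARN:
            auto_flag_tokens.add(token)  # this "common thing" is now considered bad

def should_remove(message_text):
    """Step 3: remove on sight, without waiting for reports, once a learned token appears."""
    return any(tok in auto_flag_tokens for tok in message_text.lower().split())

# The failure mode: a botnet mass-reports harmless messages that all contain
# the same phrase, and the loop dutifully learns to censor it.
for _ in range(60):
    handle_report("I support phrase_x")

print(should_remove("has anyone read about phrase_x"))  # True -- auto-censored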
bhtooefr,
Well, you can certainly have users report “objectionable content”, but obviously the system could be gamed if you auto-moderated based on reports without some kind of review. What they’ve probably done is train their filters with input from google’s own human moderators, and it’s likely the filters consistently censored the phrases because the human moderators did as well.
We can’t do much more than speculate given the lack of transparency, but I would imagine most of the training dataset would have come from google’s own moderators.
bhtooefr,
That would be one of my guesses. There are several different ways data enters these systems, and all of them would make for good coffee conversations. Or… warm-up questions in a technical interview. Speculating about how a system could fail is good mental fun.
Alfman,
I am at a bit of a disadvantage here. As a general rule, I’ve learned to trust the judgement of engineers, not only at my workplace but at other organizations too. I have many friends across different companies, and given my interactions with them, bad intent would be at the bottom of my list. (Though we do set up systems to proactively prevent it. Better safe than sorry.)
That being said, even in a room full of PhDs it is not possible to design perfect systems.
sukru,
Oh, I didn’t mean to imply there was deliberately bad intent at the corporate level. Perhaps the censorship was localized to a few employees. But my point was that I wouldn’t necessarily trust PR spokespersons to paint an accurate picture of what actually happened. As engineers, we’re sometimes privy to events and information that don’t always align with the official story, for one reason or another.
Yes, that’s the way it’s SUPPOSED to work – balanced with human oversight to correct the automated system when it produces errors. But Google appears to have done away with that second part, or at least doesn’t put anywhere near sufficient resources into it, and seems to have become over-reliant on those automated systems. As a result, at least in my experience, the accuracy of those systems is nowhere near 99%, and the error rate is more like 20-30% – both false-positives and false-negatives.
E.g. on the false-negative side, over the last year or two, I’ve noticed an increase in spam EMails with links to malicious files hosted on Google Drive. If the files were uploaded directly, Google’s scanners would probably catch them – but the spammers realized that, so long as they put the files inside either a less-common archive format or a password-protected zip file, they’ll be missed completely. And Google does provide mechanisms for manually reporting malicious files in Drive, but they seem to be purely decorative at this point – there are malicious files hosted on Drive that I reported back in February and that are still online more than 3 months later. I even reported the URLs as malicious separately via Google’s “safebrowsing” service, and that still didn’t make any difference. If anyone’s curious, the URL is:
https://drive.google.com/uc?id=1bVOrRozJg9wPoALxHn2nou_XjZGUit50
(Though obviously don’t open any of the files at that link! At least not on Windows)
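I obviously don’t know what Drive’s scanners actually do internally, but the evasion itself is easy to illustrate. Here’s a toy scanner (entirely hypothetical, with a made-up “signature”) that can only inspect archive members it can actually read; an encrypted member simply gets skipped, which is exactly the gap a password-protected zip exploits:

# Hypothetical naive attachment scanner: it can only look inside zip members
# it can decrypt, so password-protected entries are skipped and pass through.
import io
import zipfile

SIGNATURE = b"stand-in-for-a-malware-signature"  # made up for this sketch

def naive_scan(zip_bytes):
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            if info.flag_bits & 0x1:   # bit 0 set => member is encrypted
                continue               # no password, nothing to scan
            if SIGNATURE in zf.read(info.filename):
                return True
    return False

# A plain zip containing the "malicious" payload is caught...
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("invoice.exe", SIGNATURE)
print(naive_scan(buf.getvalue()))  # True

# ...but the same payload inside a password-protected zip would hit the
# `continue` branch above and come back False. (Python's zipfile can't create
# encrypted archives, so that half isn't shown here.)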
And on the false-positive side, it seems that some changes were made to GMail’s filters in late fall/early winter 2019; since then, GMail seems to be marking as “Dangerous” any PDF attachment that contains a domain name, particularly a .com or .net (initially it seemed to ignore other TLDs like .ca, but it now seems to consider them “dangerous” as well). As best I can tell, it’s a filter meant to block phishing attempts – but unfortunately it’s an especially braindead filter that casts far too wide a net and mistakenly catches a non-trivial amount of legitimate messages. I’d go so far as to wager that the filter probably has an error rate higher than its success rate, catching more legitimate EMails than actual malicious ones – I manage a mailserver and report 300-400 spam EMails in a typical day, but I can’t remember the last time I saw a spam EMail that used a malicious/phishing link inside a PDF file (probably sometime in March). But legitimate EMails with customer invoices for hosting or domain registration attached as PDF files (which, for obvious reasons, typically contain domain names) are fairly common in my experience.
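My guess (pure speculation, since GMail’s actual rules aren’t published) is that the filter amounts to little more than a regex for bare domain names run over the text extracted from the PDF, something like the sketch below, which is exactly why an ordinary hosting invoice trips it:

# Speculative reconstruction of a "domain name in a PDF = phishing" style rule,
# applied to text assumed to already be extracted from the attachment.
# Not GMail's actual filter, just an illustration of why such a rule over-matches.
import re

DOMAIN_RE = re.compile(r"\b[a-z0-9-]+(\.[a-z0-9-]+)*\.(com|net)\b", re.IGNORECASE)

def looks_dangerous(pdf_text):
    return bool(DOMAIN_RE.search(pdf_text))

phishing_pdf = "Your account is locked. Verify at secure-login-update.com immediately."
invoice_pdf = "Invoice #1042: renewal of example-customer.com hosting, 12 months, $120.00"

print(looks_dangerous(phishing_pdf))  # True -- the case the filter is presumably aimed at
print(looks_dangerous(invoice_pdf))   # True -- and the false positive it causes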
My gut feeling is that one of their Chinese employees went rogue and blacklisted certain phrases, either on his own or as dictated by the Chinese government.
Incidents like these, however, highlight how little these companies actually care about the “safety” and “the children” they like to preach about, and how much of what they do is a simple reaction to public outcry by American fanatics.