In Google’s First Tensor Processing Unit – Origins, we saw why and how Google developed the first Tensor Processing Unit (or TPU v1) in just 15 months, starting in late 2013.
Today’s post will look in more detail at the architecture that emerged from that work and at its performance.
The Chip Letter
People forget that Google is probably one of the largest hardware manufacturers among the major technology companies. Sadly, we rarely get good insight into what exactly these machines are capable of, as they rarely make it to places like eBay where people could dissect them.
It’s crazy to me that the success of the TPU didn’t translate when they put a subset of it in the Tensor Arm system on a chip. I don’t mean that Tensor Arm chips are bad or anything, but it took them forever to do it. I’m not sure whether the issues were technical or internal politics.
Bill Shooter of Bul,
The hardware means nothing without the associated software for it, even more so when the APIs are very new: https://developer.android.com/ndk/guides/neuralnetworks
However, it will get better over time. Even if Google themselves don’t directly use it (except for Camera, where it is used in excellent ways), third parties are starting to take a look at it:
(From 2019, using GPT-2 natively on the phone):
https://towardsdatascience.com/on-device-machine-learning-text-generation-on-android-6ad940c00911
Why would it? The NPU is not a major IP block, in the big scheme of things, in a mobile SoC.
FWIW, almost every major Android ARM SoC has had an NPU, most even predating Google’s own SoC.
Thank you for the article.
However, there are better sources for learning about TPUs, especially modern ones, and how they are actually benefiting large-scale ML systems.
I would highly recommend this particular one to start with:
https://www.semianalysis.com/p/google-ai-infrastructure-supremacy
It comes from the same source that hosted the infamous “we have no moat” leak:
https://www.semianalysis.com/p/google-we-have-no-moat-and-neither