Nearly a dozen processor cores for accelerating machine-learning jobs on client devices are racing for spots in SoCs, and some are already designed into smartphones. Their vendors aim to gain a time-to-market advantage over processor-IP giant Arm, which is expected to announce a core of its own soon.
The competition shows that much of the action in machine-learning silicon is shifting to low-power client blocks, according to market watcher Linley Gwennap. However, a race among high-performance chips for the data center is still in its early stages, he told EE Times in a preview of his April 11 keynote for the Linley Processor Conference.
“Arm has dominated the IP landscape for CPUs and taken over for GPUs as well, but this AI engine creates a whole new market for cores, and other companies are getting a head start,” said Gwennap.
The new players getting traction include:
Apple’s neural engine in the A11 Bionic SoC in its iPhones
The DeePhi block in Samsung’s Exynos 9810 in the Galaxy S9
The neural engine from China’s Cambricon in Huawei’s Kirin 970 smartphone SoC
The Cadence P5 for vision and AI acceleration in MediaTek’s P30 SoC
Possible use of the Movidius accelerator in Intel’s future PC chipsets
The existing design wins have locked up many of the sockets in premium smartphones, which represent about a third of the overall handset market. Gwennap expects that AI acceleration will filter down to the rest of the handset market over the next two to three years.
Beyond smartphones, cars are an increasingly large market for AI chips. PCs, tablets, and IoT devices will round out the market.
To keep pace, Arm announced in February a blanket effort that it calls Project Trillium. But “what they need to be competitive is some specific hardware accelerator to optimize power efficiency,” said Gwennap.
“Arm is developing that kind of accelerator and plans to release its first product this summer … The fact is that they are behind, which has created an opportunity for the newer companies to jump in.”
Last October, Arm announced it had formed a machine-learning group. In February, it provided a few details of its plans.
Arm is likely to provide product details at its annual October event in Silicon Valley. But there’s no guarantee that Arm will make up lost ground because there’s not necessarily a close tie between neural-net engines and CPUs.
Ultimately, the winning chips in this still-new battle will be the ones with the best combination of performance, power, and die area.
“The problem is that we see the raw performance, but it really comes down to delivered performance on neural networks, so what we need is a good benchmark like the number of images classified per second,” said Gwennap.
Baidu was early to release AI benchmarks as open source, but they have not been widely adopted. The Transaction Processing Performance Council formed a working group late last year to attack the problem, but it has yet to report any progress.
“It’s easy coming up with a benchmark, but hard to get companies to agree and compare results … and things are changing, so any benchmark will have to evolve to stay relevant,” he said.
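The metric Gwennap names is simple throughput: total images processed divided by wall-clock inference time. A minimal sketch of such a measurement, assuming a PyTorch environment with a stock ResNet-50 as a stand-in workload (both choices are illustrative, not prescribed by any benchmark group mentioned here):

```python
import time
import torch
import torchvision.models as models

# Stand-in workload: a stock ResNet-50 classifier (a hypothetical
# choice for illustration; no benchmark body specifies this model).
model = models.resnet50()
model.eval()

batch_size, iters = 8, 50
images = torch.randn(batch_size, 3, 224, 224)  # dummy input batch

with torch.no_grad():
    # Warm up so one-time setup costs don't skew the timing.
    for _ in range(5):
        model(images)

    start = time.perf_counter()
    for _ in range(iters):
        model(images)
    elapsed = time.perf_counter() - start

# Delivered performance: images classified per second.
print(f"{batch_size * iters / elapsed:.1f} images/sec")
```

The same loop run on two accelerators gives the apples-to-apples comparison that raw TOPS figures do not.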
So far, Gwennap reports that Videantis’s multi-core v-MP6000 has a slight edge in raw performance over its closest rival, Ceva’s NeuPro, which combines a SIMD DSP with a systolic MAC array.
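All of these engines are built around the multiply-accumulate (MAC) operation; a systolic array is a grid of MAC units that pass operands to their neighbors each cycle. Purely for intuition, and not modeled on any vendor’s design, this is the arithmetic a single output accumulator performs:

```python
def mac_dot(weights, activations):
    """One output of a neural-net layer as a running multiply-accumulate.
    A systolic array performs thousands of these in parallel, streaming
    operands between neighboring MAC units every cycle."""
    acc = 0
    for w, a in zip(weights, activations):
        acc += w * a  # one MAC per unit per cycle
    return acc

print(mac_dot([1, 2, 3], [4, 5, 6]))  # -> 32
```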
Other players include Synopsys, whose EV64 combines a SIMD DSP with custom logic for activation and pooling. Like Videantis’s design, AImotive’s AIware uses many custom hardware blocks.
Among low-cost blocks, VeriSilicon’s VIP8000-O delivers the most raw performance, using a GPU with up to eight deep-learning engines. Ironically, Cambricon’s CPU with a small matrix engine offers the lowest performance of the announced cores, but it still scored a significant design win in the Huawei smartphone.
Imagination is also a player with its PowerVR 2NX, a custom, non-GPU architecture with a MAC array. Nvidia hopes to act as a spoiler, making the IP for the NVDLA core in its Xavier processor free and open-source and winning support from Arm.
Overall, Gwennap said that as many as 40 companies are now designing custom AI silicon. Many target the data center, where Nvidia’s Volta GPU currently goes largely unchallenged as the training engine of choice for giants including Amazon.
“The competitors we see now are Google’s TPU and Microsoft’s FPGA-based Brainwave that is being deployed widely, but there’s not a lot of merchant alternatives now,” said Gwennap.
“Wave Computing seems to be ahead of the pack in bringing a new AI data center architecture to production this year.”
Wave’s decision to sell full systems suggests that it is targeting second- and third-tier players, not the largest data centers that prefer making their own optimized boxes.
Intel’s Nervana group recently made clear that it will not have production silicon until 2019. Startup Graphcore suggested that it will announce its chip later this year. Another startup, Cerebras, remains quiet, while bitcoin ASIC maker Bitmain announced plans late last year for an AI chip for data centers.
“There’s a ton of companies working on this kind of stuff,” said Gwennap. “People see this as the next gold rush, and they are all trying to jump in.”