Next-level AI engine comes top in LLM speed showdown

Responses to AI chat prompts not snappy enough? California-based generative AI company Groq has a super-quick solution in its LPU Inference Engine, which has recently outperformed all contenders in public benchmarks.

Groq has developed a new type of chip to overcome compute density and memory bandwidth issues and boost the processing speed of intensive computing applications like large language models (LLMs), reducing “the amount of time per word calculated, allowing sequences of text to be generated much faster.”

This Language Processing Unit is an integral part of the company’s inference engine, which processes information and answers queries from an end user, serving up as many tokens (or words) as possible for super-quick responses.


Late last year, in-house testing “set a new performance bar” by achieving more than 300 tokens per second per user through the Llama-2 (70B) LLM from Meta AI. In January 2024, the company took part in its first public benchmarking, leaving all other cloud-based inference providers in its performance wake. Now it has emerged victorious against the top eight cloud providers in independent tests.
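As an illustrative aside, the figure being compared here is throughput: the number of tokens generated divided by the wall-clock time taken to generate them. The short sketch below shows how such a tokens-per-second number could be measured against any streaming model endpoint; `measure_tokens_per_second` and `dummy_stream` are hypothetical stand-ins for whatever client is being benchmarked, not Groq’s or Meta AI’s actual API.

```python
import time

def measure_tokens_per_second(generate_stream, prompt):
    """Time a streaming generation and return tokens per second.

    `generate_stream` is a hypothetical callable that yields one token
    at a time for the given prompt -- a stand-in for whatever streaming
    LLM client is being benchmarked, not a real Groq or Meta API.
    """
    start = time.perf_counter()
    token_count = 0
    for _ in generate_stream(prompt):
        token_count += 1
    elapsed = time.perf_counter() - start
    return token_count / elapsed if elapsed > 0 else 0.0

def dummy_stream(prompt):
    # Simulated model: emits five "tokens", each after a short delay.
    for word in ("This", "is", "a", "simulated", "response."):
        time.sleep(0.01)
        yield word

if __name__ == "__main__":
    tps = measure_tokens_per_second(dummy_stream, "Hello")
    print(f"~{tps:.0f} tokens per second")
```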

