Inception, a new Palo Alto-based company founded by Stanford computer science professor Stefano Ermon, has introduced a new type of AI model it calls a diffusion-based large language model (DLM). The model combines the capabilities of traditional large language models (LLMs) with the parallel-generation approach of diffusion models, which the company says delivers faster performance at lower computing cost.
While LLMs are commonly used for text generation, diffusion models are primarily used for creating images, video, and audio. Inception’s DLM bridges the gap by providing code generation and question-answering capabilities similar to LLMs, but with the speed and efficiency of diffusion technology.
Ermon’s research at Stanford focused on applying diffusion models to text, motivated by the observation that LLMs are comparatively slow because they generate text strictly sequentially, one token after another. Diffusion models instead start with a rough estimate of the entire output and refine it into focus over successive steps. Because each refinement step operates on the whole sequence at once, large blocks of text can be generated and modified in parallel.
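Inception has not published implementation details, but the contrast with autoregressive decoding can be illustrated with a toy sketch. Here a "denoising" pass starts from a fully masked sequence and fills in batches of positions per step, so each step touches many tokens at once rather than emitting one token per step; the `predict` function is a hypothetical stand-in for a trained denoising model.

```python
MASK = "<mask>"
VOCAB = ["the", "cat", "sat", "on", "mat"]

def diffusion_generate(length, predict, steps=3):
    """Toy diffusion-style decoding: refine an all-masked sequence.

    Each step fills a batch of masked positions; in a real model all
    positions in a step would be predicted by one parallel forward pass,
    unlike autoregressive decoding, which needs one pass per token.
    """
    tokens = [MASK] * length
    per_step = -(-length // steps)  # ceil(length / steps) masks per step
    for _ in range(steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        for i in masked[:per_step]:
            tokens[i] = predict(i)
    return tokens

# Hypothetical stand-in for a trained denoiser: deterministic per position,
# just to keep the sketch runnable.
predict = lambda i: VOCAB[i % len(VOCAB)]

print(diffusion_generate(5, predict))  # → ['the', 'cat', 'sat', 'on', 'mat']
```

The key point is the step count: generating 5 tokens here takes 3 refinement steps regardless of sequence position, whereas an autoregressive model would need 5 sequential passes.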
Inception’s breakthrough in parallel text generation led to the founding of the company, with UCLA professor Aditya Grover and Cornell professor Volodymyr Kuleshov joining as co-leaders. While the company’s funding details remain undisclosed, it has already attracted several customers, including unnamed Fortune 100 companies drawn by the reduced latency of Inception’s DLMs.
The company’s models leverage GPUs more efficiently, resulting in significant performance improvements. Inception offers an API, as well as on-premises and edge device deployment options, model fine-tuning support, and a range of pre-built DLMs for various use cases. According to Inception, its DLMs can run up to 10 times faster than traditional LLMs while costing 10 times less.
Inception says its “small” coding model matches OpenAI’s GPT-4o mini in quality while running more than 10 times faster, and that its “mini” model outperforms small open-source models such as Meta’s Llama 3.1 8B while generating over 1,000 tokens per second. If those figures hold up, they suggest a meaningful impact on the industry.
Inception’s diffusion-based large language models offer a distinctly different approach to text generation. With faster performance and reduced computing costs, the company is betting that DLMs will change how language models are built and deployed.