Inception Unveils a New Type of AI Model After Stealth Development

Inception is a new company based in Palo Alto, founded by Stefano Ermon, a computer science professor at Stanford. The company has created a unique artificial intelligence (AI) model that uses a technology called “diffusion.” They refer to their model as a diffusion-based large language model, or DLM for short.
Currently, most popular AI models fall into two main categories: large language models (LLMs) and diffusion models. LLMs, built on the transformer architecture, are primarily used to generate text. Diffusion models, by contrast, are mainly used to create images, video, and audio, powering tools such as Midjourney and OpenAI’s Sora.
Inception claims that its model matches the capabilities of traditional LLMs, such as generating code and answering questions, while working more quickly and at a lower cost. Ermon told TechCrunch that he had long been interested in applying diffusion models to text data. His research suggested that traditional LLMs are sluggish compared to what diffusion technology can achieve.
With LLMs, generation is a step-by-step process: the model must produce the first word before it can move on to the next. Ermon pointed out that this sequential method limits speed. He believed a diffusion model could generate text more efficiently, since diffusion models start with a rough draft of the entire output and then refine all of it at the same time.
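The contrast Ermon describes can be sketched with a toy comparison. This is purely illustrative; the function names, placeholder tokens, and fixed step count are assumptions for the sketch, not Inception's actual method:

```python
def autoregressive_generate(n_tokens: int) -> tuple[list[str], int]:
    """Generate tokens one at a time; each token needs its own model pass."""
    tokens, passes = [], 0
    for _ in range(n_tokens):
        passes += 1                  # one sequential model pass per token
        tokens.append("<tok>")       # stand-in for sampling the next token
    return tokens, passes

def diffusion_generate(n_tokens: int, n_steps: int = 8) -> tuple[list[str], int]:
    """Start with a noisy draft of the whole output, refine all tokens in parallel."""
    draft = ["<noise>"] * n_tokens   # rough draft of the entire sequence
    passes = 0
    for _ in range(n_steps):
        passes += 1                  # one pass refines every position at once
        draft = ["<tok>"] * n_tokens # stand-in for a denoising update
    return draft, passes

# For a 256-token response: 256 sequential passes vs. a fixed 8 refinement passes.
_, ar_passes = autoregressive_generate(256)
_, dlm_passes = diffusion_generate(256, n_steps=8)
print(ar_passes, dlm_passes)
```

The point of the sketch is that the autoregressive loop scales with output length, while the diffusion-style loop scales with the number of refinement steps, and each of those steps can update every position in parallel on a GPU.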
After years of experimentation, Ermon and one of his students made a significant discovery, which they reported in a research paper published last year. Recognizing the potential of what they had developed, Ermon founded Inception last summer, bringing along two former students—Aditya Grover from UCLA and Volodymyr Kuleshov from Cornell—to help lead the company.
Although Ermon didn’t reveal specific details about Inception’s funding, it’s been reported that the Mayfield Fund has invested in the startup. Inception has already attracted several clients, including unnamed Fortune 100 companies, by addressing their need for faster AI solutions and reduced waiting times.
Ermon said, “Our models can make better use of GPUs,” referring to the computer chips that are commonly used to run AI models. He believes this innovation could transform how language models are created. Inception provides various options for using its technology, including through an application programming interface (API), on-premises deployments, and support for fine-tuning its models.
The company claims that its DLMs can operate up to ten times faster than traditional LLMs while also being ten times cheaper. One representative from Inception noted that their small coding model performs on par with OpenAI’s GPT-4o mini but is over ten times quicker. Additionally, their “mini” model beats various small open-source models, achieving speeds of more than 1,000 tokens per second.
In this context, “tokens” are the small chunks of text, typically words or word fragments, that language models read and write. A sustained speed of 1,000 tokens per second is quite impressive, provided Inception’s claims hold up. This new approach shows promise and could reshape how language models are developed and used.
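To put a number like 1,000 tokens per second in perspective, a quick back-of-the-envelope calculation helps. The 1.3 tokens-per-word ratio below is a rough rule of thumb for English text, not a figure from Inception:

```python
def response_latency(words: int, tokens_per_second: float,
                     tokens_per_word: float = 1.3) -> float:
    """Estimate how long a response takes at a given generation throughput.

    tokens_per_word is a rough English-text approximation (assumption).
    """
    total_tokens = words * tokens_per_word
    return total_tokens / tokens_per_second

# A ~500-word answer at the claimed 1,000 tokens/sec vs. a 50 tokens/sec baseline.
fast = response_latency(500, 1_000)
slow = response_latency(500, 50)
print(f"{fast:.2f}s vs {slow:.2f}s")
```

At those assumed rates, the same answer that streams out over roughly thirteen seconds on a slower model would finish in well under a second, which is the kind of latency gap Inception is pitching to its enterprise customers.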