AI Models are Converging
This issue examines new research on how large AI models are converging, and the implications for investors.
01 | AI models are converging
I recently came across a fascinating paper on AI model convergence from MIT researchers: The Platonic Representation Hypothesis.
The crux: large, generalized language and vision models, the backbone of a growing number of AI-powered applications, are converging on the same statistical model of reality.[1] This happens even when models are trained with different objectives, on different data sets (language vs. images), and with different architectures and training methods.
In other words, these models aren't just completing tasks with similar proficiency; they're interpreting the underlying structures, patterns, and relationships of data in the same way.
What's even more interesting is that model size[2] has a positive correlation with convergence – the larger these models become, the more they "think" the same way.
At first, this seemed counterintuitive. Wouldn’t we expect greater differentiation and specialization as models with different training methods and architectures scale up?
But on deeper reflection, it makes sense. The paper explains that the tasks human brains perform to understand the world—breaking down information, detecting patterns, and classifying objects—are the same tasks we train neural networks to do. So, it's logical that these models would start to mirror the ways our brains process information, even when they’re built in fundamentally different ways.
More than two thousand years ago, Plato envisioned a universal truth underlying our human perceptions. Today, AI model convergence suggests that these technologies are uncovering universal patterns in our data.
02 | Implications for investing
I'm thinking about these findings in two ways:
1. Standardization in model performance. As general-purpose AI models scale, they will become more proficient (fewer hallucinations,[3] higher accuracy), but we might expect less differentiation between models. Early-stage investors should focus on the unique parts of the stack: applications that apply standardized models to proprietary data sets and specialized tasks; dev tools that reduce the barriers to building on top of models and can expand the addressable market by attracting new or non-technical users; and devices and new UX that offer a distinct form factor. Developers should prioritize building on models that (a) offer the best cost structure for their specific needs and (b) are supported by a strong ecosystem of tools and a robust network of developers. Model proficiency – in and of itself – may become a less differentiating metric.
2. Growing model interoperability. The paper finds that as AI models start to process and represent information in similar ways, it will become easier to integrate and stitch[4] them together. This interoperability allows for more modular AI systems, where several different models can be combined, swapped out, or replaced without extensive reconfiguration or additional training. Startups like Kindo, OpenPipe, and Flowise are making this easy, automated, and secure for both technical and non-technical users.
The future of AI lies not just in the power of individual models, but in the seamless integration and specialized application of these technologies. Model convergence makes this even more imperative.
[1] In this paper, researchers define model convergence as the degree to which vector embeddings from different models match one another for the same prompt. The more the vectors match, the more the models align in the ways they understand, interpret, and represent data. Vector embeddings are mathematical representations that transform words, images, or other data into numerical vectors, capturing their meanings and relationships and allowing that data to be processed and understood by computers. Each dimension captures some aspect of the data point's properties or its relationship to other data points.

For example, if an LLM is tasked with evaluating the sentiment of a movie review, the words are converted into vectors, processed through the model, and transformed into an output vector indicating sentiment. A review like "I loved the movie" becomes a sequence of vectors: say "I" is represented as [0.1, 0.2], "loved" as [0.8, 0.7], "the" as [0.2, 0.1], and "movie" as [0.9, 0.8]. Words like "happy" and "joyful" might have similar vectors because they share similar meanings. The model processes these numerical representations through layers of neurons, where each layer transforms the vectors into new representations by combining and modifying them. For instance, it might combine the vectors for "loved" and "movie" into a new vector that emphasizes the sentiment of the phrase. The final layer might output a single vector representing the overall sentiment of the review, such as [0.9] to indicate a high probability that the review is positive.

Researchers measure similarity across models by comparing these output vectors. If two different models process the same review and produce similar vectors, such as [0.9] and [0.85], they both think the review is positive and are representing the sentiment in a similar way.
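As a rough illustration of the comparison described above, here is a minimal Python sketch. The model outputs are made-up numbers, and the cosine-similarity score is a simplified stand-in for the more careful alignment metrics used in the paper:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors (1.0 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical output embeddings for the same three prompts from two different
# models. In practice these would come from real models; the numbers here are
# illustrative only.
model_a_outputs = np.array([[0.90, 0.10], [0.15, 0.85], [0.80, 0.25]])
model_b_outputs = np.array([[0.85, 0.20], [0.10, 0.90], [0.75, 0.30]])

# Average pairwise similarity across prompts: a crude proxy for how "converged"
# the two models' representations are.
scores = [cosine_similarity(a, b) for a, b in zip(model_a_outputs, model_b_outputs)]
print(f"Mean embedding similarity: {np.mean(scores):.3f}")
```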
[2] Model size is defined as the number of parameters within the model. Larger models have more parameters, which means they have more capacity to learn and represent complex patterns in the data. These parameters are the weights and biases that are adjusted during training to optimize the model’s performance on a given task.
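For concreteness, here is a small PyTorch sketch of how parameter count is typically measured. The toy network below is purely illustrative; production language and vision models contain billions of parameters:

```python
import torch.nn as nn

# A tiny illustrative network; real foundation models have billions of parameters.
model = nn.Sequential(
    nn.Linear(512, 1024),
    nn.ReLU(),
    nn.Linear(1024, 256),
)

# "Model size" = the total number of trainable weights and biases.
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {num_params:,}")
```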
[3] A hallucination refers to the generation of information that appears plausible but is actually incorrect or fabricated. This can include: false facts (the model generates information that is not true or has no basis in reality), inconsistent information (the model provides contradictory output), or misleading details (the model includes specific details that are not accurate or verifiable). Hallucinations occur because the model relies on patterns in the training data rather than understanding or verifying facts. As a result, the output might be linguistically coherent and contextually appropriate but factually incorrect.
[4] Model stitching is the process of combining parts of different AI models to create a single, more powerful system. A modular AI system is designed with separate, interchangeable components or modules, each responsible for specific tasks or functions.
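To make the modular idea concrete, here is a toy Python sketch (not from the paper) in which a small task-specific head sits on top of an interchangeable encoder. The classes and the fake embedding logic are invented purely for illustration, and in practice swapping encoders still assumes their representations are sufficiently aligned or adapted:

```python
import numpy as np

class TextEncoderA:
    """Stand-in for one model's encoder: maps text to an embedding vector."""
    def encode(self, text: str) -> np.ndarray:
        # Fake deterministic "embedding" for illustration only.
        seed = sum(ord(c) for c in text) % (2**32)
        return np.random.default_rng(seed).normal(size=8)

class SentimentHead:
    """Small task-specific module that can be stitched onto any 8-dim encoder."""
    def __init__(self, dim: int = 8):
        self.weights = np.ones(dim) / dim  # Placeholder weights; normally learned.

    def predict(self, embedding: np.ndarray) -> float:
        # Sigmoid over a linear score: probability that the input is positive.
        return float(1.0 / (1.0 + np.exp(-embedding @ self.weights)))

# Modular pipeline: the encoder component can be swapped for another model's
# encoder without retraining the head, provided their representations align.
encoder = TextEncoderA()
head = SentimentHead()
print(f"P(positive) = {head.predict(encoder.encode('I loved the movie')):.2f}")
```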