A week has passed since the first developer-focused OpenAI conference, and the initial dust has settled. I’m doing a keynote week-after, sharing my impressions and thinking about what OpenAI has announced.
GPT-4 Turbo
GPT-3.5 was a midstep between a line of fascinating research (GPT series of models) and a sophisticated product. Yes, ChatGPT took off on the back of GPT-3.5, but those using it for tasks beyond simple poem writing or crude brainstorming often found its reasoning lacking. The release of GPT-4 marked a milestone where we could start offloading serious work to AI.
GPT-4 was available to developers via API but was expensive, slow, and had a limited context window. The utility of LLMs is directly tied to how much context you can give them, especially when you move beyond a simple Q&A.
The GPT-4 Turbo announcement made progress in all these areas and more:
- Context length increased from 4k to a whopping 128k tokens
- You can ask for JSON output directly, which is how you tie AI into existing software
- Knowledge cutoff is updated from Sept 2021 to Apr 2023
- Much faster inference: early testing is showing 3x to 5x compared to GPT-4
- GPT-4 Turbo is roughly 2.5x cheaper than GPT-4
Current LLMs over long context windows tend to exhibit a U-shaped recall pattern: they pay more attention to the beginning and end of the context window, often neglecting the content in the middle. This defeats the value of an extended context window. It’s unclear if OpenAI found ways to alleviate that issue, but we’ll know as evaluations start rolling in. The initial testing yields mixed results.
The reduced pricing is welcome, but a single call to GPT-4 Turbo with all 128k tokens stuffed costs $1.1. I can’t think of an interactive application where GPT-4 Turbo price is still not an order of magnitude too expensive. Ditto for application development, where you run your LLM calls hundreds of times a day.
GPT-4 Vision
Another exciting announcement was that GPT-4 with Vision capabilities finally arrived. A few alpha testers had access to it for months, but I can imagine the infrastructure buildup required to open up the access was very substantial, hence the delay.
I don’t have any direct experience with GPT-4V, but the early Twitter commentary is very positive. The ML/AI graveyard is filled with prototypes that demo really well and disappoint with hit-and-miss results under closer scrutiny. More from first principles, GPT-4V is consequential for three reasons:
- Multi-modal data enhances the model’s reasoning. The working hypothesis is that multi-modal data forces the neuro net to form a more sophisticated “world model.” I’m guessing the leap from GPT-3.5 and GPT-4 can be more attributed to the variety of modalities beyond scraped text than just training on more data. So, the existence of GPT-4V is important even if you only work with text.
- OpenAI will get exposed to lots of vision data. It might not use it directly for training the next iteration of models (especially if the data came via API calls), but it will definitely instrument inference and identify the gaps. Filling in these gaps will lead to a better “world model” and better reasoning.
- And yes, vision lets you build “full-stack” vision products where Dalle3 is the artist, and GPT-4V is the critic who mentors the artist.
GPTs as Apps
The big announcement was of “GPTs,” which Sam Altman described:
GPTs are tailored version of ChatGPT for a specific purpose. You can build a GPT — a customized version of ChatGPT — for almost anything, with instructions, expanded knowledge, and actions, and then you can publish it for others to use. And because they combine instructions, expanded knowledge, and actions, they can be more helpful to you. They can work better in many contexts, and they can give you better control. They’ll make it easier for you accomplish all sorts of tasks or just have more fun, and you’ll be able to use them right within ChatGPT. You can, in effect, program a GPT, with language, just by talking to it. It’s easy to customize the behavior so that it fits what you want. This makes building them very accessible, and it gives agency to everyone.
That triggered a widespread belief that a large number of startups building on top of OpenAI are getting obliterated. GPT wrappers, as these startups are pejoratively called, are done as OpenAI is slowly folding in the functionality into the main ChatGPT product.
I tend to disagree and think of GPTs as a prototyping and demo platform. OpenAI’s main priority for the moment is increased experimentation. They want to get developers excited and lower the activation cost of trying out ideas. Heck, even non-developers can now stuff, e.g., a book into a custom GPT, and build an app for their kid to chat about the book and generate personalized illustrations. The possibilities seem endless.
Security? What security
We’re not witnessing an App Store moment just yet. One of the big innovations of the App Store model was increased security. It’s not something people would name as a driver for using smartphones, but if smartphones were plagued with viruses that siphon your personal data, smartphone adoption would collapse. Similarly for application source code: imagine you could just look and copy the source code of any app on the iPhone. How many developers would be left on the platform?
But that’s essentially a reality of the LLM world: no part of a custom GPT is protected. You can exfiltrate the whole knowledge base by simple prompting. The same goes for your custom GPT prompts. There’s no protection against clones, and it’s a rather fundamental problem of LLMs having no security of that kind, so it won’t get fixed with an incremental update.
The demo store
Another issue is that GPTs give you little control over knowledge retrieval. As I mentioned before, with LLMs, the right context is often the difference between delivering a satisfying product experience or a bunch of gibberish. Private datasets and stuffing them into LLMs the right way is where a lot of value creation will happen, but both are domain-specific.
OpenAI has already tried building an App Store with ChatGPT Plugins. They didn’t achieve product-market fit for several reasons. GPTs are plugins 2.0, and they improve on the most glaring issue with Plugins: discoverability and a confusing UI. However, I don’t think GPTs is OpenAI’s App Store moment. Yes, there will be equivalents of flashlight apps on the early iPhone, off which people will make good money. I don’t think GPTs will offer enough depth due to limitations in knowledge retrevial and security to become a viable platform like App Store.
Still, I applaud them for trying. The prototyping platform holds enormous value; it will inspire developers to build on top of OpenAI’s raw API, and fill in the gaps discovered at the prototyping phase with GPTs. OpenAI wins either way.
Shipping
Having said all all this, it’s amazing to see the pace at which OpenAI is shipping updates. To metamorphose from essentially a research lab 12 months ago into a fast-moving product organization firing on all cylinders is a remarkable achievement. And I hope in the next round of updates, we’ll see further cost gains, which I think are the largest impediment for serious, world-altering deployment. The pressure from Open Source models that are 12-18 months behind but catching up quickly, will help.
As for “GPT wrappers”, I find this to be a midwit of a term. Every startup started off as some kind of wrapper. Airbnb, Shopify, etc. were “SQL db wrappers”. Amazon was originally a drop-shipper. The GPT wrapper have the job as it always has been: shipping something people want and increasing product’s depth over time.