Is Gemini 2.0 Flash production-ready (as of February 2025)?

Yes. After roughly two months in preview, 2.0 Flash became “fully available for production use with higher rate limits.” As Logan tells it, developers spent that window saying “we love two point zero Flash … let us use it in production,” and the team kept replying “Please, wait” while they put the finishing touches on it before the production rollout.

What is the difference between Gemini 2.0 Flash-Lite and 2.0 Pro?

Flash-Lite is the cost play: it holds “the same price point that we had with 1.5 flash” to “remove the economic burden” for developers shipping AI to users. Pro is the frontier play — a continuation of December’s Gemini 1206 model, “one of the highest ranked models for coding use cases,” built to “push the frontier of coding and agents.” For most workloads, Logan says 2.0 Flash is great and fully multimodal; reach for Pro on harder coding and agentic tasks.

How does Gemini 2.0 handle function calling and tools?

Function calling is “the core enabler of a lot of these agentic workflows,” and 2.0 is trained to natively know when to invoke first-party tools like search and code execution — fixing LLM gaps such as stale world knowledge and “embarrassingly bad mistakes” on calculator-solvable problems. New in 2.0, compositional function calling (available in the API) lets you “describe the sequence and the chain” of functions, because call order “actually matters a lot.” There is “no weird pricing story” — at scale “you just pay for the tokens that are created.”
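As a rough illustration of why call order matters, the sketch below runs a model-proposed chain of tool calls in sequence and feeds each result into the next call. It does not use the Gemini API: the tool names (`get_location`, `get_weather`), the plan format, and the `$prev` placeholder are all hypothetical, chosen only for this example.

```python
from typing import Any, Callable

# Hypothetical local tools; a real app might wrap search or code execution here.
def get_location(user: str) -> str:
    return {"ada": "London"}.get(user, "unknown")

def get_weather(city: str) -> str:
    return {"London": "rainy"}.get(city, "no data")

TOOLS: dict[str, Callable[..., Any]] = {
    "get_location": get_location,
    "get_weather": get_weather,
}

def run_chain(plan: list[dict[str, Any]]) -> Any:
    """Run tool calls in order; an argument equal to "$prev" takes the previous result."""
    prev: Any = None
    for step in plan:
        args = {k: (prev if v == "$prev" else v) for k, v in step["args"].items()}
        prev = TOOLS[step["name"]](**args)
    return prev

# Assumed shape of a plan: look up the location first, then the weather there.
plan = [
    {"name": "get_location", "args": {"user": "ada"}},
    {"name": "get_weather", "args": {"city": "$prev"}},
]
result = run_chain(plan)
print(result)  # rainy
```

Reversing the plan runs the weather lookup before any location is known, so the chain no longer answers the question; that is the kind of ordering problem a declared sequence is meant to avoid.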

How does Gemini support long context, and why do TPUs matter?

Long context runs “all the way up to 2,000,000 tokens with the Pro series.” Logan says research breakthroughs enabled it on the algorithmic side, but “the only reason we’re able to put long context into production … at the scale that it is, is because of TPUs” — it is “just as much an algorithmic breakthrough story as it is an infrastructure breakthrough story.” Owning the silicon is also why Google can pass cost savings on to developers.

Why do so many AI products fail in production?

Logan estimates that of 10 teams putting AI in front of users, “at least five out of 10 … would not have a good eval story” — they do not even know what success looks like for the AI they shipped. The “magic silver bullet” framing breaks down because real deployments need infrastructure that often did not exist: eval infrastructure and ops dashboarding for things like A/B testing prompts. It takes work, and the foundation has to be built underneath the product.
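One minimal version of an “eval story” is a fixed set of labelled examples plus a score that can be compared across prompt variants. The sketch below assumes exact-match scoring and uses a stub in place of a real model call; the eval set, both prompt templates, and `stub_model` are invented for illustration.

```python
from typing import Callable

# Made-up labelled examples: the definition of "success" for this toy task.
EVAL_SET = [
    {"question": "2 + 2", "expected": "4"},
    {"question": "10 / 4", "expected": "2.5"},
    {"question": "3 * 7", "expected": "21"},
]

# Two prompt variants to A/B test (both hypothetical).
PROMPTS = {
    "A": "Solve this.\n{question}",
    "B": "Solve this. Reply with only the number.\n{question}",
}

def stub_model(prompt: str) -> str:
    """Stand-in for a real model call: terse only when the prompt asks for it."""
    question = prompt.rsplit("\n", 1)[-1]
    answer = next(ex["expected"] for ex in EVAL_SET if ex["question"] == question)
    return answer if "only the number" in prompt else f"The answer is {answer}."

def accuracy(template: str, model: Callable[[str], str]) -> float:
    """Fraction of eval examples where the output exactly matches the label."""
    hits = sum(
        model(template.format(question=ex["question"])) == ex["expected"]
        for ex in EVAL_SET
    )
    return hits / len(EVAL_SET)

scores = {name: accuracy(template, stub_model) for name, template in PROMPTS.items()}
print(scores)  # {'A': 0.0, 'B': 1.0}
```

Swapping `stub_model` for a real client call and growing `EVAL_SET` would turn this into a basic regression check to run before shipping a prompt change.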

Episodes · S2 E12

The Making of Gemini 2.0: DeepMind's Approach to AI Development and Deployment | Logan Kilpatrick

Feb 12, 2025 · Logan Kilpatrick, Google DeepMind · 41 min

AI Evaluation & Reliability AI Hardware Enterprise AI Multimodal AI

Key takeaways

Flash-Lite holds the line on price; Pro pushes the frontier. Google kept Flash-Lite at “the same price point that we had with 1.5 flash” specifically to “remove the economic burden” for developers putting AI in front of users — “if cost is the barrier for you to build with AI, like, let’s not have that continue to be the case.” The 2.0 Pro model is a continuation of the December Gemini 1206 model — “one of the highest ranked models for coding use cases” — built to “push the frontier of coding and agents.”
Function calling is the foundation of the agentic era. Logan calls tool use “the core enabler of a lot of these agentic workflows” and says the team is “screaming it from the mountaintops internally” that the model has to be great at it. Gemini 2.0 was trained to natively know when to invoke first-party tools — search and code execution — to fix two core LLM limits: no access to “updated world knowledge,” and the “embarrassingly bad mistakes” on problems “you can actually solve with a calculator or by running a little bit of code.”
No weird pricing, and compositional function calling ships with 2.0. Search and code execution are free to try, and at scale “you just pay for the tokens that are created” — “there’s, like, no weird pricing story.” Compositional function calling, new in 2.0 and available in the API, lets developers “describe the sequence and the chain” of functions, because the order in which tools are called “actually matters a lot.”
Multimodal AI is barely in production — and that is the opportunity. “Basically multimodal AI is not in production at this point.” As a former ML engineer, Logan recalls that solving one domain-specific computer-vision problem took “on the order of nine to twelve months in a successful case.” Now segmentation, bounding boxes, and object detection “just work” out of the box, so the founders who were blocked because they lacked “a bunch of, like, ML PhDs” can finally ship vision use cases.
Long context is an infrastructure story as much as an algorithmic one. Gemini works “all the way up to 2,000,000 tokens with the Pro series” — and Logan stresses it is “just as much an algorithmic breakthrough story as it is an infrastructure breakthrough story,” only viable at scale “because of TPUs.” He frames Gemini’s first-mover list: first to ship native search, first to ship caching, first with a native multimodal LLM that could take in video, images, and audio.
Agents will drive a 100-1000x jump in inference compute — which only pencils out at Flash-tier cost. Logan expects “100 to 1000x more inference compute” as agents, not humans, become the ones generating tokens. That future “only … ends up being possible … with models that are at the cost of the Gemini two point zero flash models,” because otherwise the products get too expensive to reach most people.
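To make the cost argument concrete, here is a back-of-the-envelope sketch. The price is the “around 7¢ per million tokens” figure the host cites for 2.0 Flash in the transcript; the baseline volume of 50 million tokens per day is purely an assumption for illustration.

```python
# Price quoted in the episode for 2.0 Flash (approximate).
PRICE_PER_MILLION_TOKENS_USD = 0.07
# Assumed baseline workload; replace with your own traffic numbers.
BASELINE_TOKENS_PER_DAY = 50_000_000

def daily_cost_usd(
    tokens_per_day: int,
    price_per_million: float = PRICE_PER_MILLION_TOKENS_USD,
) -> float:
    """Daily spend for a given token volume at a flat per-million-token price."""
    return tokens_per_day / 1_000_000 * price_per_million

# Compare today's volume with the 100x and 1000x scenarios.
for multiplier in (1, 100, 1000):
    tokens = BASELINE_TOKENS_PER_DAY * multiplier
    print(f"{multiplier:>5}x: ${daily_cost_usd(tokens):,.2f} per day")
```

Under these assumptions the bill grows from $3.50 to $3,500 per day at 1000x, and the same multiplier applied to a pricier model scales in direct proportion.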

Concepts in this episode

AI terms discussed here — each links to a plain-language definition.

Tool Use (Function Calling) · AI Agent · Tokenization · Agentic Workflow · Multimodal AI · Inference · Reasoning Models · AI Benchmark · Artificial General Intelligence (AGI) · Context Window

Chapters

03:49 Gemini 2.0 Updates and Developer Highlights
06:08 Agentic Use Cases and Function Calling
11:29 Multimodal Capabilities
16:15 Putting AI in Production
21:06 Gemini's Differentiation and Hardware
31:22 Future Vision for Gemini and G Suite Integration
35:23 Gemini for Developers
39:02 Conclusion and Farewell

Show notes

Google’s strength in AI has often seemed to get lost in the midst of OpenAI announcements or DeepSeek fervor - yet Gemini 2.0 is more than good for many tasks; it’s the model to beat - and we have the research to back it up.

This week, Logan Kilpatrick, senior product manager at Google DeepMind, joins us to discuss Gemini’s creation story, its emergence as the premier model in the AI race, and why the launch of Gemini 2.0 is great news for developers.

During the conversation Conor and Logan explore the exciting world of multimodal AI, Gemini's strengths in agentic use cases, and its unique approach to function calling, compositional function calling, and the seamless integration of tools like search and code execution.

They also chat about Logan’s vision for a future where AI interacts with the world more naturally, offering a view of the potential of vision-first AI agents, and why Google's hardware advantage is enabling Gemini's impressive performance and long context capabilities.

Follow along with the discussion using Galileo’s AI Agent Leaderboard: https://huggingface.co/spaces/galileo-ai/agent-leaderboard

Chapters: 00:00 DeepMind's Role in Gemini's Development

Follow Logan

Twitter: @OfficialLoganK

LinkedIn: https://www.linkedin.com/in/logankilpatrick/

Connect with Chain of Thought host Conor Bronsdon:

Newsletter: https://newsletter.chainofthought.show/
Twitter/X: https://x.com/ConorBronsdon
LinkedIn: https://www.linkedin.com/in/conorbronsdon/
YouTube: https://www.youtube.com/@ConorBronsdon

Show Notes

Try Gemini for yourself: gemini.google.com

Gemini for Developers: aistudio.google.com

Check out Galileo

Try Galileo

Transcript

Logan Kilpatrick 0:00 The line between research and product has never been more blurry. Like, if you're someone who's a researcher, like, you can really transition to, like, building products and, like, creating a lot of value and vice versa. So it was just this, like, really interesting time that we're that we're living in.

Conor Bronsdon 0:24 Welcome to the Chain of Thought podcast. I'm your host, Conor Bronsdon. And today, we're joined by Logan Kilpatrick, senior product manager at Google DeepMind. You may know him from his popular LinkedIn or Twitter accounts as well, where he opines on AI. Logan, welcome to the show.

Logan Kilpatrick 0:37 I love that. Me opining on AI is probably the most apt description of what I do online, and I appreciate that.

Conor Bronsdon 0:47 I think it's a perfect framing for this podcast too, because that's what we're doing here. And we're really excited to talk all things Gemini with you. Candidly, our producer is a big fan, as am I, because often it seems like Gemini isn't talked about in the way OpenAI's models often are. It seems that anytime the news latches onto a story about AI or the future of AI,

Conor Bronsdon 1:08 the go-to is to run a story on ChatGPT with a baked-in presumption that ChatGPT is the clear AI leader even when it's not borne out in the data. Gemini is incredibly performant and efficient. In fact, we just released Galileo's new agent leaderboard evaluating 17 different AI models for their effectiveness and usage within AI agents, and Gemini models took first, third, and fourth in our analysis.

Conor Bronsdon 1:32 But before we dive into all the numbers, let's back up a bit. I'm curious about the role DeepMind played in Gemini's development. Let's talk about the makings of Gemini. What were the core motivations and goals behind the development of Gemini?

Logan Kilpatrick 1:45 Yeah. This is a great question. I think, at the foundational level, like, Google exists to organize the world's information and make it universally accessible. And if you think about the sort of arc of Google search and a lot of that work that Google's been doing, like, it's funny. Like, there's some amount of narrative around, like, Google being late to the AI game. If you actually, like, sit around and look at the product services that we've had for the last ten years,

Logan Kilpatrick 2:12 transformers and LLMs and AI have been, like, a deep part of the entire product experience that Google has brought to the world, like, in sort of the most prominent product experience, which is search. And I think the teams across Google have learned a ton about this. And historically, like, GDM, Google DeepMind, was doing a bunch of sort of foundational

Logan Kilpatrick 2:31 research into building AGI, doing all these collaborations with Google Brain, etcetera, to Google Research. And I think as sort of the LLM moment started to happen, all those teams came together and said, let's join forces and really sort of build an organization inside of Google that can build the world's best models, deliver them to the world, and then also build product experiences

Logan Kilpatrick 2:52 around the models. And the product experience piece has been the, like, latest addition to this story, but makes perfect sense. The Gemini app moving over from being sort of part of the search organization. And then our product service area, Google AI Studio and the Gemini Developer API, also moving over to DeepMind to sort of close in this collaboration.

Logan Kilpatrick 3:14 I think it's the same sort of goals that we've always had, which is, like, how do we put the best models in the hands of the world? I was having a conversation with one of the Google Cloud folks in leadership who was previously in academia, and he mentioned something to me which has continued to resonate, which is the line between

Logan Kilpatrick 3:35 research and product has never been more blurry. Like, if you're someone who's a researcher, like, you can really transition to, like, building products and, like, creating a lot of value and vice versa. So it was just this, like, really interesting time that we're living in.

Conor Bronsdon 3:49 Absolutely. And it's particularly an interesting time, not just on the broad front, but on the specific, as I know last week Gemini announced major updates, including additional availability of Gemini two point zero Flash, two point zero Flash-Lite, and Gemini two point zero Pro to more developers and production uses. Obviously, this is an exciting time. Is there anything you wanna highlight around the new public availability of these models?

Logan Kilpatrick 4:12 For the last two months, developers have been saying, Hey, we love two point zero Flash. This model is awesome. Let us use it in production. I think we've been saying, Please, wait. Give us a little bit of time to sort of put the finishing touches on it. And I think we did that work. Now, the model is now fully available for production use with higher rate limits. We also updated some other stuff on the API side.

Logan Kilpatrick 4:33 And Flash-Lite is really a continuation of this sort of story that we've been building with developers, which is trying to remove the economic burden for people who want to put AI into production or in front of their users. And so we kept the same price point that we had with 1.5 flash, with two point zero Flash-Lite, and made it so that you could continue to, like, if cost is the barrier for you to build with AI, like, let's not have that continue to be the case and keep the prices low. And then on the pro side, it's how do we push the frontier? How do we push the frontier of coding and agents and a bunch of the new use cases which people care a lot about? And I think those two are really where you start to see the value of the pro model. I think for a lot of, for like most stuff, you know, two point o flash is gonna be really great. It's fully multimodal,

Logan Kilpatrick 5:19 etcetera, etcetera. But for coding use cases, the new pro model is a continuation of the Gemini 1206 model that we released back in December, which developers have been loving and was one of the highest ranked models for coding use cases.

Conor Bronsdon 5:34 I love that you're highlighting these differentiations, because in particular, we've been really impressed by how Gemini has performed around agentic use cases. We actually released, today, it should be out as of this episode, the results of Galileo's AI agent leaderboard that I alluded to at the start of the episode. And Gemini stole the show, with two point o Flash topping the charts. At a fraction of the cost of other models, it was the performance champion

Conor Bronsdon 5:59 with a 93% score at just around 7¢ per million tokens, and it excelled in both complex tasks and in our safety features metrics. It's really impressive, and I'm curious to get your perspective on how Gemini is set up for these agentic use cases.

Logan Kilpatrick 6:16 Yeah. This is the core part of the two point zero story. So I think the original one point zero Gemini story was, let's build this model to be natively multimodal. The chapter two of the Gemini story is let's build this model to keep doing all the things it does now and do them well, but also build for this agentic era. And I think we're sort of just getting to the point now where the models are good enough, the infrastructure is good enough, there's enough knowledge,

Logan Kilpatrick 6:41 sort of shared among developers and founders and builders about, like, how to actually build agents where it's the right place at the right time to go and do this. And, specifically, there's a couple of different fronts for agent stuff. Like, one, fundamental tool use, I think, continues to be really important. Chirasta, who's one of my coworkers, drives a bunch of our function calling work streams. And like, this is like top of mind. We're screaming it from the mountaintops internally

Logan Kilpatrick 7:05 that like the model needs to be incredibly good at function calling. That is the core enabler of a lot of these agentic workflows. But we've gone, like, a couple of steps beyond that, training the model to sort of, in a first class way, know when to call specific first party tools right now. And the two examples of this are search and code execution. So you can start to think about, like, you know, what are some of the fundamental limitations of LLMs?

Logan Kilpatrick 7:31 You know, they don't have access to updated world knowledge, and there's a lot of things on which they make embarrassingly bad mistakes. A lot of them are related to, like, simple problems that, like, you can actually solve with a calculator or by running a little bit of code. And that's where code execution and search come in. And they're, like, really foundational pieces of, like, what you would want this intelligent system to be able to do for you. And the model has now been trained to know, like, hey, this is the type of question which if the developer wants to make it available, they have the option to make some of these, like, first party tools available. And we sort of take care of all the scaffolding work and the abstraction work, and it just kind of, you know, works out of the box for you. And for search, like, you can use it. You can play around with it for free. It's available for developers. You don't have to sign up for anything paid. And then if you wanna scale to production, you can do so. Same thing with code execution. There's, like, no weird pricing story. You don't have to, like, pay per session or anything like that. It's like you just pay for the tokens that are created as you're doing this code execution process. So it should be super simple for folks to get started and use both of those.

Conor Bronsdon 8:34 There are so many interesting threads we can pull on from what you just mentioned. But I wanna mention one in particular, which is that function support piece and actually making it easy for developers to use. To call back to the data I mentioned, you know, we looked at DeepSeek too, which obviously has kind of taken the world by storm here the last several weeks, and it has really limited function support. It's not as effective for these kind of agentic use cases. You can't really call tools with it. And this is where I see the effort of Gemini and of the Google team to really be specific and to set up developers

Conor Bronsdon 9:07 for everything they wanna do instead of adding these extra steps that you need to take in order to leverage these other reasoning models. Can you discuss some of the specific design choices and maybe technical innovations that went into the creation of Gemini as it's obviously made it so effective for developers to leverage?

Logan Kilpatrick 9:25 Yeah. I I think a lot of this is maybe even one level abstracted away from, like, the design decisions that the research team has made. I think it comes back to, like, the fundamental goal for the research teams. And I think, like, this was I didn't know this until I sort of came to Google and met with a lot of the folks on our research team. But like the fundamental goal in their mind is like, hey, we're not doing this for a chat application.

Logan Kilpatrick 9:50 We're not doing this to make Google search really great. Like, we really are trying to build this sort of generic, very broad sense of capabilities that enables everyone to be able to use this. And like developers are actually this like perfect representation of that. You know, if you're in your internal company, you know, Google's a big company and there's lots of different product services. But even at Google, like our internal

Logan Kilpatrick 10:10 product areas don't cover all the different things that external developers might be doing. And I think like that mentality trickles down into all of the design decisions that happen. And I think like foundational function calling continues to be this like critical pillar of support that we're pushing on. I think one of the good examples of this is compositional function calling. So this was part of the feedback

Logan Kilpatrick 10:35 from developers is, hey, you know, as the model is using tools, pretty simply, like the order in which the tools are called actually matters a lot. And compositional function calling, which is something that came out with two point zero, which we haven't talked a ton about externally, but is available in the API now, lets you sort of describe the sequence and the chain of the functions. And, like, that is, like, purely to enable developers, like giving developers the control, and that made it into the research roadmap because developers

Logan Kilpatrick 11:04 are top of mind for our research team.

Conor Bronsdon 11:07 I love that, especially because it aligns to what you said earlier about the blurring line between product development and research here. And this design principle of let's make it easier for developers to use, let's make it efficient for developers to use, is a fantastic one to enable problem solving with Gemini. Can you provide some examples of other ways that your team is enabling developers to build with Gemini?

Logan Kilpatrick 11:30 It's hard to not find examples of this because everything we do is to enable developers to build with Gemini. I think maybe a couple of fun examples of this that are top of mind is in December, we shipped the multimodal live API, which is this really incredible experience. If folks haven't tried this before, aistudio.google.com/live gets you into this experience where you can sort of show Gemini stuff and share your screen and text prompt with it and your audio back. This really interesting multimodal experience,

Logan Kilpatrick 12:06 really showcasing sort of what I think is going to be possible in the near future, which is this like AI co presence with you sort of being able to collaborate with you and see the things that you see. And this is actually one of the big, and I'm curious like how you all think about this from like a leaderboard benchmark perspective in the agents world. But I think one of the big limitations of agents is the models

Logan Kilpatrick 12:31 and the sort of agentic software is not able to do the things fundamentally that we're able to do. Like, it can't see and interact in the way that we do, and this leads to, like, having to build a bunch of scaffolding in order to make this possible, and I think this, like, multimodal live version of the world is the version of the world in which you can actually, like, really dramatically simplify the scaffolding that needs to exist to enable agents, and the models can just see, hear, and, like, control the same sort of controls that we're able to as humans,

Logan Kilpatrick 13:03 which I'm excited for for two really quick reasons. One, I think, like, reducing the barrier for developers to actually build stuff, I think, is really important. But two, I think there's like a lot of, you know, people in AI talk about the bitter lesson, which is this story of how the sort of general purpose approach to solving a problem, like oftentimes beats out the like very like

Logan Kilpatrick 13:25 specific approach to solving the problem. And it feels like this, like, vision type of workflow is kind of a potential example of the bitter lesson. I guess we'll find out over time whether or not this is true, but it feels like vision first is going to be a really interesting feature for AI agents.

Conor Bronsdon 13:43 I really agree because to me, it's kind of felt like this initial flurry of agents from 2024 were really like async junior digital employees. They can help you with some tasks, but they weren't quite ready for prime time at in-depth levels. And now we're seeing with not only these improved models, but improved understanding of how to approach multimodal, which has even more potential, then obviously these multi agent architectures,

Conor Bronsdon 14:08 there's some real problems getting... Major workflows are being automated. And it seems like the impact is only continuing to accelerate here.

Logan Kilpatrick 14:17 Yeah, I agree with you. I think it is funny, and I was remarking to someone yesterday about how the narrative flip flop between like AI hitting a wall to like what is very clearly this like dramatic progress, both in the product side and in the research side. And I think this continues back to the thread of just these really blurry lines between like what innovation product or research innovation like enables these use cases to work. And I think that the takeaway for me is you need great models, but like, it actually also takes a long time to build great products. And that's part of, we're all still figuring out as an ecosystem, what are the right product experiences to bring AI to life? I think we're even seeing in the last couple of months this whole text to app explosion,

Logan Kilpatrick 15:01 which I think is now, I was just literally in the meeting before this, was talking to someone about this and like how it's this new frontier of use cases, which is super exciting.

Conor Bronsdon 15:11 Seems like there's been this kind of viewpoint, I think, sold from early AI about, oh, AI is gonna be this magic silver bullet. And then we all kind of had a moment where we reckoned with it and said, oh, it is really magical in some ways, but there's also infrastructure needs. There's also work we have to do to like set it up for success. And as we build out new capabilities, we continue to find new areas of support we need, new opportunities.

Conor Bronsdon 15:35 But a lot of this kind of perspective, I think, particularly in the media and from some critics has been about like, oh, this wall we're hitting was more about, well, we're not seeing the magic. We're not seeing the silver bullet. And really, once you dig into the problem solving piece, you go, this is just part of scaling. This is just part of doing this well. It's just part of actually applying it to production use cases and moving out of research. And honestly, the rapidity with which this has all happened, the velocity with which the research and the products are moving,

Conor Bronsdon 16:04 is incredible if you zoom out. But it's easy in the moment to get caught up in, doesn't quite work yet, or this isn't 100% of the way there, instead of seeing how far it's come in even just six months.

Logan Kilpatrick 16:10 Yeah, I think the narrative is for folks who are sort of close to the action, it's almost funny because if you go and talk to customers, like people who are putting AI into production, like a great example of how early we still are and how much juice there is to be squeezed out is

Logan Kilpatrick 16:31 you could probably, if you went and talked to 10 customers, 10 people who are putting AI in production in front of their users, I bet you at least five out of 10 of those folks would not have a good eval story. And they don't actually even know... They've put AI in production. They don't even know what success looks like for the AI that they put into production. Of course we haven't gotten to the product experience that's this magic

Logan Kilpatrick 16:54 silver bullet because that takes a lot of work and it takes a lot of, to your point, infrastructure. You have to... Historically, maybe there wasn't great eval infrastructure. Maybe there wasn't great, like, ops sort of dashboarding in order to, like, keep track of or do A/B testing for different prompts. All this sort of infrastructure had to be built, and the ecosystem is running so fast that it is, like

Logan Kilpatrick 17:19 it it's almost like the the road is being built at the top level, and then, like, there is no like, the foundation has to also be built in order to, like, support that. And, like, that's happening in real time, but it's, like, there's some lag for a lot of the stuff. And to really have the successful use cases and the right product experiences, you need to have that foundation. And I think we're

Logan Kilpatrick 17:39 it feels like we're gonna go through this again kind of with this current multimodal agentic situation, which is I think a lot of the scaffolding that was built was around enabling this text heavy sort of your analogy of, like, this, like, junior digital employee. I think there needs to be this new set of scaffolding that enables sort of the next layer of this this agentic workflow, And that's gonna take time. Like, I think if you don't think that's true, like, you should reset your priors because, like, it is going to take time for some of those things to happen.

Conor Bronsdon 18:07 Absolutely. And I'd be remiss if I didn't mention to folks listening that you should check out galileo.ai for eval infrastructure and some of our earlier episodes where we talk a lot more about this. Because I think you've got a great point here, which is there's so much AI infrastructure that needs to be built out still. There's so much that is so new still to all these products we're developing.

Conor Bronsdon 18:26 And multimodal capabilities are particularly an example of this, where we're just starting to scratch the surface here. How are you thinking about multimodal capabilities and how they could be leveraged in real world applications moving forward?

Logan Kilpatrick 18:40 Yeah, this has been a core part of my thesis for a long time about what the future is going to look like, which is that if you look around, basically multimodal AI is not in production at this point. Like, there are very few successful deployments. There are a lot of successful text deployments happening at this point. I think we're starting to see the inklings of some of these early multimodal use cases, but it's going to fundamentally change a lot of the way that the world works, partially because

Logan Kilpatrick 19:10 previously, if you think about the flow that had to exist, and I actually used to be a machine learning engineer training computer vision models, and I lived this in my life as a machine learning engineer, the amount of time it would take to solve one of these domain-specific computer vision problems was on the order of nine to twelve months in a successful case. You had to go and collect all the data. You had to train the model end to end. You had to put that thing in production. You had to, like, build all this infrastructure

Logan Kilpatrick 19:38 to make sure it's fault tolerant and reliable and all this stuff. And I reflect on my personal career and journey every once in a while, and I have to imagine that if I could go back and drop a vision language model, or the multimodal capabilities of one of these models, into my hands just literally five years ago, it would have been like, this is alien technology. It would have been the craziest thing in the world because it would have just solved those problems right out of the box. Like, the models can do image segmentation now, and they can do bounding boxes and object detection. It's like all of the domain-specific computer vision problems

Logan Kilpatrick 20:17 just work from these models today, not even including fine-tuning for a bunch of these use cases to take them to the degree of reliability that you need. And I think the thread of this that gets me excited is, who was this stopping from putting really interesting vision use cases into production? It was the early-stage founder, or the business owner, who has some crazy idea but can't do that thing because they don't have a bunch of ML PhDs who can go and train models for them from scratch. And now those folks can literally just string up

Logan Kilpatrick 20:51 an LLM API, put in whatever the use case is that they want, and bring that thing to life. And that gets me so excited for the world. And I think that's the promise of this technology actually coming to fruition: democratizing access to be able to build some of these things.
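
As a rough illustration of the "bounding boxes out of the box" point, the sketch below shows only the post-processing step a developer would still write: turning a model's box reply into pixel coordinates. It assumes, based on how Gemini's documentation describes object detection output, that boxes come back as `[ymin, xmin, ymax, xmax]` scaled to a 0 to 1000 range; the JSON shape, the `box_2d` and `label` keys, and the sample reply are hypothetical, and no model is actually called here.

```python
import json

# Hypothetical model reply. Assumed format: a JSON list where "box_2d" is
# [ymin, xmin, ymax, xmax], each value normalized to the range 0..1000.
SAMPLE_REPLY = '[{"label": "espresso machine", "box_2d": [100, 250, 600, 750]}]'


def to_pixel_boxes(reply_json: str, width: int, height: int) -> list[dict]:
    """Convert normalized boxes into pixel coordinates for a given image size."""
    boxes = []
    for item in json.loads(reply_json):
        ymin, xmin, ymax, xmax = item["box_2d"]
        boxes.append({
            "label": item["label"],
            # Integer math keeps the result deterministic (floors to a pixel).
            "left": xmin * width // 1000,
            "top": ymin * height // 1000,
            "right": xmax * width // 1000,
            "bottom": ymax * height // 1000,
        })
    return boxes


boxes = to_pixel_boxes(SAMPLE_REPLY, width=1280, height=720)
print(boxes)
```

The contrast with the nine-to-twelve-month workflow described above is that data collection, training, and serving are replaced by one API call plus a small amount of glue code like this.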

Conor Bronsdon 21:06 Logan, I'd love to talk more about how Gemini is going to differentiate itself from other LLMs in the market. What's the approach that the development team is taking to not just enable developers, but differentiate from these other solutions?

Logan Kilpatrick 21:22 Yeah, this is an interesting thread to pull on. I think there's two sides of this coin. One, sort of foundationally in some cases, we let research guide what's going to happen. In that context, it's like, you know, the researchers are close to this magic. Let's have them continue to do research and then find out how to bring whatever capability is being unlocked by research to the outside world. And then on the other side of the coin, it's like, hey, we know developers need this thing. Let's go and figure out how to make that happen. And perhaps a good example of the latter is long context. The Gemini team knew, like, hey, this is a capability that would matter a lot. Let's go figure out how to make the algorithmic progress in order to enable long context, and then the infrastructure work in order to make that actually work at scale in production.

Logan Kilpatrick 22:10 And, I mean, it's been one of the biggest differentiators for Gemini. Gemini has this long context window. We were also the first to ship native search. We were the first to ship caching. We were the first to have a native multimodal LLM that could take in video, images, audio. Hopefully, we'll be the first folks to actually have the capability generally available to output audio and images as well, which is sort of the rounding out of that native multimodality

Logan Kilpatrick 22:41 story, which I'm really excited about. The other piece of this, and this is one of the many benefits of Google and the breadth that Google has, is that as all the different parts of Google go and deploy these models into production, there are, like, really bespoke constraints.

Logan Kilpatrick 23:03 Like, an example of this is search. One of the things that made Google Search so successful initially was the speed at which you could get results back. And I think there are these core design constraints that all of the different parts of Google have, and you end up with really interesting approaches to how to solve some of these problems,

Logan Kilpatrick 23:23 whether that's, like, we make a smaller model, or whether that's, you know, caching, etcetera, as you look at all these different constraints. So I think there's this strong partnership between research and all the product areas inside of Google as we think about what the right models and the right architecture decisions are as we bring these models to the world.

Conor Bronsdon 23:44 It also seems like there's an underrated part of the story that I don't even hear Google talking about as much as maybe I expect, which is that Google has a multi year advantage as far as experience with aligning their hardware and custom silicon to their models. What do you see this providing Gemini and enabling going forward?

Logan Kilpatrick 24:04 Yeah, this is a great, exciting thread. I think it's something that just kind of makes sense to me. I mean, I'm not a hardware expert. I think I'm a beneficiary of the custom silicon that we have, because I love the Gemini models and because we work on bringing them to the world. The two things really go hand in hand. Gemini is optimized for this experience of running on the TPUs that we have, the tensor processing units,

Logan Kilpatrick 24:30 and as this sort of innovation keeps happening on the TPU side, it enables the models to be faster, better, cheaper, all this stuff. Like, you know, why are we able to bring models to the world at the price point that we're able to? It's because we actually have control of the hardware too, and the cost basis is sort of included in part of that process, which really is a great advantage for us, but also the beneficiary

Logan Kilpatrick 24:54 of the advantage is the developer community, which I love. It's such an awesome outcome for the world that this cost saving is passed on to developers and enables the external world to actually benefit from this.

Conor Bronsdon 25:06 Absolutely, and I mean, going back to our data again here, when we looked at these 17 models, Gemini was not only at the very top as far as performance, but also very close to the top on cost as far as being the cheapest. And this just enables you to solve harder problems without the incredible spending that it would sometimes take otherwise. And I just feel like it's such an underrated part of the story. Like, you alluded to long context earlier. My understanding from reading

Conor Bronsdon 25:34 Google's announcements here is that long context has really been enabled by the incredible silicon advantage you have here with the custom chips. Is that correct?

Logan Kilpatrick 25:46 Yeah, I think so. There's a bunch of research breakthroughs that enabled long context on the algorithmic side. But I think the only reason we're able to put long context into production, like at the scale that it is, is because of TPUs. I think the TPU story, without getting into all the details, is just as much an algorithmic breakthrough story as it is an infrastructure breakthrough story. The fact that long context is able to work all the way up to 2,000,000

Logan Kilpatrick 26:13 tokens with the Pro series of models. So I'm excited to keep seeing this thread play out over time, which is like, we'll get longer context. And I think this can inform the decisions that we make as far as what hardware to build. Like, hey, we want 10,000,000 token context or 100,000,000 token context. How do we do that? Like, maybe there's a hardware story to look into there.

Conor Bronsdon 26:44 Oh, man. I wish we had two hours for this conversation. I'm gonna have to have you recommend someone from the hardware research side because I think it would be great to dive deeper on this topic. I know we wanna focus this conversation around what's happening with Gemini, around enabling developers. And one of the really interesting things that you kind of alluded to earlier is that Gemini often has been ahead of the game as far as its competitors with getting different features out in research, and yet it's not talked about as much. Some of this is marketing. Some of this is maybe there wasn't a complete feature set for this. One example that I think is really interesting, that I have to give credit to our producer Adam for highlighting, is deep research, where Gemini has had a deep research capacity

Conor Bronsdon 27:21 for weeks, months now. And, you know, depending on when you listen to this back episode, maybe it's years when you listen to it. And yet it was OpenAI's announcement of the same features with the same name that seems to have generated all this buzz in the last week or so. And not to say that's not a cool product, I'm also a fan of what OpenAI is doing. But it does seem like there is this

Conor Bronsdon 27:47 almost hesitation in the ecosystem to talk about some of the really incredible stuff that Google is doing. I'd love to understand how you see deep research enabling more agentic workflows or more problem solving for devs, because it seems like such an interesting tool.

Logan Kilpatrick 28:02 Yeah. Deep research feels like a product experience which Google is very uniquely suited to bring to the world. And between the time of this recording and the time of release, we will have now rolled it out to users on Android and users on iOS, which I'm really excited about. So you can sort of, on the go,

Logan Kilpatrick 28:25 bring deep research with you and kick off some of those long-running search queries. It's similar to the story of how we bring search to developers in the API. I think deep research is such a product that Google is so uniquely suited to build, and it really starts to push on this thread of, like, what is the world going to actually look like in the

Logan Kilpatrick 28:52 next, like, three to five years? The piece that I like about deep research, through this lens specifically, is this long-running asynchronous experience where I don't need to sit around and wait for this thing to happen. I just get an email, or I'm notified via a push notification from my mobile app, that this thing is now done. And you could actually imagine where proactively

Logan Kilpatrick 29:17 this starts to happen for you in the future where you have Gmail and you're getting maybe an investment pitch or someone saying, Hey, let's go on a trip somewhere. And then deep research is behind the scenes. It's like, Hey, let me actually just be proactive here and compile a bunch of information for you so that you can make an informed decision. And in many cases, maybe you ignore all that because you're like, Yeah,

Logan Kilpatrick 29:39 I don't have time to read through all this stuff. But for people who want to, I think about my girlfriend. We spend a ton of time looking at all the options of everything that we might do, whether it's going to dinner or going on a trip somewhere. And to take away that burden and just get access to that information, I think, is going to be such a powerful product experience.

Logan Kilpatrick 30:01 And I also think the Gemini app has made a ton of progress in eliminating the time delay between, like, an experimental model's availability and a rollout into the Gemini app. I think they've made a ton of strides there. And even this week, when we rolled out Pro and Flash-Lite and the 2.0 Flash model, they actually shipped the 2.0 Flash model,

Logan Kilpatrick 30:26 before we did and then rolled out the 2.0 Pro model at the same time that we did. So I think getting the models out to everyone as soon as possible is super critical. And the feature set as well, as far as deep research and other things, getting that out to more users as soon as possible, I think, is gonna be awesome. And also actually seeing the

Logan Kilpatrick 30:46 deep research experience benefit from our latest models. Like, we have 2.0 Flash and 2.0 Pro Experimental and our reasoning model. And seeing what that experience looks like with those three models is going to be truly awesome.

Conor Bronsdon 31:04 I absolutely think this integration story with other Google products is a really exciting one for Gemini, both with everything that's happened with the Google Suite, and I have to give major credit to, I think, both Google and Microsoft for how they're thinking through integrations with the G Suite products and, in Microsoft's case, with their Office products. Like, there's a clear story here for, like, we're just going to help you do everything you're doing better. But where I see Google really differentiating is, hey, look, we also have this incredible

Conor Bronsdon 31:26 hardware opportunity with our mobile devices, with Pixel, with Android. Talk to me a bit about what's coming or kind of the vision for the future with Gemini on mobile.

Logan Kilpatrick 31:37 Yeah. I think my mind was initially blown on this back at Google I/O. I got my hands on one of the Pixel devices that was running the Astra build, and I spent, like, the better part of a few hours going around and trying a bunch of different use cases and demos and stuff like that, just in, like, normal ways. I actually tweeted out this video of trying that Astra experience

Logan Kilpatrick 32:02 natively on Pixel using one of those really confusing espresso machines. Every time I walk by one of those in an office or somewhere, I'm like, there's no shot that I'm gonna try to use this. It looks like foreign space technology to me. And I was literally live going through this with the Astra experience. Like, how do I actually do this? And it was giving me step-by-step instructions,

Logan Kilpatrick 32:29 and I actually did make the coffee successfully, and now I know how to do this. So it was this really cool experience of something really simple, and you start to think about, like, what are the hardware implications of this? And, again, I'm not in the hardware space, and I'm not on the Pixel team, but I'm excited about the models actually becoming available and the entire experience living on your phone, and doing it in a way that's private and secure makes a ton of sense. And the same innovation that enables our large Gemini models is the same innovation that ends up enabling both the open source Gemma models and also Gemini Nano, our smallest model,

Logan Kilpatrick 33:06 that works on device. So there's this really great trickle of research into the hardware products as well.

Conor Bronsdon 33:12 Fantastic stuff. And I also think it's exciting to see what's happening with G Suite. And it's been interesting for me, and I hope I can be honest with you here: I was mostly using Claude and ChatGPT until Gemini rolled into the G Suite migration. Now I'm finding myself using Gemini way more. It's really exposed me in a way to some of the fantastic stuff that Google is doing with a lot of your models. And obviously now with our research here, I'm kind of going, oh, okay, I need to be using this for agents more. It seems like there's this major opportunity to have,

Conor Bronsdon 33:44 I'll call it Gemini awareness, expand not only across developers, but across the entire range of folks worldwide who are using all these G Suite products, whether for school, for work, or otherwise.

Logan Kilpatrick 33:56 Yeah, I agree with you. And I think the core of this is we have to earn it by building a great product experience, by building a great model experience. And I think the last few months have seen this incredible amount of momentum and progress that we've made. And I think on the model side, the models are the best they've ever been. We have the strongest offering of Gemini models that we ever have. I think the product story is becoming extremely compelling as well across

Logan Kilpatrick 34:23 the whole suite of Google's offerings, which, yeah, it gets me excited. And again, at the end of the day, it's like: remove friction for users, enable people to get more work done, make sure that they don't have to do the things they don't wanna do. This is just such a win for folks, and to be able to do more stuff, I get jazzed about it.

Conor Bronsdon 34:49 Absolutely. I think it's really cool to see the opportunities to find time for more creative or deep work tasks when you can offload some things to an AI agent or, you know, AI integration to help you through this. And it's just a really exciting future where not only developers, but normal everyday folks are gonna be enabled to just solve more problems with some of these Gemini integrations. And it's really cool to see this vision coming to fruition here. And I, for one, am really excited to see how

Conor Bronsdon 35:17 you and your teams at Google continue to roll this out and kind of what the future looks like. Is there anything you wanna share with us as a cap for this conversation about, in your mind, what that future should look like for developers? What should they be thinking about as they start to interact with Gemini, try out these new models, and really push the boundaries of what's possible?

Logan Kilpatrick 35:41 Yeah, a couple of things. One, on the cost side, I think the big challenge with the age of AI is builders are economically disincentivized from building with AI. Like, the more AI you add into a product experience, the higher your cost is, the lower your margins end up being, etcetera, etcetera. And that's why the cost piece really matters. And I think the future is going to look a lot like probably,

Logan Kilpatrick 36:06 you know, in the near term, probably 100 to 1000x more inference compute that's happening as agents themselves start to actually be the ones who are... Like, right now, with a couple of exceptions, there's almost this one-to-one correlation between a human using some input device and AI inference or tokens being generated. And you imagine the future where that's actually not the case. I have an entire team of agents across all these domains that are helping me live my life and do the things that I do more effectively.

Logan Kilpatrick 36:41 The only way that that world ends up being possible is with models that are at the cost of the Gemini 2.0 Flash models. Like, the products are gonna be really expensive if that's not the case. And I think the world needs the products to be actually reachable by most people if the value is going to be there. I'm happy that that's the direction that we've been pushing in and that we're gonna continue to push in. And

Logan Kilpatrick 37:08 I think the second piece of this, which I'm excited about, and we don't have anything in this space yet, but it's a direction that I'm really jazzed about, is around memory. I think a lot of the right product experience for these agent workflows or these different tools is, again, when you go into the G Suite and you're in Google Docs, I want the model to know, when I kick open a blank Google Doc, that I was just, you know, pinging people about the context of the doc that I'm gonna write and, you know, maybe the framing

Logan Kilpatrick 37:42 of why this is a problem that I wanna try to solve. And all that context needs to be captured and brought with me, not just in the domain of the Google products, but everywhere I go. Everyone should be using tons of different tools, and as I go to other tools, I want those things to have the context that I had as I was just building stuff. And I think that world is not going to happen

Logan Kilpatrick 38:06 instantaneously, but it is the future that we're going towards. And again, you think about the agents themselves. I don't want some generic agent that knows nothing about me, and doesn't actually have any of my context, solving problems on my behalf. And this is actually partially why it doesn't work today. The agents themselves are told, like, you have this task, and you know nothing about who you're doing this task for. You're sort of just this invisible

Logan Kilpatrick 38:33 force behind the scenes. And really, for those tasks to be solved in the way that I think makes users happy, there has to be that level of context. There has to be that level of understanding of the user. And this memory piece, just like humans, where the things that aren't helpful kind of get deprioritized but are still somewhat accessible in certain contexts. And the stuff that's relevant is sort of baked in and helps the model get better over time and the experience get better is just such a cool future.
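
Returning to the cost point earlier in this answer, the "100 to 1000x more inference compute" claim is easy to turn into back-of-the-envelope arithmetic. The sketch below is only an illustration: the request volume, tokens per request, and the price of $0.50 per million tokens are made-up placeholder numbers, not the pricing of any Gemini model.

```python
def monthly_cost_usd(
    requests_per_day: int,
    tokens_per_request: int,
    usd_per_million_tokens: float,
    days: int = 30,
) -> float:
    """Estimate monthly spend when you pay only for tokens processed."""
    total_tokens = requests_per_day * tokens_per_request * days
    return total_tokens / 1_000_000 * usd_per_million_tokens


# Hypothetical baseline: humans trigger every request (roughly one-to-one).
BASE_REQUESTS_PER_DAY = 1_000
TOKENS_PER_REQUEST = 2_000
PRICE = 0.50  # placeholder USD per million tokens, not a real price

baseline = monthly_cost_usd(BASE_REQUESTS_PER_DAY, TOKENS_PER_REQUEST, PRICE)

# Agents multiply the number of requests without adding human users.
scaled = {
    factor: monthly_cost_usd(BASE_REQUESTS_PER_DAY * factor, TOKENS_PER_REQUEST, PRICE)
    for factor in (100, 1000)
}

print(f"baseline: ${baseline:,.2f}/month")
for factor, cost in scaled.items():
    print(f"{factor}x agent traffic: ${cost:,.2f}/month")
```

Because spend grows linearly with token volume, a 1000x jump in agent-driven traffic is a 1000x jump in the bill at a fixed price, which is why the per-token price is the lever Logan keeps coming back to.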

Conor Bronsdon 39:02 It's a really exciting future. And Logan, thank you so much for coming on the show to talk about it and share this vision for Gemini. It was a pleasure to have you with us today. Thanks so much.

Logan Kilpatrick Yeah. This was a ton of fun. Thank you for having me.

Conor Bronsdon Where can folks who are listening go to learn more about your work and the work you're doing at DeepMind and with Gemini?

Logan Kilpatrick 39:20 Yeah. If you wanna try out Gemini, if you're in the consumer world, gemini.google.com. If you're a developer, you can go to aistudio.google.com. I'm on Twitter. I'm on LinkedIn. You can email me. I email all the time. So whatever is easiest, always happy to help on Gemini stuff.

Conor Bronsdon 39:31 Fantastic. We're gonna link all of that in the show notes. Thank you so much for listening, everyone. And thank you again, Logan, for coming on. You can find the new AI agent leaderboard we discussed linked in the show notes or live on galileo.ai.

Conor Bronsdon 39:49 And if you are checking out the show notes, go ahead and hit that subscribe button. Whatever platform you're on, you know, we'd love to have you listening every week. It helps us bring more incredible guests like Logan on the show. And maybe we'll have someone on from the Google research side come sometime soon. So stay tuned for that. And don't forget to check out our Galileo YouTube channel for full episodes,

Conor Bronsdon 40:09 clips of our favorite moments from each guest, webinars, and so much more, like recordings from our events like Productionize. We will see you next week. Thank you so much, Logan. Thank you.