- Google AI Edge Gallery brings Gemma local AI to macOS, letting Mac users run five Gemma models completely offline.
- The new Gemma 4 12B model is multimodal and designed to run on consumer MacBooks with 16GB of RAM.
- Google also launched AI Edge Eloquent, a free on-device dictation app with polishing and style options for Mac.
- Unlike Ollama or LM Studio, Google’s app currently locks users into its own model lineup — no third-party models allowed.
Google Brings Gemma Local AI to the Mac
Google has quietly made a significant move for privacy-conscious Mac users. The company launched Google AI Edge Gallery for macOS this week, giving Mac owners a polished, first-party way to run Gemma models entirely on their own hardware: no Wi-Fi, no cloud, no data leaving the device. Alongside it come the Gemma 4 12B model and a new on-device dictation app called Google AI Edge Eloquent. It’s a lot to ship at once.
The timing isn’t accidental. There’s been a slow but steady groundswell of interest in running AI models locally. Privacy concerns around cloud-based tools, growing skepticism about how companies handle conversation data, and just the practical appeal of working offline have all pushed more users toward local inference. Google clearly wants a seat at that table — and it wants the experience to feel native, not like something you cobbled together from command-line tools.
Why Anyone Would Want to Run AI Offline in the First Place
For most people, interacting with an AI assistant means sending your words to OpenAI, Anthropic, or Google’s own servers. That works well most of the time, but it has real tradeoffs. Your conversation goes somewhere else. You need a reliable connection. And in enterprise or sensitive personal contexts, that data transit can be a genuine problem.
Local models flip that entirely. They run on your machine’s own CPU and GPU, which on Apple Silicon share a single unified memory pool, so your prompts and responses never leave your laptop. The tradeoff has traditionally been capability: local models are smaller, and smaller has usually meant worse. But the gap has been closing fast. What a 7 or 8 billion parameter model could do two years ago looks very different from what today’s architectures can achieve at similar sizes. Gemma is a direct example of that trend.
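To make the size tradeoff concrete, here is a rough back-of-the-envelope sketch of how much memory a model’s weights alone would occupy at different precisions. The bit widths are illustrative assumptions, not figures Google has published for these models, and the estimate ignores everything else that needs memory at runtime (context cache, activations, the operating system and other apps).

```python
def weight_memory_gib(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory needed for model weights alone, in GiB.

    Assumes every parameter is stored at the same precision, which is a
    simplification; real quantization schemes often mix precisions.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Hypothetical precisions for a 12-billion-parameter model.
for bits in (16, 8, 4):
    print(f"12B parameters at {bits}-bit: ~{weight_memory_gib(12, bits):.1f} GiB")
```

Under these assumptions, a 12-billion-parameter model stored at 16 bits per weight would not fit in 16GB of RAM, while lower-precision storage would leave headroom. That is the general reason smaller or more compactly stored models are the ones that end up on laptops.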
What’s Actually in Google AI Edge Gallery for Mac
The app currently offers five models, all from Google’s own Gemma family. The full list follows; the “it” suffix stands for instruction-tuned, meaning these models are trained to follow instructions rather than simply autocomplete text:
- Gemma-4-12B-it — the headline release, new as of this week
- Gemma-4-E2B-it
- Gemma-4-E4B-it
- Gemma-3n-E2B-it
- Gemma-3n-E4B-it
That walled-garden approach is worth flagging immediately. Platforms like Ollama and LM Studio — the two tools that have dominated the local AI scene for the past couple of years — let you pull in virtually any compatible open-weight model: Llama, Mistral, Qwen, Phi, you name it. Hugging Face hosts thousands of options. Google AI Edge Gallery, by contrast, is Google’s models only, at least for now. That’s a significant constraint, and it’s the kind of decision that’ll either feel like a smart, curated experience or a frustrating limitation depending on who you are.
The Gemma 4 12B Model Is the Real Story Here
The standout piece of today’s launch is undoubtedly the Gemma 4 12B. Google describes it as designed to bring “agentic, multimodal intelligence directly to your laptop,

