Essential insights from Hacker News discussions

Introducing Gemma 3n

The Hacker News discussion about Gemma and Gemini, particularly in the context of on-device AI, reveals several key themes: confusion about the distinction between the two, differences in licensing and accessibility, skepticism about performance claims, and broader commentary on Google's communication practices.

Confusion Between Gemma and Gemini

A primary point of discussion is the lack of clarity regarding the difference between Gemma and Gemini, especially for on-device applications, as both are described as not requiring network access.

  • "I still don't understand the difference between Gemma and Gemini for on-device, since both don't need network access." (wiradikusuma)
  • The user points out that the description of Gemini Nano ("Gemini Nano allows you to deliver rich generative AI experiences without needing a network connection or sending data to the cloud") would still be valid if "Gemini" were replaced with "Gemma," highlighting the perceived overlap. (wiradikusuma)
  • "Perplexity.ai gave an easier to understand response than Gemini 2.5 afaict. Gemini nano is for Android only. Gemma is available for other platforms and has multiple size options. So it seems like Gemini nano might be a very focused Gemma everywhere to follow the biology metaphor instead of the Italian name interpretation" (readthenotes1) This suggests that Gemma is presented as more broadly available, while Gemini Nano is explicitly for Android.
  • "Your reply adds more confusion, imo." (zackangelo) This comment indicates that explanations, even those from Google, are not successfully clarifying the distinction.
  • "I can't figure out what the high-level news is. What does this concretely do that Gemma didn't already do, or what benchmark/tasks did it improve upon?" (lucb1e) This user expresses a fundamental difficulty in understanding the concrete advancements or unique selling points of one product over the other.

Licensing and Accessibility

A significant theme revolves around the licensing terms and how freely users can access and utilize the models, with Gemma generally perceived as more open.

  • "Licensing. You can't use Gemini Nano weights directly (at least commercial ly) and must interact with them through Android MLKit or similar Google approved runtimes. You can use Gemma commercially using whatever runtime or framework you can get to run it." (tyushk) This highlights a critical difference: Gemini Nano requires specific Google-approved runtimes, implying more restrictions, while Gemma offers greater freedom in how it's deployed.
  • "Gemma is open source and apache 2.0 licensed. If you want to include it with an app you have to package it yourself." (jabroni_salad) This statement emphasizes Gemma's open-source nature and the self-packaging requirement for developers.
  • "Closed source but open weight. Let’s not ruin the definition of the term in advantage of big companies." (nicce) This comment pushes back on the "open source" label for Gemma, suggesting that while weights are available, the full training process and codebase might not be, thus preserving commercial advantage for Google despite the "open weight" nature.
  • "The inference code and model architecture IS open source[0] and there are many other high quality open source implementations of the model (in many cases contributed by Google engineers[1]). To your point: they do not publish the data used to train the model so you can't re-create it from scratch." (zackangelo) This elaborates on the "open weight" vs. "open source" debate, noting that the code and architecture are available, but the training data is not, preventing full reproduction.
  • "If for some reason you had the training data, is it even possible to create an exact (possibly same hash?) copy of the model? Seems like there are a lot of other pieces missing like the training harness, hardware it was trained on, etc?" (candiddevmike) This user questions the practical feasibility of perfectly replicating a model even with access to training data, pointing to other variables like training infrastructure.

Performance and Real-World Usage

The discussion touches upon the practical performance of these models, with some users expressing skepticism about Google's claims and others sharing their experiences with local deployments.

  • "I suspect the difference is in the training data. Gemini is much more locked down and if it tries to repeat something from the draining data verbatim you will get a 'recitation error'." (impure) This user speculates that differences in how training data is handled might lead to different behaviors, specifically mentioning a potential "recitation error" for Gemini.
  • "Made some GGUFs if anyone wants to run them! ... I'm also working on an inference + finetuning Colab demo! I'm very impressed since Gemma 3N has audio, text and vision!" (danielhanchen) This demonstrates active community engagement in making the models accessible for local use.
  • "This looks amazing given the parameter sizes and capabilities (audio, visual, text). I like the idea of keeping simple tasks local. I’ll be curious to see if this can be run on an M1 machine…" (turnsout) This user expresses enthusiasm for the model's capabilities and the prospect of running it locally on consumer hardware.
  • "This should run fine on most hardware - CPU inference of the E2B model on my Pixel 8 Pro gives me ~9tok/second of decode speed." (bigyabai) A user shares positive local inference speeds on a mobile device.
  • "Sure it can, easiest way is to get ollama, then ollama run gemma3n" (Fergusonb) This provides a concrete method for users to run Gemma locally.
  • "LM Studio has MLX variants of the model out: ... However it's still 8B parameters and there are no quantized models just yet." (minimaxir) This indicates community efforts to adapt models for different hardware and platforms, noting limitations like the absence of quantized versions.
  • "Anyone have any idea on the viability of running this on a Pi5 16GB? I have a few fun ideas if this can handle working with images (or even video?) well." (Workaccount2) A user explores the potential for running the model on low-power, single-board computers.
  • "The 4-bit quant weighs 4.25 GB and then you need space for the rest of the inference process. So, yeah you can definitely run the model on a Pi, you may have to wait some time for results." (gardnr) This user provides practical advice on running the model on a Raspberry Pi, highlighting the trade-off between model size and inference speed.
  • "I've been playing around with E4B in AI Studio and it has been giving me really great results, much better than what you'd expect from an 8B model. In fact I'm thinking of trying to install it on a VPS so I can have an alternative to pricy APIs." (impure) This user finds Gemma's performance to be surprisingly good for its size and considers using it as a more cost-effective alternative to commercial APIs.
  • "The APK that you linked, runs the inference on CPU and does not run it on Google Tensor." (catchmrbharath) This points out that a specific APK might not be leveraging dedicated hardware (Google Tensor) as expected for on-device performance.
  • "Open weights" (awestroke) This is a concise summary of a perceived key characteristic differentiating Gemma.

Google's Communication and Marketing Practices

Several comments express frustration and skepticism regarding Google's clarity and honesty in communicating about their AI products, with accusations of misleading claims and a lack of transparency.

  • "The fact that you need HN and competitors to explain your offering should make Google reflect…" (ridruejo) This user suggests that Google's own explanations are insufficient, requiring community forums to clarify product distinctions.
  • "The Gemini billing dashboard makes me feel sad and confused." (gardnr) This points to usability issues even with Google's product interfaces.
  • "refulgentis: Somethings really screwy with on-device models from Google, I can't put my finger on what, and I think being ex-Google is screwing with my ability to evaluate: Cherry-picking something that's quick to evaluate: 'High throughput: Processes up to 60 frames per second on a Google Pixel, enabling real-time, on-device video analysis and interactive experiences.' You can download an APK from the official Google project for this, linked from the blogpost: ... If I download it, run it on Pixel Fold, actual 2B model which is half the size of the ones the 60 fps claim is made for, it takes 6.2-7.5 seconds to begin responding (3 samples, 3 diff photos). Generation speed is shown at 4-5 tokens per second, slightly slower than what llama.cpp does on my phone. (I maintain an AI app that inter alia, wraps llama.cpp on all platforms) So, 0.16 frames a second, not 60 fps. The blog post is so jammed up with so many claims re: this is special for on-device and performance that just...seemingly aren't true. At all. - Are they missing a demo APK? - Was there some massive TPU leap since the Pixel Fold release? - Is there a lot of BS in there that they're pretty sure won't be called out in a systematic way, given the amount of effort it takes to get this inferencing? - I used to work on Pixel, and I remember thinking that it seemed like there weren't actually public APIs for the TPU. Is that what's going on? In any case, either: A) I'm missing something, big or B) they are lying, repeatedly, big time, in a way that would be shown near-immediately when you actually tried building on it because it "enables real-time, on-device video analysis and interactive experiences." Everything I've seen the last year or two indicates they are lying, big time, regularly. But if that's the case: - How are they getting away with it, over this length of time? - How come I never see anyone else mention these gaps?" (refulgentis) This is a detailed and scathing critique of Google's performance claims for on-device models, contrasting marketing statements with actual observed performance and questioning transparency around underlying hardware (TPU) and potential exaggerations or outright falsehoods.
  • "Are they missing a demo APK?" (refulgentis) This question reflects a lack of readily available, functional demonstration tools.
  • "Is there a lot of BS in there that they're pretty sure won't be called out in a systematic way, given the amount of effort it takes to get this inferencing?" (refulgentis) This expresses suspicion that misleading claims are made with the expectation that most users won't have the resources or motivation to verify them.
  • "I used to work on Pixel, and I remember thinking that it seemed like there weren't actually public APIs for the TPU." (refulgentis) This points to a potential systemic issue of inaccessible hardware acceleration.
  • "A) I'm missing something, big or B) they are lying, repeatedly, big time, in a way that would be shown near-immediately when you actually tried building on it..." (refulgentis) This stark dichotomy captures the user's significant doubt about Google's honesty.
  • "Everything I've seen the last year or two indicates they are lying, big time, regularly." (refulgentis) This is a strong accusation of consistent misrepresentation from Google.
  • "How are they getting away with it, over this length of time?" (refulgentis) This question highlights the apparent lack of accountability for alleged deceptive practices.
  • "How come I never see anyone else mention these gaps?" (refulgentis) This suggests that the community might be overlooking significant issues or that the user's experience is unique, possibly due to their technical background and access to testing.
  • "This is completely offtopic, but in case your question in genuine: [link to video on pronunciation]" (svat) While strictly off-topic, this interaction can be seen as a microcosm of a broader tendency for discussions to diverge, potentially further obscuring the core product information.