What is the new MediaPipe LLM Inference API for Android phones?

Google's MediaPipe LLM Inference API lets large AI models run directly on Android phones. This makes AI features faster and more private because they don't need the internet. It is currently for research and testing.

How does MediaPipe allow AI models to run on Android phones?

MediaPipe uses special tricks to make AI models smaller and faster for phones. This includes making the models use less data and improving how they work. Developers can use this API to add AI to their apps.

What AI models can run on Android phones with MediaPipe?

Several models like Gemma 3 1B, DeepSeek, and Phi-4 Mini can run on Android phones. Some models might work better on the phone's main processor (CPU) while others might use the graphics chip (GPU).

What do Android developers need to use the MediaPipe LLM Inference API?

Developers need Android Studio and a phone running Android 7.0 or newer. They can use code examples provided by Google to build apps that use these on-device AI models. The app uses modern tools like Jetpack Compose for its look.

Android phones can now run AI models using MediaPipe

Google's 'MediaPipe LLM Inference API' is enabling the operation of Large Language Models (LLMs) directly on Android devices. This development, intended for experimental and research use, allows for on-device AI processing across various platforms, a feat made possible by significant optimizations within the on-device stack. These include new operational components, quantization techniques, caching mechanisms, and weight sharing. The API supports model weights compatible with its architecture, broadening the scope of usable models.

MediaPipe LLM Inference Android Example - GitHub - 1

The 'MediaPipe LLM Inference API' offers a simplified route for integrating LLMs into Android applications, supporting both standard model paths and remote URLs for model weights. Developers can implement this by configuring LlmInferenceOptions, specifying parameters like the model path, maximum tokens, temperature, and optionally a LoRA (Low-Rank Adaptation) path for model fine-tuning. The process involves creating an instance of the LlmInference task using these options and then employing methods such as generateResponse() or generateResponseAsync() to interact with the model.

MediaPipe LLM Inference Android Example - GitHub - 2

The 'MediaPipe Samples' repository provides a functional Android demo application, tested with 'Android Studio Hedgehog'. Building this demo requires a physical Android device running a minimum OS version of 'SDK 24' ('Android 7.0 - Nougat') with developer mode activated. The application itself follows a modern Android architecture, utilizing 'Jetpack Compose' for its user interface and the 'MVVM (Model-View-ViewModel)' pattern for state management.

MediaPipe LLM Inference Android Example - GitHub - 3

Key constants and methods are defined within the InferenceModel class, which manages the core LLM operations. These include MAX_TOKENS for response length limits and DECODE_TOKEN_OFFSET to ensure sufficient response capacity. Core methods include createEngine() for loading models, createSession() for setting up inference parameters, generateResponseAsync() for processing prompts, and estimateTokensRemaining() for calculating available context.

MediaPipe LLM Inference Android Example - GitHub - 4

The Model enum within the repository outlines available LLM options, such as 'Gemma 3 1B', 'DeepSeek', and 'Phi-4 Mini', specifying their intended hardware backend (CPU or GPU) and any authentication requirements. Some models, like 'Gemma', necessitate authentication with 'Hugging Face' for model downloads. The system handles various UI states, with specific implementations for different models, ensuring proper prompt formatting and state management during interactions. The application navigation is managed through 'Jetpack Navigation', featuring distinct screens for loading processes and user interaction.

The integration extends to advanced use cases, including the adaptation of MediaPipe demos for 'Kotlin Multiplatform' projects. Specific tests have highlighted performance differences between CPU and GPU versions of models like 'Gemma 2B', with the CPU version sometimes providing more reasonable outputs. The framework's architecture supports customization, such as adapting resource loading methods from R to Res for 'Compose Multiplatform'.

The 'MediaPipe LLM Inference API' is presented as a straightforward method to incorporate LLMs onto devices, with optimizations aiming to enhance performance. It's noted that the framework has been evolving, with guidance available for converting other models and running LLM inference with LoRA adapters. The project is under active development, with releases and examples continually updated.

Android phones can now run AI models using MediaPipe

Frequently Asked Questions

NewsRadar

The Present

Search Records

Explore

Android phones can now run AI models using MediaPipe

Frequently Asked Questions

Know What Changed

Why AI Agents Forget Data in May 2026 and How Engineers Fix It

iOS 26.5 Update Causes iPhone Battery Drain May 2026

What are Copilot+ PCs and how do they differ from AI PCs in 2026?

Android iPhone File Transfer Now Uses QR Codes for Easy Sharing

Quantum Stocks May Rise 20% by End of 2026

iOS 27: Apple Camera App Gets New Custom Controls and Siri Mode

NewsRadar

The Present

Search Records

Explore