Google has effectively bifurcated its machine learning stack, releasing TensorFlow 2.21 as a maintenance-focused production tool while shifting its mobile and edge investment to a rebranded runtime called LiteRT. This new framework, the successor to TensorFlow Lite, introduces MLDrift, a GPU acceleration layer that claims a 1.4x speed increase over previous delegates. The update prioritizes hardware-level access to Qualcomm and MediaTek Neural Processing Units (NPUs), attempting to bypass the fragmentation of vendor-specific SDKs.

"While TensorFlow continues to provide stability for production, we recommend exploring our latest updates for Keras 3, JAX, and PyTorch for new work in Generative AI." — Google Developers Blog
Hardware Access and Performance Shifts
The LiteRT framework is no longer strictly tethered to the TensorFlow ecosystem. It now acts as a cross-platform target for models originating in PyTorch and JAX, using new conversion libraries to pull external weights into the .tflite flatbuffer format.

The LiteRT Torch library allows direct conversion from PyTorch.
MLDrift replaces older GPU delegates with a tensor-based data organization that uses asynchronous execution to reduce CPU idling.
Early-access NPU support focuses on abstraction: a single API call now targets diverse silicon from different phone chipmakers without requiring manual driver-level tuning.
Comparing the Runtimes
The transition from TFLite to LiteRT marks a shift from general-purpose mobile math to specific Generative AI orchestration.

| Feature | Old (TFLite) | New (LiteRT) |
|---|---|---|
| GPU Engine | Standard Delegate | MLDrift (1.4x faster) |
| NPU Support | Vendor SDKs required | Unified NPU API (Qualcomm/MediaTek) |
| PyTorch Flow | Complex/Indirect | LiteRT Torch Converter (Direct) |
| Large Models | Not Optimized | LiteRT-LM Orchestration |
The Decay of the TensorFlow Brand
The release of TensorFlow 2.21 signals a narrow focus on "stability" rather than growth. Google’s internal commitments for TensorFlow are now limited to a short list of legacy modules: tf.data, TensorFlow Serving, TFX, and TensorBoard. By explicitly pointing developers toward JAX and PyTorch for Generative AI, the organization acknowledges the industry's departure from the original TensorFlow graph architecture.

Deployment of Small Language Models
Parallel to the runtime changes, Google is pushing Gemma 3 and Gemma 3n models into the LiteRT ecosystem.
These models use zero-copy buffer interoperability to move data between the NPU and GPU without wasting cycles on memory duplication.
The LiteRT-LM library is a new layer designed to handle the "kv-cache" and memory-bound issues inherent in running LLMs on phones.
For developers, the AI Edge Portal provides a central point for benchmarking these models across different mobile hardware.
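The announcement does not show LiteRT's buffer-interop API, but the underlying zero-copy idea — two consumers reading the same memory without duplicating it — can be illustrated with NumPy views, which alias a buffer rather than copying it:

```python
import numpy as np

# One backing buffer, e.g. activations produced by an NPU stage.
activations = np.zeros(1024, dtype=np.float32)

# A "zero-copy" handoff: the view aliases the same memory,
# so no bytes are duplicated for the next (e.g. GPU) stage.
gpu_view = activations.view()
assert gpu_view.base is activations       # shares the same storage
assert not gpu_view.flags["OWNDATA"]      # the view owns no memory

# Writes through either name are immediately visible to both
# consumers, with no synchronizing copy in between.
activations[0] = 3.5
print(gpu_view[0])  # 3.5
```

For multi-gigabyte LLM weights and growing kv-caches, avoiding even one such duplication per inference step is what makes on-device generation viable within a phone's memory budget.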
Background: Why LiteRT?
The rebranding follows years of "TensorFlow" becoming a synonym for rigid, complex graph structures. As Meta's PyTorch became the default for research, Google's "LiteRT" (Lite Runtime) represents a pragmatic move to preserve its mobile dominance by detaching the runtime from the struggling parent framework. It is an admission that the runtime matters more than the training library in the age of "Edge AI."