Google's LiteRT Replaces TensorFlow Lite, Promising Faster Mobile AI

Google's new LiteRT framework claims mobile AI inference up to 1.4x faster than the old TensorFlow Lite, thanks to its MLDrift GPU acceleration layer.

Google has effectively bifurcated its machine learning stack, releasing TensorFlow 2.21 as a maintenance-focused production tool while shifting its mobile and edge investment to a rebranded runtime called LiteRT. This new framework, the successor to TensorFlow Lite, introduces MLDrift, a GPU acceleration layer that claims a 1.4x speed increase over the previous delegates. The update prioritizes hardware-level access to Qualcomm and MediaTek Neural Processing Units (NPUs), attempting to bypass the fragmentation of vendor-specific SDKs.


"While TensorFlow continues to provide stability for production, we recommend exploring our latest updates for Keras 3, JAX, and PyTorch for new work in Generative AI." — Google Developers Blog

Hardware Access and Performance Shifts

The LiteRT framework is no longer strictly tethered to the TensorFlow ecosystem. It now acts as a cross-platform target for models authored in PyTorch and JAX, using new conversion libraries to pull external weights into the .tflite flatbuffer format.
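To make the .tflite flatbuffer pipeline concrete, here is a minimal sketch that exports a trivial model to that format and runs it with the interpreter API that LiteRT inherits from TensorFlow Lite. It assumes the `tensorflow` package is installed; LiteRT's standalone `ai_edge_litert` package exposes an equivalent `Interpreter` class.

```python
# Sketch: export a trivial function to the .tflite flatbuffer format,
# then run it through the TFLite/LiteRT interpreter API.
import numpy as np
import tensorflow as tf

# A trivial "model": doubles its input tensor.
@tf.function(input_signature=[tf.TensorSpec(shape=[1, 4], dtype=tf.float32)])
def double(x):
    return x * 2.0

# Lower the traced function into the .tflite flatbuffer.
converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [double.get_concrete_function()])
tflite_bytes = converter.convert()

# Load and invoke the flatbuffer with the interpreter.
interp = tf.lite.Interpreter(model_content=tflite_bytes)
interp.allocate_tensors()
inp = interp.get_input_details()[0]
out = interp.get_output_details()[0]
interp.set_tensor(inp["index"], np.ones((1, 4), dtype=np.float32))
interp.invoke()
result = interp.get_tensor(out["index"])
print(result)
```

On devices, the same flatbuffer is what GPU delegates (and now MLDrift) accelerate; the interpreter API stays identical.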

  • The LiteRT Torch library allows direct conversion from PyTorch.

  • MLDrift replaces older GPU delegates with a tensor-based data organization that uses asynchronous execution to reduce CPU idling.

  • Early access for NPU support focuses on abstraction, meaning a single API call now targets diverse silicon from different phone chip makers without requiring manual driver-level tuning.

Comparing the Runtimes

The transition from TFLite to LiteRT marks a shift from general-purpose on-device inference to purpose-built Generative AI orchestration.

| Feature | Old (TFLite) | New (LiteRT) |
| --- | --- | --- |
| GPU engine | Standard delegate | MLDrift (1.4x faster) |
| NPU support | Vendor SDKs required | Unified NPU API (Qualcomm/MediaTek) |
| PyTorch flow | Complex/indirect | LiteRT Torch converter (direct) |
| Large models | Not optimized | LiteRT-LM orchestration |

The Decay of the TensorFlow Brand

The release of TensorFlow 2.21 signals a narrow focus on "stability" rather than growth. Google's internal commitments for TF are now limited to a specific list of legacy modules: tf.data, TensorFlow Serving, TFX, and TensorBoard. By explicitly pointing developers toward JAX and PyTorch for Generative AI, the organization acknowledges the industry's departure from the original TensorFlow graph architecture.



Deployment of Small Language Models

Parallel to the runtime changes, Google is pushing Gemma 3 and Gemma 3n models into the LiteRT ecosystem.

  • These models use zero-copy buffer interoperability to move data between the NPU and GPU without wasting cycles on memory duplication.

  • The LiteRT-LM library is a new layer designed to handle the "kv-cache" and memory-bound issues inherent in running LLMs on phones.

  • For developers, the AI Edge Portal provides a central point for benchmarking these models across different mobile hardware.

Background: Why LiteRT?

The rebranding follows years of "TensorFlow" becoming a synonym for rigid, complex graph structures. As Meta's PyTorch became the default for research, Google's "LiteRT" (Lite Runtime) represents a pragmatic move to preserve its mobile dominance by detaching the runtime from the struggling parent framework. It is an admission that the runtime matters more than the training library in the age of "Edge AI."


Frequently Asked Questions

Q: Why did Google change the name from TensorFlow Lite to LiteRT?
Google renamed TensorFlow Lite to LiteRT to focus on mobile and edge AI. This new framework is separate from the main TensorFlow for production use.
Q: What is the new MLDrift layer in LiteRT and why is it important?
MLDrift is a new GPU acceleration layer in LiteRT. It is 1.4x faster than older methods and helps speed up AI tasks on mobile phones.
Q: Can LiteRT use AI models made with PyTorch or JAX?
Yes, LiteRT can now use AI models from PyTorch and JAX. New tools help convert these models into the format LiteRT uses, making it more flexible.
Q: How does LiteRT improve NPU support on phones?
LiteRT offers a single API to work with NPUs from different chip makers like Qualcomm and MediaTek. This makes it easier for developers to use the hardware without special tuning.
Q: What is Google saying about using TensorFlow for new Generative AI work?
Google suggests using Keras 3, JAX, and PyTorch for new Generative AI projects. TensorFlow 2.21 is now mainly for stability and older systems.
Q: How does LiteRT help run large AI models like Gemma on phones?
LiteRT has a new LiteRT-LM library to handle large language models on phones. It uses special techniques to manage memory-bound structures like the KV cache and to speed up generation.