Chapter 35: TensorFlow Lite & Mobile Deployment¶
“A model isn’t truly intelligent until it fits in your pocket.”
📱 Introduction: Why Deploy on Mobile?¶
TensorFlow Lite (TFLite) is TensorFlow’s lightweight toolchain for running models on mobile and edge devices such as Android, iOS, Raspberry Pi, and microcontrollers. With the rise of real-time AI on phones (face unlock, voice assistants, image classification), deploying efficient models that run locally on the device is now a core part of ML engineering.
In this chapter, you’ll learn how to:
- Convert TensorFlow models to .tflite
- Optimize for performance with quantization
- Deploy and test on Android/iOS
- Run inference in Python or embedded systems
Step 1: Convert a TensorFlow Model to TFLite¶
We’ll begin with a simple image classifier (e.g., a small MNIST model or a pretrained MobileNet).
```python
import tensorflow as tf

# Load or train a model (example: MobileNetV2 pretrained on ImageNet)
model = tf.keras.applications.MobileNetV2(weights="imagenet", input_shape=(224, 224, 3))

# Convert to TFLite
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Save the model
with open("mobilenetv2.tflite", "wb") as f:
    f.write(tflite_model)
```
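If the model already lives on disk as a SavedModel rather than an in-memory Keras object, the converter can load it directly. A minimal sketch, assuming a model was previously exported to a directory named saved_model_dir (the path and file names here are illustrative):

```python
import tensorflow as tf

# Convert a model previously exported with tf.saved_model.save() or model.export().
# "saved_model_dir" is a placeholder path for this example.
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)

# The serialized FlatBuffer is just bytes, so checking its size is trivial.
print(f"Converted model size: {len(tflite_model) / 1024:.1f} KB")
```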
Step 2: Optimize the Model with Quantization¶
Quantization reduces model size and speeds up inference, which matters most on devices with limited memory and compute.
```python
# Dynamic range quantization (fastest + simplest)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_model = converter.convert()

with open("mobilenetv2_quant.tflite", "wb") as f:
    f.write(quantized_model)
```
Other post-training quantization options include:
- Post-training integer quantization
- Full integer quantization, which requires a representative dataset (see the sketch below)
- Float16 quantization for GPUs
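Full integer quantization needs a small calibration step: the converter runs a representative sample of real inputs through the model to estimate activation ranges. A minimal sketch, assuming a hypothetical sample_images array of already-preprocessed images with shape (N, 224, 224, 3):

```python
import numpy as np
import tensorflow as tf

model = tf.keras.applications.MobileNetV2(weights="imagenet", input_shape=(224, 224, 3))
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# Calibration data generator. "sample_images" is a placeholder for ~100 real,
# preprocessed input images; the converter only needs a representative sample.
def representative_dataset():
    for image in sample_images[:100]:
        yield [np.expand_dims(image.astype(np.float32), axis=0)]

converter.representative_dataset = representative_dataset

# Restrict the model to int8 ops and make inputs/outputs integer as well
# (useful for accelerators such as the Coral Edge TPU).
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

int8_model = converter.convert()
with open("mobilenetv2_int8.tflite", "wb") as f:
    f.write(int8_model)
```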
Step 3: Run Inference with TFLite Interpreter (Python)¶
```python
import numpy as np
import tensorflow as tf
from PIL import Image

# Load the TFLite model and allocate its tensors
interpreter = tf.lite.Interpreter(model_path="mobilenetv2_quant.tflite")
interpreter.allocate_tensors()

# Get input/output details
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Prepare image input (MobileNetV2 expects pixels scaled to [-1, 1])
img = Image.open("cat.jpg").convert("RGB").resize((224, 224))
x = np.expand_dims(np.array(img, dtype=np.float32), axis=0)
input_data = tf.keras.applications.mobilenet_v2.preprocess_input(x).astype(np.float32)

# Run inference
interpreter.set_tensor(input_details[0]['index'], input_data)
interpreter.invoke()
output_data = interpreter.get_tensor(output_details[0]['index'])
print("Prediction:", np.argmax(output_data))
```
📲 Step 4: Deploy to Android/iOS¶
You can now take your .tflite model and integrate it into:
- Android Studio, using the TensorFlow Lite Task Library or the Interpreter API
- iOS apps, using Swift or Objective-C
- Flutter apps, using the tflite_flutter plugin
💡 Bonus: Use ML Model Binding in Android Studio for no-code integration of .tflite models with image or text inputs!
Other Deployment Targets¶
TensorFlow Lite isn’t just for phones:
| Platform | Use Case |
|---|---|
| Raspberry Pi | Smart cameras, IoT vision |
| Coral Edge TPU | Hardware-accelerated, ultra-fast inference |
| Microcontrollers | TinyML with TensorFlow Lite for Microcontrollers |
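On a device like the Raspberry Pi you usually skip the full TensorFlow package and install only the interpreter via the tflite-runtime wheel. A minimal sketch, assuming tflite-runtime is installed on the device and the quantized model file has been copied over; the random input is only there to exercise the pipeline, not to produce a meaningful prediction:

```python
import numpy as np
from tflite_runtime.interpreter import Interpreter  # pip install tflite-runtime

interpreter = Interpreter(model_path="mobilenetv2_quant.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Placeholder input; in practice this would be a preprocessed camera frame.
frame = np.random.rand(1, 224, 224, 3).astype(np.float32)
interpreter.set_tensor(input_details[0]["index"], frame)
interpreter.invoke()
print(interpreter.get_tensor(output_details[0]["index"]).shape)
```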
Summary¶
In this chapter, you learned:
- How to convert Keras models into efficient .tflite files
- How to optimize models with quantization
- How to run inference in Python, Android, or iOS
- That TFLite opens the door to AI at the edge: fast, offline, and private
TensorFlow Lite brings ML to environments where connectivity and power are limited, but the need for real-time intelligence is high — from smartwatches and cameras to drones and industrial machines.