Chapter 36: TensorFlow Extended (TFX) for Production Pipelines¶
“Models aren’t magic — they’re the output of a pipeline that’s reproducible, testable, and scalable.”
Introduction: From Experiment to Deployment¶
Training a model in a notebook is one thing — deploying it at scale in a robust production environment is a whole different story. That’s where TFX (TensorFlow Extended) comes in.
TFX is an end-to-end ML platform for production-grade ML pipelines built around TensorFlow. It handles:
- Data ingestion
- Data validation
- Feature engineering
- Model training
- Model validation
- Deployment
All with pipeline reproducibility, CI/CD, and version control baked in.
Key Components of a TFX Pipeline¶
| Component | Purpose |
|---|---|
| ExampleGen | Ingests and splits raw data (CSV, TFRecord, etc.) |
| StatisticsGen | Computes statistics over data |
| SchemaGen | Infers data schema from stats |
| ExampleValidator | Detects anomalies and missing values |
| Transform | Performs feature engineering |
| Trainer | Trains a model using TensorFlow |
| Evaluator | Measures model quality (blessing or rejection) |
| Pusher | Pushes the model to the serving environment |
Example: Building a TFX Pipeline (Basic)¶
```bash
pip install -q tfx
```
Step 1: Directory Setup¶
```python
import os

PIPELINE_NAME = "sentiment_pipeline"
PIPELINE_ROOT = os.path.join("pipelines", PIPELINE_NAME)
METADATA_PATH = os.path.join("metadata", PIPELINE_NAME, "metadata.db")
```
Step 2: Define Pipeline Components¶
```python
from tfx.components import CsvExampleGen
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext

# Reuse the Step 1 paths so interactive runs share one pipeline root.
context = InteractiveContext(pipeline_name=PIPELINE_NAME, pipeline_root=PIPELINE_ROOT)

example_gen = CsvExampleGen(input_base="data/")  # expects CSV files under data/
context.run(example_gen)
```
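The component table above also lists data-validation components that this basic walkthrough skips. A minimal sketch of how they would slot in between ingestion and training, using the same interactive context and standard TFX wiring:

```python
from tfx.components import StatisticsGen, SchemaGen, ExampleValidator

# Compute descriptive statistics over the ingested examples.
statistics_gen = StatisticsGen(examples=example_gen.outputs['examples'])
context.run(statistics_gen)

# Infer a schema (types, ranges, presence) from those statistics.
schema_gen = SchemaGen(
    statistics=statistics_gen.outputs['statistics'],
    infer_feature_shape=True,
)
context.run(schema_gen)

# Flag anomalies such as missing values or out-of-range features.
example_validator = ExampleValidator(
    statistics=statistics_gen.outputs['statistics'],
    schema=schema_gen.outputs['schema'],
)
context.run(example_validator)
```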
Step 3: Feature Engineering and Model Training¶
```python
from tfx.components import Trainer
from tfx.proto import trainer_pb2

# A Transform component (feature engineering) would slot in here, consuming the
# schema from SchemaGen; this basic example trains directly on the raw examples.
trainer = Trainer(
    module_file='model.py',  # your model logic (run_fn) lives here
    examples=example_gen.outputs['examples'],
    train_args=trainer_pb2.TrainArgs(num_steps=1000),
    eval_args=trainer_pb2.EvalArgs(num_steps=500),
)
context.run(trainer)
```
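The Trainer's `module_file` must provide the training logic TFX invokes; in recent TFX releases that is a `run_fn`. Below is a minimal sketch of what `model.py` might contain, assuming hypothetical `text` and `label` features in the raw data and a simple Keras bag-of-words classifier; treat it as a starting point rather than the one canonical implementation.

```python
# model.py — minimal sketch; 'text' and 'label' are hypothetical feature names.
import tensorflow as tf
from tfx.components.trainer.fn_args_utils import FnArgs  # import path may vary by TFX version

_FEATURES = {
    "text": tf.io.FixedLenFeature([], tf.string),
    "label": tf.io.FixedLenFeature([], tf.int64),
}


def _input_fn(file_pattern, batch_size=32):
    """Reads the gzipped TFRecords that ExampleGen emits into a tf.data pipeline."""
    files = tf.data.Dataset.list_files(file_pattern)
    ds = tf.data.TFRecordDataset(files, compression_type="GZIP")
    ds = ds.map(lambda rec: tf.io.parse_single_example(rec, _FEATURES))
    ds = ds.map(lambda parsed: (parsed["text"], parsed["label"]))
    return ds.batch(batch_size).repeat()


def run_fn(fn_args: FnArgs):
    """Called by the Trainer component: train a model and export a SavedModel."""
    train_ds = _input_fn(fn_args.train_files)
    eval_ds = _input_fn(fn_args.eval_files)

    # Fit the vocabulary on a sample of the training text.
    vectorize = tf.keras.layers.TextVectorization(max_tokens=10_000, output_mode="multi_hot")
    vectorize.adapt(train_ds.map(lambda text, label: text).take(50))

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(), dtype=tf.string),
        vectorize,
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

    model.fit(
        train_ds,
        steps_per_epoch=fn_args.train_steps,
        validation_data=eval_ds,
        validation_steps=fn_args.eval_steps,
    )

    # The Pusher picks up whatever is exported to serving_model_dir.
    model.save(fn_args.serving_model_dir, save_format="tf")
```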
Step 4: Evaluating and Pushing¶
```python
from tfx.components import Evaluator, Pusher
from tfx.proto import pusher_pb2

evaluator = Evaluator(
    examples=example_gen.outputs['examples'],
    model=trainer.outputs['model'],
)
context.run(evaluator)

pusher = Pusher(
    model=trainer.outputs['model'],
    model_blessing=evaluator.outputs['blessing'],  # push only if the model was blessed
    push_destination=pusher_pb2.PushDestination(
        filesystem=pusher_pb2.PushDestination.Filesystem(
            base_directory="serving_model/"
        )
    ),
)
context.run(pusher)
```
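As written above, the Evaluator has no explicit quality bar, so blessing is essentially automatic. To make blessing meaningful, you can pass an `eval_config` with metric thresholds via TensorFlow Model Analysis. A sketch, where the `label` key and the 0.7 accuracy bound are placeholder choices:

```python
import tensorflow_model_analysis as tfma

# Bless the model only if binary accuracy on the eval split clears 0.7.
eval_config = tfma.EvalConfig(
    model_specs=[tfma.ModelSpec(label_key='label')],  # 'label' is a placeholder key
    slicing_specs=[tfma.SlicingSpec()],               # overall (unsliced) metrics
    metrics_specs=[
        tfma.MetricsSpec(metrics=[
            tfma.MetricConfig(
                class_name='BinaryAccuracy',
                threshold=tfma.MetricThreshold(
                    value_threshold=tfma.GenericValueThreshold(lower_bound={'value': 0.7})
                ),
            )
        ])
    ],
)

evaluator = Evaluator(
    examples=example_gen.outputs['examples'],
    model=trainer.outputs['model'],
    eval_config=eval_config,
)
context.run(evaluator)
```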
Optional: Orchestration Tools¶
TFX integrates with orchestrators like:
- Apache Airflow
- Kubeflow Pipelines
- Vertex AI Pipelines (GCP)
- Dagster (community)
This lets you schedule, version, and monitor your pipelines in production environments.
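Outside the notebook, the same components are assembled into a pipeline object and handed to a runner. A minimal local sketch, reusing the Step 1 paths (production setups swap `LocalDagRunner` for the Airflow, Kubeflow, or Vertex AI equivalent):

```python
from tfx.orchestration import metadata, pipeline
from tfx.orchestration.local.local_dag_runner import LocalDagRunner

# Wire the components defined above into a single pipeline definition.
tfx_pipeline = pipeline.Pipeline(
    pipeline_name=PIPELINE_NAME,
    pipeline_root=PIPELINE_ROOT,
    metadata_connection_config=metadata.sqlite_metadata_connection_config(METADATA_PATH),
    components=[example_gen, trainer, evaluator, pusher],
)

# LocalDagRunner executes the DAG on the local machine; managed orchestrators
# provide their own runners with the same interface.
LocalDagRunner().run(tfx_pipeline)
```

In practice the component definitions usually live in a regular Python module, behind a function that builds and returns the pipeline, so the orchestrator can import and schedule it.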
Bonus Features¶
- Model Blessing: Only deploy models that pass thresholds (see the Evaluator `eval_config` sketch in Step 4)
- CI/CD for ML: Automate training, evaluation, and deployment
- ML Metadata Tracking: Reproducibility and lineage (see the query sketch after this list)
- TensorBoard Integration: For monitoring and debugging
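Every `context.run(...)` and pipeline run above is recorded in the SQLite store at `METADATA_PATH`. A rough sketch of querying that lineage with the ML Metadata (MLMD) library (import paths can vary slightly between MLMD releases):

```python
from ml_metadata.metadata_store import metadata_store
from ml_metadata.proto import metadata_store_pb2

# Open the same SQLite store the pipeline wrote to.
connection_config = metadata_store_pb2.ConnectionConfig()
connection_config.sqlite.filename_uri = METADATA_PATH
connection_config.sqlite.connection_mode = (
    metadata_store_pb2.SqliteMetadataSourceConfig.READWRITE_OPENCREATE
)
store = metadata_store.MetadataStore(connection_config)

# List the artifact types and artifacts recorded by the pipeline runs.
for artifact_type in store.get_artifact_types():
    print(artifact_type.name)
for artifact in store.get_artifacts():
    print(artifact.uri, artifact.type_id)
```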
Summary¶
In this chapter, you learned:
- What TensorFlow Extended (TFX) is and why it matters in production
- How to build a simple TFX pipeline: ingest → validate → train → deploy
- How to scale with orchestration and CI/CD tools
- How TFX promotes reliability, observability, and reproducibility in ML workflows
TFX is your bridge between research and reality. It ensures your models are not only accurate, but also trusted, trackable, and repeatable in the messy world of production systems.