Chapter 8: Ragged, Sparse, and String Tensors
“Not all data fits in neat boxes. TensorFlow still makes it work.”
8.1 What Are Non-Standard Tensors?
Not all data comes in a clean matrix shape like [batch_size, features]. Real-world examples often include:
- Sentences of different lengths (NLP)
- Feature vectors with missing values
- Text tokens, file paths, categorical strings
Enter three specialized tensor types:
- tf.RaggedTensor
- tf.SparseTensor
- tf.Tensor (with dtype=tf.string)
8.2 Ragged Tensors – For Variable-Length Sequences
Use Case:
Think of sentences with different numbers of words:
sentences = [
    ["Hello", "GPT-san"],
    ["TensorFlow"],
    ["Welcome", "to", "deep", "learning"]
]
✅ Code:
import tensorflow as tf

# Each row can have a different number of elements
rt = tf.ragged.constant([
    [1, 2, 3],
    [4, 5],
    [6]
])
print(rt)                  # <tf.RaggedTensor [[1, 2, 3], [4, 5], [6]]>
print("Shape:", rt.shape)  # (3, None): the ragged dimension is reported as None
Key Features:
- .ragged_rank reports how many dimensions of the tensor are ragged
- Many standard ops still work, including indexing and slicing (see the sketch below)
- Great for tokenized text or nested lists
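A minimal sketch of these features in action, reusing the sentences list from the use case above and the rt tensor from the code section (exact printed representations may differ slightly between TensorFlow versions):

tokens = tf.ragged.constant(sentences)  # ragged tensor of strings
print(tokens.ragged_rank)               # 1: only the innermost dimension is ragged

print(rt[0])                          # first row: tf.Tensor([1 2 3], ...)
print(rt[:, :2])                      # slicing: at most two elements per row
print(rt.to_tensor(default_value=0))  # pad rows with zeros into a dense tensor

to_tensor is the usual bridge back to rectangular ops (say, feeding a Dense layer), at the cost of padding.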
8.3 Sparse Tensors – For Efficiency in Mostly-Zero Data
Use Case:
When most values in a tensor are zero, storing all of them is wasteful. Use tf.SparseTensor to store just the non-zeros.
✅ Code:
# Only the non-zero entries are stored explicitly
st = tf.sparse.SparseTensor(
    indices=[[0, 1], [1, 0]],  # positions of the non-zero values
    values=[10, 20],           # the non-zero values themselves
    dense_shape=[3, 3]         # logical shape of the full tensor
)

dense = tf.sparse.to_dense(st)
print(dense)
# [[ 0 10  0]
#  [20  0  0]
#  [ 0  0  0]]
Key Features:
- Saves memory for large, mostly-zero data (e.g. recommender systems, one-hot vectors)
- Converts to and from dense tensors (see the sketch below)
- Used heavily in embedding lookups and graph data
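Building on st and dense from the code above, here is a minimal sketch of the round trip back to sparse, plus sparse-dense matrix multiplication, a workhorse in recommender and graph models (the cast to float32 is an assumption for portability, since the matmul kernels are primarily registered for floating-point types):

st_again = tf.sparse.from_dense(dense)  # back to sparse: only non-zeros survive
print(st_again.values)                  # tf.Tensor([10 20], ...)

w = tf.ones([3, 2])                 # a dense weight matrix
st_float = tf.cast(st, tf.float32)  # cast for the floating-point matmul kernel
print(tf.sparse.sparse_dense_matmul(st_float, w))
# [[10. 10.]
#  [20. 20.]
#  [ 0.  0.]]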
8.4 String Tensors – For Text Data
Use Case:
NLP often starts with raw strings, and TensorFlow supports them natively.
✅ Code:
str_tensor = tf.constant(["Tensor", "Flow", "Rocks"])

print(str_tensor)
print(tf.strings.length(str_tensor))                    # Length in bytes (the default unit)
print(tf.strings.length(str_tensor, unit="UTF8_CHAR"))  # Length in Unicode characters
print(tf.strings.upper(str_tensor))                     # Uppercase conversion
print(tf.strings.join([str_tensor, "!"]))               # Elementwise: adds an exclamation to each
Key Features:
- Native support for Unicode
- Integrates with tf.strings, the TensorFlow Text library (tensorflow_text), and Keras's TextVectorization layer
- The natural first step before tokenization (see the sketch below)
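These pieces compose naturally: tf.strings.split turns a batch of sentences into a RaggedTensor of tokens, which is exactly where the ragged machinery from Section 8.2 takes over. A minimal sketch (the byte counts assume UTF-8 encoding):

batch = tf.constant(["TensorFlow rocks", "Hello deep learning world"])

tokens = tf.strings.split(batch)  # whitespace split -> RaggedTensor of tokens
print(tokens)

# Unicode-aware lengths: bytes vs. characters
s = tf.constant("héllo")
print(tf.strings.length(s))                    # 6: "é" takes 2 bytes in UTF-8
print(tf.strings.length(s, unit="UTF8_CHAR"))  # 5: Unicode characters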
8.5 Summary
- Ragged tensors store data with uneven lengths (e.g. variable-length sentences).
- Sparse tensors store only non-zero elements—ideal for memory efficiency.
- String tensors let you process natural-language input natively.
- These types unlock real-world workflows where structure is messy or incomplete.
“Not all data fits in neat boxes. TensorFlow still makes it work.”