
Chapter 23: RNNs & LSTMs

Words are not isolated—they remember what came before. RNNs give models a sense of time.


Words in a sentence form a sequence—and meaning often depends on order. Traditional models like Bag-of-Words or TF-IDF treat words as independent, which limits their ability to capture structure or context.

Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) units were among the first neural architectures designed to remember context over time—a fundamental shift in how machines processed text.

By the end of this chapter, you’ll:

  • Understand how RNNs and LSTMs work
  • Use tf.keras.layers.SimpleRNN and LSTM
  • Train a model on sequential data (e.g., text sentiment)
  • Visualize how memory affects predictions

What Is an RNN?

RNNs process sequences one element at a time, passing hidden state from one time step to the next.

Input:    I    →  love  →  TensorFlow
Hidden:   h0   →  h1    →  h2
Output:   y0   →  y1    →  y2

Each output depends not just on the current input, but also on past context.
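
As a concrete illustration (a minimal sketch, not code from the chapter), tf.keras.layers.SimpleRNN with return_sequences=True exposes the hidden state at every time step; the toy shapes below stand in for a three-word sentence of 4-dimensional embeddings:

import tensorflow as tf

# One sequence of 3 time steps ("I", "love", "TensorFlow"), each a 4-dim embedding.
x = tf.random.normal([1, 3, 4])

# return_sequences=True returns h0, h1, h2 from the diagram instead of just the last state.
rnn = tf.keras.layers.SimpleRNN(units=8, return_sequences=True)
hidden_states = rnn(x)      # shape (1, 3, 8): one hidden state per time step
print(hidden_states.shape)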


The Problem: Vanishing Gradients

RNNs struggle with long-term dependencies: as gradients are propagated back through many time steps, they shrink toward zero (or occasionally explode), so earlier words lose influence as the sequence grows. That’s where LSTMs come in.
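
You can observe the effect directly. The sketch below (illustrative only; the sequence length and layer size are arbitrary choices) compares the gradient of a SimpleRNN's final state with respect to the first and last time steps of a long sequence; the earliest step's gradient is typically much smaller:

import tensorflow as tf

x = tf.random.normal([1, 100, 8])      # one sequence of 100 time steps
rnn = tf.keras.layers.SimpleRNN(32)

with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.reduce_sum(rnn(x))          # scalar summary of the final hidden state

grad = tape.gradient(y, x)             # shape (1, 100, 8)
per_step = tf.norm(grad, axis=-1)[0]   # gradient norm at each time step
print(float(per_step[0]), float(per_step[-1]))  # the first step is usually far smaller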


LSTMs: Memory with Gates

LSTMs introduce internal cell states and gates (forget, input, output) to regulate information flow.

  • Forget Gate: decides what to discard from the previous cell state
  • Input Gate: decides what new information to store in the cell
  • Output Gate: decides what to pass on as the hidden state for the next step

This design helps retain useful context across longer sequences; a schematic single step is sketched below.
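
For intuition, here is a single LSTM step written out in plain TensorFlow (a sketch of the standard equations, not Keras's internal implementation; the stacked parameters W, U, and b are ones you would normally never handle directly):

import tensorflow as tf

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # One time step: x_t is (batch, d_in); h_prev and c_prev are (batch, d_hidden).
    # W, U, b stack the parameters of all four gates (4 * d_hidden units wide).
    z = tf.matmul(x_t, W) + tf.matmul(h_prev, U) + b
    f, i, o, g = tf.split(z, num_or_size_splits=4, axis=-1)
    f = tf.sigmoid(f)            # forget gate: how much of the old cell state to keep
    i = tf.sigmoid(i)            # input gate: how much of the new candidate to write
    o = tf.sigmoid(o)            # output gate: how much of the cell to expose
    g = tf.tanh(g)               # candidate cell content
    c_t = f * c_prev + i * g     # updated cell state
    h_t = o * tf.tanh(c_t)       # new hidden state, passed to the next step
    return h_t, c_t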

Implementing an LSTM in TensorFlow

import tensorflow as tf
from tensorflow.keras.layers import Embedding, LSTM, Dense
from tensorflow.keras.models import Sequential

model = Sequential([
    Embedding(input_dim=10000, output_dim=64),   # map each word index to a 64-dim vector
    LSTM(128),                                   # returns only the final hidden state
    Dense(1, activation='sigmoid')               # probability of positive sentiment
])

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
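
Because the Embedding layer is defined without a fixed input length, the weights are only created when the model first sees data. If you want to inspect output shapes and parameter counts up front, one option (assuming the 200-token sequences prepared below) is to build the model explicitly:

model.build(input_shape=(None, 200))   # (batch size, sequence length)
model.summary()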

Dataset: IMDB Sentiment (Binary)

# Keep only the 10,000 most frequent words; reviews arrive as lists of word indices.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.imdb.load_data(num_words=10000)

# Pad or truncate every review to 200 tokens so batches have a uniform shape.
x_train = tf.keras.preprocessing.sequence.pad_sequences(x_train, maxlen=200)
x_test = tf.keras.preprocessing.sequence.pad_sequences(x_test, maxlen=200)
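
A quick sanity check on the prepared arrays (the standard IMDB split has 25,000 reviews each for training and testing):

print(x_train.shape, x_test.shape)   # (25000, 200) (25000, 200)
print(y_train[:5])                   # binary labels: 0 = negative, 1 = positive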

Train the Model

model.fit(x_train, y_train, epochs=5, batch_size=64, validation_split=0.2)
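
Once training finishes, it is worth checking the held-out test split (a minimal sketch; the metrics follow the compile call above):

loss, acc = model.evaluate(x_test, y_test, batch_size=64)
print(f"Test accuracy: {acc:.3f}")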


Visualizing Memory

To understand what the network has learned, you can extract intermediate states and visualize them, for example as overlays on the input text. As defined above, the LSTM layer returns only its final hidden state, so the snippet below yields a single 128-dimensional vector per review. To see how sentiment builds over time, you need a state at every step; see the sketch after the snippet.

intermediate_model = tf.keras.Model(inputs=model.input, outputs=model.layers[1].output)  # layers[1] is the LSTM
lstm_output = intermediate_model.predict(x_test[:1])  # shape (1, 128): final hidden state for one review
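
To watch how the representation builds token by token, the recurrent layer needs return_sequences=True. One way (a sketch under that assumption; probe is just an illustrative name) is to rebuild the embedding and LSTM layers and copy across the trained weights:

from tensorflow.keras.layers import Embedding, LSTM
from tensorflow.keras.models import Sequential

probe = Sequential([
    Embedding(input_dim=10000, output_dim=64),
    LSTM(128, return_sequences=True),    # expose the hidden state at every time step
])
_ = probe(x_test[:1])                    # call once so the weights are created

# Copy the trained embedding and LSTM weights from the sentiment model.
probe.layers[0].set_weights(model.layers[0].get_weights())
probe.layers[1].set_weights(model.layers[1].get_weights())

states = probe.predict(x_test[:1])       # shape (1, 200, 128): one state per token

Plotting the norm (or a low-dimensional projection) of these states across the 200 positions shows how the review's representation evolves as it is read.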


When to Use RNNs / LSTMs

Use Case                  Recommended Model
Short sequences           RNN
Long-range memory         LSTM or GRU
Streaming data            RNN/LSTM
Parallelization needed    Transformer (next chapter)

GRU: Simpler Alternative

The Gated Recurrent Unit (GRU) is a simplified version of the LSTM:

  • Combines the forget and input gates into a single update gate (and drops the separate cell state)
  • Fewer parameters, so it is faster to train

from tensorflow.keras.layers import GRU

model = Sequential([Embedding(10000, 64), GRU(128), Dense(1, activation='sigmoid')])

Summary

In this chapter, you:

  • Learned how RNNs and LSTMs retain sequential memory
  • Implemented an LSTM for text sentiment classification
  • Understood their advantages and limitations
  • Explored GRU as a lightweight alternative

RNNs and LSTMs were the backbone of NLP before Transformers. They taught us that order and memory matter in language.