What is the difference between batch prediction and online prediction?

Batch prediction is asynchronous and used for many predictions at once. Online prediction is synchronous and used when an application needs quick prediction responses.

What does model monitoring check?

Model monitoring checks whether production data changes over time, including training-serving skew, feature drift, input changes, and other signals that may affect model quality.

Why do ML models need monitoring after deployment?

Models need monitoring because real-world data changes. A model that performed well during training can become less reliable when user behavior, source systems, or business conditions change.

Vertex AI Prediction and Model Monitoring Guide

Training a model is only one part of machine learning. After a model is trained and validated, it must serve predictions. Then the team must monitor whether the model continues to behave well in production.

This guide explains batch prediction, online prediction, and model monitoring in simple terms.

Quick Answer

Use batch prediction when you need many predictions at once and do not need an instant response. Use online prediction when an application needs a fast prediction through an endpoint. Use model monitoring to detect changes in production data, training-serving skew, drift, and behavior that may reduce model quality.

Key Takeaways

Batch prediction is better for offline or scheduled prediction jobs.
Online prediction is better for real-time applications.
Pre-built containers can simplify prediction serving.
Custom containers are useful when serving needs custom logic.
Model monitoring helps detect skew and drift after launch.
Alert thresholds should match the business risk of the model.

Batch Prediction

Batch prediction is used when many prediction requests can be processed together.

Examples:

score all customers overnight,
predict demand for next week,
classify a large set of documents,
update risk scores in a table,
generate recommendations for many users.

Batch prediction is usually asynchronous. You submit a job and review results when the job finishes.
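
A batch job usually reads its inputs from a file or table. As a minimal sketch, the code below writes records as JSON Lines (one instance per line), a format often used for batch prediction input. The field names (customer_id, tenure_months, monthly_spend) and the helper names are hypothetical, and the actual input format depends on the model and service configuration.

```python
import json
from pathlib import Path


def write_jsonl(instances, path):
    """Write one JSON object per line and return the number of lines written."""
    count = 0
    with Path(path).open("w", encoding="utf-8") as f:
        for instance in instances:
            f.write(json.dumps(instance) + "\n")
            count += 1
    return count


def read_jsonl(path):
    """Read a JSON Lines file back into a list of dicts, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical customer records to score in an overnight job.
customers = [
    {"customer_id": "c-001", "tenure_months": 14, "monthly_spend": 42.5},
    {"customer_id": "c-002", "tenure_months": 3, "monthly_spend": 18.0},
]
```

Calling write_jsonl(customers, "batch_input.jsonl") produces a file that could be uploaded as job input. The results would be read back the same way once the job finishes.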

Online Prediction

Online prediction is used when an application needs a quick response.

Examples:

show a recommendation while a user is on a site,
classify a support request as it arrives,
predict fraud risk during a transaction,
estimate delivery time during checkout.

Online prediction usually uses a deployed model endpoint.
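
The request and response cycle can be sketched as follows. This example assumes a JSON body with an "instances" list and a response with a "predictions" list. The fake_send function is a hypothetical stand-in for the real HTTP call to an endpoint, and the feature names are made up for illustration.

```python
import json
import time


def build_predict_request(instances, parameters=None):
    """Build a JSON request body with an "instances" list and optional "parameters"."""
    body = {"instances": list(instances)}
    if parameters is not None:
        body["parameters"] = parameters
    return json.dumps(body)


def timed_call(send, request_body):
    """Call send(request_body) and return (response, elapsed_milliseconds)."""
    start = time.perf_counter()
    response = send(request_body)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return response, elapsed_ms


def fake_send(request_body):
    """Hypothetical stand-in for an HTTP call to a deployed endpoint."""
    n = len(json.loads(request_body)["instances"])
    return {"predictions": [0.5] * n}


request_body = build_predict_request([{"amount": 120.0, "country": "DE"}])
response, elapsed_ms = timed_call(fake_send, request_body)
```

Measuring elapsed time per call matters here because latency is the main concern for online prediction.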

Batch vs Online Prediction

Question	Batch prediction	Online prediction
Response needed immediately?	No	Yes
Works well for many records?	Yes	Sometimes
Used by applications?	Usually indirectly	Yes
Common pattern	Scheduled job	API endpoint
Main concern	Throughput and cost	Latency and availability

Serving Containers

Vertex AI can use pre-built containers or custom containers for serving.

Pre-built containers are helpful when the model format and framework are supported. They reduce setup work.

Custom containers are useful when:

the model needs custom preprocessing,
the serving logic goes beyond a single model call, such as postprocessing or routing between models,
the framework is not supported by a pre-built container,
the container must handle custom health checks or prediction routes.
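
A custom container is essentially an HTTP server that answers a health check and a prediction route. The sketch below uses only the Python standard library and reads the port and routes from the AIP_HTTP_PORT, AIP_HEALTH_ROUTE, and AIP_PREDICT_ROUTE environment variables that Vertex AI provides to custom containers. The fallback values and the placeholder predict function are assumptions for local testing, not a production server.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

# Fallback values are assumptions for running the sketch locally.
HEALTH_ROUTE = os.environ.get("AIP_HEALTH_ROUTE", "/health")
PREDICT_ROUTE = os.environ.get("AIP_PREDICT_ROUTE", "/predict")
PORT = int(os.environ.get("AIP_HTTP_PORT", "8080"))


def predict(instances):
    """Placeholder model: returns the number of fields in each instance."""
    return [len(instance) for instance in instances]


class Handler(BaseHTTPRequestHandler):
    def _send_json(self, status, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # Health check: report OK when the server is ready to serve.
        if self.path == HEALTH_ROUTE:
            self._send_json(200, {"status": "ok"})
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self):
        if self.path != PREDICT_ROUTE:
            self._send_json(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length) or b"{}")
        self._send_json(200, {"predictions": predict(request.get("instances", []))})

    def log_message(self, format, *args):
        pass  # keep the example quiet


def make_server(port=PORT):
    """Create the server without starting it, so the caller controls its lifetime."""
    return HTTPServer(("0.0.0.0", port), Handler)

# To run the server: make_server().serve_forever()
```

Custom preprocessing would go inside predict, before the model is called.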

What Model Monitoring Checks

Model monitoring helps track whether production behavior changes.

Important signals:

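training-serving skew (production feature distributions differ from the training data),
feature drift (production feature distributions change over time),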
prediction distribution changes,
missing input values,
unusual input categories,
data quality issues,
prediction volume changes,
latency and error rate.
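
Skew and drift are usually measured by comparing two feature distributions with a distance score. Two measures commonly used for this are L-infinity distance (for categorical features) and Jensen-Shannon divergence (often applied to binned numerical features). The sketch below is an independent illustration of both measures, not the monitoring service's implementation. The feature values are made up, and the functions assume non-empty inputs.

```python
import math
from collections import Counter


def category_distribution(values):
    """Return {category: fraction of values} for a non-empty list of values."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}


def l_infinity_distance(p, q):
    """Largest absolute difference in probability across all categories."""
    return max(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def jensen_shannon_divergence(p, q):
    """JS divergence in base 2, so the result lies between 0 and 1."""
    def kl(a, m):
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    keys = set(p) | set(q)
    p_full = {k: p.get(k, 0.0) for k in keys}
    q_full = {k: q.get(k, 0.0) for k in keys}
    m = {k: 0.5 * (p_full[k] + q_full[k]) for k in keys}
    return 0.5 * kl(p_full, m) + 0.5 * kl(q_full, m)


# Hypothetical "channel" feature: training data vs. recent serving traffic.
training = category_distribution(["web"] * 70 + ["app"] * 30)
serving = category_distribution(["web"] * 40 + ["app"] * 50 + ["kiosk"] * 10)
skew_score = l_infinity_distance(training, serving)  # about 0.3, driven by "web"
```

Comparing serving data against training data gives a skew score. Comparing two serving windows against each other gives a drift score.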

Alert Thresholds

Alert thresholds should not be copied from another model or left at default values without review. The right values depend on the use case.

A model used for marketing recommendations may tolerate more drift than a model used for risk review or safety-sensitive decisions.

Set thresholds based on:

business impact,
data volatility,
model importance,
review capacity,
past monitoring results,
acceptable false alarms.
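
Per-feature thresholds can be sketched as a simple lookup with a fallback. The feature names, scores, threshold values, and the default of 0.3 below are all hypothetical and chosen only to show the mechanics.

```python
def features_over_threshold(scores, thresholds, default_threshold=0.3):
    """Return (feature, score, limit) for each feature whose score exceeds its limit."""
    alerts = []
    for feature, score in scores.items():
        limit = thresholds.get(feature, default_threshold)
        if score > limit:
            alerts.append((feature, score, limit))
    return sorted(alerts)


# Hypothetical thresholds: tighter for a feature that drives a risk decision.
thresholds = {"transaction_amount": 0.05, "device_type": 0.2}
scores = {"transaction_amount": 0.08, "device_type": 0.12, "country": 0.31}
alerts = features_over_threshold(scores, thresholds)
```

Here transaction_amount alerts at a small score because its threshold is tight, device_type stays quiet, and country falls back to the default threshold.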

Practical Monitoring Workflow

Define the model owner.
Decide what features should be monitored.
Set initial thresholds.
Capture serving inputs and prediction outputs.
Review alerts regularly.
Investigate drift or skew.
Retrain, roll back, or update preprocessing when needed.
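
The capture and investigation steps above can be sketched with plain log records. This is a minimal example under stated assumptions: the record layout, the feature names (plan, age), and the helper names are hypothetical, and a real deployment would write records to durable storage rather than a list.

```python
import json
from datetime import datetime, timezone


def make_log_record(instance, prediction, model_version):
    """Bundle one request and its prediction into a JSON-serializable record."""
    return {
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "instance": instance,
        "prediction": prediction,
    }


def missing_rate(records, feature):
    """Fraction of logged instances where the feature is absent or None."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if r["instance"].get(feature) is None)
    return missing / len(records)


def unseen_categories(records, feature, known_values):
    """Values of a categorical feature that never appeared in training."""
    seen = {r["instance"].get(feature) for r in records}
    return sorted(v for v in seen if v is not None and v not in known_values)


log = [
    make_log_record({"plan": "basic", "age": 31}, 0.12, "v3"),
    make_log_record({"plan": "trial", "age": None}, 0.57, "v3"),
]
serialized = [json.dumps(record) for record in log]  # one line per record
```

With these records, missing_rate(log, "age") and unseen_categories(log, "plan", {"basic", "pro"}) give quick checks for missing inputs and unusual categories before deciding whether to retrain.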

Common Mistakes

deploying without monitoring,
ignoring input data changes,
using online prediction when batch would be simpler,
setting alert thresholds too tight or too loose,
not assigning alert ownership,
monitoring technical metrics but not business impact.

Bottom Line

Prediction makes the model useful, but monitoring keeps it trustworthy. Choose batch or online prediction based on the workflow, then monitor production data and behavior so the team knows when the model needs attention.