Feature engineering can happen before training, inside the model pipeline, or inside a data warehouse. Two practical options are Keras preprocessing layers and BigQuery ML transformations.
This guide explains when to use each approach and what patterns learners should understand first.
Quick Answer
Use Keras preprocessing layers when you want preprocessing packaged with a TensorFlow model. Use BigQuery ML feature engineering when your data already lives in BigQuery and you want SQL-based transformations close to the warehouse.
Key Takeaways
- Keras preprocessing layers can normalize numbers, encode categories, and vectorize text; an Embedding layer can then learn dense representations from the encoded values.
- BigQuery ML can transform features with SQL and built-in ML preprocessing functions.
- The same transformations used during training must also be applied during prediction.
- BigQuery ML TRANSFORM helps keep training and prediction transformations consistent.
- Always compare feature engineering changes against a baseline metric.
Feature Engineering In Keras
Keras preprocessing layers help build models that accept raw or lightly processed data and transform it inside the model pipeline.
Common layers include:
| Layer | Use |
|---|---|
| Normalization | Standardize numeric features |
| StringLookup | Map string categories to indexes |
| CategoryEncoding | Convert categories to encoded vectors |
| TextVectorization | Convert raw text to tokenized representations |
| Embedding | Learn dense representations for categories or tokens |
Keras Workflow
A practical workflow:
- Create a training dataset.
- Identify numeric, categorical, and text columns.
- Build preprocessing layers for each feature type.
- Adapt layers on training data where needed.
- Connect preprocessing outputs to the model.
- Train and evaluate.
- Export the model with preprocessing included.
This reduces the chance that training and serving use different transformations.
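The workflow above can be sketched end to end for a single numeric feature. This is a minimal illustration rather than a reference implementation: the tiny arrays, the names `train_distance` and `train_fare`, and the one-layer model are all made up for the example.

```python
import numpy as np
import tensorflow as tf

# Hypothetical training data: one numeric feature and one label.
train_distance = np.array([[1.0], [2.0], [3.0], [4.0]], dtype="float32")
train_fare = np.array([[5.0], [7.0], [9.0], [11.0]], dtype="float32")

# Adapt the preprocessing layer on training data only.
normalizer = tf.keras.layers.Normalization()
normalizer.adapt(train_distance)

# Connect the preprocessing layer to the model so raw values go in directly.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    normalizer,
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(train_distance, train_fare, epochs=5, verbose=0)

# At prediction time the caller passes a raw distance;
# normalization happens inside the model.
raw_request = np.array([[2.5]], dtype="float32")
prediction = model.predict(raw_request, verbose=0)
print(prediction.shape)
```

Because the `Normalization` layer is part of `model`, saving the model (for example with `model.save(...)`) should carry the adapted statistics along with the trained weights, so the caller does not need to re-implement the scaling.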
Example Keras Patterns
For numeric features:
normalizer = tf.keras.layers.Normalization()
normalizer.adapt(train_numeric_values)
For string categories:
lookup = tf.keras.layers.StringLookup(output_mode="one_hot")
lookup.adapt(train_category_values)
For text:
vectorizer = tf.keras.layers.TextVectorization(max_tokens=10000)
vectorizer.adapt(train_text_values)
The exact implementation depends on the dataset, but the concept is the same: convert raw values into model-ready tensors.
Feature Engineering In BigQuery ML
BigQuery ML is useful when the training data already lives in BigQuery and the team wants feature engineering in SQL.
Useful patterns include:
- extracting dates and times,
- calculating distances,
- creating ratios,
- bucketizing numeric values,
- crossing categorical features,
- expanding polynomial features,
- filtering bad training examples,
- using TRANSFORM for repeatable preprocessing.
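To show what a few of these patterns compute, here is a plain-Python sketch of date extraction, a distance calculation, a ratio, and a filter for bad rows. It is an illustration only: the record layout and every field name (`pickup_datetime`, `trip_minutes`, and so on) are assumptions, and in BigQuery the same ideas would be written in SQL, for instance with `EXTRACT` for date parts and `ST_DISTANCE` for geographic distance.

```python
import math
from datetime import datetime


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def is_valid(row):
    """Filter out obviously bad examples before deriving features."""
    return row["trip_minutes"] > 0 and row["passenger_count"] > 0


def engineer_features(row):
    """Derive date, distance, and ratio features from one hypothetical record."""
    pickup = datetime.fromisoformat(row["pickup_datetime"])
    distance_km = haversine_km(row["pickup_lat"], row["pickup_lon"],
                               row["dropoff_lat"], row["dropoff_lon"])
    return {
        "pickup_hour": pickup.hour,
        "pickup_weekday": pickup.isoweekday(),  # 1 = Monday ... 7 = Sunday
        "distance_km": distance_km,
        "km_per_minute": distance_km / row["trip_minutes"],  # a simple ratio
    }


# Made-up record used only to exercise the functions.
example_row = {
    "pickup_datetime": "2024-03-04T08:30:00",
    "pickup_lat": 0.0, "pickup_lon": 0.0,
    "dropoff_lat": 0.0, "dropoff_lon": 1.0,
    "trip_minutes": 20.0,
    "passenger_count": 1,
}
features = engineer_features(example_row) if is_valid(example_row) else None
print(features)
```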
BigQuery ML Feature Functions
| Function | Use |
|---|---|
| ML.FEATURE_CROSS | Combine categorical features |
| ML.BUCKETIZE | Convert numeric values into buckets |
| ML.POLYNOMIAL_EXPAND | Create polynomial combinations |
| ML.NGRAMS | Create text n-grams |
| ML.STANDARD_SCALER | Standardize values |
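To make the ideas behind bucketizing, crossing, and polynomial expansion concrete, here is a small plain-Python sketch. It does not call BigQuery and is not meant to reproduce the exact output of the functions in the table: the helper names, the `bin_N` label format, the boundary handling, and the key naming are all assumptions chosen for illustration.

```python
from bisect import bisect_right
from itertools import combinations_with_replacement


def bucketize(value, split_points):
    """Return a bucket label for value; split_points must be sorted ascending."""
    return f"bin_{bisect_right(split_points, value) + 1}"


def feature_cross(*values):
    """Combine several categorical values into one crossed category."""
    return "_".join(str(v) for v in values)


def polynomial_expand(features, degree=2):
    """Products of all feature combinations up to the given degree."""
    expanded = {}
    names = sorted(features)
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(names, d):
            product = 1.0
            for name in combo:
                product *= features[name]
            expanded["_".join(combo)] = product
    return expanded


print(bucketize(7, [1, 5, 10]))
print(feature_cross("Mon", 8))
print(polynomial_expand({"x": 2.0, "y": 3.0}))
```

A cross such as day of week with hour lets a linear model learn a separate weight for each combination instead of one weight per day and one per hour.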
Why TRANSFORM Matters
The TRANSFORM clause can define feature transformations as part of the model. That helps ensure the same logic is used when training and when calling ML.PREDICT.
This reduces training-serving skew, which happens when the model sees features one way during training and a different way during prediction.
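A rough sketch of how this might look in practice: the Python helpers below only assemble SQL strings and never contact BigQuery. The model name, table names, column names, and split points are hypothetical, and the choice of `linear_reg` is just one possible model type.

```python
def build_training_sql(model_name, source_table):
    """SQL that trains a model with its feature logic inside TRANSFORM."""
    return f"""
CREATE OR REPLACE MODEL `{model_name}`
TRANSFORM(
  ML.BUCKETIZE(trip_distance, [1, 5, 10]) AS distance_bucket,
  EXTRACT(HOUR FROM pickup_datetime) AS pickup_hour,
  passenger_count,
  fare_amount
)
OPTIONS(model_type = 'linear_reg', input_label_cols = ['fare_amount']) AS
SELECT trip_distance, pickup_datetime, passenger_count, fare_amount
FROM `{source_table}`
WHERE fare_amount > 0
""".strip()


def build_prediction_sql(model_name, source_table):
    """SQL that passes raw columns; the model applies TRANSFORM itself."""
    return f"""
SELECT *
FROM ML.PREDICT(
  MODEL `{model_name}`,
  (SELECT trip_distance, pickup_datetime, passenger_count
   FROM `{source_table}`)
)
""".strip()


# Hypothetical dataset, model, and table names.
training_sql = build_training_sql("my_dataset.taxi_fare_model",
                                  "my_dataset.taxi_trips_train")
prediction_sql = build_prediction_sql("my_dataset.taxi_fare_model",
                                      "my_dataset.taxi_trips_new")
print(training_sql)
print(prediction_sql)
```

Note that the prediction query selects only raw columns. The bucketizing and hour extraction are not repeated there, which is the point of keeping them in TRANSFORM.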
Example BigQuery ML Ideas
For a taxi fare model, useful engineered features might include:
- trip distance,
- pickup hour,
- pickup day of week,
- pickup and dropoff location buckets,
- a feature cross involving pickup hour (for example, day of week crossed with hour),
- geographic distance,
- passenger count,
- toll-adjusted fare.
The feature set should be tested against a baseline model using a metric such as RMSE for regression.
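A baseline comparison can be as simple as computing RMSE for both models on the same held-out rows. The sketch below uses made-up fares and predictions purely to show the mechanics; the numbers are not results from any real model.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up evaluation data for illustration.
actual_fares = [10.0, 12.0, 8.0, 15.0]
baseline_predictions = [11.0, 11.0, 11.0, 11.0]    # e.g. before new features
candidate_predictions = [10.5, 11.5, 8.5, 14.0]    # e.g. after new features

baseline_rmse = rmse(actual_fares, baseline_predictions)
candidate_rmse = rmse(actual_fares, candidate_predictions)
improved = candidate_rmse < baseline_rmse
print(f"baseline={baseline_rmse:.3f} candidate={candidate_rmse:.3f} improved={improved}")
```

Both models should be scored on the same evaluation rows; otherwise a lower RMSE may reflect easier data rather than better features.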
Keras vs BigQuery ML
| Decision | Keras | BigQuery ML |
|---|---|---|
| Best for | TensorFlow model pipelines | SQL-first ML workflows |
| Data location | Files, tensors, pipelines | BigQuery tables |
| Preprocessing | Model layers | SQL and ML functions |
| Serving consistency | Export with model | Use TRANSFORM |
| Learner fit | Python/TensorFlow users | SQL/data warehouse users |
FAQ
How can Keras do feature engineering?
Keras can do feature engineering with preprocessing layers such as Normalization, StringLookup, CategoryEncoding, and TextVectorization, often followed by an Embedding layer that learns dense representations.
How can BigQuery ML do feature engineering?
BigQuery ML can do feature engineering with SQL transformations, with preprocessing functions such as ML.FEATURE_CROSS, ML.BUCKETIZE, and ML.POLYNOMIAL_EXPAND, and with the TRANSFORM clause.
Bottom Line
Keras and BigQuery ML both support practical feature engineering. Use Keras when preprocessing belongs inside the model pipeline. Use BigQuery ML when SQL-first feature engineering close to warehouse data is the simplest path.