Feature engineering can happen before training, inside the model pipeline, or inside a data warehouse. Two practical options are Keras preprocessing layers and BigQuery ML transformations.

This guide explains when to use each approach and what patterns learners should understand first.

Quick Answer

Use Keras preprocessing layers when you want preprocessing packaged with a TensorFlow model. Use BigQuery ML feature engineering when your data already lives in BigQuery and you want SQL-based transformations close to the warehouse.

Key Takeaways

  • Keras preprocessing layers can normalize numbers, encode categories, and vectorize text; the trainable Embedding layer can then learn dense representations from the encoded values.
  • BigQuery ML can transform features with SQL and built-in ML preprocessing functions.
  • The same transformations used during training must also be applied during prediction.
  • The BigQuery ML TRANSFORM clause stores preprocessing with the model, which keeps training and prediction transformations consistent.
  • Always compare feature engineering changes against a baseline metric.

Feature Engineering In Keras

Keras preprocessing layers help build models that accept raw or lightly processed data and transform it inside the model pipeline.

Common layers include:

Layer              Use
Normalization      Standardize numeric features
StringLookup       Map string categories to indexes
CategoryEncoding   Convert categories to encoded vectors
TextVectorization  Convert raw text to tokenized representations
Embedding          Learn dense representations for categories or tokens

Keras Workflow

A practical workflow:

  1. Create a training dataset.
  2. Identify numeric, categorical, and text columns.
  3. Build preprocessing layers for each feature type.
  4. Adapt layers on training data where needed.
  5. Connect preprocessing outputs to the model.
  6. Train and evaluate.
  7. Export the model with preprocessing included.

Because the exported model carries its own preprocessing, this workflow reduces the chance that training and serving use different transformations.
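The workflow above can be sketched end to end on toy data. This is a minimal illustration, not a reference implementation: the column names (distance, zone), the values, the embedding size, and the two-epoch training run are all assumptions made for the example.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy training data: one numeric column, one string column, one label.
distance = np.array([[1.0], [2.0], [3.0], [4.0]], dtype="float32")
zone = np.array([["north"], ["south"], ["north"], ["east"]])
fare = np.array([[5.0], [9.0], [13.0], [17.0]], dtype="float32")

# Steps 3-4: build the preprocessing layers and adapt them on training data only.
normalizer = tf.keras.layers.Normalization(axis=None)
normalizer.adapt(distance)
lookup = tf.keras.layers.StringLookup()  # default output is integer indexes
lookup.adapt(zone)

# Step 5: connect preprocessing outputs to the rest of the model.
inputs = {
    "distance": tf.keras.Input(shape=(1,), name="distance"),
    "zone": tf.keras.Input(shape=(1,), dtype="string", name="zone"),
}
zone_vec = tf.keras.layers.Embedding(lookup.vocabulary_size(), 2)(lookup(inputs["zone"]))
zone_vec = tf.keras.layers.Flatten()(zone_vec)
features = tf.keras.layers.Concatenate()([normalizer(inputs["distance"]), zone_vec])
model = tf.keras.Model(inputs=inputs, outputs=tf.keras.layers.Dense(1)(features))

# Step 6: train on raw values; preprocessing runs inside the model.
train_ds = tf.data.Dataset.from_tensor_slices(
    ({"distance": distance, "zone": zone}, fare)
).batch(2)
model.compile(optimizer="adam", loss="mse")
model.fit(train_ds, epochs=2, verbose=0)

# Raw values go straight in at prediction time, including an unseen category.
raw_example = {"distance": tf.constant([[2.5]]), "zone": tf.constant([["west"]])}
prediction = model(raw_example)
```

Saving or exporting this model (step 7) would carry the adapted Normalization and StringLookup state along with the weights, so a caller only needs to supply raw distance and zone values.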

Example Keras Patterns

For numeric features:

import tensorflow as tf
normalizer = tf.keras.layers.Normalization()
# adapt() computes the mean and variance of the training values
normalizer.adapt(train_numeric_values)

For string categories:

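# one_hot output reserves a slot for out-of-vocabulary categories
lookup = tf.keras.layers.StringLookup(output_mode="one_hot")
lookup.adapt(train_category_values)  # builds the vocabulary from training data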

For text:

# max_tokens caps the vocabulary size
vectorizer = tf.keras.layers.TextVectorization(max_tokens=10000)
vectorizer.adapt(train_text_values)  # builds the token vocabulary from training data

The exact implementation depends on the dataset, but the concept is the same: convert raw values into model-ready tensors.
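To make "model-ready tensors" concrete, here is a small sketch of what these three layers produce on toy inputs. The sample values are invented for illustration, and the numeric layer uses axis=None (a single scalar feature), which is an assumption not made in the snippets above.

```python
import numpy as np
import tensorflow as tf

# Numeric: adapt() learns mean and variance, then the layer standardizes inputs.
normalizer = tf.keras.layers.Normalization(axis=None)
normalizer.adapt(np.array([10.0, 20.0, 30.0], dtype="float32"))
scaled = normalizer(np.array([10.0, 20.0, 30.0], dtype="float32")).numpy()

# Categories: adapt() builds a vocabulary; index 0 is reserved for unseen values.
lookup = tf.keras.layers.StringLookup()
lookup.adapt(np.array(["cash", "card", "card"]))
indexes = lookup(tf.constant(["card", "bitcoin"])).numpy()

# Text: adapt() builds a token vocabulary; each string becomes integer token ids.
vectorizer = tf.keras.layers.TextVectorization(max_tokens=10000)
vectorizer.adapt(np.array(["the cab was late", "the cab was clean"]))
tokens = vectorizer(tf.constant(["the cab"])).numpy()
```

On this toy data, scaled is roughly [-1.22, 0, 1.22], and indexes is [1, 0]: "card" is the most frequent category, and the unseen "bitcoin" falls into the out-of-vocabulary slot.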

Feature Engineering In BigQuery ML

BigQuery ML is useful when the training data already lives in BigQuery and the team wants feature engineering in SQL.

Useful patterns include:

  • extracting dates and times,
  • calculating distances,
  • creating ratios,
  • bucketizing numeric values,
  • crossing categorical features,
  • expanding polynomial features,
  • filtering bad training examples,
  • using TRANSFORM for repeatable preprocessing.
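Two of these patterns, filtering bad training examples and creating ratios, can be sketched outside SQL to show the intent. The snippet is plain Python rather than BigQuery ML, and the row fields (distance_km, duration_min, passengers) and the thresholds are hypothetical.

```python
def is_valid_trip(row):
    """Filter rule for bad training examples (thresholds are illustrative)."""
    return (
        row["distance_km"] > 0
        and row["duration_min"] > 0
        and 1 <= row["passengers"] <= 6
    )


def add_ratio(row):
    """Add a ratio feature: minutes travelled per kilometre."""
    return {**row, "min_per_km": row["duration_min"] / row["distance_km"]}


raw_rows = [
    {"distance_km": 5.0, "duration_min": 15.0, "passengers": 1},
    {"distance_km": 0.0, "duration_min": 3.0, "passengers": 1},   # zero distance: dropped
    {"distance_km": 2.0, "duration_min": 10.0, "passengers": 0},  # no passengers: dropped
]

# Filter first, so the ratio never divides by zero.
clean_rows = [add_ratio(r) for r in raw_rows if is_valid_trip(r)]
```

In SQL the same two steps would typically be a WHERE clause and a computed column.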

BigQuery ML Feature Functions

Function              Use
ML.FEATURE_CROSS      Combine categorical features
ML.BUCKETIZE          Convert numeric values into buckets
ML.POLYNOMIAL_EXPAND  Create polynomial combinations
ML.NGRAMS             Create text n-grams
ML.STANDARD_SCALER    Standardize values
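The ideas behind three of these functions can be illustrated in plain Python. The helpers below (bucketize, feature_cross, polynomial_expand) are hypothetical stand-ins written for this guide; their output labels and boundary handling are assumptions and may not match what the BigQuery ML functions return.

```python
from bisect import bisect_right
from itertools import combinations_with_replacement


def bucketize(value, split_points):
    """Return a bucket label for value, given sorted split points."""
    return f"bin_{bisect_right(split_points, value) + 1}"


def feature_cross(*categories):
    """Combine several categorical values into one crossed category."""
    return "_".join(str(c) for c in categories)


def polynomial_expand(values, degree=2):
    """Return products of all feature combinations up to the given degree."""
    expanded = []
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(range(len(values)), d):
            product = 1.0
            for i in combo:
                product *= values[i]
            expanded.append(product)
    return expanded


distance_bucket = bucketize(2.5, [1, 2, 3])
day_hour = feature_cross("Mon", 8)
poly = polynomial_expand([2.0, 3.0])
```

In this sketch, distance_bucket is "bin_3", day_hour is "Mon_8", and poly is [2.0, 3.0, 4.0, 6.0, 9.0] (the two inputs, then their squares and product).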

Why TRANSFORM Matters

The TRANSFORM clause defines feature transformations as part of the model itself. BigQuery ML saves them with the model and applies them automatically to the input of ML.PREDICT, so training and prediction use the same logic.
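As a rough sketch of where TRANSFORM sits in a CREATE MODEL statement, the Python helper below only builds the SQL text. Nothing is executed here: running it would need a BigQuery client and a real table. The helper name, the model and table names, and the columns (fare_amount, passenger_count, pickup_datetime, trip_distance) are all hypothetical.

```python
def build_create_model_sql(model_name, source_table):
    """Return a CREATE MODEL statement whose preprocessing lives in TRANSFORM."""
    return f"""
CREATE OR REPLACE MODEL `{model_name}`
TRANSFORM(
  fare_amount,
  passenger_count,
  EXTRACT(HOUR FROM pickup_datetime) AS pickup_hour,
  ML.BUCKETIZE(trip_distance, [1, 5, 10]) AS distance_bucket,
  ML.FEATURE_CROSS(STRUCT(
    CAST(EXTRACT(DAYOFWEEK FROM pickup_datetime) AS STRING) AS day_of_week,
    CAST(EXTRACT(HOUR FROM pickup_datetime) AS STRING) AS hour_of_day
  )) AS day_hour
)
OPTIONS(model_type = 'linear_reg', input_label_cols = ['fare_amount'])
AS
SELECT fare_amount, passenger_count, pickup_datetime, trip_distance
FROM `{source_table}`
WHERE fare_amount > 0 AND trip_distance > 0
""".strip()


# Hypothetical dataset, model, and table names.
sql = build_create_model_sql("my_dataset.taxi_fare_model", "my_dataset.taxi_trips")
```

With the transformations declared this way, a later ML.PREDICT call would pass the raw columns (pickup_datetime, trip_distance, passenger_count) rather than precomputed features.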

This reduces training-serving skew, which happens when the model sees features one way during training and a different way during prediction.
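A small numeric sketch shows how skew can arise even when the "same" transformation is applied. The values are invented, and fit_scaler and scale are hypothetical helpers: the point is that standardization statistics should be learned once from training data and reused at prediction time.

```python
from statistics import mean, pstdev


def fit_scaler(values):
    """Learn the mean and standard deviation once, from training data."""
    return mean(values), pstdev(values)


def scale(value, stats):
    """Standardize a value with previously learned statistics."""
    mu, sigma = stats
    return (value - mu) / sigma


train_distances = [1.0, 2.0, 3.0, 4.0, 5.0]
serving_batch = [10.0, 12.0, 14.0]

train_stats = fit_scaler(train_distances)

# Consistent: reuse the training statistics at prediction time.
consistent = [scale(v, train_stats) for v in serving_batch]

# Skewed: recomputing statistics on the serving batch changes what the feature means.
skewed = [scale(v, fit_scaler(serving_batch)) for v in serving_batch]
```

With training statistics, a 10 km trip is scaled to a large positive value because it is far above the training mean. With statistics recomputed on the serving batch, the same trip becomes negative, so the model would see an unusually long trip as a short one.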

Example BigQuery ML Ideas

For a taxi fare model, useful engineered features might include:

  • trip distance,
  • pickup hour,
  • pickup day of week,
  • pickup and dropoff location buckets,
  • a feature cross involving pickup hour (for example, day of week crossed with hour),
  • geographic distance,
  • passenger count,
  • toll-adjusted fare, used as the label rather than as an input (a fare-derived input would leak the target).
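Several of these features can be derived from a raw trip record. The sketch below is plain Python with hypothetical field names (pickup_datetime, pickup_lat, and so on); it uses the haversine formula for straight-line distance, which is one reasonable choice among several.

```python
import math
from datetime import datetime

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def engineer_trip_features(trip):
    """Turn a raw trip record into model input features."""
    pickup = datetime.fromisoformat(trip["pickup_datetime"])
    day = DAY_NAMES[pickup.weekday()]
    return {
        "pickup_hour": pickup.hour,
        "pickup_day_of_week": day,
        "day_hour": f"{day}_{pickup.hour}",  # simple feature cross
        "distance_km": haversine_km(
            trip["pickup_lat"], trip["pickup_lon"],
            trip["dropoff_lat"], trip["dropoff_lon"],
        ),
        "passenger_count": trip["passenger_count"],
    }


# Hypothetical trip record.
features = engineer_trip_features({
    "pickup_datetime": "2024-01-01T08:30:00",
    "pickup_lat": 40.75, "pickup_lon": -73.99,
    "dropoff_lat": 40.65, "dropoff_lon": -73.78,
    "passenger_count": 2,
})
```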

The feature set should be tested against a baseline model using a metric such as RMSE for regression.
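A baseline comparison can be as simple as the sketch below. The fares and both sets of predictions are made-up numbers; in practice they would come from a held-out evaluation set scored by each model.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


actual_fares = [10.0, 20.0, 30.0, 40.0]

# Baseline: always predict the mean fare.
baseline_predictions = [25.0] * len(actual_fares)

# Hypothetical predictions from a model trained on engineered features.
engineered_predictions = [12.0, 19.0, 33.0, 38.0]

baseline_rmse = rmse(actual_fares, baseline_predictions)
engineered_rmse = rmse(actual_fares, engineered_predictions)
improved = engineered_rmse < baseline_rmse
```

A feature change is worth keeping only if it lowers the error on data the model did not train on; otherwise the added complexity is not paying for itself.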

Keras vs BigQuery ML

Decision             Keras                       BigQuery ML
Best for             TensorFlow model pipelines  SQL-first ML workflows
Data location        Files, tensors, pipelines   BigQuery tables
Preprocessing        Model layers                SQL and ML functions
Serving consistency  Export with model           Use TRANSFORM
Learner fit          Python/TensorFlow users     SQL/data warehouse users

FAQ

How can Keras do feature engineering?

Keras can do feature engineering with preprocessing layers such as Normalization, StringLookup, CategoryEncoding, and TextVectorization, often followed by a trainable Embedding layer.

How can BigQuery ML do feature engineering?

BigQuery ML can do feature engineering in three ways: ordinary SQL transformations; preprocessing functions such as ML.FEATURE_CROSS, ML.BUCKETIZE, and ML.POLYNOMIAL_EXPAND; and the TRANSFORM clause, which packages those transformations with the model.

Bottom Line

Keras and BigQuery ML both support practical feature engineering. Use Keras when preprocessing belongs inside the model pipeline. Use BigQuery ML when SQL-first feature engineering close to warehouse data is the simplest path.