What is Microsoft Foundry used for?

Microsoft Foundry is used to create projects, deploy models, manage connections, test models, build agents, add data, evaluate outputs, and develop generative AI applications on Azure.

When should you use RAG instead of fine-tuning?

Use RAG when the model needs current or private context from your data. Use fine-tuning when you need the model to respond with a more consistent tone, style, or behavior.

What should teams evaluate before deploying a generative AI app?

Teams should evaluate response quality, groundedness, relevance, safety risks, cost, latency, and whether the app has a responsible deployment plan.

How to Build Generative AI Apps in Azure with Microsoft Foundry

Building a generative AI app in Azure is not only about deploying a model. A useful app needs the right project setup, a suitable model, secure service connections, grounding data, safety controls, and a repeatable evaluation process.

This guide turns the AI-102 training material on Microsoft Foundry into a practical build path for teams that want to plan, develop, and evaluate generative AI applications on Azure.

Quick Answer

To build a generative AI app in Azure with Microsoft Foundry, create a Foundry project, choose and deploy a model from the model catalog, connect to the project with the Microsoft Foundry SDK, build a chat or RAG workflow, add safety controls, and evaluate the app before broad release.

Key Takeaways

Start with the use case, risk level, and data needs before choosing a model.
Use the model catalog to compare vendors, licenses, cost, deployment options, and benchmarks.
Use prompt engineering and RAG before fine-tuning when the model mainly needs better context.
Use the Microsoft Foundry SDK to connect securely to project resources and deployed models.
Add content filters, prompt shields, and a responsible release plan before production.
Evaluate quality, groundedness, relevance, safety, latency, and cost with manual and automated tests.

Step 1: Define the AI App You Are Building

Before opening the portal, define what the app should do.

Useful starting questions:

Is the app answering questions, writing content, summarizing documents, extracting information, or automating a workflow?
Does the app need private business data?
Does it need to remember conversation history?
What type of output is acceptable?
What harms could occur if the app gives a poor answer?

This matters because a simple chat app, a document-grounded assistant, and an agentic workflow need different architecture choices.

Step 2: Create the Right Microsoft Foundry Project

Microsoft Foundry gives teams a project workspace for building generative AI solutions. In a project, you can manage models, service connections, data, indexes, prompt flows, evaluations, content safety, and agents.

For a basic generative AI app, start with a Microsoft Foundry project. Use a hub-based setup when your organization needs stronger shared project structure, connected services, storage, key vault, compute, and governance across multiple projects.

At the project stage, decide:

who owns the project,
which Azure subscription and region are appropriate,
which services need to be connected,
whether the app needs data storage or search indexes,
what environment will be used for testing and production.

Step 3: Choose a Model from the Catalog

The model catalog is where you compare and select models. The training deck highlights several practical selection factors:

vendor and license,
model size and cost,
performance metrics,
deployment options,
regional availability.

Do not choose a model only because it is popular. Choose it because it fits the workflow.

For example:

A support assistant may need low latency, predictable answers, and strong grounding.
A writing assistant may need fluency, tone control, and broad language quality.
A coding assistant may need strong reasoning and code generation.
A private enterprise assistant may need regional availability, data controls, and predictable cost.

Step 4: Compare Benchmarks, But Do Not Stop There

Benchmarks help you compare model quality, accuracy, cost, coherence, fluency, groundedness, relevance, and latency. They are useful, but they do not replace testing with your own prompts and data.

Use benchmarks to shortlist models. Then test your actual use case.

Good test prompts include:

common user questions,
confusing or incomplete questions,
sensitive requests,
questions that require source context,
prompts that should be refused or redirected,
examples where the ideal answer is already known.

Step 5: Deploy the Model

A model must be deployed to an endpoint before your application can use it.

The training material distinguishes deployment choices such as standard deployment, serverless compute, and managed compute. The right option depends on the model type, billing model, hosting requirements, and operational needs.

Before deploying, confirm:

expected traffic,
token usage,
budget limits,
latency needs,
region support,
whether the model supports your required deployment option.

Step 6: Use Prompt Engineering First

Prompt engineering is usually the fastest improvement path.

Use the system message to define:

role,
tone,
answer format,
safety boundaries,
source expectations,
what the model should avoid.

Use user prompts to provide the specific task and context. Conversation history can help with follow-up questions, but it can also increase token cost and introduce stale assumptions. Keep history useful, not endless.

Step 7: Add RAG When the App Needs Your Data

Use retrieval augmented generation when the model needs information that is not reliably available in its training data.

RAG works by:

Receiving the user input.
Retrieving relevant grounding data.
Adding that data to the prompt.
Generating a contextual response.

In Microsoft Foundry, a common Azure pattern is to use Azure AI Search for indexing and retrieval, then include search connection details in the chat client configuration.

RAG is a strong fit for:

product catalogs,
policy documents,
support knowledge bases,
internal procedures,
research collections,
customer-facing FAQ assistants.

Step 8: Fine-Tune Only When Behavior Needs to Change

Fine-tuning is additional training on examples of prompts and responses. It is useful when you want the model to act in a more consistent way.

Use fine-tuning for:

consistent tone,
strict response style,
repeatable formatting,
domain-specific interaction patterns.

Do not use fine-tuning just to add facts. If the model needs current business knowledge, RAG is usually the better first option.

Fine-tuning data should be carefully prepared. The training material shows a JSON Lines pattern with messages for system, user, and assistant roles. Each example should teach the behavior you want repeated.

Step 9: Build the App with the Microsoft Foundry SDK

The Microsoft Foundry SDK helps developers connect securely to a project and use project connections.

The common development flow is:

Create or select a Foundry project.
Deploy a model.
Use the project endpoint to create an authenticated project client.
Retrieve project connections.
Create a chat client for the deployed model.
Call the model through the appropriate API.
Add grounding, safety, and evaluation logic around the model call.

Use project connections so the app can access resources such as Azure OpenAI, Azure AI Search, Azure AI Services, and agent services without hardcoding credentials into application logic.

Step 10: Add Responsible AI Controls

Responsible AI should be designed before launch, not patched in afterward.

A practical responsible AI plan includes four actions:

Map possible harms.
Measure whether those harms appear in outputs.
Mitigate risk at multiple layers.
Manage the app with an operational readiness plan.

Useful mitigation layers include:

model choice,
system message,
grounding data,
content filters,
prompt shields,
user experience design,
phased rollout,
monitoring and feedback.

Step 11: Use Content Filters and Prompt Shields

Content filters help reduce harmful input and output. The training deck covers filter categories such as violence, hate and unfairness, sexual content, and self-harm.

Prompt shields help defend against jailbreak-style attempts to bypass safety rules.

For production apps, define:

which content categories should be blocked,
whether default filters are enough,
whether custom filters are needed,
what users see when content is blocked,
how blocked events are logged and reviewed.

Step 12: Evaluate Before Release

Evaluation is what turns a demo into a deployable system.

Manual evaluation is useful when a human needs to judge whether responses are acceptable for specific prompts. Automated evaluation is useful when you need repeatable testing across a dataset, model, or prompt flow.

Evaluate for:

coherence,
fluency,
similarity to expected answers,
groundedness,
relevance,
safety risk,
latency,
cost.

If the app uses grounding data, groundedness and relevance are especially important. A response can sound fluent while still being poorly supported by the retrieved context.

Common Mistakes to Avoid

Choosing a model before defining the workflow.
Treating benchmark scores as a substitute for real use-case testing.
Using fine-tuning when RAG would solve the real problem.
Putting too much responsibility into the prompt and not enough into architecture.
Skipping content filters because the app is still a pilot.
Launching without manual and automated evaluation.
Forgetting to monitor cost, latency, and user feedback after release.

Simple Azure Generative AI Build Checklist

Use this checklist before you move from prototype to pilot:

The use case is clear.
The risk level is documented.
The model choice is justified.
The model deployment option matches cost and performance needs.
The system message defines behavior and boundaries.
RAG is used when private or current data is required.
Fine-tuning is used only when consistency of behavior requires it.
Project connections are used instead of hardcoded credentials.
Content filters and prompt shields are configured.
Manual evaluation covers realistic prompts.
Automated evaluation covers quality and safety metrics.
The release plan includes monitoring and feedback.

FAQ

Is Microsoft Foundry only for chat apps?

No. Microsoft Foundry can support chat apps, RAG apps, prompt flows, model evaluation, content safety, agents, and applications that connect multiple Azure AI resources.

Should I use RAG or fine-tuning for my Azure AI app?

Use RAG when the app needs accurate answers from your own data. Use fine-tuning when the app needs a more consistent response style, tone, or behavior.

What is groundedness in generative AI?

Groundedness means the response is aligned with the context or data provided to the model. It is important when the app must answer using specific business documents, product data, policies, or search results.

How do I know if a generative AI app is ready for production?

It is closer to production-ready when the model choice, prompt design, grounding data, safety controls, evaluation results, deployment plan, monitoring process, and ownership model are all documented and tested.

Bottom Line

Microsoft Foundry can help teams build generative AI apps on Azure, but the best results come from disciplined design. Start with the workflow, choose the model carefully, use RAG for context, reserve fine-tuning for behavior, add responsible AI controls, and evaluate the system before scaling it.