Building a generative AI app in Azure is not only about deploying a model. A useful app needs the right project setup, a suitable model, secure service connections, grounding data, safety controls, and a repeatable evaluation process.
This guide turns the AI-102 training material on Microsoft Foundry into a practical build path for teams that want to plan, develop, and evaluate generative AI applications on Azure.
Quick Answer
To build a generative AI app in Azure with Microsoft Foundry, create a Foundry project, choose and deploy a model from the model catalog, connect to the project with the Microsoft Foundry SDK, build a chat or RAG workflow, add safety controls, and evaluate the app before broad release.
Key Takeaways
- Start with the use case, risk level, and data needs before choosing a model.
- Use the model catalog to compare vendors, licenses, cost, deployment options, and benchmarks.
- Use prompt engineering and RAG before fine-tuning when the model mainly needs better context.
- Use the Microsoft Foundry SDK to connect securely to project resources and deployed models.
- Add content filters, prompt shields, and a responsible release plan before production.
- Evaluate quality, groundedness, relevance, safety, latency, and cost with manual and automated tests.
Step 1: Define the AI App You Are Building
Before opening the portal, define what the app should do.
Useful starting questions:
- Is the app answering questions, writing content, summarizing documents, extracting information, or automating a workflow?
- Does the app need private business data?
- Does it need to remember conversation history?
- What type of output is acceptable?
- What harms could occur if the app gives a poor answer?
This matters because a simple chat app, a document-grounded assistant, and an agentic workflow need different architecture choices.
Step 2: Create the Right Microsoft Foundry Project
Microsoft Foundry gives teams a project workspace for building generative AI solutions. In a project, you can manage models, service connections, data, indexes, prompt flows, evaluations, content safety, and agents.
For a basic generative AI app, start with a Microsoft Foundry project. Use a hub-based setup when your organization needs stronger shared project structure, connected services, storage, key vault, compute, and governance across multiple projects.
At the project stage, decide:
- who owns the project,
- which Azure subscription and region are appropriate,
- which services need to be connected,
- whether the app needs data storage or search indexes,
- what environment will be used for testing and production.
Step 3: Choose a Model from the Catalog
The model catalog is where you compare and select models. The training deck highlights several practical selection factors:
- vendor and license,
- model size and cost,
- performance metrics,
- deployment options,
- regional availability.
Do not choose a model only because it is popular. Choose it because it fits the workflow.
For example:
- A support assistant may need low latency, predictable answers, and strong grounding.
- A writing assistant may need fluency, tone control, and broad language quality.
- A coding assistant may need strong reasoning and code generation.
- A private enterprise assistant may need regional availability, data controls, and predictable cost.
Step 4: Compare Benchmarks, But Do Not Stop There
Benchmarks help you compare model quality, accuracy, cost, coherence, fluency, groundedness, relevance, and latency. They are useful, but they do not replace testing with your own prompts and data.
Use benchmarks to shortlist models. Then test your actual use case.
Good test prompts include:
- common user questions,
- confusing or incomplete questions,
- sensitive requests,
- questions that require source context,
- prompts that should be refused or redirected,
- examples where the ideal answer is already known.
Step 5: Deploy the Model
A model must be deployed to an endpoint before your application can use it.
The training material distinguishes deployment choices such as standard deployment, serverless compute, and managed compute. The right option depends on the model type, billing model, hosting requirements, and operational needs.
Before deploying, confirm:
- expected traffic,
- token usage,
- budget limits,
- latency needs,
- region support,
- whether the model supports your required deployment option.
Step 6: Use Prompt Engineering First
Prompt engineering is usually the fastest improvement path.
Use the system message to define:
- role,
- tone,
- answer format,
- safety boundaries,
- source expectations,
- what the model should avoid.
Use user prompts to provide the specific task and context. Conversation history can help with follow-up questions, but it can also increase token cost and introduce stale assumptions. Keep history useful, not endless.
Step 7: Add RAG When the App Needs Your Data
Use retrieval augmented generation when the model needs information that is not reliably available in its training data.
RAG works by:
- Receiving the user input.
- Retrieving relevant grounding data.
- Adding that data to the prompt.
- Generating a contextual response.
In Microsoft Foundry, a common Azure pattern is to use Azure AI Search for indexing and retrieval, then include search connection details in the chat client configuration.
RAG is a strong fit for:
- product catalogs,
- policy documents,
- support knowledge bases,
- internal procedures,
- research collections,
- customer-facing FAQ assistants.
Step 8: Fine-Tune Only When Behavior Needs to Change
Fine-tuning is additional training on examples of prompts and responses. It is useful when you want the model to act in a more consistent way.
Use fine-tuning for:
- consistent tone,
- strict response style,
- repeatable formatting,
- domain-specific interaction patterns.
Do not use fine-tuning just to add facts. If the model needs current business knowledge, RAG is usually the better first option.
Fine-tuning data should be carefully prepared. The training material shows a JSON Lines pattern with messages for system, user, and assistant roles. Each example should teach the behavior you want repeated.
Step 9: Build the App with the Microsoft Foundry SDK
The Microsoft Foundry SDK helps developers connect securely to a project and use project connections.
The common development flow is:
- Create or select a Foundry project.
- Deploy a model.
- Use the project endpoint to create an authenticated project client.
- Retrieve project connections.
- Create a chat client for the deployed model.
- Call the model through the appropriate API.
- Add grounding, safety, and evaluation logic around the model call.
Use project connections so the app can access resources such as Azure OpenAI, Azure AI Search, Azure AI Services, and agent services without hardcoding credentials into application logic.
Step 10: Add Responsible AI Controls
Responsible AI should be designed before launch, not patched in afterward.
A practical responsible AI plan includes four actions:
- Map possible harms.
- Measure whether those harms appear in outputs.
- Mitigate risk at multiple layers.
- Manage the app with an operational readiness plan.
Useful mitigation layers include:
- model choice,
- system message,
- grounding data,
- content filters,
- prompt shields,
- user experience design,
- phased rollout,
- monitoring and feedback.
Step 11: Use Content Filters and Prompt Shields
Content filters help reduce harmful input and output. The training deck covers filter categories such as violence, hate and unfairness, sexual content, and self-harm.
Prompt shields help defend against jailbreak-style attempts to bypass safety rules.
For production apps, define:
- which content categories should be blocked,
- whether default filters are enough,
- whether custom filters are needed,
- what users see when content is blocked,
- how blocked events are logged and reviewed.
Step 12: Evaluate Before Release
Evaluation is what turns a demo into a deployable system.
Manual evaluation is useful when a human needs to judge whether responses are acceptable for specific prompts. Automated evaluation is useful when you need repeatable testing across a dataset, model, or prompt flow.
Evaluate for:
- coherence,
- fluency,
- similarity to expected answers,
- groundedness,
- relevance,
- safety risk,
- latency,
- cost.
If the app uses grounding data, groundedness and relevance are especially important. A response can sound fluent while still being poorly supported by the retrieved context.
Common Mistakes to Avoid
- Choosing a model before defining the workflow.
- Treating benchmark scores as a substitute for real use-case testing.
- Using fine-tuning when RAG would solve the real problem.
- Putting too much responsibility into the prompt and not enough into architecture.
- Skipping content filters because the app is still a pilot.
- Launching without manual and automated evaluation.
- Forgetting to monitor cost, latency, and user feedback after release.
Simple Azure Generative AI Build Checklist
Use this checklist before you move from prototype to pilot:
- The use case is clear.
- The risk level is documented.
- The model choice is justified.
- The model deployment option matches cost and performance needs.
- The system message defines behavior and boundaries.
- RAG is used when private or current data is required.
- Fine-tuning is used only when consistency of behavior requires it.
- Project connections are used instead of hardcoded credentials.
- Content filters and prompt shields are configured.
- Manual evaluation covers realistic prompts.
- Automated evaluation covers quality and safety metrics.
- The release plan includes monitoring and feedback.
Related AI Charcha Reading
- How to Choose the Right AI Model
- How to Build an AI Tool Stack for Small Teams
- How to Evaluate AI Tool Privacy Before Your Team Uses It
- Vector Databases and RAG in 2026
- AI Workflow Auditability Framework for 2026
FAQ
Is Microsoft Foundry only for chat apps?
No. Microsoft Foundry can support chat apps, RAG apps, prompt flows, model evaluation, content safety, agents, and applications that connect multiple Azure AI resources.
Should I use RAG or fine-tuning for my Azure AI app?
Use RAG when the app needs accurate answers from your own data. Use fine-tuning when the app needs a more consistent response style, tone, or behavior.
What is groundedness in generative AI?
Groundedness means the response is aligned with the context or data provided to the model. It is important when the app must answer using specific business documents, product data, policies, or search results.
How do I know if a generative AI app is ready for production?
It is closer to production-ready when the model choice, prompt design, grounding data, safety controls, evaluation results, deployment plan, monitoring process, and ownership model are all documented and tested.
Bottom Line
Microsoft Foundry can help teams build generative AI apps on Azure, but the best results come from disciplined design. Start with the workflow, choose the model carefully, use RAG for context, reserve fine-tuning for behavior, add responsible AI controls, and evaluate the system before scaling it.