How to choose an AI implementation vendor

In short

Choose an AI implementation vendor by studying how they think when the demo is over. The right team asks about workflows, data ownership, evals, integrations, approval rules, rollout, and support. The wrong team promises automation in week one and avoids the uncomfortable question: what exactly changes in the business if this works?

Gartner predicted in 2024 that at least 30% of generative AI projects would be abandoned after proof of concept by the end of 2025. That warning is useful because it points to the real selection risk. You are not buying a prototype. You are buying the ability to turn a prototype into a measured workflow.

This is a buyer's guide, not a procurement checklist. Use it before the first call, during vendor interviews, and when comparing proposals that look similar on price but differ sharply in risk.

Start with the job, not the vendor category

There are three common vendor types.

A platform integrator configures existing tools. This can be right for a narrow support bot, CRM add-on, or internal search project where the company accepts the platform's limits.

A custom AI team builds around your workflow. This is usually better when the agent must read several systems, call tools, route approvals, respect permissions, and deal with messy user language. It costs more because the team takes on more responsibility.

A strategy consultancy helps choose use cases, estimate value, and coordinate change. This can be useful for a portfolio of projects, but strategy without delivery can produce beautiful slides and no working system.

The mistake is choosing the type before defining the job. A simple FAQ does not need a custom agent. A sales assistant that touches CRM, call transcripts, discount rules, and finance approvals probably does.

The first interview should feel like discovery

A serious vendor should slow you down a little. Not in a bureaucratic way. In a useful way.

They should ask for real examples. They should ask who owns the workflow. They should ask what the agent is allowed to do without approval. They should ask what failure costs. They should ask where the source of truth lives. If the use case is customer-facing, they should ask about escalation and tone. If the use case is internal, they should ask about access rights and source freshness.

Be cautious if the first call jumps straight to model choice. Models matter, but in implementation they are one part of the system. The harder questions are usually about process design, retrieval, logging, evaluation, and maintenance.

Questions that expose the real capability

Ask what data they need before they can estimate the project. A good answer includes real tickets, chats, documents, CRM exports, edge cases, current handling rules, and an owner who can judge outputs. A weak answer is "we can start with your website" for a workflow that clearly depends on internal data.

Ask how they will evaluate quality. A mature answer includes a test set with expected behavior, failure categories, regression checks, human review, and regular checks after launch. The vendor may cite general practice on evals for AI projects or OpenAI's evaluation guidance, but the important part is whether they can explain how evals apply to your workflow.
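To make "a test set with expected behavior, failure categories, and regressions" concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the names EvalCase, run_evals, find_regressions, and stub_agent are hypothetical, and expected behavior is reduced to a simple substring check. A real project would likely use richer graders and human review on top.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    case_id: str
    user_input: str
    must_contain: str      # expected behavior, simplified to a substring check
    failure_category: str  # label reported when the case fails


def run_evals(agent: Callable[[str], str], cases: list[EvalCase]) -> dict:
    """Run every case and group failing case ids by failure category."""
    failures: dict[str, list[str]] = {}
    for case in cases:
        output = agent(case.user_input)
        if case.must_contain.lower() not in output.lower():
            failures.setdefault(case.failure_category, []).append(case.case_id)
    passed = len(cases) - sum(len(ids) for ids in failures.values())
    return {"total": len(cases), "passed": passed, "failures": failures}


def find_regressions(previous: dict, current: dict) -> list[str]:
    """Return case ids that fail now but did not fail in the previous run."""
    before = {cid for ids in previous["failures"].values() for cid in ids}
    now = {cid for ids in current["failures"].values() for cid in ids}
    return sorted(now - before)


# Hypothetical stand-in for the real agent under test.
def stub_agent(text: str) -> str:
    if "refund" in text.lower():
        return "Refunds are available within 30 days."
    return "I am not sure."


CASES = [
    EvalCase("refund-01", "Can I get a refund?", "30 days", "wrong_policy"),
    EvalCase("cancel-01", "How do I cancel?", "cancel", "missing_answer"),
]

report = run_evals(stub_agent, CASES)
```

The useful question for a vendor is who writes the cases, who judges borderline outputs, and how often the set is rerun after launch. The harness itself is the easy part.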

Ask what the agent will be allowed to do. The answer should separate draft, recommendation, read-only lookup, write action, and irreversible action. Sensitive actions need approval.
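The separation above can be sketched as a small policy table. This is an illustration under stated assumptions, not a vendor's actual design: the tool names in TOOL_TIERS are hypothetical, and treating write and irreversible actions as the "sensitive" ones is one possible reading of the rule.

```python
from enum import Enum
from typing import Optional


class ActionTier(Enum):
    DRAFT = "draft"
    RECOMMENDATION = "recommendation"
    READ_ONLY_LOOKUP = "read_only_lookup"
    WRITE_ACTION = "write_action"
    IRREVERSIBLE_ACTION = "irreversible_action"


# Assumption: these two tiers are the "sensitive" ones that need approval.
APPROVAL_REQUIRED = {ActionTier.WRITE_ACTION, ActionTier.IRREVERSIBLE_ACTION}

# Hypothetical tool registry; a real project would list its own tools.
TOOL_TIERS = {
    "draft_reply": ActionTier.DRAFT,
    "suggest_next_step": ActionTier.RECOMMENDATION,
    "lookup_order": ActionTier.READ_ONLY_LOOKUP,
    "update_crm_status": ActionTier.WRITE_ACTION,
    "issue_refund": ActionTier.IRREVERSIBLE_ACTION,
}


def authorize(tool_name: str, approved_by: Optional[str] = None) -> bool:
    """Allow a tool call only if its tier permits it or a human approved it."""
    tier = TOOL_TIERS.get(tool_name)
    if tier is None:
        return False  # unknown tools are denied by default
    if tier in APPROVAL_REQUIRED:
        return approved_by is not None
    return True
```

With this table, authorize("lookup_order") passes, authorize("issue_refund") is blocked, and authorize("issue_refund", approved_by="finance.lead") passes. A vendor who can fill in a table like this for your workflow has thought about authority, not just about answers.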

Ask who owns prompts, code, logs, and documentation. A good proposal has a repository or handoff model, change history, environment separation, and support rules. A black box is not a production plan.

Proposal comparison: what to compare

Do not compare proposals only by price. Compare what risk is included.

A weak proposal lists features: chatbot, admin panel, integration, analytics.

A stronger proposal describes the workflow: intake, retrieval, draft, approval, CRM update, logs, evals, rollout.

The best proposal tells you what will not be automated in the first release. That sounds less exciting, but it is usually a sign of maturity. AI systems fail when they get too much authority before they have evidence.

For a first AI pilot in 30 days, the proposal should name the workflow, channel, data sources, integration boundary, success metric, and scale decision. If those pieces are vague, the pilot will become a demo review.
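One way to test a proposal against that list is to write the six pieces down and see which ones are still empty. The sketch below is a hypothetical checklist, not a standard: the PilotScope fields mirror the items named above, and the set of "vague" placeholder values is an assumption.

```python
from dataclasses import dataclass, fields


@dataclass
class PilotScope:
    workflow: str = ""
    channel: str = ""
    data_sources: str = ""
    integration_boundary: str = ""
    success_metric: str = ""
    scale_decision: str = ""


# Assumption: values like these count as "not defined yet".
VAGUE_VALUES = {"", "tbd", "to be defined", "n/a"}


def missing_pieces(scope: PilotScope) -> list[str]:
    """Return the names of scope fields that are empty or placeholders."""
    return [
        f.name
        for f in fields(scope)
        if getattr(scope, f.name).strip().lower() in VAGUE_VALUES
    ]


# Example proposal with two pieces left open.
proposal = PilotScope(
    workflow="Refund requests under a fixed amount",
    channel="Support email",
    data_sources="Helpdesk tickets, refund policy document",
    integration_boundary="Read-only access to the order system",
    success_metric="TBD",
)

gaps = missing_pieces(proposal)
```

Here gaps comes back as the success metric and the scale decision, which are exactly the two pieces that turn a pilot into a demo review when they stay open.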

Red flags

The vendor promises ROI without seeing baseline numbers. They show one generic chatbot demo for every industry. They cannot explain how they maintain a knowledge base. They do not mention logs. They cannot say what should remain human-reviewed. They avoid security questions by saying the model provider is secure. They want to connect every system in the first month. They talk about AI transformation but not about who changes the customer status, approves the answer, or fixes a bad source.

One red flag is not fatal. A pattern is.

What good delivery looks like

Good delivery has a rhythm.

First, discovery turns a broad wish into a narrow workflow. Second, sample data becomes a test set. Third, the team builds a controlled version with limited integrations. Fourth, real users try it in review mode. Fifth, the vendor and business owner decide whether to stop, extend, or scale.

This rhythm is similar whether the project is support, sales, HR, finance, or documents. The details change, but the discipline does not.

For agentic systems, a good vendor will also explain tool boundaries. OpenAI's tool-calling docs and LangChain's human-in-the-loop patterns exist for a reason: once an AI system can call tools, approvals and state become part of product design.

Contract points worth making explicit

Spell out data access, retention, log visibility, environments, ownership of prompts and code, support windows, knowledge-base maintenance, integration-change handling, human approvals, and acceptance criteria.

If the vendor resists these details, they may be selling a prototype while you think you are buying production.

Platform or custom?

Use a platform when the task is well-bounded, low-risk, and close to the platform's native behavior. A scripted lead form, simple FAQ, appointment intake, survey, or internal document search may not need custom development.

Custom is safer when the workflow crosses systems, involves permissions, needs source-aware answers, or requires a human approval loop. Examples include a sales agent that reads CRM and proposes next steps, a finance assistant that checks invoices, an HR agent that screens candidates, or a support assistant that retrieves policy and escalates exceptions.

Custom does not mean building everything from scratch. It means the architecture follows the workflow rather than forcing the workflow into a generic bot builder.
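The criteria in this section can be written down as a rough screening rule. This is a deliberately simple sketch under assumptions: the WorkflowProfile fields are hypothetical names for the four conditions mentioned above, and a real decision would also weigh budget, timeline, and the specific platform's limits.

```python
from dataclasses import dataclass


@dataclass
class WorkflowProfile:
    crosses_systems: bool
    involves_permissions: bool
    needs_source_aware_answers: bool
    requires_human_approval: bool


def suggest_approach(profile: WorkflowProfile) -> tuple[str, list[str]]:
    """Suggest 'platform' or 'custom' and list the reasons that pushed to custom."""
    checks = {
        "workflow crosses systems": profile.crosses_systems,
        "permissions are involved": profile.involves_permissions,
        "answers must cite sources": profile.needs_source_aware_answers,
        "human approval loop required": profile.requires_human_approval,
    }
    reasons = [label for label, flagged in checks.items() if flagged]
    return ("custom" if reasons else "platform", reasons)


# A simple FAQ bot versus a sales agent that reads CRM and routes approvals.
faq_bot = WorkflowProfile(False, False, False, False)
sales_agent = WorkflowProfile(True, True, False, True)

faq_choice, faq_reasons = suggest_approach(faq_bot)
sales_choice, sales_reasons = suggest_approach(sales_agent)
```

The output matters less than the list of reasons. If a vendor recommends custom work, they should be able to name which of these conditions applies to your workflow.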

FAQ

Should we choose the vendor with the best model access?

No. Model access is widely available. Choose the team that can design the workflow, data layer, evaluation, and rollout.

What proof should we ask for?

Ask for similar workflow examples, not just industry logos. A vendor that has handled messy documents may be better for finance than a vendor with a flashy customer-service bot.

Can we start small without locking ourselves in?

Yes. Require structured logs, exported test cases, documented prompts, and a clear integration map. That keeps the pilot useful even if you change vendors.
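As an illustration of what "structured logs" can mean in a contract, here is a minimal sketch that writes one JSON object per interaction to a JSON Lines stream. The field names are assumptions chosen for the example, not a standard schema. The point is that any future vendor can read the file without the original system.

```python
import io
import json
from datetime import datetime, timezone
from typing import Iterable, TextIO


def make_log_record(
    case_id: str,
    user_input: str,
    sources: list[str],
    output: str,
    reviewer_decision: str,
) -> dict:
    """Build one portable log record (field names are illustrative)."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "case_id": case_id,
        "input": user_input,
        "sources": sources,
        "output": output,
        "reviewer_decision": reviewer_decision,
    }


def export_jsonl(records: Iterable[dict], stream: TextIO) -> int:
    """Write records as JSON Lines and return how many were written."""
    count = 0
    for record in records:
        stream.write(json.dumps(record, ensure_ascii=False) + "\n")
        count += 1
    return count


# An in-memory buffer stands in for a file handed over at the end of the pilot.
buffer = io.StringIO()
records = [
    make_log_record(
        "refund-01",
        "Can I get a refund?",
        ["refund_policy.md"],
        "Refunds are available within 30 days.",
        "approved",
    )
]
written = export_jsonl(records, buffer)
```

The same records can double as exported test cases: an input, the sources used, the output, and a human verdict are most of what a new team needs to rebuild the test set.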

Where should we begin internally?

Before vendor calls, gather the materials described in the guide on what to prepare before implementing AI, and estimate the project economics using the guide on why AI projects do not pay off.