A chatbot that knows your business — and cites its sources
Custom RAG assistants grounded in your docs and data, with citations, streaming, and guardrails. Not a generic widget — an assistant that answers from your knowledge, not the open internet.
A bare ChatGPT widget hallucinates about your product and frustrates customers. A grounded RAG assistant answers from your actual documentation, links to the source, escalates when it doesn't know, and gets smarter as you add content. That's the difference between a gimmick and a support deflection tool.
Cited
Every answer links back to its source passage
<1s
Time-to-first-token targeted on streamed replies
Deflection
Built to resolve common questions before they reach support
0
PII leaked — filtered inputs and scoped retrieval
Trusted by founders & teams in
Everything included in every engagement
No upsells. No surprise change orders. One scope, one price.
RAG grounded in your content
Ingestion pipeline for docs, help center, PDFs, and database content, with chunking, embeddings, hybrid search, and reranking so answers come from your knowledge — not the model's guesses.
Citations & honest 'I don't know'
Every answer links to the source passage, and the bot is tuned to escalate or admit uncertainty instead of confidently making things up.
Streaming chat UI
A fast, on-brand chat interface with token-by-token streaming, markdown, code blocks, and suggested follow-ups — embeddable as a widget or built into your app.
Actions & handoff
Beyond answers: book a demo, create a ticket, look up an order, or hand off to a human agent with full conversation context when needed.
Safety & rate limiting
Prompt-injection defenses, content filtering, per-user rate limits, and PII handling so your chat endpoint isn't an abuse vector or a data-leak risk.
Analytics & continuous tuning
Conversation logs, unanswered-question reports, thumbs-up/down feedback, and a loop to fill content gaps — the bot improves every week.
The tools I actually use in production
Modern, battle-tested, and chosen for fit — not hype.
Models
- GPT-4o
- Claude
- Haiku
- Embeddings
RAG
- pgvector
- Pinecone
- Weaviate
- Cohere Rerank
UI
- Next.js
- Vercel AI SDK
- Streaming SSE
- shadcn/ui
Ops
- LangSmith
- Helicone
- Resend
- PostHog
How we'll work together
Predictable, written-down, no surprises.
- 01
Content & use-case scoping
What should it answer, from which sources, and what should it never do. Define escalation and tone.
- 02
Ingestion + retrieval
Build the pipeline, tune chunking and reranking, and validate answer quality against a question set.
- 03
UI + guardrails
Streaming chat interface, citations, rate limits, and safety filters wired in.
- 04
Launch + improve
Ship the widget, watch the logs, close content gaps, and tune weekly.
Pricing that matches the work
Starting prices. Final quote in writing after a 30-minute scoping call.
Starter Bot
Docs/FAQ assistant
$2,800starting
- RAG over one content source
- Streaming widget + citations
- Delivered in 1–2 weeks
Support Assistant
Real support deflection
$7,500starting
- Multi-source RAG + actions
- Human handoff + ticketing
- Analytics + safety guardrails
Retainer
Ongoing tuning & content
$2,500/mostarting
- Weekly answer-quality tuning
- Content-gap closing
- Model + cost optimisation
Me vs. an agency vs. hiring in-house
Three ways to get this built. Here's the honest comparison.
Best value Solo Dev (me) $80–$120 /hr or fixed | Agency $150–$300 /hr blended | In-house hire $80–$120K /yr + benefits | |
|---|---|---|---|
| Start date | 1–2 weeks from quote | 4–8 weeks onboarding | 8–16 weeks to hire |
| Who writes the code | Senior dev — every single line | Junior assigned to your account | Whoever you manage to hire |
| Communication | Direct — you talk to who codes | Via account manager first | Direct, but management overhead |
| Flexibility | Scale up or down any time | Locked to contract length | Fixed headcount, hard to change |
| Code ownership | 100% yours, full handover docs | Depends on contract terms | Yours, but bus factor risk |
| Risk | Weekly demos, fixed scope | Scope creep & handoff gaps | Wrong hire = months lost |
Questions I get asked first
How is this different from just embedding ChatGPT?+
A raw ChatGPT embed answers from the model's general training and hallucinates about your specifics. A RAG assistant retrieves your actual documentation before answering, cites the source, and stays on-topic. For anything customer-facing, grounding is non-negotiable.
Where does it get its knowledge?+
From sources you control — help center, docs, PDFs, knowledge base, even database records. When you update the content, the bot updates. It does not invent answers from the open web unless you explicitly want that.
Can it do more than answer questions?+
Yes — it can take actions like creating a support ticket, looking up an order, or booking a demo, and hand off to a human with full context when needed. For heavier multi-step automation, that crosses into agent territory — see /services/ai-agent-development.
What does it cost to run?+
Usually far less than people expect. I default to smaller models like Claude Haiku or GPT-4o-mini for most turns, cache aggressively, and reserve premium models for hard questions. I'll give you a realistic monthly token estimate up front.
Let's scope your project
Tell me what you're building. I'll reply with a written estimate within 24 hours — no sales call required.
Related services
Often paired with ai chatbot development.
AI Integration
OpenAI, Anthropic Claude, and open-source LLMs wired into your app with RAG, structured outputs, evals, and the discipline that keeps it cheap and reliable at scale.
AI Agent Development
Tool-using, multi-step AI agents wired into your real systems — with the evals, guardrails, and human-in-the-loop checkpoints that make autonomy safe in production.
Web Development
From the database schema to the deployed Next.js frontend, I ship modern web apps designed to rank, convert, and scale. One engineer, full ownership.
API Development
Well-versioned, well-documented REST or GraphQL APIs with auth, rate limiting, and webhooks. Built to be consumed by partners and customers — not only your own frontend.