AI Customer Service System Source Code Walkthrough: Building a Highly Available Dialogue Engine from Scratch - 程序员充电站

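Background: why traditional customer service systems keep getting accused of "answering the wrong question"

Over the past two years I have helped three clients move from legacy "keyword + regex" customer service to an AI-based solution. The worst pain points come down to three:

- Intent recognition accuracy sits below 75%. As soon as users phrase things colloquially or invert the word order, the rule engine simply stops matching.
- Under high concurrency, session state lives in JVM memory and is lost on restart, so reconnecting users are forced to start over.
- At business peaks QPS climbs above 150, the Redis connection pool is exhausted, interface response time (RT) jumps from 300 ms to 2 s, and the customer service page becomes as choppy as a slide show.

These pitfalls pushed us to rewrite the system as a horizontally scalable, source-level solution built on "NLU + dialogue management + service governance". The rest of this post walks through the pitfalls, the tuning, and the load testing.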

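Architecture comparison: choosing among rule-based, retrieval-based, and generative routes

The conclusions first:

- Rule engine: fast cold start; suits scenarios where the FAQ is fixed and staffing is limited.
- Retrieval-based: needs historical logs, offers high controllability, and is the most stable option in industry practice.
- Generative: the most human-like experience, but hard to control and hard to keep compliant; suits consumer-facing products that want to try something new.

The decision tree below can be saved straight into a local slide deck for reporting.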

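Core implementation 1: BERT intent classification in Python

- Annotations use a single format, intent<TAB>query, so that any model can be swapped in later.
- Fine-tune with the official transformers script. With a learning rate of 2e-5 and 3 epochs, accuracy reaches 94%+ on 20,000 samples.
- After training, export to ONNX and serve it with fastapi for asynchronous inference. A single GPU card delivers roughly 120 QPS, which is enough for small and mid-sized businesses.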
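
The intent<TAB>query format above is simple enough to load with the standard library. The following is a minimal sketch, not the author's loader: `load_intent_tsv` and the sample rows are made up for illustration, and label ids are assigned in first-seen order as an assumption.

```python
import csv
import io
from typing import Dict, List, Tuple


def load_intent_tsv(stream) -> Tuple[List[Tuple[int, str]], Dict[str, int]]:
    """Read `intent<TAB>query` lines into (label_id, query) pairs plus a label map."""
    label2id: Dict[str, int] = {}
    samples: List[Tuple[int, str]] = []
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for line_no, row in enumerate(reader, start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"line {line_no}: expected 2 tab-separated fields, got {len(row)}")
        intent, query = row[0].strip(), row[1].strip()
        # Assign ids in first-seen order so the mapping is reproducible.
        label_id = label2id.setdefault(intent, len(label2id))
        samples.append((label_id, query))
    return samples, label2id


# Hypothetical sample data in the same format.
demo = io.StringIO(
    "refund\tI want my money back\n"
    "shipping\tWhere is my parcel\n"
    "refund\tHow do I return this\n"
)
samples, label2id = load_intent_tsv(demo)
print(label2id)
print(samples)
```

Because the file carries only a label string and the raw query, the same loader can feed a different model later without touching the annotations.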

Key code (with exception handling and logging):

    # intent_api.py
    import logging
    import time

    import onnxruntime as ort
    import torch
    from fastapi import FastAPI, HTTPException
    from pydantic import BaseModel
    from transformers import AutoTokenizer

    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger("intent")


    class PredictRequest(BaseModel):
        query: str


    app = FastAPI()
    # Tokenizer and ONNX session are loaded once at startup and shared by all requests.
    tok = AutoTokenizer.from_pretrained("bert-base-chinese")
    sess = ort.InferenceSession("bert_intent.onnx")


    @app.post("/predict")
    def predict(req: PredictRequest):
        try:
            t0 = time.perf_counter()
            inputs = tok(req.query, return_tensors="pt", truncation=True, max_length=32)
            # Assumes the ONNX graph was exported with exactly these two inputs.
            logits = sess.run(None, {"input_ids": inputs["input_ids"].numpy(),
                                     "attention_mask": inputs["attention_mask"].numpy()})[0]
            prob = torch.softmax(torch.tensor(logits), dim=-1)
            intent_id = int(prob.argmax(-1))
            logger.info(f"query={req.query}, intent={intent_id}, cost={time.perf_counter()-t0:.3f}s")
            # Confidence is the highest softmax probability.
            return {"intent_id": intent_id, "confidence": float(prob.max())}
        except Exception as e:
            logger.exception("predict error")
            raise HTTPException(status_code=500, detail=str(e))
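
The endpoint reports the largest softmax probability as its confidence. To make that step concrete, here is a dependency-free sketch of the same arithmetic. `top_intent` is a hypothetical helper name and the logits are invented.

```python
import math
from typing import List, Tuple


def top_intent(logits: List[float]) -> Tuple[int, float]:
    """Return (argmax index, softmax probability of that index)."""
    peak = max(logits)
    # Subtracting the max keeps exp() from overflowing without changing the result.
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


# Invented logits for a three-intent model.
intent_id, confidence = top_intent([2.0, 0.0, 0.0])
print(intent_id, round(confidence, 3))
```

A caller could compare the returned confidence against a threshold of its own choosing before trusting the predicted intent; the original service leaves that decision to the client.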

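Core implementation 2: Java dialogue state machine (thread-safe version)

- Each session gets its own StateMachine instance, stored in a ConcurrentHashMap<String, StateMachine>, which gives in-memory isolation.
- State nodes are defined with an enum, and transition conditions are decoupled through Guava EventBus so new business flows can be plugged in later.
- A ReentrantLock provides method-level locking to keep multi-turn slot filling atomic.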

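    // DialogService.java
    import java.time.Duration;
    import java.util.concurrent.ConcurrentHashMap;
    import lombok.RequiredArgsConstructor;
    import lombok.extern.slf4j.Slf4j;
    import org.springframework.data.redis.core.RedisTemplate;
    import org.springframework.stereotype.Service;

    // StateMachine, QueryEvent and BizException are project classes not shown here.
    @Slf4j
    @Service
    @RequiredArgsConstructor // injects the final RedisTemplate field
    public class DialogService {
        private final ConcurrentHashMap<String, StateMachine> smMap = new ConcurrentHashMap<>();
        private final RedisTemplate<String, Object> redis;

        public void handle(String uid, String query) {
            StateMachine sm = smMap.computeIfAbsent(uid, u -> {
                StateMachine fresh = new StateMachine(u);
                fresh.start(); // enter the initial node
                return fresh;
            });
            try {
                // Assumes StateMachine delegates to its internal ReentrantLock.
                sm.lockInterruptibly();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new BizException("dialog interrupted");
            }
            try {
                sm.sendEvent(new QueryEvent(query));
                // Persist the latest state.
                redis.opsForValue().set("sm:" + uid, sm.getCurrentState(), Duration.ofMinutes(30));
            } catch (Exception e) {
                log.error("statemachine error uid={}", uid, e);
                throw new BizException("dialog error");
            } finally {
                sm.unlock();
            }
        }
    }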
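
For readers following along in Python, the same idea (one state machine per session, a lock around each turn) might look like the sketch below. The states, slot names and class names are invented for illustration and do not come from the Java project. The per-uid registry plays the role of computeIfAbsent.

```python
import threading
from enum import Enum
from typing import Dict


class State(Enum):
    ASK_ORDER_ID = 1
    ASK_REASON = 2
    DONE = 3


class SessionMachine:
    """One instance per session; the lock makes each turn atomic."""

    def __init__(self, uid: str) -> None:
        self.uid = uid
        self.state = State.ASK_ORDER_ID
        self.slots: Dict[str, str] = {}
        self._lock = threading.RLock()

    def send(self, query: str) -> State:
        with self._lock:  # slot write and state transition happen together
            if self.state is State.ASK_ORDER_ID:
                self.slots["order_id"] = query
                self.state = State.ASK_REASON
            elif self.state is State.ASK_REASON:
                self.slots["reason"] = query
                self.state = State.DONE
            return self.state


class SessionRegistry:
    def __init__(self) -> None:
        self._machines: Dict[str, SessionMachine] = {}
        self._guard = threading.Lock()

    def get(self, uid: str) -> SessionMachine:
        # Create at most one machine per uid, even under concurrent calls.
        with self._guard:
            machine = self._machines.get(uid)
            if machine is None:
                machine = self._machines[uid] = SessionMachine(uid)
            return machine


registry = SessionRegistry()
registry.get("u1").send("A1001")
final_state = registry.get("u1").send("damaged item")
print(final_state, registry.get("u1").slots)
```

Holding the lock for the whole turn means a second message from the same user waits until the first has finished filling its slot, while other users are unaffected because each session has its own lock.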

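Production considerations: load testing, Redis, and sensitive words

Load-test targets
- Target QPS > 200. A 4-core, 8 GB container can sustain this, provided that:
  - the Redis connection pool is set to maxTotal=200, maxIdle=50, timeout=200ms;
  - tcp-keepalive is enabled so the load balancer (LB) does not silently drop connections.
- Run the Gatling script continuously for 5 min. The run passes if the 99th-percentile RT stays within 280 ms.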
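
The pass criterion above is a tail-latency check rather than an average. As a rough sketch of what "99th-percentile RT within 280 ms" means, assuming the nearest-rank definition of a percentile (Gatling's own report may compute it differently) and invented latency samples:

```python
import math
from typing import Sequence


def percentile(samples_ms: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def passes_gate(samples_ms: Sequence[float], limit_ms: float = 280.0) -> bool:
    """True when the p99 latency is within the limit."""
    return percentile(samples_ms, 99) <= limit_ms


# Invented run: 197 fast requests plus three slow outliers.
latencies = [120.0] * 197 + [900.0, 950.0, 990.0]
p99 = percentile(latencies, 99)
print(p99, passes_gate(latencies))
```

In this made-up run the mean is well under 280 ms, yet the gate fails because more than 1% of requests are slow, which is exactly the situation the p99 target is meant to catch.
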
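Sensitive-word filtering
- Use an Aho-Corasick (AC) automaton, built once from about 6,000 sensitive words. Matching is O(n) in the length of the input text.
- Replace matched words with "*" and report to the audit log asynchronously so the main flow is not blocked.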

    // AhoCorasick.java
    import java.util.List;

    // TrieNode and Hit are helper types not shown here.
    public class AhoCorasick {
        private final TrieNode root = new TrieNode();

        public void build(List<String> words) { ... }

        public List<Hit> search(String text) { ... }
    }
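
Since the Java bodies are elided, here is a self-contained Python sketch of the Aho-Corasick technique itself, including the "*" replacement. It is an illustration under my own assumptions (array-based nodes, spans returned as start/end pairs, the `mask` helper), not a port of the elided Java code.

```python
from collections import deque
from typing import Dict, List, Tuple


class AhoCorasick:
    def __init__(self, words: List[str]) -> None:
        # Per node: children map, failure link, lengths of words ending here.
        self.children: List[Dict[str, int]] = [{}]
        self.fail: List[int] = [0]
        self.out: List[List[int]] = [[]]
        for word in words:
            if word:
                self._insert(word)
        self._link()

    def _insert(self, word: str) -> None:
        node = 0
        for ch in word:
            nxt = self.children[node].get(ch)
            if nxt is None:
                nxt = len(self.children)
                self.children[node][ch] = nxt
                self.children.append({})
                self.fail.append(0)
                self.out.append([])
            node = nxt
        self.out[node].append(len(word))

    def _link(self) -> None:
        # Breadth-first pass so a parent's failure link is ready before its children.
        queue = deque(self.children[0].values())
        while queue:
            node = queue.popleft()
            for ch, child in self.children[node].items():
                f = self.fail[node]
                while f and ch not in self.children[f]:
                    f = self.fail[f]
                self.fail[child] = self.children[f].get(ch, 0)
                # Inherit words that end at the failure target (they are suffixes).
                self.out[child] = self.out[child] + self.out[self.fail[child]]
                queue.append(child)

    def search(self, text: str) -> List[Tuple[int, int]]:
        """Return (start, end) spans, end exclusive, for every hit."""
        hits: List[Tuple[int, int]] = []
        node = 0
        for i, ch in enumerate(text):
            while node and ch not in self.children[node]:
                node = self.fail[node]
            node = self.children[node].get(ch, 0)
            hits.extend((i + 1 - n, i + 1) for n in self.out[node])
        return hits


def mask(text: str, ac: AhoCorasick) -> str:
    """Replace every matched character with '*'."""
    chars = list(text)
    for start, end in ac.search(text):
        chars[start:end] = "*" * (end - start)
    return "".join(chars)


ac = AhoCorasick(["he", "she", "hers"])
print(sorted(ac.search("ushers")))
print(mask("ushers", ac))
```

The scan visits each character of the text once and follows failure links instead of restarting, so the cost grows with the text length plus the number of hits rather than with the size of the word list.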

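Pitfall guide: timeouts, circuit breaking, and logging

Session timeout
A session is reclaimed after 30 min without interaction, but note:
- Before removing the entry from smMap, first call redis.del("sm:" + uid); otherwise the stale state "comes back from the dead" after a restart.
- Return {"code": 408} to the front end and let the client decide whether to reload the history.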
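
The ordering in the first bullet (persisted copy first, in-memory entry second) can be sketched as follows. This is a simplified illustration: a plain dict stands in for Redis, `SessionReaper` and its methods are hypothetical names, and the 200 success code is an assumption; only the 408 and the "sm:" key prefix come from the text.

```python
import time
from typing import Callable, Dict, List


class SessionReaper:
    """Hypothetical helper that evicts sessions idle for ttl_seconds or longer."""

    def __init__(self, store: Dict[str, str], ttl_seconds: float = 30 * 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.store = store                      # stand-in for Redis
        self.ttl = ttl_seconds
        self.clock = clock
        self.last_seen: Dict[str, float] = {}   # stand-in for smMap bookkeeping

    def touch(self, uid: str) -> None:
        self.last_seen[uid] = self.clock()

    def sweep(self) -> List[str]:
        now = self.clock()
        expired = [uid for uid, seen in self.last_seen.items() if now - seen >= self.ttl]
        for uid in expired:
            # Delete the persisted copy first, so a crash between the two steps
            # cannot leave state behind for a restart to reload.
            self.store.pop("sm:" + uid, None)
            del self.last_seen[uid]
        return expired

    def respond(self, uid: str) -> Dict[str, int]:
        # 408 signals an expired session; 200 is an assumed success code.
        return {"code": 200 if uid in self.last_seen else 408}


now = [0.0]  # fake clock so the example does not need to wait 30 minutes
store = {"sm:u1": "ASK_REASON"}
reaper = SessionReaper(store, clock=lambda: now[0])
reaper.touch("u1")
now[0] = 30 * 60
expired = reaper.sweep()
print(expired, store, reaper.respond("u1"))
```
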
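Third-party API circuit breaking
Use Resilience4j, configured to open at a 50% error rate or a 500 ms response time, then probe in the half-open state after a 30 s cooldown.
Fallback strategy: return the static message "Feature under maintenance" so the core path stays available.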

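    // ofDefaults uses the library defaults; pass a CircuitBreakerConfig to apply the thresholds described above.
    CircuitBreaker circuit = CircuitBreaker.ofDefaults("nlpAPI");
    Supplier<String> decorated = CircuitBreaker
            .decorateSupplier(circuit, () -> restTemplate.getForObject(url, String.class));
    // Try comes from Vavr; recover() supplies the static fallback message.
    String resp = Try.ofSupplier(decorated)
            .recover(throwable -> "Feature under maintenance")
            .get();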
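
To show the closed, open and half-open cycle without the library, here is a small Python sketch of the pattern. It is not how Resilience4j is implemented: it is single-threaded, uses a count-based window of my own choosing, and simply treats a slow call as a failure, whereas Resilience4j tracks slow calls with a separate rate threshold. The 50%, 500 ms and 30 s values are the ones from the text.

```python
import time
from collections import deque
from typing import Callable, Deque, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    def __init__(self, failure_rate: float = 0.5, slow_call_s: float = 0.5,
                 cooldown_s: float = 30.0, window: int = 10,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_rate = failure_rate
        self.slow_call_s = slow_call_s
        self.cooldown_s = cooldown_s
        self.results: Deque[bool] = deque(maxlen=window)  # True = failed or slow
        self.clock = clock
        self.state = "closed"
        self.opened_at = 0.0

    def call(self, fn: Callable[[], T], fallback: T) -> T:
        if self.state == "open":
            if self.clock() - self.opened_at < self.cooldown_s:
                return fallback        # short-circuit: do not touch the dependency
            self.state = "half_open"   # cooldown over: let one probe through
        started = self.clock()
        try:
            value = fn()
            failed = self.clock() - started > self.slow_call_s
        except Exception:
            value, failed = fallback, True
        self._record(failed)
        return value

    def _record(self, failed: bool) -> None:
        if self.state == "half_open":
            # A single probe decides: success closes, failure reopens.
            self.results.clear()
            if failed:
                self._open()
            else:
                self.state = "closed"
            return
        self.results.append(failed)
        window_full = len(self.results) == self.results.maxlen
        if window_full and sum(self.results) / len(self.results) >= self.failure_rate:
            self._open()

    def _open(self) -> None:
        self.state = "open"
        self.opened_at = self.clock()


now = [0.0]  # fake clock so the cooldown can be skipped in the example
breaker = CircuitBreaker(window=4, clock=lambda: now[0])


def flaky() -> str:
    raise RuntimeError("nlp api down")


for _ in range(4):
    breaker.call(flaky, "Feature under maintenance")
print(breaker.state)
```

While the breaker is open, callers get the static message immediately instead of waiting on a dependency that is already failing, which is what keeps the core path responsive.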