Part 10: Health Checks and Self-Healing with the Probe Mechanism

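Kubernetes can automatically restart failed containers and take unresponsive Pods out of service, and all of this relies on the probe mechanism. By configuring a liveness probe (livenessProbe), a readiness probe (readinessProbe), and a startup probe (startupProbe), you control how container health is judged, when a Pod starts receiving traffic, and how slow-starting containers are protected. This article covers how the three probes are used, the probing mechanisms, common scenarios, and best practices.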

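1. What probes do

Typical scenarios:

The application takes a long time to start (for example, a Java application loading classes): protect it with a startupProbe.

The application occasionally deadlocks: use a livenessProbe to restart it automatically.

The application needs to warm up after starting (for example, loading a cache): use a readinessProbe to hold back traffic until it is ready.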

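2. Probing mechanisms
All three probes support the same mechanisms: httpGet (an HTTP GET request; any status code from 200 to 399 counts as success), tcpSocket (the port accepts a TCP connection), exec (a command run inside the container exits with code 0), and grpc (a call using the gRPC health checking protocol).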
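As a rough sketch of how each mechanism is written, the fragments below show one handler per probe. The paths, ports, and the file /tmp/healthy are illustrative assumptions rather than values from a real application. Each fragment would be nested under a container entry, and a single probe takes exactly one handler.

```yaml
# Alternative handlers; each fragment belongs under a container entry.
# All paths, ports, and file names here are placeholder assumptions.
---
livenessProbe:
  httpGet:            # success: HTTP status code 200-399
    path: /health
    port: 8080
---
livenessProbe:
  tcpSocket:          # success: the port accepts a TCP connection
    port: 3306
---
livenessProbe:
  exec:               # success: the command exits with code 0
    command:
    - cat
    - /tmp/healthy
---
livenessProbe:
  grpc:               # success: the gRPC health check reports SERVING
    port: 9090
```

The grpc handler assumes the application implements the gRPC health checking protocol and that the cluster runs a reasonably recent Kubernetes version.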

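3. Configuration parameters in detail
All probes share the following parameters (the example uses livenessProbe):

    livenessProbe:
      httpGet:
        path: /health
        port: 8080
        httpHeaders:
        - name: Custom-Header
          value: Awesome
      initialDelaySeconds: 30   # delay after the container starts before the first probe
      periodSeconds: 10         # interval between probes, in seconds
      timeoutSeconds: 5         # time after which a single probe times out
      successThreshold: 1       # consecutive successes needed after a failure (must be 1 for liveness and startup probes)
      failureThreshold: 3       # consecutive failures before the probe counts as failed (restart, or removal from endpoints)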

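4. livenessProbe: automatically restarting unhealthy containers
4.1 HTTP GET example

    apiVersion: v1
    kind: Pod
    metadata:
      name: nginx-liveness
    spec:
      containers:
      - name: nginx
        image: nginx
        livenessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 10
          periodSeconds: 5

If the / path of Nginx returns a status code outside the 200-399 range, or the request times out, for failureThreshold consecutive probes (3 by default), the kubelet restarts the container.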

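4.2 exec example (custom check)

    livenessProbe:
      exec:
        command:
        - /bin/sh
        - -c
        - "ps aux | grep '[m]yapp' || exit 1"   # [m] keeps grep from matching its own command line
      initialDelaySeconds: 5
      periodSeconds: 10

The probe succeeds when the command exits with code 0. A plain grep myapp would always succeed, because the pattern also appears in the command lines of the probe's own sh and grep processes.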

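5. readinessProbe: controlling when traffic arrives
Without a readinessProbe, a Pod counts as ready as soon as its containers have started, so its IP is added to the Service's Endpoints right away. If the application still has to load a large configuration or warm up, early requests time out. A readinessProbe keeps the Pod out of the Endpoints until the probe succeeds.

5.1 Example

    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 3
      failureThreshold: 2

Only after /ready returns a success status (200-399) is the Pod added to the Service's backends.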
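To make the link between readiness and traffic concrete, here is a minimal sketch of a Service that could front such Pods. The name myapp, the app: myapp label, and the port numbers are assumptions for illustration. Only Pods that match the selector and currently pass their readinessProbe are listed as ready endpoints.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp            # hypothetical Service name
spec:
  selector:
    app: myapp           # assumed Pod label
  ports:
  - port: 80             # port exposed by the Service
    targetPort: 8080     # container port that also serves /ready
```

Under these assumptions, running kubectl get endpoints myapp while a Pod is warming up should show its IP appearing only once the probe passes.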

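5.2 When to use it
The application has to load data into memory after starting.

The application depends on an external service (such as a database) and should accept traffic only after the connection succeeds.

The connection pool has to be warmed up before the application can handle high concurrency.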

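6. startupProbe: protecting slow-starting containers
For applications that take a very long time to start (a Java Spring Boot service may need more than a minute), an ordinary livenessProbe begins probing once initialDelaySeconds has elapsed, and if the application has not finished starting by then, the container is restarted. The fix is a startupProbe.

6.1 Example

    startupProbe:
      httpGet:
        path: /health
        port: 8080
      failureThreshold: 30
      periodSeconds: 10

With failureThreshold=30 and periodSeconds=10, the application has up to 30 × 10 s = 300 s (5 minutes) to start.

Until the startupProbe succeeds, the livenessProbe and readinessProbe do not run.

If the startupProbe fails (30 consecutive times), the container is restarted.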
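The same arithmetic can be run in reverse to size the budget. As a sketch, assume an application whose worst-case startup is about 3 minutes (an invented figure): a 5-second period with 36 allowed failures gives 36 × 5 s = 180 s, and a fast start is still noticed within about 5 seconds.

```yaml
startupProbe:
  httpGet:
    path: /health        # same endpoint as in the example above
    port: 8080
  periodSeconds: 5       # probe every 5 s, so a fast start is detected quickly
  failureThreshold: 36   # 36 x 5 s = 180 s startup budget (assumed worst case)
```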

6.2 A typical combination
For slow-starting applications, configure all three probes together:

    startupProbe:
      httpGet:
        path: /health
        port: 8080
      failureThreshold: 60
      periodSeconds: 2
    livenessProbe:
      httpGet:
        path: /health
        port: 8080
      initialDelaySeconds: 0   # the startupProbe covers the startup phase
      periodSeconds: 10
      failureThreshold: 3
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 0
      periodSeconds: 5
      failureThreshold: 2

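7. Best practices for probes
Configure a readinessProbe for every production container so that traffic never reaches a Pod that is not ready.

Configure a livenessProbe for critical services so that deadlocked or crashed containers recover automatically.

Keep check logic simple. Avoid calling downstream services, especially from liveness checks, because a single failing dependency could then trigger cascading restarts.

Choose initialDelaySeconds carefully: too small and probes fail before the container has started; too large and failure detection is delayed.

Match failureThreshold to what the business can tolerate: in environments with unstable networks, use a higher threshold.

timeoutSeconds should be longer than the time the check itself takes to run.

Use different endpoints for readinessProbe and livenessProbe: for example, /ready checks dependencies (database, cache) while /health only checks that the process is alive. This avoids restarting the container when a dependency is briefly unavailable; failing readiness is enough in that case.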
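Several of these recommendations can be combined in one container spec. The sketch below is only an illustration: the endpoint names follow the /health and /ready convention mentioned above, and every number is a placeholder to be tuned per service. The liveness check is cheap and tolerant, while the readiness check reacts faster because failing it only removes the Pod from load balancing.

```yaml
livenessProbe:
  httpGet:
    path: /health        # process-level check only, no dependency calls
    port: 8080
  periodSeconds: 10
  timeoutSeconds: 2      # should exceed the handler's normal response time
  failureThreshold: 5    # tolerate brief network jitter before restarting
readinessProbe:
  httpGet:
    path: /ready         # may check the database or cache connection
    port: 8080
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 2    # stop routing traffic quickly when dependencies fail
```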

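8. Troubleshooting probes
Check the Pod status: the Events section of kubectl describe pod shows probe failure events.

Check the application logs: look in the container's application logs for the health check requests (such as HTTP GET) and how they were answered.

Reproduce manually: use kubectl exec to enter the container, then run the probe command by hand or curl the health endpoint.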

9. Full example: a Deployment with a complete set of probes

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: myapp
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: myapp
      template:
        metadata:
          labels:
            app: myapp
        spec:
          containers:
          - name: myapp
            image: myapp:latest
            ports:
            - containerPort: 8080
            env:
            - name: DEPENDENCY_URL
              value: "http://external-service"
            startupProbe:
              httpGet:
                path: /health
                port: 8080
              failureThreshold: 60
              periodSeconds: 2
            livenessProbe:
              httpGet:
                path: /health
                port: 8080
              periodSeconds: 10
              failureThreshold: 3
            readinessProbe:
              httpGet:
                path: /ready
                port: 8080
              initialDelaySeconds: 5
              periodSeconds: 5
              failureThreshold: 2

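10. Summary
Probes are the key to Kubernetes self-healing. The livenessProbe decides whether a container is restarted, the readinessProbe decides whether it receives traffic, and the startupProbe protects slow-starting applications. Configuring them sensibly can noticeably improve application stability and user experience.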
