news 2026/6/23 11:20:21

HG_REPMGR autofailvoer自动故障转移

作者头像

张小明

前端开发工程师

1.2k 24
文章封面图
HG_REPMGR autofailvoer自动故障转移

文章目录

  • 文档用途
  • 详细信息

文档用途

HG_REPMGR自动故障转移配置参考

详细信息

配置集群自动故障转移(failover),需要为集群中的每个节点开启 repmgrd 守护进程。当主节点出现故障后,会自动将合适的备节点提升为新主节点,继

续对外提供服务。示例如下。

  1. 配置 postgresql.replication.conf 文件(所有节点)

在上述 postgresql.replication.conf 的基础上,添加如下参数:

shared_preload_libraries='repmgr'

或者

altersystemsetshared_preload_libraries=pg_pathman,timescaledb,repmgr;

重启数据库:

pg_ctl restart
  1. 配置 hg_repmgr.conf(所有节点)

在现有的 hg_repmgr.conf 文件中添加如下参数:

failover=automatic promote_command='repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf standby promote'follow_command='repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf standby follow --upstream-node-id=%n'

如果需要将 repmgr 的日志定位到固定的日志文件可添加 log_file 参数,如 下:

log_file='/opt/highgo/5.6.1/conf/data/log/hg_repmgr.log'

为了防止上述日志文件不断膨胀,可配置系统的 logrotate。(详细步骤略)

  1. 开启 repmgrd 进程(所有节点)
repmgrd-f/opt/highgo/5.6.1/conf/hg_repmgr.conf-d-p/tmp/hg_repmgrd.pid[highgo@dbrsconf]$ repmgrd-d-p/tmp/hg_repmgrd.pid[2019-05-0614:02:42][NOTICE]repmgrd(repmgrd4.2)startingup[2019-05-0614:02:42][INFO]connectingtodatabase""[2019-05-0614:02:43][ERROR]repmgr extensionnotfoundonthis node[2019-05-0614:02:43][DETAIL]repmgr extensionisavailable butnotinstalledindatabase"highgo"[2019-05-0614:02:43][HINT]checkthat this nodeispartofa repmgr cluster[highgo@dbrsconf]$ highgo=# \cYou are now connectedtodatabase"highgo"asuser"highgo".createextension repmgr;[highgo@dbrsconf]$ repmgrd-f/opt/highgo/5.6.1/conf/hg_repmgr.conf-d-p/tmp/hg_repmgrd.pid[2019-05-0614:21:21][NOTICE]repmgrd(repmgrd4.2)startingup[2019-05-0614:21:21][INFO]connectingtodatabase"host=dbrs user=hgrepmgr dbname=hgrepmgr connect_timeout=2"[highgo@dbrsconf]$ хϢ: set_repmgrd_pid(): provided pidfileis/tmp/hg_repmgrd.pid[2019-05-0614:21:21][NOTICE]startingmonitoringofnode"dbrs"(ID:1)[2019-05-0614:21:21][NOTICE]monitoring clusterprimary"dbrs"(node ID:1)[highgo@dbrs2conf]$ repmgrd-f/opt/highgo/5.6.1/conf/hg_repmgr.conf-d-p/tmp/hg_repmgrd.pid[2019-05-0614:21:50][NOTICE]repmgrd(repmgrd4.2)startingup[2019-05-0614:21:50][INFO]connectingtodatabase"host=dbrs2 user=hgrepmgr dbname=hgrepmgr connect_timeout=2"[highgo@dbrs2conf]$ хϢ: set_repmgrd_pid(): provided pidfileis/tmp/hg_repmgrd.pid[2019-05-0614:21:50][NOTICE]startingmonitoringofnode"dbrs2"(ID:2)[2019-05-0614:21:50][INFO]monitoring connectiontoupstream node"dbrs"(node ID:1)[highgo@dbrsconf]$ ls-atl/tmp/hg_repmgrd.pid-rw-rw-r--. 1 highgo highgo 5 May 6 14:21 /tmp/hg_repmgrd.pid[highgo@dbrsconf]$[highgo@dbrs2conf]$ ls-atl/tmp/hg_repmgrd.pid-rw-rw-r--. 1 highgo highgo 5 May 6 14:21 /tmp/hg_repmgrd.pid[highgo@dbrs2conf]$

提示:这个后台进程,每次重启服务器,都要手动启动吗?

开发回复:目前是,后期会修改为自动

查看集群状态

[highgo@dbrsconf]$ repmgr-f/opt/highgo/5.6.1/conf/hg_repmgr.conf clustershowID|Name|Role|Status|Upstream|Location|Connection string----+-------+---------+-----------+----------+----------+------------------------------------------------------------1|dbrs|primary|*running||default|host=dbrsuser=hgrepmgr dbname=hgrepmgr connect_timeout=22|dbrs2|standby|running|dbrs|default|host=dbrs2user=hgrepmgr dbname=hgrepmgr connect_timeout=2[highgo@dbrsconf]$

模拟主节点故障

1)在 node1 上关闭数据库

pg_ctl stop

2)在 node2 上查看集群状态

[highgo@dbrs2conf]$ repmgr-f/opt/highgo/5.6.1/conf/hg_repmgr.conf clustershowID|Name|Role|Status|Upstream|Location|Connection string----+-------+---------+-----------+----------+----------+------------------------------------------------------------1|dbrs|primary|-failed||default|host=dbrsuser=hgrepmgr dbname=hgrepmgr connect_timeout=22|dbrs2|primary|*running||default|host=dbrs2user=hgrepmgr dbname=hgrepmgr connect_timeout=2WARNING:followingissues were detected-unabletoconnecttonode"dbrs"(ID:1)[highgo@dbrs2conf]$

此时 node2 已经提升为 primary

日志

[highgo@dbrs2conf]$[2019-05-0614:24:14][WARNING]unabletoconnecttoupstream node"dbrs"(node ID:1)[2019-05-0614:24:14][INFO]checking stateofnode1,1of6attempts[2019-05-0614:24:14][INFO]sleeping10seconds untilnextreconnection attempt[2019-05-0614:24:24][INFO]checking stateofnode1,2of6attempts[2019-05-0614:24:24][INFO]sleeping10seconds untilnextreconnection attempt[2019-05-0614:24:34][INFO]checking stateofnode1,3of6attempts[2019-05-0614:24:34][INFO]sleeping10seconds untilnextreconnection attempt[2019-05-0614:24:44][INFO]checking stateofnode1,4of6attempts[2019-05-0614:24:44][INFO]sleeping10seconds untilnextreconnection attempt[2019-05-0614:24:54][INFO]checking stateofnode1,5of6attempts[2019-05-0614:24:54][INFO]sleeping10seconds untilnextreconnection attempt[highgo@dbrs2conf]$[2019-05-0614:25:04][INFO]checking stateofnode1,6of6attempts[2019-05-0614:25:04][WARNING]unabletoreconnecttonode1after6attempts[2019-05-0614:25:04][NOTICE]this nodeisthe only available candidateandwill now promote itself[2019-05-0614:25:04][INFO]promote_commandis:"repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf standby promote"NOTICE: promoting standbytoprimaryDETAIL: promoting server"dbrs2"(ID:2)using"/opt/highgo/5.6.1/bin/pg_ctl -w -D '/opt/highgo/5.6.1/data' promote"DETAIL: waiting upto60seconds(parameter"promote_check_timeout")forpromotiontocomplete NOTICE: STANDBY PROMOTE successful DETAIL: server"dbrs2"(ID:2)was successfully promotedtoprimary[2019-05-0614:25:10][INFO]switchingtoprimarymonitoringmode[2019-05-0614:25:10][NOTICE]monitoring clusterprimary"dbrs2"(node ID:2)
  1. 当 node1 的故障恢复之后,可重新加入集群
[highgo@dbrsconf]$ repmgr-f/opt/highgo/5.6.1/conf/hg_repmgr.conf clustershowID|Name|Role|Status|Upstream|Location|Connection string----+-------+---------+----------------------+----------+----------+------------------------------------------------------------1|dbrs|primary|*running||default|host=dbrsuser=hgrepmgr dbname=hgrepmgr connect_timeout=22|dbrs2|standby|!runningasprimary|dbrs|default|host=dbrs2user=hgrepmgr dbname=hgrepmgr connect_timeout=2

1)重新加入集群 (在故障节点上执行,host指定新的主节点,重新加入后作为standby节点。想想pg_rewind)

repmgr-f/opt/highgo/5.6.1/conf/hg_repmgr.conf node rejoin-d'host=dbrs2 dbname=hgrepmgr user=hgrepmgr'--force-rewind --verbose

注意:执行该命令前应关闭 node1 的 HGDB。

[highgo@dbrsconf]$ repmgr-f/opt/highgo/5.6.1/conf/hg_repmgr.conf node rejoin-d'host=dbrs2 dbname=hgrepmgr user=hgrepmgr'--force-rewind --verboseNOTICE:usingprovided configurationfile"/opt/highgo/5.6.1/conf/hg_repmgr.conf"INFO: prerequisitesforusingpg_rewind are met INFO:0files copiedto"/tmp/repmgr-config-archive-dbrs"NOTICE: executing pg_rewind NOTICE:0files copiedto/opt/highgo/5.6.1/dataINFO: directory"/tmp/repmgr-config-archive-dbrs"deleted INFO: deleting"recovery.done"NOTICE: setting node1's primary to node 2 NOTICE: starting server using "/opt/highgo/5.6.1/bin/pg_ctl -w -D '/opt/highgo/5.6.1/data'start" INFO: demotedprimaryispingable INFO: node1has attachedtoits upstream node NOTICE: NODE REJOIN successful DETAIL: node1isnow attachedtonode2[highgo@dbrsconf]$

2)查看集群状态 repmgr cluster show

[highgo@dbrsconf]$ repmgr-f/opt/highgo/5.6.1/conf/hg_repmgr.conf clustershowID|Name|Role|Status|Upstream|Location|Connection string----+-------+---------+-----------+----------+----------+------------------------------------------------------------1|dbrs|standby|running|dbrs2|default|host=dbrsuser=hgrepmgr dbname=hgrepmgr connect_timeout=22|dbrs2|primary|*running||default|host=dbrs2user=hgrepmgr dbname=hgrepmgr connect_timeout=2[highgo@dbrsconf]$
版权声明: 本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若内容造成侵权/违法违规/事实不符,请联系邮箱:809451989@qq.com进行投诉反馈,一经查实,立即删除!
网站建设 2026/6/20 14:39:11

收藏这份AI客服构建指南:有赞从0到1的实践经验与思考

有赞分享了AI客服系统从0到1的完整实践历程。项目始于黑客马拉松,初期选用Dify平台快速验证,后采用混合架构应对性能挑战。文章详细阐述了模型选择、Workflow设计、上下文管理、知识工程等关键技术环节,并分享了评测优化和协作管理的经验。核…

作者头像 李华
网站建设 2026/6/20 21:18:23

国外学术论文怎么找:实用检索技巧与资源平台推荐

刚开始做科研的时候,我一直以为: 文献检索就是在知网、Google Scholar 里反复换关键词。 直到后来才意识到,真正消耗精力的不是“搜不到”,而是—— 你根本不知道最近这个领域发生了什么。 生成式 AI 出现之后,学术检…

作者头像 李华
网站建设 2026/6/20 11:18:25

卡尔曼滤波做轨迹跟踪 鲁棒卡尔曼滤波做野值剔除后的预测 扩展卡尔曼滤波对GPS数据进行状态估计滤波

卡尔曼滤波做轨迹跟踪 鲁棒卡尔曼滤波做野值剔除后的预测 扩展卡尔曼滤波对GPS数据进行状态估计滤波 轨迹跟踪这活儿听起来玄乎,其实咱们每天都在用——手机导航里那个蓝色小圆点,背后八成藏着卡尔曼滤波的数学魔法。今天咱们扯点实在的,用P…

作者头像 李华
网站建设 2026/6/13 20:34:46

电鱼智能 RK3399 赋能配送机器人的多屏交互与人脸识别支付

什么是 电鱼智能 RK3399?电鱼智能 RK3399 是一款高性能、高扩展性的六核(2A72 4A53)嵌入式核心板。虽然发布已有几年,但它在多媒体处理方面依然表现强劲。它支持 双路 MIPI/LVDS/HDMI/eDP 显示接口,且内置了双路 ISP&…

作者头像 李华
网站建设 2026/6/16 2:31:12

python基于vue的教室预约管理平台的设计与实现django flask pycharm

目录教室预约管理平台的设计与实现摘要开发技术路线相关技术介绍核心代码参考示例结论源码lw获取/同行可拿货,招校园代理 :文章底部获取博主联系方式!教室预约管理平台的设计与实现摘要 基于Python的教室预约管理平台采用前后端分离架构,前端…

作者头像 李华