MHA failover NON-GTID 专题
这里以masterha_master_switch为背景详解各种可能遇到的场景
假定环境(经典三节点)
host_1(host_1:3306)(current master) +--host_2(host_2:3306slave[candidate master]) +--host_3(host_3:3306etl)
一、Master : MySQL down
1.1 etl 延迟8小时
配置文件中加上no_check_delay=0 即可忽略报错
1.2 slave(候选master)比etl还要落后更多
- 1.2.1 当master的部分日志还没传递两个slave,这时候master 上的MySQL挂了
### 模拟现场,现场的3台DB binlog状态 * master host_2 dba:lc> show master status; +---------------------+----------+--------------+------------------+-------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set | +---------------------+----------+--------------+------------------+-------------------+ | host_2.000002 | 1920 | | | | +---------------------+----------+--------------+------------------+-------------------+ 1 row in set (0.00 sec) * slave (candidate master) host_1 Master_Log_File: host_2.000002 Read_Master_Log_Pos: 150 Exec_Master_Log_Pos: 150 * etl (other slave) Master_Log_File: host_2.000002 Read_Master_Log_Pos: 1035 Exec_Master_Log_Pos: 1035 ### 切换日志 masterha_master_switch --global_conf=/data/online/agent/MHA/conf/masterha_default.cnf --conf=/data/online/agent/MHA/conf/bak_mha_test.cnf --dead_master_host=host_2 --dead_master_port=3306 --master_state=dead --interactive=0 --ignore_last_failover --ignore_binlog_server_error Mon Nov 13 16:24:03 2017 - [info] MHA::MasterFailover version 0.56. Mon Nov 13 16:24:03 2017 - [info] Starting master failover. Mon Nov 13 16:24:03 2017 - [info] Mon Nov 13 16:24:03 2017 - [info] * Phase 1: Configuration Check Phase.. Mon Nov 13 16:24:03 2017 - [info] Mon Nov 13 16:24:03 2017 - [info] HealthCheck: SSH to host_2 is reachable. Mon Nov 13 16:24:03 2017 - [info] Binlog server host_2 is reachable. Mon Nov 13 16:24:06 2017 - [info] HealthCheck: SSH to host_3 is reachable. Mon Nov 13 16:24:06 2017 - [info] Binlog server host_3 is reachable. Mon Nov 13 16:24:06 2017 - [warning] SQL Thread is stopped(no error) on host_1(host_1:3306) Mon Nov 13 16:24:06 2017 - [warning] SQL Thread is stopped(no error) on host_3(host_3:3306) Mon Nov 13 16:24:06 2017 - [info] GTID failover mode = 0 Mon Nov 13 16:24:06 2017 - [info] Dead Servers: Mon Nov 13 16:24:06 2017 - [info] host_2(host_2:3306) Mon Nov 13 16:24:06 2017 - [info] Checking master reachability via MySQL(double check)... Mon Nov 13 16:24:06 2017 - [info] ok. Mon Nov 13 16:24:06 2017 - [info] Alive Servers: Mon Nov 13 16:24:06 2017 - [info] host_1(host_1:3306) Mon Nov 13 16:24:06 2017 - [info] host_3(host_3:3306) Mon Nov 13 16:24:06 2017 - [info] Alive Slaves: Mon Nov 13 16:24:06 2017 - [info] host_1(host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Mon Nov 13 16:24:06 2017 - [info] Replicating from host_2(host_2:3306) Mon Nov 13 16:24:06 2017 - [info] Primary candidate for the new Master (candidate_master is set) Mon Nov 13 16:24:06 2017 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Mon Nov 13 16:24:06 2017 - [info] Replicating from host_2(host_2:3306) Mon Nov 13 16:24:06 2017 - [info] Not candidate for the new Master (no_master is set) Mon Nov 13 16:24:06 2017 - [info] Starting SQL thread on host_1(host_1:3306) .. Mon Nov 13 16:24:06 2017 - [info] done. Mon Nov 13 16:24:06 2017 - [info] Starting SQL thread on host_3(host_3:3306) .. Mon Nov 13 16:24:06 2017 - [info] done. Mon Nov 13 16:24:06 2017 - [info] Starting Non-GTID based failover. Mon Nov 13 16:24:06 2017 - [info] Mon Nov 13 16:24:06 2017 - [info] ** Phase 1: Configuration Check Phase completed. Mon Nov 13 16:24:06 2017 - [info] Mon Nov 13 16:24:06 2017 - [info] * Phase 2: Dead Master Shutdown Phase.. Mon Nov 13 16:24:06 2017 - [info] Mon Nov 13 16:24:06 2017 - [info] HealthCheck: SSH to host_2 is reachable. Mon Nov 13 16:24:06 2017 - [info] Forcing shutdown so that applications never connect to the current master.. Mon Nov 13 16:24:06 2017 - [info] Executing master IP deactivation script: Mon Nov 13 16:24:06 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --orig_master_host=host_2 --orig_master_ip=host_2 --orig_master_port=3306 --command=stopssh --ssh_user=root =================== swift vip : tgw_vip from host_2 is deleted ============================== --2017-11-13 16:24:07-- http://tgw_server/cgi-bin/fun_logic/bin/public_api/op_rs.cgi 正在连接 tgw_server:80... 已连接。 已发出 HTTP 请求,正在等待回应... 200 OK 长度:未指定 [text/html] 正在保存至: “STDOUT” 0K 11.2M=0s 2017-11-13 16:24:11 (11.2 MB/s) - 已写入标准输出 [38] Mon Nov 13 16:24:11 2017 - [info] done. Mon Nov 13 16:24:11 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master. Mon Nov 13 16:24:11 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed. Mon Nov 13 16:24:11 2017 - [info] Mon Nov 13 16:24:11 2017 - [info] * Phase 3: Master Recovery Phase.. Mon Nov 13 16:24:11 2017 - [info] Mon Nov 13 16:24:11 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase.. Mon Nov 13 16:24:11 2017 - [info] Mon Nov 13 16:24:11 2017 - [info] The latest binary log file/position on all slaves is host_2.000002:1035 Mon Nov 13 16:24:11 2017 - [info] Latest slaves (Slaves that received relay log files to the latest): Mon Nov 13 16:24:11 2017 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Mon Nov 13 16:24:11 2017 - [info] Replicating from host_2(host_2:3306) Mon Nov 13 16:24:11 2017 - [info] Not candidate for the new Master (no_master is set) Mon Nov 13 16:24:11 2017 - [info] The oldest binary log file/position on all slaves is host_2.000002:150 Mon Nov 13 16:24:11 2017 - [info] Oldest slaves: Mon Nov 13 16:24:11 2017 - [info] host_1(host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Mon Nov 13 16:24:11 2017 - [info] Replicating from host_2(host_2:3306) Mon Nov 13 16:24:11 2017 - [info] Primary candidate for the new Master (candidate_master is set) Mon Nov 13 16:24:11 2017 - [info] Mon Nov 13 16:24:11 2017 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase.. Mon Nov 13 16:24:11 2017 - [info] Mon Nov 13 16:24:11 2017 - [info] Fetching dead master's binary logs.. Mon Nov 13 16:24:11 2017 - [info] Executing command on the dead master host_2(host_2:3306): save_binary_logs --command=save --start_file=host_2.000002 --start_pos=1035 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171113162403.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 Creating /var/log/masterha/mha_test if not exists.. ok. Concat binary/relay logs from host_2.000002 pos 1035 to host_2.000002 EOF into /var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171113162403.binlog .. Dumping binlog format description event, from position 0 to 150.. ok. Dumping effective binlog data from /data/mysql.bin/host_2.000002 position 1035 to tail(1939).. ok. Concat succeeded. Mon Nov 13 16:24:12 2017 - [info] scp from root@host_2:/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171113162403.binlog to local:/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171113162403.binlog succeeded. Mon Nov 13 16:24:12 2017 - [info] HealthCheck: SSH to host_1 is reachable. Mon Nov 13 16:24:12 2017 - [info] HealthCheck: SSH to host_3 is reachable. Mon Nov 13 16:24:13 2017 - [info] Mon Nov 13 16:24:13 2017 - [info] * Phase 3.3: Determining New Master Phase.. Mon Nov 13 16:24:13 2017 - [info] Mon Nov 13 16:24:13 2017 - [info] Finding the latest slave that has all relay logs for recovering other slaves.. Mon Nov 13 16:24:13 2017 - [info] Checking whether host_3 has relay logs from the oldest position.. Mon Nov 13 16:24:13 2017 - [info] Executing command: apply_diff_relay_logs --command=find --latest_mlf=host_2.000002 --latest_rmlp=1035 --target_mlf=host_2.000002 --target_rmlp=150 --server_id=1261261666 --workdir=/var/log/masterha/mha_test --timestamp=20171113162403 --manager_version=0.56 --relay_dir=/data/mysql_data --current_relay_log=host_3-relay-bin.000005 : Relay log found at /data/mysql_data, up to host_3-relay-bin.000005 Fast relay log position search succeeded. Target relay log file/position found. start_file:host_3-relay-bin.000005, start_pos:269. Target relay log FOUND! Mon Nov 13 16:24:13 2017 - [info] OK. host_3 has all relay logs. Mon Nov 13 16:24:13 2017 - [info] Searching new master from slaves.. Mon Nov 13 16:24:13 2017 - [info] Candidate masters from the configuration file: Mon Nov 13 16:24:13 2017 - [info] host_1(host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Mon Nov 13 16:24:13 2017 - [info] Replicating from host_2(host_2:3306) Mon Nov 13 16:24:13 2017 - [info] Primary candidate for the new Master (candidate_master is set) Mon Nov 13 16:24:13 2017 - [info] Non-candidate masters: Mon Nov 13 16:24:13 2017 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Mon Nov 13 16:24:13 2017 - [info] Replicating from host_2(host_2:3306) Mon Nov 13 16:24:13 2017 - [info] Not candidate for the new Master (no_master is set) Mon Nov 13 16:24:13 2017 - [info] Searching from candidate_master slaves which have received the latest relay log events.. Mon Nov 13 16:24:13 2017 - [info] Not found. Mon Nov 13 16:24:13 2017 - [info] Searching from all candidate_master slaves.. Mon Nov 13 16:24:13 2017 - [info] New master is host_1(host_1:3306) Mon Nov 13 16:24:13 2017 - [info] Starting master failover.. Mon Nov 13 16:24:13 2017 - [info] From: host_2(host_2:3306) (current master) +--host_1(host_1:3306) +--host_3(host_3:3306) To: host_1(host_1:3306) (new master) +--host_3(host_3:3306) Mon Nov 13 16:24:13 2017 - [info] Mon Nov 13 16:24:13 2017 - [info] * Phase 3.3: New Master Diff Log Generation Phase.. Mon Nov 13 16:24:13 2017 - [info] Mon Nov 13 16:24:13 2017 - [info] Server host_1 received relay logs up to: host_2.000002:150 Mon Nov 13 16:24:13 2017 - [info] Need to get diffs from the latest slave(host_3) up to: host_2.000002:1035 (using the latest slave's relay logs) Mon Nov 13 16:24:13 2017 - [info] Connecting to the latest slave host host_3, generating diff relay log files.. Mon Nov 13 16:24:13 2017 - [info] Executing command: apply_diff_relay_logs --command=generate_and_send --scp_user=root --scp_host=host_1 --latest_mlf=host_2.000002 --latest_rmlp=1035 --target_mlf=host_2.000002 --target_rmlp=150 --server_id=1261261666 --diff_file_readtolatest=/var/log/masterha/mha_test/relay_from_read_to_latest_host_1_3306_20171113162403.binlog --workdir=/var/log/masterha/mha_test --timestamp=20171113162403 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --relay_dir=/data/mysql_data --current_relay_log=host_3-relay-bin.000005 Mon Nov 13 16:24:15 2017 - [info] Relay log found at /data/mysql_data, up to host_3-relay-bin.000005 Fast relay log position search succeeded. Target relay log file/position found. start_file:host_3-relay-bin.000005, start_pos:269. Concat binary/relay logs from host_3-relay-bin.000005 pos 269 to host_3-relay-bin.000005 EOF into /var/log/masterha/mha_test/relay_from_read_to_latest_host_1_3306_20171113162403.binlog .. Dumping binlog format description event, from position 0 to 269.. ok. Dumping effective binlog data from /data/mysql_data/host_3-relay-bin.000005 position 269 to tail(1154).. ok. Concat succeeded. Generating diff relay log succeeded. Saved at /var/log/masterha/mha_test/relay_from_read_to_latest_host_1_3306_20171113162403.binlog . scp host_3.58os.org:/var/log/masterha/mha_test/relay_from_read_to_latest_host_1_3306_20171113162403.binlog to root@host_1(22) succeeded. Mon Nov 13 16:24:15 2017 - [info] Generating diff files succeeded. Mon Nov 13 16:24:15 2017 - [info] Sending binlog.. Mon Nov 13 16:24:16 2017 - [info] scp from local:/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171113162403.binlog to root@host_1:/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171113162403.binlog succeeded. Mon Nov 13 16:24:16 2017 - [info] Mon Nov 13 16:24:16 2017 - [info] * Phase 3.4: Master Log Apply Phase.. Mon Nov 13 16:24:16 2017 - [info] Mon Nov 13 16:24:16 2017 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed. Mon Nov 13 16:24:16 2017 - [info] Starting recovery on host_1(host_1:3306).. Mon Nov 13 16:24:16 2017 - [info] Generating diffs succeeded. Mon Nov 13 16:24:16 2017 - [info] Waiting until all relay logs are applied. Mon Nov 13 16:24:16 2017 - [info] done. Mon Nov 13 16:24:16 2017 - [info] Getting slave status.. Mon Nov 13 16:24:16 2017 - [info] This slave(host_1)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(host_2.000002:150). No need to recover from Exec_Master_Log_Pos. Mon Nov 13 16:24:16 2017 - [info] Connecting to the target slave host host_1, running recover script.. Mon Nov 13 16:24:16 2017 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='dba' --slave_host=host_1 --slave_ip=host_1 --slave_port=3306 --apply_files=/var/log/masterha/mha_test/relay_from_read_to_latest_host_1_3306_20171113162403.binlog,/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171113162403.binlog --workdir=/var/log/masterha/mha_test --target_version=5.7.13-log --timestamp=20171113162403 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx Mon Nov 13 16:24:16 2017 - [info] Concat all apply files to /var/log/masterha/mha_test/total_binlog_for_host_1_3306.20171113162403.binlog .. Copying the first binlog file /var/log/masterha/mha_test/relay_from_read_to_latest_host_1_3306_20171113162403.binlog to /var/log/masterha/mha_test/total_binlog_for_host_1_3306.20171113162403.binlog.. ok. Dumping binlog head events (rotate events), skipping format description events from /var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171113162403.binlog.. dumped up to pos 150. ok. /var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171113162403.binlog has effective binlog events from pos 150. Dumping effective binlog data from /var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171113162403.binlog position 150 to tail(1054).. ok. Concat succeeded. All apply target binary logs are concatinated at /var/log/masterha/mha_test/total_binlog_for_host_1_3306.20171113162403.binlog . MySQL client version is 5.7.13. Using --binary-mode. Applying differential binary/relay log files /var/log/masterha/mha_test/relay_from_read_to_latest_host_1_3306_20171113162403.binlog,/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171113162403.binlog on host_1:3306. This may take long time... Applying log files succeeded. Mon Nov 13 16:24:16 2017 - [info] All relay logs were successfully applied. Mon Nov 13 16:24:16 2017 - [info] Getting new master's binlog name and position.. Mon Nov 13 16:24:16 2017 - [info] host_1.000001:3232 Mon Nov 13 16:24:16 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='host_1', MASTER_PORT=3306, MASTER_LOG_FILE='host_1.000001', MASTER_LOG_POS=3232, MASTER_USER='repl', MASTER_PASSWORD='xxx'; Mon Nov 13 16:24:16 2017 - [info] Executing master IP activate script: Mon Nov 13 16:24:16 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --command=start --ssh_user=root --orig_master_host=host_2 --orig_master_ip=host_2 --orig_master_port=3306 --new_master_host=host_1 --new_master_ip=host_1 --new_master_port=3306 --new_master_user='dba' --new_master_password='dba' Unknown option: new_master_user Unknown option: new_master_password =================== swift vip : tgw_vip to host_1 is added ============================== Mon Nov 13 16:24:21 2017 - [info] OK. Mon Nov 13 16:24:21 2017 - [info] Setting read_only=0 on host_1(host_1:3306).. Mon Nov 13 16:24:21 2017 - [info] ok. Mon Nov 13 16:24:21 2017 - [info] ** Finished master recovery successfully. Mon Nov 13 16:24:21 2017 - [info] * Phase 3: Master Recovery Phase completed. Mon Nov 13 16:24:21 2017 - [info] Mon Nov 13 16:24:21 2017 - [info] * Phase 4: Slaves Recovery Phase.. Mon Nov 13 16:24:21 2017 - [info] Mon Nov 13 16:24:21 2017 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase.. Mon Nov 13 16:24:21 2017 - [info] Mon Nov 13 16:24:21 2017 - [info] -- Slave diff file generation on host host_3(host_3:3306) started, pid: 104955. Check tmp log /var/log/masterha/mha_test/host_3_3306_20171113162403.log if it takes time.. Mon Nov 13 16:24:21 2017 - [info] Mon Nov 13 16:24:21 2017 - [info] Log messages from host_3 ... Mon Nov 13 16:24:21 2017 - [info] Mon Nov 13 16:24:21 2017 - [info] This server has all relay logs. No need to generate diff files from the latest slave. Mon Nov 13 16:24:21 2017 - [info] End of log messages from host_3. Mon Nov 13 16:24:21 2017 - [info] -- host_3(host_3:3306) has the latest relay log events. Mon Nov 13 16:24:21 2017 - [info] Generating relay diff files from the latest slave succeeded. Mon Nov 13 16:24:21 2017 - [info] Mon Nov 13 16:24:21 2017 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase.. Mon Nov 13 16:24:21 2017 - [info] Mon Nov 13 16:24:21 2017 - [info] -- Slave recovery on host host_3(host_3:3306) started, pid: 104966. Check tmp log /var/log/masterha/mha_test/host_3_3306_20171113162403.log if it takes time.. Mon Nov 13 16:24:21 2017 - [info] Mon Nov 13 16:24:21 2017 - [info] Log messages from host_3 ... Mon Nov 13 16:24:21 2017 - [info] Mon Nov 13 16:24:21 2017 - [info] Sending binlog.. Mon Nov 13 16:24:21 2017 - [info] scp from local:/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171113162403.binlog to root@host_3:/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171113162403.binlog succeeded. Mon Nov 13 16:24:21 2017 - [info] Starting recovery on host_3(host_3:3306).. Mon Nov 13 16:24:21 2017 - [info] Generating diffs succeeded. Mon Nov 13 16:24:21 2017 - [info] Waiting until all relay logs are applied. Mon Nov 13 16:24:21 2017 - [info] done. Mon Nov 13 16:24:21 2017 - [info] Getting slave status.. Mon Nov 13 16:24:21 2017 - [info] This slave(host_3)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(host_2.000002:1035). No need to recover from Exec_Master_Log_Pos. Mon Nov 13 16:24:21 2017 - [info] Connecting to the target slave host host_3, running recover script.. Mon Nov 13 16:24:21 2017 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='dba' --slave_host=host_3 --slave_ip=host_3 --slave_port=3306 --apply_files=/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171113162403.binlog --workdir=/var/log/masterha/mha_test --target_version=5.7.13-log --timestamp=20171113162403 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx Mon Nov 13 16:24:21 2017 - [info] MySQL client version is 5.7.13. Using --binary-mode. Applying differential binary/relay log files /var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171113162403.binlog on host_3:3306. This may take long time... Applying log files succeeded. Mon Nov 13 16:24:21 2017 - [info] All relay logs were successfully applied. Mon Nov 13 16:24:21 2017 - [info] Resetting slave host_3(host_3:3306) and starting replication from the new master host_1(host_1:3306).. Mon Nov 13 16:24:21 2017 - [info] Executed CHANGE MASTER. Mon Nov 13 16:24:21 2017 - [info] Slave started. Mon Nov 13 16:24:21 2017 - [info] End of log messages from host_3. Mon Nov 13 16:24:21 2017 - [info] -- Slave recovery on host host_3(host_3:3306) succeeded. Mon Nov 13 16:24:21 2017 - [info] All new slave servers recovered successfully. Mon Nov 13 16:24:21 2017 - [info] Mon Nov 13 16:24:21 2017 - [info] * Phase 5: New master cleanup phase.. Mon Nov 13 16:24:21 2017 - [info] Mon Nov 13 16:24:21 2017 - [info] Resetting slave info on the new master.. Mon Nov 13 16:24:21 2017 - [info] host_1: Resetting slave info succeeded. Mon Nov 13 16:24:21 2017 - [info] Master failover to host_1(host_1:3306) completed successfully. Mon Nov 13 16:24:21 2017 - [info] ----- Failover Report ----- bak_mha_test: MySQL Master failover host_2(host_2:3306) to host_1(host_1:3306) succeeded Master host_2(host_2:3306) is down! Check MHA Manager logs at tjtx135-2-217.58os.org:/var/log/masterha/mha_test/mha_test.log for details. Started automated(non-interactive) failover. Invalidated master IP address on host_2(host_2:3306) The latest slave host_3(host_3:3306) has all relay logs for recovery. Selected host_1(host_1:3306) as a new master. host_1(host_1:3306): OK: Applying all logs succeeded. host_1(host_1:3306): OK: Activated master IP address. host_3(host_3:3306): This host has the latest relay log events. Generating relay diff files from the latest slave succeeded. host_3(host_3:3306): OK: Applying all logs succeeded. Slave started, replicating from host_1(host_1:3306) host_1(host_1:3306): Resetting slave info succeeded. Master failover to host_1(host_1:3306) completed successfully. Mon Nov 13 16:24:21 2017 - [info] Sending mail..
- 1.2.2 当master的所有日志已经传递到1个etl,这时候master 上的MySQL挂了
### 模拟现场,现场的3台DB binlog状态 * master host_2 dba:lc> show master status; +---------------------+----------+--------------+------------------+-------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set | +---------------------+----------+--------------+------------------+-------------------+ | host_2.000010 | 1694 | | | | +---------------------+----------+--------------+------------------+-------------------+ 1 row in set (0.00 sec) dba:lc> select * from t_char_2; +-----+------+ | id | name | +-----+------+ | 164 | 1 | | 165 | 2 | | 166 | 3 | +-----+------+ 3 rows in set (0.00 sec) * slave (candidate master) host_1 Master_Log_File: host_2.000010 Exec_Master_Log_Pos: 806 dba:lc> select * from t_char_2; Empty set (0.00 sec) * etl (other slave) host_3 Master_Log_File: host_2.000010 Exec_Master_Log_Pos: 1694 dba:lc> select * from t_char_2; +-----+------+ | id | name | +-----+------+ | 164 | 1 | | 165 | 2 | | 166 | 3 | +-----+------+ 3 rows in set (0.00 sec) ### 切换日志 Wed Nov 15 10:25:50 2017 - [info] MHA::MasterFailover version 0.56. Wed Nov 15 10:25:50 2017 - [info] Starting master failover. Wed Nov 15 10:25:50 2017 - [info] Wed Nov 15 10:25:50 2017 - [info] * Phase 1: Configuration Check Phase.. Wed Nov 15 10:25:50 2017 - [info] Wed Nov 15 10:25:50 2017 - [warning] SQL Thread is stopped(no error) on host_1(host_1:3306) Wed Nov 15 10:25:50 2017 - [info] GTID failover mode = 0 Wed Nov 15 10:25:50 2017 - [info] Dead Servers: Wed Nov 15 10:25:50 2017 - [info] host_2(host_2:3306) Wed Nov 15 10:25:50 2017 - [info] Checking master reachability via MySQL(double check)... Wed Nov 15 10:25:50 2017 - [info] ok. Wed Nov 15 10:25:50 2017 - [info] Alive Servers: Wed Nov 15 10:25:50 2017 - [info] host_1(host_1:3306) Wed Nov 15 10:25:50 2017 - [info] host_3(host_3:3306) Wed Nov 15 10:25:50 2017 - [info] Alive Slaves: Wed Nov 15 10:25:50 2017 - [info] host_1(host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Wed Nov 15 10:25:50 2017 - [info] Replicating from host_2(host_2:3306) Wed Nov 15 10:25:50 2017 - [info] Primary candidate for the new Master (candidate_master is set) Wed Nov 15 10:25:50 2017 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Wed Nov 15 10:25:50 2017 - [info] Replicating from host_2(host_2:3306) Wed Nov 15 10:25:50 2017 - [info] Not candidate for the new Master (no_master is set) Wed Nov 15 10:25:50 2017 - [info] Starting SQL thread on host_1(host_1:3306) .. Wed Nov 15 10:25:50 2017 - [info] done. Wed Nov 15 10:25:50 2017 - [info] Starting Non-GTID based failover. Wed Nov 15 10:25:50 2017 - [info] Wed Nov 15 10:25:50 2017 - [info] ** Phase 1: Configuration Check Phase completed. Wed Nov 15 10:25:50 2017 - [info] Wed Nov 15 10:25:50 2017 - [info] * Phase 2: Dead Master Shutdown Phase.. Wed Nov 15 10:25:50 2017 - [info] Wed Nov 15 10:25:50 2017 - [info] HealthCheck: SSH to host_2 is reachable. Wed Nov 15 10:25:51 2017 - [info] Forcing shutdown so that applications never connect to the current master.. Wed Nov 15 10:25:51 2017 - [info] Executing master IP deactivation script: Wed Nov 15 10:25:51 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --orig_master_host=host_2 --orig_master_ip=host_2 --orig_master_port=3306 --command=stopssh --ssh_user=root =================== swift vip : tgw_vip from host_2 is deleted ============================== --2017-11-15 10:25:51-- http://tgw_server/cgi-bin/fun_logic/bin/public_api/op_rs.cgi 正在连接 tgw_server:80... 已连接。 已发出 HTTP 请求,正在等待回应... 200 OK 长度:未指定 [text/html] 正在保存至: “STDOUT” 0K 11.1M=0s 2017-11-15 10:25:53 (11.1 MB/s) - 已写入标准输出 [38] Wed Nov 15 10:25:53 2017 - [info] done. Wed Nov 15 10:25:53 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master. Wed Nov 15 10:25:53 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed. Wed Nov 15 10:25:53 2017 - [info] Wed Nov 15 10:25:53 2017 - [info] * Phase 3: Master Recovery Phase.. Wed Nov 15 10:25:53 2017 - [info] Wed Nov 15 10:25:53 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase.. Wed Nov 15 10:25:53 2017 - [info] Wed Nov 15 10:25:53 2017 - [info] The latest binary log file/position on all slaves is host_2.000010:1694 Wed Nov 15 10:25:53 2017 - [info] Latest slaves (Slaves that received relay log files to the latest): Wed Nov 15 10:25:53 2017 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Wed Nov 15 10:25:53 2017 - [info] Replicating from host_2(host_2:3306) Wed Nov 15 10:25:53 2017 - [info] Not candidate for the new Master (no_master is set) Wed Nov 15 10:25:53 2017 - [info] The oldest binary log file/position on all slaves is host_2.000010:806 Wed Nov 15 10:25:53 2017 - [info] Oldest slaves: Wed Nov 15 10:25:53 2017 - [info] host_1(host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Wed Nov 15 10:25:53 2017 - [info] Replicating from host_2(host_2:3306) Wed Nov 15 10:25:53 2017 - [info] Primary candidate for the new Master (candidate_master is set) Wed Nov 15 10:25:53 2017 - [info] Wed Nov 15 10:25:53 2017 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase.. Wed Nov 15 10:25:53 2017 - [info] Wed Nov 15 10:25:53 2017 - [info] Fetching dead master's binary logs.. Wed Nov 15 10:25:53 2017 - [info] Executing command on the dead master host_2(host_2:3306): save_binary_logs --command=save --start_file=host_2.000010 --start_pos=1694 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171115102550.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 Creating /var/log/masterha/mha_test if not exists.. ok. Concat binary/relay logs from host_2.000010 pos 1694 to host_2.000010 EOF into /var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171115102550.binlog .. Dumping binlog format description event, from position 0 to 150.. ok. Dumping effective binlog data from /data/mysql.bin/host_2.000010 position 1694 to tail(1713).. ok. Concat succeeded. Wed Nov 15 10:25:53 2017 - [info] scp from root@host_2:/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171115102550.binlog to local:/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171115102550.binlog succeeded. Wed Nov 15 10:25:54 2017 - [info] HealthCheck: SSH to host_1 is reachable. Wed Nov 15 10:25:54 2017 - [info] HealthCheck: SSH to host_3 is reachable. Wed Nov 15 10:25:54 2017 - [info] Wed Nov 15 10:25:54 2017 - [info] * Phase 3.3: Determining New Master Phase.. Wed Nov 15 10:25:54 2017 - [info] Wed Nov 15 10:25:54 2017 - [info] Finding the latest slave that has all relay logs for recovering other slaves.. Wed Nov 15 10:25:54 2017 - [info] Checking whether host_3 has relay logs from the oldest position.. Wed Nov 15 10:25:54 2017 - [info] Executing command: apply_diff_relay_logs --command=find --latest_mlf=host_2.000010 --latest_rmlp=1694 --target_mlf=host_2.000010 --target_rmlp=806 --server_id=1261261666 --workdir=/var/log/masterha/mha_test --timestamp=20171115102550 --manager_version=0.56 --relay_dir=/data/mysql_data --current_relay_log=host_3-relay-bin.000004 : Relay log found at /data/mysql_data, up to host_3-relay-bin.000004 Fast relay log position search succeeded. Target relay log file/position found. start_file:host_3-relay-bin.000004, start_pos:1017. Target relay log FOUND! Wed Nov 15 10:25:54 2017 - [info] OK. host_3 has all relay logs. Wed Nov 15 10:25:54 2017 - [info] Searching new master from slaves.. Wed Nov 15 10:25:54 2017 - [info] Candidate masters from the configuration file: Wed Nov 15 10:25:54 2017 - [info] host_1(host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Wed Nov 15 10:25:54 2017 - [info] Replicating from host_2(host_2:3306) Wed Nov 15 10:25:54 2017 - [info] Primary candidate for the new Master (candidate_master is set) Wed Nov 15 10:25:54 2017 - [info] Non-candidate masters: Wed Nov 15 10:25:54 2017 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Wed Nov 15 10:25:54 2017 - [info] Replicating from host_2(host_2:3306) Wed Nov 15 10:25:54 2017 - [info] Not candidate for the new Master (no_master is set) Wed Nov 15 10:25:54 2017 - [info] Searching from candidate_master slaves which have received the latest relay log events.. Wed Nov 15 10:25:54 2017 - [info] Not found. Wed Nov 15 10:25:54 2017 - [info] Searching from all candidate_master slaves.. Wed Nov 15 10:25:54 2017 - [info] New master is host_1(host_1:3306) Wed Nov 15 10:25:54 2017 - [info] Starting master failover.. Wed Nov 15 10:25:54 2017 - [info] From: host_2(host_2:3306) (current master) +--host_1(host_1:3306) +--host_3(host_3:3306) To: host_1(host_1:3306) (new master) +--host_3(host_3:3306) Wed Nov 15 10:25:54 2017 - [info] Wed Nov 15 10:25:54 2017 - [info] * Phase 3.3: New Master Diff Log Generation Phase.. Wed Nov 15 10:25:54 2017 - [info] Wed Nov 15 10:25:54 2017 - [info] Server host_1 received relay logs up to: host_2.000010:806 Wed Nov 15 10:25:54 2017 - [info] Need to get diffs from the latest slave(host_3) up to: host_2.000010:1694 (using the latest slave's relay logs) Wed Nov 15 10:25:55 2017 - [info] Connecting to the latest slave host host_3, generating diff relay log files.. Wed Nov 15 10:25:55 2017 - [info] Executing command: apply_diff_relay_logs --command=generate_and_send --scp_user=root --scp_host=host_1 --latest_mlf=host_2.000010 --latest_rmlp=1694 --target_mlf=host_2.000010 --target_rmlp=806 --server_id=1261261666 --diff_file_readtolatest=/var/log/masterha/mha_test/relay_from_read_to_latest_host_1_3306_20171115102550.binlog --workdir=/var/log/masterha/mha_test --timestamp=20171115102550 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --relay_dir=/data/mysql_data --current_relay_log=host_3-relay-bin.000004 Wed Nov 15 10:25:55 2017 - [info] Relay log found at /data/mysql_data, up to host_3-relay-bin.000004 Fast relay log position search succeeded. Target relay log file/position found. start_file:host_3-relay-bin.000004, start_pos:1017. Concat binary/relay logs from host_3-relay-bin.000004 pos 1017 to host_3-relay-bin.000004 EOF into /var/log/masterha/mha_test/relay_from_read_to_latest_host_1_3306_20171115102550.binlog .. Dumping binlog format description event, from position 0 to 361.. ok. Dumping effective binlog data from /data/mysql_data/host_3-relay-bin.000004 position 1017 to tail(1905).. ok. Concat succeeded. Generating diff relay log succeeded. Saved at /var/log/masterha/mha_test/relay_from_read_to_latest_host_1_3306_20171115102550.binlog . scp host_3.58os.org:/var/log/masterha/mha_test/relay_from_read_to_latest_host_1_3306_20171115102550.binlog to root@host_1(22) succeeded. Wed Nov 15 10:25:55 2017 - [info] Generating diff files succeeded. Wed Nov 15 10:25:55 2017 - [info] Sending binlog.. Wed Nov 15 10:25:56 2017 - [info] scp from local:/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171115102550.binlog to root@host_1:/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171115102550.binlog succeeded. Wed Nov 15 10:25:56 2017 - [info] Wed Nov 15 10:25:56 2017 - [info] * Phase 3.4: Master Log Apply Phase.. Wed Nov 15 10:25:56 2017 - [info] Wed Nov 15 10:25:56 2017 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed. Wed Nov 15 10:25:56 2017 - [info] Starting recovery on host_1(host_1:3306).. Wed Nov 15 10:25:56 2017 - [info] Generating diffs succeeded. Wed Nov 15 10:25:56 2017 - [info] Waiting until all relay logs are applied. Wed Nov 15 10:25:56 2017 - [info] done. Wed Nov 15 10:25:56 2017 - [info] Getting slave status.. Wed Nov 15 10:25:56 2017 - [info] This slave(host_1)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(host_2.000010:806). No need to recover from Exec_Master_Log_Pos. Wed Nov 15 10:25:56 2017 - [info] Connecting to the target slave host host_1, running recover script.. Wed Nov 15 10:25:56 2017 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='dba' --slave_host=host_1 --slave_ip=host_1 --slave_port=3306 --apply_files=/var/log/masterha/mha_test/relay_from_read_to_latest_host_1_3306_20171115102550.binlog,/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171115102550.binlog --workdir=/var/log/masterha/mha_test --target_version=5.7.13-log --timestamp=20171115102550 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx Wed Nov 15 10:25:56 2017 - [info] Concat all apply files to /var/log/masterha/mha_test/total_binlog_for_host_1_3306.20171115102550.binlog .. Copying the first binlog file /var/log/masterha/mha_test/relay_from_read_to_latest_host_1_3306_20171115102550.binlog to /var/log/masterha/mha_test/total_binlog_for_host_1_3306.20171115102550.binlog.. ok. Dumping binlog head events (rotate events), skipping format description events from /var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171115102550.binlog.. dumped up to pos 150. ok. /var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171115102550.binlog has effective binlog events from pos 150. Dumping effective binlog data from /var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171115102550.binlog position 150 to tail(169).. ok. Concat succeeded. All apply target binary logs are concatinated at /var/log/masterha/mha_test/total_binlog_for_host_1_3306.20171115102550.binlog . MySQL client version is 5.7.13. Using --binary-mode. Applying differential binary/relay log files /var/log/masterha/mha_test/relay_from_read_to_latest_host_1_3306_20171115102550.binlog,/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171115102550.binlog on host_1:3306. This may take long time... Applying log files succeeded. Wed Nov 15 10:25:56 2017 - [info] All relay logs were successfully applied. Wed Nov 15 10:25:56 2017 - [info] Getting new master's binlog name and position.. Wed Nov 15 10:25:56 2017 - [info] host_1.000010:2060 Wed Nov 15 10:25:56 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='host_1', MASTER_PORT=3306, MASTER_LOG_FILE='host_1.000010', MASTER_LOG_POS=2060, MASTER_USER='repl', MASTER_PASSWORD='xxx'; Wed Nov 15 10:25:56 2017 - [info] Executing master IP activate script: Wed Nov 15 10:25:56 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --command=start --ssh_user=root --orig_master_host=host_2 --orig_master_ip=host_2 --orig_master_port=3306 --new_master_host=host_1 --new_master_ip=host_1 --new_master_port=3306 --new_master_user='dba' --new_master_password='dba' Unknown option: new_master_user Unknown option: new_master_password =================== swift vip : tgw_vip to host_1 is added ============================== Wed Nov 15 10:25:59 2017 - [info] OK. Wed Nov 15 10:25:59 2017 - [info] Setting read_only=0 on host_1(host_1:3306).. Wed Nov 15 10:25:59 2017 - [info] ok. Wed Nov 15 10:25:59 2017 - [info] ** Finished master recovery successfully. Wed Nov 15 10:25:59 2017 - [info] * Phase 3: Master Recovery Phase completed. Wed Nov 15 10:25:59 2017 - [info] Wed Nov 15 10:25:59 2017 - [info] * Phase 4: Slaves Recovery Phase.. Wed Nov 15 10:25:59 2017 - [info] Wed Nov 15 10:25:59 2017 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase.. Wed Nov 15 10:25:59 2017 - [info] Wed Nov 15 10:25:59 2017 - [info] -- Slave diff file generation on host host_3(host_3:3306) started, pid: 125962. Check tmp log /var/log/masterha/mha_test/host_3_3306_20171115102550.log if it takes time.. Wed Nov 15 10:25:59 2017 - [info] Wed Nov 15 10:25:59 2017 - [info] Log messages from host_3 ... Wed Nov 15 10:25:59 2017 - [info] Wed Nov 15 10:25:59 2017 - [info] This server has all relay logs. No need to generate diff files from the latest slave. Wed Nov 15 10:25:59 2017 - [info] End of log messages from host_3. Wed Nov 15 10:25:59 2017 - [info] -- host_3(host_3:3306) has the latest relay log events. Wed Nov 15 10:25:59 2017 - [info] Generating relay diff files from the latest slave succeeded. Wed Nov 15 10:25:59 2017 - [info] Wed Nov 15 10:25:59 2017 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase.. Wed Nov 15 10:25:59 2017 - [info] Wed Nov 15 10:25:59 2017 - [info] -- Slave recovery on host host_3(host_3:3306) started, pid: 125967. Check tmp log /var/log/masterha/mha_test/host_3_3306_20171115102550.log if it takes time.. Wed Nov 15 10:25:59 2017 - [info] Wed Nov 15 10:25:59 2017 - [info] Log messages from host_3 ... Wed Nov 15 10:25:59 2017 - [info] Wed Nov 15 10:25:59 2017 - [info] Sending binlog.. Wed Nov 15 10:25:59 2017 - [info] scp from local:/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171115102550.binlog to root@host_3:/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171115102550.binlog succeeded. Wed Nov 15 10:25:59 2017 - [info] Starting recovery on host_3(host_3:3306).. Wed Nov 15 10:25:59 2017 - [info] Generating diffs succeeded. Wed Nov 15 10:25:59 2017 - [info] Waiting until all relay logs are applied. Wed Nov 15 10:25:59 2017 - [info] done. Wed Nov 15 10:25:59 2017 - [info] Getting slave status.. Wed Nov 15 10:25:59 2017 - [info] This slave(host_3)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(host_2.000010:1694). No need to recover from Exec_Master_Log_Pos. Wed Nov 15 10:25:59 2017 - [info] Connecting to the target slave host host_3, running recover script.. Wed Nov 15 10:25:59 2017 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='dba' --slave_host=host_3 --slave_ip=host_3 --slave_port=3306 --apply_files=/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171115102550.binlog --workdir=/var/log/masterha/mha_test --target_version=5.7.13-log --timestamp=20171115102550 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx Wed Nov 15 10:25:59 2017 - [info] MySQL client version is 5.7.13. Using --binary-mode. Applying differential binary/relay log files /var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171115102550.binlog on host_3:3306. This may take long time... Applying log files succeeded. Wed Nov 15 10:25:59 2017 - [info] All relay logs were successfully applied. Wed Nov 15 10:25:59 2017 - [info] Resetting slave host_3(host_3:3306) and starting replication from the new master host_1(host_1:3306).. Wed Nov 15 10:25:59 2017 - [info] Executed CHANGE MASTER. Wed Nov 15 10:25:59 2017 - [info] Slave started. Wed Nov 15 10:25:59 2017 - [info] End of log messages from host_3. Wed Nov 15 10:25:59 2017 - [info] -- Slave recovery on host host_3(host_3:3306) succeeded. Wed Nov 15 10:25:59 2017 - [info] All new slave servers recovered successfully. Wed Nov 15 10:25:59 2017 - [info] Wed Nov 15 10:25:59 2017 - [info] * Phase 5: New master cleanup phase.. Wed Nov 15 10:25:59 2017 - [info] Wed Nov 15 10:25:59 2017 - [info] Resetting slave info on the new master.. Wed Nov 15 10:25:59 2017 - [info] host_1: Resetting slave info succeeded. Wed Nov 15 10:25:59 2017 - [info] Master failover to host_1(host_1:3306) completed successfully. Wed Nov 15 10:25:59 2017 - [info] ----- Failover Report ----- bak_mha_test: MySQL Master failover host_2(host_2:3306) to host_1(host_1:3306) succeeded Master host_2(host_2:3306) is down! Check MHA Manager logs at tjtx135-2-217.58os.org:/var/log/masterha/mha_test/mha_test.log for details. Started automated(non-interactive) failover. Invalidated master IP address on host_2(host_2:3306) The latest slave host_3(host_3:3306) has all relay logs for recovery. Selected host_1(host_1:3306) as a new master. host_1(host_1:3306): OK: Applying all logs succeeded. host_1(host_1:3306): OK: Activated master IP address. host_3(host_3:3306): This host has the latest relay log events. Generating relay diff files from the latest slave succeeded. host_3(host_3:3306): OK: Applying all logs succeeded. Slave started, replicating from host_1(host_1:3306) host_1(host_1:3306): Resetting slave info succeeded. Master failover to host_1(host_1:3306) completed successfully. Wed Nov 15 10:25:59 2017 - [info] Sending mail.. ### 切换后的结果 * new master 和 new etl 数据一致 dba:lc> select * from t_char_2; +-----+------+ | id | name | +-----+------+ | 164 | 1 | | 165 | 2 | | 166 | 3 | +-----+------+ 3 rows in set (0.00 sec)
1.3 slave(候选master)的日志是最新的,比etl要多
- 1.3.1 当master的部分日志还没传递两个slave,这时候master 上的MySQL挂了
### 模拟现场,现场的3台DB binlog状态 * master host_2 dba:lc> show master status; +---------------------+----------+--------------+------------------+-------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set | +---------------------+----------+--------------+------------------+-------------------+ | host_2.000004 | 4577 | | | | +---------------------+----------+--------------+------------------+-------------------+ 1 row in set (0.00 sec) dba:lc> select * from t_char_2; +-----+------+ | id | name | +-----+------+ | 114 | 1 | | 115 | 2 | | 116 | 3 | | 117 | 4 | | 118 | 5 | | 119 | 6 | | 120 | 7 | | 121 | 8 | | 122 | 10 | | 123 | 11 | | 124 | 12 | | 125 | 13 | | 126 | 14 | | 127 | 15 | +-----+------+ 14 rows in set (0.00 sec) * slave (candidate master) host_1 Master_Log_File: host_2.000004 Exec_Master_Log_Pos: 3683 dba:lc> select * from t_char_2; +-----+------+ | id | name | +-----+------+ | 114 | 1 | | 115 | 2 | | 116 | 3 | | 117 | 4 | | 118 | 5 | | 119 | 6 | | 120 | 7 | | 121 | 8 | | 122 | 10 | | 123 | 11 | | 124 | 12 | +-----+------+ 11 rows in set (0.00 sec) * etl (other slave) host_3 Master_Log_File: host_2.000004 Exec_Master_Log_Pos: 2789 dba:lc> select * from t_char_2; +-----+------+ | id | name | +-----+------+ | 114 | 1 | | 115 | 2 | | 116 | 3 | | 117 | 4 | | 118 | 5 | | 119 | 6 | | 120 | 7 | | 121 | 8 | +-----+------+ 8 rows in set (0.00 sec) ### 切换日志 masterha_master_switch --global_conf=/data/online/agent/MHA/conf/masterha_default.cnf --conf=/data/online/agent/MHA/conf/bak_mha_test.cnf --dead_master_host=host_2 --dead_master_port=3306 --master_state=dead --interactive=0 --ignore_last_failover --ignore_binlog_server_error Tue Nov 14 17:13:08 2017 - [info] MHA::MasterFailover version 0.56. Tue Nov 14 17:13:08 2017 - [info] Starting master failover. Tue Nov 14 17:13:08 2017 - [info] Tue Nov 14 17:13:08 2017 - [info] * Phase 1: Configuration Check Phase.. Tue Nov 14 17:13:08 2017 - [info] Tue Nov 14 17:13:08 2017 - [warning] SQL Thread is stopped(no error) on host_1(host_1:3306) Tue Nov 14 17:13:08 2017 - [warning] SQL Thread is stopped(no error) on host_3(host_3:3306) Tue Nov 14 17:13:08 2017 - [info] GTID failover mode = 0 Tue Nov 14 17:13:08 2017 - [info] Dead Servers: Tue Nov 14 17:13:08 2017 - [info] host_2(host_2:3306) Tue Nov 14 17:13:08 2017 - [info] Checking master reachability via MySQL(double check)... Tue Nov 14 17:13:08 2017 - [info] ok. Tue Nov 14 17:13:08 2017 - [info] Alive Servers: Tue Nov 14 17:13:08 2017 - [info] host_1(host_1:3306) Tue Nov 14 17:13:08 2017 - [info] host_3(host_3:3306) Tue Nov 14 17:13:08 2017 - [info] Alive Slaves: Tue Nov 14 17:13:08 2017 - [info] host_1(host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Tue Nov 14 17:13:08 2017 - [info] Replicating from host_2(host_2:3306) Tue Nov 14 17:13:08 2017 - [info] Primary candidate for the new Master (candidate_master is set) Tue Nov 14 17:13:08 2017 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Tue Nov 14 17:13:08 2017 - [info] Replicating from host_2(host_2:3306) Tue Nov 14 17:13:08 2017 - [info] Not candidate for the new Master (no_master is set) Tue Nov 14 17:13:08 2017 - [info] Starting SQL thread on host_1(host_1:3306) .. Tue Nov 14 17:13:08 2017 - [info] done. Tue Nov 14 17:13:08 2017 - [info] Starting SQL thread on host_3(host_3:3306) .. Tue Nov 14 17:13:08 2017 - [info] done. Tue Nov 14 17:13:08 2017 - [info] Starting Non-GTID based failover. Tue Nov 14 17:13:08 2017 - [info] Tue Nov 14 17:13:08 2017 - [info] ** Phase 1: Configuration Check Phase completed. Tue Nov 14 17:13:08 2017 - [info] Tue Nov 14 17:13:08 2017 - [info] * Phase 2: Dead Master Shutdown Phase.. Tue Nov 14 17:13:08 2017 - [info] Tue Nov 14 17:13:08 2017 - [info] HealthCheck: SSH to host_2 is reachable. Tue Nov 14 17:13:09 2017 - [info] Forcing shutdown so that applications never connect to the current master.. Tue Nov 14 17:13:09 2017 - [info] Executing master IP deactivation script: Tue Nov 14 17:13:09 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --orig_master_host=host_2 --orig_master_ip=host_2 --orig_master_port=3306 --command=stopssh --ssh_user=root =================== swift vip : tgw_vip from host_2 is deleted ============================== --2017-11-14 17:13:09-- http://tgw_server/cgi-bin/fun_logic/bin/public_api/op_rs.cgi 正在连接 tgw_server:80... 已连接。 已发出 HTTP 请求,正在等待回应... 200 OK 长度:未指定 [text/html] 正在保存至: “STDOUT” 0K 7.19M=0s 2017-11-14 17:13:11 (7.19 MB/s) - 已写入标准输出 [38] Tue Nov 14 17:13:11 2017 - [info] done. Tue Nov 14 17:13:11 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master. Tue Nov 14 17:13:11 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed. Tue Nov 14 17:13:11 2017 - [info] Tue Nov 14 17:13:11 2017 - [info] * Phase 3: Master Recovery Phase.. Tue Nov 14 17:13:11 2017 - [info] Tue Nov 14 17:13:11 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase.. Tue Nov 14 17:13:11 2017 - [info] Tue Nov 14 17:13:11 2017 - [info] The latest binary log file/position on all slaves is host_2.000004:3683 Tue Nov 14 17:13:11 2017 - [info] Latest slaves (Slaves that received relay log files to the latest): Tue Nov 14 17:13:11 2017 - [info] host_1(host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Tue Nov 14 17:13:11 2017 - [info] Replicating from host_2(host_2:3306) Tue Nov 14 17:13:11 2017 - [info] Primary candidate for the new Master (candidate_master is set) Tue Nov 14 17:13:11 2017 - [info] The oldest binary log file/position on all slaves is host_2.000004:2789 Tue Nov 14 17:13:11 2017 - [info] Oldest slaves: Tue Nov 14 17:13:11 2017 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Tue Nov 14 17:13:11 2017 - [info] Replicating from host_2(host_2:3306) Tue Nov 14 17:13:11 2017 - [info] Not candidate for the new Master (no_master is set) Tue Nov 14 17:13:11 2017 - [info] Tue Nov 14 17:13:11 2017 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase.. Tue Nov 14 17:13:11 2017 - [info] Tue Nov 14 17:13:11 2017 - [info] Fetching dead master's binary logs.. Tue Nov 14 17:13:11 2017 - [info] Executing command on the dead master host_2(host_2:3306): save_binary_logs --command=save --start_file=host_2.000004 --start_pos=3683 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171114171308.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 Creating /var/log/masterha/mha_test if not exists.. ok. Concat binary/relay logs from host_2.000004 pos 3683 to host_2.000004 EOF into /var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171114171308.binlog .. Dumping binlog format description event, from position 0 to 150.. ok. Dumping effective binlog data from /data/mysql.bin/host_2.000004 position 3683 to tail(4596).. ok. Concat succeeded. Tue Nov 14 17:13:12 2017 - [info] scp from root@host_2:/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171114171308.binlog to local:/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171114171308.binlog succeeded. Tue Nov 14 17:13:12 2017 - [info] HealthCheck: SSH to host_1 is reachable. Tue Nov 14 17:13:12 2017 - [info] HealthCheck: SSH to host_3 is reachable. Tue Nov 14 17:13:12 2017 - [info] Tue Nov 14 17:13:12 2017 - [info] * Phase 3.3: Determining New Master Phase.. Tue Nov 14 17:13:12 2017 - [info] Tue Nov 14 17:13:12 2017 - [info] Finding the latest slave that has all relay logs for recovering other slaves.. Tue Nov 14 17:13:12 2017 - [info] Checking whether host_1 has relay logs from the oldest position.. Tue Nov 14 17:13:12 2017 - [info] Executing command: apply_diff_relay_logs --command=find --latest_mlf=host_2.000004 --latest_rmlp=3683 --target_mlf=host_2.000004 --target_rmlp=2789 --server_id=1261261646 --workdir=/var/log/masterha/mha_test --timestamp=20171114171308 --manager_version=0.56 --relay_dir=/data/mysql_data --current_relay_log=host_1-relay-bin.000003 : Relay log found at /data/mysql_data, up to host_1-relay-bin.000003 Fast relay log position search succeeded. Target relay log file/position found. start_file:host_1-relay-bin.000003, start_pos:269. Target relay log FOUND! Tue Nov 14 17:13:13 2017 - [info] OK. host_1 has all relay logs. Tue Nov 14 17:13:13 2017 - [info] Searching new master from slaves.. Tue Nov 14 17:13:13 2017 - [info] Candidate masters from the configuration file: Tue Nov 14 17:13:13 2017 - [info] host_1(host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Tue Nov 14 17:13:13 2017 - [info] Replicating from host_2(host_2:3306) Tue Nov 14 17:13:13 2017 - [info] Primary candidate for the new Master (candidate_master is set) Tue Nov 14 17:13:13 2017 - [info] Non-candidate masters: Tue Nov 14 17:13:13 2017 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Tue Nov 14 17:13:13 2017 - [info] Replicating from host_2(host_2:3306) Tue Nov 14 17:13:13 2017 - [info] Not candidate for the new Master (no_master is set) Tue Nov 14 17:13:13 2017 - [info] Searching from candidate_master slaves which have received the latest relay log events.. Tue Nov 14 17:13:13 2017 - [info] New master is host_1(host_1:3306) Tue Nov 14 17:13:13 2017 - [info] Starting master failover.. Tue Nov 14 17:13:13 2017 - [info] From: host_2(host_2:3306) (current master) +--host_1(host_1:3306) +--host_3(host_3:3306) To: host_1(host_1:3306) (new master) +--host_3(host_3:3306) Tue Nov 14 17:13:13 2017 - [info] Tue Nov 14 17:13:13 2017 - [info] * Phase 3.3: New Master Diff Log Generation Phase.. Tue Nov 14 17:13:13 2017 - [info] Tue Nov 14 17:13:13 2017 - [info] This server has all relay logs. No need to generate diff files from the latest slave. Tue Nov 14 17:13:13 2017 - [info] Sending binlog.. Tue Nov 14 17:13:13 2017 - [info] scp from local:/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171114171308.binlog to root@host_1:/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171114171308.binlog succeeded. Tue Nov 14 17:13:13 2017 - [info] Tue Nov 14 17:13:13 2017 - [info] * Phase 3.4: Master Log Apply Phase.. Tue Nov 14 17:13:13 2017 - [info] Tue Nov 14 17:13:13 2017 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed. Tue Nov 14 17:13:13 2017 - [info] Starting recovery on host_1(host_1:3306).. Tue Nov 14 17:13:13 2017 - [info] Generating diffs succeeded. Tue Nov 14 17:13:13 2017 - [info] Waiting until all relay logs are applied. Tue Nov 14 17:13:13 2017 - [info] done. Tue Nov 14 17:13:13 2017 - [info] Getting slave status.. Tue Nov 14 17:13:13 2017 - [info] This slave(host_1)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(host_2.000004:3683). No need to recover from Exec_Master_Log_Pos. Tue Nov 14 17:13:13 2017 - [info] Connecting to the target slave host host_1, running recover script.. Tue Nov 14 17:13:13 2017 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='dba' --slave_host=host_1 --slave_ip=host_1 --slave_port=3306 --apply_files=/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171114171308.binlog --workdir=/var/log/masterha/mha_test --target_version=5.7.13-log --timestamp=20171114171308 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx Tue Nov 14 17:13:13 2017 - [info] MySQL client version is 5.7.13. Using --binary-mode. Applying differential binary/relay log files /var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171114171308.binlog on host_1:3306. This may take long time... Applying log files succeeded. Tue Nov 14 17:13:13 2017 - [info] All relay logs were successfully applied. Tue Nov 14 17:13:13 2017 - [info] Getting new master's binlog name and position.. Tue Nov 14 17:13:13 2017 - [info] host_1.000003:2347 Tue Nov 14 17:13:13 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='host_1', MASTER_PORT=3306, MASTER_LOG_FILE='host_1.000003', MASTER_LOG_POS=2347, MASTER_USER='repl', MASTER_PASSWORD='xxx'; Tue Nov 14 17:13:13 2017 - [info] Executing master IP activate script: Tue Nov 14 17:13:13 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --command=start --ssh_user=root --orig_master_host=host_2 --orig_master_ip=host_2 --orig_master_port=3306 --new_master_host=host_1 --new_master_ip=host_1 --new_master_port=3306 --new_master_user='dba' --new_master_password='dba' Unknown option: new_master_user Unknown option: new_master_password =================== swift vip : tgw_vip to host_1 is added ============================== Tue Nov 14 17:13:16 2017 - [info] OK. Tue Nov 14 17:13:16 2017 - [info] Setting read_only=0 on host_1(host_1:3306).. Tue Nov 14 17:13:16 2017 - [info] ok. Tue Nov 14 17:13:16 2017 - [info] ** Finished master recovery successfully. Tue Nov 14 17:13:16 2017 - [info] * Phase 3: Master Recovery Phase completed. Tue Nov 14 17:13:16 2017 - [info] Tue Nov 14 17:13:16 2017 - [info] * Phase 4: Slaves Recovery Phase.. Tue Nov 14 17:13:16 2017 - [info] Tue Nov 14 17:13:16 2017 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase.. Tue Nov 14 17:13:16 2017 - [info] Tue Nov 14 17:13:16 2017 - [info] -- Slave diff file generation on host host_3(host_3:3306) started, pid: 29885. Check tmp log /var/log/masterha/mha_test/host_3_3306_20171114171308.log if it takes time.. Tue Nov 14 17:13:17 2017 - [info] Tue Nov 14 17:13:17 2017 - [info] Log messages from host_3 ... Tue Nov 14 17:13:17 2017 - [info] Tue Nov 14 17:13:16 2017 - [info] Server host_3 received relay logs up to: host_2.000004:2789 Tue Nov 14 17:13:16 2017 - [info] Need to get diffs from the latest slave(host_1) up to: host_2.000004:3683 (using the latest slave's relay logs) Tue Nov 14 17:13:16 2017 - [info] Connecting to the latest slave host host_1, generating diff relay log files.. Tue Nov 14 17:13:16 2017 - [info] Executing command: apply_diff_relay_logs --command=generate_and_send --scp_user=root --scp_host=host_3 --latest_mlf=host_2.000004 --latest_rmlp=3683 --target_mlf=host_2.000004 --target_rmlp=2789 --server_id=1261261646 --diff_file_readtolatest=/var/log/masterha/mha_test/relay_from_read_to_latest_host_3_3306_20171114171308.binlog --workdir=/var/log/masterha/mha_test --timestamp=20171114171308 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --relay_dir=/data/mysql_data --current_relay_log=host_1-relay-bin.000003 Tue Nov 14 17:13:17 2017 - [info] Relay log found at /data/mysql_data, up to host_1-relay-bin.000003 Fast relay log position search succeeded. Target relay log file/position found. start_file:host_1-relay-bin.000003, start_pos:269. Concat binary/relay logs from host_1-relay-bin.000003 pos 269 to host_1-relay-bin.000003 EOF into /var/log/masterha/mha_test/relay_from_read_to_latest_host_3_3306_20171114171308.binlog .. Dumping binlog format description event, from position 0 to 269.. ok. Dumping effective binlog data from /data/mysql_data/host_1-relay-bin.000003 position 269 to tail(1163).. ok. Concat succeeded. Generating diff relay log succeeded. Saved at /var/log/masterha/mha_test/relay_from_read_to_latest_host_3_3306_20171114171308.binlog . scp host_1.58os.org:/var/log/masterha/mha_test/relay_from_read_to_latest_host_3_3306_20171114171308.binlog to root@host_3(22) succeeded. Tue Nov 14 17:13:17 2017 - [info] Generating diff files succeeded. Tue Nov 14 17:13:17 2017 - [info] End of log messages from host_3. Tue Nov 14 17:13:17 2017 - [info] -- Slave diff log generation on host host_3(host_3:3306) succeeded. Tue Nov 14 17:13:17 2017 - [info] Generating relay diff files from the latest slave succeeded. Tue Nov 14 17:13:17 2017 - [info] Tue Nov 14 17:13:17 2017 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase.. Tue Nov 14 17:13:17 2017 - [info] Tue Nov 14 17:13:17 2017 - [info] -- Slave recovery on host host_3(host_3:3306) started, pid: 31393. Check tmp log /var/log/masterha/mha_test/host_3_3306_20171114171308.log if it takes time.. Tue Nov 14 17:13:17 2017 - [info] Tue Nov 14 17:13:17 2017 - [info] Log messages from host_3 ... Tue Nov 14 17:13:17 2017 - [info] Tue Nov 14 17:13:17 2017 - [info] Sending binlog.. Tue Nov 14 17:13:17 2017 - [info] scp from local:/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171114171308.binlog to root@host_3:/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171114171308.binlog succeeded. Tue Nov 14 17:13:17 2017 - [info] Starting recovery on host_3(host_3:3306).. Tue Nov 14 17:13:17 2017 - [info] Generating diffs succeeded. Tue Nov 14 17:13:17 2017 - [info] Waiting until all relay logs are applied. Tue Nov 14 17:13:17 2017 - [info] done. Tue Nov 14 17:13:17 2017 - [info] Getting slave status.. Tue Nov 14 17:13:17 2017 - [info] This slave(host_3)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(host_2.000004:2789). No need to recover from Exec_Master_Log_Pos. Tue Nov 14 17:13:17 2017 - [info] Connecting to the target slave host host_3, running recover script.. Tue Nov 14 17:13:17 2017 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='dba' --slave_host=host_3 --slave_ip=host_3 --slave_port=3306 --apply_files=/var/log/masterha/mha_test/relay_from_read_to_latest_host_3_3306_20171114171308.binlog,/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171114171308.binlog --workdir=/var/log/masterha/mha_test --target_version=5.7.13-log --timestamp=20171114171308 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx Tue Nov 14 17:13:17 2017 - [info] Concat all apply files to /var/log/masterha/mha_test/total_binlog_for_host_3_3306.20171114171308.binlog .. Copying the first binlog file /var/log/masterha/mha_test/relay_from_read_to_latest_host_3_3306_20171114171308.binlog to /var/log/masterha/mha_test/total_binlog_for_host_3_3306.20171114171308.binlog.. ok. Dumping binlog head events (rotate events), skipping format description events from /var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171114171308.binlog.. dumped up to pos 150. ok. /var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171114171308.binlog has effective binlog events from pos 150. Dumping effective binlog data from /var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171114171308.binlog position 150 to tail(1063).. ok. Concat succeeded. All apply target binary logs are concatinated at /var/log/masterha/mha_test/total_binlog_for_host_3_3306.20171114171308.binlog . MySQL client version is 5.7.13. Using --binary-mode. Applying differential binary/relay log files /var/log/masterha/mha_test/relay_from_read_to_latest_host_3_3306_20171114171308.binlog,/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171114171308.binlog on host_3:3306. This may take long time... Applying log files succeeded. Tue Nov 14 17:13:17 2017 - [info] All relay logs were successfully applied. Tue Nov 14 17:13:17 2017 - [info] Resetting slave host_3(host_3:3306) and starting replication from the new master host_1(host_1:3306).. Tue Nov 14 17:13:17 2017 - [info] Executed CHANGE MASTER. Tue Nov 14 17:13:17 2017 - [info] Slave started. Tue Nov 14 17:13:17 2017 - [info] End of log messages from host_3. Tue Nov 14 17:13:17 2017 - [info] -- Slave recovery on host host_3(host_3:3306) succeeded. Tue Nov 14 17:13:17 2017 - [info] All new slave servers recovered successfully. Tue Nov 14 17:13:17 2017 - [info] Tue Nov 14 17:13:17 2017 - [info] * Phase 5: New master cleanup phase.. Tue Nov 14 17:13:17 2017 - [info] Tue Nov 14 17:13:17 2017 - [info] Resetting slave info on the new master.. Tue Nov 14 17:13:17 2017 - [info] host_1: Resetting slave info succeeded. Tue Nov 14 17:13:17 2017 - [info] Master failover to host_1(host_1:3306) completed successfully. Tue Nov 14 17:13:17 2017 - [info] ----- Failover Report ----- bak_mha_test: MySQL Master failover host_2(host_2:3306) to host_1(host_1:3306) succeeded Master host_2(host_2:3306) is down! Check MHA Manager logs at tjtx135-2-217.58os.org:/var/log/masterha/mha_test/mha_test.log for details. Started automated(non-interactive) failover. Invalidated master IP address on host_2(host_2:3306) The latest slave host_1(host_1:3306) has all relay logs for recovery. Selected host_1(host_1:3306) as a new master. host_1(host_1:3306): OK: Applying all logs succeeded. host_1(host_1:3306): OK: Activated master IP address. host_3(host_3:3306): Generating differential relay logs up to host_1(host_1:3306)succeeded. Generating relay diff files from the latest slave succeeded. host_3(host_3:3306): OK: Applying all logs succeeded. Slave started, replicating from host_1(host_1:3306) host_1(host_1:3306): Resetting slave info succeeded. Master failover to host_1(host_1:3306) completed successfully. Tue Nov 14 17:13:17 2017 - [info] Sending mail.. ## 切换结果 new master 和 new etl dba:lc> select * from t_char_2; +-----+------+ | id | name | +-----+------+ | 114 | 1 | | 115 | 2 | | 116 | 3 | | 117 | 4 | | 118 | 5 | | 119 | 6 | | 120 | 7 | | 121 | 8 | | 122 | 10 | | 123 | 11 | | 124 | 12 | | 125 | 13 | | 126 | 14 | | 127 | 15 | +-----+------+ 14 rows in set (0.00 sec)
- 1.3.2 当master的所有日志已经传递slave,这时候master 上的MySQL挂了
### 模拟现场,现场的3台DB binlog状态 * master host_1 dba:lc> show master status; +---------------------+----------+--------------+------------------+-------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set | +---------------------+----------+--------------+------------------+-------------------+ | host_1.000010 | 3341 | | | | +---------------------+----------+--------------+------------------+-------------------+ 1 row in set (0.00 sec) dba:lc> select * from t_char_2; +-----+------+ | id | name | +-----+------+ | 167 | 1 | | 168 | 2 | | 169 | 3 | +-----+------+ 3 rows in set (0.00 sec) * slave host_2 Master_Log_File: host_1.000010 Exec_Master_Log_Pos: 3341 dba:lc> select * from t_char_2; +-----+------+ | id | name | +-----+------+ | 167 | 1 | | 168 | 2 | | 169 | 3 | +-----+------+ 3 rows in set (0.00 sec) * etl host_3 Master_Log_File: host_1.000010 Exec_Master_Log_Pos: 2381 ### 切换日志 Wed Nov 15 10:39:36 2017 - [info] MHA::MasterFailover version 0.56. Wed Nov 15 10:39:36 2017 - [info] Starting master failover. Wed Nov 15 10:39:36 2017 - [info] Wed Nov 15 10:39:36 2017 - [info] * Phase 1: Configuration Check Phase.. Wed Nov 15 10:39:36 2017 - [info] Wed Nov 15 10:39:36 2017 - [warning] SQL Thread is stopped(no error) on host_3(host_3:3306) Wed Nov 15 10:39:36 2017 - [info] GTID failover mode = 0 Wed Nov 15 10:39:36 2017 - [info] Dead Servers: Wed Nov 15 10:39:36 2017 - [info] host_1(host_1:3306) Wed Nov 15 10:39:36 2017 - [info] Checking master reachability via MySQL(double check)... Wed Nov 15 10:39:36 2017 - [info] ok. Wed Nov 15 10:39:36 2017 - [info] Alive Servers: Wed Nov 15 10:39:36 2017 - [info] host_2(host_2:3306) Wed Nov 15 10:39:36 2017 - [info] host_3(host_3:3306) Wed Nov 15 10:39:36 2017 - [info] Alive Slaves: Wed Nov 15 10:39:36 2017 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Wed Nov 15 10:39:36 2017 - [info] Replicating from host_1(host_1:3306) Wed Nov 15 10:39:36 2017 - [info] Primary candidate for the new Master (candidate_master is set) Wed Nov 15 10:39:36 2017 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Wed Nov 15 10:39:36 2017 - [info] Replicating from host_1(host_1:3306) Wed Nov 15 10:39:36 2017 - [info] Not candidate for the new Master (no_master is set) Wed Nov 15 10:39:36 2017 - [info] Starting SQL thread on host_3(host_3:3306) .. Wed Nov 15 10:39:36 2017 - [info] done. Wed Nov 15 10:39:36 2017 - [info] Starting Non-GTID based failover. Wed Nov 15 10:39:36 2017 - [info] Wed Nov 15 10:39:36 2017 - [info] ** Phase 1: Configuration Check Phase completed. Wed Nov 15 10:39:36 2017 - [info] Wed Nov 15 10:39:36 2017 - [info] * Phase 2: Dead Master Shutdown Phase.. Wed Nov 15 10:39:36 2017 - [info] Wed Nov 15 10:39:36 2017 - [info] HealthCheck: SSH to host_1 is reachable. Wed Nov 15 10:39:37 2017 - [info] Forcing shutdown so that applications never connect to the current master.. Wed Nov 15 10:39:37 2017 - [info] Executing master IP deactivation script: Wed Nov 15 10:39:37 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --orig_master_host=host_1 --orig_master_ip=host_1 --orig_master_port=3306 --command=stopssh --ssh_user=root =================== swift vip : tgw_vip from host_1 is deleted ============================== --2017-11-15 10:39:37-- http://tgw_server/cgi-bin/fun_logic/bin/public_api/op_rs.cgi 正在连接 tgw_server:80... 已连接。 已发出 HTTP 请求,正在等待回应... 200 OK 长度:未指定 [text/html] 正在保存至: “STDOUT” 0K 11.4M=0s 2017-11-15 10:39:39 (11.4 MB/s) - 已写入标准输出 [38] Wed Nov 15 10:39:39 2017 - [info] done. Wed Nov 15 10:39:39 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master. Wed Nov 15 10:39:39 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed. Wed Nov 15 10:39:39 2017 - [info] Wed Nov 15 10:39:39 2017 - [info] * Phase 3: Master Recovery Phase.. Wed Nov 15 10:39:39 2017 - [info] Wed Nov 15 10:39:39 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase.. Wed Nov 15 10:39:39 2017 - [info] Wed Nov 15 10:39:39 2017 - [info] The latest binary log file/position on all slaves is host_1.000010:3341 Wed Nov 15 10:39:39 2017 - [info] Latest slaves (Slaves that received relay log files to the latest): Wed Nov 15 10:39:39 2017 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Wed Nov 15 10:39:39 2017 - [info] Replicating from host_1(host_1:3306) Wed Nov 15 10:39:39 2017 - [info] Primary candidate for the new Master (candidate_master is set) Wed Nov 15 10:39:39 2017 - [info] The oldest binary log file/position on all slaves is host_1.000010:2381 Wed Nov 15 10:39:39 2017 - [info] Oldest slaves: Wed Nov 15 10:39:39 2017 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Wed Nov 15 10:39:39 2017 - [info] Replicating from host_1(host_1:3306) Wed Nov 15 10:39:39 2017 - [info] Not candidate for the new Master (no_master is set) Wed Nov 15 10:39:39 2017 - [info] Wed Nov 15 10:39:39 2017 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase.. Wed Nov 15 10:39:39 2017 - [info] Wed Nov 15 10:39:39 2017 - [info] Fetching dead master's binary logs.. Wed Nov 15 10:39:39 2017 - [info] Executing command on the dead master host_1(host_1:3306): save_binary_logs --command=save --start_file=host_1.000010 --start_pos=3341 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/mha_test/saved_master_binlog_from_host_1_3306_20171115103936.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 Creating /var/log/masterha/mha_test if not exists.. ok. Concat binary/relay logs from host_1.000010 pos 3341 to host_1.000010 EOF into /var/log/masterha/mha_test/saved_master_binlog_from_host_1_3306_20171115103936.binlog .. Binlog Checksum enabled Dumping binlog format description event, from position 0 to 154.. ok. Dumping effective binlog data from /data/mysql.bin/host_1.000010 position 3341 to tail(3364).. ok. Binlog Checksum enabled Concat succeeded. Wed Nov 15 10:39:39 2017 - [info] scp from root@host_1:/var/log/masterha/mha_test/saved_master_binlog_from_host_1_3306_20171115103936.binlog to local:/var/log/masterha/mha_test/saved_master_binlog_from_host_1_3306_20171115103936.binlog succeeded. Wed Nov 15 10:39:40 2017 - [info] HealthCheck: SSH to host_2 is reachable. Wed Nov 15 10:39:40 2017 - [info] HealthCheck: SSH to host_3 is reachable. Wed Nov 15 10:39:40 2017 - [info] Wed Nov 15 10:39:40 2017 - [info] * Phase 3.3: Determining New Master Phase.. Wed Nov 15 10:39:40 2017 - [info] Wed Nov 15 10:39:40 2017 - [info] Finding the latest slave that has all relay logs for recovering other slaves.. Wed Nov 15 10:39:40 2017 - [info] Checking whether host_2 has relay logs from the oldest position.. Wed Nov 15 10:39:40 2017 - [info] Executing command: apply_diff_relay_logs --command=find --latest_mlf=host_1.000010 --latest_rmlp=3341 --target_mlf=host_1.000010 --target_rmlp=2381 --server_id=1261261656 --workdir=/var/log/masterha/mha_test --timestamp=20171115103936 --manager_version=0.56 --relay_dir=/data/mysql_data --current_relay_log=host_2-relay-bin.000002 : Relay log found at /data/mysql_data, up to host_2-relay-bin.000002 Fast relay log position search succeeded. Target relay log file/position found. start_file:host_2-relay-bin.000002, start_pos:636. Target relay log FOUND! Wed Nov 15 10:39:40 2017 - [info] OK. host_2 has all relay logs. Wed Nov 15 10:39:40 2017 - [info] Searching new master from slaves.. Wed Nov 15 10:39:40 2017 - [info] Candidate masters from the configuration file: Wed Nov 15 10:39:40 2017 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Wed Nov 15 10:39:40 2017 - [info] Replicating from host_1(host_1:3306) Wed Nov 15 10:39:40 2017 - [info] Primary candidate for the new Master (candidate_master is set) Wed Nov 15 10:39:40 2017 - [info] Non-candidate masters: Wed Nov 15 10:39:40 2017 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Wed Nov 15 10:39:40 2017 - [info] Replicating from host_1(host_1:3306) Wed Nov 15 10:39:40 2017 - [info] Not candidate for the new Master (no_master is set) Wed Nov 15 10:39:40 2017 - [info] Searching from candidate_master slaves which have received the latest relay log events.. Wed Nov 15 10:39:40 2017 - [info] New master is host_2(host_2:3306) Wed Nov 15 10:39:40 2017 - [info] Starting master failover.. Wed Nov 15 10:39:40 2017 - [info] From: host_1(host_1:3306) (current master) +--host_2(host_2:3306) +--host_3(host_3:3306) To: host_2(host_2:3306) (new master) +--host_3(host_3:3306) Wed Nov 15 10:39:40 2017 - [info] Wed Nov 15 10:39:40 2017 - [info] * Phase 3.3: New Master Diff Log Generation Phase.. Wed Nov 15 10:39:40 2017 - [info] Wed Nov 15 10:39:40 2017 - [info] This server has all relay logs. No need to generate diff files from the latest slave. Wed Nov 15 10:39:40 2017 - [info] Sending binlog.. Wed Nov 15 10:39:41 2017 - [info] scp from local:/var/log/masterha/mha_test/saved_master_binlog_from_host_1_3306_20171115103936.binlog to root@host_2:/var/log/masterha/mha_test/saved_master_binlog_from_host_1_3306_20171115103936.binlog succeeded. Wed Nov 15 10:39:41 2017 - [info] Wed Nov 15 10:39:41 2017 - [info] * Phase 3.4: Master Log Apply Phase.. Wed Nov 15 10:39:41 2017 - [info] Wed Nov 15 10:39:41 2017 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed. Wed Nov 15 10:39:41 2017 - [info] Starting recovery on host_2(host_2:3306).. Wed Nov 15 10:39:41 2017 - [info] Generating diffs succeeded. Wed Nov 15 10:39:41 2017 - [info] Waiting until all relay logs are applied. Wed Nov 15 10:39:41 2017 - [info] done. Wed Nov 15 10:39:41 2017 - [info] Getting slave status.. Wed Nov 15 10:39:41 2017 - [info] This slave(host_2)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(host_1.000010:3341). No need to recover from Exec_Master_Log_Pos. Wed Nov 15 10:39:41 2017 - [info] Connecting to the target slave host host_2, running recover script.. Wed Nov 15 10:39:41 2017 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='dba' --slave_host=host_2 --slave_ip=host_2 --slave_port=3306 --apply_files=/var/log/masterha/mha_test/saved_master_binlog_from_host_1_3306_20171115103936.binlog --workdir=/var/log/masterha/mha_test --target_version=5.7.13-log --timestamp=20171115103936 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx Wed Nov 15 10:39:41 2017 - [info] MySQL client version is 5.7.13. Using --binary-mode. Applying differential binary/relay log files /var/log/masterha/mha_test/saved_master_binlog_from_host_1_3306_20171115103936.binlog on host_2:3306. This may take long time... Applying log files succeeded. Wed Nov 15 10:39:41 2017 - [info] All relay logs were successfully applied. Wed Nov 15 10:39:41 2017 - [info] Getting new master's binlog name and position.. Wed Nov 15 10:39:41 2017 - [info] host_2.000011:1307 Wed Nov 15 10:39:41 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='host_2', MASTER_PORT=3306, MASTER_LOG_FILE='host_2.000011', MASTER_LOG_POS=1307, MASTER_USER='repl', MASTER_PASSWORD='xxx'; Wed Nov 15 10:39:41 2017 - [info] Executing master IP activate script: Wed Nov 15 10:39:41 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --command=start --ssh_user=root --orig_master_host=host_1 --orig_master_ip=host_1 --orig_master_port=3306 --new_master_host=host_2 --new_master_ip=host_2 --new_master_port=3306 --new_master_user='dba' --new_master_password='dba' Unknown option: new_master_user Unknown option: new_master_password =================== swift vip : tgw_vip to host_2 is added ============================== Wed Nov 15 10:39:44 2017 - [info] OK. Wed Nov 15 10:39:44 2017 - [info] Setting read_only=0 on host_2(host_2:3306).. Wed Nov 15 10:39:44 2017 - [info] ok. Wed Nov 15 10:39:44 2017 - [info] ** Finished master recovery successfully. Wed Nov 15 10:39:44 2017 - [info] * Phase 3: Master Recovery Phase completed. Wed Nov 15 10:39:44 2017 - [info] Wed Nov 15 10:39:44 2017 - [info] * Phase 4: Slaves Recovery Phase.. Wed Nov 15 10:39:44 2017 - [info] Wed Nov 15 10:39:44 2017 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase.. Wed Nov 15 10:39:44 2017 - [info] Wed Nov 15 10:39:44 2017 - [info] -- Slave diff file generation on host host_3(host_3:3306) started, pid: 11760. Check tmp log /var/log/masterha/mha_test/host_3_3306_20171115103936.log if it takes time.. Wed Nov 15 10:39:45 2017 - [info] Wed Nov 15 10:39:45 2017 - [info] Log messages from host_3 ... Wed Nov 15 10:39:45 2017 - [info] Wed Nov 15 10:39:44 2017 - [info] Server host_3 received relay logs up to: host_1.000010:2381 Wed Nov 15 10:39:44 2017 - [info] Need to get diffs from the latest slave(host_2) up to: host_1.000010:3341 (using the latest slave's relay logs) Wed Nov 15 10:39:44 2017 - [info] Connecting to the latest slave host host_2, generating diff relay log files.. Wed Nov 15 10:39:44 2017 - [info] Executing command: apply_diff_relay_logs --command=generate_and_send --scp_user=root --scp_host=host_3 --latest_mlf=host_1.000010 --latest_rmlp=3341 --target_mlf=host_1.000010 --target_rmlp=2381 --server_id=1261261656 --diff_file_readtolatest=/var/log/masterha/mha_test/relay_from_read_to_latest_host_3_3306_20171115103936.binlog --workdir=/var/log/masterha/mha_test --timestamp=20171115103936 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --relay_dir=/data/mysql_data --current_relay_log=host_2-relay-bin.000002 Wed Nov 15 10:39:45 2017 - [info] Relay log found at /data/mysql_data, up to host_2-relay-bin.000002 Fast relay log position search succeeded. Target relay log file/position found. start_file:host_2-relay-bin.000002, start_pos:636. Concat binary/relay logs from host_2-relay-bin.000002 pos 636 to host_2-relay-bin.000002 EOF into /var/log/masterha/mha_test/relay_from_read_to_latest_host_3_3306_20171115103936.binlog .. Binlog Checksum enabled Dumping binlog format description event, from position 0 to 315.. ok. Dumping effective binlog data from /data/mysql_data/host_2-relay-bin.000002 position 636 to tail(1596).. ok. Binlog Checksum enabled Concat succeeded. Generating diff relay log succeeded. Saved at /var/log/masterha/mha_test/relay_from_read_to_latest_host_3_3306_20171115103936.binlog . scp host_2.58os.org:/var/log/masterha/mha_test/relay_from_read_to_latest_host_3_3306_20171115103936.binlog to root@host_3(22) succeeded. Wed Nov 15 10:39:45 2017 - [info] Generating diff files succeeded. Wed Nov 15 10:39:45 2017 - [info] End of log messages from host_3. Wed Nov 15 10:39:45 2017 - [info] -- Slave diff log generation on host host_3(host_3:3306) succeeded. Wed Nov 15 10:39:45 2017 - [info] Generating relay diff files from the latest slave succeeded. Wed Nov 15 10:39:45 2017 - [info] Wed Nov 15 10:39:45 2017 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase.. Wed Nov 15 10:39:45 2017 - [info] Wed Nov 15 10:39:45 2017 - [info] -- Slave recovery on host host_3(host_3:3306) started, pid: 12881. Check tmp log /var/log/masterha/mha_test/host_3_3306_20171115103936.log if it takes time.. Wed Nov 15 10:40:45 2017 - [info] Wed Nov 15 10:40:45 2017 - [info] Log messages from host_3 ... Wed Nov 15 10:40:45 2017 - [info] Wed Nov 15 10:39:45 2017 - [info] Sending binlog.. Wed Nov 15 10:39:45 2017 - [info] scp from local:/var/log/masterha/mha_test/saved_master_binlog_from_host_1_3306_20171115103936.binlog to root@host_3:/var/log/masterha/mha_test/saved_master_binlog_from_host_1_3306_20171115103936.binlog succeeded. Wed Nov 15 10:39:45 2017 - [info] Starting recovery on host_3(host_3:3306).. Wed Nov 15 10:39:45 2017 - [info] Generating diffs succeeded. Wed Nov 15 10:39:45 2017 - [info] Waiting until all relay logs are applied. Wed Nov 15 10:39:45 2017 - [info] done. Wed Nov 15 10:39:45 2017 - [info] Getting slave status.. Wed Nov 15 10:39:45 2017 - [info] This slave(host_3)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(host_1.000010:2381). No need to recover from Exec_Master_Log_Pos. Wed Nov 15 10:39:45 2017 - [info] Connecting to the target slave host host_3, running recover script.. Wed Nov 15 10:39:45 2017 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='dba' --slave_host=host_3 --slave_ip=host_3 --slave_port=3306 --apply_files=/var/log/masterha/mha_test/relay_from_read_to_latest_host_3_3306_20171115103936.binlog,/var/log/masterha/mha_test/saved_master_binlog_from_host_1_3306_20171115103936.binlog --workdir=/var/log/masterha/mha_test --target_version=5.7.13-log --timestamp=20171115103936 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx Wed Nov 15 10:40:45 2017 - [info] Concat all apply files to /var/log/masterha/mha_test/total_binlog_for_host_3_3306.20171115103936.binlog .. Copying the first binlog file /var/log/masterha/mha_test/relay_from_read_to_latest_host_3_3306_20171115103936.binlog to /var/log/masterha/mha_test/total_binlog_for_host_3_3306.20171115103936.binlog.. ok. Dumping binlog head events (rotate events), skipping format description events from /var/log/masterha/mha_test/saved_master_binlog_from_host_1_3306_20171115103936.binlog.. Binlog Checksum enabled dumped up to pos 154. ok. /var/log/masterha/mha_test/saved_master_binlog_from_host_1_3306_20171115103936.binlog has effective binlog events from pos 154. Dumping effective binlog data from /var/log/masterha/mha_test/saved_master_binlog_from_host_1_3306_20171115103936.binlog position 154 to tail(177).. ok. Concat succeeded. All apply target binary logs are concatinated at /var/log/masterha/mha_test/total_binlog_for_host_3_3306.20171115103936.binlog . MySQL client version is 5.7.13. Using --binary-mode. Applying differential binary/relay log files /var/log/masterha/mha_test/relay_from_read_to_latest_host_3_3306_20171115103936.binlog,/var/log/masterha/mha_test/saved_master_binlog_from_host_1_3306_20171115103936.binlog on host_3:3306. This may take long time... Applying log files succeeded. Wed Nov 15 10:40:45 2017 - [info] All relay logs were successfully applied. Wed Nov 15 10:40:45 2017 - [info] Resetting slave host_3(host_3:3306) and starting replication from the new master host_2(host_2:3306).. Wed Nov 15 10:40:45 2017 - [info] Executed CHANGE MASTER. Wed Nov 15 10:40:45 2017 - [info] Slave started. Wed Nov 15 10:40:45 2017 - [info] End of log messages from host_3. Wed Nov 15 10:40:45 2017 - [info] -- Slave recovery on host host_3(host_3:3306) succeeded. Wed Nov 15 10:40:45 2017 - [info] All new slave servers recovered successfully. Wed Nov 15 10:40:45 2017 - [info] Wed Nov 15 10:40:45 2017 - [info] * Phase 5: New master cleanup phase.. Wed Nov 15 10:40:45 2017 - [info] Wed Nov 15 10:40:45 2017 - [info] Resetting slave info on the new master.. Wed Nov 15 10:40:45 2017 - [info] host_2: Resetting slave info succeeded. Wed Nov 15 10:40:45 2017 - [info] Master failover to host_2(host_2:3306) completed successfully. Wed Nov 15 10:40:45 2017 - [info] ----- Failover Report ----- bak_mha_test: MySQL Master failover host_1(host_1:3306) to host_2(host_2:3306) succeeded Master host_1(host_1:3306) is down! Check MHA Manager logs at tjtx135-2-217.58os.org:/var/log/masterha/mha_test/mha_test.log for details. Started automated(non-interactive) failover. Invalidated master IP address on host_1(host_1:3306) The latest slave host_2(host_2:3306) has all relay logs for recovery. Selected host_2(host_2:3306) as a new master. host_2(host_2:3306): OK: Applying all logs succeeded. host_2(host_2:3306): OK: Activated master IP address. host_3(host_3:3306): Generating differential relay logs up to host_2(host_2:3306)succeeded. Generating relay diff files from the latest slave succeeded. host_3(host_3:3306): OK: Applying all logs succeeded. Slave started, replicating from host_2(host_2:3306) host_2(host_2:3306): Resetting slave info succeeded. Master failover to host_2(host_2:3306) completed successfully. Wed Nov 15 10:40:45 2017 - [info] Sending mail.. ### 切换结果 new master & new etl 数据一致 dba:lc> select * from t_char_2; +-----+------+ | id | name | +-----+------+ | 167 | 1 | | 168 | 2 | | 169 | 3 |
1.4 slave(候选master)上面有大事务在跑
- 1000s的大查询
无影响,正常切换
1.5 如果MHA过程中失败,是否可以重新执行MHA的failover呢?
-
90%的场景都是可以重新执行的
-
5%的场景不能执行
* NO。1 问题: dead master上面的vip已经delete,再次failover 这一步会报错 方案: a)将dead master上的vip再重新add上去 或者 b)直接过滤掉 delete dead master vip这一步
- 5%的场景不能再次执行,执行会报错
一般这种场景就是:已经failover到最后的change master阶段,这样主从结构已经变更( candidate slave <==rep== etl ),MHA无法重新走一遍。
好在,change master的语句已经生成,其余slave再次执行一遍日志里面的change master即可
Thu Nov 9 16:49:39 2017 - [info] GTID failover mode = 0 Thu Nov 9 16:49:39 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln169] Detected dead master host_1(host_1:3306) does not match with specified dead master host_2(host_2:3306)! Thu Nov 9 16:49:39 2017 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR: at /usr/bin/masterha_master_switch line 53
1.7 Master:MySQL down小结
1. failover最终命令 masterha_master_switch --global_conf=/data/online/agent/MHA/conf/masterha_default.cnf --conf=/data/online/agent/MHA/conf/bak_mha_test.cnf --dead_master_host=$dead_master_ip --dead_master_port=3306 --master_state=dead --interactive=0 --ignore_last_failover --ignore_binlog_server_error
二、Master : Server down
2.1 etl 延迟8小时
同1.1 结论
2.2 slave(候选master)比etl还要落后更多
- 2.2.1 当master的部分日志还没传递两个slave,这时候master server挂了
### 3台DB的binlog状态 * master dba:lc> show master status; +---------------------+----------+--------------+------------------+-------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set | +---------------------+----------+--------------+------------------+-------------------+ | host_2.000012 | 1651 | | | | +---------------------+----------+--------------+------------------+-------------------+ 1 row in set (0.00 sec) dba:lc> dba:lc> select * from t_char_2; +-----+------+ | id | name | +-----+------+ | 172 | 1 | | 173 | 2 | | 174 | 3 | | 175 | 4 | +-----+------+ 4 rows in set (0.00 sec) * slave Master_Log_File: host_2.000012 Exec_Master_Log_Pos: 467 dba:lc> select * from t_char_2; Empty set (0.00 sec) * etl Master_Log_File: host_2.000012 Exec_Master_Log_Pos: 1059 dba:lc> select * from t_char_2; +-----+------+ | id | name | +-----+------+ | 172 | 1 | | 173 | 2 | +-----+------+ 2 rows in set (0.00 sec) ### 模拟故障场景 * 隔离master的网络,让其等同于down机 master> iptables -A INPUT -p tcp -s other_ip --dport 22 -j ACCEPT master> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP ### 切换日志 Thu Nov 16 16:54:40 2017 - [info] MHA::MasterFailover version 0.56. Thu Nov 16 16:54:40 2017 - [info] Starting master failover. Thu Nov 16 16:54:40 2017 - [info] Thu Nov 16 16:54:40 2017 - [info] * Phase 1: Configuration Check Phase.. Thu Nov 16 16:54:40 2017 - [info] Thu Nov 16 16:54:40 2017 - [warning] SQL Thread is stopped(no error) on host_1(host_1:3306) Thu Nov 16 16:54:40 2017 - [warning] SQL Thread is stopped(no error) on host_3(host_3:3306) Thu Nov 16 16:54:40 2017 - [info] GTID failover mode = 0 Thu Nov 16 16:54:40 2017 - [info] Dead Servers: Thu Nov 16 16:54:40 2017 - [info] host_2(host_2:3306) Thu Nov 16 16:54:40 2017 - [info] Checking master reachability via MySQL(double check)... Thu Nov 16 16:54:41 2017 - [info] ok. Thu Nov 16 16:54:41 2017 - [info] Alive Servers: Thu Nov 16 16:54:41 2017 - [info] host_1(host_1:3306) Thu Nov 16 16:54:41 2017 - [info] host_3(host_3:3306) Thu Nov 16 16:54:41 2017 - [info] Alive Slaves: Thu Nov 16 16:54:41 2017 - [info] host_1(host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Thu Nov 16 16:54:41 2017 - [info] Replicating from host_2(host_2:3306) Thu Nov 16 16:54:41 2017 - [info] Primary candidate for the new Master (candidate_master is set) Thu Nov 16 16:54:41 2017 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Thu Nov 16 16:54:41 2017 - [info] Replicating from host_2(host_2:3306) Thu Nov 16 16:54:41 2017 - [info] Not candidate for the new Master (no_master is set) Thu Nov 16 16:54:41 2017 - [info] Starting SQL thread on host_1(host_1:3306) .. Thu Nov 16 16:54:41 2017 - [info] done. Thu Nov 16 16:54:41 2017 - [info] Starting SQL thread on host_3(host_3:3306) .. Thu Nov 16 16:54:41 2017 - [info] done. Thu Nov 16 16:54:41 2017 - [info] Starting Non-GTID based failover. Thu Nov 16 16:54:41 2017 - [info] Thu Nov 16 16:54:41 2017 - [info] ** Phase 1: Configuration Check Phase completed. Thu Nov 16 16:54:41 2017 - [info] Thu Nov 16 16:54:41 2017 - [info] * Phase 2: Dead Master Shutdown Phase.. Thu Nov 16 16:54:41 2017 - [info] Thu Nov 16 16:55:31 2017 - [warning] HealthCheck: Got timeout on checking SSH connection to host_2! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342. Thu Nov 16 16:55:31 2017 - [info] Forcing shutdown so that applications never connect to the current master.. Thu Nov 16 16:55:31 2017 - [info] Executing master IP deactivation script: Thu Nov 16 16:55:31 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --orig_master_host=host_2 --orig_master_ip=host_2 --orig_master_port=3306 --command=stop ssh: connect to host host_2 port 22: Connection timed out =================== swift vip : tgw_vip from host_2 is deleted ============================== --2017-11-16 16:55:38-- http://tgw_server/cgi-bin/fun_logic/bin/public_api/op_rs.cgi 正在连接 tgw_server:80... 已连接。 已发出 HTTP 请求,正在等待回应... 200 OK 长度:未指定 [text/html] 正在保存至: “STDOUT” 0K 12.1M=0s 2017-11-16 16:57:36 (12.1 MB/s) - 已写入标准输出 [38] Thu Nov 16 16:57:36 2017 - [info] done. Thu Nov 16 16:57:36 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master. Thu Nov 16 16:57:36 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed. Thu Nov 16 16:57:36 2017 - [info] Thu Nov 16 16:57:36 2017 - [info] * Phase 3: Master Recovery Phase.. Thu Nov 16 16:57:36 2017 - [info] Thu Nov 16 16:57:36 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase.. Thu Nov 16 16:57:36 2017 - [info] Thu Nov 16 16:57:36 2017 - [info] The latest binary log file/position on all slaves is host_2.000012:1059 Thu Nov 16 16:57:36 2017 - [info] Latest slaves (Slaves that received relay log files to the latest): Thu Nov 16 16:57:36 2017 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Thu Nov 16 16:57:36 2017 - [info] Replicating from host_2(host_2:3306) Thu Nov 16 16:57:36 2017 - [info] Not candidate for the new Master (no_master is set) Thu Nov 16 16:57:36 2017 - [info] The oldest binary log file/position on all slaves is host_2.000012:467 Thu Nov 16 16:57:36 2017 - [info] Oldest slaves: Thu Nov 16 16:57:36 2017 - [info] host_1(host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Thu Nov 16 16:57:36 2017 - [info] Replicating from host_2(host_2:3306) Thu Nov 16 16:57:36 2017 - [info] Primary candidate for the new Master (candidate_master is set) Thu Nov 16 16:57:36 2017 - [info] Thu Nov 16 16:57:36 2017 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase.. Thu Nov 16 16:57:36 2017 - [info] Thu Nov 16 16:57:36 2017 - [warning] Dead Master is not SSH reachable. Could not save it's binlogs. Transactions that were not sent to the latest slave (Read_Master_Log_Pos to the tail of the dead master's binlog) were lost. Thu Nov 16 16:57:36 2017 - [info] Thu Nov 16 16:57:36 2017 - [info] * Phase 3.3: Determining New Master Phase.. Thu Nov 16 16:57:36 2017 - [info] Thu Nov 16 16:57:36 2017 - [info] Finding the latest slave that has all relay logs for recovering other slaves.. Thu Nov 16 16:57:37 2017 - [info] HealthCheck: SSH to host_3 is reachable. Thu Nov 16 16:57:37 2017 - [info] Checking whether host_3 has relay logs from the oldest position.. Thu Nov 16 16:57:37 2017 - [info] Executing command: apply_diff_relay_logs --command=find --latest_mlf=host_2.000012 --latest_rmlp=1059 --target_mlf=host_2.000012 --target_rmlp=467 --server_id=1261261666 --workdir=/var/log/masterha/mha_test --timestamp=20171116165440 --manager_version=0.56 --relay_dir=/data/mysql_data --current_relay_log=host_3-relay-bin.000004 : Relay log found at /data/mysql_data, up to host_3-relay-bin.000004 Fast relay log position search succeeded. Target relay log file/position found. start_file:host_3-relay-bin.000004, start_pos:678. Target relay log FOUND! Thu Nov 16 16:57:37 2017 - [info] OK. host_3 has all relay logs. Thu Nov 16 16:57:37 2017 - [info] HealthCheck: SSH to host_1 is reachable. Thu Nov 16 16:57:37 2017 - [info] Searching new master from slaves.. Thu Nov 16 16:57:37 2017 - [info] Candidate masters from the configuration file: Thu Nov 16 16:57:37 2017 - [info] host_1(host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Thu Nov 16 16:57:37 2017 - [info] Replicating from host_2(host_2:3306) Thu Nov 16 16:57:37 2017 - [info] Primary candidate for the new Master (candidate_master is set) Thu Nov 16 16:57:37 2017 - [info] Non-candidate masters: Thu Nov 16 16:57:37 2017 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Thu Nov 16 16:57:37 2017 - [info] Replicating from host_2(host_2:3306) Thu Nov 16 16:57:37 2017 - [info] Not candidate for the new Master (no_master is set) Thu Nov 16 16:57:37 2017 - [info] Searching from candidate_master slaves which have received the latest relay log events.. Thu Nov 16 16:57:37 2017 - [info] Not found. Thu Nov 16 16:57:37 2017 - [info] Searching from all candidate_master slaves.. Thu Nov 16 16:57:37 2017 - [info] New master is host_1(host_1:3306) Thu Nov 16 16:57:37 2017 - [info] Starting master failover.. Thu Nov 16 16:57:37 2017 - [info] From: host_2(host_2:3306) (current master) +--host_1(host_1:3306) +--host_3(host_3:3306) To: host_1(host_1:3306) (new master) +--host_3(host_3:3306) Thu Nov 16 16:57:37 2017 - [info] Thu Nov 16 16:57:37 2017 - [info] * Phase 3.3: New Master Diff Log Generation Phase.. Thu Nov 16 16:57:37 2017 - [info] Thu Nov 16 16:57:37 2017 - [info] Server host_1 received relay logs up to: host_2.000012:467 Thu Nov 16 16:57:37 2017 - [info] Need to get diffs from the latest slave(host_3) up to: host_2.000012:1059 (using the latest slave's relay logs) Thu Nov 16 16:57:38 2017 - [info] Connecting to the latest slave host host_3, generating diff relay log files.. Thu Nov 16 16:57:38 2017 - [info] Executing command: apply_diff_relay_logs --command=generate_and_send --scp_user=root --scp_host=host_1 --latest_mlf=host_2.000012 --latest_rmlp=1059 --target_mlf=host_2.000012 --target_rmlp=467 --server_id=1261261666 --diff_file_readtolatest=/var/log/masterha/mha_test/relay_from_read_to_latest_host_1_3306_20171116165440.binlog --workdir=/var/log/masterha/mha_test --timestamp=20171116165440 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --relay_dir=/data/mysql_data --current_relay_log=host_3-relay-bin.000004 Thu Nov 16 16:57:38 2017 - [info] Relay log found at /data/mysql_data, up to host_3-relay-bin.000004 Fast relay log position search succeeded. Target relay log file/position found. start_file:host_3-relay-bin.000004, start_pos:678. Concat binary/relay logs from host_3-relay-bin.000004 pos 678 to host_3-relay-bin.000004 EOF into /var/log/masterha/mha_test/relay_from_read_to_latest_host_1_3306_20171116165440.binlog .. Dumping binlog format description event, from position 0 to 361.. ok. Dumping effective binlog data from /data/mysql_data/host_3-relay-bin.000004 position 678 to tail(1270).. ok. Concat succeeded. Generating diff relay log succeeded. Saved at /var/log/masterha/mha_test/relay_from_read_to_latest_host_1_3306_20171116165440.binlog . scp host_3.58os.org:/var/log/masterha/mha_test/relay_from_read_to_latest_host_1_3306_20171116165440.binlog to root@host_1(22) succeeded. Thu Nov 16 16:57:38 2017 - [info] Generating diff files succeeded. Thu Nov 16 16:57:38 2017 - [info] Thu Nov 16 16:57:38 2017 - [info] * Phase 3.4: Master Log Apply Phase.. Thu Nov 16 16:57:38 2017 - [info] Thu Nov 16 16:57:38 2017 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed. Thu Nov 16 16:57:38 2017 - [info] Starting recovery on host_1(host_1:3306).. Thu Nov 16 16:57:38 2017 - [info] Generating diffs succeeded. Thu Nov 16 16:57:38 2017 - [info] Waiting until all relay logs are applied. Thu Nov 16 16:57:38 2017 - [info] done. Thu Nov 16 16:57:38 2017 - [info] Getting slave status.. Thu Nov 16 16:57:38 2017 - [info] This slave(host_1)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(host_2.000012:467). No need to recover from Exec_Master_Log_Pos. Thu Nov 16 16:57:38 2017 - [info] Connecting to the target slave host host_1, running recover script.. Thu Nov 16 16:57:38 2017 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='dba' --slave_host=host_1 --slave_ip=host_1 --slave_port=3306 --apply_files=/var/log/masterha/mha_test/relay_from_read_to_latest_host_1_3306_20171116165440.binlog --workdir=/var/log/masterha/mha_test --target_version=5.7.13-log --timestamp=20171116165440 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx Thu Nov 16 16:57:39 2017 - [info] MySQL client version is 5.7.13. Using --binary-mode. Applying differential binary/relay log files /var/log/masterha/mha_test/relay_from_read_to_latest_host_1_3306_20171116165440.binlog on host_1:3306. This may take long time... Applying log files succeeded. Thu Nov 16 16:57:39 2017 - [info] All relay logs were successfully applied. Thu Nov 16 16:57:39 2017 - [info] Getting new master's binlog name and position.. Thu Nov 16 16:57:39 2017 - [info] host_1.000012:1310 Thu Nov 16 16:57:39 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='host_1', MASTER_PORT=3306, MASTER_LOG_FILE='host_1.000012', MASTER_LOG_POS=1310, MASTER_USER='repl', MASTER_PASSWORD='xxx'; Thu Nov 16 16:57:39 2017 - [info] Executing master IP activate script: Thu Nov 16 16:57:39 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --command=start --ssh_user=root --orig_master_host=host_2 --orig_master_ip=host_2 --orig_master_port=3306 --new_master_host=host_1 --new_master_ip=host_1 --new_master_port=3306 --new_master_user='dba' --new_master_password='dba' Unknown option: new_master_user Unknown option: new_master_password =================== swift vip : tgw_vip to host_1 is added ============================== Thu Nov 16 16:57:41 2017 - [info] OK. Thu Nov 16 16:57:41 2017 - [info] Setting read_only=0 on host_1(host_1:3306).. Thu Nov 16 16:57:41 2017 - [info] ok. Thu Nov 16 16:57:41 2017 - [info] ** Finished master recovery successfully. Thu Nov 16 16:57:41 2017 - [info] * Phase 3: Master Recovery Phase completed. Thu Nov 16 16:57:41 2017 - [info] Thu Nov 16 16:57:41 2017 - [info] * Phase 4: Slaves Recovery Phase.. Thu Nov 16 16:57:41 2017 - [info] Thu Nov 16 16:57:41 2017 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase.. Thu Nov 16 16:57:41 2017 - [info] Thu Nov 16 16:57:41 2017 - [info] -- Slave diff file generation on host host_3(host_3:3306) started, pid: 123011. Check tmp log /var/log/masterha/mha_test/host_3_3306_20171116165440.log if it takes time.. Thu Nov 16 16:57:41 2017 - [info] Thu Nov 16 16:57:41 2017 - [info] Log messages from host_3 ... Thu Nov 16 16:57:41 2017 - [info] Thu Nov 16 16:57:41 2017 - [info] This server has all relay logs. No need to generate diff files from the latest slave. Thu Nov 16 16:57:41 2017 - [info] End of log messages from host_3. Thu Nov 16 16:57:41 2017 - [info] -- host_3(host_3:3306) has the latest relay log events. Thu Nov 16 16:57:41 2017 - [info] Generating relay diff files from the latest slave succeeded. Thu Nov 16 16:57:41 2017 - [info] Thu Nov 16 16:57:41 2017 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase.. Thu Nov 16 16:57:41 2017 - [info] Thu Nov 16 16:57:41 2017 - [info] -- Slave recovery on host host_3(host_3:3306) started, pid: 123044. Check tmp log /var/log/masterha/mha_test/host_3_3306_20171116165440.log if it takes time.. Thu Nov 16 16:57:41 2017 - [info] Thu Nov 16 16:57:41 2017 - [info] Log messages from host_3 ... Thu Nov 16 16:57:41 2017 - [info] Thu Nov 16 16:57:41 2017 - [info] Starting recovery on host_3(host_3:3306).. Thu Nov 16 16:57:41 2017 - [info] This server has all relay logs. Waiting all logs to be applied.. Thu Nov 16 16:57:41 2017 - [info] done. Thu Nov 16 16:57:41 2017 - [info] All relay logs were successfully applied. Thu Nov 16 16:57:41 2017 - [info] Resetting slave host_3(host_3:3306) and starting replication from the new master host_1(host_1:3306).. Thu Nov 16 16:57:41 2017 - [info] Executed CHANGE MASTER. Thu Nov 16 16:57:41 2017 - [info] Slave started. Thu Nov 16 16:57:41 2017 - [info] End of log messages from host_3. Thu Nov 16 16:57:41 2017 - [info] -- Slave recovery on host host_3(host_3:3306) succeeded. Thu Nov 16 16:57:41 2017 - [info] All new slave servers recovered successfully. Thu Nov 16 16:57:41 2017 - [info] Thu Nov 16 16:57:41 2017 - [info] * Phase 5: New master cleanup phase.. Thu Nov 16 16:57:41 2017 - [info] Thu Nov 16 16:57:41 2017 - [info] Resetting slave info on the new master.. Thu Nov 16 16:57:41 2017 - [info] host_1: Resetting slave info succeeded. Thu Nov 16 16:57:41 2017 - [info] Master failover to host_1(host_1:3306) completed successfully. Thu Nov 16 16:57:41 2017 - [info] ----- Failover Report ----- bak_mha_test: MySQL Master failover host_2(host_2:3306) to host_1(host_1:3306) succeeded Master host_2(host_2:3306) is down! Check MHA Manager logs at tjtx135-2-217.58os.org:/var/log/masterha/mha_test/mha_test.log for details. Started automated(non-interactive) failover. Invalidated master IP address on host_2(host_2:3306) The latest slave host_3(host_3:3306) has all relay logs for recovery. Selected host_1(host_1:3306) as a new master. host_1(host_1:3306): OK: Applying all logs succeeded. host_1(host_1:3306): OK: Activated master IP address. host_3(host_3:3306): This host has the latest relay log events. Generating relay diff files from the latest slave succeeded. host_3(host_3:3306): OK: Applying all logs succeeded. Slave started, replicating from host_1(host_1:3306) host_1(host_1:3306): Resetting slave info succeeded. Master failover to host_1(host_1:3306) completed successfully. ### 切换后的结果 new_master & new_etl 结果一致,由于master上面未传过来的binlog彻底丢失,所以相应的新集群也缺失这些数据。 dba:lc> select * from t_char_2; +-----+------+ | id | name | +-----+------+ | 172 | 1 | | 173 | 2 | +-----+------+ 2 rows in set (0.00 sec) ### 最后一步很重要 如果dead master之后又活过来了,那么这一步要做 dead_master> /usr/local/realserver/RS_TUNL0/etc/setup_rs.sh -c http://gitlab.corp.anjuke.com/_dba/architecture/blob/master/personal/Keithlan/other/share/tools/always_used_command.md ==》 tgw章节详细描述
- 2.2.2 当master的所有日志已经传递到1个etl,这时候master server挂了
测试省略,和2.2.1基本一样
结论:由于master上的所有日志全部传递到etl,所以最后是不会丢失master上任何数据的。
2.3 slave(候选master)的日志是最新的,比etl要多
- 2.3.1 当master的部分日志还没传递两个slave,这时候master server挂了
### 3台DB的binlog状态 * master dba:lc> show master status; +---------------------+----------+--------------+------------------+-------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set | +---------------------+----------+--------------+------------------+-------------------+ | host_1.000012 | 3860 | | | | +---------------------+----------+--------------+------------------+-------------------+ 1 row in set (0.00 sec) dba:lc> select * from t_char_2; +-----+------+ | id | name | +-----+------+ | 176 | 1 | | 177 | 2 | | 178 | 10 | | 179 | 20 | +-----+------+ 4 rows in set (0.00 sec) * slave Master_Log_File: host_1.000012 Exec_Master_Log_Pos: 3216 dba:lc> select * from t_char_2; +-----+------+ | id | name | +-----+------+ | 176 | 1 | | 177 | 2 | +-----+------+ 2 rows in set (0.00 sec) * etl Master_Log_File: host_1.000012 Exec_Master_Log_Pos: 2576 dba:lc> select * from t_char_2; Empty set (0.00 sec) ### 模拟故障场景 * 隔离master的网络,让其等同于down机 master> iptables -A INPUT -p tcp -s other_ip --dport 22 -j ACCEPT master> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP ### 切换日志 Thu Nov 16 17:17:59 2017 - [info] MHA::MasterFailover version 0.56. Thu Nov 16 17:17:59 2017 - [info] Starting master failover. Thu Nov 16 17:17:59 2017 - [info] Thu Nov 16 17:17:59 2017 - [info] * Phase 1: Configuration Check Phase.. Thu Nov 16 17:17:59 2017 - [info] Thu Nov 16 17:17:59 2017 - [warning] SQL Thread is stopped(no error) on host_2(host_2:3306) Thu Nov 16 17:17:59 2017 - [warning] SQL Thread is stopped(no error) on host_3(host_3:3306) Thu Nov 16 17:17:59 2017 - [info] GTID failover mode = 0 Thu Nov 16 17:17:59 2017 - [info] Dead Servers: Thu Nov 16 17:17:59 2017 - [info] host_1(host_1:3306) Thu Nov 16 17:17:59 2017 - [info] Checking master reachability via MySQL(double check)... Thu Nov 16 17:18:00 2017 - [info] ok. Thu Nov 16 17:18:00 2017 - [info] Alive Servers: Thu Nov 16 17:18:00 2017 - [info] host_2(host_2:3306) Thu Nov 16 17:18:00 2017 - [info] host_3(host_3:3306) Thu Nov 16 17:18:00 2017 - [info] Alive Slaves: Thu Nov 16 17:18:00 2017 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Thu Nov 16 17:18:00 2017 - [info] Replicating from host_1(host_1:3306) Thu Nov 16 17:18:00 2017 - [info] Primary candidate for the new Master (candidate_master is set) Thu Nov 16 17:18:00 2017 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Thu Nov 16 17:18:00 2017 - [info] Replicating from host_1(host_1:3306) Thu Nov 16 17:18:00 2017 - [info] Not candidate for the new Master (no_master is set) Thu Nov 16 17:18:00 2017 - [info] Starting SQL thread on host_2(host_2:3306) .. Thu Nov 16 17:18:00 2017 - [info] done. Thu Nov 16 17:18:00 2017 - [info] Starting SQL thread on host_3(host_3:3306) .. Thu Nov 16 17:18:00 2017 - [info] done. Thu Nov 16 17:18:00 2017 - [info] Starting Non-GTID based failover. Thu Nov 16 17:18:00 2017 - [info] Thu Nov 16 17:18:00 2017 - [info] ** Phase 1: Configuration Check Phase completed. Thu Nov 16 17:18:00 2017 - [info] Thu Nov 16 17:18:00 2017 - [info] * Phase 2: Dead Master Shutdown Phase.. Thu Nov 16 17:18:00 2017 - [info] Thu Nov 16 17:18:50 2017 - [warning] HealthCheck: Got timeout on checking SSH connection to host_1! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342. Thu Nov 16 17:18:50 2017 - [info] Forcing shutdown so that applications never connect to the current master.. Thu Nov 16 17:18:50 2017 - [info] Executing master IP deactivation script: Thu Nov 16 17:18:50 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --orig_master_host=host_1 --orig_master_ip=host_1 --orig_master_port=3306 --command=stop ssh: connect to host host_1 port 22: Connection timed out =================== swift vip : tgw_vip from host_1 is deleted ============================== --2017-11-16 17:18:57-- http://tgw_server/cgi-bin/fun_logic/bin/public_api/op_rs.cgi 正在连接 tgw_server:80... 已连接。 已发出 HTTP 请求,正在等待回应... 200 OK 长度:未指定 [text/html] 正在保存至: “STDOUT” 0K 8.61M=0s 2017-11-16 17:20:55 (8.61 MB/s) - 已写入标准输出 [38] Thu Nov 16 17:20:55 2017 - [info] done. Thu Nov 16 17:20:55 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master. Thu Nov 16 17:20:55 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed. Thu Nov 16 17:20:55 2017 - [info] Thu Nov 16 17:20:55 2017 - [info] * Phase 3: Master Recovery Phase.. Thu Nov 16 17:20:55 2017 - [info] Thu Nov 16 17:20:55 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase.. Thu Nov 16 17:20:55 2017 - [info] Thu Nov 16 17:20:55 2017 - [info] The latest binary log file/position on all slaves is host_1.000012:3216 Thu Nov 16 17:20:55 2017 - [info] Latest slaves (Slaves that received relay log files to the latest): Thu Nov 16 17:20:55 2017 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Thu Nov 16 17:20:55 2017 - [info] Replicating from host_1(host_1:3306) Thu Nov 16 17:20:55 2017 - [info] Primary candidate for the new Master (candidate_master is set) Thu Nov 16 17:20:55 2017 - [info] The oldest binary log file/position on all slaves is host_1.000012:2576 Thu Nov 16 17:20:55 2017 - [info] Oldest slaves: Thu Nov 16 17:20:55 2017 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Thu Nov 16 17:20:55 2017 - [info] Replicating from host_1(host_1:3306) Thu Nov 16 17:20:55 2017 - [info] Not candidate for the new Master (no_master is set) Thu Nov 16 17:20:55 2017 - [info] Thu Nov 16 17:20:55 2017 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase.. Thu Nov 16 17:20:55 2017 - [info] Thu Nov 16 17:20:55 2017 - [warning] Dead Master is not SSH reachable. Could not save it's binlogs. Transactions that were not sent to the latest slave (Read_Master_Log_Pos to the tail of the dead master's binlog) were lost. Thu Nov 16 17:20:55 2017 - [info] Thu Nov 16 17:20:55 2017 - [info] * Phase 3.3: Determining New Master Phase.. Thu Nov 16 17:20:55 2017 - [info] Thu Nov 16 17:20:55 2017 - [info] Finding the latest slave that has all relay logs for recovering other slaves.. Thu Nov 16 17:20:55 2017 - [info] HealthCheck: SSH to host_2 is reachable. Thu Nov 16 17:20:55 2017 - [info] Checking whether host_2 has relay logs from the oldest position.. Thu Nov 16 17:20:55 2017 - [info] Executing command: apply_diff_relay_logs --command=find --latest_mlf=host_1.000012 --latest_rmlp=3216 --target_mlf=host_1.000012 --target_rmlp=2576 --server_id=1261261656 --workdir=/var/log/masterha/mha_test --timestamp=20171116171759 --manager_version=0.56 --relay_dir=/data/mysql_data --current_relay_log=host_2-relay-bin.000002 : Relay log found at /data/mysql_data, up to host_2-relay-bin.000002 Fast relay log position search succeeded. Target relay log file/position found. start_file:host_2-relay-bin.000002, start_pos:1581. Target relay log FOUND! Thu Nov 16 17:20:56 2017 - [info] OK. host_2 has all relay logs. Thu Nov 16 17:20:56 2017 - [info] HealthCheck: SSH to host_3 is reachable. Thu Nov 16 17:20:56 2017 - [info] Searching new master from slaves.. Thu Nov 16 17:20:56 2017 - [info] Candidate masters from the configuration file: Thu Nov 16 17:20:56 2017 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Thu Nov 16 17:20:56 2017 - [info] Replicating from host_1(host_1:3306) Thu Nov 16 17:20:56 2017 - [info] Primary candidate for the new Master (candidate_master is set) Thu Nov 16 17:20:56 2017 - [info] Non-candidate masters: Thu Nov 16 17:20:56 2017 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Thu Nov 16 17:20:56 2017 - [info] Replicating from host_1(host_1:3306) Thu Nov 16 17:20:56 2017 - [info] Not candidate for the new Master (no_master is set) Thu Nov 16 17:20:56 2017 - [info] Searching from candidate_master slaves which have received the latest relay log events.. Thu Nov 16 17:20:56 2017 - [info] New master is host_2(host_2:3306) Thu Nov 16 17:20:56 2017 - [info] Starting master failover.. Thu Nov 16 17:20:56 2017 - [info] From: host_1(host_1:3306) (current master) +--host_2(host_2:3306) +--host_3(host_3:3306) To: host_2(host_2:3306) (new master) +--host_3(host_3:3306) Thu Nov 16 17:20:56 2017 - [info] Thu Nov 16 17:20:56 2017 - [info] * Phase 3.3: New Master Diff Log Generation Phase.. Thu Nov 16 17:20:56 2017 - [info] Thu Nov 16 17:20:56 2017 - [info] This server has all relay logs. No need to generate diff files from the latest slave. Thu Nov 16 17:20:56 2017 - [info] Thu Nov 16 17:20:56 2017 - [info] * Phase 3.4: Master Log Apply Phase.. Thu Nov 16 17:20:56 2017 - [info] Thu Nov 16 17:20:56 2017 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed. Thu Nov 16 17:20:56 2017 - [info] Starting recovery on host_2(host_2:3306).. Thu Nov 16 17:20:56 2017 - [info] This server has all relay logs. Waiting all logs to be applied.. Thu Nov 16 17:20:56 2017 - [info] done. Thu Nov 16 17:20:56 2017 - [info] All relay logs were successfully applied. Thu Nov 16 17:20:56 2017 - [info] Getting new master's binlog name and position.. Thu Nov 16 17:20:56 2017 - [info] host_2.000012:3959 Thu Nov 16 17:20:56 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='host_2', MASTER_PORT=3306, MASTER_LOG_FILE='host_2.000012', MASTER_LOG_POS=3959, MASTER_USER='repl', MASTER_PASSWORD='xxx'; Thu Nov 16 17:20:56 2017 - [info] Executing master IP activate script: Thu Nov 16 17:20:56 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --command=start --ssh_user=root --orig_master_host=host_1 --orig_master_ip=host_1 --orig_master_port=3306 --new_master_host=host_2 --new_master_ip=host_2 --new_master_port=3306 --new_master_user='dba' --new_master_password='dba' Unknown option: new_master_user Unknown option: new_master_password =================== swift vip : tgw_vip to host_2 is added ============================== Thu Nov 16 17:20:59 2017 - [info] OK. Thu Nov 16 17:20:59 2017 - [info] ** Finished master recovery successfully. Thu Nov 16 17:20:59 2017 - [info] * Phase 3: Master Recovery Phase completed. Thu Nov 16 17:20:59 2017 - [info] Thu Nov 16 17:20:59 2017 - [info] * Phase 4: Slaves Recovery Phase.. Thu Nov 16 17:20:59 2017 - [info] Thu Nov 16 17:20:59 2017 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase.. Thu Nov 16 17:20:59 2017 - [info] Thu Nov 16 17:20:59 2017 - [info] -- Slave diff file generation on host host_3(host_3:3306) started, pid: 77007. Check tmp log /var/log/masterha/mha_test/host_3_3306_20171116171759.log if it takes time.. Thu Nov 16 17:21:00 2017 - [info] Thu Nov 16 17:21:00 2017 - [info] Log messages from host_3 ... Thu Nov 16 17:21:00 2017 - [info] Thu Nov 16 17:20:59 2017 - [info] Server host_3 received relay logs up to: host_1.000012:2576 Thu Nov 16 17:20:59 2017 - [info] Need to get diffs from the latest slave(host_2) up to: host_1.000012:3216 (using the latest slave's relay logs) Thu Nov 16 17:20:59 2017 - [info] Connecting to the latest slave host host_2, generating diff relay log files.. Thu Nov 16 17:20:59 2017 - [info] Executing command: apply_diff_relay_logs --command=generate_and_send --scp_user=root --scp_host=host_3 --latest_mlf=host_1.000012 --latest_rmlp=3216 --target_mlf=host_1.000012 --target_rmlp=2576 --server_id=1261261656 --diff_file_readtolatest=/var/log/masterha/mha_test/relay_from_read_to_latest_host_3_3306_20171116171759.binlog --workdir=/var/log/masterha/mha_test --timestamp=20171116171759 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --relay_dir=/data/mysql_data --current_relay_log=host_2-relay-bin.000002 Thu Nov 16 17:21:00 2017 - [info] Relay log found at /data/mysql_data, up to host_2-relay-bin.000002 Fast relay log position search succeeded. Target relay log file/position found. start_file:host_2-relay-bin.000002, start_pos:1581. Concat binary/relay logs from host_2-relay-bin.000002 pos 1581 to host_2-relay-bin.000002 EOF into /var/log/masterha/mha_test/relay_from_read_to_latest_host_3_3306_20171116171759.binlog .. Binlog Checksum enabled Dumping binlog format description event, from position 0 to 315.. ok. Dumping effective binlog data from /data/mysql_data/host_2-relay-bin.000002 position 1581 to tail(2221).. ok. Binlog Checksum enabled Concat succeeded. Generating diff relay log succeeded. Saved at /var/log/masterha/mha_test/relay_from_read_to_latest_host_3_3306_20171116171759.binlog . scp host_2.58os.org:/var/log/masterha/mha_test/relay_from_read_to_latest_host_3_3306_20171116171759.binlog to root@host_3(22) succeeded. Thu Nov 16 17:21:00 2017 - [info] Generating diff files succeeded. Thu Nov 16 17:21:00 2017 - [info] End of log messages from host_3. Thu Nov 16 17:21:00 2017 - [info] -- Slave diff log generation on host host_3(host_3:3306) succeeded. Thu Nov 16 17:21:00 2017 - [info] Generating relay diff files from the latest slave succeeded. Thu Nov 16 17:21:00 2017 - [info] Thu Nov 16 17:21:00 2017 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase.. Thu Nov 16 17:21:00 2017 - [info] Thu Nov 16 17:21:00 2017 - [info] -- Slave recovery on host host_3(host_3:3306) started, pid: 78627. Check tmp log /var/log/masterha/mha_test/host_3_3306_20171116171759.log if it takes time.. Thu Nov 16 17:21:00 2017 - [info] Thu Nov 16 17:21:00 2017 - [info] Log messages from host_3 ... Thu Nov 16 17:21:00 2017 - [info] Thu Nov 16 17:21:00 2017 - [info] Starting recovery on host_3(host_3:3306).. Thu Nov 16 17:21:00 2017 - [info] Generating diffs succeeded. Thu Nov 16 17:21:00 2017 - [info] Waiting until all relay logs are applied. Thu Nov 16 17:21:00 2017 - [info] done. Thu Nov 16 17:21:00 2017 - [info] Getting slave status.. Thu Nov 16 17:21:00 2017 - [info] This slave(host_3)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(host_1.000012:2576). No need to recover from Exec_Master_Log_Pos. Thu Nov 16 17:21:00 2017 - [info] Connecting to the target slave host host_3, running recover script.. Thu Nov 16 17:21:00 2017 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='dba' --slave_host=host_3 --slave_ip=host_3 --slave_port=3306 --apply_files=/var/log/masterha/mha_test/relay_from_read_to_latest_host_3_3306_20171116171759.binlog --workdir=/var/log/masterha/mha_test --target_version=5.7.13-log --timestamp=20171116171759 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx Thu Nov 16 17:21:00 2017 - [info] MySQL client version is 5.7.13. Using --binary-mode. Applying differential binary/relay log files /var/log/masterha/mha_test/relay_from_read_to_latest_host_3_3306_20171116171759.binlog on host_3:3306. This may take long time... Applying log files succeeded. Thu Nov 16 17:21:00 2017 - [info] All relay logs were successfully applied. Thu Nov 16 17:21:00 2017 - [info] Resetting slave host_3(host_3:3306) and starting replication from the new master host_2(host_2:3306).. Thu Nov 16 17:21:00 2017 - [info] Executed CHANGE MASTER. Thu Nov 16 17:21:00 2017 - [info] Slave started. Thu Nov 16 17:21:00 2017 - [info] End of log messages from host_3. Thu Nov 16 17:21:00 2017 - [info] -- Slave recovery on host host_3(host_3:3306) succeeded. Thu Nov 16 17:21:00 2017 - [info] All new slave servers recovered successfully. Thu Nov 16 17:21:00 2017 - [info] Thu Nov 16 17:21:00 2017 - [info] * Phase 5: New master cleanup phase.. Thu Nov 16 17:21:00 2017 - [info] Thu Nov 16 17:21:00 2017 - [info] Resetting slave info on the new master.. Thu Nov 16 17:21:00 2017 - [info] host_2: Resetting slave info succeeded. Thu Nov 16 17:21:00 2017 - [info] Master failover to host_2(host_2:3306) completed successfully. Thu Nov 16 17:21:00 2017 - [info] ----- Failover Report ----- bak_mha_test: MySQL Master failover host_1(host_1:3306) to host_2(host_2:3306) succeeded Master host_1(host_1:3306) is down! Check MHA Manager logs at tjtx135-2-217.58os.org:/var/log/masterha/mha_test/mha_test.log for details. Started automated(non-interactive) failover. Invalidated master IP address on host_1(host_1:3306) The latest slave host_2(host_2:3306) has all relay logs for recovery. Selected host_2(host_2:3306) as a new master. host_2(host_2:3306): OK: Applying all logs succeeded. host_2(host_2:3306): OK: Activated master IP address. host_3(host_3:3306): Generating differential relay logs up to host_2(host_2:3306)succeeded. Generating relay diff files from the latest slave succeeded. host_3(host_3:3306): OK: Applying all logs succeeded. Slave started, replicating from host_2(host_2:3306) host_2(host_2:3306): Resetting slave info succeeded. Master failover to host_2(host_2:3306) completed successfully. ### 切换后的结果 new_master & new_etl 结果一致,由于master上面未传过来的binlog彻底丢失,所以相应的新集群也缺失这些数据。 dba:lc> select * from t_char_2; +-----+------+ | id | name | +-----+------+ | 176 | 1 | | 177 | 2 | +-----+------+ 2 rows in set (0.00 sec) ### 最后一步很重要 如果dead master之后又活过来了,那么这一步要做 dead_master> /usr/local/realserver/RS_TUNL0/etc/setup_rs.sh -c http://gitlab.corp.anjuke.com/_dba/architecture/blob/master/personal/Keithlan/other/share/tools/always_used_command.md ==》 tgw章节详细描述
结论: 由于master 已挂,然而最后的日志没有传递到其他服务器,所以会丢失master没有传递过来的事务日志
- 2.3.2 当master的所有日志已经传递slave,这时候master server挂了
测试省略,和2.3.1基本一样
结论:由于master上的所有日志全部传递到slave,所以最后是不会丢失master上任何数据的。
2.4 slave(候选master)上面有大事务在跑
- 1000s的大查询
同1.4结论
- flush tables with readlock
同1.4结论
2.6 如果MHA过程中失败,是否可以重新执行MHA的failover呢?
同1.6结论
三、遇到的坑
3.1 交互模式下,如果没有及时敲’YES’,则终止切换
3.2 如果在non-gtid模式的机器上,配置了binlog server,会有什么影响呢?
无影响
3.3 不要在relay-log的地方伪造日志
xx-relay-bin.000001 xx-relay-bin.000002 xx-relay-bin.000002.lc --这个是伪造的,当时这个日志是自己解析的 mysqlbinlog -vv xx-relay-bin.000002 > xx-relay-bin.000002.lc MHA在切换的时候报错如下: Reading xx-relay-bin.000002.lc Event too large: pos: 4, event_length: 1163083840, event_type: 32 at /usr/share/perl5/vendor_perl/MHA/BinlogPosFindManager.pm line 103 Tue Nov 14 10:39:08 2017 - [warning] xx doesn't have all relay logs. Maybe some logs were purged. Tue Nov 14 10:39:08 2017 - [warning] None of latest servers have enough relay logs from oldest position. We can't recover oldest slaves. Tue Nov 14 10:39:08 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln947] None of the latest slaves has enough relay logs for recovery. Tue Nov 14 10:39:08 2017 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR: at /usr/bin/masterha_master_switch line 53
3.4 flush tables with read lock 会阻塞
dba:(none)> show processlist; +----+------+----------------------+------+---------+------+------------------------------+------------------------------------------------------------------------------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +----+------+----------------------+------+---------+------+------------------------------+------------------------------------------------------------------------------------------------------+ | 63 | dba | localhost | NULL | Query | 0 | starting | show processlist | | 65 | dba | xx:11164 | NULL | Sleep | 121 | | NULL | | 83 | dba | new master:49022 | NULL | Query | 176 | Waiting for global read lock | BINLOG ' GpAKWhNYUy1LMAAAAGYHAAAAAG0AAAAAAAEAAmxjAAh0X2NoYXJfMgACAw8CLAEC GpAKWh5YUy1LJwAAAI0HAAAAAG | +----+------+----------------------+------+---------+------+------------------------------+------------------------------------------------------------------------------------------------------+ 3 rows in set (0.00 sec) 会卡在这一步 Tue Nov 14 15:16:23 2017 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase.. --------------Waiting for global read lock----------------- 那么后面的步骤都会卡住 * Phase 4.2: Starting Parallel Slave Log Apply Phase.. --将 当前需要恢复的slave 和 latest slave之间的diff日志 传送给 需要恢复的 slave --将 latest slave 和 dead master之间的diff日志 传送给 需要恢复的 slave --将 刚刚的差异日志依次apply 到需要恢复的slave --Resetting slave , Executed CHANGE MASTER to new_master *Phase 5: New master cleanup phase.. Resetting slave info on the new master.
3.5 binlog伪造
Tue Nov 14 17:23:10 2017 - [info] Executing command on the dead master host_1(host_1:3306): save_binary_logs --command=save --start_file=host_1.000003 --start_pos=4042 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/mha_test/saved_master_binlog_from_host_1_3306_20171114172304.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 Creating /var/log/masterha/mha_test if not exists.. ok. Concat binary/relay logs from host_1.000003 pos 4042 to host_1.000050 EOF into /var/log/masterha/mha_test/saved_master_binlog_from_host_1_3306_20171114172304.binlog .. Binlog Checksum enabled Dumping binlog format description event, from position 0 to 154.. ok. Dumping effective binlog data from /data/mysql.bin/host_1.000003 position 4042 to tail(4065).. ok. Failed to save binary log: Target file /data/mysql.bin/host_1.000004 not found! at /usr/bin/save_binary_logs line 176 Tue Nov 14 17:23:10 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln760] Failed to save binary log events from the orig master. Maybe disks on binary logs are not accessible or binary log itself is corrupt? 测试: 有flush + 无binlog.xx + 1伪造 = save binlog --成功 有flush + 有binlog.xx + 1伪造 = save binlog --失败 无flush + 无binlog.xx + 1伪造 = save binlog --成功 无flush + 有binlog.xx + 1伪造 = save binlog --失败 说明,如果有伪造的binlog,且后缀比之前的大,那么就会告警 解析: /usr/bin/save_binary_logs ==》 generate_diff_binary_log => /usr/share/perl5/vendor_perl/MHA/BinlogManager.pm ==》 concat_all_binlogs_from():: => start_num , end_num(如果出现伪造的binlog,则会报错)
四、总结
MHA + NON-GTID 模式,重点配置和用法如下:
1. command masterha_master_switch --global_conf=/data/online/agent/MHA/conf/masterha_default.cnf --conf=/data/online/agent/MHA/conf/bak_mha_test.cnf --dead_master_host=$dead_master_ip --dead_master_port=3306 --master_state=dead --interactive=0 --ignore_last_failover --ignore_binlog_server_error 2. tgw 清理 dead master 如果还可以起来,那么必须在上面执行: /usr/local/realserver/RS_TUNL0/etc/setup_rs.sh -c 原因可参看:http://gitlab.corp.anjuke.com/_dba/architecture/blob/master/personal/Keithlan/other/share/tools/always_used_command.md ==> TGW 章节
五、 流程简介
* Phase 1: Configuration Check Phase.. HealthCheck: SSH N台DB是否reachable Binlog server: N台DB 是否reachable GTID failover mode = ? Dead Servers is ? Primary candidate for the new Master (candidate_master is set) ? * Phase 2: Dead Master Shutdown Phase.. Executing master IP deactivation script: TGW-vip delete操作 shutdown_script: ? * Phase 3: Master Recovery Phase.. * Phase 3.1: Getting Latest Slaves Phase.. Latest slaves ,file position ? Oldest slaves , file position ? * Phase 3.2: Saving Dead Master's Binlog Phase.. Executing command on the dead master : save_binary_logs --command=save Concat binary/relay logs from latest_slave file_pos to master's binlog EOF scp from root@master_ip: binlog to local: xx.binlog --将master最新的,latest缺失的binlog传递到manager机器 * Phase 3.3: Determining New Master Phase.. 选择哪个slave为new master * Phase 3.3: New Master Diff Log Generation Phase.. Need to get diffs from the latest slave up to: xx (using the latest slave's relay logs) --生成new master和latest slave之间的diff日志 scp latest_ip:xx to new_master:xx --将new master和latest slave之间的diff日志 传送给 new master scp from local: xx to new_master: xx --将latest slave 和 dead master之间的diff日志 传送给 new master * Phase 3.4: Master Log Apply Phase.. new master 将之前的两段diff日志合并,然后开始apply All other slaves should start replication from here --开始生成change master语句 Executing master IP activate script: TGW-vip 激活操作,并且设置readonly=0 * Phase 4: Slaves Recovery Phase.. (并行操作) * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase.. --开始生成其余slave 和 latest 之间的diff日志 * Phase 4.2: Starting Parallel Slave Log Apply Phase.. --将 当前需要恢复的slave 和 latest slave之间的diff日志 传送给 需要恢复的 slave --将 latest slave 和 dead master之间的diff日志 传送给 需要恢复的 slave --将 刚刚的差异日志依次apply 到需要恢复的slave --Resetting slave , Executed CHANGE MASTER to new_master * Phase 5: New master cleanup phase.. Resetting slave info on the new master.
注:本文内容来自互联网,旨在为开发者提供分享、交流的平台。如有涉及文章版权等事宜,请你联系站长进行处理。