#######################################################################################
# WL11570 - GR: options to defer member eviction after a suspicion
#
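# In a group of 4 servers, we set group_replication_member_expel_timeout to 300
# seconds, suspend one of the servers and test whether another member can leave
# the group while the suspended one is UNREACHABLE. With the current behavior, the
# leaving member is not removed from the group until the suspended member is
# expelled, and this is verified.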
#
# 1. Create a group with 4 servers and a table on it.
# 2. Set the group_replication_member_expel_timeout parameter to 300 seconds on all
# members.
# 3. Suspend server 3 by sending it a SIGSTOP signal.
# This prevents server 3 from answering the "I am alive" GCS messages, so it will
# eventually be considered faulty.
# 4. Check that all members are still in the group: servers 1, 2 and 4 should be
# ONLINE, while server 3 should still be in the group but UNREACHABLE.
# 5. Make server 4 leave the group and check that, as seen from server 1, server 4
# is still reported as ONLINE and server 3 as UNREACHABLE.
# 6. Reset the group_replication_member_expel_timeout parameter to 0 seconds on
# servers 1, 2 and 4, thus forcing server 3 to be expelled.
# 7. Make server 4 successfully join the group after resetting the above option.
# 8. Resume server 3 and make it rejoin the group.
# 9. Clean up.
#######################################################################################
# Don't run this test under Valgrind: memory leaks will occur.
--source include/not_valgrind.inc
# This test sends SIGSTOP and SIGCONT signals with the Linux kill command.
--source include/linux.inc
--source include/big_test.inc
--source include/force_restart.inc
--source include/have_group_replication_plugin.inc
--echo
--echo ############################################################
--echo # 1. Create a group with 4 members and a table on it.
--let $rpl_server_count= 4
--source include/group_replication.inc
--let $rpl_connection_name= server1
--source include/rpl_connection.inc
--let $server1_local_address= `SELECT @@GLOBAL.group_replication_local_address`
--echo #
--echo # Create a table
--let $rpl_connection_name= server1
--source include/rpl_connection.inc
CREATE TABLE t1 (c1 INT NOT NULL PRIMARY KEY) ENGINE=InnoDB;
INSERT INTO t1 VALUES (1);
--source include/rpl_sync.inc
--let $rpl_connection_name= server4
--source include/rpl_connection.inc
set session sql_log_bin=0;
call mtr.add_suppression("Timeout on wait for view after joining group");
call mtr.add_suppression("The member is leaving a group without being on one");
call mtr.add_suppression("The member has left the group but the new view will not be installed");
set session sql_log_bin=1;
--echo
--echo ############################################################
--echo # 2. Set group_replication_member_expel_timeout to
--echo # 300 seconds.
--let $rpl_connection_name= server1
--source include/rpl_connection.inc
SET GLOBAL group_replication_member_expel_timeout = 300;
SELECT @@GLOBAL.group_replication_member_expel_timeout;
--let $rpl_connection_name= server2
--source include/rpl_connection.inc
SET GLOBAL group_replication_member_expel_timeout = 300;
SELECT @@GLOBAL.group_replication_member_expel_timeout;
--let $rpl_connection_name= server3
--source include/rpl_connection.inc
SET GLOBAL group_replication_member_expel_timeout = 300;
SELECT @@GLOBAL.group_replication_member_expel_timeout;
--let $rpl_connection_name= server4
--source include/rpl_connection.inc
SET GLOBAL group_replication_member_expel_timeout = 300;
SELECT @@GLOBAL.group_replication_member_expel_timeout;
--echo
--echo ############################################################
--echo # 3. Suspend server 3 by sending signal SIGSTOP to it.
--echo # This will make server 3 not answer to "I am alive"
--echo # GCS messages and it will eventually be considered
--echo # faulty.
--let $rpl_connection_name= server3
--source include/rpl_connection.inc
--echo #
--echo # Get server 3 pid.
SET SESSION sql_log_bin= 0;
CREATE TABLE pid_table(pid_no INT);
--let $pid_file= `SELECT @@GLOBAL.pid_file`
--replace_result $pid_file pid_file
--eval LOAD DATA LOCAL INFILE '$pid_file' INTO TABLE pid_table
--let $server_pid=`SELECT pid_no FROM pid_table`
DROP TABLE pid_table;
SET SESSION sql_log_bin= 1;
--echo #
--echo # Suspending server 3...
--exec kill -STOP $server_pid
--echo
--echo ############################################################
--echo # 4. Check that all members are still in the group on
--echo # servers 1, 2 and 4 which should be ONLINE.
--echo # Server 3 should still be in the group but UNREACHABLE.
--let $rpl_connection_name= server1
--source include/rpl_connection.inc
let $wait_condition=SELECT COUNT(*)=3 FROM performance_schema.replication_group_members where MEMBER_STATE="ONLINE";
--source include/wait_condition.inc
let $wait_condition=SELECT COUNT(*)=1 FROM performance_schema.replication_group_members where MEMBER_STATE="UNREACHABLE";
--source include/wait_condition.inc
--echo
--echo ############################################################
--echo # 5. Make server 4 leave the group and notice that it is
--echo # reported as ONLINE and server 3 as UNREACHABLE.
--echo #
--echo # Stop GR on server4.
--let $rpl_connection_name= server4
--source include/rpl_connection.inc
--source include/stop_group_replication.inc
--let $rpl_connection_name= server1
--source include/rpl_connection.inc
let $wait_condition=SELECT COUNT(*)=3 FROM performance_schema.replication_group_members where MEMBER_STATE="ONLINE";
--source include/wait_condition.inc
let $wait_condition=SELECT COUNT(*)=1 FROM performance_schema.replication_group_members where MEMBER_STATE="UNREACHABLE";
--source include/wait_condition.inc
--echo
--echo ############################################################
--echo # 6. Reset the group_replication_member_expel_timeout
--echo # parameter to 0 seconds thus forcing server 3 to be
--echo # expelled.
--echo #
--echo # Reset the group_replication_member_expel_timeout to 0.
--let $rpl_connection_name= server1
--source include/rpl_connection.inc
SET GLOBAL group_replication_member_expel_timeout = 0;
--let $rpl_connection_name= server2
--source include/rpl_connection.inc
SET GLOBAL group_replication_member_expel_timeout = 0;
--let $rpl_connection_name= server4
--source include/rpl_connection.inc
SET GLOBAL group_replication_member_expel_timeout = 0;
--echo #
--echo # Wait until until server 3 is expelled and 2 servers are online
--let $rpl_connection_name= server1
--source include/rpl_connection.inc
let $wait_condition=SELECT COUNT(*)=2 FROM performance_schema.replication_group_members where MEMBER_STATE="ONLINE";
--source include/wait_condition.inc
--echo
--echo #############################################################
--echo # 7. Make server 4 successfully join the group after reseting
--echo # the above option.
--echo #
--echo # Start GR on server4
--let $rpl_connection_name= server4
--source include/rpl_connection.inc
--replace_result $server1_local_address SERVER1_LOCAL_ADDRESS
--eval SET GLOBAL group_replication_group_seeds= "$server1_local_address"
--source include/start_group_replication.inc
--echo #############################################################
--echo # 8. Resume server 3 and make it rejoin the group
--echo #
--exec kill -CONT $server_pid
--let $rpl_connection_name= server3
--source include/rpl_connection.inc
# Due to BUG#28068548 and BUG#28224165 we cannot wait on server 3's member state
# here, as it may report that the server never left the group.
#let $wait_condition=SELECT COUNT(*)=1 FROM performance_schema.replication_group_members where MEMBER_STATE="ONLINE" or MEMBER_STATE="ERROR";
#--source include/wait_condition.inc
--source include/stop_group_replication.inc
--source include/start_group_replication.inc
--echo #
--echo # Wait until until all servers are online again
--let $rpl_connection_name= server1
--source include/rpl_connection.inc
let $wait_condition=SELECT COUNT(*)=4 FROM performance_schema.replication_group_members where MEMBER_STATE="ONLINE";
--source include/wait_condition.inc
--echo
--echo ############################################################
--echo # 9. Clean up.
--let $rpl_connection_name= server1
--source include/rpl_connection.inc
DROP TABLE t1;
--source include/rpl_sync.inc
--source include/group_replication_end.inc