When implementing HADR with multiple standbys, problems can arise if the HADR ports are not hard-coded in the HADR settings, that is, if service names are used there instead of port numbers. Here’s what the issue looks like:
The primary is dbpp.example.com
The standbys are:
1st – dbps.example.com
2nd – dbp-east.example.com
The customer was trying to add dbp-east.example.com as a second standby. The primary, dbpp.example.com, complained like so:
2018-01-03-09.24.43.032460-480 E283856652E553 LEVEL: Error (OS)
PID : 26091 TID : 46961361151744 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : SAMPLE
HOSTNAME: dbpp.example.com
EDUID : 1586 EDUNAME: db2hadrp.0.2 (SAMPLE) 0
FUNCTION: Db2 UDB, oper system services, sqloPdbQuerySocketErrorStatus, probe:15
MESSAGE : ZRC=0x810F0077=-2129723273=SQLO_COMM_ERR_EHOSTUNREACH
"No route to host"
CALLED : OS, -, getsockopt OSERR: EHOSTUNREACH (113)
2018-01-03-09.24.55.001353-480 I283857206E468 LEVEL: Warning
PID : 26091 TID : 46961365346048 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : SAMPLE
HOSTNAME: dbpp.example.com
EDUID : 1585 EDUNAME: db2hadrp.0.1 (SAMPLE) 0
FUNCTION: Db2 UDB, High Availability Disaster Recovery, hdrHandleRemoteConn, probe:30160
MESSAGE : TCP socket connection accepted. Remote Host: X.X.XX.XX Port: 26324
2018-01-03-09.24.55.102096-480 I283857675E508 LEVEL: Info
PID : 26091 TID : 46961365346048 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : SAMPLE
HOSTNAME: dbpp.example.com
EDUID : 1585 EDUNAME: db2hadrp.0.1 (SAMPLE) 0
FUNCTION: Db2 UDB, High Availability Disaster Recovery, hdrHandleHsAck, probe:43900
DATA #1 : <preformatted>
Handshake HDR_MSG_HDRHS message is received from dbp-east.example.com:db2_hadra (X.X.XX.XX:50052)
2018-01-03-09.24.55.103951-480 E283858960E555 LEVEL: Error
PID : 26091 TID : 46961365346048 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : SAMPLE
HOSTNAME: dbpp.example.com
EDUID : 1585 EDUNAME: db2hadrp.0.1 (SAMPLE) 0
FUNCTION: Db2 UDB, High Availability Disaster Recovery, hdrHandleHsAck, probe:30420
MESSAGE : ADM12513E Unable to establish HADR primary-standby connection
because the primary and standby databases are incompatible. Reason
code: "4"
2018-01-03-09.24.55.106072-480 I283859516E669 LEVEL: Warning
PID : 26091 TID : 46961365346048 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : SAMPLE
HOSTNAME: dbpp.example.com
EDUID : 1585 EDUNAME: db2hadrp.0.1 (SAMPLE) 0
FUNCTION: Db2 UDB, High Availability Disaster Recovery, hdrSendRejectionIfNeeded, probe:30490
MESSAGE : ZRC=0x8280001A=-2105540582=HDR_ZRC_NO_STANDBY
"Comm time-out in unforced HADR primary start, to avoid split-brain"
DATA #1 : <preformatted>
A rejection message sent to HADR_LOCAL_HOST:HADR_LOCAL_SVC is dbp-east.example.com:db2_hadra (X.X.XX.XX:50052)
The second standby, dbp-east.example.com, reports the corresponding failure to establish HADR:
2018-01-03-09.24.55.121626-480 I121560940E513 LEVEL: Info
PID : 19286 TID : 139664841238272 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : SAMPLE
HOSTNAME: dbp-east.example.com
EDUID : 934 EDUNAME: db2hadrs.0.0 (SAMPLE) 0
FUNCTION: Db2 UDB, High Availability Disaster Recovery, hdrHandleHsAck, probe:43900
DATA #1 : <preformatted>
Handshake HDR_MSG_HDRREJECT message is received from dbpp.example.com:db2_hadrs (X.X.X.X:50051)
2018-01-03-09.24.55.121946-480 I121561454E631 LEVEL: Error
PID : 19286 TID : 139664841238272 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : SAMPLE
HOSTNAME: dbp-east.example.com
EDUID : 934 EDUNAME: db2hadrs.0.0 (SAMPLE) 0
FUNCTION: Db2 UDB, High Availability Disaster Recovery, hdrHandleHsAck, probe:43901
MESSAGE : ZRC=0x87800140=-2021654208=HDR_ZRC_CONFIGURATION_ERROR
"One or both databases of the HADR pair is configured incorrectly"
DATA #1 : <preformatted>
HADR handshake with dbpp.example.com:db2_hadrs (X.X.X.X:50051) failed.
2018-01-03-09.24.55.123167-480 I121563753E528 LEVEL: Error
PID : 19286 TID : 139664841238272 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : SAMPLE
HOSTNAME: dbp-east.example.com
EDUID : 934 EDUNAME: db2hadrs.0.0 (SAMPLE) 0
FUNCTION: Db2 UDB, High Availability Disaster Recovery, hdrEdu::hdrEduS, probe:21480
MESSAGE : ZRC=0x87800140=-2021654208=HDR_ZRC_CONFIGURATION_ERROR
"One or both databases of the HADR pair is configured incorrectly"
There was a service/port difference in the handshake recorded in the primary's log above:
"MESSAGE : TCP socket connection accepted. Remote Host: X.X.XX.XX Port: 26324"
vs.
"Handshake HDR_MSG_HDRHS message is received from dbp-east.example.com:db2_hadra (X.X.XX.XX:50052)"
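To spot this kind of mismatch without reading the log by eye, the two message shapes quoted above can be pulled out with a short script. This is only a minimal sketch: the regular expressions are written against the exact wording of the two lines shown here, the function name hadr_ports is made up for illustration, and other Db2 levels may word these messages differently.

```python
import re

# Patterns written against the two db2diag.log messages quoted above (assumed wording).
ACCEPT_RE = re.compile(
    r"TCP socket connection accepted\.\s+Remote Host:\s*(\S+)\s+Port:\s*(\d+)"
)
HANDSHAKE_RE = re.compile(
    r"Handshake (\S+) message is received from (\S+):(\S+) \((\S+):(\d+)\)"
)


def hadr_ports(diag_text):
    """Return (accepted_ports, handshakes) found in db2diag.log text.

    accepted_ports: list of remote ports from 'TCP socket connection accepted'.
    handshakes: list of (host, service_name, port) from handshake messages.
    """
    accepted = [int(m.group(2)) for m in ACCEPT_RE.finditer(diag_text)]
    handshakes = [
        (m.group(2), m.group(3), int(m.group(5)))
        for m in HANDSHAKE_RE.finditer(diag_text)
    ]
    return accepted, handshakes


sample = """\
MESSAGE : TCP socket connection accepted. Remote Host: X.X.XX.XX Port: 26324
Handshake HDR_MSG_HDRHS message is received from dbp-east.example.com:db2_hadra (X.X.XX.XX:50052)
"""

accepted, handshakes = hadr_ports(sample)
print(accepted)    # [26324]
print(handshakes)  # [('dbp-east.example.com', 'db2_hadra', 50052)]
```

Running it over the primary's db2diag.log would list every accepted remote port next to the host, service name, and port announced in each handshake, which makes a difference like 26324 vs. 50052 easy to see.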
Notice that the rejection was sent to X.X.XX.XX:50052. This was because the connection came in on port 26324 first, and that port is not defined on the primary server, dbpp.example.com, or on the intended 2nd standby server, dbp-east.example.com. This is what /etc/services looks like on all three hosts for the HADR services:
db2_hadrp 50050/tcp
db2_hadrp 50050/udp
db2_hadrs 50051/tcp
db2_hadrs 50051/udp
db2_hadra 50052/tcp
db2_hadra 50052/udp
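A quick way to confirm that a port seen in the log is not defined for any service is to parse the /etc/services entries and compare. The sketch below is illustrative only: parse_services and undefined_ports are hypothetical helper names, and the entries are pasted in as a string rather than read from the file so the example stays self-contained.

```python
def parse_services(text):
    """Map (service name, protocol) -> port from /etc/services-style text."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2 or "/" not in fields[1]:
            continue  # not a "name port/proto" entry
        port, proto = fields[1].split("/", 1)
        if port.isdigit():
            table[(fields[0], proto)] = int(port)
    return table


def undefined_ports(services, ports, proto="tcp"):
    """Return the ports that no service entry defines for the given protocol."""
    defined = {p for (_, pr), p in services.items() if pr == proto}
    return [p for p in ports if p not in defined]


services_text = """\
db2_hadrp 50050/tcp
db2_hadrp 50050/udp
db2_hadrs 50051/tcp
db2_hadrs 50051/udp
db2_hadra 50052/tcp
db2_hadra 50052/udp
"""

services = parse_services(services_text)
print(undefined_ports(services, [26324, 50052]))  # [26324]
```

Running the same check against the /etc/services content of each host would also show whether the three service names resolve to the same ports everywhere.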
So where is port 26324 coming from? The workaround for this problem was to hard-code the port numbers in the db cfg on all three hosts instead of using the service names (db2_hadrp/db2_hadrs/db2_hadra).
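The workaround can be sketched as a small helper that turns service names in the HADR settings into numeric ports and prints the matching db2 update db cfg commands. This is an illustration under assumptions, not the exact procedure used on the customer system: hardcode_ports is a made-up name, and the example settings for the second standby (HADR_LOCAL_SVC db2_hadra, HADR_REMOTE_SVC db2_hadrs) are inferred from the service names that appear in the log excerpts above.

```python
def hardcode_ports(db_name, cfg, services, proto="tcp"):
    """Build 'db2 update db cfg' commands that replace service names with ports.

    cfg: HADR parameter -> current value (service name or port, as a string).
    services: (service name, protocol) -> port, e.g. taken from /etc/services.
    """
    commands = []
    for param, value in cfg.items():
        if value.isdigit():
            continue  # already a numeric port, nothing to change
        port = services.get((value, proto))
        if port is None:
            raise KeyError(f"{value}/{proto} is not defined in the services table")
        commands.append(f"db2 update db cfg for {db_name} using {param} {port}")
    return commands


# Ports as listed in /etc/services above.
services = {
    ("db2_hadrp", "tcp"): 50050,
    ("db2_hadrs", "tcp"): 50051,
    ("db2_hadra", "tcp"): 50052,
}

# Assumed settings on the second standby, inferred from the log excerpts.
standby_cfg = {"HADR_LOCAL_SVC": "db2_hadra", "HADR_REMOTE_SVC": "db2_hadrs"}

for cmd in hardcode_ports("SAMPLE", standby_cfg, services):
    print(cmd)
# db2 update db cfg for SAMPLE using HADR_LOCAL_SVC 50052
# db2 update db cfg for SAMPLE using HADR_REMOTE_SVC 50051
```

The same substitution would be needed on the primary and the first standby, and most likely for any service names used in HADR_TARGET_LIST entries as well. Check each host's current values with db2 get db cfg for SAMPLE before applying anything.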