Bug #37965
rados/upgrade test fails (closed)
% Done: 0%
Regression: No
Severity: 3 - minor
Description
Recent regression. Looking at /a/sage-2019-01-18_06:11:36-rados-wip-sage-testing-2019-01-17-2111-distro-basic-smithi/3478286, the trouble starts when osd.11 is marked down in e3254 for some reason, but the mon log doesn't seem to indicate that the mon did it on purpose.
The e3254 incremental contains
"osd_state_xor": [ { "osd": 11, "state_xor": [ "up" ] } ],
but the mon log doesn't mention marking osd.11 down:
2019-01-18 09:14:03.905 7f067808a700 2 mon.c@0(leader).osd e3252 osd.11 UP v1:172.21.15.28:6805/310571
2019-01-18 09:14:03.909 7f067207e700 20 mon.c@0(leader).osd e3253 check_pg_creates_sub .. osd.11
2019-01-18 09:14:03.910 7f067207e700 7 mon.c@0(leader).osd e3253 _booted osd.11 v1:172.21.15.28:6805/310571 w 0 from 3251
2019-01-18 09:14:03.910 7f067207e700 0 log_channel(cluster) log [INF] : osd.11 v1:172.21.15.28:6805/310571 boot
2019-01-18 09:14:03.910 7f067207e700 5 mon.c@0(leader).osd e3253 send_latest to osd.11 v1:172.21.15.28:6805/310571 start 3252
2019-01-18 09:14:03.910 7f067207e700 5 mon.c@0(leader).osd e3253 send_incremental [3252..3253] to osd.11
2019-01-18 09:14:03.911 7f0675885700 10 mon.c@0(leader).log v5658 logging 2019-01-18 09:14:03.912496 mon.c (mon.0) 376 : cluster [INF] osd.11 v1:172.21.15.28:6805/310571 boot
2019-01-18 09:14:03.911 7f0677088700 20 mon.c@0(leader).osd e3253 check_pg_creates_sub .. osd.11
2019-01-18 09:14:03.929 7f0673080700 20 --1- v1:172.21.15.28:6789/0 >> v1:172.21.15.28:6805/310571 conn(0x55c9ea61cc00 0x55c9ea666c00 :6789 s=OPENED pgs=3 cs=1 l=1).handle_message_header got envelope type=73 src osd.11 front=22 data=0 off 0
2019-01-18 09:14:03.929 7f0673080700 5 --1- v1:172.21.15.28:6789/0 >> v1:172.21.15.28:6805/310571 conn(0x55c9ea61cc00 0x55c9ea666c00 :6789 s=READ_FOOTER_AND_DISPATCH pgs=3 cs=1 l=1). rx osd.11 seq 11 0x55c9ea8cfa40 osd_alive(want up_thru 3253 have 3253) v1
2019-01-18 09:14:03.929 7f0675885700 1 -- v1:172.21.15.28:6789/0 <== osd.11 v1:172.21.15.28:6805/310571 11 ==== osd_alive(want up_thru 3253 have 3253) v1 ==== 22+0+0 (3463081574 0 0) 0x55c9ea8cfa40 con 0x55c9ea61cc00
2019-01-18 09:14:03.929 7f0675885700 20 mon.c@0(leader) e2 _ms_dispatch existing session 0x55c9eb093dc0 for osd.11
2019-01-18 09:14:03.929 7f0675885700 10 mon.c@0(leader).paxosservice(osdmap 501..3253) dispatch 0x55c9ea8cfa40 osd_alive(want up_thru 3253 have 3253) v1 from osd.11 v1:172.21.15.28:6805/310571 con 0x55c9ea61cc00
2019-01-18 09:14:03.929 7f0675885700 10 mon.c@0(leader).osd e3253 preprocess_query osd_alive(want up_thru 3253 have 3253) v1 from osd.11 v1:172.21.15.28:6805/310571
2019-01-18 09:14:03.929 7f0675885700 10 mon.c@0(leader).osd e3253 preprocess_alive want up_thru 3253 from osd.11 v1:172.21.15.28:6805/310571
2019-01-18 09:14:03.929 7f0675885700 7 mon.c@0(leader).osd e3253 prepare_update osd_alive(want up_thru 3253 have 3253) v1 from osd.11 v1:172.21.15.28:6805/310571
2019-01-18 09:14:03.929 7f0675885700 7 mon.c@0(leader).osd e3253 prepare_alive want up_thru 3253 have 3253 from osd.11 v1:172.21.15.28:6805/310571
2019-01-18 09:14:03.934 7f067207e700 7 mon.c@0(leader).log v5659 update_from_paxos applying incremental log 5659 2019-01-18 09:14:03.912496 mon.c (mon.0) 376 : cluster [INF] osd.11 v1:172.21.15.28:6805/310571 boot
2019-01-18 09:14:04.111 7f0673080700 20 --1- v1:172.21.15.28:6789/0 >> v1:172.21.15.28:6805/310571 conn(0x55c9ea61cc00 0x55c9ea666c00 :6789 s=OPENED pgs=3 cs=1 l=1).handle_message_header got envelope type=79 src osd.11 front=26 data=0 off 0
2019-01-18 09:14:04.111 7f0673080700 5 --1- v1:172.21.15.28:6789/0 >> v1:172.21.15.28:6805/310571 conn(0x55c9ea61cc00 0x55c9ea666c00 :6789 s=READ_FOOTER_AND_DISPATCH pgs=3 cs=1 l=1). rx osd.11 seq 12 0x55c9eb0f2000 osd_beacon(pgs [] lec 0 v3253) v1
2019-01-18 09:14:04.111 7f0675885700 1 -- v1:172.21.15.28:6789/0 <== osd.11 v1:172.21.15.28:6805/310571 12 ==== osd_beacon(pgs [] lec 0 v3253) v1 ==== 26+0+0 (480548915 0 0) 0x55c9eb0f2000 con 0x55c9ea61cc00
2019-01-18 09:14:04.111 7f0675885700 20 mon.c@0(leader) e2 _ms_dispatch existing session 0x55c9eb093dc0 for osd.11
2019-01-18 09:14:04.111 7f0675885700 10 mon.c@0(leader).paxosservice(osdmap 501..3253) dispatch 0x55c9eb0f2000 osd_beacon(pgs [] lec 0 v3253) v1 from osd.11 v1:172.21.15.28:6805/310571 con 0x55c9ea61cc00
2019-01-18 09:14:04.111 7f0675885700 10 mon.c@0(leader).osd e3253 preprocess_query osd_beacon(pgs [] lec 0 v3253) v1 from osd.11 v1:172.21.15.28:6805/310571
2019-01-18 09:14:04.111 7f0675885700 10 mon.c@0(leader) e2 no_reply to osd.11 v1:172.21.15.28:6805/310571 osd_beacon(pgs [] lec 0 v3253) v1
2019-01-18 09:14:04.111 7f0675885700 7 mon.c@0(leader).osd e3253 prepare_update osd_beacon(pgs [] lec 0 v3253) v1 from osd.11 v1:172.21.15.28:6805/310571
2019-01-18 09:14:04.111 7f0675885700 10 mon.c@0(leader).osd e3253 prepare_beacon osd_beacon(pgs [] lec 0 v3253) v1 from osd.11
2019-01-18 09:14:04.909 7f067808a700 2 mon.c@0(leader).osd e3253 osd.11 DOWN
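For readers unfamiliar with the "osd_state_xor" entry quoted in the description, here is a minimal Python sketch of why an XOR of "up" against an OSD that is currently up results in it being marked down. This is an illustration only, not the OSDMap code: the helper names (apply_state_xor, is_up) and the STATE_BITS values are assumptions made for this example.

```python
import json

# Illustrative flag bits; the numeric values are assumptions for this sketch.
STATE_BITS = {"exists": 1 << 0, "up": 1 << 1}


def apply_state_xor(states, incremental):
    """Return a new {osd_id: bitmask} with each osd_state_xor entry applied."""
    result = dict(states)
    for entry in incremental.get("osd_state_xor", []):
        mask = 0
        for name in entry["state_xor"]:
            mask |= STATE_BITS[name]
        # XOR flips exactly the named bits: up -> down, or down -> up.
        result[entry["osd"]] = result.get(entry["osd"], 0) ^ mask
    return result


def is_up(states, osd):
    """True if the 'up' bit is set for the given OSD id."""
    return bool(states.get(osd, 0) & STATE_BITS["up"])


# The fragment from the description, wrapped in braces so it parses as JSON.
inc_3254 = json.loads('{"osd_state_xor": [{"osd": 11, "state_xor": ["up"]}]}')

# osd.11 had just booted and was up in e3253, per the log above.
before = {11: STATE_BITS["exists"] | STATE_BITS["up"]}
after = apply_state_xor(before, inc_3254)

print(is_up(before, 11), is_up(after, 11))
```

Because the entry is a toggle rather than an absolute state, the incremental alone shows only that the up bit flipped in e3254. It does not show which code path queued the change, which is why the missing "marking down" message in the mon log matters.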
Updated by Sage Weil about 5 years ago
- Status changed from In Progress to Can't reproduce