Bug #37965 (closed): rados/upgrade test fails

Added by Sage Weil about 5 years ago. Updated about 5 years ago.

Status: Can't reproduce
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: -
Tags: -
Backport: -
Regression: No
Severity: 3 - minor
Reviewed: -
Affected Versions: -
ceph-qa-suite: -
Component(RADOS): -
Pull request ID: -
Crash signature (v1): -
Crash signature (v2): -

Description

Recent regression. Looking at /a/sage-2019-01-18_06:11:36-rados-wip-sage-testing-2019-01-17-2111-distro-basic-smithi/3478286, the trouble starts when osd.11 is marked down in e3254 for some reason, but the mon log doesn't seem to indicate that the mon did it on purpose.
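
A quick way to spot these up/down flips when triaging is to pull the leader's " osd.N UP" / " osd.N DOWN" lines out of the mon log. A minimal sketch (mine, not something from the run; the regex is only based on the lines in the excerpt below):

    #!/usr/bin/env python3
    # Hypothetical triage helper: scan a mon log and print every UP/DOWN
    # state line the leader's OSDMonitor logged for a given osd id.
    import re
    import sys

    # Matches e.g. "mon.c@0(leader).osd e3252  osd.11 UP v1:172.21.15.28:6805/310571"
    STATE_RE = re.compile(r'osd e(\d+)\s+osd\.(\d+) (UP|DOWN)')

    def state_changes(log_path, osd_id):
        """Yield (epoch, 'UP'|'DOWN') for every state line about osd_id."""
        with open(log_path) as f:
            for line in f:
                m = STATE_RE.search(line)
                if m and int(m.group(2)) == osd_id:
                    yield int(m.group(1)), m.group(3)

    if __name__ == "__main__":
        path, osd = sys.argv[1], int(sys.argv[2])   # e.g. mon.c.log 11
        for epoch, state in state_changes(path, osd):
            print(f"e{epoch} osd.{osd} {state}")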

In the e3254 incremental:

    "osd_state_xor": [
        {
            "osd": 11,
            "state_xor": [
                "up" 
            ]
        }
    ],
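
For reference, a minimal sketch (assuming the incremental has already been dumped to JSON in the shape above; the dump step and the helper name are mine, not part of the test) that lists which OSDs had their "up" bit flipped by a given epoch:

    #!/usr/bin/env python3
    # Hypothetical helper: given an osdmap incremental dumped as JSON in the
    # shape shown above, list the OSDs whose "up" state is XOR'd, i.e.
    # flipped up->down (or down->up) by that epoch.
    import json
    import sys

    def flipped_up_osds(inc):
        """Return osd ids whose 'up' state is toggled by this incremental."""
        return [
            entry["osd"]
            for entry in inc.get("osd_state_xor", [])
            if "up" in entry.get("state_xor", [])
        ]

    if __name__ == "__main__":
        with open(sys.argv[1]) as f:      # path to the JSON-dumped incremental
            inc = json.load(f)
        for osd in flipped_up_osds(inc):
            print(f"osd.{osd}: 'up' bit XOR'd in this incremental")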

But the mon log doesn't mention marking osd.11 down:

2019-01-18 09:14:03.905 7f067808a700  2 mon.c@0(leader).osd e3252  osd.11 UP v1:172.21.15.28:6805/310571
2019-01-18 09:14:03.909 7f067207e700 20 mon.c@0(leader).osd e3253 check_pg_creates_sub .. osd.11
2019-01-18 09:14:03.910 7f067207e700  7 mon.c@0(leader).osd e3253 _booted osd.11 v1:172.21.15.28:6805/310571 w 0 from 3251
2019-01-18 09:14:03.910 7f067207e700  0 log_channel(cluster) log [INF] : osd.11 v1:172.21.15.28:6805/310571 boot
2019-01-18 09:14:03.910 7f067207e700  5 mon.c@0(leader).osd e3253 send_latest to osd.11 v1:172.21.15.28:6805/310571 start 3252
2019-01-18 09:14:03.910 7f067207e700  5 mon.c@0(leader).osd e3253 send_incremental [3252..3253] to osd.11
2019-01-18 09:14:03.911 7f0675885700 10 mon.c@0(leader).log v5658  logging 2019-01-18 09:14:03.912496 mon.c (mon.0) 376 : cluster [INF] osd.11 v1:172.21.15.28:6805/310571 boot
2019-01-18 09:14:03.911 7f0677088700 20 mon.c@0(leader).osd e3253 check_pg_creates_sub .. osd.11
2019-01-18 09:14:03.929 7f0673080700 20 --1- v1:172.21.15.28:6789/0 >> v1:172.21.15.28:6805/310571 conn(0x55c9ea61cc00 0x55c9ea666c00 :6789 s=OPENED pgs=3 cs=1 l=1).handle_message_header got envelope type=73 src osd.11 front=22 data=0 off 0
2019-01-18 09:14:03.929 7f0673080700  5 --1- v1:172.21.15.28:6789/0 >> v1:172.21.15.28:6805/310571 conn(0x55c9ea61cc00 0x55c9ea666c00 :6789 s=READ_FOOTER_AND_DISPATCH pgs=3 cs=1 l=1). rx osd.11 seq 11 0x55c9ea8cfa40 osd_alive(want up_thru 3253 have 3253) v1
2019-01-18 09:14:03.929 7f0675885700  1 -- v1:172.21.15.28:6789/0 <== osd.11 v1:172.21.15.28:6805/310571 11 ==== osd_alive(want up_thru 3253 have 3253) v1 ==== 22+0+0 (3463081574 0 0) 0x55c9ea8cfa40 con 0x55c9ea61cc00
2019-01-18 09:14:03.929 7f0675885700 20 mon.c@0(leader) e2 _ms_dispatch existing session 0x55c9eb093dc0 for osd.11
2019-01-18 09:14:03.929 7f0675885700 10 mon.c@0(leader).paxosservice(osdmap 501..3253) dispatch 0x55c9ea8cfa40 osd_alive(want up_thru 3253 have 3253) v1 from osd.11 v1:172.21.15.28:6805/310571 con 0x55c9ea61cc00
2019-01-18 09:14:03.929 7f0675885700 10 mon.c@0(leader).osd e3253 preprocess_query osd_alive(want up_thru 3253 have 3253) v1 from osd.11 v1:172.21.15.28:6805/310571
2019-01-18 09:14:03.929 7f0675885700 10 mon.c@0(leader).osd e3253 preprocess_alive want up_thru 3253 from osd.11 v1:172.21.15.28:6805/310571
2019-01-18 09:14:03.929 7f0675885700  7 mon.c@0(leader).osd e3253 prepare_update osd_alive(want up_thru 3253 have 3253) v1 from osd.11 v1:172.21.15.28:6805/310571
2019-01-18 09:14:03.929 7f0675885700  7 mon.c@0(leader).osd e3253 prepare_alive want up_thru 3253 have 3253 from osd.11 v1:172.21.15.28:6805/310571
2019-01-18 09:14:03.934 7f067207e700  7 mon.c@0(leader).log v5659 update_from_paxos applying incremental log 5659 2019-01-18 09:14:03.912496 mon.c (mon.0) 376 : cluster [INF] osd.11 v1:172.21.15.28:6805/310571 boot
2019-01-18 09:14:04.111 7f0673080700 20 --1- v1:172.21.15.28:6789/0 >> v1:172.21.15.28:6805/310571 conn(0x55c9ea61cc00 0x55c9ea666c00 :6789 s=OPENED pgs=3 cs=1 l=1).handle_message_header got envelope type=79 src osd.11 front=26 data=0 off 0
2019-01-18 09:14:04.111 7f0673080700  5 --1- v1:172.21.15.28:6789/0 >> v1:172.21.15.28:6805/310571 conn(0x55c9ea61cc00 0x55c9ea666c00 :6789 s=READ_FOOTER_AND_DISPATCH pgs=3 cs=1 l=1). rx osd.11 seq 12 0x55c9eb0f2000 osd_beacon(pgs [] lec 0 v3253) v1
2019-01-18 09:14:04.111 7f0675885700  1 -- v1:172.21.15.28:6789/0 <== osd.11 v1:172.21.15.28:6805/310571 12 ==== osd_beacon(pgs [] lec 0 v3253) v1 ==== 26+0+0 (480548915 0 0) 0x55c9eb0f2000 con 0x55c9ea61cc00
2019-01-18 09:14:04.111 7f0675885700 20 mon.c@0(leader) e2 _ms_dispatch existing session 0x55c9eb093dc0 for osd.11
2019-01-18 09:14:04.111 7f0675885700 10 mon.c@0(leader).paxosservice(osdmap 501..3253) dispatch 0x55c9eb0f2000 osd_beacon(pgs [] lec 0 v3253) v1 from osd.11 v1:172.21.15.28:6805/310571 con 0x55c9ea61cc00
2019-01-18 09:14:04.111 7f0675885700 10 mon.c@0(leader).osd e3253 preprocess_query osd_beacon(pgs [] lec 0 v3253) v1 from osd.11 v1:172.21.15.28:6805/310571
2019-01-18 09:14:04.111 7f0675885700 10 mon.c@0(leader) e2 no_reply to osd.11 v1:172.21.15.28:6805/310571 osd_beacon(pgs [] lec 0 v3253) v1
2019-01-18 09:14:04.111 7f0675885700  7 mon.c@0(leader).osd e3253 prepare_update osd_beacon(pgs [] lec 0 v3253) v1 from osd.11 v1:172.21.15.28:6805/310571
2019-01-18 09:14:04.111 7f0675885700 10 mon.c@0(leader).osd e3253 prepare_beacon osd_beacon(pgs [] lec 0 v3253) v1 from osd.11
2019-01-18 09:14:04.909 7f067808a700  2 mon.c@0(leader).osd e3253  osd.11 DOWN

#1 - Updated by Sage Weil about 5 years ago

  • Status changed from In Progress to Can't reproduce