Project

General

Profile

Bug #19939

OSD crash in MOSDRepOpReply::decode_payload

Added by John Spray 3 months ago. Updated about 1 month ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
Start date:
05/16/2017
Due date:
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Release:
Needs Doc:
No
Component(RADOS):
OSD

Description

Seen on kcephfs suite, running against test branch based on Monday's master.

2017-05-16T10:15:48.415 INFO:tasks.ceph.osd.1.smithi158.stderr:/build/ceph-12.0.2-1210-gb130515/src/osd/osd_types.h: In function 'static void osd_reqid_t::_denc_finish(ceph::buffer::ptr::iterator&, __u8*, __u8*, char**, uint32_t*)' thread 7fa92256f700 time 2017-05-16 10:15:48.416215
2017-05-16T10:15:48.416 INFO:tasks.ceph.osd.1.smithi158.stderr:/build/ceph-12.0.2-1210-gb130515/src/osd/osd_types.h: 117: FAILED assert(pos <= end)
2017-05-16T10:15:48.416 INFO:tasks.ceph.osd.0.smithi158.stderr:/build/ceph-12.0.2-1210-gb130515/src/osd/osd_types.h: In function 'static void osd_reqid_t::_denc_finish(ceph::buffer::ptr::iterator&, __u8*, __u8*, char**, uint32_t*)' thread 7f36a8f2f700 time 2017-05-16 10:15:48.416393
2017-05-16T10:15:48.416 INFO:tasks.ceph.osd.0.smithi158.stderr:/build/ceph-12.0.2-1210-gb130515/src/osd/osd_types.h: 117: FAILED assert(pos <= end)
2017-05-16T10:15:48.417 INFO:tasks.ceph.osd.1.smithi158.stderr: ceph version 12.0.2-1210-gb130515 (b13051530ef8dca97f2a6fc0558a898dd02eb8e1)
2017-05-16T10:15:48.417 INFO:tasks.ceph.osd.1.smithi158.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x10e) [0x55c29afc534e]
2017-05-16T10:15:48.418 INFO:tasks.ceph.osd.1.smithi158.stderr: 2: (std::enable_if<denc_traits<osd_reqid_t, void>::supported&&(!denc_traits<osd_reqid_t, void>::featured), void>::type decode<osd_reqid_t, denc_traits<osd_reqid_t, void> >(osd_reqid_t&, ceph::buffer::list::iterator&)+0x198) [0x55c29ab7f0b8]
2017-05-16T10:15:48.418 INFO:tasks.ceph.osd.1.smithi158.stderr: 3: (MOSDRepOpReply::decode_payload()+0x8e) [0x55c29ad3e9ee]
2017-05-16T10:15:48.418 INFO:tasks.ceph.osd.1.smithi158.stderr: 4: (decode_message(CephContext*, int, ceph_msg_header&, ceph_msg_footer&, ceph::buffer::list&, ceph::buffer::list&, ceph::buffer::list&, Connection*)+0xe37) [0x55c29aff5a77]
2017-05-16T10:15:48.418 INFO:tasks.ceph.osd.1.smithi158.stderr: 5: (AsyncConnection::process()+0x1a64) [0x55c29b1c5ac4]
2017-05-16T10:15:48.418 INFO:tasks.ceph.osd.1.smithi158.stderr: 6: (EventCenter::process_events(int)+0x301) [0x55c29b04e811]
2017-05-16T10:15:48.418 INFO:tasks.ceph.osd.1.smithi158.stderr: 7: (()+0xa5129a) [0x55c29b05129a]
2017-05-16T10:15:48.419 INFO:tasks.ceph.osd.1.smithi158.stderr: 8: (()+0xb1a60) [0x7fa9251a3a60]
2017-05-16T10:15:48.419 INFO:tasks.ceph.osd.1.smithi158.stderr: 9: (()+0x8184) [0x7fa92581b184]
2017-05-16T10:15:48.419 INFO:tasks.ceph.osd.1.smithi158.stderr: 10: (clone()+0x6d) [0x7fa92490b37d]

I assume that since it's a MOSDRepOpReply this is the OSD's fault, and not something bogus from the client.

http://pulpito.ceph.com/jspray-2017-05-16_09:43:25-kcephfs-wip-jcsp-testing-20170515b-testing-basic-smithi/1184359

kcephfs/thrash/{clusters/fixed-3-cephfs.yaml conf.yaml objectstore/filestore-xfs.yaml thrashers/mon.yaml workloads/kclient_workunit_suites_iozone.yaml}

History

#1 Updated by John Spray 3 months ago

More instances from last night's master:
- http://pulpito.ceph.com/jspray-2017-05-23_22:31:39-fs-master-distro-basic-smithi/1222341

  • fs/snaps/{begin.yaml clusters/fixed-2-ucephfs.yaml mount/fuse.yaml objectstore/filestore-xfs.yaml overrides/{debug.yaml frag_enable.yaml whitelist_wrongly_marked_down.yaml} tasks/snaptests.yaml}

- http://pulpito.ceph.com/jspray-2017-05-23_22:31:39-fs-master-distro-basic-smithi/1222394

  • fs/thrash/{begin.yaml ceph-thrash/default.yaml clusters/mds-1active-1standby.yaml mount/fuse.yaml msgr-failures/osd-mds-delay.yaml objectstore/filestore-xfs.yaml overrides/{debug.yaml frag_enable.yaml whitelist_wrongly_marked_down.yaml} tasks/cfuse_workunit_snaptests.yaml}

#2 Updated by Josh Durgin 3 months ago

  • Assignee set to Kefu Chai

Kefu, could you take a look at this one? Not sure if it's related to recent denc changes, or perhaps https://github.com/ceph/ceph/pull/15092

#3 Updated by Kefu Chai 2 months ago

John, sorry. i missed this. will take a look at it next monday.

#4 Updated by Greg Farnum 2 months ago

  • Project changed from Ceph to RADOS
  • Category deleted (OSD)
  • Component(RADOS) OSD added

#6 Updated by Kefu Chai 2 months ago

all passed modulo a valgrind error in ceph-mds, see /a/kchai-2017-06-19_09:40:27-fs-master---basic-smithi/1300881/remote/smithi113/log/valgrind/mds.a.log.gz, tracked at #20338

#7 Updated by John Spray 2 months ago

Unless there was a patch, I wouldn't be too sure this is fixed -- it was an intermittent failure.

#8 Updated by Kefu Chai 2 months ago

2017-05-16T10:15:48.418 INFO:tasks.ceph.osd.1.smithi158.stderr: 2: (std::enable_if<denc_traits<osd_reqid_t, void>::supported&&(!denc_traits<osd_reqid_t, void>::featured), void>::type decode<osd_reqid_t, denc_traits<osd_reqid_t, void> >(osd_reqid_t&, ceph::buffer::list::iterator&)+0x198) [0x55c29ab7f0b8]

77fbfc29d15e3b9b05f7434bea8792077a18aa42 does not contain the denc changes i introduced in 77fbfc29d15e3b9b05f7434bea8792077a18aa42. so it's not caused by my change. but i don't think my change will fix it either. as osd_reqid_t does have traits::need_contiguous = true.

also, the payload can never be discontiguous. see AsyncConnection::process(), case STATE_OPEN_MESSAGE_READ_FRONT:

        {
          // read front
          unsigned front_len = current_header.front_len;
          if (front_len) {
            if (!front.length())
              front.push_back(buffer::create(front_len));

            r = read_until(front_len, front.c_str());
            if (r < 0) {
              ldout(async_msgr->cct, 1) << __func__ << " read message front failed" << dendl;
              goto fail;
            } else if (r > 0) {
              break;
            }

            ldout(async_msgr->cct, 20) << __func__ << " got front " << front.length() << dendl;
          }
          state = STATE_OPEN_MESSAGE_READ_MIDDLE;
        }

so i am not sure how this could happen.

#9 Updated by Patrick Donnelly about 2 months ago

Here's another one:

/a/pdonnell-2017-06-27_19:50:40-fs-wip-pdonnell-20170627---basic-smithi/1333648

fs/snaps/{begin.yaml clusters/fixed-2-ucephfs.yaml mount/fuse.yaml objectstore/bluestore.yaml overrides/{debug.yaml frag_enable.yaml whitelist_wrongly_marked_down.yaml} tasks/snaptests.yaml}

#10 Updated by Greg Farnum about 2 months ago

  • Priority changed from High to Urgent

Kefu, any new updates or should this be unassigned from you?

#11 Updated by Kefu Chai about 2 months ago

  • Status changed from New to Need Review

#12 Updated by Kefu Chai about 2 months ago

  • Backport set to jewl, kraken

#13 Updated by Sage Weil about 2 months ago

  • Status changed from Need Review to Testing

#14 Updated by Nathan Cutler about 2 months ago

  • Backport changed from jewl, kraken to jewel, kraken

#15 Updated by Greg Farnum about 2 months ago

  • Status changed from Testing to Verified

#16 Updated by Kefu Chai about 1 month ago

  • Assignee deleted (Kefu Chai)

#17 Updated by Kefu Chai about 1 month ago

i will look at this issue again later on if no progress has been made before then.

#18 Updated by Patrick Donnelly about 1 month ago

Another one: /ceph/teuthology-archive/pdonnell-2017-07-07_20:24:01-fs-wip-pdonnell-20170706-distro-basic-smithi/1372305/teuthology.log

fs/snaps/{begin.yaml clusters/fixed-2-ucephfs.yaml mount/fuse.yaml objectstore/filestore-xfs.yaml overrides/{debug.yaml frag_enable.yaml whitelist_wrongly_marked_down.yaml} tasks/snaptests.yaml}

#19 Updated by Kefu Chai about 1 month ago

  • Description updated (diff)

#20 Updated by Kefu Chai about 1 month ago

  • Assignee set to Kefu Chai

#21 Updated by Kefu Chai about 1 month ago

i am able to reproduce this issue using qa/workunits/fs/snaps/untar_snap_rm.sh. but not always...

#22 Updated by Kefu Chai about 1 month ago

i found that the header.version of the MOSDRepOpReply message being decoded was 1. but i am using a vstart cluster for testing, so all OSDs are luminous. hence the header.version should be 2. i think that's why the reqid failed to decode.

#23 Updated by Greg Farnum about 1 month ago

We set it to 1 if the MSODRepOpReply is encoded with features that do not contain SERVER_LUMINOUS.

...which I think a connection from a client won't have? Not sure if that's related or not.

#24 Updated by Kefu Chai about 1 month ago

MSODRepOpReply is always sent by OSD.

core dump from osd.1

(gdb)  f 13
#13 0x000055ca58c13634 in decode_message (cct=0x55ca62c50d20, crcflags=3, header=..., footer=..., front=..., middle=..., data=..., conn=0x55ca63e1f800)
    at /var/ceph/ceph/src/msg/Message.cc:839
(gdb) p header
$20 = (ceph_msg_header &) @0x55ca63e20b58: {seq = {v = 1}, tid = {v = 5074}, type = {v = 113}, priority = {v = 196}, version = {v = 1}, front_len = {v = 111}, middle_len = {
    v = 0}, data_len = {v = 0}, data_off = {v = 0}, src = {type = 4 '\004', num = {v = 2}}, compat_version = {v = 1}, reserved = {v = 0}, crc = {v = 1040714380}}
(gdb) p p.p
$57 = {_raw = 0x55ca6563ea30, _off = 0, _len = 111}
(gdb) p *(ceph::buffer::raw*)0x55ca6563ea30
$3 = {_vptr.raw = 0x55ca59d37308 <vtable for ceph::buffer::raw_combined+16>, data = 0x55ca6563e9c0 "\025", len = 111, nref = {<std::__atomic_base<unsigned int>> = {
      static _S_alignment = 4, _M_i = 2}, <No data fields>}, mempool = 10, crc_spinlock = {<std::__atomic_flag_base> = {_M_i = false}, <No data fields>},
  crc_map = std::map with 1 element = {[{first = 0, second = 111}] = {first = 0, second = 2758781822}}}
(gdb)  p (uint32_t[2])(*0x55ca6563e9c0)
$61 = {21, 16}

log from osd.2.log. and the address of osd.1 is 127.0.0.1:6805

2017-07-19 14:55:07.735281 7fd565533700  1 -- 127.0.0.1:6814/1050987 --> 127.0.0.1:6805/50779 -- osd_repop_reply(client.4120.0:8880 1.0 e21/16 ack, result = 0) v2 -- 0x5607cc551c00 con 0

in this case, assuming the header is not corrupted. it was sent from osd.2 per the "src" field.

assuming "header.version>=2", if we decode the payload, the epoch and min_epoch are 21 and 16 respectively. which matches the previous messages sent from osd.2.

if i dump the first 49 bytes header without the trailing crc, and run it through ceph_crc32c(), the returned digest is 1040714380. which matches with header.crc.

$ grep 'osd_repop_reply' osd.*.log | grep -w v1

also, nobody claims that it ever sends or receives a v1 osd_repop_reply.

#25 Updated by Kefu Chai about 1 month ago

occasionally, i see

2017-07-19 09:31:31.611176 7fc8d016d700 10 osd.1 15 OSD::ms_get_authorizer type=osd
2017-07-19 09:31:31.611362 7fc8cf16b700 -1 failed to decode message of type 70 v3: buffer::malformed_input: void osd_peer_stat_t::decode(ceph::buffer::list::iterator&) no longer understand old encoding version 1 < struct_compat
2017-07-19 09:31:31.611381 7fc8d016d700 10 osd.1 15  new session (outgoing) 0x5643b88e0000 con=0x5643ba960800 addr=127.0.0.1:6826/31872
2017-07-19 09:31:31.611455 7fc

in jenkins' make check output. type 70 is MSG_OSD_PING. not sure if this is related.

#26 Updated by Kefu Chai about 1 month ago

  • Status changed from Verified to Need Review
  • Backport deleted (jewel, kraken)

#27 Updated by Kefu Chai about 1 month ago

  • Status changed from Need Review to Resolved

Also available in: Atom PDF