Bug #43827

decode fail in SessionMapStore::decode_legacy on upgrade

Added by Sage Weil 2 months ago. Updated 2 months ago.

Status: Duplicate
Priority: Urgent
Assignee: -
Category: -
Target version:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature:

Description

/a/sage-2020-01-26_15:00:33-upgrade:cephfs-wip-sage2-testing-2020-01-24-1408-distro-basic-smithi/4709313 (and the whole set of tests actually)

2020-01-26T15:27:59.105 INFO:teuthology.orchestra.run.smithi178.stderr:2020-01-26T15:27:59.100+0000 7f6ec7fff700  1 -- 172.21.15.178:0/2572686074 <== mon.0 v2:172.21.15.178:3300/0 6 ==== mon_command_ack([{"prefix": "get_command_descriptions"}]=0  v0) v1 ==== 72+0+116194 (secure 0 0 0) 0x7f6eb8033cd0 con 0x7f6eb4003f60
2020-01-26T15:27:59.119 INFO:tasks.ceph.mds.b.smithi178.stderr:terminate called after throwing an instance of 'ceph::buffer::v14_2_0::end_of_buffer'
2020-01-26T15:27:59.124 INFO:tasks.ceph.mds.b.smithi178.stderr:  what():  buffer::end_of_buffer
2020-01-26T15:27:59.124 INFO:tasks.ceph.mds.b.smithi178.stderr:*** Caught signal (Aborted) **
2020-01-26T15:27:59.124 INFO:tasks.ceph.mds.b.smithi178.stderr: in thread 7f44d8b5a700 thread_name:MR_Finisher
2020-01-26T15:27:59.131 INFO:tasks.ceph.mds.b.smithi178.stderr: ceph version 15.0.0-9766-g084f050 (084f0502288c3898c3a27b35934ea510597b049d) octopus (dev)
2020-01-26T15:27:59.132 INFO:tasks.ceph.mds.b.smithi178.stderr: 1: (()+0x12890) [0x7f44e5d58890]
2020-01-26T15:27:59.132 INFO:tasks.ceph.mds.b.smithi178.stderr: 2: (gsignal()+0xc7) [0x7f44e4e50e97]
2020-01-26T15:27:59.132 INFO:tasks.ceph.mds.b.smithi178.stderr: 3: (abort()+0x141) [0x7f44e4e52801]
2020-01-26T15:27:59.132 INFO:tasks.ceph.mds.b.smithi178.stderr: 4: (()+0x8c957) [0x7f44e5845957]
2020-01-26T15:27:59.132 INFO:tasks.ceph.mds.b.smithi178.stderr: 5: (()+0x92ab6) [0x7f44e584bab6]
2020-01-26T15:27:59.132 INFO:tasks.ceph.mds.b.smithi178.stderr: 6: (()+0x92af1) [0x7f44e584baf1]
2020-01-26T15:27:59.132 INFO:tasks.ceph.mds.b.smithi178.stderr: 7: (()+0x92d24) [0x7f44e584bd24]
2020-01-26T15:27:59.133 INFO:tasks.ceph.mds.b.smithi178.stderr: 8: (()+0x26a540) [0x7f44e643f540]
2020-01-26T15:27:59.133 INFO:tasks.ceph.mds.b.smithi178.stderr: 9: (()+0x57ff67) [0x7f44e6754f67]
2020-01-26T15:27:59.133 INFO:tasks.ceph.mds.b.smithi178.stderr: 10: (SessionMapStore::decode_legacy(ceph::buffer::v14_2_0::list::iterator_impl<true>&)+0x5d) [0x55b6363dc45d]
2020-01-26T15:27:59.133 INFO:tasks.ceph.mds.b.smithi178.stderr: 11: (SessionMap::decode_legacy(ceph::buffer::v14_2_0::list::iterator_impl<true>&)+0x26) [0x55b6363dcdf6]
2020-01-26T15:27:59.133 INFO:tasks.ceph.mds.b.smithi178.stderr: 12: (SessionMap::_load_legacy_finish(int, ceph::buffer::v14_2_0::list&)+0x5c) [0x55b6363da28c]
2020-01-26T15:27:59.133 INFO:tasks.ceph.mds.b.smithi178.stderr: 13: (MDSContext::complete(int)+0x52) [0x55b6363e2d42]
2020-01-26T15:27:59.133 INFO:tasks.ceph.mds.b.smithi178.stderr: 14: (MDSIOContextBase::complete(int)+0x181) [0x55b6363e3021]
2020-01-26T15:27:59.133 INFO:tasks.ceph.mds.b.smithi178.stderr: 15: (Finisher::finisher_thread_entry()+0x195) [0x7f44e6493175]
2020-01-26T15:27:59.134 INFO:tasks.ceph.mds.b.smithi178.stderr: 16: (()+0x76db) [0x7f44e5d4d6db]
2020-01-26T15:27:59.134 INFO:tasks.ceph.mds.b.smithi178.stderr: 17: (clone()+0x3f) [0x7f44e4f3388f]
2020-01-26T15:27:59.134 INFO:tasks.ceph.mds.b.smithi178.stderr:2020-01-26T15:27:59.128+0000 7f44d8b5a700 -1 *** Caught signal (Aborted) **

Related issues

Duplicates bluestore - Bug #43824: fsck errors after auto omap update Resolved

History

#1 Updated by Zheng Yan 2 months ago

Mimic does not use the legacy session format. It looks like the MDS got a zero-length omap header, so it retried loading the sessionmap in the legacy format.

The following warning may be related.

2020-01-26T15:39:52.961 INFO:tasks.ceph.osd.2.smithi158.stderr:2020-01-26T15:39:52.959+0000 7f41a7b82d80 -1 bluestore(/var/lib/ceph/osd/ceph-2) fsck warning: #2:d0630e4c:::mds0_sessionmap:head# has omap that is not per-pool or pgmeta
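
To illustrate the failure mode described in this comment, here is a minimal Python sketch of a loader that falls back to the legacy format when the omap header is empty. It is a simplified model under stated assumptions, not the actual Ceph code: `load_sessionmap`, `decode_u64`, and `EndOfBuffer` are hypothetical names, and the idea that both formats begin with a little-endian 64-bit field is assumed purely for the example.

```python
import struct


class EndOfBuffer(Exception):
    """Illustrative stand-in for ceph::buffer::end_of_buffer."""


def decode_u64(buf: bytes, offset: int = 0) -> int:
    """Decode a little-endian u64; raise if fewer than 8 bytes remain."""
    if len(buf) - offset < 8:
        raise EndOfBuffer("buffer::end_of_buffer")
    return struct.unpack_from("<Q", buf, offset)[0]


def load_sessionmap(omap_header: bytes, object_data: bytes) -> str:
    """Hypothetical load path: report which format was decoded."""
    if len(omap_header) == 0:
        # Assumption: an empty header is treated as "this is a legacy object",
        # so the loader retries by decoding the object's data payload.
        decode_u64(object_data)  # raises EndOfBuffer if the payload is empty too
        return "legacy"
    decode_u64(omap_header)
    return "omap"


if __name__ == "__main__":
    try:
        # Header lost and no legacy payload: the fallback decode runs off the end.
        load_sessionmap(b"", b"")
    except EndOfBuffer as exc:
        print("decode failed:", exc)
```

In this model, an object whose omap header has gone missing and which never had legacy data hits the same kind of end-of-buffer error as in the backtrace above, because the fallback decoder has nothing to read.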

#2 Updated by Zheng Yan 2 months ago

I think it's a RADOS bug: the omap header/keys got lost after the upgrade.

#3 Updated by Sage Weil 2 months ago

  • Status changed from New to Duplicate

#4 Updated by Sage Weil 2 months ago

  • Duplicates Bug #43824: fsck errors after auto omap update added
