Bug #20985
closed
PG which marks divergent_priors causes crash on startup
% Done:
0%
Source:
Tags:
Backport:
luminous
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
This was noticed in the course of somebody upgrading from 12.1.1 to 12.1.2:
2017-08-11 23:01:53.109922 7fd4268ffcc0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
2017-08-11 23:01:53.109926 7fd4268ffcc0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: splice() is disabled via 'filestore splice' config option
2017-08-11 23:01:53.111939 7fd4268ffcc0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2017-08-11 23:01:53.112060 7fd4268ffcc0 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_feature: extsize is disabled by conf
2017-08-11 23:01:53.113102 7fd4268ffcc0 0 filestore(/var/lib/ceph/osd/ceph-0) start omap initiation
2017-08-11 23:01:53.114429 7fd4268ffcc0 1 leveldb: Recovering log #181623
2017-08-11 23:01:53.122344 7fd4268ffcc0 1 leveldb: Delete type=0 #181623
2017-08-11 23:01:53.122450 7fd4268ffcc0 1 leveldb: Delete type=3 #181622
2017-08-11 23:02:41.757352 7fd4268ffcc0 0 filestore(/var/lib/ceph/osd/ceph-0) mount(1758): enabling WRITEAHEAD journal mode: checkpoint is not enabled
2017-08-11 23:02:41.788193 7fd4268ffcc0 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
2017-08-11 23:02:41.788202 7fd4268ffcc0 1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 28: 1048576000 bytes, block size 4096 bytes, directio = 1, aio = 0
2017-08-11 23:02:41.823216 7fd4268ffcc0 1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 28: 1048576000 bytes, block size 4096 bytes, directio = 1, aio = 0
2017-08-11 23:02:41.830592 7fd4268ffcc0 1 filestore(/var/lib/ceph/osd/ceph-0) upgrade(1365)
2017-08-11 23:02:41.831343 7fd4268ffcc0 0 _get_class not permitted to load lua
2017-08-11 23:02:41.833438 7fd4268ffcc0 0 _get_class not permitted to load sdk
2017-08-11 23:02:41.842946 7fd4268ffcc0 0 <cls> /build/ceph-12.1.3/src/cls/cephfs/cls_cephfs.cc:197: loading cephfs
2017-08-11 23:02:41.843280 7fd4268ffcc0 0 <cls> /build/ceph-12.1.3/src/cls/hello/cls_hello.cc:296: loading cls_hello
2017-08-11 23:02:41.843606 7fd4268ffcc0 0 _get_class not permitted to load kvs
2017-08-11 23:02:41.843662 7fd4268ffcc0 1 osd.0 0 warning: got an error loading one or more classes: (1) Operation not permitted
2017-08-11 23:02:41.844083 7fd4268ffcc0 0 osd.0 6793 crush map has features 288232576282525696, adjusting msgr requires for clients
2017-08-11 23:02:41.844124 7fd4268ffcc0 0 osd.0 6793 crush map has features 288232576282525696 was 8705, adjusting msgr requires for mons
2017-08-11 23:02:41.844160 7fd4268ffcc0 0 osd.0 6793 crush map has features 1008808516661821440, adjusting msgr requires for osds
2017-08-11 23:02:44.634391 7fd4268ffcc0 0 osd.0 6793 load_pgs
2017-08-11 23:02:44.749661 7fd4268ffcc0 -1 /build/ceph-12.1.3/src/osd/PGLog.h: In function 'static void PGLog::read_log_and_missing(ObjectStore*, coll_t, coll_t, ghobject_t, const pg_info_t&, PGLog::IndexedLog&, missing_type&, bool, std::ostringstream&, bool, bool*, const DoutPrefixProvider*, std::set<std::basic_string<char> >*, bool) [with missing_type = pg_missing_set<true>; std::ostringstream = std::basic_ostringstream<char>]' thread 7fd4268ffcc0 time 2017-08-11 23:02:44.746102
/build/ceph-12.1.3/src/osd/PGLog.h: 1301: FAILED assert(force_rebuild_missing)
But it's actually much worse than that: PG::read_state only sets force_rebuild_missing if info_struct_v is the Jewel version, and the log-reading code asserts force_rebuild_missing whenever it finds divergent_priors written down on disk. That means any OSD holding a PG that has recorded divergent_priors will crash on restart, even on an all-Luminous system.
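To make the failure condition concrete, here is a minimal standalone sketch of the interaction between force_rebuild_missing and divergent_priors described above. It is not the Ceph code: `InfoFormat`, `OnDiskPG`, `force_rebuild_missing_for`, and `load_would_succeed` are hypothetical names invented for illustration, and the real check is an assert in PGLog::read_log_and_missing rather than a boolean return.

```cpp
#include <cassert>

// Hypothetical stand-in for the on-disk info struct version; the real
// code compares a numeric version, which is not reproduced here.
enum class InfoFormat { Jewel, Luminous };

// Hypothetical summary of what the OSD finds on disk for one PG.
struct OnDiskPG {
    InfoFormat info_format;
    bool has_divergent_priors;
};

// Models the described behaviour: the flag is set only for Jewel-format PGs.
bool force_rebuild_missing_for(const OnDiskPG& pg) {
    return pg.info_format == InfoFormat::Jewel;
}

// Returns false in the case where the real code would hit
// FAILED assert(force_rebuild_missing).
bool load_would_succeed(const OnDiskPG& pg) {
    const bool force_rebuild_missing = force_rebuild_missing_for(pg);
    if (pg.has_divergent_priors && !force_rebuild_missing) {
        return false;  // divergent_priors present but no rebuild was forced
    }
    return true;
}
```

Under these assumptions, a Jewel-format PG with divergent_priors loads, while a Luminous-format PG with divergent_priors on disk is the crashing case.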