Bug #15429
closed
ceph-qa-suite upgrade suite: OSD crash due to starting up post-jewel osd after firefly without stopping at hammer
Description
Run: http://pulpito.ovh.sepia.ceph.com:8081/teuthology-2016-04-07_02:10:01-upgrade:hammer-x-jewel-distro-basic-openstack/
Job: 30519
Logs: http://teuthology.ovh.sepia.ceph.com/teuthology/teuthology-2016-04-07_02:10:01-upgrade:hammer-x-jewel-distro-basic-openstack/30519/teuthology.log
2016-04-07T06:36:08.498 INFO:tasks.ceph.osd.4.target065113.stderr:os/FileStore.cc: 2761: FAILED assert(0 == "unexpected error")
2016-04-07T06:36:08.499 INFO:tasks.ceph.osd.4.target065113.stderr: ceph version 0.94.6-254-ge219e85 (e219e85be00088eecde7b1f29d7699493a79bc4d)
2016-04-07T06:36:08.499 INFO:tasks.ceph.osd.4.target065113.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbb1c6b]
2016-04-07T06:36:08.499 INFO:tasks.ceph.osd.4.target065113.stderr: 2: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0xa6c) [0x90c7ec]
2016-04-07T06:36:08.499 INFO:tasks.ceph.osd.4.target065113.stderr: 3: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x64) [0x912e14]
2016-04-07T06:36:08.499 INFO:tasks.ceph.osd.4.target065113.stderr: 4: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x180) [0x912fb0]
2016-04-07T06:36:08.500 INFO:tasks.ceph.osd.4.target065113.stderr: 5: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa56) [0xba2856]
2016-04-07T06:36:08.500 INFO:tasks.ceph.osd.4.target065113.stderr: 6: (ThreadPool::WorkThread::entry()+0x10) [0xba3900]
2016-04-07T06:36:08.500 INFO:tasks.ceph.osd.4.target065113.stderr: 7: (()+0x8182) [0x7fab83891182]
2016-04-07T06:36:08.500 INFO:tasks.ceph.osd.4.target065113.stderr: 8: (clone()+0x6d) [0x7fab81dfc47d]
2016-04-07T06:36:08.500 INFO:tasks.ceph.osd.4.target065113.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Updated by Yuri Weinstein about 8 years ago
- Related to Bug #14810: "FileStore.cc: 2855: FAILED assert(0 == "unexpected error")" in powercycle-infernalis-testing-basic-smithi added
Updated by Yuri Weinstein about 8 years ago
Sage, do we need to reduce ops = 2000 for the rados load?
Updated by Sage Weil about 8 years ago
- Status changed from New to Can't reproduce
I don't think there's enough info here. It didn't manage to get the osd logs, so we can't see what the unexpected op was. I don't think the disk filled up (ceph_test_rados doesn't write that much data).
Updated by Yuri Weinstein about 8 years ago
- Status changed from Can't reproduce to New
Reopened
Run: http://pulpito.ceph.com/yuriw-2016-04-09_15:40:40-upgrade:hammer-x-jewel-distro-basic-smithi/
Job: 118321
Logs: http://qa-proxy.ceph.com/teuthology/yuriw-2016-04-09_15:40:40-upgrade:hammer-x-jewel-distro-basic-smithi/118321/teuthology.log
2016-04-09T18:06:02.363 INFO:teuthology.orchestra.run.smithi052:Running: 'sudo adjust-ulimits ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-4 --journal-path /var/lib/ceph/osd/ceph-4/journal --log-file=/var/log/ceph/objectstore_tool.\\$pid.log --op list-pgs'
2016-04-09T18:06:02.749 INFO:teuthology.orchestra.run.smithi052.stderr:os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int, ThreadPool::TPHandle*)' thread 7f54dc0f08c0 time 2016-04-10 01:06:02.747386
2016-04-09T18:06:02.749 INFO:teuthology.orchestra.run.smithi052.stderr:os/FileStore.cc: 2761: FAILED assert(0 == "unexpected error")
2016-04-09T18:06:02.749 INFO:teuthology.orchestra.run.smithi052.stderr: ceph version 0.94.6-254-ge219e85 (e219e85be00088eecde7b1f29d7699493a79bc4d)
2016-04-09T18:06:02.750 INFO:teuthology.orchestra.run.smithi052.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbc7f2b]
2016-04-09T18:06:02.750 INFO:teuthology.orchestra.run.smithi052.stderr: 2: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0xa6c) [0x9ac19c]
2016-04-09T18:06:02.750 INFO:teuthology.orchestra.run.smithi052.stderr: 3: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x64) [0x9b27c4]
2016-04-09T18:06:02.750 INFO:teuthology.orchestra.run.smithi052.stderr: 4: (JournalingObjectStore::journal_replay(unsigned long)+0x5cb) [0x9c973b]
2016-04-09T18:06:02.751 INFO:teuthology.orchestra.run.smithi052.stderr: 5: (FileStore::mount()+0x3bb6) [0x99be66]
2016-04-09T18:06:02.751 INFO:teuthology.orchestra.run.smithi052.stderr: 6: (main()+0x1e26) [0x650e86]
2016-04-09T18:06:02.751 INFO:teuthology.orchestra.run.smithi052.stderr: 7: (__libc_start_main()+0xf5) [0x7f54d6c83ec5]
2016-04-09T18:06:02.751 INFO:teuthology.orchestra.run.smithi052.stderr: 8: ceph-objectstore-tool() [0x66e5b7]
2016-04-09T18:06:02.752 INFO:teuthology.orchestra.run.smithi052.stderr: NOTE: a copy of the executable, or `objdump -rdS <exec
Updated by Nathan Cutler almost 8 years ago
Hit this again here:
http://pulpito.ceph.com/smithfarm-2016-07-07_09:31:30-upgrade:hammer-x-wip-16598---basic-vps/
Right before the crash, there is this:
2016-07-07T10:36:52.434 INFO:tasks.ceph.osd.0.vpm137.stdout:starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2016-07-07T10:36:52.504 INFO:tasks.ceph.osd.0.vpm137.stderr:2016-07-07 17:36:53.102149 7f1263d1f800 -1 filestore(/var/lib/ceph/osd/ceph-0) WARNING: max attr value size (1024) is smaller than osd_max_object_name_len (2048). Your backend filesystem appears to not support attrs large enough to handle the configured max rados name size. You may get unexpected ENAMETOOLONG errors on rados operations or buggy behavior
2016-07-07T10:36:52.504 INFO:tasks.ceph.osd.0.vpm137.stderr:2016-07-07 17:36:53.133047 7f1263d1f800 -1 filestore(/var/lib/ceph/osd/ceph-0) FileStore::mount: stale version stamp detected: 3. Proceeding, do_update is set, performing disk format upgrade.
2016-07-07T10:36:52.559 INFO:tasks.ceph.osd.1.vpm137.stdout:starting osd.1 at :/0 osd_data /var/lib/ceph/osd/ceph-1 /var/lib/ceph/osd/ceph-1/journal
Updated by Samuel Just over 7 years ago
Somehow that most recent one bypassed hammer and upgraded straight to master.
Updated by Samuel Just over 7 years ago
That test actually doesn't stop at hammer at all. I'm confused.
Updated by Samuel Just over 7 years ago
- Subject changed from "FileStore.cc: 2761: FAILED assert(0 == "unexpected error")" in upgrade:hammer-x-jewel-distro-basic-openstack to ceph-qa-suite upgrade suite: OSD crash due to starting up post-jewel osd after firefly without stopping at hammer
Updated by Samuel Just over 7 years ago
- Status changed from New to Can't reproduce
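For anyone triaging similar runs, the two signals quoted in this report (the `FAILED assert` line and the `stale version stamp detected` warning) can be pulled out of a teuthology.log with a short script. The following is only a minimal sketch, not part of teuthology or ceph-qa-suite: the function names `split_entries` and `scan` are hypothetical, the regular expressions are inferred from the log excerpts above, and it assumes Python 3.7 or later (for splitting on a zero-width match).

```python
import re

# Start of a teuthology entry, inferred from the excerpts in this report, e.g.
# "2016-04-07T06:36:08.498 INFO:tasks.ceph.osd.4.target065113.stderr:"
# A zero-width lookahead keeps the timestamp attached to its entry.
ENTRY_START = re.compile(
    r"(?=\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3} INFO:)"
)

# "os/FileStore.cc: 2761: FAILED assert(0 == "unexpected error")"
ASSERT_RE = re.compile(r"([\w/]+\.(?:cc|h)): (\d+): FAILED assert\((.*)\)")

# "FileStore::mount: stale version stamp detected: 3."
STALE_RE = re.compile(r"stale version stamp detected: (\d+)")


def split_entries(text):
    """Split log text (even if flattened onto one line) into entries."""
    return [e.strip() for e in ENTRY_START.split(text) if e.strip()]


def scan(text):
    """Return (asserts, stamps) found in the log text.

    asserts: list of (source_file, line_number, asserted_expression)
    stamps:  list of on-disk version stamps reported as stale
    """
    asserts, stamps = [], []
    for entry in split_entries(text):
        m = ASSERT_RE.search(entry)
        if m:
            asserts.append((m.group(1), int(m.group(2)), m.group(3)))
        m = STALE_RE.search(entry)
        if m:
            stamps.append(int(m.group(1)))
    return asserts, stamps


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python scan_log.py teuthology.log
    with open(sys.argv[1], errors="replace") as f:
        found_asserts, found_stamps = scan(f.read())
    print("asserts:", found_asserts)
    print("stale version stamps:", found_stamps)
```

A stale version stamp reported shortly before the assert, as in the 2016-07-07 run, would be a hint to check which release the OSD was last running.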