Bug #15429

closed

ceph-qa-suite upgrade suite: OSD crash due to starting up post-jewel osd after firefly without stopping at hammer

Added by Yuri Weinstein about 8 years ago. Updated over 7 years ago.

Status: Can't reproduce
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: upgrade/hammer-x
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Run: http://pulpito.ovh.sepia.ceph.com:8081/teuthology-2016-04-07_02:10:01-upgrade:hammer-x-jewel-distro-basic-openstack/
Job: 30519
Logs: http://teuthology.ovh.sepia.ceph.com/teuthology/teuthology-2016-04-07_02:10:01-upgrade:hammer-x-jewel-distro-basic-openstack/30519/teuthology.log

2016-04-07T06:36:08.498 INFO:tasks.ceph.osd.4.target065113.stderr:os/FileStore.cc: 2761: FAILED assert(0 == "unexpected error")
2016-04-07T06:36:08.499 INFO:tasks.ceph.osd.4.target065113.stderr: ceph version 0.94.6-254-ge219e85 (e219e85be00088eecde7b1f29d7699493a79bc4d)
2016-04-07T06:36:08.499 INFO:tasks.ceph.osd.4.target065113.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbb1c6b]
2016-04-07T06:36:08.499 INFO:tasks.ceph.osd.4.target065113.stderr: 2: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0xa6c) [0x90c7ec]
2016-04-07T06:36:08.499 INFO:tasks.ceph.osd.4.target065113.stderr: 3: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x64) [0x912e14]
2016-04-07T06:36:08.499 INFO:tasks.ceph.osd.4.target065113.stderr: 4: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x180) [0x912fb0]
2016-04-07T06:36:08.500 INFO:tasks.ceph.osd.4.target065113.stderr: 5: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa56) [0xba2856]
2016-04-07T06:36:08.500 INFO:tasks.ceph.osd.4.target065113.stderr: 6: (ThreadPool::WorkThread::entry()+0x10) [0xba3900]
2016-04-07T06:36:08.500 INFO:tasks.ceph.osd.4.target065113.stderr: 7: (()+0x8182) [0x7fab83891182]
2016-04-07T06:36:08.500 INFO:tasks.ceph.osd.4.target065113.stderr: 8: (clone()+0x6d) [0x7fab81dfc47d]
2016-04-07T06:36:08.500 INFO:tasks.ceph.osd.4.target065113.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
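
For triage across several runs, it can help to pull the failed assertion and the reporting ceph version out of a teuthology.log automatically. The following is a minimal sketch, not part of teuthology or ceph-qa-suite: the function name `summarize_crash` and both regular expressions are assumptions inferred only from the log lines quoted above.

```python
import re

# Hypothetical patterns, inferred from the excerpt in this ticket only.
ASSERT_RE = re.compile(
    r"(?P<file>[\w/]+\.cc): (?P<line>\d+): FAILED assert\((?P<expr>.*)\)\s*$"
)
VERSION_RE = re.compile(r"ceph version (?P<version>\S+) \((?P<sha1>[0-9a-f]+)\)")


def summarize_crash(lines):
    """Return the first failed assert and the first ceph version seen in lines."""
    summary = {}
    for line in lines:
        m = ASSERT_RE.search(line)
        if m and "assert" not in summary:
            # (source file, line number, asserted expression)
            summary["assert"] = (m.group("file"), int(m.group("line")), m.group("expr"))
        m = VERSION_RE.search(line)
        if m and "version" not in summary:
            # (version string, commit sha1)
            summary["version"] = (m.group("version"), m.group("sha1"))
    return summary
```

Feeding it the lines of a downloaded teuthology.log (for example `summarize_crash(open("teuthology.log"))`) yields a small dict that can be compared between jobs, such as 30519 and 118321 in this ticket.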

Related issues 1 (0 open, 1 closed)

Related to Ceph - Bug #14810: "FileStore.cc: 2855: FAILED assert(0 == "unexpected error")" in powercycle-infernalis-testing-basic-smithi (Resolved, 02/18/2016)

Actions #1

Updated by Yuri Weinstein about 8 years ago

  • Related to Bug #14810: "FileStore.cc: 2855: FAILED assert(0 == "unexpected error")" in powercycle-infernalis-testing-basic-smithi added
Actions #2

Updated by Yuri Weinstein about 8 years ago

Sage, do we need to reduce ops = 2000 for the rados load?

Actions #3

Updated by Sage Weil about 8 years ago

  • Status changed from New to Can't reproduce

I don't think there's enough info here. The run didn't manage to collect the OSD logs, so we can't see what the unexpected op was. I don't think the disk filled up (ceph_test_rados doesn't write that much data).

Actions #4

Updated by Yuri Weinstein about 8 years ago

  • Status changed from Can't reproduce to New

Reopened
Run: http://pulpito.ceph.com/yuriw-2016-04-09_15:40:40-upgrade:hammer-x-jewel-distro-basic-smithi/
Job: 118321
Logs: http://qa-proxy.ceph.com/teuthology/yuriw-2016-04-09_15:40:40-upgrade:hammer-x-jewel-distro-basic-smithi/118321/teuthology.log

2016-04-09T18:06:02.363 INFO:teuthology.orchestra.run.smithi052:Running: 'sudo adjust-ulimits ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-4 --journal-path /var/lib/ceph/osd/ceph-4/journal --log-file=/var/log/ceph/objectstore_tool.\\$pid.log --op list-pgs'
2016-04-09T18:06:02.749 INFO:teuthology.orchestra.run.smithi052.stderr:os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int, ThreadPool::TPHandle*)' thread 7f54dc0f08c0 time 2016-04-10 01:06:02.747386
2016-04-09T18:06:02.749 INFO:teuthology.orchestra.run.smithi052.stderr:os/FileStore.cc: 2761: FAILED assert(0 == "unexpected error")
2016-04-09T18:06:02.749 INFO:teuthology.orchestra.run.smithi052.stderr: ceph version 0.94.6-254-ge219e85 (e219e85be00088eecde7b1f29d7699493a79bc4d)
2016-04-09T18:06:02.750 INFO:teuthology.orchestra.run.smithi052.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbc7f2b]
2016-04-09T18:06:02.750 INFO:teuthology.orchestra.run.smithi052.stderr: 2: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0xa6c) [0x9ac19c]
2016-04-09T18:06:02.750 INFO:teuthology.orchestra.run.smithi052.stderr: 3: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x64) [0x9b27c4]
2016-04-09T18:06:02.750 INFO:teuthology.orchestra.run.smithi052.stderr: 4: (JournalingObjectStore::journal_replay(unsigned long)+0x5cb) [0x9c973b]
2016-04-09T18:06:02.751 INFO:teuthology.orchestra.run.smithi052.stderr: 5: (FileStore::mount()+0x3bb6) [0x99be66]
2016-04-09T18:06:02.751 INFO:teuthology.orchestra.run.smithi052.stderr: 6: (main()+0x1e26) [0x650e86]
2016-04-09T18:06:02.751 INFO:teuthology.orchestra.run.smithi052.stderr: 7: (__libc_start_main()+0xf5) [0x7f54d6c83ec5]
2016-04-09T18:06:02.751 INFO:teuthology.orchestra.run.smithi052.stderr: 8: ceph-objectstore-tool() [0x66e5b7]
2016-04-09T18:06:02.752 INFO:teuthology.orchestra.run.smithi052.stderr: NOTE: a copy of the executable, or `objdump -rdS <exec
Actions #5

Updated by Nathan Cutler almost 8 years ago

Hit this again here:

http://pulpito.ceph.com/smithfarm-2016-07-07_09:31:30-upgrade:hammer-x-wip-16598---basic-vps/

Right before the crash, there is this:

2016-07-07T10:36:52.434 INFO:tasks.ceph.osd.0.vpm137.stdout:starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2016-07-07T10:36:52.504 INFO:tasks.ceph.osd.0.vpm137.stderr:2016-07-07 17:36:53.102149 7f1263d1f800 -1 filestore(/var/lib/ceph/osd/ceph-0) WARNING: max attr value size (1024) is smaller than osd_max_object_name_len (2048).  Your backend filesystem appears to not support attrs large enough to handle the configured max rados name size.  You may get unexpected ENAMETOOLONG errors on rados operations or buggy behavior
2016-07-07T10:36:52.504 INFO:tasks.ceph.osd.0.vpm137.stderr:2016-07-07 17:36:53.133047 7f1263d1f800 -1 filestore(/var/lib/ceph/osd/ceph-0) FileStore::mount: stale version stamp detected: 3. Proceeding, do_update is set, performing disk format upgrade.
2016-07-07T10:36:52.559 INFO:tasks.ceph.osd.1.vpm137.stdout:starting osd.1 at :/0 osd_data /var/lib/ceph/osd/ceph-1 /var/lib/ceph/osd/ceph-1/journal
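
To check whether other failing jobs show the same on-disk format upgrade message before the crash, the log can be scanned for it. This is a rough sketch only: the helper name `find_stale_version_stamps` and its regular expression are assumptions based on the single FileStore::mount line quoted above, and the wording may differ in other ceph versions.

```python
import re

# Hypothetical pattern, modelled on the FileStore::mount warning quoted above.
STALE_RE = re.compile(
    r"filestore\((?P<path>[^)]+)\) FileStore::mount: "
    r"stale version stamp detected: (?P<stamp>\d+)"
)


def find_stale_version_stamps(lines):
    """Return (osd_data_path, version_stamp) for each matching log line."""
    hits = []
    for line in lines:
        m = STALE_RE.search(line)
        if m:
            hits.append((m.group("path"), int(m.group("stamp"))))
    return hits
```

An empty result for a job means the message was not logged in that teuthology.log; it does not by itself show which releases the OSD passed through.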
Actions #6

Updated by Samuel Just over 7 years ago

Somehow that most recent one bypassed hammer and upgraded straight to master.

Actions #7

Updated by Samuel Just over 7 years ago

That test actually doesn't stop at hammer at all. I'm confused.

Actions #8

Updated by Samuel Just over 7 years ago

  • Subject changed from "FileStore.cc: 2761: FAILED assert(0 == "unexpected error")" in upgrade:hammer-x-jewel-distro-basic-openstack to ceph-qa-suite upgrade suite: OSD crash due to starting up post-jewel osd after firefly without stopping at hammer
Actions #9

Updated by Samuel Just over 7 years ago

  • Status changed from New to Can't reproduce