Fix #6990
Status: closed
osd crash when running mixed versions of dumpling and master
% Done: 0%
Source: Q/A
Description
Steps to reproduce:
1. Run a cluster of 2 nodes with the dumpling version of Ceph.
2. Upgrade only the OSDs on the first node to the master branch.
3. Thrash the OSDs.
This causes the OSD running the master branch to crash.
Logs are copied to ubuntu@mira052.front.sepia.ceph.com:/home/ubuntu/bug
2013-12-12 14:28:26.317723 7fc955f97700 -1 osd/ReplicatedPG.cc: In function 'virtual void ReplicatedPG::do_backfill(OpRequestRef)' thread 7fc955f97700 time 2013-12-12 14:28:26.316292
osd/ReplicatedPG.cc: 1439: FAILED assert(is_replica())

 ceph version 0.67.4-37-ga447fb7 (a447fb7d04fbad84f9ecb57726396bb6ca29d8f6)
 1: (ReplicatedPG::do_backfill(std::tr1::shared_ptr<OpRequest>)+0xbd5) [0x5d6125]
 2: (PG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f0) [0x706c80]
 3: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x330) [0x65ae10]
 4: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x4a0) [0x671510]
 5: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0x9c) [0x6acb8c]
 6: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x8b4f06]
 7: (ThreadPool::WorkThread::entry()+0x10) [0x8b6d10]
 8: (()+0x7e9a) [0x7fc96a09ce9a]
 9: (clone()+0x6d) [0x7fc9681e8ccd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Config file to reproduce the issue:

tamil@tamil-VirtualBox:~/tam_final/teuthology$ cat up_master.yaml
overrides:
  ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
    - log bound mismatch
roles:
- [mon.a, mon.b, osd.0, osd.1, osd.2, mds.a]
- [mon.c, osd.3, osd.4, osd.5, client.0]
targets:
  ubuntu@mira023.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCw8G36ubCLJBcN7Ys9+3erO+GTlJyGJirlP2p1zdkuB4gNpG0scx9lZcM+id8D9ywrA+gQK5DMKaYBuhDHzk8tvbtX9X5TsCdXHpQJtrXmvUCSPKKOK7efnhw/qRB43CYa2p4sM+X1i7QTCXBOjk8syYzM5sxumjsxswsTsVnZ75xRcOIK30W8Cog3wwVsbr4ZaJ8YlMxNObzPqOYlfYCsl+AJ8ELa7hPd+8JTP3EBYjiVvfjntkmYr8CWA+z9kXRxp6Iv9ADr4OAB9uJOkQpOAievN2qF1hCFLoI0Qxlw2px0fVpLl0SFOctVRFnefzWnuYeN+CjNHgnUAVN5HaBj
  ubuntu@mira052.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC3sW7EMc9QRG2qjunPv8uQ3rCKTYjs/P/6/aYnNUJ8CM3IkJHexkNlkGYdTD5fOyVzQBC1c+SoqPpyRYPcJvNSOOiJpoQuUE1eyVNYLdtrFaqGCN9nmQg0turDQMwDlE8nK2Fmk74xB1Bc7lvaGm9/EqZrYYMq0KSTKGlIXUD/lAHzdAbe0uItRuEi7g7FALZ9lVgUBVdW3zE+pBpIW/yqP3NKNzP6cwaDu00tUGYgnQi8tjDo+0zZEMTa4hFb8dbO4HVz+10J7qZZCPATiX0SAZvGpm9YferGLxUdGG0qeuo/SHjc2UCMg1TfFug3oRSLDlUI3BllscyCWuWXZZ2j
tasks:
- chef:
- install:
    branch: dumpling
- ceph:
    fs: xfs
- install.upgrade:
    osd.0:
      branch: master
- ceph.restart:
    daemons: [osd.0, osd.1, osd.2]
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    timeout: 1200
- ceph.restart:
    daemons: [mon.a]
    wait-for-healthy: false
    wait-for-osds-up: true
- workunit:
    clients:
      client.0:
      - rados/test.sh
- ceph.restart:
    daemons: [mon.b]
    wait-for-healthy: false
    wait-for-osds-up: true
- workunit:
    clients:
      client.0:
      - rados/test.sh
- ceph.restart:
    daemons: [mon.c]
    wait-for-healthy: false
    wait-for-osds-up: true
- ceph.wait_for_mon_quorum: [a, b, c]
- workunit:
    clients:
      client.0:
      - rados/test.sh
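As a reading aid for this job, the following minimal Python sketch derives which OSDs end up on which branch. It rests on an assumption that the report does not state: that the teuthology `install.upgrade` entry keyed by `osd.0` upgrades the whole host carrying that role. It also assumes that the `ceph.restart` of osd.0, osd.1 and osd.2 makes those OSDs run the new packages. The role lists are copied from `roles`. `branch_per_daemon` and `osds_on` are hypothetical helper names, and the YAML is inlined as Python data so that no third-party parser is needed.

```python
# Hypothetical helper, not part of teuthology.
# ASSUMPTION: install.upgrade keyed by a role (here osd.0) upgrades the
# whole host that carries that role.

# Copied from the "roles" section of up_master.yaml: one list per host.
ROLES = [
    ["mon.a", "mon.b", "osd.0", "osd.1", "osd.2", "mds.a"],
    ["mon.c", "osd.3", "osd.4", "osd.5", "client.0"],
]

BASE_BRANCH = "dumpling"          # from the "install" task
UPGRADES = {"osd.0": "master"}    # from the "install.upgrade" task


def branch_per_daemon(roles, base, upgrades):
    """Map every role to the branch installed on its host after the upgrade."""
    result = {}
    for host_roles in roles:
        branch = base
        for key, new_branch in upgrades.items():
            if key in host_roles:
                # Under the assumption above, the whole host is upgraded.
                branch = new_branch
        for role in host_roles:
            result[role] = branch
    return result


def osds_on(branch, mapping):
    """Return the OSD roles whose host has the given branch installed."""
    return sorted(r for r, b in mapping.items()
                  if r.startswith("osd.") and b == branch)


if __name__ == "__main__":
    layout = branch_per_daemon(ROLES, BASE_BRANCH, UPGRADES)
    print("master OSDs:  ", osds_on("master", layout))    # ['osd.0', 'osd.1', 'osd.2']
    print("dumpling OSDs:", osds_on("dumpling", layout))  # ['osd.3', 'osd.4', 'osd.5']
```

Under that assumption, osd.0 to osd.2 run master and osd.3 to osd.5 stay on dumpling while `thrashosds` runs. This yields the mixed-version cluster the issue describes.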
Updated by Greg Farnum over 10 years ago
Tamil, I think you're backwards about which OSDs crash? That backtrace says .67.4, and the master branch OSDs don't contain that assert anywhere any more. :)
Updated by Greg Farnum over 10 years ago
- Assignee set to David Zafman
And I'm pretty sure this is fallout from some of David's no-acting-set-for-backfillers changes.
Updated by Tamilarasi muthamizhan over 10 years ago
Oh yes, Greg, you are right. This happened on the dumpling OSDs and not on the master ones.
Updated by David Zafman over 10 years ago
- Target version changed from v0.76a to v0.75