Fix #6990

closed

osd crash when running mixed versions of dumpling and master

Added by Tamilarasi muthamizhan over 10 years ago. Updated over 10 years ago.

Status: Resolved
Priority: Urgent
Assignee: David Zafman
Category: -
Target version:
% Done: 0%
Source: Q/A
Tags:
Backport:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Steps to reproduce:

1. Run a cluster of 2 nodes with the dumpling version of Ceph.
2. Upgrade only the OSDs on the first node to the master branch.
3. Thrash the OSDs.

This causes the OSD running the master branch to crash.

Logs are copied to :/home/ubuntu/bug

2013-12-12 14:28:26.317723 7fc955f97700 -1 osd/ReplicatedPG.cc: In function 'virtual void ReplicatedPG::do_backfill(OpRequestRef)' thread 7fc955f97700 time 2013-12-12 14:28:26.316292
osd/ReplicatedPG.cc: 1439: FAILED assert(is_replica())

 ceph version 0.67.4-37-ga447fb7 (a447fb7d04fbad84f9ecb57726396bb6ca29d8f6)
 1: (ReplicatedPG::do_backfill(std::tr1::shared_ptr<OpRequest>)+0xbd5) [0x5d6125]
 2: (PG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f0) [0x706c80]
 3: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x330) [0x65ae10]
 4: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x4a0) [0x671510]
 5: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0x9c) [0x6acb8c]
 6: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x8b4f06]
 7: (ThreadPool::WorkThread::entry()+0x10) [0x8b6d10]
 8: (()+0x7e9a) [0x7fc96a09ce9a]
 9: (clone()+0x6d) [0x7fc9681e8ccd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Teuthology config file to reproduce the issue:

tamil@tamil-VirtualBox:~/tam_final/teuthology$ cat up_master.yaml 
overrides:
  ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
    - log bound mismatch

roles:
- [mon.a, mon.b, osd.0, osd.1, osd.2, mds.a]
- [mon.c, osd.3, osd.4, osd.5, client.0]

targets:
  ubuntu@mira023.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCw8G36ubCLJBcN7Ys9+3erO+GTlJyGJirlP2p1zdkuB4gNpG0scx9lZcM+id8D9ywrA+gQK5DMKaYBuhDHzk8tvbtX9X5TsCdXHpQJtrXmvUCSPKKOK7efnhw/qRB43CYa2p4sM+X1i7QTCXBOjk8syYzM5sxumjsxswsTsVnZ75xRcOIK30W8Cog3wwVsbr4ZaJ8YlMxNObzPqOYlfYCsl+AJ8ELa7hPd+8JTP3EBYjiVvfjntkmYr8CWA+z9kXRxp6Iv9ADr4OAB9uJOkQpOAievN2qF1hCFLoI0Qxlw2px0fVpLl0SFOctVRFnefzWnuYeN+CjNHgnUAVN5HaBj
  ubuntu@mira052.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC3sW7EMc9QRG2qjunPv8uQ3rCKTYjs/P/6/aYnNUJ8CM3IkJHexkNlkGYdTD5fOyVzQBC1c+SoqPpyRYPcJvNSOOiJpoQuUE1eyVNYLdtrFaqGCN9nmQg0turDQMwDlE8nK2Fmk74xB1Bc7lvaGm9/EqZrYYMq0KSTKGlIXUD/lAHzdAbe0uItRuEi7g7FALZ9lVgUBVdW3zE+pBpIW/yqP3NKNzP6cwaDu00tUGYgnQi8tjDo+0zZEMTa4hFb8dbO4HVz+10J7qZZCPATiX0SAZvGpm9YferGLxUdGG0qeuo/SHjc2UCMg1TfFug3oRSLDlUI3BllscyCWuWXZZ2j

tasks:
- chef:
- install:
    branch: dumpling
- ceph:
    fs: xfs
- install.upgrade:
    osd.0:
      branch: master
- ceph.restart:
    daemons: [osd.0, osd.1, osd.2]
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    timeout: 1200
- ceph.restart:
    daemons: [mon.a]
    wait-for-healthy: false
    wait-for-osds-up: true
- workunit:
    clients:
      client.0:
      - rados/test.sh
- ceph.restart:
    daemons: [mon.b]
    wait-for-healthy: false
    wait-for-osds-up: true
- workunit:
    clients:
      client.0:
      - rados/test.sh
- ceph.restart:
    daemons: [mon.c]
    wait-for-healthy: false
    wait-for-osds-up: true
- ceph.wait_for_mon_quorum: [a, b, c]
- workunit:
    clients:
      client.0:
      - rados/test.sh
