Bug #4396

closed

osd crashed in ReplicatedPG::do_op in the nightlies

Added by Tamilarasi muthamizhan about 11 years ago. Updated about 11 years ago.

Status: Duplicate
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

logs: ubuntu@teuthology:/a/teuthology-2013-03-08_01:00:06-regression-master-testing-gcov/18382

 ceph version 0.58-373-g38e55dc (38e55dc5749274655e62ce80d4b067b4addebc89)
 1: ceph-osd() [0x77661a]
 2: (()+0xfcb0) [0x7f751f695cb0]
 3: (gsignal()+0x35) [0x7f751d763425]
 4: (abort()+0x17b) [0x7f751d766b8b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f751e0b569d]
 6: (()+0xb5846) [0x7f751e0b3846]
 7: (()+0xb5873) [0x7f751e0b3873]
 8: (()+0xb596e) [0x7f751e0b396e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1df) [0x821c2f]
 10: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x3bc9) [0x5c57c9]
 11: (PG::do_request(std::tr1::shared_ptr<OpRequest>)+0x610) [0x69af30]
 12: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>)+0x323) [0x5fe2f3]
 13: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>)+0x49b) [0x6132fb]
 14: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x31) [0x64dba1]
 15: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0x9c) [0x64ddcc]
 16: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x817056]
 17: (ThreadPool::WorkThread::entry()+0x10) [0x818e80]
 18: (()+0x7e9a) [0x7f751f68de9a]
 19: (clone()+0x6d) [0x7f751d820cbd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
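
Each frame in the trace above has the shape `N: (symbol+0xoffset) [0xaddress]`. As a minimal sketch for pulling the frames out of such a dump before feeding the addresses to a symbolizer, assuming only that layout; `parse_frames` is a hypothetical helper name, not part of Ceph or teuthology:

```python
import re

# Matches lines such as " 10: (ReplicatedPG::do_op(...)+0x3bc9) [0x5c57c9]".
# The text before the bracketed address is kept verbatim, so frames with an
# empty symbol, e.g. "(()+0xfcb0)", are still captured.
FRAME_RE = re.compile(r"^\s*(\d+):\s+(.*)\s+\[(0x[0-9a-f]+)\]\s*$")


def parse_frames(trace):
    """Return a list of (frame index, frame text, address) tuples."""
    frames = []
    for line in trace.splitlines():
        match = FRAME_RE.match(line)
        if match is None:
            # Skip non-frame lines such as the version banner or the NOTE.
            continue
        index, body, addr = match.groups()
        frames.append((int(index), body, int(addr, 16)))
    return frames
```

Calling `parse_frames` on the dump above would return one tuple per numbered frame and ignore the `ceph version` and `NOTE:` lines.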

ubuntu@teuthology:/a/teuthology-2013-03-08_01:00:06-regression-master-testing-gcov/18382$ cat config.yaml 
kernel: &id001
  kdb: true
  sha1: 34911873b4ed638fd1c65d8681233af856511006
nuke-on-error: true
overrides:
  ceph:
    conf:
      global:
        ms inject socket failures: 500
      osd:
        osd op thread timeout: 60
    coverage: true
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: 38e55dc5749274655e62ce80d4b067b4addebc89
  s3tests:
    branch: master
  workunit:
    sha1: 38e55dc5749274655e62ce80d4b067b4addebc89
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
targets:
  ubuntu@plana15.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCrdOMpeQVLQ+RrMyCLqxOoU/uNzrq/WmYYHhE9yAJSAeD652SCOzMCaChBawwFypiHB/1Zv++PGx2mIceuh8BpAjs0iWoWwj39TDMsB8GYm2A5qFK9BfG080rc8LtmNX//IX3IdbwzxKIM3odcrg1sdQ4p6zLMQYiuwUb5+8clItH7Vl8SzgT6Y+NNyXuwQRZ2JqCcnuV22fSpcfEYVh3HtjXw/G6k/NmdPnP3lab5kQzYsio9A2WmlGmtHHntRMZ+syMCPZI6Rn7rySElxLoet9WqK0qxcusHmPZf1N4gBre0fYnSK7ix6N7TRXlI86TA5Z/VHmkqDaSyuO4YUYoL
  ubuntu@plana31.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC5J4n7rTsH+IMjGAu+EfhukuK5+zScoSaPIfXDOUU8LfvuI/3x8Luiyv9eRVwZgwuLBWZ/zorBbGZ+G2Iaxy3632AG/XE7cRZA9AxzZT+Qvm9D+BW+Uletgf92cttKMk7qwK3DetQwRKKl6AMv0SDpUff+nzqnJH6LMS8zoBPVXDHFM3Lup8h9H6DYEs1F/Zn8LVSw8hNiD279rg1n1hqWdItmnKBPKyC/qkRoPa6h7gDU6FPaBiNhuhBd0016XGrVwL7Y8gqoDBiArP+NDt1lcnbeiK43bFhqW+pYovOdIA2MJC6z+bkZDlOJdxoz9mDP0cJZBdB43v3UdbS1R+WT
  ubuntu@plana52.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC9kswBp2g5ZV1Qrvlee8MvUOCNdubQFqUBr5WSsmFBODqEuiitWbhuBu2Ucz0lBMf41DpMKLeYDN0lIC94GZmGaiCN+Ak9Ia05d/uRvesT2nDgHB3Z9J/zEFlY8RVxL3xhD+hq4u8dbASlqqoMDiBP+7efZMxt4Ndnzr/yOxge3KenxyQImBUS+OV+BqnfCOHf6BqM33U1leXz2kng7ocxoE91DAMslKD/2DPRSYEhfucUJZk6IYevr/g0JVhbfvjSlZzwUEfTyVmPeqNyls/U+azhKlvQbqpb+ttc02RNydQ1YgOgHFCaqd9Vm8XjUU6vYGlkFHZ+BMJuEwA9AH/D
tasks:
- internal.lock_machines:
  - 3
  - plana
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock: null
- install: null
- ceph: null
- ceph-fuse: null
- workunit:
    clients:
      all:
      - rbd/map-unmap.sh


Related issues: 1 (0 open, 1 closed)

Is duplicate of Ceph - Bug #4371: osd/ReplicatedPG.cc: 4814: FAILED assert(peer_missing.count(fromosd)) (Resolved, Samuel Just, 03/07/2013)

Actions #1

Updated by Samuel Just about 11 years ago

-3> 2013-03-08 08:54:13.000357 7f750e4f5700 5 --OSD::tracker-- reqid: client.4396.1:74, seq: 22763, time: 2013-03-08 08:54:13.000357, event: started, request: osd_op(client.4396.1:74 image-16181.rbd [watch add cookie 1 ver 17179869184] 2.5581526c e4) v4
-2> 2013-03-08 08:54:13.000380 7f750e4f5700 5 --OSD::tracker-- reqid: client.4396.1:74, seq: 22763, time: 2013-03-08 08:54:13.000380, event: started, request: osd_op(client.4396.1:74 image-16181.rbd [watch add cookie 1 ver 17179869184] 2.5581526c e4) v4
-1> 2013-03-08 08:54:13.000439 7f750e4f5700 -1 osd.1 pg_epoch: 4 pg[2.c( v 4'521 (0'0,4'521] local-les=4 n=2 ec=1 les/c 4/4 3/3/3) [1,0] r=0 lpr=3 luod=4'520 lcod 4'520 mlcod 4'519 active+clean] bad op order, already applied 90 > this 74
0> 2013-03-08 08:54:13.002825 7f750e4f5700 -1 osd/ReplicatedPG.cc: In function 'virtual void ReplicatedPG::do_op(OpRequestRef)' thread 7f750e4f5700 time 2013-03-08 08:54:13.000463
osd/ReplicatedPG.cc: 1017: FAILED assert(0 == "out of order op")

ceph version 0.58-373-g38e55dc (38e55dc5749274655e62ce80d4b067b4addebc89)
1: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x3bc9) [0x5c57c9]
2: (PG::do_request(std::tr1::shared_ptr<OpRequest>)+0x610) [0x69af30]
3: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>)+0x323) [0x5fe2f3]
4: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>)+0x49b) [0x6132fb]
5: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x31) [0x64dba1]
6: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0x9c) [0x64ddcc]
7: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x817056]
8: (ThreadPool::WorkThread::entry()+0x10) [0x818e80]
9: (()+0x7e9a) [0x7f751f68de9a]
10: (clone()+0x6d) [0x7f751d820cbd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
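
The `bad op order, already applied 90 > this 74` line suggests that op 74 from this client reached the PG after op 90 had already been applied. The sketch below illustrates that kind of ordering invariant. It is not the ReplicatedPG code: it assumes that op ids increase monotonically per client, that the check is tracked per object and client, and that the `74` in `client.4396.1:74` is the op id. `OpOrderChecker` is a hypothetical name.

```python
class OpOrderChecker:
    """Track the last applied op id per (object, client) pair.

    Assumption for illustration only: an op is out of order when its id is
    lower than one already applied for the same object and client.
    """

    def __init__(self):
        # (object name, client id) -> last applied op id
        self.last_applied = {}

    def check(self, obj, client, tid):
        key = (obj, client)
        last = self.last_applied.get(key)
        if last is not None and last > tid:
            # Mirrors the wording of the log message in this report.
            raise AssertionError(
                f"bad op order, already applied {last} > this {tid}"
            )
        self.last_applied[key] = tid
```

With this sketch, applying id 90 and then id 74 for the same object and client raises, while the same ids on a different object do not.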

--- logging levels ---

Actions #2

Updated by Tamilarasi muthamizhan about 11 years ago

  • Assignee changed from Samuel Just to Sage Weil
  • Priority changed from Normal to Urgent
Actions #3

Updated by Ian Colle about 11 years ago

  • Status changed from New to Duplicate