Project

General

Profile

Actions

Bug #4271

closed

osdc/ObjectCacher.cc: 834: FAILED assert(ob->last_commit_tid < tid)

Added by Sage Weil about 11 years ago. Updated about 11 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Development
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

on latest next, including the most recent osd ooo reply fixes:

2013-02-26T05:22:29.636 INFO:teuthology.task.qemu.client.0.err:osdc/ObjectCacher.cc: In function 'void ObjectCacher::bh_write_commit(int64_t, sobject_t, loff_t, uint64_t, tid_t, int)' thread 7ff1a77fe700 time 2013-02-26 05:22:22.133385
2013-02-26T05:22:29.636 INFO:teuthology.task.qemu.client.0.err:osdc/ObjectCacher.cc: 834: FAILED assert(ob->last_commit_tid < tid)
2013-02-26T05:22:29.636 INFO:teuthology.task.qemu.client.0.err: ceph version 0.57-411-g9096d70 (9096d70642880946b0b477e33f7debabbefec9fa)
2013-02-26T05:22:29.637 INFO:teuthology.task.qemu.client.0.err: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x95) [0x7ff1c320f90d]
2013-02-26T05:22:29.637 INFO:teuthology.task.qemu.client.0.err: 2: (ObjectCacher::bh_write_commit(long, sobject_t, long, unsigned long, unsigned long, int)+0xda7) [0x7ff1c4070fa3]
2013-02-26T05:22:29.637 INFO:teuthology.task.qemu.client.0.err: 3: (ObjectCacher::C_WriteCommit::finish(int)+0x6e) [0x7ff1c407dcfa]
2013-02-26T05:22:29.637 INFO:teuthology.task.qemu.client.0.err: 4: (Context::complete(int)+0x2b) [0x7ff1c402fe3f]
2013-02-26T05:22:29.637 INFO:teuthology.task.qemu.client.0.err: 5: (librbd::C_Request::finish(int)+0x142) [0x7ff1c40687a0]
2013-02-26T05:22:29.637 INFO:teuthology.task.qemu.client.0.err: 6: (Context::complete(int)+0x2b) [0x7ff1c402fe3f]
2013-02-26T05:22:29.637 INFO:teuthology.task.qemu.client.0.err: 7: (librbd::AioRequest::complete(int)+0x6f) [0x7ff1c402fecf]
2013-02-26T05:22:29.638 INFO:teuthology.task.qemu.client.0.err: 8: (librbd::rados_req_cb(void*, void*)+0x34) [0x7ff1c405e59e]
2013-02-26T05:22:29.638 INFO:teuthology.task.qemu.client.0.err: 9: (librados::C_AioSafe::finish(int)+0x4d) [0x7ff1c31996e5]
2013-02-26T05:22:29.638 INFO:teuthology.task.qemu.client.0.err: 10: (Finisher::finisher_thread_entry()+0x342) [0x7ff1c320ea8a]
2013-02-26T05:22:29.638 INFO:teuthology.task.qemu.client.0.err: 11: (Finisher::FinisherThread::entry()+0x1c) [0x7ff1c3182690]
2013-02-26T05:22:29.638 INFO:teuthology.task.qemu.client.0.err: 12: (Thread::_entry_func(void*)+0x23) [0x7ff1c33c827d]
2013-02-26T05:22:29.638 INFO:teuthology.task.qemu.client.0.err: 13: (()+0x7e9a) [0x7ff1c0453e9a]
2013-02-26T05:22:29.638 INFO:teuthology.task.qemu.client.0.err: 14: (clone()+0x6d) [0x7ff1c01814bd]
2013-02-26T05:22:29.639 INFO:teuthology.task.qemu.client.0.err: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2013-02-26T05:22:29.639 INFO:teuthology.task.qemu.client.0.err:terminate called after throwing an instance of 'ceph::FailedAssertion'
2013-02-26T05:22:37.477 INFO:teuthology.task.qemu.client.0.err:daemon-helper: command crashed with signal 6

job was
kernel:
  kdb: true
  sha1: 49659416ba4fa8308bd29e453f54c3bcf8a0fbf1
nuke-on-error: true
overrides:
  ceph:
    conf:
      global:
        ms inject socket failures: 5000
      osd:
        osd op thread timeout: 60
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: 9096d70642880946b0b477e33f7debabbefec9fa
  s3tests:
    branch: next
  workunit:
    sha1: 9096d70642880946b0b477e33f7debabbefec9fa
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
tasks:
- chef: null
- clock: null
- ceph:
    conf:
      client:
        rbd cache: true
- qemu:
    all:
      test: https://raw.github.com/ceph/ceph/master/qa/workunits/suites/tiobench.sh


Related issues 2 (0 open2 closed)

Related to Ceph - Bug #4369: osd: msgr connection not cleanly shut downResolvedSage Weil03/07/2013

Actions
Is duplicate of Ceph - Bug #3675: osd: hang during intial peeringResolvedSage Weil12/22/2012

Actions
Actions #1

Updated by Sage Weil about 11 years ago

  • Project changed from rbd to Ceph
Actions #2

Updated by Ian Colle about 11 years ago

  • Assignee set to Sage Weil
Actions #3

Updated by Sage Weil about 11 years ago

hit this again, ubuntu@teuthology:/a/sage-2013-02-27_09:23:30-rbd-next-testing-basic$ less 13513/teuthology.log

Actions #4

Updated by Sage Weil about 11 years ago

  • Status changed from New to In Progress

ubuntu@plana25:~/cephtest/archive/log$ grep rb.0.1010.7a6d8d3c.000000000003 osd.3.log | grep -- -- | grep -v sub_op | egrep '(28183 |28186 )'

this is the osd reordering the replies. it's very reproducible, at least!

Actions #5

Updated by Sage Weil about 11 years ago

2013-02-27 16:48:25.907737 7f624039e700 10 osd.3 pg_epoch: 9 pg[2.3( v 9'1918 (9'917,9'1918] local-les=4 n=5 ec=1 les/c 4/4 3/3/2) [3,0] r=0 lpr=3 mlcod 9'1917 active+clean] append_log log(9'917,9'1918] [9'1919 (9'1918) modify 167ed8a3/rb.0.1010.7a6d8d3c.000000000003/head//2 by client.4120.0:28186 2013-02-27 16:48:19.949601]

is where the first out of order op is serialized.

Actions #6

Updated by Sage Weil about 11 years ago

  • Status changed from In Progress to Resolved

this was a messenger bug, fixed by 0f42eddef5da6c1babe9ed51ceaa3212a42c2ec4

technically a dup of #3675

Actions #7

Updated by Tamilarasi muthamizhan about 11 years ago

  • Status changed from Resolved to In Progress

ubuntu@teuthology:/a/teuthology-2013-03-11_01:00:04-regression-master-testing-gcov/21314

2013-03-11T04:05:47.640 DEBUG:teuthology.orchestra.run:Running [10.214.133.35]: 'sudo fusermount -u /home/ubuntu/cephtest/mnt.0'
2013-03-11T04:05:47.747 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err:ceph-fuse[7234]: fuse finished with error 0
2013-03-11T04:06:23.844 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err:osdc/ObjectCacher.cc: In function 'void ObjectCacher::bh_write_commit(int64_t, sobject_t, loff_t, uint64_t, tid_t, int)' thread 7fccff
df6700 time 2013-03-11 04:06:02.323622
2013-03-11T04:06:23.844 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err:osdc/ObjectCacher.cc: 834: FAILED assert(ob->last_commit_tid < tid)
2013-03-11T04:06:23.852 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: ceph version 0.58-401-g494f968 (494f968aafeaa7327fef84b14ace752958e2bf6c)
2013-03-11T04:06:23.852 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 1: (ObjectCacher::bh_write_commit(long, sobject_t, long, unsigned long, unsigned long, int)+0xd68) [0x5f1238]
2013-03-11T04:06:23.852 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 2: (ObjectCacher::C_WriteCommit::finish(int)+0x6b) [0x5f764b]
2013-03-11T04:06:23.852 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 3: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xe55) [0x602f45]
2013-03-11T04:06:23.852 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 4: (Client::ms_dispatch(Message*)+0x8b) [0x4c7aab]
2013-03-11T04:06:23.853 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 5: (DispatchQueue::entry()+0x341) [0x63a901]
2013-03-11T04:06:23.853 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 6: (DispatchQueue::DispatchThread::entry()+0xd) [0x57c40d]
2013-03-11T04:06:23.853 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 7: (()+0x7e9a) [0x7fcd04b53e9a]
2013-03-11T04:06:23.853 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 8: (clone()+0x6d) [0x7fcd0330c4bd]
2013-03-11T04:06:23.853 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

ubuntu@teuthology:/a/teuthology-2013-03-11_01:00:04-regression-master-testing-gcov/21314$ cat config.yaml 
kernel: &id001
  kdb: true
  sha1: 34911873b4ed638fd1c65d8681233af856511006
nuke-on-error: true
overrides:
  ceph:
    conf:
      mds:
        mds inject traceless reply probability: 0.5
    coverage: true
    log-whitelist:
    - slow request
    sha1: 494f968aafeaa7327fef84b14ace752958e2bf6c
  s3tests:
    branch: master
  workunit:
    sha1: 494f968aafeaa7327fef84b14ace752958e2bf6c
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
targets:
  ubuntu@plana74.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCbl3n0OItF2T7uE/9SNA9YV+pq3nFmmr3EBjf2aMo7Bkyy6e9MC3kaK64vzlZxppbpxEhbrgWo7GsVHxphZqwtMt5Ek3RDbLUpQ8dPckbhOZWBhoZe9S8uCynDIZaF3Yq/b1//OBufSdM0TRYK4Vf4rtQnpyWPEn48HaairNhQvKCxY6XUImZc7Lh4CIaMkT0yyZkBlksJeAkTs4UgxhkSoSc9IAlAxwwPK5JpRL7GOowT+hsqYCVJM5+NVqIOGS66Vam/m9xHqecQtfYIEFoVFZiF/xi+b2kstxJzD39UTNP/XB5J2CtJOujW9J1jsdPKSz1L9XoDIkAFHzQjdhyx
  ubuntu@plana80.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCjbUncT43IZIcpSlFXnWDaagYNdnaLTfgsN8TKi3J2QYOB1BlMpaSRBByXr5MW2ZGuWrB77USjJLdRW/feuwtAjDbwc7y5woxcxn1u9eqPh4RyQWqyCmKWvi3GpdM4/4vh+L+7X5rNzJPStTjiPXCxHK+stPGCk9pg+J+KKg/GlJ1Lwx1DeWetdForGqEVJrTSSWj8RM+Nyw/V+c+t2d6gW1SzjY2NvaDdVHfduM+8X9F9aTf3KLitnIvNQWXzfqoEDNPSwrrO2uvTDyqpFkW6HLpZDnWm+io8Y9P7XMrEhWCTlQX2SHeQv/BhaA1btYAw7qfLC9n8RbeXLlQnc5yn
  ubuntu@plana81.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCzwyoEUguCYhu0TNdXrk7aWVUvePF6F6coBynLZ73Y/7eqTKAzxnNCiwUQx3kGjK33kliZDk7g/x4FdsiwzknDuGGCXP1pZyIGVtU5wNJ6ZM29XyH2SZvyU0MNfmoMzygxHR53TGcsK3hzwSbaW1woEpJmoqgIFQJ6BJr3nc2foKl79wBQdCNumqJ7sbh26xYVI29vYsJHUTYAdmyE4QrLaLOkZKU5Q/OvUnbKQbURcs7ArxFooObu9h0ENRPK4MKuxBFfgpAYTr/rMeWfVQxvSWsuOMpOdzLaNLt5UBYVVU+wxIFdDwcHb+2Er0rEV49W9xUD6JGXnaFjrxDDocfj
tasks:
- internal.lock_machines:
  - 3
  - plana
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock: null
- install: null
- ceph: null
- ceph-fuse: null
- workunit:
    clients:
      all:
      - suites/ffsb.sh
ubuntu@teuthology:/a/teuthology-2013-03-11_01:00:04-regression-master-testing-gcov/21314$ cat summary.yaml 
client.0-kernel-sha1: 34911873b4ed638fd1c65d8681233af856511006
description: collection:fs-traceless clusters:fixed-3.yaml tasks:cfuse_workunit_suites_ffsb.yaml
  traceless:50pc.yaml
duration: 1182.9954659938812
failure_reason: 'Command failed on 10.214.133.35 with status 1: ''sudo /home/ubuntu/cephtest/enable-coredump
  ceph-coverage /home/ubuntu/cephtest/archive/coverage /home/ubuntu/cephtest/daemon-helper
  kill ceph-fuse -f --name client.0 /home/ubuntu/cephtest/mnt.0'''
flavor: gcov
mon.a-kernel-sha1: 34911873b4ed638fd1c65d8681233af856511006
mon.b-kernel-sha1: 34911873b4ed638fd1c65d8681233af856511006
owner: scheduled_teuthology@teuthology
success: false

Actions #8

Updated by Sage Weil about 11 years ago

  • Status changed from In Progress to Resolved
Actions

Also available in: Atom PDF