Bug #4271
closedosdc/ObjectCacher.cc: 834: FAILED assert(ob->last_commit_tid < tid)
0%
Description
on latest next, including the most recent osd ooo reply fixes:
2013-02-26T05:22:29.636 INFO:teuthology.task.qemu.client.0.err:osdc/ObjectCacher.cc: In function 'void ObjectCacher::bh_write_commit(int64_t, sobject_t, loff_t, uint64_t, tid_t, int)' thread 7ff1a77fe700 time 2013-02-26 05:22:22.133385 2013-02-26T05:22:29.636 INFO:teuthology.task.qemu.client.0.err:osdc/ObjectCacher.cc: 834: FAILED assert(ob->last_commit_tid < tid) 2013-02-26T05:22:29.636 INFO:teuthology.task.qemu.client.0.err: ceph version 0.57-411-g9096d70 (9096d70642880946b0b477e33f7debabbefec9fa) 2013-02-26T05:22:29.637 INFO:teuthology.task.qemu.client.0.err: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x95) [0x7ff1c320f90d] 2013-02-26T05:22:29.637 INFO:teuthology.task.qemu.client.0.err: 2: (ObjectCacher::bh_write_commit(long, sobject_t, long, unsigned long, unsigned long, int)+0xda7) [0x7ff1c4070fa3] 2013-02-26T05:22:29.637 INFO:teuthology.task.qemu.client.0.err: 3: (ObjectCacher::C_WriteCommit::finish(int)+0x6e) [0x7ff1c407dcfa] 2013-02-26T05:22:29.637 INFO:teuthology.task.qemu.client.0.err: 4: (Context::complete(int)+0x2b) [0x7ff1c402fe3f] 2013-02-26T05:22:29.637 INFO:teuthology.task.qemu.client.0.err: 5: (librbd::C_Request::finish(int)+0x142) [0x7ff1c40687a0] 2013-02-26T05:22:29.637 INFO:teuthology.task.qemu.client.0.err: 6: (Context::complete(int)+0x2b) [0x7ff1c402fe3f] 2013-02-26T05:22:29.637 INFO:teuthology.task.qemu.client.0.err: 7: (librbd::AioRequest::complete(int)+0x6f) [0x7ff1c402fecf] 2013-02-26T05:22:29.638 INFO:teuthology.task.qemu.client.0.err: 8: (librbd::rados_req_cb(void*, void*)+0x34) [0x7ff1c405e59e] 2013-02-26T05:22:29.638 INFO:teuthology.task.qemu.client.0.err: 9: (librados::C_AioSafe::finish(int)+0x4d) [0x7ff1c31996e5] 2013-02-26T05:22:29.638 INFO:teuthology.task.qemu.client.0.err: 10: (Finisher::finisher_thread_entry()+0x342) [0x7ff1c320ea8a] 2013-02-26T05:22:29.638 INFO:teuthology.task.qemu.client.0.err: 11: (Finisher::FinisherThread::entry()+0x1c) [0x7ff1c3182690] 2013-02-26T05:22:29.638 INFO:teuthology.task.qemu.client.0.err: 12: (Thread::_entry_func(void*)+0x23) [0x7ff1c33c827d] 2013-02-26T05:22:29.638 INFO:teuthology.task.qemu.client.0.err: 13: (()+0x7e9a) [0x7ff1c0453e9a] 2013-02-26T05:22:29.638 INFO:teuthology.task.qemu.client.0.err: 14: (clone()+0x6d) [0x7ff1c01814bd] 2013-02-26T05:22:29.639 INFO:teuthology.task.qemu.client.0.err: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. 2013-02-26T05:22:29.639 INFO:teuthology.task.qemu.client.0.err:terminate called after throwing an instance of 'ceph::FailedAssertion' 2013-02-26T05:22:37.477 INFO:teuthology.task.qemu.client.0.err:daemon-helper: command crashed with signal 6
job was
kernel: kdb: true sha1: 49659416ba4fa8308bd29e453f54c3bcf8a0fbf1 nuke-on-error: true overrides: ceph: conf: global: ms inject socket failures: 5000 osd: osd op thread timeout: 60 fs: btrfs log-whitelist: - slow request sha1: 9096d70642880946b0b477e33f7debabbefec9fa s3tests: branch: next workunit: sha1: 9096d70642880946b0b477e33f7debabbefec9fa roles: - - mon.a - mon.c - osd.0 - osd.1 - osd.2 - - mon.b - mds.a - osd.3 - osd.4 - osd.5 - - client.0 tasks: - chef: null - clock: null - ceph: conf: client: rbd cache: true - qemu: all: test: https://raw.github.com/ceph/ceph/master/qa/workunits/suites/tiobench.sh
Updated by Sage Weil about 11 years ago
hit this again, ubuntu@teuthology:/a/sage-2013-02-27_09:23:30-rbd-next-testing-basic$ less 13513/teuthology.log
Updated by Sage Weil about 11 years ago
- Status changed from New to In Progress
ubuntu@plana25:~/cephtest/archive/log$ grep rb.0.1010.7a6d8d3c.000000000003 osd.3.log | grep -- -- | grep -v sub_op | egrep '(28183 |28186 )'
this is the osd reordering the replies. it's very reproducible, at least!
Updated by Sage Weil about 11 years ago
2013-02-27 16:48:25.907737 7f624039e700 10 osd.3 pg_epoch: 9 pg[2.3( v 9'1918 (9'917,9'1918] local-les=4 n=5 ec=1 les/c 4/4 3/3/2) [3,0] r=0 lpr=3 mlcod 9'1917 active+clean] append_log log(9'917,9'1918] [9'1919 (9'1918) modify 167ed8a3/rb.0.1010.7a6d8d3c.000000000003/head//2 by client.4120.0:28186 2013-02-27 16:48:19.949601]
is where the first out of order op is serialized.
Updated by Sage Weil about 11 years ago
- Status changed from In Progress to Resolved
this was a messenger bug, fixed by 0f42eddef5da6c1babe9ed51ceaa3212a42c2ec4
technically a dup of #3675
Updated by Tamilarasi muthamizhan about 11 years ago
- Status changed from Resolved to In Progress
ubuntu@teuthology:/a/teuthology-2013-03-11_01:00:04-regression-master-testing-gcov/21314
2013-03-11T04:05:47.640 DEBUG:teuthology.orchestra.run:Running [10.214.133.35]: 'sudo fusermount -u /home/ubuntu/cephtest/mnt.0' 2013-03-11T04:05:47.747 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err:ceph-fuse[7234]: fuse finished with error 0 2013-03-11T04:06:23.844 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err:osdc/ObjectCacher.cc: In function 'void ObjectCacher::bh_write_commit(int64_t, sobject_t, loff_t, uint64_t, tid_t, int)' thread 7fccff df6700 time 2013-03-11 04:06:02.323622 2013-03-11T04:06:23.844 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err:osdc/ObjectCacher.cc: 834: FAILED assert(ob->last_commit_tid < tid) 2013-03-11T04:06:23.852 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: ceph version 0.58-401-g494f968 (494f968aafeaa7327fef84b14ace752958e2bf6c) 2013-03-11T04:06:23.852 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 1: (ObjectCacher::bh_write_commit(long, sobject_t, long, unsigned long, unsigned long, int)+0xd68) [0x5f1238] 2013-03-11T04:06:23.852 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 2: (ObjectCacher::C_WriteCommit::finish(int)+0x6b) [0x5f764b] 2013-03-11T04:06:23.852 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 3: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xe55) [0x602f45] 2013-03-11T04:06:23.852 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 4: (Client::ms_dispatch(Message*)+0x8b) [0x4c7aab] 2013-03-11T04:06:23.853 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 5: (DispatchQueue::entry()+0x341) [0x63a901] 2013-03-11T04:06:23.853 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 6: (DispatchQueue::DispatchThread::entry()+0xd) [0x57c40d] 2013-03-11T04:06:23.853 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 7: (()+0x7e9a) [0x7fcd04b53e9a] 2013-03-11T04:06:23.853 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: 8: (clone()+0x6d) [0x7fcd0330c4bd] 2013-03-11T04:06:23.853 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. ubuntu@teuthology:/a/teuthology-2013-03-11_01:00:04-regression-master-testing-gcov/21314$ cat config.yaml kernel: &id001 kdb: true sha1: 34911873b4ed638fd1c65d8681233af856511006 nuke-on-error: true overrides: ceph: conf: mds: mds inject traceless reply probability: 0.5 coverage: true log-whitelist: - slow request sha1: 494f968aafeaa7327fef84b14ace752958e2bf6c s3tests: branch: master workunit: sha1: 494f968aafeaa7327fef84b14ace752958e2bf6c roles: - - mon.a - mon.c - osd.0 - osd.1 - osd.2 - - mon.b - mds.a - osd.3 - osd.4 - osd.5 - - client.0 targets: ubuntu@plana74.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCbl3n0OItF2T7uE/9SNA9YV+pq3nFmmr3EBjf2aMo7Bkyy6e9MC3kaK64vzlZxppbpxEhbrgWo7GsVHxphZqwtMt5Ek3RDbLUpQ8dPckbhOZWBhoZe9S8uCynDIZaF3Yq/b1//OBufSdM0TRYK4Vf4rtQnpyWPEn48HaairNhQvKCxY6XUImZc7Lh4CIaMkT0yyZkBlksJeAkTs4UgxhkSoSc9IAlAxwwPK5JpRL7GOowT+hsqYCVJM5+NVqIOGS66Vam/m9xHqecQtfYIEFoVFZiF/xi+b2kstxJzD39UTNP/XB5J2CtJOujW9J1jsdPKSz1L9XoDIkAFHzQjdhyx ubuntu@plana80.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCjbUncT43IZIcpSlFXnWDaagYNdnaLTfgsN8TKi3J2QYOB1BlMpaSRBByXr5MW2ZGuWrB77USjJLdRW/feuwtAjDbwc7y5woxcxn1u9eqPh4RyQWqyCmKWvi3GpdM4/4vh+L+7X5rNzJPStTjiPXCxHK+stPGCk9pg+J+KKg/GlJ1Lwx1DeWetdForGqEVJrTSSWj8RM+Nyw/V+c+t2d6gW1SzjY2NvaDdVHfduM+8X9F9aTf3KLitnIvNQWXzfqoEDNPSwrrO2uvTDyqpFkW6HLpZDnWm+io8Y9P7XMrEhWCTlQX2SHeQv/BhaA1btYAw7qfLC9n8RbeXLlQnc5yn ubuntu@plana81.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCzwyoEUguCYhu0TNdXrk7aWVUvePF6F6coBynLZ73Y/7eqTKAzxnNCiwUQx3kGjK33kliZDk7g/x4FdsiwzknDuGGCXP1pZyIGVtU5wNJ6ZM29XyH2SZvyU0MNfmoMzygxHR53TGcsK3hzwSbaW1woEpJmoqgIFQJ6BJr3nc2foKl79wBQdCNumqJ7sbh26xYVI29vYsJHUTYAdmyE4QrLaLOkZKU5Q/OvUnbKQbURcs7ArxFooObu9h0ENRPK4MKuxBFfgpAYTr/rMeWfVQxvSWsuOMpOdzLaNLt5UBYVVU+wxIFdDwcHb+2Er0rEV49W9xUD6JGXnaFjrxDDocfj tasks: - internal.lock_machines: - 3 - plana - internal.save_config: null - internal.check_lock: null - internal.connect: null - internal.check_conflict: null - kernel: *id001 - internal.base: null - internal.archive: null - internal.coredump: null - internal.syslog: null - internal.timer: null - chef: null - clock: null - install: null - ceph: null - ceph-fuse: null - workunit: clients: all: - suites/ffsb.sh ubuntu@teuthology:/a/teuthology-2013-03-11_01:00:04-regression-master-testing-gcov/21314$ cat summary.yaml client.0-kernel-sha1: 34911873b4ed638fd1c65d8681233af856511006 description: collection:fs-traceless clusters:fixed-3.yaml tasks:cfuse_workunit_suites_ffsb.yaml traceless:50pc.yaml duration: 1182.9954659938812 failure_reason: 'Command failed on 10.214.133.35 with status 1: ''sudo /home/ubuntu/cephtest/enable-coredump ceph-coverage /home/ubuntu/cephtest/archive/coverage /home/ubuntu/cephtest/daemon-helper kill ceph-fuse -f --name client.0 /home/ubuntu/cephtest/mnt.0''' flavor: gcov mon.a-kernel-sha1: 34911873b4ed638fd1c65d8681233af856511006 mon.b-kernel-sha1: 34911873b4ed638fd1c65d8681233af856511006 owner: scheduled_teuthology@teuthology success: false
Updated by Sage Weil about 11 years ago
- Status changed from In Progress to Resolved