Project

General

Profile

Bug #3664

osdc/ObjectCacher.cc: 517: FAILED assert(!i->size())

Added by Sage Weil over 7 years ago. Updated over 7 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Target version:
% Done:

100%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

2012-12-20T13:25:38.347 INFO:teuthology.orchestra.run.err:osdc/ObjectCacher.cc: In function 'ObjectCacher::~ObjectCacher()' thread 7f1a67748780 time 2012-12-20 13:25:01.308939
2012-12-20T13:25:38.347 INFO:teuthology.orchestra.run.err:osdc/ObjectCacher.cc: 517: FAILED assert(!i->size())
2012-12-20T13:25:38.347 INFO:teuthology.orchestra.run.err: ceph version 0.55-320-g5497d22 (5497d228116a717da43ce29984ec20d1289780ae)
2012-12-20T13:25:38.348 INFO:teuthology.orchestra.run.err: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x95) [0x7f1a6649c50d]
2012-12-20T13:25:38.348 INFO:teuthology.orchestra.run.err: 2: (ObjectCacher::~ObjectCacher()+0x67) [0x7f1a672d48b3]
2012-12-20T13:25:38.349 INFO:teuthology.orchestra.run.err: 3: (librbd::ImageCtx::~ImageCtx()+0x41) [0x7f1a672a667f]
2012-12-20T13:25:38.349 INFO:teuthology.orchestra.run.err: 4: (librbd::close_image(librbd::ImageCtx*)+0x1a8) [0x7f1a672c16d2]
2012-12-20T13:25:38.349 INFO:teuthology.orchestra.run.err: 5: (librbd::close_image(librbd::ImageCtx*)+0x160) [0x7f1a672c168a]
2012-12-20T13:25:38.349 INFO:teuthology.orchestra.run.err: 6: (librbd::refresh_parent(librbd::ImageCtx*)+0x1f0) [0x7f1a672bdd98]
2012-12-20T13:25:38.349 INFO:teuthology.orchestra.run.err: 7: (librbd::ictx_refresh(librbd::ImageCtx*)+0x132b) [0x7f1a672bf2aa]
2012-12-20T13:25:38.349 INFO:teuthology.orchestra.run.err: 8: (librbd::ictx_check(librbd::ImageCtx*)+0x189) [0x7f1a672bda0c]
2012-12-20T13:25:38.350 INFO:teuthology.orchestra.run.err: 9: (librbd::info(librbd::ImageCtx*, rbd_image_info_t&, unsigned long)+0x121) [0x7f1a672b9492]
2012-12-20T13:25:38.350 INFO:teuthology.orchestra.run.err: 10: (rbd_stat()+0x33) [0x7f1a67291d67]
2012-12-20T13:25:38.350 INFO:teuthology.orchestra.run.err: 11: (check_size()+0x32) [0x404132]
2012-12-20T13:25:38.350 INFO:teuthology.orchestra.run.err: 12: (test()+0x3c5) [0x405945]
2012-12-20T13:25:38.350 INFO:teuthology.orchestra.run.err: 13: (main()+0x7fc) [0x4031bc]
2012-12-20T13:25:38.350 INFO:teuthology.orchestra.run.err: 14: (__libc_start_main()+0xed) [0x7f1a6563c76d]
2012-12-20T13:25:38.350 INFO:teuthology.orchestra.run.err: 15: /tmp/cephtest/binary/usr/local/bin/test_librbd_fsx() [0x403535]
2012-12-20T13:25:38.351 INFO:teuthology.orchestra.run.err: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2012-12-20T13:25:38.351 INFO:teuthology.orchestra.run.err:terminate called after throwing an instance of 'ceph::FailedAssertion'

from job
ubuntu@teuthology:/var/lib/teuthworker/archive/sam-ooo3/19022$ cat orig.config.yaml 
nuke-on-error: true
overrides:
  ceph:
    conf:
      client:
        rbd cache: true
      global:
        ms inject socket failures: 5000
      mon:
        debug mon: 20
        debug ms: 20
      osd:
        debug ms: 20
        debug osd: 20
    fs: ext4
    log-whitelist:
    - slow request
    valgrind:
      osd:
      - --tool=memcheck
roles:
- - mon.a
  - osd.0
  - osd.1
  - osd.2
- - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
tasks:
- chef: null
- clock: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    timeout: 1200
- rbd_fsx:
    clients:
    - client.0
    ops: 2000

Subtasks

Ceph - Bug #4417: rbd_fsx run failed in the nightlies: FAILED assert(i->empty())DuplicateJosh Durgin


Related issues

Duplicated by rbd - Bug #4255: osdc/ObjectCacher.cc: 517: FAILED assert(i->empty()) Duplicate 02/23/2013
Duplicated by rbd - Bug #3600: rbd: assert in objectcacher destructor after flatten Duplicate 12/10/2012

Associated revisions

Revision 82d5cd60 (diff)
Added by Josh Durgin over 7 years ago

ObjectCacher: remove all buffers from a non-existent object

Once we're sure an object doesn't exist, we retry all the waiters in
order, and they return -ENOENT immediately. If there were a bunch of
BufferHeads waiting for data (rx state), they would be left behind
while the reads that triggered them were complete from the cache
user's perspective. These extra rx BufferHeads would pin the object in
the lru, so they wouldn't be removed by release_set(). This meant that
the assert during shutdown of the cache would be triggered.

To fix this, remove any BufferHeads in this state immediately when we
find out the object doesn't exist. Use the same condition as readx for
determining whether this is safe - if we got -ENOENT and all
BufferHeads for the object are clean or rx.

Fixes: #3664
Signed-off-by: Josh Durgin <>

History

#1 Updated by Ian Colle over 7 years ago

  • Assignee set to Josh Durgin

#2 Updated by Ian Colle over 7 years ago

  • Project changed from Ceph to rbd
  • Category deleted (librbd)

#3 Updated by Tamilarasi muthamizhan over 7 years ago

recent log: ubuntu@teuthology:/a/teuthology-2013-02-09_20:00:03-regression-bobtail-master-basic/3977

2013-02-10T00:35:17.538 INFO:teuthology.orchestra.run.err:osdc/ObjectCacher.cc: In function 'ObjectCacher::~ObjectCacher()' thread 7f3fad397780 time 2013-02-10 00:35:12.654134
2013-02-10T00:35:17.538 INFO:teuthology.orchestra.run.err:osdc/ObjectCacher.cc: 517: FAILED assert(!i->size())
2013-02-10T00:35:17.538 INFO:teuthology.orchestra.run.err: ceph version 0.56.2-29-gf11beb9 (f11beb954976f66bfae75e847937f84958ebeaf3)
2013-02-10T00:35:17.538 INFO:teuthology.orchestra.run.err: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x95) [0x7f3fac0eaf8d]
2013-02-10T00:35:17.539 INFO:teuthology.orchestra.run.err: 2: (ObjectCacher::~ObjectCacher()+0x67) [0x7f3facf23b4f]
2013-02-10T00:35:17.539 INFO:teuthology.orchestra.run.err: 3: (librbd::ImageCtx::~ImageCtx()+0x41) [0x7f3facef5a19]
2013-02-10T00:35:17.539 INFO:teuthology.orchestra.run.err: 4: (librbd::close_image(librbd::ImageCtx*)+0x1a8) [0x7f3facf10986]
2013-02-10T00:35:17.539 INFO:teuthology.orchestra.run.err: 5: (librbd::close_image(librbd::ImageCtx*)+0x160) [0x7f3facf1093e]
2013-02-10T00:35:17.539 INFO:teuthology.orchestra.run.err: 6: (rbd_close()+0x20) [0x7f3facee0fbf]
2013-02-10T00:35:17.539 INFO:teuthology.orchestra.run.err: 7: (main()+0x825) [0x4031e5]
2013-02-10T00:35:17.540 INFO:teuthology.orchestra.run.err: 8: (__libc_start_main()+0xed) [0x7f3fab29576d]
2013-02-10T00:35:17.540 INFO:teuthology.orchestra.run.err: 9: /tmp/cephtest/binary/usr/local/bin/test_librbd_fsx() [0x403535]
2013-02-10T00:35:17.540 INFO:teuthology.orchestra.run.err: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

ubuntu@teuthology:/a/teuthology-2013-02-09_20:00:03-regression-bobtail-master-basic/3977$ cat summary.yaml 
ceph-sha1: f11beb954976f66bfae75e847937f84958ebeaf3
client.0-kernel-sha1: 0fa6ebc600bc8e830551aee47a0e929e818a1868
description: collection:rbd-thrash clusters:6-osd-3-machine.yaml fs:xfs.yaml msgr-failures:few.yaml
  thrashers:default.yaml workloads:rbd_fsx_cache_writethrough.yaml
duration: 696.5752260684967
failure_reason: 'Command crashed: ''CEPH_CONF=/tmp/cephtest/ceph.conf LD_LIBRARY_PATH=/tmp/cephtest/binary/usr/local/lib
  /tmp/cephtest/enable-coredump /tmp/cephtest/binary/usr/local/bin/ceph-coverage /tmp/cephtest/archive/coverage
  /tmp/cephtest/binary/usr/local/bin/test_librbd_fsx -d -W -R -p 100 -P /tmp/cephtest/archive
  -t 1 -l 1073741824 -S 0 -N 2000 pool_client.0 image_client.0'''
flavor: basic
mds.a-kernel-sha1: 0fa6ebc600bc8e830551aee47a0e929e818a1868
mon.a-kernel-sha1: 0fa6ebc600bc8e830551aee47a0e929e818a1868
owner: scheduled_teuthology@teuthology
success: false
ubuntu@teuthology:/a/teuthology-2013-02-09_20:00:03-regression-bobtail-master-basic/3977$ cat config.yaml 
kernel: &id001
  kdb: true
  sha1: 0fa6ebc600bc8e830551aee47a0e929e818a1868
nuke-on-error: true
overrides:
  ceph:
    conf:
      client:
        rbd cache: true
        rbd cache max dirty: 0
      global:
        ms inject socket failures: 5000
    fs: xfs
    log-whitelist:
    - slow request
    sha1: f11beb954976f66bfae75e847937f84958ebeaf3
  s3tests:
    branch: bobtail
  workunit:
    sha1: f11beb954976f66bfae75e847937f84958ebeaf3
roles:
- - mon.a
  - osd.0
  - osd.1
  - osd.2
- - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
targets:
  ubuntu@plana46.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCrD6s9otJ5xCNH4nyv0iJu6AoqmlNTFd8D0X9RfFBnmOMrMBWU9kwsFzPIOsuJGbSYbA8LCtjWUwaWoXmbEFtTMitxaDXp47gbVNXknHq7TGZHkWWOwKKu+tlSQBpCVzO/rzBbvJ9fcG7tewq5XcIHz0IUXsUFuEuXR1HaTUJKic2twBpaeAGNvdd6IZ9Sz9TMkfiRV/aVdcHJ/yF8bsXi3pfRPR3puMK/Nyfq5Hz/aabQo1TSyK2o0weoWV7D8vD6S8f3D7p5/5ScBhL3zUcP85SsV47W+/hTFbU8kN1Grlv2sx0fVMB/TUB/UNVdsHKGn5Nv6zb/qMqBEx9nSeZ9
  ubuntu@plana59.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC2ijQopwlP92QQaUVB/Dx8HLmRokvfKoN4+2xGBvej1Qb/P67By9WNF92tQspMpLSpvqodRaXxcKixvrlLEJJQ16lY9xje2jNDWRjJ/AjnAg79qRA7N6L6BC7rccZ8oZWXqOpNKIKRNUUyn+XwtVgkbbcUr5NjQG2CfsLPtoNkdX98a/UyvhefomanU6oStJHFIptLXe2WjgW6DoXofu9qIpyZrO75XFOIlEpC2QTXU6eTVGRxhL2qtVVR42z7bcrNuVFr4TTTNvqaabyvW2mPZNz3a8ojh/pb8gIZGFYydiG8aOsS9dapltOysmdwfBPXHsAoD/49vbPEHIdyjarp
  ubuntu@plana60.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC5sa91O7pjxDijkXWrqS3slUWFtQJZtC3hfN5cKt+oISIVRZoy4CNxFrOWhmXwKOZaIV7fS2PzmhZQe9jfA5iRP7G/opl/5sMImzTIUUz8Qrlqb7epmwYHkoE196uN45y8iW/IDwsD7o69+rOGIj0SgdT0nGBYXB27OhdECdumxHcST/vM8dFfIwwMienvhKKB+ghrS2bYa33DcfYHBQWdIPXrUf2brb/XRYDwVMg8W5uPyW8dx3GIONVxSWequnTO1m0FuMKB7BrVpH19+/H4DFOwj6vqSvvOYUWm3Go3FsQTvparxrg7/Rliee0RBWcBIE3D0gWk6r74HV0C4pnT
tasks:
- internal.lock_machines: 3
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    timeout: 1200
- rbd_fsx:
    clients:
    - client.0
    ops: 2000

#4 Updated by Josh Durgin over 7 years ago

Need to reproduce with:

client:
  debug rbd: 20
  debug objectcacher: 20
  debug ms: 1

#5 Updated by Josh Durgin over 7 years ago

Reproduced with client side logging after running in a loop for several days.

#6 Updated by Sage Weil over 7 years ago

ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2013-03-20_07:49:52-rbd-master-testing-basic/36

#7 Updated by Ian Colle over 7 years ago

  • Target version set to v0.61 - Cuttlefish

#8 Updated by Ian Colle over 7 years ago

Josh - any update? Are we still going to get this in Cuttlefish?

#9 Updated by Josh Durgin over 7 years ago

  • Status changed from 12 to In Progress

Looking at this again, hopefully will get it fixed tomorrow.

#10 Updated by Josh Durgin over 7 years ago

Unfortunately the logs aren't there anymore (they weren't saved when a power failure restarted the machine running teuthology). Trying to reproduce again, although I've still got the core file at least.

#11 Updated by Josh Durgin over 7 years ago

Nevermind, the logs were saved after all. Hooray!

#12 Updated by Sage Weil over 7 years ago

ubuntu@teuthology:/a/teuthology-2013-04-23_19:55:59-rbd-next-testing-basic$ less 148/teuthology.log

#13 Updated by Sage Weil over 7 years ago

ubuntu@teuthology:/a/teuthology-2013-04-23_19:55:59-rbd-next-testing-basic$ less 155/teuthology.log

#14 Updated by Josh Durgin over 7 years ago

  • Status changed from In Progress to Fix Under Review

#15 Updated by Josh Durgin over 7 years ago

  • Status changed from Fix Under Review to Resolved

commit:82d5cd601e0fb7cb24dda4ea1f0e9f12e5d18708

Also available in: Atom PDF