Project

General

Profile

Bug #3600

rbd: assert in objectcacher destructor after flatten

Added by Josh Durgin about 8 years ago. Updated almost 8 years ago.

Status:
Duplicate
Priority:
High
Assignee:
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

From ubuntu@teuthology:/a/teuthology-2012-12-09_19:00:03-regression-master-testing-gcov/10977:

2012-12-09T22:20:17.777 INFO:teuthology.orchestra.run.out:131 write     0xef4cf97 thru  0xef503e3       (0x344d bytes)
2012-12-09T22:20:18.566 INFO:teuthology.orchestra.run.out:132 write     0x18d5c0e5 thru 0x18d5dc04      (0x1b20 bytes)
2012-12-09T22:20:18.613 INFO:teuthology.orchestra.run.out:133 read      0x3fb40df thru  0x3fb4fc2       (0xee4 bytes)
2012-12-09T22:20:18.616 INFO:teuthology.orchestra.run.out:135 punch     from 0x5855ff3 to 0x58653fb, (0xf408 bytes)
2012-12-09T22:20:19.382 INFO:teuthology.orchestra.run.out:137 flatten
2012-12-09T22:20:20.280 DEBUG:teuthology.orchestra.run:Running: 'LD_LIBRARY_PRELOAD=/tmp/cephtest/binary/usr/local/lib /tmp/cephtest/enable-coredump /tmp/cephtest/binary/usr/local/bin/ceph-coverage /tmp/cephtest/archive/coverage /tmp/cephtest/binary/usr/local/bin/ceph -k /tmp/cephtest/ceph.keyring -c /tmp/cephtest/ceph.conf --concise -- pg dump --format=json'
2012-12-09T22:20:20.335 DEBUG:teuthology.orchestra.run:Running: 'LD_LIBRARY_PRELOAD=/tmp/cephtest/binary/usr/local/lib /tmp/cephtest/enable-coredump /tmp/cephtest/binary/usr/local/bin/ceph-coverage /tmp/cephtest/archive/coverage /tmp/cephtest/binary/usr/local/bin/ceph -k /tmp/cephtest/ceph.keyring -c /tmp/cephtest/ceph.conf --concise -s'
2012-12-09T22:20:20.443 INFO:teuthology.task.thrashosds.ceph_manager:   health HEALTH_OK
   monmap e1: 1 mons at {a=10.214.131.24:6789/0}, election epoch 2, quorum 0 a
   osdmap e27: 6 osds: 6 up, 6 in
    pgmap v59: 80 pgs: 80 active+clean; 30382 KB data, 1195 MB used, 2609 GB / 2750 GB avail
   mdsmap e5: 1/1/1 up {0=a=up:active}
2012-12-09T22:20:20.443 INFO:teuthology.task.thrashosds.ceph_manager:recovered!
2012-12-09T22:20:20.443 INFO:teuthology.task.thrashosds.thrasher:choose_action: min_in 2 min_out 0 min_live 2 min_dead 0
2012-12-09T22:20:20.443 INFO:teuthology.task.thrashosds.thrasher:Removing osd 1, in_osds are: [2, 3, 4, 5, 1, 0]
2012-12-09T22:20:20.444 DEBUG:teuthology.orchestra.run:Running: 'LD_LIBRARY_PRELOAD=/tmp/cephtest/binary/usr/local/lib /tmp/cephtest/enable-coredump /tmp/cephtest/binary/usr/local/bin/ceph-coverage /tmp/cephtest/archive/coverage /tmp/cephtest/binary/usr/local/bin/ceph -k /tmp/cephtest/ceph.keyring -c /tmp/cephtest/ceph.conf --concise osd out 1'

2012-12-09T22:20:21.547 INFO:teuthology.orchestra.run.err:osdc/ObjectCacher.cc: In function 'ObjectCacher::~ObjectCacher()' thread 7faabdf44780 time 2012-12-09 22:19:56.019607
2012-12-09T22:20:21.548 INFO:teuthology.orchestra.run.err:osdc/ObjectCacher.cc: 517: FAILED assert(!i->size())
2012-12-09T22:20:21.567 INFO:teuthology.orchestra.run.err: ceph version 0.55-180-gd4c6a22 (d4c6a22d798ff6de1bee31824144aa33bbf5e608)
2012-12-09T22:20:21.567 INFO:teuthology.orchestra.run.err: 1: (ObjectCacher::~ObjectCacher()+0x26f) [0x7faabda838af]
2012-12-09T22:20:21.567 INFO:teuthology.orchestra.run.err: 2: (librbd::ImageCtx::~ImageCtx()+0x42) [0x7faabda51432]
2012-12-09T22:20:21.568 INFO:teuthology.orchestra.run.err: 3: (librbd::close_image(librbd::ImageCtx*)+0xbf) [0x7faabda66cef]
2012-12-09T22:20:21.568 INFO:teuthology.orchestra.run.err: 4: (librbd::refresh_parent(librbd::ImageCtx*)+0xa1) [0x7faabda67c71]
2012-12-09T22:20:21.568 INFO:teuthology.orchestra.run.err: 5: (librbd::ictx_refresh(librbd::ImageCtx*)+0xb8d) [0x7faabda5f46d]
2012-12-09T22:20:21.568 INFO:teuthology.orchestra.run.err: 6: (librbd::ictx_check(librbd::ImageCtx*)+0x185) [0x7faabda61385]
2012-12-09T22:20:21.568 INFO:teuthology.orchestra.run.err: 7: (librbd::info(librbd::ImageCtx*, rbd_image_info_t&, unsigned long)+0x121) [0x7faabda74b41]
2012-12-09T22:20:21.568 INFO:teuthology.orchestra.run.err: 8: (rbd_stat()+0x11) [0x7faabda384c1]
2012-12-09T22:20:21.569 INFO:teuthology.orchestra.run.err: 9: (check_size()+0x36) [0x404eb6]
2012-12-09T22:20:21.569 INFO:teuthology.orchestra.run.err: 10: (test()+0x58d) [0x40707d]
2012-12-09T22:20:21.569 INFO:teuthology.orchestra.run.err: 11: (main()+0xb33) [0x4039f3]
2012-12-09T22:20:21.569 INFO:teuthology.orchestra.run.err: 12: (__libc_start_main()+0xed) [0x7faabc3ae76d]
2012-12-09T22:20:21.569 INFO:teuthology.orchestra.run.err: 13: /tmp/cephtest/binary/usr/local/bin/test_librbd_fsx() [0x403ecd]


Related issues

Duplicates rbd - Bug #3664: osdc/ObjectCacher.cc: 517: FAILED assert(!i->size()) Resolved 03/11/2013

History

#1 Updated by Dan Mick about 8 years ago

Looking at the code, I don't really see a clean "stop caching" mechanism. While I look, hacked fsx to do only write/flatten/clone, and am running on a 3-machine cluster
and my local vstart cluster to try to reproduce the problem.

also the core file was corrupted again :(

It certainly seems possible that there could be I/Os in flight while the cache is
being destroyed, and that seems bad. Maybe I'm missing the mechanism that prevents
that.

#2 Updated by Dan Mick about 8 years ago

Hm #2. Both runs are deadlocked on the mutex IoCtxImpl::selfmanaged_snap_create::mylock. Will look for owner next.

#3 Updated by Dan Mick about 8 years ago

The hang on selfmanaged_snap_create seems like a monitor issue; bouncing the monitor, or hanging out in gdb long enough to make the monitor reconnect, resolves the issue. I've sent mail to Joao about the problem in hopes that he can reproduce/investigate.

#4 Updated by Sage Weil about 8 years ago

  • Priority changed from Urgent to High

#5 Updated by Joao Eduardo Luis about 8 years ago

Tried to reproduce this behavior to no avail.

There are operations on the test that do hang for a long time, but after discussing that with Dan he told me that those operations do take a considerable amount of time anyway, so no luck there.

The logs weren't very useful in tracking down the problem either given the low debug levels.

#6 Updated by Tamilarasi muthamizhan about 8 years ago

recent log: ubuntu@teuthology:/a/teuthology-2012-12-27_19:00:03-regression-next-testing-basic/28713

#7 Updated by Dan Mick about 8 years ago

  • Assignee changed from Dan Mick to Josh Durgin

#8 Updated by Josh Durgin almost 8 years ago

  • Status changed from 12 to Duplicate

Also available in: Atom PDF