Bug #3600
rbd: assert in objectcacher destructor after flatten
0%
Description
From ubuntu@teuthology:/a/teuthology-2012-12-09_19:00:03-regression-master-testing-gcov/10977:
2012-12-09T22:20:17.777 INFO:teuthology.orchestra.run.out:131 write 0xef4cf97 thru 0xef503e3 (0x344d bytes) 2012-12-09T22:20:18.566 INFO:teuthology.orchestra.run.out:132 write 0x18d5c0e5 thru 0x18d5dc04 (0x1b20 bytes) 2012-12-09T22:20:18.613 INFO:teuthology.orchestra.run.out:133 read 0x3fb40df thru 0x3fb4fc2 (0xee4 bytes) 2012-12-09T22:20:18.616 INFO:teuthology.orchestra.run.out:135 punch from 0x5855ff3 to 0x58653fb, (0xf408 bytes) 2012-12-09T22:20:19.382 INFO:teuthology.orchestra.run.out:137 flatten 2012-12-09T22:20:20.280 DEBUG:teuthology.orchestra.run:Running: 'LD_LIBRARY_PRELOAD=/tmp/cephtest/binary/usr/local/lib /tmp/cephtest/enable-coredump /tmp/cephtest/binary/usr/local/bin/ceph-coverage /tmp/cephtest/archive/coverage /tmp/cephtest/binary/usr/local/bin/ceph -k /tmp/cephtest/ceph.keyring -c /tmp/cephtest/ceph.conf --concise -- pg dump --format=json' 2012-12-09T22:20:20.335 DEBUG:teuthology.orchestra.run:Running: 'LD_LIBRARY_PRELOAD=/tmp/cephtest/binary/usr/local/lib /tmp/cephtest/enable-coredump /tmp/cephtest/binary/usr/local/bin/ceph-coverage /tmp/cephtest/archive/coverage /tmp/cephtest/binary/usr/local/bin/ceph -k /tmp/cephtest/ceph.keyring -c /tmp/cephtest/ceph.conf --concise -s' 2012-12-09T22:20:20.443 INFO:teuthology.task.thrashosds.ceph_manager: health HEALTH_OK monmap e1: 1 mons at {a=10.214.131.24:6789/0}, election epoch 2, quorum 0 a osdmap e27: 6 osds: 6 up, 6 in pgmap v59: 80 pgs: 80 active+clean; 30382 KB data, 1195 MB used, 2609 GB / 2750 GB avail mdsmap e5: 1/1/1 up {0=a=up:active} 2012-12-09T22:20:20.443 INFO:teuthology.task.thrashosds.ceph_manager:recovered! 2012-12-09T22:20:20.443 INFO:teuthology.task.thrashosds.thrasher:choose_action: min_in 2 min_out 0 min_live 2 min_dead 0 2012-12-09T22:20:20.443 INFO:teuthology.task.thrashosds.thrasher:Removing osd 1, in_osds are: [2, 3, 4, 5, 1, 0] 2012-12-09T22:20:20.444 DEBUG:teuthology.orchestra.run:Running: 'LD_LIBRARY_PRELOAD=/tmp/cephtest/binary/usr/local/lib /tmp/cephtest/enable-coredump /tmp/cephtest/binary/usr/local/bin/ceph-coverage /tmp/cephtest/archive/coverage /tmp/cephtest/binary/usr/local/bin/ceph -k /tmp/cephtest/ceph.keyring -c /tmp/cephtest/ceph.conf --concise osd out 1' 2012-12-09T22:20:21.547 INFO:teuthology.orchestra.run.err:osdc/ObjectCacher.cc: In function 'ObjectCacher::~ObjectCacher()' thread 7faabdf44780 time 2012-12-09 22:19:56.019607 2012-12-09T22:20:21.548 INFO:teuthology.orchestra.run.err:osdc/ObjectCacher.cc: 517: FAILED assert(!i->size()) 2012-12-09T22:20:21.567 INFO:teuthology.orchestra.run.err: ceph version 0.55-180-gd4c6a22 (d4c6a22d798ff6de1bee31824144aa33bbf5e608) 2012-12-09T22:20:21.567 INFO:teuthology.orchestra.run.err: 1: (ObjectCacher::~ObjectCacher()+0x26f) [0x7faabda838af] 2012-12-09T22:20:21.567 INFO:teuthology.orchestra.run.err: 2: (librbd::ImageCtx::~ImageCtx()+0x42) [0x7faabda51432] 2012-12-09T22:20:21.568 INFO:teuthology.orchestra.run.err: 3: (librbd::close_image(librbd::ImageCtx*)+0xbf) [0x7faabda66cef] 2012-12-09T22:20:21.568 INFO:teuthology.orchestra.run.err: 4: (librbd::refresh_parent(librbd::ImageCtx*)+0xa1) [0x7faabda67c71] 2012-12-09T22:20:21.568 INFO:teuthology.orchestra.run.err: 5: (librbd::ictx_refresh(librbd::ImageCtx*)+0xb8d) [0x7faabda5f46d] 2012-12-09T22:20:21.568 INFO:teuthology.orchestra.run.err: 6: (librbd::ictx_check(librbd::ImageCtx*)+0x185) [0x7faabda61385] 2012-12-09T22:20:21.568 INFO:teuthology.orchestra.run.err: 7: (librbd::info(librbd::ImageCtx*, rbd_image_info_t&, unsigned long)+0x121) [0x7faabda74b41] 2012-12-09T22:20:21.568 INFO:teuthology.orchestra.run.err: 8: (rbd_stat()+0x11) [0x7faabda384c1] 2012-12-09T22:20:21.569 INFO:teuthology.orchestra.run.err: 9: (check_size()+0x36) [0x404eb6] 2012-12-09T22:20:21.569 INFO:teuthology.orchestra.run.err: 10: (test()+0x58d) [0x40707d] 2012-12-09T22:20:21.569 INFO:teuthology.orchestra.run.err: 11: (main()+0xb33) [0x4039f3] 2012-12-09T22:20:21.569 INFO:teuthology.orchestra.run.err: 12: (__libc_start_main()+0xed) [0x7faabc3ae76d] 2012-12-09T22:20:21.569 INFO:teuthology.orchestra.run.err: 13: /tmp/cephtest/binary/usr/local/bin/test_librbd_fsx() [0x403ecd]
Related issues
History
#1 Updated by Dan Mick over 11 years ago
Looking at the code, I don't really see a clean "stop caching" mechanism. While I look, hacked fsx to do only write/flatten/clone, and am running on a 3-machine cluster
and my local vstart cluster to try to reproduce the problem.
also the core file was corrupted again :(
It certainly seems possible that there could be I/Os in flight while the cache is
being destroyed, and that seems bad. Maybe I'm missing the mechanism that prevents
that.
#2 Updated by Dan Mick over 11 years ago
Hm #2. Both runs are deadlocked on the mutex IoCtxImpl::selfmanaged_snap_create::mylock. Will look for owner next.
#3 Updated by Dan Mick over 11 years ago
The hang on selfmanaged_snap_create seems like a monitor issue; bouncing the monitor, or hanging out in gdb long enough to make the monitor reconnect, resolves the issue. I've sent mail to Joao about the problem in hopes that he can reproduce/investigate.
#4 Updated by Sage Weil over 11 years ago
- Priority changed from Urgent to High
#5 Updated by Joao Eduardo Luis over 11 years ago
Tried to reproduce this behavior to no avail.
There are operations on the test that do hang for a long time, but after discussing that with Dan he told me that those operations do take a considerable amount of time anyway, so no luck there.
The logs weren't very useful in tracking down the problem either given the low debug levels.
#6 Updated by Tamilarasi muthamizhan about 11 years ago
recent log: ubuntu@teuthology:/a/teuthology-2012-12-27_19:00:03-regression-next-testing-basic/28713
#7 Updated by Dan Mick about 11 years ago
- Assignee changed from Dan Mick to Josh Durgin
#8 Updated by Josh Durgin about 11 years ago
- Status changed from 12 to Duplicate