Bug #51557
closed
[pwl ssd] rbd bench can't exit sometimes in ssd cache mode
Description
This problem occurs intermittently. It has also occurred in the SSD cache mode QA runs on teuthology.
See the QA test qa/suites/rbd/persistent-writeback-cache/6-workloads/recovery.yaml. The rbd bench commands are:
timeout 10s ./bin/rbd bench --io-pattern rand --io-type write testimage -c ceph.conf
./bin/rbd bench --io-type write --io-pattern rand --io-total 4M testimage -c ceph.conf
log:
2021-07-07T15:17:01.921+0800 7fc1067fc700 20 librbd::cache::pwl::AbstractWriteLog: 0x7fc0d4009160 release_guarded_request: released_cell=0x7fc0d400ab00
2021-07-07T15:17:01.921+0800 7fc1067fc700 20 librbd::cache::pwl::AbstractWriteLog: 0x7fc0d4009160 release_guarded_request: current barrier released cell=0x7fc0d400ab00
2021-07-07T15:17:01.921+0800 7fc1067fc700 20 librbd::cache::pwl::AbstractWriteLog: 0x7fc0d4009160 release_guarded_request: exit
2021-07-07T15:17:01.921+0800 7fc1067fc700 6 librbd::cache::pwl::AbstractWriteLog: 0x7fc0d4009160 operator(): Done internal_flush in shutdown
2021-07-07T15:17:01.921+0800 7fc111c0b700 6 librbd::cache::pwl::AbstractWriteLog: 0x7fc0d4009160 operator(): waiting for in flight operations
2021-07-07T15:17:01.921+0800 7fc106ffd700 6 librbd::cache::pwl::AbstractWriteLog: 0x7fc0d4009160 operator(): flushing
2021-07-07T15:17:01.921+0800 7fc106ffd700 20 librbd::cache::pwl::AbstractWriteLog: 0x7fc0d4009160 flush_dirty_entries: no dirty entries
2021-07-07T15:17:01.921+0800 7fc106ffd700 6 librbd::cache::pwl::AbstractWriteLog: 0x7fc0d4009160 operator(): image cache cleaned
The next log line should be:
2021-07-07T15:29:29.963+0800 7f81027fc700 1 bdev(0x7f80d402aa80 /tmp/rwl/rbd-pwl.rbd.1019bbdbf1e8.pool) close
When the issue occurs, rbd bench hangs before this line is logged. The hang therefore falls between the "image cache cleaned" line and the bdev close line, and the process never exits.
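Since the hang is intermittent, one way to catch it might be to repeat the second bench command under a watchdog. The sketch below is only an illustration: the helper name run_until_hang, the attempt count, and the 300-second limit are assumptions, not part of the QA suite. It leaves the hung process alive so that a debugger can be attached afterwards.

```python
import subprocess

# Second bench command from the description, as an argument list.
BENCH_CMD = [
    "./bin/rbd", "bench", "--io-type", "write", "--io-pattern", "rand",
    "--io-total", "4M", "testimage", "-c", "ceph.conf",
]


def run_until_hang(cmd, attempts, timeout_s):
    """Run cmd up to `attempts` times.

    Returns the still-running Popen object of the first run that does not
    exit within timeout_s seconds, or None if every run exited in time.
    The hung process is deliberately NOT killed, so it can be inspected.
    """
    for _ in range(attempts):
        proc = subprocess.Popen(cmd)
        try:
            proc.wait(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            return proc  # treated as hung: hand it back for debugging
    return None
```

For example, run_until_hang(BENCH_CMD, attempts=100, timeout_s=300) would return a process handle whose .pid identifies the stuck rbd bench, assuming a 4M write normally finishes well within the assumed limit.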
Updated by CONGMIN YIN almost 3 years ago
My guess is that it hangs in bdev->close() or in delete bdev. We need to reproduce the hang first, and then use GDB to locate it.
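Once a hung rbd bench process exists, its thread backtraces could be captured non-interactively along these lines. This is a rough sketch: the helper names are hypothetical, and it assumes gdb is installed, the caller has permission to attach to the process, and debug symbols are available for readable frames.

```python
import subprocess


def gdb_backtrace_cmd(pid):
    """Build a gdb invocation that attaches to pid, prints every thread's
    backtrace, and then exits (batch mode)."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def dump_backtraces(pid, timeout_s=60):
    """Run gdb against pid and return its standard output as text."""
    result = subprocess.run(
        gdb_backtrace_cmd(pid),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=False,  # gdb's exit status is not interpreted here
    )
    return result.stdout
```

If the guess above is right, one of the threads in the output should be stopped inside bdev->close() or the bdev destructor.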
Updated by CONGMIN YIN almost 3 years ago
- Status changed from New to Can't reproduce
- Priority changed from Normal to Low
Can't reproduce.