Bug #53613
closed
[pwl] Failed to start IOs when SSD mode persistent write back cache is enabled in ceph version 16.2.7-3.el8cp
0%
Description
We upgraded the cluster to the latest build and IOs failed to start. (IOs were triggered from both RBD bench and FIO.)
[root@magna031 ubuntu]# ceph version
ceph version 16.2.7-3.el8cp (54410e69e153d229a04fb6acc388f7e4afdd05e7) pacific (stable)
RBD bench output for reference -
[root@plena007 ubuntu]# rbd bench-write image1 --pool=test --io-threads=1
rbd: bench-write is deprecated, use rbd bench --io-type write ...
2021-12-14T07:25:30.666+0000 7fc3327fc700 -1 librbd::exclusive_lock::PostAcquireRequest: 0x7fc32c037000 handle_process_plugin_acquire_lock: failed to process plugins: (2) No such file or directory
rbd: failed to flush: 2021-12-14T07:25:30.669+0000 7fc3327fc700 -1 librbd::exclusive_lock::ImageDispatch: 0x7fc314002b60 handle_acquire_lock: failed to acquire exclusive lock: (2) No such file or directory
2021-12-14T07:25:30.669+0000 7fc3327fc700 -1 librbd::io::AioCompletion: 0x559cca568320 fail: (2) No such file or directory
(2) No such file or directory
bench failed: (2) No such file or directory
FIO output -
[root@plena007 ubuntu]# fio --name=test-1 --ioengine=rbd --pool=test1 --rbdname=image2 --numjobs=1 --rw=write --bs=4k --iodepth=1 --fsync=32 --runtime=480 --time_based --group_reporting --ramp_time=120
test-1: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=rbd, iodepth=1
fio-3.19
Starting 1 process
fio: io_u error on file test-1.0.0: No such file or directory: write offset=0, buflen=4096
fio: pid=1197333, err=2/file:io_u.c:1803, func=io_u error, error=No such file or directory
test-1: (groupid=0, jobs=1): err= 2 (file:io_u.c:1803, func=io_u error, error=No such file or directory): pid=1197333: Tue Dec 14 07:26:47 2021
cpu : usr=0.00%, sys=0.00%, ctx=2, majf=0, minf=5
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,1,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
Disk stats (read/write):
sda: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
[root@plena007 ubuntu]#
Configuration and steps
1) Updated the conf file to SSD mode as below (tried from both the CLI and the conf file)
[root@plena007 log]# cat /etc/ceph/ceph.conf
# minimal ceph.conf for d6e5c458-0f10-11ec-9663-002590fc25a4
[global]
fsid = d6e5c458-0f10-11ec-9663-002590fc25a4
mon_host = [v2:10.8.128.31:3300/0,v1:10.8.128.31:6789/0]
[client]
rbd_cache = false
rbd_persistent_cache_mode = ssd
rbd_plugins = pwl_cache
rbd_persistent_cache_size = 1073741824
rbd_persistent_cache_path = /mnt/nvme/
Started IOs using rbd bench and FIO, and saw the above errors.
Steps performed to mount -
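Since every failure above reports (2) No such file or directory, one thing worth ruling out on the client is whether the directory given in rbd_persistent_cache_path exists and is writable. This is only a sanity check, not a confirmed cause of the failure. A minimal sketch, where check_cache_dir is a hypothetical helper name and /mnt/nvme/ is the path from the ceph.conf above:

```shell
# Hypothetical helper (not part of Ceph): report whether a directory
# looks usable as rbd_persistent_cache_path.
check_cache_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing"          # path does not exist or is not a directory
    elif [ ! -w "$dir" ]; then
        echo "not writable"     # exists, but the current user cannot write to it
    else
        echo "ok"
    fi
}

# Example, run on the client node with the path from ceph.conf:
#   check_cache_dir /mnt/nvme/
```

A result of "missing" would point at the mount or path rather than at librbd itself; "ok" means the ENOENT has to come from somewhere else.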
1. Working ceph cluster
2. client node with NVMe SSD
3. # ceph config set client rbd_persistent_cache_mode SSD
4. # ceph config set client rbd_plugins pwl_cache
Steps to enable DAX
mount -o dax=always /dev/pmem0 <mountpoint>
And then set rbd_persistent_cache_path to the mountpoint
- rbd config global set global rbd_persistent_cache_path path
After mounting, make sure that DAX is indeed enabled
Check for something like "EXT4-fs (pmem0): DAX enabled ..." in dmesg
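The dmesg check above can be scripted. A minimal sketch, assuming the kernel line contains the device name in parentheses followed by "DAX enabled" as in the example message; dax_enabled_in_log is a hypothetical helper name, and the exact wording of the kernel message may differ between kernel versions and filesystems:

```shell
# Hypothetical helper: reads kernel log text on stdin and succeeds if it
# contains a "DAX enabled" line for the given device name (e.g. pmem0).
dax_enabled_in_log() {
    dev="$1"
    # In a basic regular expression the parentheses are literal characters.
    grep -q "(${dev}).*DAX enabled"
}

# Example (dmesg may require root):
#   dmesg | dax_enabled_in_log pmem0 && echo "DAX enabled on pmem0"
```

Reading from stdin keeps the helper usable against a saved log as well as live dmesg output.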