Bug #53613 (closed)

[pwl] Failed to start IOs when SSD mode persistent write back cache is enabled in ceph version 16.2.7-3.el8cp

Added by Preethi Nataraj over 2 years ago. Updated over 2 years ago.

Status: Rejected
Priority: Normal
Assignee:
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed: 12/15/2021
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We upgraded the cluster to the latest build and saw that IOs failed to start (IOs were triggered from both rbd bench and fio).
[root@magna031 ubuntu]# ceph version
ceph version 16.2.7-3.el8cp (54410e69e153d229a04fb6acc388f7e4afdd05e7) pacific (stable)

RBD bench output for reference -
[root@plena007 ubuntu]# rbd bench-write image1 --pool=test --io-threads=1
rbd: bench-write is deprecated, use rbd bench --io-type write ...
2021-12-14T07:25:30.666+0000 7fc3327fc700 -1 librbd::exclusive_lock::PostAcquireRequest: 0x7fc32c037000 handle_process_plugin_acquire_lock: failed to process plugins: (2) No such file or directory
rbd: failed to flush: 2021-12-14T07:25:30.669+0000 7fc3327fc700 -1 librbd::exclusive_lock::ImageDispatch: 0x7fc314002b60 handle_acquire_lock: failed to acquire exclusive lock: (2) No such file or directory
2021-12-14T07:25:30.669+0000 7fc3327fc700 -1 librbd::io::AioCompletion: 0x559cca568320 fail: (2) No such file or directory
(2) No such file or directory
bench failed: (2) No such file or directory
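
The failure happens in handle_process_plugin_acquire_lock, i.e. while the pwl plugin is being set up during exclusive-lock acquisition. As a hedged check (pool/image names taken from the bench command above), whether the persistent cache attaches to the image can be inspected with:

# rbd status test/image1
# rbd config image list test/image1

When the SSD cache is active, rbd status should additionally report an image cache state.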

FIO output -
[root@plena007 ubuntu]# fio --name=test-1 --ioengine=rbd --pool=test1 --rbdname=image2 --numjobs=1 --rw=write --bs=4k --iodepth=1 --fsync=32 --runtime=480 --time_based --group_reporting --ramp_time=120
test-1: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=rbd, iodepth=1
fio-3.19
Starting 1 process
fio: io_u error on file test-1.0.0: No such file or directory: write offset=0, buflen=4096
fio: pid=1197333, err=2/file:io_u.c:1803, func=io_u error, error=No such file or directory

test-1: (groupid=0, jobs=1): err= 2 (file:io_u.c:1803, func=io_u error, error=No such file or directory): pid=1197333: Tue Dec 14 07:26:47 2021
cpu : usr=0.00%, sys=0.00%, ctx=2, majf=0, minf=5
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,1,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):

Disk stats (read/write):
sda: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
[root@plena007 ubuntu]#
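
Given that both tools fail with (2) No such file or directory, one quick sanity check (assuming the rbd_persistent_cache_path from the configuration below; pwl_probe is just a throwaway file name) is to confirm the cache directory exists and is writable by the client user:

# stat /mnt/nvme/
# touch /mnt/nvme/pwl_probe && rm /mnt/nvme/pwl_probe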


Configuration and steps

1) Updated the conf file to SSD mode as below (tried from both the CLI and the conf file):

[root@plena007 log]# cat /etc/ceph/ceph.conf
# minimal ceph.conf for d6e5c458-0f10-11ec-9663-002590fc25a4
[global]
fsid = d6e5c458-0f10-11ec-9663-002590fc25a4
mon_host = [v2:10.8.128.31:3300/0,v1:10.8.128.31:6789/0]
[client]
rbd_cache = false
rbd_persistent_cache_mode = ssd
rbd_plugins = pwl_cache
rbd_persistent_cache_size = 1073741824
rbd_persistent_cache_path = /mnt/nvme/
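
For reference, the same [client] settings can be applied centrally with ceph config set (these mirror the conf file above; values are taken verbatim from it):

# ceph config set client rbd_cache false
# ceph config set client rbd_persistent_cache_mode ssd
# ceph config set client rbd_plugins pwl_cache
# ceph config set client rbd_persistent_cache_size 1073741824
# ceph config set client rbd_persistent_cache_path /mnt/nvme/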

Started IOs using rbd bench and fio, and saw the above error.

Steps performed to enable the cache:
1. Working Ceph cluster
2. Client node with an NVMe SSD
3. # ceph config set client rbd_persistent_cache_mode ssd
4. # ceph config set client rbd_plugins pwl_cache
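
The settings can then be verified from the monitor config store, e.g.:

# ceph config dump | grep rbd_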

Steps to enable DAX:
1. mount -o dax=always /dev/pmem0 <mountpoint>
2. Set rbd_persistent_cache_path to the mountpoint:
   # rbd config global set global rbd_persistent_cache_path <mountpoint>
3. After mounting, verify that DAX is indeed enabled: check for something like "EXT4-fs (pmem0): DAX enabled ..." in dmesg.
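
For completeness, a minimal end-to-end sketch of the DAX setup, assuming /dev/pmem0 is in fsdax mode and /mnt/pmem is the mount point (device and path are illustrative):

# mkfs.ext4 /dev/pmem0
# mount -o dax=always /dev/pmem0 /mnt/pmem
# dmesg | grep -i dax
# rbd config global set global rbd_persistent_cache_path /mnt/pmem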
