Bug #55534

closed

Persistent write-back cache - Error message for a corrupted cache needs improvement: it should show an appropriate message instead of "No space left on device"

Added by Preethi Nataraj almost 2 years ago. Updated almost 2 years ago.

Status: Rejected
Priority: Normal
Assignee:
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Description of problem: With the persistent write-back cache, I/O against a corrupted cache fails with "No space left on device". The error should be replaced with a message that describes the actual problem.

Version-Release number of selected component (if applicable):
ceph version 16.2.7-106.el8cp (83a8e200569d52a42ad69374c2d4cfd39921b24d) pacific (stable)

How reproducible:

Prerequisites:
1. Working Ceph cluster
2. Client node with pmem
3. # ceph config set client rbd_persistent_cache_mode rwl
4. # ceph config set client rbd_plugins pwl_cache

Steps to enable DAX:
List the namespaces with ndctl (the output must include the pmem device, as below):
[root@intel-purley-02 tmp]# ndctl list
{
  "dev":"namespace0.0",
  "mode":"fsdax",
  "map":"dev",
  "size":12681478144,
  "uuid":"c5dbfb44-fe3a-42ac-8331-8df3187e7d74",
  "sector_size":512,
  "align":2097152,
  "blockdev":"pmem0"
}
mkfs.ext4 /dev/pmem0
mount -o dax=always /dev/pmem0 <mountpoint>
Then set rbd_persistent_cache_path to the mountpoint:
rbd config global set global rbd_persistent_cache_path <path>
After mounting, make sure that DAX is indeed enabled: check for something like "EXT4-fs (pmem0): DAX enabled ..." in dmesg.
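
The DAX check above can be scripted. The following is a minimal sketch, not part of the original report: the function names are made up for illustration, and the mount-option pattern assumes the kernel reports either "dax" or "dax=always" in /proc/mounts.

```shell
# Hypothetical helpers (names are illustrative, not from the report).

# Succeeds if the kernel log text on stdin reports DAX enabled
# for an ext4 filesystem on a pmem device.
dax_enabled_in_log() {
    grep -Eq 'EXT4-fs \(pmem[0-9]+\): DAX enabled'
}

# Succeeds if the mount-table text on stdin (same layout as /proc/mounts)
# lists the given mountpoint with a dax mount option.
dax_in_mount_opts() {
    mountpoint=$1
    awk -v mp="$mountpoint" '
        $2 == mp && $4 ~ /(^|,)dax(=always)?(,|$)/ { found = 1 }
        END { exit !found }
    '
}
```

Possible usage on the client node: `dmesg | dax_enabled_in_log && echo "DAX enabled"` and `dax_in_mount_opts <mountpoint> < /proc/mounts`.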

Steps to Reproduce:
1) Write data to pmem/image using rbd bench and abort after a few minutes; the cache file remains in the cache path and is not flushed to the OSDs.
2) Start an fio write against a different pool/image, i.e. pmem1/image, and observe the errors.
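
The exact commands were not recorded in this report. The two steps might look roughly like the sketch below. Everything except the pool/image names is an assumption: the I/O sizes, the total size, the client name, and the use of fio's rbd ioengine are guesses; bs=4k, iodepth=8 and the job name test-1 are inferred from the fio output further down. The `run` helper and the step function names are illustrative only.

```shell
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1: write through the cache; abort manually (Ctrl-C) after a few minutes.
# --io-size and --io-total are assumed values, not taken from the report.
step1_bench() {
    run rbd bench --io-type write --io-size 4K --io-total 10G pmem/image
}

# Step 2: fio write against a different pool/image.
# clientname=admin is an assumption; bs and iodepth are inferred from the output.
step2_fio() {
    run fio --name=test-1 --ioengine=rbd --clientname=admin \
        --pool=pmem1 --rbdname=image --rw=write --bs=4k --iodepth=8
}
```

Setting DRY_RUN=1 before calling step1_bench or step2_fio prints the command line without touching the cluster.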

output snippet:

^Coot@intel-purley-lr-02 pmem]# Jobs: 1 (f=0): [/(1),X(1)][-.-%][eta 09m:56s]
fio: io_u error on file test-1.0.0: No space left on device: write offset=4096, buflen=4096
fio: pid=96033, err=28/file:io_u.c:1803, func=io_u error, error=No space left on device
Jobs: 1 (f=1): [f(1),X(1)][-.-%][eta 00m:00s]
test-1: (groupid=0, jobs=2): err=28 (file:io_u.c:1803, func=io_u error, error=No space left on device): pid=96033: Fri Apr 29 06:57:59 2022
cpu : usr=0.00%, sys=0.00%, ctx=10, majf=0, minf=30
IO depths : 1=12.5%, 2=25.0%, 4=50.0%, 8=12.5%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,16,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=8
We see an I/O error reporting "No space left on device".
Recovering requires a manual cache flush or invalidate command.
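
For reference, the manual cleanup could be sketched as below. `rbd persistent-cache invalidate` is the documented way to discard a persistent cache; it throws away any dirty data still in the cache, so it is only appropriate when that data is expendable. Newer releases also document a flush subcommand, but whether it is available on this 16.2.7 build is not confirmed here. The `run` helper and the function name are illustrative, not from the report.

```shell
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Show the image status, then discard its persistent cache.
# WARNING: invalidate drops dirty data that was never flushed to the OSDs.
clear_stale_cache() {
    image_spec=$1   # e.g. pmem/image
    run rbd status "$image_spec"
    run rbd persistent-cache invalidate "$image_spec"
}
```

Example: `clear_stale_cache pmem/image` for the image whose cache file was left behind in step 1.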

Expected Results:
The failure itself is expected: if the corrupted cache is not cleared, I/O will fail. However, the error message should be more helpful. Showing the user "No space left on device" is incorrect.
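
Until the message is improved, a user hitting this error could at least check whether the cache directory holds leftover files and how much space remains. The sketch below is only a rough diagnostic under stated assumptions: the function name is made up, it counts every regular file in the directory rather than matching a specific cache file name, and it cannot detect corruption.

```shell
# Report how many regular files sit directly in the cache directory
# and how much space (in KiB) is still available on its filesystem.
cache_dir_report() {
    dir=$1
    files=$(find "$dir" -maxdepth 1 -type f | wc -l | tr -d ' ')
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    echo "cache_files=$files avail_kb=$avail_kb"
}
```

Example: `cache_dir_report <path>` with the directory configured as rbd_persistent_cache_path. A non-zero file count after an aborted run suggests a leftover cache rather than a newly filled device.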
