Bug #55534

Updated by Deepika Upadhyay almost 2 years ago

Description of problem: Persistent write-back cache: the error reported for a corrupted cache needs improvement. It should carry an appropriate message instead of "No space left on device".

 Version-Release number of selected component (if applicable): 
 ceph version 16.2.7-106.el8cp (83a8e200569d52a42ad69374c2d4cfd39921b24d) pacific (stable) 

 How reproducible: 

 Pre-req 
 1. Working ceph cluster 
 2. Client node with pmem 
 3. # ceph config set client rbd_persistent_cache_mode rwl 
 4. # ceph config set client rbd_plugins pwl_cache 

 Steps to enable DAX: 
 List the namespaces with ndctl (the output must include the pmem device, as below) 
 [root@intel-purley-02 tmp]# ndctl list 
  { 
     "dev":"namespace0.0", 
     "mode":"fsdax", 
     "map":"dev", 
     "size":12681478144, 
     "uuid":"c5dbfb44-fe3a-42ac-8331-8df3187e7d74", 
     "sector_size":512, 
     "align":2097152, 
     "blockdev":"pmem0" 
   } 
 mkfs.ext4 /dev/pmem0 
 mount -o dax=always /dev/pmem0 <mountpoint> 
 And then set rbd_persistent_cache_path to the mountpoint 
 # rbd config global set global rbd_persistent_cache_path path 
 After mounting, make sure that DAX is indeed enabled 
 Check for something like "EXT4-fs (pmem0): DAX enabled ..." in dmesg 
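
 The dmesg check above could be scripted roughly as follows. This is a minimal sketch, not part of the original report: `dax_enabled` is a hypothetical helper name, and it assumes the kernel logs a line beginning with "EXT4-fs (<device>): DAX enabled", as quoted above.

```shell
# Hypothetical helper: reads kernel log text on stdin and succeeds only if
# it contains an ext4 "DAX enabled" line for the given block device.
dax_enabled() {
    dev="$1"
    # -F treats the pattern as a fixed string, so the parentheses are literal.
    grep -qF -- "EXT4-fs (${dev}): DAX enabled"
}

# Intended usage on the client node (not executed here):
#   dmesg | dax_enabled pmem0 && echo "DAX is enabled on pmem0"
```

 Reading from stdin keeps the helper independent of how the kernel log is obtained (dmesg, journalctl -k, or a saved log file).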

 Steps to Reproduce: 
 1) Write data with rbd bench to pmem/image and abort after a few minutes; the cache file remains in the cache path and is not flushed to the OSDs 
 2) Start an fio write against a different pool/image, i.e. pmem1/image, and observe the errors 

 output snippet: 

 ^Coot@intel-purley-lr-02 pmem]# Jobs: 1 (f=0): [/(1),X(1)][-.-%][eta 09m:56s] 
 fio: io_u error on file test-1.0.0: No space left on device: write offset=4096, buflen=4096 
 fio: pid=96033, err=28/file:io_u.c:1803, func=io_u error, error=No space left on device 
 Jobs: 1 (f=1): [f(1),X(1)][-.-%][eta 00m:00s] 
 test-1: (groupid=0, jobs=2): err=28 (file:io_u.c:1803, func=io_u error, error=No space left on device): pid=96033: Fri Apr 29 06:57:59 2022 
   cpu            : usr=0.00%, sys=0.00%, ctx=10, majf=0, minf=30 
   IO depths      : 1=12.5%, 2=25.0%, 4=50.0%, 8=12.5%, 16=0.0%, 32=0.0%, >=64=0.0% 
      submit      : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
      complete    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
      issued rwts: total=0,16,0,0 short=0,0,0,0 dropped=0,0,0,0 
      latency     : target=0, window=0, percentile=100.00%, depth=8 
 We are seeing an I/O error reported as "No space left on device". 
 Recovering requires a manual cache flush or invalidate command. 
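
 For reference, the manual flush or invalidate step might look like the sketch below. It assumes the client's rbd CLI provides the `persistent-cache flush` and `persistent-cache invalidate` subcommands (availability depends on the Ceph release). `pwl_cache_cmd` is a hypothetical dry-run helper that only prints the command and does not run it.

```shell
# Hypothetical dry-run helper: prints (does not execute) the rbd command
# that would flush or discard the persistent cache of one image.
pwl_cache_cmd() {
    action="$1"   # "flush" or "invalidate"
    spec="$2"     # image spec in pool/image form, e.g. pmem/image
    case "$action" in
        flush|invalidate)
            echo "rbd persistent-cache ${action} ${spec}"
            ;;
        *)
            echo "unknown action: ${action}" >&2
            return 1
            ;;
    esac
}

# Show the command for the image whose cache file was left behind.
pwl_cache_cmd invalidate pmem/image
```

 Invalidating discards any cached writes that never reached the OSDs, so it is only appropriate when that data can be lost.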


 Expected Results: 
 The failure itself is expected: if the corrupted cache is not cleared, I/O will fail. However, the error message should be more helpful than "No space left on device", which is incorrect and misleads the user. 



 Additional info: 
 cluster details magna021  
 pmem client details - root@intel-purley-lr-02.7a2m.lab.eng.bos.redhat.com 
 password -    QwAo2U6GRxyNPKiZaOCx 
