Bug #44976

Updated by Patrick Donnelly about 4 years ago

I originally posted this on the nfs-ganesha GitHub issues. They referred me here since they think it is related to a problem in libcephfs or the MDS. Their thinking was that `Attr_Expiration_Time = 0` was aggravating an underlying issue rather than being the issue itself.
I am planning to upgrade to Octopus, but I figured I would wait a couple of weeks while we figure this out.
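For context, as I understand it `Attr_Expiration_Time` is an EXPORT-block option in Ganesha; the snippet below is a generic sketch of where that setting sits, not my exact export:

<pre>
EXPORT {
    # 0 disables Ganesha's attribute cache expiry, so libcephfs/the MDS
    # stays authoritative for attributes (the setting under discussion)
    Attr_Expiration_Time = 0;
    ...
}
</pre>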

OS: CentOS 7.7.1908
Versions:
Ceph - 14.2.8
NFS-Ganesha - 2.8.3

12 days ago I upgraded from 14.2.7 to 14.2.8. I started seeing the cache pressure messages after that upgrade.
The CephFS system sat rather dormant for over a week; then last Thursday I started seeing all of this.

Firstly, whether I have 1 MDS or 3, I tend to see clients failing to respond to cache pressure.
Also, the MDS always reports damaged metadata, and the damage pops right back up after a scrub.

If I have 3 MDSes I start to get "3 MDSs report slow requests". The requests are failing to acquire RDLOCK/WRLOCK.
I should be able to recreate this; I just want to make sure I am collecting all of the logs needed when I do.
Let me know which logs, and at what debug levels, you would like, so I capture the right ones.
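To save a round trip: I assume something like the following is what you would want (standard `ceph config` commands; tell me if you want different subsystems or levels):

<pre>
# sketch: raise MDS debug verbosity before recreating the issue
ceph config set mds debug_mds 20
ceph config set mds debug_ms 1
# revert to the defaults afterwards
ceph config rm mds debug_mds
ceph config rm mds debug_ms
</pre>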

The cluster has existed for almost a year (10 months) and has run Nautilus the entire time.
No disk or hardware failures the entire time.
The NFS CephFS has existed pretty much since the beginning, with the exact same use case.

I can repair the metadata, but the damage eventually shows right back up.
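For reference, this is roughly how I run the scrub and repair (the standard Nautilus MDS scrub admin commands; my exact invocation may differ):

<pre>
# sketch: recursive scrub with repair from the root, then inspect the damage table
ceph tell mds.cmp03 scrub start / recursive,repair
ceph tell mds.cmp03 damage ls
</pre>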

Also, once the filesystem gets pretty bad with 3 MDSes and metadata damage, the slow requests issue will arise just from listing files in a directory from a kernel mount.

The NFS-Ganesha logs at debug level show nothing at all. If I increase to 3 MDSes, then they start to show this:
 03/04/2020 13:57:51 : epoch 5e866ee0 : cmp04: ganesha.nfsd-1096024[svc_595] posix2fsal_error :FSAL :CRIT :Mapping 108(default) to ERR_FSAL_SERVERFAULT 


I am using autoscale for this cluster, so besides the CephFS warnings those are the only messages I get.
These are the messages with just 1 MDS.
I plan to switch to multiple MDSes and recreate the slow requests RDLOCK/WRLOCK issue.

ALL OF THE FOLLOWING IS WITH 1 MDS. Like I said, it gets much worse with 3.

 <pre> 
 [root@cmp03 ~]# ceph -s 
   cluster: 
     id:       FSID 
     health: HEALTH_WARN 
             2 clients failing to respond to cache pressure 
             1 MDSs report damaged metadata 
             too few PGs per OSD (15 < min 30) 

   services: 
     mon: 3 daemons, quorum mon01,mon02,mon03 (age 12d) 
     mgr: mon01(active, since 12d), standbys: mon02, mon03 
     mds: nfs:1 {0=cmp03=up:active} 3 up:standby 
     osd: 64 osds: 64 up (since 12d), 64 in (since 9w) 

   data: 
     pools:     8 pools, 336 pgs 
     objects: 4.78M objects, 6.6 TiB 
     usage:     43 TiB used, 446 TiB / 488 TiB avail 
     pgs:       336 active+clean 

   io: 
     client:     0 B/s rd, 338 KiB/s wr, 14 op/s rd, 16 op/s wr 

 [root@cmp03 ~]# ceph health detail 
 HEALTH_ERR 2 clients failing to respond to cache pressure; 1 MDSs report damaged metadata; too few PGs per OSD (15 < min 30) 
 MDS_CLIENT_RECALL 2 clients failing to respond to cache pressure 
     mdscmp03(mds.0): Client cmp04. failing to respond to cache pressure client_id: 106550698 
     mdscmp03(mds.0): Client cmp04. failing to respond to cache pressure client_id: 106550713 
 MDS_DAMAGE 1 MDSs report damaged metadata 
     mdscmp03(mds.0): Metadata damage detected 
 TOO_FEW_PGS too few PGs per OSD (15 < min 30) 

 [root@cmp04 ~]# ceph tell mds.cmp03 client ls 
 2020-04-07 08:04:02.930 7ff9457fa700    0 client.107373917 ms_handle_reset on v2:10.10.48.102:6800/3250397436 
 2020-04-07 08:04:02.963 7ff9467fc700    0 client.107312857 ms_handle_reset on v2:10.10.48.102:6800/3250397436 
 [ 
     { 
         "id": 106550713, 
         "entity": { 
             "name": { 
                 "type": "client", 
                 "num": 106550713 
             }, 
             "addr": { 
                 "type": "any", 
                 "addr": "10.10.48.103:0", 
                 "nonce": 452355589 
             } 
         }, 
         "state": "open", 
         "num_leases": 0, 
         "num_caps": 95382, 
         "request_load_avg": 0, 
         "uptime": 59384.837150430998, 
         "requests_in_flight": 0, 
         "completed_requests": 5, 
         "reconnecting": false, 
         "recall_caps": { 
             "value": 286887.99426187889, 
             "halflife": 60 
         }, 
         "release_caps": { 
             "value": 0, 
             "halflife": 60 
         }, 
         "recall_caps_throttle": { 
             "value": 10319.76544103905, 
             "halflife": 2.5 
         }, 
         "recall_caps_throttle2o": { 
             "value": 815.64572559331623, 
             "halflife": 0.5 
         }, 
         "session_cache_liveness": { 
             "value": 0, 
             "halflife": 300 
         }, 
         "inst": "client.106550713 10.10.48.103:0/452355589", 
         "completed_requests": [ 
             { 
                 "tid": 1367968, 
                 "created_ino": "0x0" 
             }, 
             { 
                 "tid": 1367969, 
                 "created_ino": "0x0" 
             }, 
             { 
                 "tid": 1367975, 
                 "created_ino": "0x0" 
             }, 
             { 
                 "tid": 1367977, 
                 "created_ino": "0x0" 
             }, 
             { 
                 "tid": 1367979, 
                 "created_ino": "0x0" 
             } 
         ], 
         "prealloc_inos": [ 
             { 
                 "start": 1099522467650, 
                 "length": 287 
             }, 
             { 
                 "start": 1099522469581, 
                 "length": 505 
             } 
         ], 
         "used_inos": [], 
         "client_metadata": { 
             "features": "0x0000000000001fff", 
             "ceph_sha1": "2d095e947a02261ce61424021bb43bd3022d35cb", 
             "ceph_version": "ceph version 14.2.8 (2d095e947a02261ce61424021bb43bd3022d35cb) nautilus (stable)", 
             "entity_id": "ganesha-9209657c-3804-4c42-99e3-d669f936a3f4", 
             "hostname": "cmp04.", 
             "pid": "1630708", 
             "root": "/volumes/_nogroup/9209657c-3804-4c42-99e3-d669f936a3f4", 
             "timeout": "300", 
             "uuid": "ganesha-cmp04.-0411" 
         } 
     }, 
     { 
         "id": 107031118, 
         "entity": { 
             "name": { 
                 "type": "client", 
                 "num": 107031118 
             }, 
             "addr": { 
                 "type": "v1", 
                 "addr": "10.10.48.103:0", 
                 "nonce": 3527069069 
             } 
         }, 
         "state": "open", 
         "num_leases": 0, 
         "num_caps": 1, 
         "request_load_avg": 0, 
         "uptime": 59384.837150430998, 
         "requests_in_flight": 0, 
         "completed_requests": 0, 
         "reconnecting": false, 
         "recall_caps": { 
             "value": 0, 
             "halflife": 60 
         }, 
         "release_caps": { 
             "value": 0, 
             "halflife": 60 
         }, 
         "recall_caps_throttle": { 
             "value": 0, 
             "halflife": 2.5 
         }, 
         "recall_caps_throttle2o": { 
             "value": 0, 
             "halflife": 0.5 
         }, 
         "session_cache_liveness": { 
             "value": 0, 
             "halflife": 300 
         }, 
         "inst": "client.107031118 v1:10.10.48.103:0/3527069069", 
         "completed_requests": [], 
         "prealloc_inos": [], 
         "used_inos": [], 
         "client_metadata": { 
             "features": "0x00000000000000ff", 
             "entity_id": "admin", 
             "hostname": "cmp04.", 
             "kernel_version": "3.10.0-1062.12.1.el7.x86_64", 
             "root": "/volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7" 
         } 
     }, 
     { 
         "id": 106550698, 
         "entity": { 
             "name": { 
                 "type": "client", 
                 "num": 106550698 
             }, 
             "addr": { 
                 "type": "any", 
                 "addr": "10.10.48.103:0", 
                 "nonce": 4140370337 
             } 
         }, 
         "state": "open", 
         "num_leases": 0, 
         "num_caps": 4771, 
         "request_load_avg": 0, 
         "uptime": 59384.837150430998, 
         "requests_in_flight": 0, 
         "completed_requests": 0, 
         "reconnecting": false, 
         "recall_caps": { 
             "value": 302690.92799215915, 
             "halflife": 60 
         }, 
         "release_caps": { 
             "value": 0, 
             "halflife": 60 
         }, 
         "recall_caps_throttle": { 
             "value": 12256.831612812737, 
             "halflife": 2.5 
         }, 
         "recall_caps_throttle2o": { 
             "value": 2601.7867175086562, 
             "halflife": 0.5 
         }, 
         "session_cache_liveness": { 
             "value": 0, 
             "halflife": 300 
         }, 
         "inst": "client.106550698 10.10.48.103:0/4140370337", 
         "completed_requests": [], 
         "prealloc_inos": [ 
             { 
                 "start": 1099522471904, 
                 "length": 687 
             } 
         ], 
         "used_inos": [], 
         "client_metadata": { 
             "features": "0x0000000000001fff", 
             "ceph_sha1": "2d095e947a02261ce61424021bb43bd3022d35cb", 
             "ceph_version": "ceph version 14.2.8 (2d095e947a02261ce61424021bb43bd3022d35cb) nautilus (stable)", 
             "entity_id": "ganesha-a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7", 
             "hostname": "cmp04.", 
             "pid": "1630708", 
             "root": "/volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7", 
             "timeout": "300", 
             "uuid": "ganesha-cmp04.-03fe" 
         } 
     }, 
     { 
         "id": 106103709, 
         "entity": { 
             "name": { 
                 "type": "client", 
                 "num": 106103709 
             }, 
             "addr": { 
                 "type": "v1", 
                 "addr": "10.10.48.103:0", 
                 "nonce": 1266568608 
             } 
         }, 
         "state": "open", 
         "num_leases": 0, 
         "num_caps": 3741, 
         "request_load_avg": 3, 
         "uptime": 59384.837150430998, 
         "requests_in_flight": 0, 
         "completed_requests": 1, 
         "reconnecting": false, 
         "recall_caps": { 
             "value": 0, 
             "halflife": 60 
         }, 
         "release_caps": { 
             "value": 0, 
             "halflife": 60 
         }, 
         "recall_caps_throttle": { 
             "value": 0, 
             "halflife": 2.5 
         }, 
         "recall_caps_throttle2o": { 
             "value": 0, 
             "halflife": 0.5 
         }, 
         "session_cache_liveness": { 
             "value": 136.57798974559933, 
             "halflife": 300 
         }, 
         "inst": "client.106103709 v1:10.10.48.103:0/1266568608", 
         "completed_requests": [ 
             { 
                 "tid": 4537378, 
                 "created_ino": "0x0" 
             } 
         ], 
         "prealloc_inos": [], 
         "used_inos": [], 
         "client_metadata": { 
             "features": "0x00000000000000ff", 
             "entity_id": "admin", 
             "hostname": "cmp04.", 
             "kernel_version": "3.10.0-1062.12.1.el7.x86_64", 
             "root": "/volumes/_nogroup/9209657c-3804-4c42-99e3-d669f936a3f4" 
         } 
     } 
 ] 

 [root@cmp04 ~]# ceph tell mds.cmp03 damage ls 
 .... 
 7000+ backtrace errors 
 .... 
 { 
     "damage_type": "backtrace", 
     "id": 4289458104, 
     "ino": 1099522468770, 
     "path": "/volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7/namdoutput/analysis_apo/tunneltriangle_20dcds_6ticN11_lag45_nearest10_hog/msm_6tic_N11_lag45.txt" 
 }, 
 { 
     "damage_type": "backtrace", 
     "id": 4291358526, 
     "ino": 1099522468884, 
     "path": "/volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7/namdoutput/analysis_apo/tunneltriangle_20dcds_6ticN11_lag3_nearest10_hog/state_apo_6tic_N11_lag3_9.pdb" 
 }, 
 { 
     "damage_type": "backtrace", 
     "id": 4291586401, 
     "ino": 1099522467412, 
     "path": "/volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7/namdoutput/analysis_apo/tunneltriangle_20dcds_6ticN11_lag25_nearest10_hog/state_apo_6tic_N11_lag25_1.pdb" 
 }, 
 { 
     "damage_type": "backtrace", 
     "id": 4293141981, 
     "ino": 1099522471402, 
     "path": "/volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7/namdoutput/analysis_apo/corr_frames10/p450_apoN11_lag83_spearmancorr.npy" 
 }, 
 { 
     "damage_type": "backtrace", 
     "id": 4293413627, 
     "ino": 1099522467311, 
     "path": "/volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7/namdoutput/analysis_apo/tunneltriangle_20dcds_6ticN11_lag29_nearest10_hog/state_apo_6tic_N11_lag29_0.pdb" 
 }, 
 { 
     "damage_type": "backtrace", 
     "id": 4294040932, 
     "ino": 1099522468824, 
     "path": "/volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7/namdoutput/analysis_apo/tunneltriangle_20dcds_6ticN11_lag31_nearest10_hog/state_apo_6tic_N11_lag31_10.pdb" 


 [root@cmp03 ~]# cat /etc/ceph/ceph.conf 
 [global] 
 fsid = FSID 
 mon_initial_members = mon01, mon02, mon03 
 mon_host = 10.10.48.27,10.10.48.28,10.10.48.29 
 auth_cluster_required = cephx 
 auth_service_required = cephx 
 auth_client_required = cephx 

 public_network = 10.10.48.0/24 
 cluster_network= 10.10.53.0/24 

 mon_allow_pool_delete = true 

 log_to_syslog = true 
 err_to_syslog = true 
 mon_cluster_log_to_syslog = true 
 clog_to_syslog = true 

 [mds] 
 mds_cache_memory_limit = 68719476736 
 mds_cache_reservation = 0.05 

 [osd] 
 bluestore_cache_size_hdd = 8589934592 
 bluestore_cache_size_ssd = 8589934592 

 [client] 
 rbd_cache = true 
 rbd_cache_size = 67108864 
 rbd_cache_max_dirty = 50331648 
 rbd_cache_target_dirty = 33554432 
 rbd_cache_max_dirty_age = 2 
 rbd_cache_writethrough_until_flush = true 
 rbd_concurrent_management_ops = 20 

 [client.manila] 
 client_mount_uid = 0 
 client_mount_gid = 0 
 keyring = /etc/ceph/manila.keyring 

 [root@cmp04 ~]# cat /etc/ganesha/ceph.conf 
 LOG 
 { 
	 Default_Log_Level = DEBUG; 
 } 
 NFS_CORE_PARAM 
 { 
	 Enable_NLM = false; 
	 Enable_RQUOTA = false; 
	 Protocols = 4; 
 } 
 NFSv4 
 { 
	 Delegations = true; 
	 RecoveryBackend = rados_ng; 
	 Minor_Versions =    1,2; 
 } 
 MDCACHE { 
	 Dir_Chunk = 0; 
	 NParts = 1; 
	 Cache_Size = 1; 
 } 
 RADOS_URLS 
 { 
	 Ceph_Conf = /etc/ceph/ceph.conf; 
	 UserId = "admin"; 
	 watch_url = "rados://nfs-ganesha/ganesha-export-counter"; 
 } 
 %url rados://nfs-ganesha/ganesha-export-index 
 CEPH 
 { 
 } 
 RADOS_KV 
 { 
	 Ceph_Conf = /etc/ceph/ceph.conf; 
	 UserId = "admin"; 
	 pool = "nfs-ganesha"; 
 } 
 NFS_KRB5 
 { 
        Active_krb5 = false; 
 } 




 RADOS EXPORT EXAMPLE: 

 EXPORT { 
     CLIENT { 
         Clients = {ips} 
         Access_Type = "rw"; 
     } 

     SecType = "sys"; 
     Pseudo = "/volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7"; 
     Squash = "None"; 
     FSAL { 
         Secret_Access_Key = {secret} 
         User_Id = {user} 
         Name = "Ceph"; 
     } 
     Tag = "share-a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7"; 
     Path = "/volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7"; 
     Export_Id = 1022; 
 } 
 </pre> 



 




Here are the logs while running the scrub. About 7,485 lines of this; all look the same.

 <pre> 
 2020-04-07 07:54:16.326 7f3308d21700    0 log_channel(cluster) log [WRN] : Scrub error on inode 0x10000a576dc (/volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7/namdoutput/analysis_apo/tunneltriangle_20dcds_6ticN11_lag3700_nearest10_hog/state_apo_6tic_N11_lag3700_5.pdb) see mds.cmp03 log and `damage ls` output for details 
 2020-04-07 07:54:16.326 7f3308d21700    0 log_channel(cluster) do_log log to syslog 
 2020-04-07 07:54:16.326 7f3308d21700 -1 mds.0.scrubstack _validate_inode_done scrub error on inode [inode 0x10000a576dc [2,head] /volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7/namdoutput/analysis_apo/tunneltriangle_20dcds_6ticN11_lag3700_nearest10_hog/state_apo_6tic_N11_lag3700_5.pdb auth v62 dirtyparent s=6229785 n(v0 rc2020-04-06 18:20:15.392548 b6229785 1=1+0) (iversion lock) caps={106550698=pAsLsXsFsc/-@4},l=106550698(-1) | ptrwaiter=0 request=0 lock=0 caps=1 dirtyparent=1 scrubqueue=0 dirty=1 authpin=0 0x555a9bd6ee00]: {"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-2,"ondisk_value":"(-1)0x0:[]//","memoryvalue":"(39)0x10000a576dc:[<0x10000a576c9/state_apo_6tic_N11_lag3700_5.pdb v62>,<0x10000003c9f/tunneltriangle_20dcds_6ticN11_lag3700_nearest10_hog v103691>,<0x40000000f4e/analysis_apo v189475>,<0x400000011a5/namdoutput v89091>,<0x10000000001/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7 v3379713>,<0x10000000000/_nogroup v3104037>,<0x1/volumes v1692870>]//","error_str":"failed to decode on-disk backtrace (0 bytes)!"},"raw_stats":{"checked":false,"passed":false,"read_ret_val":0,"ondisk_value.dirstat":"f()","ondisk_value.rstat":"n()","memory_value.dirrstat":"f()","memory_value.rstat":"n()","error_str":""},"return_code":-2} 
 2020-04-07 07:54:16.326 7f3308d21700    0 log_channel(cluster) log [WRN] : Scrub error on inode 0x10000a576e7 (/volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7/namdoutput/analysis_apo/tunneltriangle_20dcds_6ticN11_lag3700_nearest10_hog/msm_6tic_N11_lag3700.txt) see mds.cmp03 log and `damage ls` output for details 
 2020-04-07 07:54:16.326 7f3308d21700    0 log_channel(cluster) do_log log to syslog 
 2020-04-07 07:54:16.326 7f3308d21700 -1 mds.0.scrubstack _validate_inode_done scrub error on inode [inode 0x10000a576e7 [2,head] /volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7/namdoutput/analysis_apo/tunneltriangle_20dcds_6ticN11_lag3700_nearest10_hog/msm_6tic_N11_lag3700.txt auth v116 dirtyparent s=2065 n(v0 rc2020-04-06 18:20:52.995274 b2065 1=1+0) (iversion lock) caps={106550698=pAsLsXsFsc/-@2},l=106550698(-1) | ptrwaiter=0 request=0 lock=0 caps=1 dirtyparent=1 scrubqueue=0 dirty=1 authpin=0 0x555a4541dc00]: {"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-2,"ondisk_value":"(-1)0x0:[]//","memoryvalue":"(39)0x10000a576e7:[<0x10000a576c9/msm_6tic_N11_lag3700.txt v116>,<0x10000003c9f/tunneltriangle_20dcds_6ticN11_lag3700_nearest10_hog v103691>,<0x40000000f4e/analysis_apo v189475>,<0x400000011a5/namdoutput v89091>,<0x10000000001/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7 v3379713>,<0x10000000000/_nogroup v3104037>,<0x1/volumes v1692870>]//","error_str":"failed to decode on-disk backtrace (0 bytes)!"},"raw_stats":{"checked":false,"passed":false,"read_ret_val":0,"ondisk_value.dirstat":"f()","ondisk_value.rstat":"n()","memory_value.dirrstat":"f()","memory_value.rstat":"n()","error_str":""},"return_code":-2} 
 2020-04-07 07:54:16.326 7f3308d21700    0 log_channel(cluster) log [WRN] : Scrub error on inode 0x10000a576c9 (/volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7/namdoutput/analysis_apo/tunneltriangle_20dcds_6ticN11_lag3700_nearest10_hog) see mds.cmp03 log and `damage ls` output for details 
 2020-04-07 07:54:16.326 7f3308d21700    0 log_channel(cluster) do_log log to syslog 
 2020-04-07 07:54:16.326 7f3308d21700 -1 mds.0.scrubstack _validate_inode_done scrub error on inode [inode 0x10000a576c9 [...2,head] /volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7/namdoutput/analysis_apo/tunneltriangle_20dcds_6ticN11_lag3700_nearest10_hog/ auth v103691 dirtyparent f(v0 m2020-04-06 18:20:52.988600 12=12+0) n(v0 rc2020-04-06 18:20:52.995274 b68529700 13=12+1) (inest lock) (ifile excl) (iversion lock) caps={106550698=pAsLsXsFsx/AsLsXsFsx@14},l=106550698 | request=0 lock=0 dirfrag=1 caps=1 dirtyparent=1 scrubqueue=0 dirty=1 waiter=0 authpin=0 0x555a53c58700]: {"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-2,"ondisk_value":"(-1)0x0:[]//","memoryvalue":"(38)0x10000a576c9:[<0x10000003c9f/tunneltriangle_20dcds_6ticN11_lag3700_nearest10_hog v103691>,<0x40000000f4e/analysis_apo v189475>,<0x400000011a5/namdoutput v89091>,<0x10000000001/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7 v3379713>,<0x10000000000/_nogroup v3104037>,<0x1/volumes v1692870>]//","error_str":"failed to decode on-disk backtrace (0 bytes)!"},"raw_stats":{"checked":true,"passed":true,"read_ret_val":0,"ondisk_value.dirstat":"f(v0 m2020-04-06 18:20:52.988600 12=12+0)","ondisk_value.rstat":"n(v0 rc2020-04-06 18:20:52.995274 b68529700 13=12+1)","memory_value.dirrstat":"f(v0 m2020-04-06 18:20:52.988600 12=12+0)","memory_value.rstat":"n(v0 rc2020-04-06 18:20:52.995274 b68529700 13=12+1)","error_str":""},"return_code":-2} 
 2020-04-07 07:54:16.356 7f3308d21700    0 log_channel(cluster) log [INF] : scrub complete with tag 'ce3c7e20-71eb-4bb6-bbc5-d2463ba8ff38' 
 2020-04-07 07:54:16.356 7f3308d21700    0 log_channel(cluster) do_log log to syslog 
 2020-04-07 07:54:18.698 7f330f52e700    1 mds.cmp03 Updating MDS map to version 26580 from mon.2 
 2020-04-07 09:17:58.270 7f330f52e700    1 mds.cmp03 Updating MDS map to version 26581 from mon.2 
 </pre> 

 


Scrub repair logs:

 <pre> 
 2020-04-07 09:23:44.230 7f3308d21700    0 log_channel(cluster) do_log log to syslog 
 2020-04-07 09:23:44.230 7f3308d21700 -1 mds.0.scrubstack _validate_inode_done scrub error on inode [inode 0x10000a576dc [2,head] /volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7/namdoutput/analysis_apo/tunneltriangle_20dcds_6ticN11_lag3700_nearest10_hog/state_apo_6tic_N11_lag3700_5.pdb auth v62 dirtyparent s=6229785 n(v0 rc2020-04-06 18:20:15.392548 b6229785 1=1+0) (iversion lock) caps={106550698=pAsLsXsFsc/-@4},l=106550698(-1) | ptrwaiter=0 request=0 lock=0 caps=1 dirtyparent=1 scrubqueue=0 dirty=1 authpin=0 0x555a9bd6ee00]: {"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-2,"ondisk_value":"(-1)0x0:[]//","memoryvalue":"(39)0x10000a576dc:[<0x10000a576c9/state_apo_6tic_N11_lag3700_5.pdb v62>,<0x10000003c9f/tunneltriangle_20dcds_6ticN11_lag3700_nearest10_hog v103691>,<0x40000000f4e/analysis_apo v189475>,<0x400000011a5/namdoutput v89091>,<0x10000000001/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7 v3379713>,<0x10000000000/_nogroup v3104143>,<0x1/volumes v1692976>]//","error_str":"failed to decode on-disk backtrace (0 bytes)!"},"raw_stats":{"checked":false,"passed":false,"read_ret_val":0,"ondisk_value.dirstat":"f()","ondisk_value.rstat":"n()","memory_value.dirrstat":"f()","memory_value.rstat":"n()","error_str":""},"return_code":-2} 
 2020-04-07 09:23:44.230 7f3308d21700    0 log_channel(cluster) log [WRN] : bad backtrace on inode 0x10000a576e7(/volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7/namdoutput/analysis_apo/tunneltriangle_20dcds_6ticN11_lag3700_nearest10_hog/msm_6tic_N11_lag3700.txt), rewriting it 
 2020-04-07 09:23:44.230 7f3308d21700    0 log_channel(cluster) do_log log to syslog 
 2020-04-07 09:23:44.230 7f3308d21700    0 log_channel(cluster) log [INF] : Scrub repaired inode 0x10000a576e7 (/volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7/namdoutput/analysis_apo/tunneltriangle_20dcds_6ticN11_lag3700_nearest10_hog/msm_6tic_N11_lag3700.txt) 
 2020-04-07 09:23:44.230 7f3308d21700    0 log_channel(cluster) do_log log to syslog 
 2020-04-07 09:23:44.230 7f3308d21700 -1 mds.0.scrubstack _validate_inode_done scrub error on inode [inode 0x10000a576e7 [2,head] /volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7/namdoutput/analysis_apo/tunneltriangle_20dcds_6ticN11_lag3700_nearest10_hog/msm_6tic_N11_lag3700.txt auth v116 dirtyparent s=2065 n(v0 rc2020-04-06 18:20:52.995274 b2065 1=1+0) (iversion lock) caps={106550698=pAsLsXsFsc/-@2},l=106550698(-1) | ptrwaiter=0 request=0 lock=0 caps=1 dirtyparent=1 scrubqueue=0 dirty=1 authpin=0 0x555a4541dc00]: {"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-2,"ondisk_value":"(-1)0x0:[]//","memoryvalue":"(39)0x10000a576e7:[<0x10000a576c9/msm_6tic_N11_lag3700.txt v116>,<0x10000003c9f/tunneltriangle_20dcds_6ticN11_lag3700_nearest10_hog v103691>,<0x40000000f4e/analysis_apo v189475>,<0x400000011a5/namdoutput v89091>,<0x10000000001/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7 v3379713>,<0x10000000000/_nogroup v3104143>,<0x1/volumes v1692976>]//","error_str":"failed to decode on-disk backtrace (0 bytes)!"},"raw_stats":{"checked":false,"passed":false,"read_ret_val":0,"ondisk_value.dirstat":"f()","ondisk_value.rstat":"n()","memory_value.dirrstat":"f()","memory_value.rstat":"n()","error_str":""},"return_code":-2} 
 2020-04-07 09:23:44.231 7f3308d21700    0 log_channel(cluster) log [WRN] : bad backtrace on inode 0x10000a576c9(/volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7/namdoutput/analysis_apo/tunneltriangle_20dcds_6ticN11_lag3700_nearest10_hog), rewriting it 
 2020-04-07 09:23:44.231 7f3308d21700    0 log_channel(cluster) do_log log to syslog 
 2020-04-07 09:23:44.231 7f3308d21700    0 log_channel(cluster) log [INF] : Scrub repaired inode 0x10000a576c9 (/volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7/namdoutput/analysis_apo/tunneltriangle_20dcds_6ticN11_lag3700_nearest10_hog) 
 2020-04-07 09:23:44.231 7f3308d21700    0 log_channel(cluster) do_log log to syslog 
 2020-04-07 09:23:44.231 7f3308d21700 -1 mds.0.scrubstack _validate_inode_done scrub error on inode [inode 0x10000a576c9 [...2,head] /volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7/namdoutput/analysis_apo/tunneltriangle_20dcds_6ticN11_lag3700_nearest10_hog/ auth v103691 dirtyparent f(v0 m2020-04-06 18:20:52.988600 12=12+0) n(v0 rc2020-04-06 18:20:52.995274 b68529700 13=12+1) (inest lock) (ifile excl) (iversion lock) caps={106550698=pAsLsXsFsx/AsLsXsFsx@16},l=106550698 | request=0 lock=0 dirfrag=1 caps=1 dirtyparent=1 scrubqueue=0 dirty=1 waiter=0 authpin=0 0x555a53c58700]: {"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-2,"ondisk_value":"(-1)0x0:[]//","memoryvalue":"(38)0x10000a576c9:[<0x10000003c9f/tunneltriangle_20dcds_6ticN11_lag3700_nearest10_hog v103691>,<0x40000000f4e/analysis_apo v189475>,<0x400000011a5/namdoutput v89091>,<0x10000000001/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7 v3379713>,<0x10000000000/_nogroup v3104143>,<0x1/volumes v1692976>]//","error_str":"failed to decode on-disk backtrace (0 bytes)!"},"raw_stats":{"checked":true,"passed":true,"read_ret_val":0,"ondisk_value.dirstat":"f(v0 m2020-04-06 18:20:52.988600 12=12+0)","ondisk_value.rstat":"n(v0 rc2020-04-06 18:20:52.995274 b68529700 13=12+1)","memory_value.dirrstat":"f(v0 m2020-04-06 18:20:52.988600 12=12+0)","memory_value.rstat":"n(v0 rc2020-04-06 18:20:52.995274 b68529700 13=12+1)","error_str":""},"return_code":-2} 
 2020-04-07 09:27:33.890 7f3308d21700    0 log_channel(cluster) log [INF] : scrub complete with tag '47c8a435-7fdd-40f1-879f-3cb435ce530f' 
 2020-04-07 09:27:33.890 7f3308d21700    0 log_channel(cluster) do_log log to syslog 
 2020-04-07 09:31:42.355 7f330f52e700    1 mds.cmp03 Updating MDS map to version 26582 from mon.2 
</pre>
