Bug #44976


MDS problems: slow requests, cache pressure, damaged metadata after upgrading from 14.2.7 to 14.2.8

Added by mitchell walls about 4 years ago. Updated almost 2 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Performance/Resource Usage
Target version:
% Done:

0%

Source:
Community (user)
Tags:
cephfs ganesha mds ERR_FSAL_SERVERFAULT
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
ceph-deploy
Component(FS):
Ganesha FSAL
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I originally posted this on the nfs-ganesha GitHub issues. They referred me here since they think it is related to a problem in libcephfs or the MDS; their thinking was that Attr_Expiration_Time = 0 was aggravating an underlying issue rather than being the issue itself.
I am planning on upgrading to Octopus, but I figured I would wait a couple of weeks before doing that while we figure this out.

OS: CentOS 7.7.1908
Versions:
Ceph - 14.2.8
NFS-Ganesha - 2.8.3

Twelve days ago I upgraded from 14.2.7 to 14.2.8, and I started seeing the cache pressure messages after that upgrade.
The CephFS system then sat mostly dormant for over a week; last Thursday all of the following started.

Firstly, whether I have 1 MDS or 3, I tend to see clients failing to respond to cache pressure.
I also always find that the MDS reports damaged metadata, and the damage pops right back up after doing a scrub.

If I have 3 MDSes, I start getting "3 MDSs report slow requests"; the requests are failing to acquire RDLOCK/WRLOCK.
I should be able to recreate this; I just want to make sure I capture all of the logs needed when I do.
Let me know which logs and log levels you would like, so I collect the right ones.
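
If it helps, I can raise the MDS debug levels at runtime along these lines (a sketch using the generic injectargs interface; exact levels per your preference):

[root@cmp03 ~]# ceph tell mds.cmp03 injectargs '--debug_mds 20 --debug_ms 1'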

The cluster has existed for almost a year (10 months) and has run Nautilus that entire time.
There have been no disk or other hardware failures in that time.
The NFS CephFS share has existed pretty much since the beginning, with the exact same use case.

I can repair the metadata, but the damage will eventually show right back up.

Also, once the filesystem gets pretty bad with 3 MDSes and metadata damage,
the slow-requests issue will arise just from listing files in a directory on a kernel mount.

NFS-Ganesha logs at debug level show nothing at all. If I increase to 3 MDSes, then they start showing this:
03/04/2020 13:57:51 : epoch 5e866ee0 : cmp04: ganesha.nfsd-1096024[svc_595] posix2fsal_error :FSAL :CRIT :Mapping 108(default) to ERR_FSAL_SERVERFAULT

I am using the PG autoscaler for this cluster, so besides the CephFS warnings that is the only thing I get.
These are the messages with just 1 MDS.
I plan to switch to multiple MDSes and recreate the slow-requests RDLOCK/WRLOCK issue.

ALL OF THE FOLLOWING IS WITH 1 MDS. Like I said, it gets much worse with 3.

[root@cmp03 ~]# ceph -s
  cluster:
    id:     FSID
    health: HEALTH_WARN
            2 clients failing to respond to cache pressure
            1 MDSs report damaged metadata
            too few PGs per OSD (15 < min 30)

  services:
    mon: 3 daemons, quorum mon01,mon02,mon03 (age 12d)
    mgr: mon01(active, since 12d), standbys: mon02, mon03
    mds: nfs:1 {0=cmp03=up:active} 3 up:standby
    osd: 64 osds: 64 up (since 12d), 64 in (since 9w)

  data:
    pools:   8 pools, 336 pgs
    objects: 4.78M objects, 6.6 TiB
    usage:   43 TiB used, 446 TiB / 488 TiB avail
    pgs:     336 active+clean

  io:
    client:   0 B/s rd, 338 KiB/s wr, 14 op/s rd, 16 op/s wr

[root@cmp03 ~]# ceph health detail
HEALTH_ERR 2 clients failing to respond to cache pressure; 1 MDSs report damaged metadata; too few PGs per OSD (15 < min 30)
MDS_CLIENT_RECALL 2 clients failing to respond to cache pressure
    mdscmp03(mds.0): Client cmp04. failing to respond to cache pressure client_id: 106550698
    mdscmp03(mds.0): Client cmp04. failing to respond to cache pressure client_id: 106550713
MDS_DAMAGE 1 MDSs report damaged metadata
    mdscmp03(mds.0): Metadata damage detected
TOO_FEW_PGS too few PGs per OSD (15 < min 30)
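
If it would help, I can evict the two stuck Ganesha sessions flagged above so they reconnect with fresh sessions (a sketch using one of the session ids; I know eviction will briefly interrupt the NFS clients):

[root@cmp03 ~]# ceph tell mds.cmp03 client evict id=106550698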

[root@cmp04 ~]# ceph tell mds.cmp03 client ls
2020-04-07 08:04:02.930 7ff9457fa700  0 client.107373917 ms_handle_reset on v2:10.10.48.102:6800/3250397436
2020-04-07 08:04:02.963 7ff9467fc700  0 client.107312857 ms_handle_reset on v2:10.10.48.102:6800/3250397436
[
    {
        "id": 106550713,
        "entity": {
            "name": {
                "type": "client",
                "num": 106550713
            },
            "addr": {
                "type": "any",
                "addr": "10.10.48.103:0",
                "nonce": 452355589
            }
        },
        "state": "open",
        "num_leases": 0,
        "num_caps": 95382,
        "request_load_avg": 0,
        "uptime": 59384.837150430998,
        "requests_in_flight": 0,
        "completed_requests": 5,
        "reconnecting": false,
        "recall_caps": {
            "value": 286887.99426187889,
            "halflife": 60
        },
        "release_caps": {
            "value": 0,
            "halflife": 60
        },
        "recall_caps_throttle": {
            "value": 10319.76544103905,
            "halflife": 2.5
        },
        "recall_caps_throttle2o": {
            "value": 815.64572559331623,
            "halflife": 0.5
        },
        "session_cache_liveness": {
            "value": 0,
            "halflife": 300
        },
        "inst": "client.106550713 10.10.48.103:0/452355589",
        "completed_requests": [
            {
                "tid": 1367968,
                "created_ino": "0x0" 
            },
            {
                "tid": 1367969,
                "created_ino": "0x0" 
            },
            {
                "tid": 1367975,
                "created_ino": "0x0" 
            },
            {
                "tid": 1367977,
                "created_ino": "0x0" 
            },
            {
                "tid": 1367979,
                "created_ino": "0x0" 
            }
        ],
        "prealloc_inos": [
            {
                "start": 1099522467650,
                "length": 287
            },
            {
                "start": 1099522469581,
                "length": 505
            }
        ],
        "used_inos": [],
        "client_metadata": {
            "features": "0x0000000000001fff",
            "ceph_sha1": "2d095e947a02261ce61424021bb43bd3022d35cb",
            "ceph_version": "ceph version 14.2.8 (2d095e947a02261ce61424021bb43bd3022d35cb) nautilus (stable)",
            "entity_id": "ganesha-9209657c-3804-4c42-99e3-d669f936a3f4",
            "hostname": "cmp04.",
            "pid": "1630708",
            "root": "/volumes/_nogroup/9209657c-3804-4c42-99e3-d669f936a3f4",
            "timeout": "300",
            "uuid": "ganesha-cmp04.-0411" 
        }
    },
    {
        "id": 107031118,
        "entity": {
            "name": {
                "type": "client",
                "num": 107031118
            },
            "addr": {
                "type": "v1",
                "addr": "10.10.48.103:0",
                "nonce": 3527069069
            }
        },
        "state": "open",
        "num_leases": 0,
        "num_caps": 1,
        "request_load_avg": 0,
        "uptime": 59384.837150430998,
        "requests_in_flight": 0,
        "completed_requests": 0,
        "reconnecting": false,
        "recall_caps": {
            "value": 0,
            "halflife": 60
        },
        "release_caps": {
            "value": 0,
            "halflife": 60
        },
        "recall_caps_throttle": {
            "value": 0,
            "halflife": 2.5
        },
        "recall_caps_throttle2o": {
            "value": 0,
            "halflife": 0.5
        },
        "session_cache_liveness": {
            "value": 0,
            "halflife": 300
        },
        "inst": "client.107031118 v1:10.10.48.103:0/3527069069",
        "completed_requests": [],
        "prealloc_inos": [],
        "used_inos": [],
        "client_metadata": {
            "features": "0x00000000000000ff",
            "entity_id": "admin",
            "hostname": "cmp04.",
            "kernel_version": "3.10.0-1062.12.1.el7.x86_64",
            "root": "/volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7" 
        }
    },
    {
        "id": 106550698,
        "entity": {
            "name": {
                "type": "client",
                "num": 106550698
            },
            "addr": {
                "type": "any",
                "addr": "10.10.48.103:0",
                "nonce": 4140370337
            }
        },
        "state": "open",
        "num_leases": 0,
        "num_caps": 4771,
        "request_load_avg": 0,
        "uptime": 59384.837150430998,
        "requests_in_flight": 0,
        "completed_requests": 0,
        "reconnecting": false,
        "recall_caps": {
            "value": 302690.92799215915,
            "halflife": 60
        },
        "release_caps": {
            "value": 0,
            "halflife": 60
        },
        "recall_caps_throttle": {
            "value": 12256.831612812737,
            "halflife": 2.5
        },
        "recall_caps_throttle2o": {
            "value": 2601.7867175086562,
            "halflife": 0.5
        },
        "session_cache_liveness": {
            "value": 0,
            "halflife": 300
        },
        "inst": "client.106550698 10.10.48.103:0/4140370337",
        "completed_requests": [],
        "prealloc_inos": [
            {
                "start": 1099522471904,
                "length": 687
            }
        ],
        "used_inos": [],
        "client_metadata": {
            "features": "0x0000000000001fff",
            "ceph_sha1": "2d095e947a02261ce61424021bb43bd3022d35cb",
            "ceph_version": "ceph version 14.2.8 (2d095e947a02261ce61424021bb43bd3022d35cb) nautilus (stable)",
            "entity_id": "ganesha-a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7",
            "hostname": "cmp04.",
            "pid": "1630708",
            "root": "/volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7",
            "timeout": "300",
            "uuid": "ganesha-cmp04.-03fe" 
        }
    },
    {
        "id": 106103709,
        "entity": {
            "name": {
                "type": "client",
                "num": 106103709
            },
            "addr": {
                "type": "v1",
                "addr": "10.10.48.103:0",
                "nonce": 1266568608
            }
        },
        "state": "open",
        "num_leases": 0,
        "num_caps": 3741,
        "request_load_avg": 3,
        "uptime": 59384.837150430998,
        "requests_in_flight": 0,
        "completed_requests": 1,
        "reconnecting": false,
        "recall_caps": {
            "value": 0,
            "halflife": 60
        },
        "release_caps": {
            "value": 0,
            "halflife": 60
        },
        "recall_caps_throttle": {
            "value": 0,
            "halflife": 2.5
        },
        "recall_caps_throttle2o": {
            "value": 0,
            "halflife": 0.5
        },
        "session_cache_liveness": {
            "value": 136.57798974559933,
            "halflife": 300
        },
        "inst": "client.106103709 v1:10.10.48.103:0/1266568608",
        "completed_requests": [
            {
                "tid": 4537378,
                "created_ino": "0x0" 
            }
        ],
        "prealloc_inos": [],
        "used_inos": [],
        "client_metadata": {
            "features": "0x00000000000000ff",
            "entity_id": "admin",
            "hostname": "cmp04.",
            "kernel_version": "3.10.0-1062.12.1.el7.x86_64",
            "root": "/volumes/_nogroup/9209657c-3804-4c42-99e3-d669f936a3f4" 
        }
    }
]
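
For context on the non-zero recall_caps values above, this is how I would dump the MDS recall settings to compare against (a sketch, assuming the Nautilus option names; run on the MDS host):

[root@cmp03 ~]# ceph daemon mds.cmp03 config show | grep mds_recall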

[root@cmp04 ~]# ceph tell mds.cmp03 damage ls
....
7000+ backtrace errors
....
{
    "damage_type": "backtrace",
    "id": 4289458104,
    "ino": 1099522468770,
    "path": "/volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7/namdoutput/analysis_apo/tunneltriangle_20dcds_6ticN11_lag45_nearest10_hog/msm_6tic_N11_lag45.txt" 
},
{
    "damage_type": "backtrace",
    "id": 4291358526,
    "ino": 1099522468884,
    "path": "/volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7/namdoutput/analysis_apo/tunneltriangle_20dcds_6ticN11_lag3_nearest10_hog/state_apo_6tic_N11_lag3_9.pdb" 
},
{
    "damage_type": "backtrace",
    "id": 4291586401,
    "ino": 1099522467412,
    "path": "/volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7/namdoutput/analysis_apo/tunneltriangle_20dcds_6ticN11_lag25_nearest10_hog/state_apo_6tic_N11_lag25_1.pdb" 
},
{
    "damage_type": "backtrace",
    "id": 4293141981,
    "ino": 1099522471402,
    "path": "/volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7/namdoutput/analysis_apo/corr_frames10/p450_apoN11_lag83_spearmancorr.npy" 
},
{
    "damage_type": "backtrace",
    "id": 4293413627,
    "ino": 1099522467311,
    "path": "/volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7/namdoutput/analysis_apo/tunneltriangle_20dcds_6ticN11_lag29_nearest10_hog/state_apo_6tic_N11_lag29_0.pdb" 
},
{
    "damage_type": "backtrace",
    "id": 4294040932,
    "ino": 1099522468824,
    "path": "/volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7/namdoutput/analysis_apo/tunneltriangle_20dcds_6ticN11_lag31_nearest10_hog/state_apo_6tic_N11_lag31_10.pdb" 

[root@cmp03 ~]# cat /etc/ceph/ceph.conf
[global]
fsid = FSID
mon_initial_members = mon01, mon02, mon03
mon_host = 10.10.48.27,10.10.48.28,10.10.48.29
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx

public_network = 10.10.48.0/24
cluster_network= 10.10.53.0/24

mon_allow_pool_delete = true

log_to_syslog = true
err_to_syslog = true
mon_cluster_log_to_syslog = true
clog_to_syslog = true

[mds]
mds_cache_memory_limit = 68719476736
mds_cache_reservation = 0.05

[osd]
bluestore_cache_size_hdd = 8589934592
bluestore_cache_size_ssd = 8589934592

[client]
rbd_cache = true
rbd_cache_size = 67108864
rbd_cache_max_dirty = 50331648
rbd_cache_target_dirty = 33554432
rbd_cache_max_dirty_age = 2
rbd_cache_writethrough_until_flush = true
rbd_concurrent_management_ops = 20

[client.manila]
client_mount_uid = 0
client_mount_gid = 0
keyring = /etc/ceph/manila.keyring
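
Since mds_cache_memory_limit is set to 64 GiB above, this is how I check actual cache usage against that limit (admin socket on the MDS host; a sketch):

[root@cmp03 ~]# ceph daemon mds.cmp03 cache status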

[root@cmp04 ~]# cat /etc/ganesha/ceph.conf
LOG
{
    Default_Log_Level = DEBUG;
}
NFS_CORE_PARAM
{
    Enable_NLM = false;
    Enable_RQUOTA = false;
    Protocols = 4;
}
NFSv4
{
    Delegations = true;
    RecoveryBackend = rados_ng;
    Minor_Versions =  1,2;
}
MDCACHE {
    Dir_Chunk = 0;
    NParts = 1;
    Cache_Size = 1;
}
RADOS_URLS
{
    Ceph_Conf = /etc/ceph/ceph.conf;
    UserId = "admin";
    watch_url = "rados://nfs-ganesha/ganesha-export-counter";
}
%url rados://nfs-ganesha/ganesha-export-index
CEPH
{
}
RADOS_KV
{
    Ceph_Conf = /etc/ceph/ceph.conf;
    UserId = "admin";
    pool = "nfs-ganesha";
}
NFS_KRB5
{
       Active_krb5 = false;
}

RADOS EXPORT EXAMPLE:

EXPORT {
    CLIENT {
        Clients = {ips}
        Access_Type = "rw";
    }

    SecType = "sys";
    Pseudo = "/volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7";
    Squash = "None";
    FSAL {
        Secret_Access_Key = {secret}
        User_Id = {user}
        Name = "Ceph";
    }
    Tag = "share-a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7";
    Path = "/volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7";
    Export_Id = 1022;
}
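
For reference, the Attr_Expiration_Time = 0 setting discussed on the nfs-ganesha issue would sit in the EXPORT block, roughly like this (a sketch; they suspected it aggravates the underlying issue rather than causing it):

EXPORT {
    # ... rest of export as above ...
    Attr_Expiration_Time = 0;  # disable Ganesha attribute caching
}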

Here are the logs from running the scrub. There are about 7,485 lines of this; all look the same.
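(For reference, I started the scrub roughly like this, using the Nautilus async scrub interface; a sketch with the path and opts as I used them:)

[root@cmp03 ~]# ceph tell mds.cmp03 scrub start / recursive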

2020-04-07 07:54:16.326 7f3308d21700  0 log_channel(cluster) log [WRN] : Scrub error on inode 0x10000a576dc (/volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7/namdoutput/analysis_apo/tunneltriangle_20dcds_6ticN11_lag3700_nearest10_hog/state_apo_6tic_N11_lag3700_5.pdb) see mds.cmp03 log and `damage ls` output for details
2020-04-07 07:54:16.326 7f3308d21700  0 log_channel(cluster) do_log log to syslog
2020-04-07 07:54:16.326 7f3308d21700 -1 mds.0.scrubstack _validate_inode_done scrub error on inode [inode 0x10000a576dc [2,head] /volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7/namdoutput/analysis_apo/tunneltriangle_20dcds_6ticN11_lag3700_nearest10_hog/state_apo_6tic_N11_lag3700_5.pdb auth v62 dirtyparent s=6229785 n(v0 rc2020-04-06 18:20:15.392548 b6229785 1=1+0) (iversion lock) caps={106550698=pAsLsXsFsc/-@4},l=106550698(-1) | ptrwaiter=0 request=0 lock=0 caps=1 dirtyparent=1 scrubqueue=0 dirty=1 authpin=0 0x555a9bd6ee00]: {"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-2,"ondisk_value":"(-1)0x0:[]//","memoryvalue":"(39)0x10000a576dc:[<0x10000a576c9/state_apo_6tic_N11_lag3700_5.pdb v62>,<0x10000003c9f/tunneltriangle_20dcds_6ticN11_lag3700_nearest10_hog v103691>,<0x40000000f4e/analysis_apo v189475>,<0x400000011a5/namdoutput v89091>,<0x10000000001/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7 v3379713>,<0x10000000000/_nogroup v3104037>,<0x1/volumes v1692870>]//","error_str":"failed to decode on-disk backtrace (0 bytes)!"},"raw_stats":{"checked":false,"passed":false,"read_ret_val":0,"ondisk_value.dirstat":"f()","ondisk_value.rstat":"n()","memory_value.dirrstat":"f()","memory_value.rstat":"n()","error_str":""},"return_code":-2}
2020-04-07 07:54:16.326 7f3308d21700  0 log_channel(cluster) log [WRN] : Scrub error on inode 0x10000a576e7 (/volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7/namdoutput/analysis_apo/tunneltriangle_20dcds_6ticN11_lag3700_nearest10_hog/msm_6tic_N11_lag3700.txt) see mds.cmp03 log and `damage ls` output for details
2020-04-07 07:54:16.326 7f3308d21700  0 log_channel(cluster) do_log log to syslog
2020-04-07 07:54:16.326 7f3308d21700 -1 mds.0.scrubstack _validate_inode_done scrub error on inode [inode 0x10000a576e7 [2,head] /volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7/namdoutput/analysis_apo/tunneltriangle_20dcds_6ticN11_lag3700_nearest10_hog/msm_6tic_N11_lag3700.txt auth v116 dirtyparent s=2065 n(v0 rc2020-04-06 18:20:52.995274 b2065 1=1+0) (iversion lock) caps={106550698=pAsLsXsFsc/-@2},l=106550698(-1) | ptrwaiter=0 request=0 lock=0 caps=1 dirtyparent=1 scrubqueue=0 dirty=1 authpin=0 0x555a4541dc00]: {"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-2,"ondisk_value":"(-1)0x0:[]//","memoryvalue":"(39)0x10000a576e7:[<0x10000a576c9/msm_6tic_N11_lag3700.txt v116>,<0x10000003c9f/tunneltriangle_20dcds_6ticN11_lag3700_nearest10_hog v103691>,<0x40000000f4e/analysis_apo v189475>,<0x400000011a5/namdoutput v89091>,<0x10000000001/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7 v3379713>,<0x10000000000/_nogroup v3104037>,<0x1/volumes v1692870>]//","error_str":"failed to decode on-disk backtrace (0 bytes)!"},"raw_stats":{"checked":false,"passed":false,"read_ret_val":0,"ondisk_value.dirstat":"f()","ondisk_value.rstat":"n()","memory_value.dirrstat":"f()","memory_value.rstat":"n()","error_str":""},"return_code":-2}
2020-04-07 07:54:16.326 7f3308d21700  0 log_channel(cluster) log [WRN] : Scrub error on inode 0x10000a576c9 (/volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7/namdoutput/analysis_apo/tunneltriangle_20dcds_6ticN11_lag3700_nearest10_hog) see mds.cmp03 log and `damage ls` output for details
2020-04-07 07:54:16.326 7f3308d21700  0 log_channel(cluster) do_log log to syslog
2020-04-07 07:54:16.326 7f3308d21700 -1 mds.0.scrubstack _validate_inode_done scrub error on inode [inode 0x10000a576c9 [...2,head] /volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7/namdoutput/analysis_apo/tunneltriangle_20dcds_6ticN11_lag3700_nearest10_hog/ auth v103691 dirtyparent f(v0 m2020-04-06 18:20:52.988600 12=12+0) n(v0 rc2020-04-06 18:20:52.995274 b68529700 13=12+1) (inest lock) (ifile excl) (iversion lock) caps={106550698=pAsLsXsFsx/AsLsXsFsx@14},l=106550698 | request=0 lock=0 dirfrag=1 caps=1 dirtyparent=1 scrubqueue=0 dirty=1 waiter=0 authpin=0 0x555a53c58700]: {"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-2,"ondisk_value":"(-1)0x0:[]//","memoryvalue":"(38)0x10000a576c9:[<0x10000003c9f/tunneltriangle_20dcds_6ticN11_lag3700_nearest10_hog v103691>,<0x40000000f4e/analysis_apo v189475>,<0x400000011a5/namdoutput v89091>,<0x10000000001/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7 v3379713>,<0x10000000000/_nogroup v3104037>,<0x1/volumes v1692870>]//","error_str":"failed to decode on-disk backtrace (0 bytes)!"},"raw_stats":{"checked":true,"passed":true,"read_ret_val":0,"ondisk_value.dirstat":"f(v0 m2020-04-06 18:20:52.988600 12=12+0)","ondisk_value.rstat":"n(v0 rc2020-04-06 18:20:52.995274 b68529700 13=12+1)","memory_value.dirrstat":"f(v0 m2020-04-06 18:20:52.988600 12=12+0)","memory_value.rstat":"n(v0 rc2020-04-06 18:20:52.995274 b68529700 13=12+1)","error_str":""},"return_code":-2}
2020-04-07 07:54:16.356 7f3308d21700  0 log_channel(cluster) log [INF] : scrub complete with tag 'ce3c7e20-71eb-4bb6-bbc5-d2463ba8ff38'
2020-04-07 07:54:16.356 7f3308d21700  0 log_channel(cluster) do_log log to syslog
2020-04-07 07:54:18.698 7f330f52e700  1 mds.cmp03 Updating MDS map to version 26580 from mon.2
2020-04-07 09:17:58.270 7f330f52e700  1 mds.cmp03 Updating MDS map to version 26581 from mon.2

Scrub Repair logs
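
(The repair pass is the same command with the repair scrubopt added; a sketch:)

[root@cmp03 ~]# ceph tell mds.cmp03 scrub start / recursive,repair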

2020-04-07 09:23:44.230 7f3308d21700  0 log_channel(cluster) do_log log to syslog
2020-04-07 09:23:44.230 7f3308d21700 -1 mds.0.scrubstack _validate_inode_done scrub error on inode [inode 0x10000a576dc [2,head] /volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7/namdoutput/analysis_apo/tunneltriangle_20dcds_6ticN11_lag3700_nearest10_hog/state_apo_6tic_N11_lag3700_5.pdb auth v62 dirtyparent s=6229785 n(v0 rc2020-04-06 18:20:15.392548 b6229785 1=1+0) (iversion lock) caps={106550698=pAsLsXsFsc/-@4},l=106550698(-1) | ptrwaiter=0 request=0 lock=0 caps=1 dirtyparent=1 scrubqueue=0 dirty=1 authpin=0 0x555a9bd6ee00]: {"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-2,"ondisk_value":"(-1)0x0:[]//","memoryvalue":"(39)0x10000a576dc:[<0x10000a576c9/state_apo_6tic_N11_lag3700_5.pdb v62>,<0x10000003c9f/tunneltriangle_20dcds_6ticN11_lag3700_nearest10_hog v103691>,<0x40000000f4e/analysis_apo v189475>,<0x400000011a5/namdoutput v89091>,<0x10000000001/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7 v3379713>,<0x10000000000/_nogroup v3104143>,<0x1/volumes v1692976>]//","error_str":"failed to decode on-disk backtrace (0 bytes)!"},"raw_stats":{"checked":false,"passed":false,"read_ret_val":0,"ondisk_value.dirstat":"f()","ondisk_value.rstat":"n()","memory_value.dirrstat":"f()","memory_value.rstat":"n()","error_str":""},"return_code":-2}
2020-04-07 09:23:44.230 7f3308d21700  0 log_channel(cluster) log [WRN] : bad backtrace on inode 0x10000a576e7(/volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7/namdoutput/analysis_apo/tunneltriangle_20dcds_6ticN11_lag3700_nearest10_hog/msm_6tic_N11_lag3700.txt), rewriting it
2020-04-07 09:23:44.230 7f3308d21700  0 log_channel(cluster) do_log log to syslog
2020-04-07 09:23:44.230 7f3308d21700  0 log_channel(cluster) log [INF] : Scrub repaired inode 0x10000a576e7 (/volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7/namdoutput/analysis_apo/tunneltriangle_20dcds_6ticN11_lag3700_nearest10_hog/msm_6tic_N11_lag3700.txt)
2020-04-07 09:23:44.230 7f3308d21700  0 log_channel(cluster) do_log log to syslog
2020-04-07 09:23:44.230 7f3308d21700 -1 mds.0.scrubstack _validate_inode_done scrub error on inode [inode 0x10000a576e7 [2,head] /volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7/namdoutput/analysis_apo/tunneltriangle_20dcds_6ticN11_lag3700_nearest10_hog/msm_6tic_N11_lag3700.txt auth v116 dirtyparent s=2065 n(v0 rc2020-04-06 18:20:52.995274 b2065 1=1+0) (iversion lock) caps={106550698=pAsLsXsFsc/-@2},l=106550698(-1) | ptrwaiter=0 request=0 lock=0 caps=1 dirtyparent=1 scrubqueue=0 dirty=1 authpin=0 0x555a4541dc00]: {"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-2,"ondisk_value":"(-1)0x0:[]//","memoryvalue":"(39)0x10000a576e7:[<0x10000a576c9/msm_6tic_N11_lag3700.txt v116>,<0x10000003c9f/tunneltriangle_20dcds_6ticN11_lag3700_nearest10_hog v103691>,<0x40000000f4e/analysis_apo v189475>,<0x400000011a5/namdoutput v89091>,<0x10000000001/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7 v3379713>,<0x10000000000/_nogroup v3104143>,<0x1/volumes v1692976>]//","error_str":"failed to decode on-disk backtrace (0 bytes)!"},"raw_stats":{"checked":false,"passed":false,"read_ret_val":0,"ondisk_value.dirstat":"f()","ondisk_value.rstat":"n()","memory_value.dirrstat":"f()","memory_value.rstat":"n()","error_str":""},"return_code":-2}
2020-04-07 09:23:44.231 7f3308d21700  0 log_channel(cluster) log [WRN] : bad backtrace on inode 0x10000a576c9(/volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7/namdoutput/analysis_apo/tunneltriangle_20dcds_6ticN11_lag3700_nearest10_hog), rewriting it
2020-04-07 09:23:44.231 7f3308d21700  0 log_channel(cluster) do_log log to syslog
2020-04-07 09:23:44.231 7f3308d21700  0 log_channel(cluster) log [INF] : Scrub repaired inode 0x10000a576c9 (/volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7/namdoutput/analysis_apo/tunneltriangle_20dcds_6ticN11_lag3700_nearest10_hog)
2020-04-07 09:23:44.231 7f3308d21700  0 log_channel(cluster) do_log log to syslog
2020-04-07 09:23:44.231 7f3308d21700 -1 mds.0.scrubstack _validate_inode_done scrub error on inode [inode 0x10000a576c9 [...2,head] /volumes/_nogroup/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7/namdoutput/analysis_apo/tunneltriangle_20dcds_6ticN11_lag3700_nearest10_hog/ auth v103691 dirtyparent f(v0 m2020-04-06 18:20:52.988600 12=12+0) n(v0 rc2020-04-06 18:20:52.995274 b68529700 13=12+1) (inest lock) (ifile excl) (iversion lock) caps={106550698=pAsLsXsFsx/AsLsXsFsx@16},l=106550698 | request=0 lock=0 dirfrag=1 caps=1 dirtyparent=1 scrubqueue=0 dirty=1 waiter=0 authpin=0 0x555a53c58700]: {"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-2,"ondisk_value":"(-1)0x0:[]//","memoryvalue":"(38)0x10000a576c9:[<0x10000003c9f/tunneltriangle_20dcds_6ticN11_lag3700_nearest10_hog v103691>,<0x40000000f4e/analysis_apo v189475>,<0x400000011a5/namdoutput v89091>,<0x10000000001/a6c48cd4-17e8-49df-9a4a-0f8f8b4364c7 v3379713>,<0x10000000000/_nogroup v3104143>,<0x1/volumes v1692976>]//","error_str":"failed to decode on-disk backtrace (0 bytes)!"},"raw_stats":{"checked":true,"passed":true,"read_ret_val":0,"ondisk_value.dirstat":"f(v0 m2020-04-06 18:20:52.988600 12=12+0)","ondisk_value.rstat":"n(v0 rc2020-04-06 18:20:52.995274 b68529700 13=12+1)","memory_value.dirrstat":"f(v0 m2020-04-06 18:20:52.988600 12=12+0)","memory_value.rstat":"n(v0 rc2020-04-06 18:20:52.995274 b68529700 13=12+1)","error_str":""},"return_code":-2}
2020-04-07 09:27:33.890 7f3308d21700  0 log_channel(cluster) log [INF] : scrub complete with tag '47c8a435-7fdd-40f1-879f-3cb435ce530f'
2020-04-07 09:27:33.890 7f3308d21700  0 log_channel(cluster) do_log log to syslog
2020-04-07 09:31:42.355 7f330f52e700  1 mds.cmp03 Updating MDS map to version 26582 from mon.2

Files

ceph-mds.jy-cmp04.log-20200403.gz (72.5 KB) ceph-mds.jy-cmp04.log-20200403.gz Day it starts mitchell walls, 04/07/2020 03:21 PM
ceph-mds.jy-cmp04.log-20200404.gz (223 KB) ceph-mds.jy-cmp04.log-20200404.gz Debugging day mitchell walls, 04/07/2020 03:21 PM
ceph-mds.jy-cmp03.log-20200404.gz (400 KB) ceph-mds.jy-cmp03.log-20200404.gz mitchell walls, 04/07/2020 03:23 PM
ceph-mds.jy-cmp03.log-20200403.gz (129 KB) ceph-mds.jy-cmp03.log-20200403.gz mitchell walls, 04/07/2020 03:23 PM
ceph-mds.jy-blk02.log-20200403.gz (181 KB) ceph-mds.jy-blk02.log-20200403.gz mitchell walls, 04/07/2020 03:23 PM
ceph-mds.jy-blk01.log-20200404.gz (261 KB) ceph-mds.jy-blk01.log-20200404.gz mitchell walls, 04/07/2020 03:23 PM
ceph-mds.jy-blk01.log-20200403.gz (130 KB) ceph-mds.jy-blk01.log-20200403.gz mitchell walls, 04/07/2020 03:23 PM
ceph-mds.jy-blk02.log-20200404-part1.log.gz (800 KB) ceph-mds.jy-blk02.log-20200404-part1.log.gz mitchell walls, 04/07/2020 03:34 PM
ceph-mds.jy-blk02.log-20200404-part2.log.gz (755 KB) ceph-mds.jy-blk02.log-20200404-part2.log.gz mitchell walls, 04/07/2020 03:34 PM

Related issues: 1 (0 open, 1 closed)

Related to CephFS - Feature #12334: nfs-ganesha: handle client cache pressure in NFS Ganesha FSAL (Resolved; assignee: Jeff Layton)
