Bug #24403
mon failed to return metadata for mds
Status:
Resolved
Priority:
Normal
Assignee:
Category:
Administration/Usability
Target version:
% Done:
100%
Source:
Community (user)
Tags:
backport_processed
Backport:
reef,quincy,pacific
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
ceph-ansible
Component(FS):
MDS, MDSMonitor
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Hello,
Redigging an error found into the ceph-users mailing list: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-April/026241.html
From there, it seems to be related to a mds-mgr communication issue ?
I have the same issue with a small cluster: spamming log error messages like in the mailing list for the active mgr, and
telegeo02:~ # ceph --cluster geoceph mds metadata sen2agriprod
Error ENOENT:
telegeo02:~ # ceph --cluster geoceph mds metadata
[
    {
        "name": "sen2agriprod"
    },
    {
        "name": "geo09"
    },
    {
        "name": "telegeo02",
        "addr": "10.36.2.2:6800/737495544",
        "arch": "x86_64",
        "ceph_version": "ceph version 12.2.5-407-g5e7ea8cf03 (5e7ea8cf03603e1dc8937665b599f6a8fcb0213e) luminous (stable)",
        "cpu": "Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz",
        "distro": "opensuse",
        "distro_description": "openSUSE Leap 42.3",
        "distro_version": "42.3",
        "hostname": "telegeo02",
        "kernel_description": "#1 SMP Sat Apr 7 05:22:50 UTC 2018 (f24992c)",
        "kernel_version": "4.4.126-48-default",
        "mem_swap_kb": "2104316",
        "mem_total_kb": "131933332",
        "os": "Linux"
    }
]
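Note that in the output above, the entries for sen2agriprod and geo09 contain only "name" and no actual metadata. A quick way to spot which daemons are affected is to parse the JSON from `ceph mds metadata` and look for entries with no fields beyond "name"; a minimal sketch (the `missing_metadata` helper is my own, and the sample is an abridged copy of the output above):

```python
import json

# Abridged sample of the `ceph --cluster geoceph mds metadata` output above;
# in practice, feed in the live command's JSON output instead.
sample = '''
[
  {"name": "sen2agriprod"},
  {"name": "geo09"},
  {"name": "telegeo02", "addr": "10.36.2.2:6800/737495544", "os": "Linux"}
]
'''

def missing_metadata(metadata_json):
    """Return names of MDS daemons whose entry holds only 'name',
    i.e. the mon returned no metadata for them."""
    return [d["name"] for d in json.loads(metadata_json)
            if set(d) == {"name"}]

print(missing_metadata(sample))  # ['sen2agriprod', 'geo09']
```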
I'm using 2 active MDS servers and 1 standby:
  cluster:
    id:     c27607d1-9852-4aa2-b953-b5e3fa3845ea
    health: HEALTH_WARN
            12410/2950041 objects misplaced (0.421%)
            Degraded data redundancy: 4353/2950041 objects degraded (0.148%), 33 pgs degraded, 33 pgs undersized

  services:
    mon: 3 daemons, quorum telegeo02,geo09,sen2agriprod
    mgr: sen2agriprod(active), standbys: geo09, telegeo02
    mds: cephfs-2/2/2 up {0=geo09=up:active,1=sen2agriprod=up:active}, 1 up:standby
    osd: 80 osds: 77 up, 77 in; 95 remapped pgs

  data:
    pools:   2 pools, 384 pgs
    objects: 134k objects, 1973 GB
    usage:   4733 GB used, 253 TB / 258 TB avail
    pgs:     4353/2950041 objects degraded (0.148%)
             12410/2950041 objects misplaced (0.421%)
             256 active+clean
             95  active+clean+remapped
             33  active+undersized+degraded

  io:
    client: 10836 kB/s wr, 0 op/s rd, 17 op/s wr
I also have some issues with the kernel client (I/O errors with no discernible pattern and nothing in the logs), and wonder whether they could be related.
Thanks !