Bug #24403 (closed)

mon failed to return metadata for mds

Added by Thomas De Maet almost 6 years ago. Updated 6 months ago.

Status:
Resolved
Priority:
Normal
Category:
Administration/Usability
Target version:
% Done: 100%

Source:
Community (user)
Tags:
backport_processed
Backport:
reef,quincy,pacific
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
ceph-ansible
Component(FS):
MDS, MDSMonitor
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hello,

Re-raising an error previously reported on the ceph-users mailing list: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-April/026241.html
From that thread, it seems to be related to an MDS-mgr communication issue?

I have the same issue on a small cluster: the active mgr spams log error messages like those in the mailing list thread, and:

telegeo02:~ # ceph --cluster geoceph mds metadata sen2agriprod
Error ENOENT: 
telegeo02:~ # ceph --cluster geoceph mds metadata 
[
    {
        "name": "sen2agriprod" 
    },
    {
        "name": "geo09" 
    },
    {
        "name": "telegeo02",
        "addr": "10.36.2.2:6800/737495544",
        "arch": "x86_64",
        "ceph_version": "ceph version 12.2.5-407-g5e7ea8cf03 (5e7ea8cf03603e1dc8937665b599f6a8fcb0213e) luminous (stable)",
        "cpu": "Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz",
        "distro": "opensuse",
        "distro_description": "openSUSE Leap 42.3",
        "distro_version": "42.3",
        "hostname": "telegeo02",
        "kernel_description": "#1 SMP Sat Apr 7 05:22:50 UTC 2018 (f24992c)",
        "kernel_version": "4.4.126-48-default",
        "mem_swap_kb": "2104316",
        "mem_total_kb": "131933332",
        "os": "Linux" 
    }
]
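
For reference, one quick cross-check is whether the daemons themselves are reachable while the mon has no metadata for them. This is a minimal sketch only, assuming shell access to the hosts running the affected MDS daemons and reusing the daemon names from the output above:

# On the host running the affected MDS, query it over its local admin socket.
# If this answers while "ceph mds metadata" still returns ENOENT, the metadata
# is only missing on the mon side.
ceph --cluster geoceph daemon mds.sen2agriprod version
ceph --cluster geoceph daemon mds.sen2agriprod status

# Restarting the MDS makes it send a fresh boot beacon (which carries the
# metadata) to the mons and will fail over to the standby; this may re-register
# the metadata, but it is only a possible workaround, not a fix.
systemctl restart ceph-mds@sen2agriprod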

I'm using 2 active MDS servers and 1 standby:

  cluster:
    id:     c27607d1-9852-4aa2-b953-b5e3fa3845ea
    health: HEALTH_WARN
            12410/2950041 objects misplaced (0.421%)
            Degraded data redundancy: 4353/2950041 objects degraded (0.148%), 33 pgs degraded, 33 pgs undersized

  services:
    mon: 3 daemons, quorum telegeo02,geo09,sen2agriprod
    mgr: sen2agriprod(active), standbys: geo09, telegeo02
    mds: cephfs-2/2/2 up  {0=geo09=up:active,1=sen2agriprod=up:active}, 1 up:standby
    osd: 80 osds: 77 up, 77 in; 95 remapped pgs

  data:
    pools:   2 pools, 384 pgs
    objects: 134k objects, 1973 GB
    usage:   4733 GB used, 253 TB / 258 TB avail
    pgs:     4353/2950041 objects degraded (0.148%)
             12410/2950041 objects misplaced (0.421%)
             256 active+clean
             95  active+clean+remapped
             33  active+undersized+degraded

  io:
    client:   10836 kB/s wr, 0 op/s rd, 17 op/s wr

I also have some issues with the kernel client (I/O errors with no discernible pattern and no log messages) and wonder if they could be related.
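
If not already checked, the client-side kernel log around the time of an I/O error may show whether the kernel CephFS client reports anything. A minimal sketch, to be run on the client host (not on the cluster nodes):

# Messages from the kernel CephFS client and messenger, with readable timestamps.
dmesg -T | grep -iE 'ceph|libceph'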

Thanks!


Files

log_mgr_telegeo02.txt (7.59 KB), Thomas De Maet, 06/05/2018 10:41 AM

Related issues (5 total, 0 open, 5 closed)

Related to CephFS - Bug #59318: mon/MDSMonitor: daemon booting may get failed if mon handles up:boot beacon twice (Resolved, Patrick Donnelly)
Related to CephFS - Bug #63166: mon/MDSMonitor: metadata not loaded from PAXOS on update (Resolved, Min Shi)
Copied to CephFS - Backport #61691: quincy: mon failed to return metadata for mds (Resolved, Patrick Donnelly)
Copied to CephFS - Backport #61692: pacific: mon failed to return metadata for mds (Resolved, Patrick Donnelly)
Copied to CephFS - Backport #61693: reef: mon failed to return metadata for mds (Resolved, Patrick Donnelly)
