Bug #24403

mon failed to return metadata for mds

Added by Thomas De Maet over 5 years ago. Updated 3 months ago.

Status: Resolved
Priority: Normal
Category: Administration/Usability
Target version:
% Done: 100%
Source: Community (user)
Tags: backport_processed
Backport: reef,quincy,pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: ceph-ansible
Component(FS): MDS, MDSMonitor
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hello,

Digging back into an error found on the ceph-users mailing list: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-April/026241.html
From there, it seems to be related to an mds-mgr communication issue?

I have the same issue with a small cluster: the active mgr spams log error messages like those in the mailing list, and:

telegeo02:~ # ceph --cluster geoceph mds metadata sen2agriprod
Error ENOENT: 
telegeo02:~ # ceph --cluster geoceph mds metadata 
[
    {
        "name": "sen2agriprod" 
    },
    {
        "name": "geo09" 
    },
    {
        "name": "telegeo02",
        "addr": "10.36.2.2:6800/737495544",
        "arch": "x86_64",
        "ceph_version": "ceph version 12.2.5-407-g5e7ea8cf03 (5e7ea8cf03603e1dc8937665b599f6a8fcb0213e) luminous (stable)",
        "cpu": "Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz",
        "distro": "opensuse",
        "distro_description": "openSUSE Leap 42.3",
        "distro_version": "42.3",
        "hostname": "telegeo02",
        "kernel_description": "#1 SMP Sat Apr 7 05:22:50 UTC 2018 (f24992c)",
        "kernel_version": "4.4.126-48-default",
        "mem_swap_kb": "2104316",
        "mem_total_kb": "131933332",
        "os": "Linux" 
    }
]

I'm using 2 mds servers and 1 backup:

  cluster:
    id:     c27607d1-9852-4aa2-b953-b5e3fa3845ea
    health: HEALTH_WARN
            12410/2950041 objects misplaced (0.421%)
            Degraded data redundancy: 4353/2950041 objects degraded (0.148%), 33 pgs degraded, 33 pgs undersized

  services:
    mon: 3 daemons, quorum telegeo02,geo09,sen2agriprod
    mgr: sen2agriprod(active), standbys: geo09, telegeo02
    mds: cephfs-2/2/2 up  {0=geo09=up:active,1=sen2agriprod=up:active}, 1 up:standby
    osd: 80 osds: 77 up, 77 in; 95 remapped pgs

  data:
    pools:   2 pools, 384 pgs
    objects: 134k objects, 1973 GB
    usage:   4733 GB used, 253 TB / 258 TB avail
    pgs:     4353/2950041 objects degraded (0.148%)
             12410/2950041 objects misplaced (0.421%)
             256 active+clean
             95  active+clean+remapped
             33  active+undersized+degraded

  io:
    client:   10836 kB/s wr, 0 op/s rd, 17 op/s wr

I also have some issues with the kernel client (I/O errors with no pattern, no log) and wonder if they could be related.

Thanks !

log_mgr_telegeo02.txt View (7.59 KB) Thomas De Maet, 06/05/2018 10:41 AM


Related issues

Related to CephFS - Bug #59318: mon/MDSMonitor: daemon booting may get failed if mon handles up:boot beacon twice Resolved
Related to CephFS - Bug #63166: mon/MDSMonitor: metadata not loaded from PAXOS on update Pending Backport
Copied to CephFS - Backport #61691: quincy: mon failed to return metadata for mds Resolved
Copied to CephFS - Backport #61692: pacific: mon failed to return metadata for mds Resolved
Copied to CephFS - Backport #61693: reef: mon failed to return metadata for mds Resolved

History

#1 Updated by Zheng Yan over 5 years ago

Please try a newer kernel.

#2 Updated by Thomas De Maet over 5 years ago

The "sen2agriprod" server actually runs CentOS 7 (kernel 3.10.0), which is among the recommended platforms.

If you think I have to update, I will give it a try on the other servers (this is not the official openSUSE way, but OK if mandatory).

Thanks

#3 Updated by Thomas De Maet over 5 years ago

I first updated telegeo02, with no change in the result (the mds on telegeo02 was standby, as it was the last one rebooted).

Then I updated geo09 and got a partial result:

telegeo02:~ # ceph --cluster geoceph mds metadata 
[
    {
        "name": "sen2agriprod" 
    },
    {
        "name": "telegeo02",
        "addr": "10.36.2.2:6800/2496890649",
        "arch": "x86_64",
        "ceph_version": "ceph version 12.2.5-407-g5e7ea8cf03 (5e7ea8cf03603e1dc8937665b599f6a8fcb0213e) luminous (stable)",
        "cpu": "Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz",
        "distro": "opensuse",
        "distro_description": "openSUSE Leap 42.3",
        "distro_version": "42.3",
        "hostname": "telegeo02",
        "kernel_description": "#1 SMP PREEMPT Mon Jun 4 08:26:32 UTC 2018 (bcb3422)",
        "kernel_version": "4.17.0-1.gbcb3422-default",
        "mem_swap_kb": "2104316",
        "mem_total_kb": "131925600",
        "os": "Linux" 
    },
    {
        "name": "geo09",
        "addr": "10.36.22.9:6800/1549940573",
        "arch": "x86_64",
        "ceph_version": "ceph version 12.2.5-407-g5e7ea8cf03 (5e7ea8cf03603e1dc8937665b599f6a8fcb0213e) luminous (stable)",
        "cpu": "Intel(R) Xeon(R) CPU E3-1270 V2 @ 3.50GHz",
        "distro": "opensuse",
        "distro_description": "openSUSE Leap 42.3",
        "distro_version": "42.3",
        "hostname": "geo09",
        "kernel_description": "#1 SMP PREEMPT Mon Jun 4 08:26:32 UTC 2018 (bcb3422)",
        "kernel_version": "4.17.0-1.gbcb3422-default",
        "mem_swap_kb": "16779260",
        "mem_total_kb": "32907816",
        "os": "Linux" 
    }
]

I then stopped the mgr on sen2agriprod (CentOS) and the active mgr moved to telegeo02. I then stopped the mds on sen2agriprod, producing the attached log. The same error was still present until the CentOS mds was stopped. To conclude, it seems that the kernel version is indeed the cause, linked to the mds daemon.

However, CentOS 7 with kernel 3.10 is still not working. Maybe the documentation should be modified?

The ticket can be closed, but some hints should be added to the documentation to avoid this sort of issue.

Thanks !

#4 Updated by Sebastian Wagner over 4 years ago

This ticket is now rather old. Do you mind if I just close it?

#5 Updated by Min Shi over 4 years ago

Did you restart only the mds on sen2agriprod, or did you restart all the mds daemons? We had a similar case: we lost all the mds metadata (running `ceph mds metadata` returned just the name pairs). We then restarted all the mds daemons, the cluster worked again, and `ceph mds metadata` returned the correct metadata. Before this happened, I had just enabled the prometheus module, which caused the mgr to respawn; the mgr then asked the mon for its mds metadata, which failed. That is why the `ceph fs status` command failed: it had no mds metadata. I guess the cluster already had this problem before I enabled the prometheus module.

#6 Updated by Venky Shankar over 1 year ago

  • Project changed from mgr to CephFS
  • Category set to Administration/Usability
  • Target version set to v18.0.0
  • Backport set to pacific,quincy

This was seen in a pacific installation. The MDS entries in the FSMap are fine: they serve the `fs dump` and `fs status` commands, so those commands return accurate details. On the other hand, the `mds metadata` CLI fetches from the monitor store (monstore), which seems to be missing the required metadata for a particular MDS (the same happens when ceph-mgr invokes the `mds metadata` command against the monitors).

The MDS includes its metadata in its beacon message to the monitor, but only while the MDS is in STATE_BOOT. The monitors persist this metadata (from the beacon) to the monstore. One explanation for the metadata being missing from the monstore is that the monitor missed the beacon (network interruption, etc.) while the MDS was in STATE_BOOT. However, the monitors only ask an MDS to transition to the next state (standby, etc.) after seeing the MDS in the boot state, so it is not clear how the metadata went missing.

#7 Updated by Venky Shankar over 1 year ago

FYI - restarting the MDS fixes the issue.

#8 Updated by Venky Shankar over 1 year ago

  • Component(FS) MDS added

#9 Updated by Venky Shankar about 1 year ago

  • Assignee set to Venky Shankar
  • Severity changed from 4 - irritation to 3 - minor
  • Component(FS) MDSMonitor added

#10 Updated by Venky Shankar about 1 year ago

It seems the MDS can skip sending a beacon in the up:boot state. That is the state in which the MDS encodes its metadata and includes it in the beacon message to the monitor. In MDSDaemon::handle_mds_map():

  if (whoami == MDS_RANK_NONE) {
    // We do not hold a rank:
    dout(10) << __func__ << ": handling map in rankless mode" << dendl;

    if (new_state == DS::STATE_STANDBY) {
      /* Note: STATE_BOOT is never an actual state in the FSMap. The Monitors
       * generally mark a new MDS as STANDBY (although it's possible to
       * immediately be assigned a rank).
       */
      if (old_state == DS::STATE_NULL) {
        dout(1) << "Monitors have assigned me to become a standby." << dendl;
        beacon.set_want_state(*mdsmap, new_state);
      } else if (old_state == DS::STATE_STANDBY) {
        dout(5) << "I am still standby" << dendl;
      }
The MDS can switch from STATE_NULL to STATE_STANDBY via the above code block. When this happens, the MDS does not go through STATE_BOOT (up:boot). What is unknown is under what circumstances this path can be hit.
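
The suspected path can be sketched as a toy Python model (illustrative only; the class and field names are not Ceph code): metadata reaches the monstore only via an up:boot beacon, so an MDS that jumps straight from NULL to STANDBY would leave only its name behind.

```python
# Toy model (not Ceph code; all names are illustrative) of the suspected
# path: the monitor persists MDS metadata only when it sees an up:boot
# beacon, so an MDS that goes straight from NULL to STANDBY never gets
# its metadata into the mon store.

NULL, BOOT, STANDBY = "null", "up:boot", "up:standby"

class ToyMon:
    def __init__(self):
        self.monstore = {}   # name -> metadata, persisted
        self.fsmap = {}      # name -> state

    def handle_beacon(self, name, want_state, metadata=None):
        if want_state == BOOT:
            # Metadata is carried in (and persisted from) boot beacons only.
            self.monstore[name] = metadata
        self.fsmap[name] = STANDBY  # the mon marks a new MDS as standby

def start_mds(mon, name, skip_boot_beacon):
    if not skip_boot_beacon:
        mon.handle_beacon(name, BOOT, {"hostname": name})
    else:
        # NULL -> STANDBY directly: no boot beacon, no metadata persisted.
        mon.handle_beacon(name, STANDBY)

mon = ToyMon()
start_mds(mon, "mds_a", skip_boot_beacon=False)
start_mds(mon, "mds_b", skip_boot_beacon=True)

print(mon.fsmap)     # both MDS end up standby in the FSMap
print(mon.monstore)  # only mds_a has metadata; mds_b would return ENOENT
```

This reproduces the reported symptom shape: `fs dump`/`fs status` (FSMap) look fine while `mds metadata` (monstore) returns only a name for the affected daemon.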

#11 Updated by Patrick Donnelly about 1 year ago

Venky Shankar wrote:

It seems the MDS can skip sending a beacon in the up:boot state. That is the state in which the MDS encodes its metadata and includes it in the beacon message to the monitor. In MDSDaemon::handle_mds_map():

[...]

The MDS can switch from STATE_NULL to STATE_STANDBY via the above code block. When this happens, the MDS does not go through STATE_BOOT (up:boot). What is unknown is under what circumstances this path can be hit.

STATE_NULL comes from the previous mdsmap ("oldmap" in the code); that is, the MDS is not in the map at all. Once the mons receive the first beacon with want_state STATE_BOOT, the mds is added as a standby (or immediately promoted if a rank is available). At no point is the MDS in the mdsmap with state STATE_BOOT.

So I don't think that's the cause of the bug here.

#12 Updated by Greg Farnum about 1 year ago

The MDS is identified using a nonce as well as an IP in the map, right? After the containerized OSDs managed to clobber their bluestores, I wonder about multiple MDS instances running in close enough proximity to confuse things, but I don't think that could break this...

#13 Updated by Venky Shankar about 1 year ago

I was discussing this tracker with Patrick: there are separate paxos proposals for the fsmap update and the metadata update. If the metadata update fails, the MDS still transitions to the next state. This is a theoretical case at the moment; a test with some paxos failure injection is required to conclude.
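
The hypothesis above can be sketched as a toy model (illustrative only, not Ceph/Paxos code): because the fsmap and metadata updates are committed by separate proposals, one can be lost while the other commits.

```python
# Toy sketch (illustrative, not Ceph code) of the hypothesis: the fsmap
# update and the metadata update go through *separate* proposals, so the
# metadata proposal can fail while the state transition still commits.

class ToyStore:
    def __init__(self):
        self.fsmap = {}
        self.metadata = {}

def propose(store, kind, key, value, inject_failure=False):
    """Commit one proposal, or drop it entirely if failure is injected."""
    if inject_failure:
        return False  # proposal dropped; nothing reaches the store
    getattr(store, kind)[key] = value
    return True

store = ToyStore()
# Both proposals succeed for mds_a.
propose(store, "metadata", "mds_a", {"hostname": "a"})
propose(store, "fsmap", "mds_a", "up:standby")
# The metadata proposal fails for mds_b, but the fsmap one still commits.
propose(store, "metadata", "mds_b", {"hostname": "b"}, inject_failure=True)
propose(store, "fsmap", "mds_b", "up:standby")

print(store.fsmap)     # both MDS present and standby
print(store.metadata)  # mds_b missing: `mds metadata mds_b` -> ENOENT
```

As noted, confirming this on a real cluster would need paxos failure injection; the sketch only shows why independent proposals make the inconsistency possible.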

#14 Updated by Patrick Donnelly 11 months ago

  • Status changed from New to Fix Under Review
  • Assignee changed from Venky Shankar to Patrick Donnelly
  • Target version changed from v18.0.0 to v19.0.0
  • Source set to Community (user)
  • Backport changed from pacific,quincy to reef,quincy,pacific
  • Pull request ID set to 50862

#15 Updated by Patrick Donnelly 11 months ago

  • Related to Bug #59318: mon/MDSMonitor: daemon booting may get failed if mon handles up:boot beacon twice added

#16 Updated by Patrick Donnelly 11 months ago

#59318 may also be related somehow but I'm not sure.

#17 Updated by Venky Shankar 8 months ago

  • Status changed from Fix Under Review to Pending Backport

#18 Updated by Backport Bot 8 months ago

  • Copied to Backport #61691: quincy: mon failed to return metadata for mds added

#19 Updated by Backport Bot 8 months ago

  • Copied to Backport #61692: pacific: mon failed to return metadata for mds added

#20 Updated by Backport Bot 8 months ago

  • Copied to Backport #61693: reef: mon failed to return metadata for mds added

#21 Updated by Backport Bot 8 months ago

  • Tags set to backport_processed

#22 Updated by Min Shi 5 months ago

There is another case that may cause mds metadata to be lost.

The root cause is that a peon's pending_metadata (in memory) may be inconsistent with the mon's db (on disk). When a peon becomes the leader and, at the same time, a standby mds stops, the new leader may flush wrong mds metadata into the db.

It can be reproduced like this:

A cluster with 3 mons and 3 mds (one active, the other two standby), 6 osds.
step 1. stop the two standby mds;
step 2. restart all mons (makes pending_metadata consistent with the db);
step 3. start the other two mds;
step 4. stop the leader mon;
step 5. run the "ceph mds metadata" command to check the mds metadata;
step 6. stop one standby mds;
step 7. run the "ceph mds metadata" command to check the mds metadata again.
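
The steps above can be condensed into a toy Python model (illustrative names, not Ceph code): the committed metadata reaches every mon's db, but a peon's in-memory pending_metadata is never refreshed, so when that peon becomes leader and flushes on the next change, it overwrites the db with the stale copy.

```python
# Toy model (illustrative, not Ceph code) of the stale-peon case: a
# peon's in-memory pending_metadata lags its on-disk db; if that peon
# becomes leader and flushes on the next update, the stale map wins.

class ToyMon:
    def __init__(self):
        self.db = {}                # metadata persisted via paxos commits
        self.pending_metadata = {}  # in-memory working copy

def leader_update(leader, peons, name, metadata):
    # The leader updates its working copy and commits; the committed
    # value reaches every peon's db, but NOT their in-memory copies.
    leader.pending_metadata[name] = metadata
    leader.db = dict(leader.pending_metadata)
    for p in peons:
        p.db = dict(leader.db)

def become_leader_buggy(mon):
    # Bug being modeled: the new leader keeps its stale in-memory
    # pending_metadata instead of reloading it from the db.
    pass

def leader_remove(leader, name):
    leader.pending_metadata.pop(name, None)
    leader.db = dict(leader.pending_metadata)  # flush the stale map

mon_a, mon_b = ToyMon(), ToyMon()          # mon_a leads; mon_b is a peon
leader_update(mon_a, [mon_b], "mds1", {"host": "h1"})   # step 3
leader_update(mon_a, [mon_b], "mds2", {"host": "h2"})

become_leader_buggy(mon_b)                  # step 4: leader mon stops
leader_remove(mon_b, "mds2")                # step 6: a standby mds stops

print(mon_b.db)  # empty: mds1's metadata was lost along with mds2's
```

In this model the new leader's flush wipes metadata for MDS daemons that are still running, matching the described loss.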

#23 Updated by Patrick Donnelly 5 months ago

  • Related to Bug #63166: mon/MDSMonitor: metadata not loaded from PAXOS on update added

#24 Updated by Konstantin Shalygin 3 months ago

  • Status changed from Pending Backport to Resolved
  • % Done changed from 0 to 100