Bug #24403

closed

mon failed to return metadata for mds

Added by Thomas De Maet almost 6 years ago. Updated 5 months ago.

Status:
Resolved
Priority:
Normal
Category:
Administration/Usability
Target version:
% Done:

100%

Source:
Community (user)
Tags:
backport_processed
Backport:
reef,quincy,pacific
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
ceph-ansible
Component(FS):
MDS, MDSMonitor
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hello,

Reviving an error found in the ceph-users mailing list: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-April/026241.html
From that thread, it seems to be related to an MDS-mgr communication issue?

I have the same issue on a small cluster: the active mgr spams log error messages like those in the mailing list, and:

telegeo02:~ # ceph --cluster geoceph mds metadata sen2agriprod
Error ENOENT: 
telegeo02:~ # ceph --cluster geoceph mds metadata 
[
    {
        "name": "sen2agriprod" 
    },
    {
        "name": "geo09" 
    },
    {
        "name": "telegeo02",
        "addr": "10.36.2.2:6800/737495544",
        "arch": "x86_64",
        "ceph_version": "ceph version 12.2.5-407-g5e7ea8cf03 (5e7ea8cf03603e1dc8937665b599f6a8fcb0213e) luminous (stable)",
        "cpu": "Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz",
        "distro": "opensuse",
        "distro_description": "openSUSE Leap 42.3",
        "distro_version": "42.3",
        "hostname": "telegeo02",
        "kernel_description": "#1 SMP Sat Apr 7 05:22:50 UTC 2018 (f24992c)",
        "kernel_version": "4.4.126-48-default",
        "mem_swap_kb": "2104316",
        "mem_total_kb": "131933332",
        "os": "Linux" 
    }
]

I'm using 2 MDS servers and 1 standby:

  cluster:
    id:     c27607d1-9852-4aa2-b953-b5e3fa3845ea
    health: HEALTH_WARN
            12410/2950041 objects misplaced (0.421%)
            Degraded data redundancy: 4353/2950041 objects degraded (0.148%), 33 pgs degraded, 33 pgs undersized

  services:
    mon: 3 daemons, quorum telegeo02,geo09,sen2agriprod
    mgr: sen2agriprod(active), standbys: geo09, telegeo02
    mds: cephfs-2/2/2 up  {0=geo09=up:active,1=sen2agriprod=up:active}, 1 up:standby
    osd: 80 osds: 77 up, 77 in; 95 remapped pgs

  data:
    pools:   2 pools, 384 pgs
    objects: 134k objects, 1973 GB
    usage:   4733 GB used, 253 TB / 258 TB avail
    pgs:     4353/2950041 objects degraded (0.148%)
             12410/2950041 objects misplaced (0.421%)
             256 active+clean
             95  active+clean+remapped
             33  active+undersized+degraded

  io:
    client:   10836 kB/s wr, 0 op/s rd, 17 op/s wr

I also have some issues with the kernel client (I/O errors with no pattern and no logs), and wonder whether they could be related.

Thanks !


Files

log_mgr_telegeo02.txt (7.59 KB) Thomas De Maet, 06/05/2018 10:41 AM

Related issues 5 (0 open, 5 closed)

Related to CephFS - Bug #59318: mon/MDSMonitor: daemon booting may get failed if mon handles up:boot beacon twice (Resolved, Patrick Donnelly)

Related to CephFS - Bug #63166: mon/MDSMonitor: metadata not loaded from PAXOS on update (Resolved, Min Shi)

Copied to CephFS - Backport #61691: quincy: mon failed to return metadata for mds (Resolved, Patrick Donnelly)
Copied to CephFS - Backport #61692: pacific: mon failed to return metadata for mds (Resolved, Patrick Donnelly)
Copied to CephFS - Backport #61693: reef: mon failed to return metadata for mds (Resolved, Patrick Donnelly)
Actions #1

Updated by Zheng Yan almost 6 years ago

Please try a newer kernel.

Actions #2

Updated by Thomas De Maet almost 6 years ago

The "sen2agriprod" server actually runs on centOS7 (kernel 3.10.0) which is in the recommended platforms.

If you think that I have to update I will give a try on the other servers (this is not the official opensuse way, but ok if mandatory)

Thanks

Actions #3

Updated by Thomas De Maet almost 6 years ago

I first updated telegeo02, with no change in the result (the MDS on telegeo02 was the standby, as it was the last one rebooted).

Then I updated geo09 and got a partial result:

telegeo02:~ # ceph --cluster geoceph mds metadata 
[
    {
        "name": "sen2agriprod" 
    },
    {
        "name": "telegeo02",
        "addr": "10.36.2.2:6800/2496890649",
        "arch": "x86_64",
        "ceph_version": "ceph version 12.2.5-407-g5e7ea8cf03 (5e7ea8cf03603e1dc8937665b599f6a8fcb0213e) luminous (stable)",
        "cpu": "Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz",
        "distro": "opensuse",
        "distro_description": "openSUSE Leap 42.3",
        "distro_version": "42.3",
        "hostname": "telegeo02",
        "kernel_description": "#1 SMP PREEMPT Mon Jun 4 08:26:32 UTC 2018 (bcb3422)",
        "kernel_version": "4.17.0-1.gbcb3422-default",
        "mem_swap_kb": "2104316",
        "mem_total_kb": "131925600",
        "os": "Linux" 
    },
    {
        "name": "geo09",
        "addr": "10.36.22.9:6800/1549940573",
        "arch": "x86_64",
        "ceph_version": "ceph version 12.2.5-407-g5e7ea8cf03 (5e7ea8cf03603e1dc8937665b599f6a8fcb0213e) luminous (stable)",
        "cpu": "Intel(R) Xeon(R) CPU E3-1270 V2 @ 3.50GHz",
        "distro": "opensuse",
        "distro_description": "openSUSE Leap 42.3",
        "distro_version": "42.3",
        "hostname": "geo09",
        "kernel_description": "#1 SMP PREEMPT Mon Jun 4 08:26:32 UTC 2018 (bcb3422)",
        "kernel_version": "4.17.0-1.gbcb3422-default",
        "mem_swap_kb": "16779260",
        "mem_total_kb": "32907816",
        "os": "Linux" 
    }
]

I then stopped the mgr on sen2agriprod (CentOS) and the active mgr moved to telegeo02. I then stopped the MDS on sen2agriprod, producing the attached log. The same error was still present until the MDS on the CentOS host was stopped. To conclude, it seems the kernel version is indeed the cause, linked to the MDS daemon.

However, CentOS 7 with kernel 3.10 is still not working. Maybe the documentation should be updated?

The ticket can be closed, but some hints should be added to the documentation to avoid this sort of issue.

Thanks !

Actions #4

Updated by Sebastian Wagner over 4 years ago

This ticket is now rather old. Do you mind if I just close it?

Actions #5

Updated by Min Shi over 4 years ago

Did you restart only the MDS on sen2agriprod, or did you restart all the MDSs? We had a similar case, losing all of the MDS metadata (running `ceph mds metadata` returned just the name pairs). We then restarted all the MDSs, the cluster worked again, and `ceph mds metadata` returned the correct metadata. Just before that, I had enabled the prometheus module, which caused the mgr to respawn; the mgr then asked the mon for its MDS metadata, which failed. That's why the `ceph fs status` command failed: it had no MDS metadata. I guess the cluster already had this problem before I enabled the prometheus module.

Actions #6

Updated by Venky Shankar over 1 year ago

  • Project changed from mgr to CephFS
  • Category set to Administration/Usability
  • Target version set to v18.0.0
  • Backport set to pacific,quincy

This was seen in a pacific installation. The MDS entries in the FSMap are fine - the FSMap serves the `fs dump` and `fs status` commands, so those commands return accurate details. OTOH, the `mds metadata` CLI fetches from the monitor store (monstore), and that seems to be missing the required metadata for a particular MDS (the same happens when ceph-mgr invokes the `mds metadata` command on the monitors).

The MDS includes its metadata as part of its beacon message to the monitor. This is done only when the MDS state is STATE_BOOT. The monitors persist this metadata (from the beacon) to their monstore. One explanation for the metadata being missing from the monstore is that the monitor missed the beacon (network interruption, etc.) from the MDS while the MDS was in STATE_BOOT. However, a monitor asks an MDS to transition to the next state (standby, etc.) only when it sees the MDS in the boot state, so it's not clear how the metadata went missing.
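
For reference, a quick way to observe the mismatch described above is to compare the FSMap-backed views with the monstore-backed one (a minimal sketch; the daemon name sen2agriprod is just the example from this report):

# FSMap-backed views: these remain accurate even when the monstore metadata is missing
ceph fs dump
ceph fs status

# monstore-backed view: returns only the name (or ENOENT for a single daemon)
# when the metadata from the up:boot beacon was never persisted
ceph mds metadata
ceph mds metadata sen2agriprod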

Actions #7

Updated by Venky Shankar over 1 year ago

FYI - restarting the MDS fixes the issue.
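
As a workaround sketch (assuming a systemd deployment; the daemon id sen2agriprod is just the example host from this report):

# restart the affected MDS so it goes through up:boot again and
# re-sends its metadata in the beacon to the monitors
systemctl restart ceph-mds@sen2agriprod

# verify that the metadata is populated again
ceph mds metadata sen2agriprod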

Actions #8

Updated by Venky Shankar over 1 year ago

  • Component(FS) MDS added
Actions #9

Updated by Venky Shankar over 1 year ago

  • Assignee set to Venky Shankar
  • Severity changed from 4 - irritation to 3 - minor
  • Component(FS) MDSMonitor added
Actions #10

Updated by Venky Shankar over 1 year ago

It seems the MDS can miss sending a beacon in the up:boot state. In this state the MDS encodes its metadata and includes it in the beacon message to the monitor. In MDSDaemon::handle_mds_map():

  if (whoami == MDS_RANK_NONE) {
    // We do not hold a rank:
    dout(10) <<  __func__ << ": handling map in rankless mode" << dendl;

    if (new_state == DS::STATE_STANDBY) {
      /* Note: STATE_BOOT is never an actual state in the FSMap. The Monitors
       * generally mark a new MDS as STANDBY (although it's possible to
       * immediately be assigned a rank).
       */
      if (old_state == DS::STATE_NULL) {
        dout(1) << "Monitors have assigned me to become a standby." << dendl;
        beacon.set_want_state(*mdsmap, new_state);
      } else if (old_state == DS::STATE_STANDBY) {
        dout(5) << "I am still standby" << dendl;
      }

In the code block above, the MDS can switch from STATE_NULL to STATE_STANDBY. When this happens the MDS doesn't go through STATE_BOOT (up:boot). What's unknown is under what circumstances this path can be hit.

Actions #11

Updated by Patrick Donnelly over 1 year ago

Venky Shankar wrote:

It seems the MDS can miss sending a beacon in the up:boot state. In this state the MDS encodes its metadata and includes it in the beacon message to the monitor. In MDSDaemon::handle_mds_map():

[...]

In the code block above, the MDS can switch from STATE_NULL to STATE_STANDBY. When this happens the MDS doesn't go through STATE_BOOT (up:boot). What's unknown is under what circumstances this path can be hit.

STATE_NULL comes from the previous mdsmap ("oldmap" in the code), i.e. the MDS is not in the map at all. Once the mons receive the first beacon with want_state STATE_BOOT, the MDS is added as a standby (or immediately promoted if a rank is available). At no point is the MDS in the mdsmap with state STATE_BOOT.

So, I don't think that's the cause of the bug here.

Actions #12

Updated by Greg Farnum over 1 year ago

The MDS is identified using a nonce as well as an IP in the map, right? After the containerized OSDs managed to clobber their bluestores, I wonder about multiple MDS instances running in close enough proximity to confuse things, but I don't think that could break this...

Actions #13

Updated by Venky Shankar over 1 year ago

I was discussing this tracker with Patrick - there are separate paxos proposals for the fsmap update and the metadata update. If the metadata update fails, the MDS still transitions to the next state. This is a theoretical case at the moment; it requires a test with some paxos failure injection to confirm.

Actions #14

Updated by Patrick Donnelly about 1 year ago

  • Status changed from New to Fix Under Review
  • Assignee changed from Venky Shankar to Patrick Donnelly
  • Target version changed from v18.0.0 to v19.0.0
  • Source set to Community (user)
  • Backport changed from pacific,quincy to reef,quincy,pacific
  • Pull request ID set to 50862
Actions #15

Updated by Patrick Donnelly about 1 year ago

  • Related to Bug #59318: mon/MDSMonitor: daemon booting may get failed if mon handles up:boot beacon twice added
Actions #16

Updated by Patrick Donnelly about 1 year ago

#59318 may also be related somehow but I'm not sure.

Actions #17

Updated by Venky Shankar 10 months ago

  • Status changed from Fix Under Review to Pending Backport
Actions #18

Updated by Backport Bot 10 months ago

  • Copied to Backport #61691: quincy: mon failed to return metadata for mds added
Actions #19

Updated by Backport Bot 10 months ago

  • Copied to Backport #61692: pacific: mon failed to return metadata for mds added
Actions #20

Updated by Backport Bot 10 months ago

  • Copied to Backport #61693: reef: mon failed to return metadata for mds added
Actions #21

Updated by Backport Bot 10 months ago

  • Tags set to backport_processed
Actions #22

Updated by Min Shi 6 months ago

There is another case that may cause MDS metadata to be lost.

The root cause is that a peon's pending_metadata (in memory) may be inconsistent with the mon's db (on disk). When a peon becomes the leader, and at the same time a standby MDS stops, the new leader may flush the wrong MDS metadata to the db.

It can be reproduced like this (a rough shell translation of the steps follows the list):

A cluster with 3 mons and 3 MDSs (one active, the other two standby), and 6 OSDs.
step 1. stop the two standby MDSs
step 2. restart all mons (to make pending_metadata consistent with the db)
step 3. start the other two MDSs
step 4. stop the leader mon
step 5. run the "ceph mds metadata" command to check the MDS metadata
step 6. stop one standby MDS
step 7. run the "ceph mds metadata" command to check the MDS metadata again
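
The steps above, roughly translated to shell on a systemd deployment (mon ids a/b/c and mds ids a/b/c are placeholders, not names from this report):

# step 1: stop the two standby MDSs
systemctl stop ceph-mds@b ceph-mds@c

# step 2: restart all mons so the in-memory pending_metadata matches the db
systemctl restart ceph-mon@a ceph-mon@b ceph-mon@c

# step 3: start the two standby MDSs again
systemctl start ceph-mds@b ceph-mds@c

# step 4: stop the leader mon (check "ceph mon stat" or "ceph quorum_status" for the leader)
systemctl stop ceph-mon@a

# step 5: the metadata should still look correct at this point
ceph mds metadata

# step 6: stop one standby MDS
systemctl stop ceph-mds@b

# step 7: metadata for some MDSs may now be missing (name-only entries)
ceph mds metadata
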
Actions #23

Updated by Patrick Donnelly 6 months ago

  • Related to Bug #63166: mon/MDSMonitor: metadata not loaded from PAXOS on update added
Actions #24

Updated by Konstantin Shalygin 5 months ago

  • Status changed from Pending Backport to Resolved
  • % Done changed from 0 to 100