Bug #57059

open

ceph mds dump tree - root inode is not in cache

Added by Frank Schilder over 1 year ago. Updated 7 months ago.

Status: Fix Under Review
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags: low-hanging-fruit
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Observed on Octopus 15.2.16; probably affects newer versions as well.

It is not possible to dump the stray buckets in the MDS cache. See the ceph-users thread: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/G63US2VE3AQMSFCNO5TGO7NTBO57HDUC/

Part 1:

In file https://github.com/ceph/ceph/blob/main/src/mds/MDSRank.cc:

  3111    void MDSRank::command_dump_tree(const cmdmap_t &cmdmap, std::ostream &ss, Formatter *f) 
  3112    {
  3113      std::string root;
  3114      int64_t depth;
  3115      cmd_getval(cmdmap, "root", root);
  3116      if (root.empty()) {
  3117        root = "/";
  3118      }
  3119      if (!cmd_getval(cmdmap, "depth", depth))
  3120        depth = -1;
  3121      std::lock_guard l(mds_lock);
  3122      CInode *in = mdcache->cache_traverse(filepath(root.c_str()));
  3123      if (!in) {
  3124        ss << "root inode is not in cache";
  3125        return;
  3126      }
  3127      f->open_array_section("inodes");
  3128      mdcache->dump_tree(in, 0, depth, f);
  3129      f->close_section();
  3130    }

The error message on line 3124 is both misleading and unhelpful. It should be changed to something like

    ss << "inode for path '" << filepath(root.c_str()) << "' is not in cache";

so that it indicates which path the command actually tried to look up, which is often not the root inode.
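A minimal, self-contained sketch of the proposed behaviour follows. It assumes a toy cache modelled as a `std::map` from path strings to inode numbers; `ToyCache`, `lookup_in_cache` and `dump_tree_message` are hypothetical names, not Ceph APIs, and the real lookup goes through `MDCache::cache_traverse()`. The point is only that the error text names the path that was requested instead of always saying "root inode".

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical stand-in for the MDS cache: path -> inode number.
using ToyCache = std::map<std::string, std::uint64_t>;

// Hypothetical stand-in for MDCache::cache_traverse(): returns the inode
// number if the path is cached, std::nullopt otherwise.
std::optional<std::uint64_t> lookup_in_cache(const ToyCache& cache,
                                             const std::string& path) {
  auto it = cache.find(path);
  if (it == cache.end())
    return std::nullopt;
  return it->second;
}

// Mirrors the start of command_dump_tree(): default the root to "/", then
// report the path that was actually looked up when the lookup fails.
std::string dump_tree_message(const ToyCache& cache, std::string root) {
  if (root.empty())
    root = "/";
  std::ostringstream ss;
  auto ino = lookup_in_cache(cache, root);
  if (!ino) {
    ss << "inode for path '" << root << "' is not in cache";
    return ss.str();
  }
  ss << "found inode " << *ino << " for path '" << root << "'";
  return ss.str();
}
```

With a cache holding only "/" and "/data", a request for "~mds0/stray0" would then report `inode for path '~mds0/stray0' is not in cache`, which immediately tells the operator which lookup failed.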

Part 2:

Dumping a tree under a stray bucket fails for an unknown reason, whereas dumping any tree under "/" succeeds.

This command works:

[root@rit-tceph ~]# ceph tell mds.0 dump tree '/' | jq ".[] | .dirfrags |.[] | .path" 
2022-08-07T17:25:34.430+0200 7fbcfbfff700  0 client.439291 ms_handle_reset on v2:10.41.24.14:6812/3943985176
2022-08-07T17:25:34.473+0200 7fbd017fa700  0 client.456018 ms_handle_reset on v2:10.41.24.14:6812/3943985176
"/data/blobs" 
"/data" 
"" 

However, this does not:

[root@rit-tceph ~]# ceph tell mds.0 dump tree '~mds0/stray0' | jq ".[] | .dirfrags |.[] | .path" 
2022-08-07T17:27:16.623+0200 7fb294ff9700  0 client.439345 ms_handle_reset on v2:10.41.24.14:6812/3943985176
2022-08-07T17:27:16.665+0200 7fb295ffb700  0 client.456072 ms_handle_reset on v2:10.41.24.14:6812/3943985176
root inode is not in cache

The directory "~mds0/stray0" is in the cache, though (see the dump in https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/PDJWTUAL2GRM7DYVBQT6BQLOJGOFIE4O/).

For the problem with dumping the stray buckets, the change requested in Part 1 would help a lot with debugging. It is possible that the "~" character is interpreted by the command parser, or the cause may be something as simple as a faulty character-encoding conversion. Knowing the exact contents of filepath(root.c_str()) at the time of failure would almost certainly give a good lead.
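One way to see exactly what string reaches the lookup is to log every path component together with its raw bytes, which would expose an unexpected rewrite of `~` or an encoding problem. The sketch below is a hypothetical debugging helper (`split_path` and `describe_path` are not Ceph functions); its splitting on `/`, skipping empty components, is an assumption and not a copy of Ceph's `filepath` parsing.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split a path into non-empty components separated by '/'.
// Assumption: a leading '/' marks an absolute path and is not a component.
std::vector<std::string> split_path(const std::string& path) {
  std::vector<std::string> parts;
  std::string cur;
  for (char c : path) {
    if (c == '/') {
      if (!cur.empty())
        parts.push_back(cur);
      cur.clear();
    } else {
      cur += c;
    }
  }
  if (!cur.empty())
    parts.push_back(cur);
  return parts;
}

// Describe the path as "absolute"/"relative" followed by each component
// in the form: [index] "text" (hex bytes).
std::string describe_path(const std::string& path) {
  static const char digits[] = "0123456789abcdef";
  std::ostringstream out;
  // rfind('/', 0) == 0 is true only if the path starts with '/'.
  out << (path.rfind('/', 0) == 0 ? "absolute" : "relative");
  const auto parts = split_path(path);
  for (std::size_t i = 0; i < parts.size(); ++i) {
    out << " [" << i << "] \"" << parts[i] << "\" (";
    for (std::size_t j = 0; j < parts[i].size(); ++j) {
      const auto b = static_cast<unsigned char>(parts[i][j]);
      out << (j ? " " : "") << digits[b >> 4] << digits[b & 0x0f];
    }
    out << ")";
  }
  return out.str();
}
```

For "~mds0/stray0" this yields `relative [0] "~mds0" (7e 6d 64 73 30) [1] "stray0" (73 74 72 61 79 30)`. Logging something equivalent just before the cache lookup would show whether the path arrives intact and whether it is treated as relative to "/".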

Part 3:

Whenever a "ceph tell mds. ..." command is executed, these annoying messages show up:

2022-08-07T17:27:16.623+0200 7fb294ff9700  0 client.439345 ms_handle_reset on v2:10.41.24.14:6812/3943985176
2022-08-07T17:27:16.665+0200 7fb295ffb700  0 client.456072 ms_handle_reset on v2:10.41.24.14:6812/3943985176

Do they indicate a problem? If not, would it be possible to stop printing them to the root console?

#1

Updated by Laura Flores over 1 year ago

  • Tags set to low-hanging-fruit

#2

Updated by Laura Flores 10 months ago

  • Tags changed from low-hanging-fruit to low-hanging-fruit, open-source-day

#3

Updated by Laura Flores 8 months ago

  • Tags changed from low-hanging-fruit, open-source-day to low-hanging-fruit

#4

Updated by Laura Flores 7 months ago

Part 1 might be a good piece for beginners to work on.

Claiming for Grace Hopper Open Source Day.

#5

Updated by Yaarit Hatuka 7 months ago

  • Pull request ID set to 53611

#6

Updated by Laura Flores 7 months ago

  • Status changed from New to Fix Under Review