Bug #58878


mds: FAILED ceph_assert(trim_to > trimming_pos)

Added by Venky Shankar about 1 year ago. Updated 3 months ago.

Status:
New
Priority:
Normal
Assignee:
Category:
Correctness/Safety
Target version:
% Done:

0%

Source:
Tags:
Backport:
reef,quincy,pacific
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
crash
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

One of the MDSs crashed with the following backtrace:

   -24> 2023-02-10T03:40:41.099+0000 7f00b8981700  1 mds.ocs-storagecluster-cephfilesystem-b Updating MDS map to version 2590 from mon.1
   -23> 2023-02-10T03:40:41.099+0000 7f00b8981700  4 mds.0.2313 set_osd_epoch_barrier: epoch=16688
   -22> 2023-02-10T03:40:41.101+0000 7f00bc188700  5 mds.beacon.ocs-storagecluster-cephfilesystem-b received beacon reply up:active seq 785 rtt 0.612017
   -21> 2023-02-10T03:40:41.674+0000 7f00b4979700  2 mds.0.cache Memory usage:  total 998012, rss 335560, heap 356620, baseline 332044, 2411 / 32894 inodes have caps, 2412 caps, 0.0733264 caps per inode
   -20> 2023-02-10T03:40:41.793+0000 7f00b8981700  3 mds.0.server handle_client_session client_session(request_renewcaps seq 46677) from client.24866268
   -19> 2023-02-10T03:40:41.885+0000 7f00b797f700 10 monclient: tick
   -18> 2023-02-10T03:40:41.885+0000 7f00b797f700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2023-02-10T03:40:11.886692+0000)
   -17> 2023-02-10T03:40:42.068+0000 7f00b8981700  4 mds.0.2313 apply_blocklist: killed 0 blocklisted sessions (0 blocklist entries, 2)
   -16> 2023-02-10T03:40:42.068+0000 7f00b8981700 10 monclient: _renew_subs
   -15> 2023-02-10T03:40:42.068+0000 7f00b8981700 10 monclient: _send_mon_message to mon.d at v2:172.30.8.118:3300/0
   -14> 2023-02-10T03:40:42.108+0000 7f00b8981700  1 mds.ocs-storagecluster-cephfilesystem-b Updating MDS map to version 2591 from mon.1
   -13> 2023-02-10T03:40:42.108+0000 7f00b8981700  4 mds.0.2313 set_osd_epoch_barrier: epoch=16689
   -12> 2023-02-10T03:40:42.674+0000 7f00b4979700  2 mds.0.cache Memory usage:  total 998012, rss 335560, heap 356620, baseline 332044, 2411 / 32894 inodes have caps, 2412 caps, 0.0733264 caps per inode
   -11> 2023-02-10T03:40:42.885+0000 7f00b797f700 10 monclient: tick
   -10> 2023-02-10T03:40:42.885+0000 7f00b797f700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2023-02-10T03:40:12.886850+0000)
    -9> 2023-02-10T03:40:43.002+0000 7f00b8981700  1 mds.ocs-storagecluster-cephfilesystem-b Updating MDS map to version 2592 from mon.1
    -8> 2023-02-10T03:40:43.002+0000 7f00b8981700  4 mds.0.2313 set_osd_epoch_barrier: epoch=16689
    -7> 2023-02-10T03:40:43.675+0000 7f00b4979700  2 mds.0.cache Memory usage:  total 998012, rss 335560, heap 356620, baseline 332044, 2411 / 32894 inodes have caps, 2412 caps, 0.0733264 caps per inode
    -6> 2023-02-10T03:40:43.885+0000 7f00b797f700 10 monclient: tick
    -5> 2023-02-10T03:40:43.885+0000 7f00b797f700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2023-02-10T03:40:13.886984+0000)
    -4> 2023-02-10T03:40:44.185+0000 7f00b8981700  4 mds.0.2313 apply_blocklist: killed 0 blocklisted sessions (0 blocklist entries, 2)
    -3> 2023-02-10T03:40:44.222+0000 7f00b8981700  4 mds.0.2313 apply_blocklist: killed 0 blocklisted sessions (0 blocklist entries, 2)
    -2> 2023-02-10T03:40:44.298+0000 7f00b2975700 -1 /builddir/build/BUILD/ceph-16.2.7/src/osdc/Journaler.cc: In function 'void Journaler::_trim()' thread 7f00b2975700 time 2023-02-10T03:40:44.298139+0000
/builddir/build/BUILD/ceph-16.2.7/src/osdc/Journaler.cc: 1358: FAILED ceph_assert(trim_to > trimming_pos)

 ceph version 16.2.7-126.el8cp (fe0af61d104d48cb9d116cde6e593b5fc8c197e4) pacific (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7f00c119acfe]
 2: /usr/lib64/ceph/libceph-common.so.2(+0x276f18) [0x7f00c119af18]
 3: (Journaler::_trim()+0x807) [0x55e95e8d0447]
 4: (Journaler::_finish_write_head(int, Journaler::Header&, C_OnFinisher*)+0x208) [0x55e95e8d0ab8]
 5: (Context::complete(int)+0xd) [0x55e95e56a57d]
 6: (Finisher::finisher_thread_entry()+0x1a5) [0x7f00c123c1e5]
 7: /lib64/libpthread.so.0(+0x81cf) [0x7f00c017e1cf]
 8: clone()

No recovery steps were performed and no cephfs recovery tools were used on the file system. Logs are limited. This looks like a possible bug in mdlog.
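
For context, the failing assert sits at the end of the precondition checks in Journaler::_trim(): the trim target is derived from the expire_pos in the last committed header, so it can only end up at or below trimming_pos if the persisted position went backwards. A rough paraphrase of that logic (not a verbatim quote of the 16.2.7 sources; names follow src/osdc/Journaler.cc):

// Paraphrased from Journaler::_trim() (src/osdc/Journaler.cc);
// the exact code in ceph 16.2.7 may differ slightly.
uint64_t trim_to = last_committed.expire_pos;  // from the persisted header
trim_to -= trim_to % get_layout_period();      // round down to a period boundary
if (trim_to == 0 || trim_to == trimming_pos)
  return;                                      // nothing new to trim
if (trimming_pos > trimmed_pos)
  return;                                      // a trim is already in flight
ceph_assert(trim_to > trimming_pos);           // fails if expire_pos moved backwards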


Related issues: 1 (1 open, 0 closed)

Related to CephFS - Feature #64101: tools/cephfs: toolify updating mdlog journal pointers to a sane value (status: New; assignee: Milind Changire)

Actions #1

Updated by Venky Shankar about 1 year ago

  • Status changed from New to Can't reproduce

This was suspected to be the result of various metadata inconsistencies, which probably surfaced because destructive tools were run inappropriately.

Please reopen if this is seen again.

Actions #2

Updated by Jos Collin 7 months ago

  • Status changed from Can't reproduce to New
  • Target version deleted (v18.0.0)
Actions #3

Updated by Venky Shankar 7 months ago

  • Target version set to v19.0.0
  • Backport changed from pacific,quincy to reef,quincy,pacific

Reopened for investigating a possible bug in the MDS that causes bogus values to be persisted in the journal header.

Actions #4

Updated by Venky Shankar 6 months ago

Greg pointed out that this tracker lacks details of why it got reopened, so here are some details from discussions that happened in various other channels (Slack, etc.):

I suspect Journaler::write_head() is being invoked concurrently, with the journal positions getting mangled based on which RADOS call returns first. Look at Journaler::_finish_write_head():

void Journaler::_finish_write_head(int r, Header &wrote,
                                   C_OnFinisher *oncommit)
{
  lock_guard l(lock);

  if (r < 0) {
    lderr(cct) << "_finish_write_head got " << cpp_strerror(r) << dendl;
    handle_write_error(r);
    return;
  }
  ceph_assert(!readonly);
  ldout(cct, 10) << "_finish_write_head " << wrote << dendl;
  last_committed = wrote;
  if (oncommit) {
    oncommit->complete(r);
  }

  _trim();  // trim?
}

Maybe just to be safe and catch this particular case (out-of-order reply), _finish_write_head() can (derr) log when wrote < last_committed.
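
Something like this, perhaps (a hypothetical sketch; Journaler::Header has no comparison operator, so the field-wise check on write_pos/expire_pos/trimmed_pos below is an assumption):

// Hypothetical check near the top of Journaler::_finish_write_head(),
// after the r < 0 error path: flag a reply that would move the
// committed header backwards instead of silently overwriting it.
if (wrote.write_pos < last_committed.write_pos ||
    wrote.expire_pos < last_committed.expire_pos ||
    wrote.trimmed_pos < last_committed.trimmed_pos) {
  lderr(cct) << "_finish_write_head: out-of-order write_head reply: wrote "
             << wrote << " is behind last_committed " << last_committed
             << dendl;
}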

Greg mentioned that the journal ordering has been audited a few times in the past, especially for cases like this, to be certain that it's not broken; however, it's possible that some rules were broken when it got moved to its own thread.

Also, just FYI: the case which prompted reopening this tracker had Patrick suggesting fixing up the journal positions manually, but that's a bit tricky since the read position should point to a SUBTREEMAP event rather than the beginning of the object. We should have tools to do this rather than doing it by hand (a rough sketch of the manual procedure is below).
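
For illustration, the manual fixup would look roughly like this with cephfs-journal-tool (a sketch, not a recommendation; <fs_name> and <offset> are placeholders, and the --type filter value is assumed to match the event name):

# Dump the current journal header for rank 0.
cephfs-journal-tool --rank=<fs_name>:0 header get

# List SUBTREEMAP events to find a sane offset for the read position.
cephfs-journal-tool --rank=<fs_name>:0 event get --type=SUBTREEMAP list

# Rewrite the journal pointer by hand (dangerous; see the caveat above).
cephfs-journal-tool --rank=<fs_name>:0 header set expire_pos <offset>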

Actions #5

Updated by Venky Shankar 3 months ago

  • Related to Feature #64101: tools/cephfs: toolify updating mdlog journal pointers to a sane value added
Actions #6

Updated by Venky Shankar 3 months ago

Debug change to possibly catch an out-of-order journal pointer update: https://github.com/ceph/ceph/pull/55265
