Project

General

Profile

Actions

Bug #58878

open

mds: FAILED ceph_assert(trim_to > trimming_pos)

Added by Venky Shankar about 1 year ago. Updated 4 days ago.

Status:
New
Priority:
Normal
Assignee:
Category:
Correctness/Safety
Target version:
% Done:

0%

Source:
Tags:
Backport:
reef,quincy,pacific
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
crash
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

One of the MDS crash with the following backtrace:

   -24> 2023-02-10T03:40:41.099+0000 7f00b8981700  1 mds.ocs-storagecluster-cephfilesystem-b Updating MDS map to version 2590 from mon.1
   -23> 2023-02-10T03:40:41.099+0000 7f00b8981700  4 mds.0.2313 set_osd_epoch_barrier: epoch=16688
   -22> 2023-02-10T03:40:41.101+0000 7f00bc188700  5 mds.beacon.ocs-storagecluster-cephfilesystem-b received beacon reply up:active seq 785 rtt 0.612017
   -21> 2023-02-10T03:40:41.674+0000 7f00b4979700  2 mds.0.cache Memory usage:  total 998012, rss 335560, heap 356620, baseline 332044, 2411 / 32894 inodes have caps, 2412 caps, 0.0733264 caps per inode
   -20> 2023-02-10T03:40:41.793+0000 7f00b8981700  3 mds.0.server handle_client_session client_session(request_renewcaps seq 46677) from client.24866268
   -19> 2023-02-10T03:40:41.885+0000 7f00b797f700 10 monclient: tick
   -18> 2023-02-10T03:40:41.885+0000 7f00b797f700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2023-02-10T03:40:11.886692+0000)
   -17> 2023-02-10T03:40:42.068+0000 7f00b8981700  4 mds.0.2313 apply_blocklist: killed 0 blocklisted sessions (0 blocklist entries, 2)
   -16> 2023-02-10T03:40:42.068+0000 7f00b8981700 10 monclient: _renew_subs
   -15> 2023-02-10T03:40:42.068+0000 7f00b8981700 10 monclient: _send_mon_message to mon.d at v2:172.30.8.118:3300/0
   -14> 2023-02-10T03:40:42.108+0000 7f00b8981700  1 mds.ocs-storagecluster-cephfilesystem-b Updating MDS map to version 2591 from mon.1
   -13> 2023-02-10T03:40:42.108+0000 7f00b8981700  4 mds.0.2313 set_osd_epoch_barrier: epoch=16689
   -12> 2023-02-10T03:40:42.674+0000 7f00b4979700  2 mds.0.cache Memory usage:  total 998012, rss 335560, heap 356620, baseline 332044, 2411 / 32894 inodes have caps, 2412 caps, 0.0733264 caps per inode
   -11> 2023-02-10T03:40:42.885+0000 7f00b797f700 10 monclient: tick
   -10> 2023-02-10T03:40:42.885+0000 7f00b797f700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2023-02-10T03:40:12.886850+0000)
    -9> 2023-02-10T03:40:43.002+0000 7f00b8981700  1 mds.ocs-storagecluster-cephfilesystem-b Updating MDS map to version 2592 from mon.1
    -8> 2023-02-10T03:40:43.002+0000 7f00b8981700  4 mds.0.2313 set_osd_epoch_barrier: epoch=16689
    -7> 2023-02-10T03:40:43.675+0000 7f00b4979700  2 mds.0.cache Memory usage:  total 998012, rss 335560, heap 356620, baseline 332044, 2411 / 32894 inodes have caps, 2412 caps, 0.0733264 caps per inode
    -6> 2023-02-10T03:40:43.885+0000 7f00b797f700 10 monclient: tick
    -5> 2023-02-10T03:40:43.885+0000 7f00b797f700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2023-02-10T03:40:13.886984+0000)
    -4> 2023-02-10T03:40:44.185+0000 7f00b8981700  4 mds.0.2313 apply_blocklist: killed 0 blocklisted sessions (0 blocklist entries, 2)
    -3> 2023-02-10T03:40:44.222+0000 7f00b8981700  4 mds.0.2313 apply_blocklist: killed 0 blocklisted sessions (0 blocklist entries, 2)
    -2> 2023-02-10T03:40:44.298+0000 7f00b2975700 -1 /builddir/build/BUILD/ceph-16.2.7/src/osdc/Journaler.cc: In function 'void Journaler::_trim()' thread 7f00b2975700 time 2023-02-10T03:40:44.298139+0000
/builddir/build/BUILD/ceph-16.2.7/src/osdc/Journaler.cc: 1358: FAILED ceph_assert(trim_to > trimming_pos)

 ceph version 16.2.7-126.el8cp (fe0af61d104d48cb9d116cde6e593b5fc8c197e4) pacific (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7f00c119acfe]
 2: /usr/lib64/ceph/libceph-common.so.2(+0x276f18) [0x7f00c119af18]
 3: (Journaler::_trim()+0x807) [0x55e95e8d0447]
 4: (Journaler::_finish_write_head(int, Journaler::Header&, C_OnFinisher*)+0x208) [0x55e95e8d0ab8]
 5: (Context::complete(int)+0xd) [0x55e95e56a57d]
 6: (Finisher::finisher_thread_entry()+0x1a5) [0x7f00c123c1e5]
 7: /lib64/libpthread.so.0(+0x81cf) [0x7f00c017e1cf]
 8: clone()

No recovery steps or cephfs recovery tools were performed/used on the file system. Logs are limited. Seems like a possible bug in mdlog.


Related issues 1 (1 open0 closed)

Related to CephFS - Feature #64101: tools/cephfs: toolify updating mdlog journal pointers to a sane valueNewMilind Changire

Actions
Actions

Also available in: Atom PDF