Bug #54271

open

mds/OpenFileTable.cc: 777: FAILED ceph_assert(omap_num_objs == num_objs)

Added by Patrick Donnelly about 2 years ago. Updated 7 months ago.

Status: Triaged
Priority: Low
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport: quincy,pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS): crash
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

See: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/NPRA7OGSNQUI32WPZQMKH3SJNPPOSBRK/?sort=date

MDS Logs
---

2020-05-01 10:07:35.559 7eff10cc3700  1 mds.prdceph01  9: 'ceph'
2020-05-01 10:07:35.559 7eff10cc3700  1 mds.prdceph01 respawning with exe /usr/bin/ceph-mds
2020-05-01 10:07:35.559 7eff10cc3700  1 mds.prdceph01  exe_path /proc/self/exe
2020-05-01 10:07:50.785 7fbff66291c0  0 ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable), process ceph-mds, pid 9710
2020-05-01 10:07:50.787 7fbff66291c0  0 pidfile_write: ignore empty --pid-file
2020-05-01 10:07:50.817 7fbfe4408700  1 mds.prdceph01 Updating MDS map to version 1487238 from mon.2
2020-05-01 10:07:55.820 7fbfe4408700  1 mds.prdceph01 Updating MDS map to version 1487239 from mon.2
2020-05-01 10:07:55.820 7fbfe4408700  1 mds.prdceph01 Map has assigned me to become a standby
2020-05-01 10:11:07.369 7fbfe4408700  1 mds.prdceph01 Updating MDS map to version 1487282 from mon.2
2020-05-01 10:11:07.373 7fbfe4408700  1 mds.0.1487282 handle_mds_map i am now mds.0.1487282
2020-05-01 10:11:07.373 7fbfe4408700  1 mds.0.1487282 handle_mds_map state change up:boot --> up:replay
2020-05-01 10:11:07.373 7fbfe4408700  1 mds.0.1487282 replay_start
2020-05-01 10:11:07.374 7fbfe4408700  1 mds.0.1487282  recovery set is
2020-05-01 10:11:07.374 7fbfe4408700  1 mds.0.1487282  waiting for osdmap 2198096 (which blacklists prior instance)
2020-05-01 10:11:09.517 7fbfdd3fa700  0 mds.0.cache creating system inode with ino:0x100
2020-05-01 10:11:09.518 7fbfdd3fa700  0 mds.0.cache creating system inode with ino:0x1
2020-05-01 10:11:09.753 7fbfddbfb700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.9/rpm/el7/BUILD/ceph-14.2.9/src/mds/OpenFileTable.cc: In function 'void OpenFileTable::_load_finish(int, int, int, unsigned int, bool, bool, ceph::bufferlist&, std::map<std::basic_string<char>, ceph::buffer::v14_2_0::list>&)' thread 7fbfddbfb700 time 2020-05-01 10:11:09.752945
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.9/rpm/el7/BUILD/ceph-14.2.9/src/mds/OpenFileTable.cc: 777: FAILED ceph_assert(omap_num_objs == num_objs)

 ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x7fbfed691875]
 2: (()+0x253a3d) [0x7fbfed691a3d]
 3: (OpenFileTable::_load_finish(int, int, int, unsigned int, bool, bool, ceph::buffer::v14_2_0::list&, std::map<std::string, ceph::buffer::v14_2_0::list, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::v14_2_0::list> >
...
 &)+0x1bb0) [0x563c26b1e8b0]
 4: (C_IO_OFT_Load::finish(int)+0x3b) [0x563c26b2467b]
 5: (MDSContext::complete(int)+0x74) [0x563c26afbce4]
 6: (MDSIOContextBase::complete(int)+0x177) [0x563c26afbf47]
 7: (Finisher::finisher_thread_entry()+0x16f) [0x7fbfed71a44f]
 8: (()+0x7e65) [0x7fbfeb551e65]
 9: (clone()+0x6d) [0x7fbfea1ff88d]


Related issues 1 (1 open, 0 closed)

Related to CephFS - Bug #54253: Avoid OOM exceeding 10x MDS cache limit on restart after many files were opened (New)

Actions #1

Updated by Venky Shankar about 2 years ago

This seems to be happening in nautilus. We should check whether it can also be hit in master (quincy, etc.).

Actions #2

Updated by Venky Shankar about 2 years ago

Just FYI - the workaround is to remove the mds<>_openfiles objects from the metadata pool.
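
As a rough illustration of the workaround above, the sketch below filters a metadata pool object listing for open file table objects and builds the matching `rados rm` commands without running them. The object name pattern (`mds<rank>_openfiles` with an optional `.<hex index>` suffix), the pool name `cephfs_metadata`, and both helper functions are assumptions for illustration and are not taken from this report; check the names against the actual pool listing before removing anything.

```python
import re

# Assumed naming: "mds<rank>_openfiles", optionally followed by ".<hex index>".
# This pattern is a guess based on the "mds<>_openfiles" wording in the comment.
OPENFILES_RE = re.compile(r"^mds(\d+)_openfiles(\.[0-9a-f]+)?$")


def select_openfiles_objects(object_names, rank=None):
    """Return the names that look like open file table objects.

    If rank is given, keep only the objects belonging to that MDS rank.
    """
    selected = []
    for raw in object_names:
        name = raw.strip()
        match = OPENFILES_RE.match(name)
        if match and (rank is None or int(match.group(1)) == rank):
            selected.append(name)
    return selected


def removal_commands(pool, object_names, rank=None):
    """Build (but do not execute) one `rados rm` command per matching object."""
    return [
        f"rados -p {pool} rm {name}"
        for name in select_openfiles_objects(object_names, rank)
    ]


if __name__ == "__main__":
    # Hypothetical listing, e.g. the output lines of `rados -p <metadata pool> ls`.
    listing = ["mds0_openfiles.0", "mds0_openfiles.1", "mds0_inotable", "200.00000000"]
    for command in removal_commands("cephfs_metadata", listing, rank=0):
        print(command)
```

The input list would typically come from `rados -p <metadata pool> ls`. Printing the commands rather than executing them leaves room to review exactly which objects would be removed.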

Actions #3

Updated by Xiubo Li about 2 years ago

  • Related to Bug #54253: Avoid OOM exceeding 10x MDS cache limit on restart after many files were opened added
Actions #4

Updated by Venky Shankar about 2 years ago

  • Status changed from New to Triaged
  • Assignee set to Kotresh Hiremath Ravishankar
Actions #5

Updated by Kotresh Hiremath Ravishankar over 1 year ago

  • Description updated (diff)
Actions #6

Updated by Kotresh Hiremath Ravishankar over 1 year ago

  • Priority changed from High to Low

Lowering the priority as this is seen only in nautilus and not seen in supported versions.

Actions #7

Updated by Kotresh Hiremath Ravishankar over 1 year ago

We will wait to see whether this reproduces in recent versions.

Actions #8

Updated by Patrick Donnelly 7 months ago

  • Target version deleted (v18.0.0)