Bug #53597

closed

mds: FAILED ceph_assert(dir->get_projected_version() == dir->get_version())

Added by 玮文 胡 over 2 years ago. Updated over 1 year ago.

Status: Resolved
Priority: High
Assignee:
Category: Correctness/Safety
Target version:
% Done: 0%
Source: Community (user)
Tags: backport_processed
Backport: quincy, pacific
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS): crash
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

# ceph crash info 2021-12-13T17:07:59.644235Z_674d6c2a-ec54-4bf3-a040-2a53ba7f93fe
{
    "assert_condition": "dir->get_projected_version() == dir->get_version()",
    "assert_file": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.6/rpm/el8/BUILD/ceph-16.2.6/src/mds/Migrator.cc",
    "assert_func": "void Migrator::encode_export_dir(ceph::bufferlist&, CDir*, std::map<client_t, entity_inst_t>&, std::map<client_t, client_metadata_t>&, uint64_t&)",
    "assert_line": 1753,
    "assert_msg": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.6/rpm/el8/BUILD/ceph-16.2.6/src/mds/Migrator.cc: In function 'void Migrator::encode_export_dir(ceph::bufferlist&, CDir*, std::map<client_t, entity_inst_t>&, std::map<client_t, client_metadata_t>&, uint64_t&)' thread 7f31ea44e700 time 2021-12-13T17:07:59.638997+0000\n/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.6/rpm/el8/BUILD/ceph-16.2.6/src/mds/Migrator.cc: 1753: FAILED ceph_assert(dir->get_projected_version() == dir->get_version())\n",
    "assert_thread_name": "MR_Finisher",
    "backtrace": [
        "/lib64/libpthread.so.0(+0x12b20) [0x7f31f7e6fb20]",
        "gsignal()",
        "abort()",
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x7f31f8e7ed1f]",
        "/usr/lib64/ceph/libceph-common.so.2(+0x276ee8) [0x7f31f8e7eee8]",
        "(Migrator::encode_export_dir(ceph::buffer::v15_2_0::list&, CDir*, std::map<client_t, entity_inst_t, std::less<client_t>, std::allocator<std::pair<client_t const, entity_inst_t> > >&, std::map<client_t, client_metadata_t, std::less<client_t>, std::allocator<std::pair<client_t const, client_metadata_t> > >&, unsigned long&)+0xbce) [0x55bca1101f4e]",
        "(Migrator::export_go_synced(CDir*, unsigned long)+0x52d) [0x55bca11024dd]",
        "(C_M_ExportGo::finish(int)+0x19) [0x55bca1128689]",
        "(MDSContext::complete(int)+0x56) [0x55bca1209906]",
        "(C_IO_Wrapper::finish(int)+0x12) [0x55bca120a622]",
        "(MDSContext::complete(int)+0x56) [0x55bca1209906]",
        "(MDSIOContextBase::complete(int)+0x5ac) [0x55bca120a13c]",
        "(C_IO_Wrapper::complete(int)+0x12d) [0x55bca120a56d]",
        "(Finisher::finisher_thread_entry()+0x1a5) [0x7f31f8f1e6d5]",
        "/lib64/libpthread.so.0(+0x814a) [0x7f31f7e6514a]",
        "clone()" 
    ],
    "ceph_version": "16.2.6",
    "crash_id": "2021-12-13T17:07:59.644235Z_674d6c2a-ec54-4bf3-a040-2a53ba7f93fe",
    "entity_name": "mds.cephfs.gpu006.ddpekw",
    "os_id": "centos",
    "os_name": "CentOS Linux",
    "os_version": "8",
    "os_version_id": "8",
    "process_name": "ceph-mds",
    "stack_sig": "3f92007b85dc9e8d2220c46c5d3cfa748d5a1e634cd303ccd3e2bfc96ce02b3f",
    "timestamp": "2021-12-13T17:07:59.644235Z",
    "utsname_hostname": "gpu006",
    "utsname_machine": "x86_64",
    "utsname_release": "5.8.0-55-generic",
    "utsname_sysname": "Linux",
    "utsname_version": "#62~20.04.1-Ubuntu SMP Wed Jun 2 08:55:04 UTC 2021" 
}

This happened while we were upgrading from 16.2.6 to 16.2.7 with cephadm. While max_mds was being reduced to 1, rank 1 repeatedly crashed with this stack trace, and the upgrade could not proceed.

We have now paused the upgrade and reset max_mds to 2, which has at least stopped the crash loop.
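
For anyone hitting the same crash loop, the workaround described above could look roughly like the following. This is only a sketch, not the exact commands that were run: `FS_NAME` is a placeholder (the report does not state the file system name), and the `CEPH` wrapper variable and `workaround` function are hypothetical helpers added so the script prints the commands by default instead of touching a cluster.

```shell
#!/bin/sh
# Sketch only. FS_NAME is a placeholder; the report does not name the file system.
FS_NAME="${FS_NAME:-cephfs}"
# Dry run by default: commands are printed, not executed.
# Set CEPH=ceph to actually run them against a cluster.
CEPH="${CEPH:-echo ceph}"

workaround() {
    # Pause the cephadm upgrade, as the reporter did.
    $CEPH orch upgrade pause
    # Restore two active MDS ranks, as the reporter did.
    $CEPH fs set "$FS_NAME" max_mds 2
    # Inspect the MDS ranks afterwards (assumed follow-up, not from the report).
    $CEPH fs status "$FS_NAME"
}

workaround
```

Running it with `CEPH=ceph FS_NAME=<your fs>` executes the commands instead of printing them. Note that this only avoids the crash loop; it does not let the upgrade proceed.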


Files

mds.log.gz (334 KB): logs of crash after setting debug_mds to 1/20. 玮文 胡, 12/14/2021 01:57 AM

Related issues 3 (1 open, 2 closed)

Related to CephFS - Bug #62381: mds: Bug still exists: FAILED ceph_assert(dir->get_projected_version() == dir->get_version()) (In Progress, Venky Shankar)
Copied to CephFS - Backport #55928: quincy: mds: FAILED ceph_assert(dir->get_projected_version() == dir->get_version()) (Resolved, Xiubo Li)
Copied to CephFS - Backport #55929: pacific: mds: FAILED ceph_assert(dir->get_projected_version() == dir->get_version()) (Resolved, Xiubo Li)
