Bug #53645

MDCache::shutdown_pass: ceph_assert(!migrator->is_importing())

Added by Dan van der Ster about 1 year ago. Updated about 1 year ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I'm running a pinning/multimds thrash test (see the attached stressfs.sh) on a 3-node test cluster and occasionally see this crash while stopping:

   -10> 2021-12-16 15:50:15.760 7f4ab1070700  7 mds.2.cache shutdown_pass
    -9> 2021-12-16 15:50:15.760 7f4ab1070700 10 mds.2.cache shutdown_export_strays 0x61d ''
    -8> 2021-12-16 15:50:15.760 7f4ab1070700  7 mds.2.cache trim bytes_used=41kB limit=2GB reservation=0.05% count=18446744073709551615
    -7> 2021-12-16 15:50:15.760 7f4ab1070700  7 mds.2.cache trim_lru trimming 18446744073709551615 items from LRU size=10 mid=0 pintail=10 pinned=10
    -6> 2021-12-16 15:50:15.760 7f4ab1070700  7 mds.2.cache trim_lru trimmed 0 items
    -5> 2021-12-16 15:50:15.760 7f4ab1070700  5 mds.2.cache lru size now 10/0
    -4> 2021-12-16 15:50:15.760 7f4ab1070700  7 mds.2.cache looking for subtrees to export to mds0
    -3> 2021-12-16 15:50:15.760 7f4ab1070700 10 mds.2.log trim_all: 1/0/0
    -2> 2021-12-16 15:50:15.760 7f4ab1070700 10 mds.2.log _trim_expired_segments waiting for 20758/71894891982 to expire
    -1> 2021-12-16 15:50:15.761 7f4ab1070700 -1 /builddir/build/BUILD/ceph-14.2.22/src/mds/MDCache.cc: In function 'bool MDCache::shutdown_pass()' thread 7f4ab1070700 time 2021-12-16 15:50:15.760517
/builddir/build/BUILD/ceph-14.2.22/src/mds/MDCache.cc: 7786: FAILED ceph_assert(!migrator->is_importing())

 ceph version 14.2.22 (ca74598065096e6fcbd8433c8779a2be0c889351) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x156) [0x7f4abcc3282a]
 2: (()+0x274a44) [0x7f4abcc32a44]
 3: (MDCache::shutdown_pass()+0x13a7) [0x55d286352e87]
 4: (MDSRankDispatcher::tick()+0x2a8) [0x55d2862375c8]
 5: (FunctionContext::finish(int)+0x30) [0x55d2862212e0]
 6: (Context::complete(int)+0xd) [0x55d28621f46d]
 7: (SafeTimer::timer_thread()+0x19c) [0x7f4abcd0e76c]
 8: (SafeTimerThread::entry()+0x11) [0x7f4abcd10171]
 9: (()+0x817a) [0x7f4abaa1217a]
 10: (clone()+0x43) [0x7f4ab9466d83]

mds log is at ceph-post-file: 358f9d7b-5953-4a7d-818a-092d1a645b3c

There have been some changes in shutdown_pass related to ephemeral pinning, so we'll try this on a pacific test cluster next to see if it is still a bug.

stressfs.sh (1.62 KB) Dan van der Ster, 12/16/2021 04:32 PM

History

#1 Updated by Venky Shankar about 1 year ago

Dan, thanks for the report. Please let us know if you hit this in pacific.

#2 Updated by Dan van der Ster about 1 year ago

  • Affected Versions v15.2.15 added

Still present in octopus:

2022-01-13T16:35:30.607+0100 7f1c07330700  2 mds.2.cache Memory usage:  total 2930892, rss 208844, heap 323792, baseline 323792, 0 / 12 inodes have caps, 0 caps, 0 caps per inode
2022-01-13T16:35:30.984+0100 7f1c09b35700 -1 /builddir/build/BUILD/ceph-15.2.15/src/mds/MDCache.cc: In function 'bool MDCache::shutdown_pass()' thread 7f1c09b35700 time 2022-01-13T16:35:30.976714+0100
/builddir/build/BUILD/ceph-15.2.15/src/mds/MDCache.cc: 7922: FAILED ceph_assert(!migrator->is_importing())

 ceph version 15.2.15-2 (2dfb18841cfecc2f7eb7eb2afd65986ca4d95985) octopus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7f1c13515ee0]
 2: (()+0x27e0fa) [0x7f1c135160fa]
 3: (MDCache::shutdown_pass()+0x1771) [0x56402b685f81]
 4: (MDSRankDispatcher::tick()+0x298) [0x56402b5670b8]
 5: (Context::complete(int)+0xd) [0x56402b55015d]
 6: (SafeTimer::timer_thread()+0x1b7) [0x7f1c135f0d57]
 7: (SafeTimerThread::entry()+0x11) [0x7f1c135f2331]
 8: (()+0x817f) [0x7f1c1233117f]
 9: (clone()+0x43) [0x7f1c10d85d83]
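
Both reports fail on the same expression but at different source lines (7786 in 14.2.22, 7922 in 15.2.15). For comparing MDS logs across versions, something like the following might help. It is only a sketch: the function name `find_failed_asserts` is hypothetical, and the regex assumes the `path: line: FAILED ceph_assert(expr)` layout seen in the two logs above, which may not hold for every release.

```python
import re

# Assumed layout, taken from the log excerpts in this report:
#   /path/to/MDCache.cc: 7786: FAILED ceph_assert(!migrator->is_importing())
ASSERT_RE = re.compile(
    r"^(?P<path>\S+): (?P<line>\d+): FAILED ceph_assert\((?P<expr>.*)\)\s*$"
)


def find_failed_asserts(log_text):
    """Return (source_path, line_number, expression) for each failed assert line."""
    hits = []
    for raw in log_text.splitlines():
        m = ASSERT_RE.match(raw.strip())
        if m:
            hits.append((m.group("path"), int(m.group("line")), m.group("expr")))
    return hits


# The two assert lines quoted in this report.
nautilus = (
    "/builddir/build/BUILD/ceph-14.2.22/src/mds/MDCache.cc: 7786: "
    "FAILED ceph_assert(!migrator->is_importing())"
)
octopus = (
    "/builddir/build/BUILD/ceph-15.2.15/src/mds/MDCache.cc: 7922: "
    "FAILED ceph_assert(!migrator->is_importing())"
)

for path, line, expr in find_failed_asserts(nautilus + "\n" + octopus):
    print(path, line, expr)
```

Grouping the results by `expr` rather than by line number keeps the same assertion together even when the surrounding code has shifted between releases.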
