Bug #23848

closed

mds: stuck shutdown procedure

Added by Patrick Donnelly about 6 years ago. Updated about 6 years ago.

Status:
Rejected
Priority:
High
Assignee:
-
Category:
-
Target version:
% Done:

0%

Source:
Development
Tags:
Backport:
luminous
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
multimds
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The following output repeats in an infinite loop:

2018-04-25 03:59:12.370831 7f5fe3724700 20 mds.beacon.li569-18 0 slow request found
2018-04-25 03:59:13.863322 7f5fe2f23700 10 mds.beacon.li569-18 _send up:stopping seq 6125
2018-04-25 03:59:13.865970 7f5fe5f29700 10 mds.beacon.li569-18 handle_mds_beacon up:stopping seq 6125 rtt 0.002612
2018-04-25 03:59:17.370052 7f5fe3724700  7 mds.1.cache trim bytes_used=9kB limit=4GB reservation=0.05% count=0
2018-04-25 03:59:17.370120 7f5fe3724700  7 mds.1.cache trim_lru trimming 0 items from LRU size=2 mid=0 pintail=2 pinned=2
2018-04-25 03:59:17.370135 7f5fe3724700  7 mds.1.cache trim_lru trimmed 0 items
2018-04-25 03:59:17.370144 7f5fe3724700 20 mds.1.cache trim: maybe trimming base: [inode 0x1 [...2,head] / rep@0.1 v98 snaprealm=0x55bb5134a780 f(v0 m2018-04-25 03:45:17.892060 4=0+4) n(v15 rc2018-04-25 03:55:44.238018 b1629718024 56859=56846+13)/n(v0 1=0+1) (inest mix) (iversion lock) | dirfrag=1 dirwaiter=0 discoverbase=0 0x55bb50687500]
2018-04-25 03:59:17.370179 7f5fe3724700 10 mds.1.cache trim_client_leases
2018-04-25 03:59:17.370737 7f5fe3724700  2 mds.1.cache check_memory_usage total 415688, rss 34236, heap 313964, baseline 313964, buffers 0, 0 / 4 inodes have caps, 0 caps, 0 caps per inode
2018-04-25 03:59:17.370763 7f5fe3724700 10 mds.1.log trim 1 / 30 segments, 1 / -1 events, 0 (0) expiring, 0 (0) expired
2018-04-25 03:59:17.370770 7f5fe3724700 10 mds.1.log _trim_expired_segments waiting for 1/4194304 to expire
2018-04-25 03:59:17.370778 7f5fe3724700 10 mds.1.server find_idle_sessions.  laggy until 0.000000
2018-04-25 03:59:17.370784 7f5fe3724700 10 mds.1.locker scatter_tick
2018-04-25 03:59:17.370787 7f5fe3724700 20 mds.1.locker caps_tick 0 revoking caps
2018-04-25 03:59:17.370791 7f5fe3724700 20 mds.1.1445 updating export targets, currently 0 ranks are targets
2018-04-25 03:59:17.370794 7f5fe3724700 10 mds.1.log trim 1 / 30 segments, 1 / -1 events, 0 (0) expiring, 0 (0) expired
2018-04-25 03:59:17.370797 7f5fe3724700 10 mds.1.log _trim_expired_segments waiting for 1/4194304 to expire
2018-04-25 03:59:17.370800 7f5fe3724700  7 mds.1.cache shutdown_pass
2018-04-25 03:59:17.370803 7f5fe3724700 10 mds.1.cache shutdown_export_strays
2018-04-25 03:59:17.370808 7f5fe3724700  7 mds.1.cache trim bytes_used=9kB limit=4GB reservation=0.05% count=18446744073709551615
2018-04-25 03:59:17.370820 7f5fe3724700  7 mds.1.cache trim_lru trimming 18446744073709551615 items from LRU size=2 mid=0 pintail=2 pinned=2
2018-04-25 03:59:17.370831 7f5fe3724700  7 mds.1.cache trim_lru trimmed 0 items
2018-04-25 03:59:17.370835 7f5fe3724700 20 mds.1.cache trim: maybe trimming base: [inode 0x1 [...2,head] / rep@0.1 v98 snaprealm=0x55bb5134a780 f(v0 m2018-04-25 03:45:17.892060 4=0+4) n(v15 rc2018-04-25 03:55:44.238018 b1629718024 56859=56846+13)/n(v0 1=0+1) (inest mix) (iversion lock) | dirfrag=1 dirwaiter=0 discoverbase=0 0x55bb50687500]
2018-04-25 03:59:17.370850 7f5fe3724700  5 mds.1.cache lru size now 2/0
2018-04-25 03:59:17.370853 7f5fe3724700  7 mds.1.cache looking for subtrees to export to mds0
2018-04-25 03:59:17.370856 7f5fe3724700  7 mds.1.cache still have 2 subtrees
2018-04-25 03:59:17.370859 7f5fe3724700 15 mds.1.cache show_subtrees
2018-04-25 03:59:17.370867 7f5fe3724700 10 mds.1.cache |__ 1    auth [dir 0x101 ~mds1/ [2,head] auth v=1 cv=0/0 dir_auth=1 state=1073741824 f(v0 10=0+10) n(v0 10=0+10) hs=0+0,ss=0+0 | child=0 subtree=1 subtreetemp=0 waiter=0 authpin=0 0x55bb5079e000]
2018-04-25 03:59:17.370887 7f5fe3724700 10 mds.1.cache |__ 0     rep [dir 0x1 / [2,head] rep@0.1 REP dir_auth=0 state=0 f(v0 m2018-04-25 03:45:17.892060 4=0+4) n(v13 rc2018-04-25 03:55:40.094123 b1579203286 54824=54812+12)/n(v13 rc2018-04-25 03:55:39.878990 b1573821568 54445=54433+12) hs=2+0,ss=0+0 | dnwaiter=0 child=1 subtree=1 0x55bb5079ea80]
2018-04-25 03:59:17.370905 7f5fe3724700 10 mds.1.migrator show_importing
2018-04-25 03:59:17.370909 7f5fe3724700 10 mds.1.migrator  importing from 0: (2) discovered 0x100000003e9
2018-04-25 03:59:17.370916 7f5fe3724700 10 mds.1.migrator  importing from 0: (2) discovered 0x10000000bbb
2018-04-25 03:59:17.370919 7f5fe3724700 10 mds.1.migrator show_exporting
2018-04-25 03:59:17.370921 7f5fe3724700  7 mds.1.1445 shutdown_pass=false

While trying to reproduce an unrelated issue with 2 active MDS daemons and some client I/O (crefi), the loop above occurred while deactivating rank 1.
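One detail worth noting in the log: the trim count 18446744073709551615 is 2^64 - 1, i.e. a count of -1 stored in an unsigned 64-bit integer, which presumably serves as a "trim everything" sentinel during shutdown. A minimal sketch of that wraparound:

```python
import ctypes

# A count of -1 stored in an unsigned 64-bit integer wraps to 2**64 - 1,
# matching the huge "count" value in the trim log lines above.
sentinel = ctypes.c_uint64(-1).value
print(sentinel)  # 18446744073709551615
assert sentinel == 2**64 - 1
```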
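Reading the log, rank 1's tick keeps re-running shutdown_pass, which returns false as long as extra subtrees remain and imports from rank 0 are stuck in the "discovered" state; if those imports never complete, the stopping rank spins forever. A schematic of that control flow (hypothetical names and structure, not the actual MDCache code):

```python
class Cache:
    """Toy stand-in for the MDS cache state visible in the log."""
    def __init__(self, subtrees, importing):
        self.subtrees = subtrees    # subtrees still held by this rank
        self.importing = importing  # in-flight imports (by inode number)

    def trim(self, count):
        pass  # trimming is a no-op here; nothing is trimmable in the log


def shutdown_pass(cache):
    """Schematic of the stuck up:stopping loop.

    Hypothetical model: a pass succeeds only once all non-base
    subtrees are exported and no imports remain in flight.
    """
    cache.trim(count=-1)  # "trim everything"; appears as 2**64-1 in the log
    if cache.subtrees > 1 or cache.importing:
        return False      # tick will call shutdown_pass again
    return True


# Matches the log: 2 subtrees remain and 2 imports are stuck in the
# "discovered" state, so every tick logs shutdown_pass=false.
stuck = Cache(subtrees=2, importing=["0x100000003e9", "0x10000000bbb"])
print(shutdown_pass(stuck))  # False
```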

