Bug #44132

closed

mds: assertion failure due to blacklist

Added by Patrick Donnelly about 4 years ago. Updated almost 4 years ago.

Status: Resolved
Priority: High
Category: -
Target version:
% Done: 0%
Source: Q/A
Tags:
Backport: nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2020-02-13T19:22:22.879+0000 7f4e012d1700  1 mds.0.76 handle_mds_map state change up:standby-replay --> up:replay
2020-02-13T19:22:22.879+0000 7f4e012d1700  5 mds.beacon.c set_want_state: up:standby-replay -> up:replay
2020-02-13T19:22:22.879+0000 7f4e012d1700 10 mds.0.76 Monitor activated us! Deactivating replay loop
2020-02-13T19:22:23.043+0000 7f4e012d1700  1 -- [v2:172.21.15.14:6836/2101899460,v1:172.21.15.14:6837/2101899460] <== mon.2 v2:172.21.15.152:3301/0 33 ==== osd_map(84..84 src has 1..84) v4 ==== 366+0+0 (crc 0 0 0) 0x558a08d57640 con 0x558a08d7c800
2020-02-13T19:22:23.043+0000 7f4e012d1700  7 mds.0.server operator(): full = 0 epoch = 84
2020-02-13T19:22:23.043+0000 7f4e012d1700  4 mds.0.76 handle_osd_map epoch 84, 0 new blacklist entries
2020-02-13T19:22:23.043+0000 7f4e012d1700 10 mds.0.server apply_blacklist: killed 0
2020-02-13T19:22:23.043+0000 7f4e012d1700  1 -- [v2:172.21.15.14:6836/2101899460,v1:172.21.15.14:6837/2101899460] --> [v2:172.21.15.152:3301/0,v1:172.21.15.152:6790/0] -- mon_subscribe({osdmap=85}) v3 -- 0x558a09a42be0 con 0x558a08d7c800
2020-02-13T19:22:23.763+0000 7f4dfcac8700 10 mds.0.cache cache not ready for trimming
2020-02-13T19:22:23.763+0000 7f4dfcac8700 20 mds.0.cache upkeep thread waiting interval 1s
2020-02-13T19:22:23.835+0000 7f4dff2cd700 10 MDSContext::complete: N7MDSRank26C_MDS_StandbyReplayRestartE
2020-02-13T19:22:23.835+0000 7f4dff2cd700  5 mds.0.76 Restarting replay as standby-replay
2020-02-13T19:22:23.835+0000 7f4dff2cd700  1 -- [v2:172.21.15.14:6836/2101899460,v1:172.21.15.14:6837/2101899460] --> [v2:172.21.15.152:6824/13891,v1:172.21.15.152:6825/13891] -- osd_op(unknown.0.76:72 12.4 12:292cf221:::200.00000000:head [read 0~0] snapc 0=[] ondisk+read+known_if_redirected+full_force e84) v8 -- 0x558a09955e00 con 0x558a08d7dc00
2020-02-13T19:22:23.835+0000 7f4e03ad6700  1 -- [v2:172.21.15.14:6836/2101899460,v1:172.21.15.14:6837/2101899460] <== osd.6 v2:172.21.15.152:6824/13891 66 ==== osd_op_reply(72 200.00000000 [read 0~0] v0'0 uv0 ondisk = -108 ((108) Cannot send after transport endpoint shutdown)) v8 ==== 156+0+0 (crc 0 0 0) 0x558a08cd9840 con 0x558a08d7dc00
2020-02-13T19:22:23.835+0000 7f4dfaac4700 -1 /build/ceph-15.1.0-645-g4520659/src/osdc/Journaler.cc: In function 'void Journaler::_finish_reread_head_and_probe(int, C_OnFinisher*)' thread 7f4dfaac4700 time 2020-02-13T19:22:23.836161+0000
/build/ceph-15.1.0-645-g4520659/src/osdc/Journaler.cc: 420: FAILED ceph_assert(!r)

 ceph version 15.1.0-645-g4520659 (452065957480d8475dcb57dd5023eaca663a3bd5) octopus (rc)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x154) [0x7f4e083a5560]
 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x7f4e083a573b]
 3: (()+0x41edfa) [0x558a074c5dfa]
 4: (Context::complete(int)+0x9) [0x558a071be159]
 5: (Journaler::_finish_reread_head(int, ceph::buffer::v14_2_0::list&, Context*)+0x2ae) [0x558a074c332e]
 6: (Context::complete(int)+0x9) [0x558a071be159]
 7: (Finisher::finisher_thread_entry()+0x195) [0x7f4e083f94c5]
 8: (()+0x76db) [0x7f4e07cb76db]
 9: (clone()+0x3f) [0x7f4e06e9d88f]

From: /ceph/teuthology-archive/pdonnell-2020-02-13_18:27:35-fs-wip-pdonnell-testing-20200213.155332-distro-basic-smithi/4760812/remote/smithi014/log/ceph-mds.c.log.gz

PR being tested is unrelated.
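
When triaging a log like the one above, the useful signal is the failed OSD reply that arrives just before the assert: the read of the journal header object 200.00000000 returns -108, and Journaler::_finish_reread_head_and_probe then trips ceph_assert(!r). On Linux, errno 108 is ESHUTDOWN, which matches the "Cannot send after transport endpoint shutdown" text and the blacklist named in the title. The following is a minimal sketch for pulling such replies out of an MDS log. The helper name failed_osd_replies and the regular expression are hypothetical and derived only from the single osd_op_reply line in this report; other reply formats may not match.

```python
import re

# Assumed shape, taken from the one reply line quoted in this report:
#   osd_op_reply(<tid> <object> ... = <rc> ((<errno>) <message>)) v8
REPLY_RE = re.compile(
    r"osd_op_reply\((?P<tid>\d+) (?P<oid>\S+) .*? = (?P<rc>-?\d+)"
    r"(?: \(\(\d+\) (?P<msg>[^)]*)\)\))?"
)


def failed_osd_replies(lines):
    """Return (tid, object, rc, message) for every reply with a negative rc."""
    failures = []
    for line in lines:
        match = REPLY_RE.search(line)
        if match is None:
            continue
        rc = int(match.group("rc"))
        if rc < 0:
            failures.append(
                (int(match.group("tid")), match.group("oid"), rc, match.group("msg"))
            )
    return failures


if __name__ == "__main__":
    import sys

    # Usage: zcat ceph-mds.c.log.gz | python3 this_script.py
    for tid, oid, rc, msg in failed_osd_replies(sys.stdin):
        print(f"tid={tid} object={oid} rc={rc} msg={msg}")
```

Run against the log in this report, the sketch should flag the reply for tid 72 on 200.00000000, which is the error code that reaches the assertion.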


Related issues 1 (0 open, 1 closed)

Copied to CephFS - Backport #44483: nautilus: mds: assertion failure due to blacklist (Status: Resolved, Assignee: Wei-Chung Cheng)

#1

Updated by Patrick Donnelly about 4 years ago

  • Status changed from New to Triaged
  • Assignee set to Milind Changire

#2

Updated by Milind Changire about 4 years ago

  • Pull request ID set to 33662

#3

Updated by Patrick Donnelly about 4 years ago

  • Status changed from Triaged to Fix Under Review

#4

Updated by Patrick Donnelly about 4 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Target version changed from v16.0.0 to v15.0.0

#5

Updated by Nathan Cutler about 4 years ago

  • Copied to Backport #44483: nautilus: mds: assertion failure due to blacklist added

#6

Updated by Nathan Cutler about 4 years ago

  • Backport changed from octopus,nautilus to nautilus

#7

Updated by Nathan Cutler almost 4 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
