Bug #37543
mds: purge queue recovery hangs during boot if PQ journal is damaged
Status: Resolved
Priority: High
Assignee:
Category: Correctness/Safety
Target version:
% Done: 0%
Source: Q/A
Tags:
Backport: mimic,luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Failure: Test failure: test_object_deletion (tasks.cephfs.test_damage.TestDamage)
2 jobs: ['3291762', '3291647']
suites intersection: ['clusters/1-mds-4-client-coloc.yaml', 'conf/{client.yaml', 'fs/basic_functional/{begin.yaml', 'mds.yaml', 'mon.yaml', 'mount/fuse.yaml', 'no_client_pidfile.yaml', 'osd.yaml}', 'overrides/{frag_enable.yaml', 'tasks/damage.yaml}', 'whitelist_health.yaml', 'whitelist_wrongly_marked_down.yaml}']
suites union: ['clusters/1-mds-4-client-coloc.yaml', 'conf/{client.yaml', 'fs/basic_functional/{begin.yaml', 'mds.yaml', 'mon.yaml', 'mount/fuse.yaml', 'no_client_pidfile.yaml', 'objectstore/bluestore-ec-root.yaml', 'objectstore/bluestore.yaml', 'osd.yaml}', 'overrides/{frag_enable.yaml', 'supported-random-distros$/{ubuntu_16.04.yaml}', 'supported-random-distros$/{ubuntu_latest.yaml}', 'tasks/damage.yaml}', 'whitelist_health.yaml', 'whitelist_wrongly_marked_down.yaml}']
e.g.:
2018-11-29 11:59:29.590 7fc726f32700 -1 mds.0.purge_queue operator(): Error -22 loading Journaler
2018-11-29 11:59:29.590 7fc726f32700 -1 mds.0.139 unhandled write error (22) Invalid argument, force readonly...
2018-11-29 11:59:29.590 7fc726f32700 1 mds.0.cache force file system read-only
2018-11-29 11:59:29.590 7fc726f32700 0 log_channel(cluster) log [WRN] : force file system read-only
...
2018-11-29 11:59:29.594 7fc725f30700 2 mds.0.139 boot_start 2: replaying mds log
2018-11-29 11:59:29.594 7fc725f30700 2 mds.0.139 boot_start 2: waiting for purge queue recovered
From: /ceph/teuthology-archive/pdonnell-2018-11-29_06:44:45-fs-wip-pdonnell-testing-20181129.042324-distro-basic-smithi/3291762/remote/smithi061/log/ceph-mds.a-s.log.gz
The purge queue never recovers, so the MDS sits in up:replay.
This was seen while testing https://github.com/ceph/ceph/pull/25270. I will proceed with merging #25270, since an MDS sitting in up:replay is not much different from a damaged rank from a user's perspective. This still needs to be fixed.
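The hang described above can be modeled with a small sketch. Everything in it is hypothetical: PurgeQueueSketch, wait_for_recovery, on_journal_recovered, boot and the fixed flag are illustrative names, not the real Ceph PurgeQueue API, and the sketch does not claim to reproduce the change merged in PR 25621. It assumes boot registers a callback that is only invoked when purge queue recovery completes. In the log above, the Journaler error (11:59:29.590) is logged before boot_start begins waiting (11:59:29.594), so the sketch has to answer a waiter registered after the failure as well as one registered before it.

```python
import errno
from typing import Callable, List

Callback = Callable[[int], None]


class PurgeQueueSketch:
    """Hypothetical model of a purge queue that boot waits on (not Ceph code)."""

    def __init__(self, fixed: bool) -> None:
        self.fixed = fixed              # False models the hang, True a repair
        self.recovered = False
        self.readonly = False
        self.error = 0
        self.waiting_for_recovery: List[Callback] = []

    def wait_for_recovery(self, cb: Callback) -> None:
        if self.recovered:
            cb(0)                       # recovery already succeeded
        elif self.fixed and self.readonly:
            cb(self.error)              # recovery already failed: report it now
        else:
            self.waiting_for_recovery.append(cb)  # park until recovery finishes

    def _finish_waiters(self, r: int) -> None:
        waiters, self.waiting_for_recovery = self.waiting_for_recovery, []
        for cb in waiters:
            cb(r)

    def on_journal_recovered(self, r: int) -> None:
        """Completion of the hypothetical journal-load step (r is 0 or -errno)."""
        if r == 0:
            self.recovered = True
            self._finish_waiters(0)
            return
        # Error path, e.g. -EINVAL (-22) from a damaged purge queue journal.
        self.readonly = True
        self.error = r
        if self.fixed:
            self._finish_waiters(r)     # wake existing waiters with the error
        # With fixed=False nobody is woken, and later waiters are parked forever.


def boot(pq: PurgeQueueSketch) -> List[int]:
    """Hypothetical boot step: returns the results delivered to its waiter so far."""
    results: List[int] = []
    pq.wait_for_recovery(results.append)
    return results


if __name__ == "__main__":
    for fixed in (False, True):
        pq = PurgeQueueSketch(fixed=fixed)
        pq.on_journal_recovered(-errno.EINVAL)  # journal damaged, as in the log
        print(f"fixed={fixed}: waiter got {boot(pq)}")
```

Running it prints `fixed=False: waiter got []` and then `fixed=True: waiter got [-22]`. In the first case the boot waiter is never called, which mirrors the MDS staying in up:replay; in the second it receives the error code, so the caller can act on the failure instead of waiting indefinitely.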
Updated by Patrick Donnelly over 5 years ago
- Related to Bug #37394: mds: PurgeQueue write error handler does not handle EBLACKLISTED added
Updated by Patrick Donnelly over 5 years ago
- Status changed from New to Fix Under Review
- Pull request ID set to 25621
Updated by Patrick Donnelly over 5 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Nathan Cutler over 5 years ago
- Copied to Backport #37898: mimic: mds: purge queue recovery hangs during boot if PQ journal is damaged added
Updated by Nathan Cutler over 5 years ago
- Copied to Backport #37899: luminous: mds: purge queue recovery hangs during boot if PQ journal is damaged added
Updated by Patrick Donnelly over 5 years ago
- Related to Bug #37944: qa: test_damage needs to silence MDS_READ_ONLY added
Updated by Patrick Donnelly about 5 years ago
- Status changed from Pending Backport to Resolved