
Bug #21430

ceph-fuse blocked OSD op threads => OSD restart loop

Added by Martin Millnert over 6 years ago. Updated about 6 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: OSD
% Done: 0%
Source: Community (user)
Regression: No
Severity: 2 - major

Description

I can now reproduce OSD meltdowns fairly easily. They appear to be triggered by blocked OSD op threads, and all it takes is a single ceph-fuse client writing data (it was the only client with CephFS mounted).

Details:
Single MDS
Single MON
5 BlueStore OSD
CephFS EC base pool (no file layout tricks)
Version (all components): 12.2.0
Dist: Debian
Write operation: rsync of a maildir folder, files like "VMs/ncis.millnert.se/home/vmail/millnert.se/martin/.Archive.Chalmers.2008-Inbox/cur/1397939074.M5617P1519V000000000000FD06I00000000000E6107_2081.ncis,S=8596431:2,"

Once the writes block, I have sometimes been able to simply Ctrl+C the rsync and recover. Usually, though, I need to run systemctl restart on the blocked OSD. I have never needed to unmount and remount the file system.

Since this is a fairly straightforward setup and I can reproduce the problem easily, I figure we should be able to get to the core of the issue quickly?


Files

ceph-osd.4.log.gz (340 KB), added by Martin Millnert, 09/19/2017 09:01 AM
#1

Updated by Josh Durgin over 6 years ago

This sounds like the BlueStore AIO issue that will be fixed in 12.2.1. If this is a test cluster, you could try the current luminous branch.

#2

Updated by Patrick Donnelly about 6 years ago

  • Status changed from New to Closed

Closing this assuming 12.2.1 resolved the reporter's problem.
