Bug #17781

closed

All OSDs restart randomly on "hit suicide timeout" when scrubbing is activated

Added by Yoann Moulin over 7 years ago. Updated over 7 years ago.

Status: Duplicate
Priority: Urgent
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Source:
Tags: jewel
Backport:
Regression: No
Severity: 1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hello,

On my Ceph cluster running Jewel 10.2.2, all OSDs die randomly by hitting the suicide timeout as soon as scrubbing is enabled.

This behavior appeared a few minutes after I started pushing 30 TB of data to an S3 bucket on an EC 8+2 pool. Previously, I had pushed 4 TB to that bucket without any issue.

Here is the ceph-post-file ID for the logs: c86638df-a297-4f58-a337-0e570d4b8702

List of files:

cephprod_20161015_nodebug.log
cephprod_20161025_debug.log
cephprod-osd.0_20161025_debug.log
cephprod-osd.107_20161015_nodebug.log
cephprod-osd.131_20161015_nodebug.log
cephprod-osd.136_20161015_nodebug.log
cephprod-osd.24_20161015_nodebug.log
cephprod-osd.27_20161015_nodebug.log
cephprod-osd.37_20161015_nodebug.log
cephprod-osd.46_20161015_nodebug.log
cephprod-osd.64_20161015_nodebug.log
cephprod-osd.86_20161015_nodebug.log
cephprod-osd.90_20161025_debug.log
cephprod-osd.93_20161025_debug.log
cephprod-osd.95_20161015_nodebug.log
report.log

tag 20161015_nodebug: logs captured when the behavior started, with debug disabled
tag 20161025_debug: logs captured with debug enabled when I re-enabled scrubbing
report.log: some information about the cluster

My previous mail about this on the ceph-users list: https://www.mail-archive.com/ceph-users@lists.ceph.com/msg33179.html

I can reproduce the behavior and collect more logs if needed. I just need to run "ceph osd set noscrub", and within 1 minute the Ceph status switches to HEALTH_ERR.

Thanks for your help,

Yoann


Related issues: 1 (0 open, 1 closed)

Is duplicate of Ceph - Bug #17859: filestore: can get stuck in an unbounded loop during scrub (Resolved, 11/10/2016)
