Bug #20687

osd: crashing on ec read partial failure

Added by Ashley Merrick almost 7 years ago. Updated over 6 years ago.

Status:
Resolved
Priority:
High
Assignee:
David Zafman
Category:
OSD
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Since upgrading to the latest RC, ceph version 12.1.1 (f3e663a190bf2ed12c7e3cda288b9a159572c800) luminous (rc), we are seeing multiple OSDs crash at a time. The crashes happen at random intervals, but seem to be aggravated during recovery and backfilling traffic.

Most OSDs are Filestore, with a couple converted to BlueStore; the issue affects both.

Errors seen during some of the crashes:

1)

172.16.3.10:6802/21760 --> 172.16.3.6:6808/15997 -- pg_update_log_missing(6.19ds12 epoch 101931/101928 rep_tid 59 entries 101931'55683 (0'0) error 6:b984d72a:::rbd_data.a1d870238e1f29.0000000000007c0b:head by client.30604127.0:31963 0.000000 2) v2 - 0x55bea0faefc0 con 0
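For context, pg_update_log_missing is the message a primary uses to replicate an error log entry (here marking the read of the rbd_data object as failed) to the other shards of the PG so their logs stay consistent. A minimal illustrative sketch of that propagation step; all type and function names below are invented for the example, not taken from Ceph's actual code:

    // Illustrative sketch only: the real pg_update_log_missing path lives in
    // Ceph's PrimaryLogPG/ECBackend code; every name below is hypothetical.
    #include <cstdint>
    #include <iostream>
    #include <string>
    #include <vector>

    struct ErrorLogEntry {
      std::string oid;   // object the failed read hit, e.g. rbd_data...:head
      uint64_t epoch;    // map epoch the error was observed in
      int rc;            // errno from the failed read
    };

    struct PeerShard {
      std::string addr;  // e.g. 172.16.3.6:6808/15997
    };

    // Stand-in for sending the update-log-missing message to one peer shard.
    void send_update_log_missing(const PeerShard& p, const ErrorLogEntry& e) {
      std::cout << "--> " << p.addr << " pg_update_log_missing " << e.oid
                << " epoch " << e.epoch << " rc " << e.rc << "\n";
    }

    // On a read error the primary records an error entry in the PG log and
    // pushes it to every other shard of the PG.
    void on_read_error(const std::vector<PeerShard>& peers,
                       const std::string& oid, uint64_t epoch, int rc) {
      ErrorLogEntry e{oid, epoch, rc};
      for (const auto& p : peers)
        send_update_log_missing(p, e);
    }

    int main() {
      on_read_error({{"172.16.3.6:6808/15997"}},
                    "rbd_data.a1d870238e1f29.0000000000007c0b:head",
                    101931, -5);
    }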

2)

log_channel(cluster) log [ERR] : 4.11c required past_interval bounds are empty [101500,100085) but past_intervals is not: ([90726,100084...0083] acting 28)

The above seems to be the same as http://tracker.ceph.com/issues/20167 (marked as Resolved in an earlier luminous dev release).
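As a rough illustration of why this is logged as an error: a bound like [101500,100085) with start >= end denotes an empty interval, and when the required past_interval bounds are empty, the stored past_intervals must be empty too. A hedged sketch of that invariant; this is a simplification, not Ceph's actual PG::check_past_interval_bounds code:

    #include <cassert>
    #include <cstdint>
    #include <utility>

    // [start, end) epoch interval; start >= end means the interval is empty.
    using Bounds = std::pair<uint64_t, uint64_t>;

    bool bounds_empty(const Bounds& b) { return b.first >= b.second; }

    // Invariant behind the [ERR] line above: when no past intervals are
    // required, none may be stored. PG 4.11c had empty required bounds
    // [101500,100085) but a non-empty past_intervals, violating this.
    void check_past_interval_bounds(const Bounds& required,
                                    bool stored_intervals_empty) {
      if (bounds_empty(required))
        assert(stored_intervals_empty);
    }

    int main() {
      // Consistent state: empty required bounds, nothing stored.
      check_past_interval_bounds({101500, 100085},
                                 /*stored_intervals_empty=*/true);
    }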

3)

2017-07-19 12:50:57.587194 7f19348f1700 -1 /build/ceph-12.1.1/src/osd/PrimaryLogPG.cc: In function 'virtual void C_CopyFrom_AsyncReadCb::finish(int)' thread 7f19348f1700 time 2017-07-19 12:50:57.583192
/build/ceph-12.1.1/src/osd/PrimaryLogPG.cc: 7585: FAILED assert(len <= reply_obj.data.length())
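The assert reads as: the number of bytes the copy-from callback expects to hand back (len) must not exceed the bytes actually present in the reply buffer. On a partial EC read failure the buffer comes back short, so the condition fails and the OSD aborts. A minimal stand-alone sketch of that failing condition; the real C_CopyFrom_AsyncReadCb operates on Ceph bufferlists, and everything here is simplified:

    #include <cassert>
    #include <cstdint>
    #include <string>

    struct ReplyObj {
      std::string data;  // stand-in for the bufferlist filled by the read
    };

    // Mirrors the failing check: the callback trusts that the backend
    // returned at least `len` bytes; a partial EC read returns fewer.
    void finish_copy_read(uint64_t len, const ReplyObj& reply_obj) {
      assert(len <= reply_obj.data.length());  // analogue of PrimaryLogPG.cc:7585
      // ... continue assembling the copy-get reply from reply_obj.data ...
    }

    int main() {
      ReplyObj full{std::string(4096, 'x')};
      finish_copy_read(4096, full);     // ok: read returned everything
      ReplyObj partial{std::string(1024, 'x')};
      finish_copy_read(4096, partial);  // short read: assert fires, OSD dies
    }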


Related issues: 1 (0 open, 1 closed)

Related to Ceph - Bug #20167: osd/PG.cc: 806: FAILED assert(past_intervals.empty()) (Resolved, 06/02/2017)
