Bug #18162 (closed)

osd/ReplicatedPG.cc: recover_replicas: object added to missing set for backfill, but is not in recovering, error!

Added by Aaron T over 7 years ago. Updated over 6 years ago.

Status: Resolved
Priority: High
Assignee: David Zafman
Category: EC Pools
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport: jewel, luminous
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I encountered the bug described in #13937. I wanted to help test PR 12088, and may have encountered an unrelated bug as a result.

     0> 2016-12-06 14:33:35.773259 7f3357278700 -1 osd/ReplicatedPG.cc: In function 'int ReplicatedPG::recover_replicas(int, ThreadPool::TPHandle&)' thread 7f3357278700 time 2016-12-06 14:33:35.758593
osd/ReplicatedPG.cc: 10740: FAILED assert(0)

 ceph version 10.2.3-366-g289696d (289696d533038c2248c1fe0c8ee03adad343cfa9)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x80) [0x5619aedc4af0]
 2: (ReplicatedPG::recover_replicas(int, ThreadPool::TPHandle&)+0xa3f) [0x5619ae87843f]
 3: (ReplicatedPG::start_recovery_ops(int, ThreadPool::TPHandle&, int*)+0xc2e) [0x5619ae87ffee]
 4: (OSD::do_recovery(PG*, ThreadPool::TPHandle&)+0x372) [0x5619ae6f3d72]
 5: (ThreadPool::WorkQueue<PG>::_void_process(void*, ThreadPool::TPHandle&)+0x20) [0x5619ae742090]
 6: (ThreadPool::worker(ThreadPool::WorkThread*)+0xdb1) [0x5619aedb6cc1]
 7: (ThreadPool::WorkThread::entry()+0x10) [0x5619aedb7dc0]
 8: (()+0x770a) [0x7f33819c970a]
 9: (clone()+0x6d) [0x7f337fa4282d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
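
For context, the failed assert corresponds to a sanity check in ReplicatedPG::recover_replicas: when a replica's missing object lies beyond that replica's backfill horizon (last_backfill), the primary expects the object to already be tracked in its set of in-flight recovery operations, and aborts otherwise. Below is a minimal, self-contained sketch of that invariant, not the actual Ceph source; the types and names (PeerInfo, recover_replicas_check, the string stand-in for hobject_t) are illustrative assumptions reconstructed only from the assert message and stack trace above.

    // Simplified model of the failing invariant in ReplicatedPG::recover_replicas
    // (jewel era). All types below are illustrative stand-ins, not Ceph's real
    // definitions; only the shape of the check is inferred from the assert.
    #include <cassert>
    #include <iostream>
    #include <map>
    #include <set>
    #include <string>

    using hobject_t = std::string;  // stand-in for Ceph's object identifier

    struct PeerInfo {
      hobject_t last_backfill;      // backfill progress marker for this peer
      std::set<hobject_t> missing;  // objects this peer is missing
    };

    // Objects the primary currently has recovery ops in flight for.
    std::map<hobject_t, int> recovering;

    void recover_replicas_check(const PeerInfo& peer) {
      for (const hobject_t& soid : peer.missing) {
        if (soid > peer.last_backfill) {
          // The object is past the peer's backfill horizon, so it should only
          // appear in the missing set because a recovery op is already queued.
          if (recovering.count(soid) == 0) {
            std::cerr << "recover_replicas: object added to missing set for "
                         "backfill, but is not in recovering, error!\n";
            assert(0);  // the abort reported in this ticket fires here
          }
          continue;  // already being recovered; nothing more to do
        }
        // ... normal replica-recovery path for objects within last_backfill ...
      }
    }

    int main() {
      // "z" sorts after last_backfill "m" and nothing is in `recovering`,
      // so this models the state that triggers the abort.
      PeerInfo peer{/*last_backfill=*/"m", /*missing=*/{"z"}};
      recover_replicas_check(peer);
    }

In other words, the crash indicates the primary found a backfill-tracked missing object with no corresponding in-flight recovery op, a state the code treats as impossible.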

After applying the PR at https://github.com/ceph/ceph/pull/12088, I built ceph version 10.2.3-366-g289696d (289696d533038c2248c1fe0c8ee03adad343cfa9) on both Ubuntu 14.04 and 16.04 using the steps at http://docs.ceph.com/docs/jewel/install/build-ceph/.

I then started the OSDs which had been marked "out", as per the discussion related to #13937. Fairly shortly thereafter, the same OSDs that had been crashing on 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b) started crashing again, but with a new error. I am attaching the entire log from one such OSD below, captured with debug settings at 0/20.

As usual, please let me know what other information I can provide or tests I can run to help troubleshoot :)


Files

ceph-osd.3.log.bz2 (442 KB), Aaron T, 12/07/2016 12:14 AM
ec-handle-error-create-loc-list.patch (5.17 KB), Alexandre Oliva, 01/03/2017 01:34 AM
ec-handle-error-in-backfill-read.patch (10.8 KB), Alexandre Oliva, 01/06/2017 02:24 AM
adjust.patch (542 Bytes), Alexandre Oliva, 01/13/2017 10:11 AM
retrying-while-recovering.patch (1017 Bytes), Alexandre Oliva, 01/22/2017 02:30 AM

Related issues 3 (0 open, 3 closed)

Related to RADOS - Bug #18178: Unfound objects lost after OSD daemons restarted (Won't Fix, David Zafman, 12/07/2016)
Copied to RADOS - Backport #22013: jewel: osd/ReplicatedPG.cc: recover_replicas: object added to missing set for backfill, but is not in recovering, error! (Resolved, David Zafman)
Copied to RADOS - Backport #22069: luminous: osd/ReplicatedPG.cc: recover_replicas: object added to missing set for backfill, but is not in recovering, error! (Resolved, David Zafman)
