Bug #1727

osd: failed assert(pending_ops > 0) in dequeue_op

Added by Sage Weil over 9 years ago. Updated over 9 years ago.

Status: Resolved
Priority: High
Assignee:
Category: OSD
Target version:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

from ml:

From:     Martin Mailand <martin@tuxadero.com>
To:     ceph-devel@vger.kernel.org
Subject:     osd/OSD.cc: 5534: FAILED assert(pending_ops > 0)
Date:     Mon, 14 Nov 2011 15:04:34 +0100 (11/14/2011 06:04:34 AM)

Hi,
today one of my OSDs died; the log is:

osd/OSD.cc: In function 'void OSD::dequeue_op(PG*)', in thread '7faeb6139700'
osd/OSD.cc: 5534: FAILED assert(pending_ops > 0)
  ceph version 0.38 (commit:b600ec2ac7c0f2e508720f8e8bb87c3db15509b9)
  1: (OSD::dequeue_op(PG*)+0x4bb) [0x55a4db]
  2: (ThreadPool::worker()+0x6e6) [0x5b7b16]
  3: (ThreadPool::WorkThread::entry()+0xd) [0x57398d]
  4: (()+0x6d8c) [0x7faec4d12d8c]
  5: (clone()+0x6d) [0x7faec355404d]
  ceph version 0.38 (commit:b600ec2ac7c0f2e508720f8e8bb87c3db15509b9)
  1: (OSD::dequeue_op(PG*)+0x4bb) [0x55a4db]
  2: (ThreadPool::worker()+0x6e6) [0x5b7b16]
  3: (ThreadPool::WorkThread::entry()+0xd) [0x57398d]
  4: (()+0x6d8c) [0x7faec4d12d8c]
  5: (clone()+0x6d) [0x7faec355404d]
*** Caught signal (Aborted) **
  in thread 7faeb6139700
  ceph version 0.38 (commit:b600ec2ac7c0f2e508720f8e8bb87c3db15509b9)
  1: /usr/bin/ceph-osd() [0x5b8b52]
  2: (()+0xfc60) [0x7faec4d1bc60]
  3: (gsignal()+0x35) [0x7faec34a1d05]
  4: (abort()+0x186) [0x7faec34a5ab6]
  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7faec3d586dd]
  6: (()+0xb9926) [0x7faec3d56926]
  7: (()+0xb9953) [0x7faec3d56953]
  8: (()+0xb9a5e) [0x7faec3d56a5e]
  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x396) [0x5bddb6]
  10: (OSD::dequeue_op(PG*)+0x4bb) [0x55a4db]
  11: (ThreadPool::worker()+0x6e6) [0x5b7b16]
  12: (ThreadPool::WorkThread::entry()+0xd) [0x57398d]
  13: (()+0x6d8c) [0x7faec4d12d8c]
  14: (clone()+0x6d) [0x7faec355404d]

Associated revisions

Revision b47347bd (diff)
Added by Sage Weil over 9 years ago

osd: protect handle_osd_map requeueing with queue lock

pending_ops was protected by osd_lock, but it tracks something in the
queue, which has its own lock. Messy. Also, useless, since
wait_for_no_ops had a single caller in shutdown() that op_wq.drain() can
do for us.

Rip it out, and track queue size under the queue lock.

Fixes: #1727
Signed-off-by: Sage Weil <>
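
The idea in the commit message (keep the bookkeeping under the same lock as the queue, and let a drain call replace a separate wait-for-no-ops counter) can be sketched roughly as below. This is an illustrative sketch only, not the code from revision b47347bd: `OpQueue`, `in_flight_`, `finish()` and `run_demo()` are hypothetical names, and the standard library primitives stand in for Ceph's own ThreadPool and work queue classes.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for an op work queue. All state that describes
// the queue (its items and the number of ops being processed) is guarded
// by the queue's own lock, never by an unrelated outer lock.
class OpQueue {
public:
  void enqueue(int op) {
    std::lock_guard<std::mutex> l(lock_);
    items_.push_back(op);
    cond_.notify_all();
  }

  // Worker side: block until an op is available, or return false once
  // stop() has been called and the queue is empty.
  bool dequeue(int& op) {
    std::unique_lock<std::mutex> l(lock_);
    cond_.wait(l, [this] { return stopping_ || !items_.empty(); });
    if (items_.empty())
      return false;
    op = items_.front();
    items_.pop_front();
    ++in_flight_;  // updated under the same lock as the pop
    return true;
  }

  // Worker side: mark the op taken by dequeue() as finished.
  void finish() {
    std::lock_guard<std::mutex> l(lock_);
    assert(in_flight_ > 0);  // cannot race: same lock as the increment
    --in_flight_;
    cond_.notify_all();
  }

  // Block until nothing is queued and nothing is being processed.
  void drain() {
    std::unique_lock<std::mutex> l(lock_);
    cond_.wait(l, [this] { return items_.empty() && in_flight_ == 0; });
  }

  // Ask idle workers to exit.
  void stop() {
    std::lock_guard<std::mutex> l(lock_);
    stopping_ = true;
    cond_.notify_all();
  }

private:
  std::mutex lock_;
  std::condition_variable cond_;
  std::deque<int> items_;
  std::size_t in_flight_ = 0;
  bool stopping_ = false;
};

// Push n_ops through n_workers threads, drain, and return how many ops
// had been processed by the time drain() returned.
int run_demo(int n_ops, int n_workers) {
  OpQueue q;
  std::atomic<int> processed{0};
  std::vector<std::thread> workers;
  for (int i = 0; i < n_workers; ++i) {
    workers.emplace_back([&q, &processed] {
      int op;
      while (q.dequeue(op)) {
        ++processed;
        q.finish();
      }
    });
  }
  for (int i = 0; i < n_ops; ++i)
    q.enqueue(i);
  q.drain();
  int done = processed.load();
  q.stop();
  for (auto& t : workers)
    t.join();
  return done;
}
```

In this sketch a shutdown path only needs `drain()` followed by `stop()`; for example, `run_demo(1000, 4)` returns 1000 because `drain()` cannot return while any op is still queued or in flight.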

History

#1 Updated by Anonymous over 9 years ago

Happened again in the 11/21 nightlies - 2791, sepia33

#2 Updated by Sage Weil over 9 years ago

  • Priority changed from Normal to High

#3 Updated by Sage Weil over 9 years ago

  • Position set to 5

#4 Updated by Sage Weil over 9 years ago

  • Status changed from New to 7
  • Assignee set to Sage Weil

#5 Updated by Sage Weil over 9 years ago

  • Status changed from 7 to Resolved
