
Bug #17973

"FAILED assert(m_processing == 0)" while running test_lock_fence.sh

Added by Venky Shankar 5 months ago. Updated 3 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Target version:
-
Start date:
11/21/2016
Due date:
% Done:

0%

Source:
Tags:
Backport:
jewel
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Release:
Needs Doc:
No

Description

Seems like a race somewhere; I was unable to reproduce it locally (master branch), but hit it once here: http://qa-proxy.ceph.com/teuthology/vshankar-2016-11-20_12:31:27-rbd-wip-ec-partial-overwrites---basic-vps/565152/teuthology.log

2016-11-20T12:43:16.540 INFO:tasks.workunit.client.0.vpm117.stderr:/srv/autobuild-ceph/gitbuilder.git/build/out~/ceph-11.0.2-1776-g32c3569/src/common/WorkQueue.h: In function 'ThreadPool::PointerWQ<T>::~PointerWQ() [with T = librbd::AioImageRequest<librbd::ImageCtx>]' thread 7fdcc3b54740 time 2016-11-20 12:43:16.542665
2016-11-20T12:43:16.541 INFO:tasks.workunit.client.0.vpm117.stderr:/srv/autobuild-ceph/gitbuilder.git/build/out~/ceph-11.0.2-1776-g32c3569/src/common/WorkQueue.h: 351: FAILED assert(m_processing == 0)
2016-11-20T12:43:16.541 INFO:tasks.workunit.client.0.vpm117.stderr: ceph version 11.0.2-1776-g32c3569 (32c3569ffd6bee14c2cc0072f5ab76b61094afe6)

Doesn't seem related to the image data pool. The backtrace hints at the destruction path (rbd_close()):

2016-11-20T12:43:16.541 INFO:tasks.workunit.client.0.vpm117.stderr: ceph version 11.0.2-1776-g32c3569 (32c3569ffd6bee14c2cc0072f5ab76b61094afe6)
2016-11-20T12:43:16.541 INFO:tasks.workunit.client.0.vpm117.stderr: 1: (()+0x2277bb) [0x7fdcaf40e7bb]
2016-11-20T12:43:16.541 INFO:tasks.workunit.client.0.vpm117.stderr: 2: (()+0x39d62) [0x7fdcaf220d62]
2016-11-20T12:43:16.541 INFO:tasks.workunit.client.0.vpm117.stderr: 3: (()+0x68215) [0x7fdcaf24f215]
2016-11-20T12:43:16.541 INFO:tasks.workunit.client.0.vpm117.stderr: 4: (()+0x7bd9f) [0x7fdcaf262d9f]
2016-11-20T12:43:16.541 INFO:tasks.workunit.client.0.vpm117.stderr: 5: (()+0x862ed) [0x7fdcaf26d2ed]
2016-11-20T12:43:16.541 INFO:tasks.workunit.client.0.vpm117.stderr: 6: (rbd_close()+0x32) [0x7fdcaf232452]
2016-11-20T12:43:16.542 INFO:tasks.workunit.client.0.vpm117.stderr: 7: (()+0x1def8) [0x7fdcb7a2def8]
2016-11-20T12:43:16.542 INFO:tasks.workunit.client.0.vpm117.stderr: 8: (()+0x106f8) [0x7fdcb7a206f8]
2016-11-20T12:43:16.542 INFO:tasks.workunit.client.0.vpm117.stderr: 9: (PyObject_CallFunctionObjArgs()+0x1a1) [0x4dcd81]
2016-11-20T12:43:16.542 INFO:tasks.workunit.client.0.vpm117.stderr: 10: (PyEval_EvalFrameEx()+0x1f8c) [0x49b20c]
2016-11-20T12:43:16.542 INFO:tasks.workunit.client.0.vpm117.stderr: 11: python() [0x4a1634]
2016-11-20T12:43:16.542 INFO:tasks.workunit.client.0.vpm117.stderr: 12: (PyRun_FileExFlags()+0x92) [0x44e4a5]
2016-11-20T12:43:16.542 INFO:tasks.workunit.client.0.vpm117.stderr: 13: (PyRun_SimpleFileExFlags()+0x2ee) [0x44ec9f]
2016-11-20T12:43:16.542 INFO:tasks.workunit.client.0.vpm117.stderr: 14: (Py_Main()+0xb5e) [0x44f904]
2016-11-20T12:43:16.542 INFO:tasks.workunit.client.0.vpm117.stderr: 15: (__libc_start_main()+0xf5) [0x7fdcc3384f45]
2016-11-20T12:43:16.542 INFO:tasks.workunit.client.0.vpm117.stderr: 16: python() [0x578c4e]


Related issues

Copied to Backport #18024: jewel: "FAILED assert(m_processing == 0)" while running test_lock_fence.sh Resolved

History

#1 Updated by Venky Shankar 5 months ago

  • Status changed from New to In Progress
  • Assignee set to Venky Shankar
  • Severity changed from 3 - minor to 2 - major

Ran test_lock_fence.sh (master) in a loop and was able to hit this backtrace in under 100 runs:

blacklisting 192.168.1.6:0/1578831909 until 2016-11-21 21:56:39.826806 (3600 sec)
/home/vshankar/ceph/src/common/WorkQueue.h: In function 'ThreadPool::PointerWQ<T>::~PointerWQ() [with T = librbd::AioImageRequest<librbd::ImageCtx>]' thread 7fd503ec4700 time 2016-11-21 20:56:40.350747
/home/vshankar/ceph/src/common/WorkQueue.h: 351: FAILED assert(m_processing == 0)
ceph version 11.0.2-1781-g9773abb (9773abb50fdb15b1efeaa93aeab08a99180f4ad8)
1: (()+0x2459c0) [0x7fd4e7ba09c0]
2: (()+0x4f3b7) [0x7fd4e79aa3b7]
3: (()+0x67929) [0x7fd4e79c2929]
4: (()+0x71eee) [0x7fd4e79cceee]
5: (()+0x2b93d) [0x7fd4f01dc93d]
6: (()+0x11e5c) [0x7fd4f01c2e5c]
7: (PyObject_Call()+0x43) [0x7fd503940ed3]
8: (PyObject_CallFunctionObjArgs()+0xc2) [0x7fd5039417b2]
9: (PyEval_EvalFrameEx()+0x442a) [0x7fd5039d725a]
10: (PyEval_EvalCodeEx()+0x8dc) [0x7fd5039dc76c]
11: (PyEval_EvalCode()+0x19) [0x7fd5039dc859]
12: (()+0xff08f) [0x7fd5039f608f]
13: (PyRun_FileExFlags()+0x72) [0x7fd5039f72a2]
14: (PyRun_SimpleFileExFlags()+0xe5) [0x7fd5039f84b5]
15: (Py_Main()+0xcd0) [0x7fd503a0a4a0]
16: (__libc_start_main()+0xf1) [0x7fd502c27731]
17: (_start()+0x29) [0x559de38cc7e9]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
+ wait 14519
../qa/workunits/rbd/test_lock_fence.sh: line 32: 14519 Aborted (core dumped) python $RBDRW $IMAGE $LOCKID
+ rbdrw_exitcode=134
+ '[' 134 '!=' 108 ']'
+ echo 'wrong exitcode from rbdrw: 134'
wrong exitcode from rbdrw: 134
+ exit 1

Will have a look soon...

#2 Updated by Venky Shankar 5 months ago

  • Status changed from In Progress to Need Review
  • Backport set to jewel

#3 Updated by Jason Dillaman 5 months ago

  • Status changed from Need Review to Pending Backport

#4 Updated by Nathan Cutler 5 months ago

  • Copied to Backport #18024: jewel: "FAILED assert(m_processing == 0)" while running test_lock_fence.sh added

#5 Updated by Nathan Cutler 3 months ago

  • Status changed from Pending Backport to Resolved
