Bug #15347

closed

Failure while running ceph_objectstore_tool.py

Added by Anonymous about 8 years ago. Updated about 7 years ago.

Status: Can't reproduce
Priority: Normal
Assignee: David Zafman
Category: build
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hey,

On the current master tree, I'm hitting a ceph_objectstore_tool.py failure during make check.

It reports "ERROR:Incorrect number of replicas seen 69", while the expected number of replicas is 30.

Diffing my job output against one from another system where the check passed, mine additionally reports:

WARNING: Split occurred, some objects may be ignored
Skipping object '#4:82a5f373:ns1::split3:head#' which belongs in pg 4.1
Skipping object '#4:be195a8d:::split3:head#' which belongs in pg 4.1
[...]
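
For anyone comparing logs the same way, here is a minimal sketch of how the "Skipping object" lines could be tallied per PG. It assumes only the line format shown in the excerpt above; the function name `count_skipped_by_pg` and the `sample` list are hypothetical and not part of the Ceph tree.

```python
import re
from collections import Counter

# Matches lines shaped like the excerpt above, e.g.
# Skipping object '#4:82a5f373:ns1::split3:head#' which belongs in pg 4.1
SKIP_RE = re.compile(r"Skipping object '(?P<obj>[^']+)' which belongs in pg (?P<pg>\S+)")


def count_skipped_by_pg(lines):
    """Return a Counter mapping PG id to the number of skipped objects."""
    counts = Counter()
    for line in lines:
        match = SKIP_RE.search(line)
        if match:
            counts[match.group("pg")] += 1
    return counts


# The two skip lines quoted in the description, plus the warning line.
sample = [
    "WARNING: Split occurred, some objects may be ignored",
    "Skipping object '#4:82a5f373:ns1::split3:head#' which belongs in pg 4.1",
    "Skipping object '#4:be195a8d:::split3:head#' which belongs in pg 4.1",
]

for pg, n in sorted(count_skipped_by_pg(sample).items()):
    print(pg, n)
```

Running the same function over the full attached log (for example by passing an open file object) would show whether the skipped objects are confined to one PG.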

I'm reaching the limits of my debugging and would love some insight into what could be wrong here.

Note that my underlying filesystem is btrfs, and I'm building in a minimal chroot on FC23 with a 4.4.6 kernel.

The log file is attached to this ticket.


Files

ceph_objectstore_tool.py.log (26 KB), Anonymous, 04/01/2016 08:33 AM
data.tar.gz (3.35 KB), Anonymous, 04/01/2016 08:37 AM
Actions #1

Updated by Anonymous about 8 years ago

I'm adding some complementary output:

I made a very simple patch to save the two directories that are used in this part of the code.
That should help in debugging my case.
I'm pasting the diff to make it easier to understand what I'm uploading here.

diff --git a/src/test/ceph_objectstore_tool.py b/src/test/ceph_objectstore_tool.py
index 2ca623f..f003f32 100755
--- a/src/test/ceph_objectstore_tool.py
+++ b/src/test/ceph_objectstore_tool.py
@@ -1836,7 +1836,9 @@ def main(argv):
     data_errors, count = check_data(DATADIR, TMPFILE, OSDDIR, SPLIT_NAME)
     ERRORS += data_errors
     if count != (SPLIT_OBJ_COUNT * SPLIT_NSPACE_COUNT * pool_size):
-        logging.error("Incorrect number of replicas seen {count}".format(count=count))
+        logging.error("Incorrect number of replicas seen {count} while expecting {amount}".format(count=count,amount=SPLIT_OBJ_COUNT*SPLIT_NSPACE_COUNT*pool_size))
+        call("mkdir -p /tmp/DATADIR_FAILED/; cp -av {dir} /tmp/DATADIR_FAILED".format(dir=DATADIR), shell=True)
+        call("mkdir -p /tmp/TESTDIR_FAILED; cp -av {dir} /tmp/TESTDIR_FAILED".format(dir=TESTDIR), shell=True)
         ERRORS += 1
     vstart(new=False)
     wait_for_health()
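
The same directory snapshot could probably be taken without shelling out. The following is only a sketch of that idea, not what the patch above does: the helper name `save_failed_dir` is hypothetical, and it assumes Python 3.8 or later for `dirs_exist_ok`, which may not match the interpreter the test script targets. The demo at the bottom uses a throwaway temporary directory in place of DATADIR.

```python
import os
import shutil
import tempfile


def save_failed_dir(src, dest_root):
    """Copy src into dest_root/<basename of src>, like `cp -a src dest_root`.

    Hypothetical helper for illustration only.
    """
    os.makedirs(dest_root, exist_ok=True)
    dest = os.path.join(dest_root, os.path.basename(os.path.normpath(src)))
    # symlinks=True keeps symlinks as symlinks, similar to cp -a.
    # dirs_exist_ok requires Python 3.8+.
    shutil.copytree(src, dest, symlinks=True, dirs_exist_ok=True)
    return dest


# Demo with a temporary stand-in for DATADIR.
scratch = tempfile.mkdtemp()
fake_datadir = os.path.join(scratch, "data")
os.makedirs(fake_datadir)
with open(os.path.join(fake_datadir, "obj"), "w") as handle:
    handle.write("payload")

saved = save_failed_dir(fake_datadir, os.path.join(scratch, "DATADIR_FAILED"))
print(saved)
```

Avoiding `shell=True` sidesteps quoting problems if a directory path ever contains spaces or shell metacharacters.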

Actions #2

Updated by David Zafman about 8 years ago

  • Status changed from New to Fix Under Review
  • Assignee set to David Zafman
Actions #3

Updated by Sage Weil about 8 years ago

  • Status changed from Fix Under Review to Resolved
Actions #4

Updated by Anonymous about 8 years ago

Please keep it open; this is still occurring on master.

Actions #5

Updated by Sage Weil about 8 years ago

  • Status changed from Resolved to 12
Actions #6

Updated by Greg Farnum about 7 years ago

  • Status changed from 12 to Can't reproduce

I'm not seeing anything in my email about this, and it's btrfs anyway.
