Bug #15347

Failure while running ceph_objectstore_tool.py

Added by Anonymous over 6 years ago. Updated over 5 years ago.

Status: Can't reproduce
Priority: Normal
Assignee:
Category: build
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hey,

While building the current master tree, I'm hitting a ceph_objectstore_tool.py failure during make check.

It reports "ERROR:Incorrect number of replicas seen 69", whereas the expected number of replicas is 30.

When diffing the job output against one that passed the check on another system, mine additionally reports:

WARNING: Split occurred, some objects may be ignored
Skipping object '#4:82a5f373:ns1::split3:head#' which belongs in pg 4.1
Skipping object '#4:be195a8d:::split3:head#' which belongs in pg 4.1
[...]

I'm reaching my debugging limits and would love to get some insight into what could be wrong here.

Note that my underlying filesystem is btrfs, I'm building in a minimal chroot, and I'm running FC23 with a 4.4.6 kernel.

The log file is attached to this ticket.

ceph_objectstore_tool.py.log (26 KB) Anonymous, 04/01/2016 08:33 AM

data.tar.gz (3.35 KB) Anonymous, 04/01/2016 08:37 AM
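To see how many objects were skipped per placement group, the "Skipping object" lines in a log can be tallied with a few lines of Python. This is only a minimal sketch: it assumes the log lines follow exactly the format quoted in the description, and the function name `count_skipped_by_pg` is hypothetical, not part of ceph_objectstore_tool.py.

```python
import re
from collections import Counter

# Matches lines such as:
#   Skipping object '#4:82a5f373:ns1::split3:head#' which belongs in pg 4.1
# The pattern is an assumption based on the excerpt quoted in this ticket.
SKIP_RE = re.compile(
    r"Skipping object '(?P<obj>[^']+)' which belongs in pg (?P<pg>\S+)"
)


def count_skipped_by_pg(lines):
    """Return a Counter mapping pg id -> number of skipped objects."""
    counts = Counter()
    for line in lines:
        match = SKIP_RE.search(line)
        if match:
            counts[match.group("pg")] += 1
    return counts


# Sample input taken from the lines quoted in the description.
sample = [
    "WARNING: Split occurred, some objects may be ignored",
    "Skipping object '#4:82a5f373:ns1::split3:head#' which belongs in pg 4.1",
    "Skipping object '#4:be195a8d:::split3:head#' which belongs in pg 4.1",
]
skipped = count_skipped_by_pg(sample)
print(dict(skipped))
```

For a real log, passing an open file object (for example `open("ceph_objectstore_tool.py.log")`) in place of `sample` should work the same way, since the function only iterates over lines.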

History

#1 Updated by Anonymous over 6 years ago

I'm adding some complementary output:

I made a very simple patch to save the two directories that are used in this part of the code.
That should help in debugging my case.
I'm pasting the diff to make it easier to understand what I'm uploading here.

diff --git a/src/test/ceph_objectstore_tool.py b/src/test/ceph_objectstore_tool.py
index 2ca623f..f003f32 100755
--- a/src/test/ceph_objectstore_tool.py
+++ b/src/test/ceph_objectstore_tool.py
@@ -1836,7 +1836,9 @@ def main(argv):
     data_errors, count = check_data(DATADIR, TMPFILE, OSDDIR, SPLIT_NAME)
     ERRORS += data_errors
     if count != (SPLIT_OBJ_COUNT * SPLIT_NSPACE_COUNT * pool_size):
-        logging.error("Incorrect number of replicas seen {count}".format(count=count))
+        logging.error("Incorrect number of replicas seen {count} while expecting {amount}".format(count=count,amount=SPLIT_OBJ_COUNT*SPLIT_NSPACE_COUNT*pool_size))
+        call("mkdir -p /tmp/DATADIR_FAILED/; cp -av {dir} /tmp/DATADIR_FAILED".format(dir=DATADIR), shell=True)
+        call("mkdir -p /tmp/TESTDIR_FAILED; cp -av {dir} /tmp/TESTDIR_FAILED".format(dir=TESTDIR), shell=True)
         ERRORS += 1
     vstart(new=False)
     wait_for_health()
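
For readers who want to try the same idea outside the test script, the check and the directory capture in the patch above can be sketched as standalone Python. This is not the code from ceph_objectstore_tool.py: the helper names are hypothetical, and the values 5, 2 and 3 in the usage lines are assumed purely for illustration because they multiply to the expected 30 reported in this ticket. The copy helper uses `shutil.copytree` instead of a shell command and needs Python 3.8 or later for `dirs_exist_ok`.

```python
import logging
import os
import shutil


def check_replica_count(count, obj_count, nspace_count, pool_size):
    """Return 1 (one error) if count differs from the expected total, else 0."""
    expected = obj_count * nspace_count * pool_size
    if count != expected:
        logging.error(
            "Incorrect number of replicas seen %d while expecting %d",
            count, expected,
        )
        return 1
    return 0


def save_for_debugging(src_dir, dest_root):
    """Copy src_dir under dest_root without invoking a shell.

    Hypothetical helper; returns the path of the copy.
    """
    name = os.path.basename(os.path.normpath(src_dir))
    dest = os.path.join(dest_root, name)
    # dirs_exist_ok requires Python 3.8+; missing parent dirs are created.
    shutil.copytree(src_dir, dest, dirs_exist_ok=True)
    return dest


# Illustrative values only (assumed, not read from the test script).
errors = check_replica_count(69, obj_count=5, nspace_count=2, pool_size=3)
```

Logging the expected value alongside the observed one, as the patch does, makes a mismatch such as 69 versus 30 readable directly from the log without consulting the constants.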

#2 Updated by David Zafman over 6 years ago

  • Status changed from New to Fix Under Review
  • Assignee set to David Zafman

#3 Updated by Sage Weil over 6 years ago

  • Status changed from Fix Under Review to Resolved

#4 Updated by Anonymous over 6 years ago

Please keep it open; this is still occurring on master.

#5 Updated by Sage Weil over 6 years ago

  • Status changed from Resolved to 12

#6 Updated by Greg Farnum over 5 years ago

  • Status changed from 12 to Can't reproduce

I'm not seeing anything in my email about this, and it's btrfs anyway.
