Bug #11930

closed

test/ceph_objectstore_tool.py fails with OSD has the store locked

Added by Nathan Cutler almost 9 years ago. Updated about 8 years ago.

Status: Can't reproduce
Priority: Normal
Assignee: David Zafman
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When I run "run-make-check.sh" on master, all tests pass except ceph_objectstore_tool.py:

$ cat ./src/test/ceph_objectstore_tool.py.log
vstarting.... DONE
Wait for health_ok... DONE
Created Replicated pool #1
Created Erasure coded pool #2
Creating 4 objects in replicated pool
Creating 4 objects in erasure coded pool
Test invalid parameters
Test --op dump-journal
        Journal max_size = 104857600
Test --op list variants
Test --op list by generating json for all objects using default format
Test get-bytes and set-bytes
Test list-attrs get-attr
Test pg info
Test pg logging
Test list-pgs
Test pg export --dry-run
Test pg export
Test pg removal
Test pg import
Verify replicated import data
vstarting.... DONE
Wait for health_ok... DONE
Verify erasure coded import data
Test import-rados
Wait for health_ok... DONE
Remove pgs for another import
OSD has the store locked
ERROR:Removing failed for pg 1.0 on osd0 with 1
WARNING:SKIPPING IMPORT TESTS DUE TO PREVIOUS FAILURES
vstarting.... DONE
Wait for health_ok... DONE
TEST FAILED WITH 1 ERRORS

Related issues 1 (0 open, 1 closed)

Related to Ceph - Bug #11398: tests: osd/osd-scrub-repair.sh objectstore-tool races (Resolved, Loïc Dachary, 04/15/2015)

Actions #2

Updated by Loïc Dachary almost 9 years ago

  • Status changed from New to 12
  • Assignee deleted (Loïc Dachary)
Actions #3

Updated by Loïc Dachary almost 9 years ago

The machine running the tests does not have an SSD and runs make -j4 (nproc == 8), so it is expected to be very slow at times.

Actions #4

Updated by Loïc Dachary almost 9 years ago

Maybe the pid file does not exist for some reason: that is the only way kill_daemon could return without actually killing the daemon.
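
To illustrate the failure mode suggested here, the following is a minimal sketch of a pid-file based kill helper. It is not the actual kill_daemon implementation; the names kill_by_pidfile and demo, and the osd.0.pid file name, are hypothetical. It assumes a POSIX system. The point is only that a helper of this shape returns quietly when the pid file is missing, leaving the process (and any lock it holds) alive.

```python
import os
import signal
import subprocess
import sys
import tempfile


def kill_by_pidfile(pidfile, sig=signal.SIGTERM):
    """Hypothetical helper: signal the process whose pid is stored in pidfile.

    Returns True if a signal was sent, False if there was nothing to signal.
    """
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        # No usable pid file: return without touching any process.
        # A caller that ignores the return value will believe the daemon is gone.
        return False
    try:
        os.kill(pid, sig)
    except ProcessLookupError:
        return False
    return True


def demo():
    # Stand-in for a daemon: a child process that would sleep for a minute.
    proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    with tempfile.TemporaryDirectory() as d:
        pidfile = os.path.join(d, "osd.0.pid")  # hypothetical name, not yet written
        sent = kill_by_pidfile(pidfile)          # pid file missing
        still_running = proc.poll() is None      # the "daemon" survived
        with open(pidfile, "w") as f:
            f.write(str(proc.pid))
        sent_again = kill_by_pidfile(pidfile)    # now the signal is delivered
        proc.wait(timeout=10)
    return sent, still_running, sent_again, proc.returncode
```

A more defensive variant would have the caller check the return value, or wait for the process to exit before running anything that needs the store lock.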

Actions #5

Updated by Loïc Dachary almost 9 years ago

  • Subject changed from test/ceph_objectstore_tool.py fails on fast machine with HDD to test/ceph_objectstore_tool.py fails with OSD has the store locked
Actions #6

Updated by Loïc Dachary almost 9 years ago

  • Description updated (diff)
Actions #7

Updated by Nathan Cutler almost 9 years ago

I ran run-make-check.sh again on a clean checkout of master. Again ceph_objectstore_tool.py is the only failing test, but the log is different:

FAIL: ./src/test/ceph_objectstore_tool.py.log
vstarting.... DONE
Wait for health_ok... DONE
Created Replicated pool #1
Created Erasure coded pool #2
Creating 4 objects in replicated pool
Creating 4 objects in erasure coded pool
Test invalid parameters
Test all --op dump-journal
        Journal max_size = 104857600
WARNING:No ops found in entry 66 trans 0
WARNING:No ops found in entry 74 trans 0
WARNING:No ops found in entry 79 trans 0
WARNING:No ops found in entry 84 trans 0
        Journal max_size = 104857600
        Journal max_size = 104857600
WARNING:No ops found in entry 7 trans 0
WARNING:No ops found in entry 11 trans 0
WARNING:No ops found in entry 15 trans 0
WARNING:No ops found in entry 28 trans 0
        Journal max_size = 104857600
Test --op list variants
Test --op list by generating json for all objects using default format
Test get-bytes and set-bytes
Test list-attrs get-attr
Test pg info
Test pg logging
Test list-pgs
Test pg export --dry-run
Test pg export
Test pg removal
Test pg import
Verify replicated import data
Test all --op dump-journal again
        Journal max_size = 104857600
        Journal max_size = 104857600
        Journal max_size = 104857600
        Journal max_size = 104857600
vstarting.... DONE
Wait for health_ok... DONE
Verify erasure coded import data
Test import-rados
Testing import all objects after a split
Wait for health_ok... DONE
ERROR:Exporting failed for pg 4.0 on osd0 with 1
TEST FAILED WITH 1 ERRORS

Actions #8

Updated by Nathan Cutler almost 9 years ago

For completeness, here is how I am setting up the build environment:

test/docker-test.sh --os-type fedora --os-version 21 --shell

Actions #9

Updated by Samuel Just almost 9 years ago

  • Assignee set to David Zafman
Actions #10

Updated by David Zafman almost 9 years ago

  • Status changed from 12 to Need More Info

I've never run into this issue. Please try the following after you've built source to see if it fails:

# cd src
# test/ceph_objectstore_tool.py

If it does fail, let's get some extra output by making a change to the test code and running the test again. Also, I'd be curious whether it fails in different ways if you try it multiple times.

diff --git a/src/test/ceph_objectstore_tool.py b/src/test/ceph_objectstore_tool.py
index 3bb1d34..9b6901d 100755
--- a/src/test/ceph_objectstore_tool.py
+++ b/src/test/ceph_objectstore_tool.py
@@ -357,7 +357,7 @@ def check_data(DATADIR, TMPFILE, OSDDIR, SPLIT_NAME):

 def main(argv):
     sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0)
-    nullfd = open(os.devnull, "w")
+    nullfd = sys.stdout

     call("rm -fr {dir}; mkdir {dir}".format(dir=CEPH_DIR), shell=True)
     os.environ["CEPH_DIR"] = CEPH_DIR
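
The effect of the change above can be sketched as follows. This assumes, from the name only, that nullfd is handed to the commands the test spawns as their stdout/stderr; the excerpt does not show those call sites. The helper run_step and its verbose flag are hypothetical, and the sketch uses Python 3 rather than the Python 2 idioms in the diff.

```python
import os
import subprocess
import sys


def run_step(cmd, verbose=False):
    """Hypothetical helper: run cmd and return its exit status.

    With verbose=False the child's output is discarded, as with
    nullfd = open(os.devnull, "w"). With verbose=True the child inherits
    the parent's streams, which has the same visible effect as
    nullfd = sys.stdout.
    """
    if verbose:
        # stdout=None / stderr=None means "inherit from the parent process".
        return subprocess.call(cmd, stdout=None, stderr=None)
    with open(os.devnull, "w") as nullfd:
        return subprocess.call(cmd, stdout=nullfd, stderr=subprocess.STDOUT)


if __name__ == "__main__":
    cmd = [sys.executable, "-c", "print('tool output'); raise SystemExit(1)"]
    print("quiet status:", run_step(cmd))
    print("verbose status:", run_step(cmd, verbose=True))
```

In the quiet case only the exit status survives, which matches the log lines of the form "failed ... with 1" in the reports above; the verbose case also shows whatever the failing command printed.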
Actions #11

Updated by David Zafman about 8 years ago

  • Status changed from Need More Info to Can't reproduce

I bet this has been fixed, but if not, reopen.
