Bug #18018

closed

tests: ceph-helpers.sh races when killing daemons

Added by Loïc Dachary over 7 years ago. Updated about 7 years ago.

Status: Can't reproduce
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

https://jenkins.ceph.com/job/ceph-pull-requests/14885/console

/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/osd/osd-scrub-repair.sh:50: add_something:  local dir=td/osd-scrub-repair
/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/osd/osd-scrub-repair.sh:51: add_something:  local poolname=ecpool
/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/osd/osd-scrub-repair.sh:52: add_something:  local obj=SOMETHING
/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/osd/osd-scrub-repair.sh:54: add_something:  ceph osd set noscrub
noscrub is set
/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/osd/osd-scrub-repair.sh:55: add_something:  ceph osd set nodeep-scrub
nodeep-scrub is set
/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/osd/osd-scrub-repair.sh:57: add_something:  local payload=ABCDEF
/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/osd/osd-scrub-repair.sh:58: add_something:  echo ABCDEF
/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/osd/osd-scrub-repair.sh:59: add_something:  rados --pool ecpool put SOMETHING td/osd-scrub-repair/ORIGINAL
/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/osd/osd-scrub-repair.sh: line 49: 10051 Terminated              rados --pool $poolname put $obj $dir/ORIGINAL
/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/osd/osd-scrub-repair.sh:59: add_something:  return 1

One possible explanation for this unexpected termination is that a kill_daemon call is still running in the background when it should already have finished.
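If that hypothesis holds, one way to rule it out would be a helper that only returns once the target process is really gone, so no signalling loop can outlive its caller. The following is a minimal sketch under that assumption; `stop_and_wait`, its default signal and its delay list are hypothetical names and values chosen for illustration, not the actual kill_daemon code from ceph-helpers.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper (NOT the real kill_daemon from ceph-helpers.sh).
# Sends one signal to a pid, then polls synchronously until the process
# has exited, so the caller never leaves a killer running in the background.
stop_and_wait() {
    local pid=$1
    local signal=${2:-TERM}
    local delays=${3:-0.1 0.2 0.5 1 2}   # assumed back-off schedule, in seconds
    local delay

    # If the signal cannot be delivered, treat the process as already gone.
    kill -"$signal" "$pid" 2>/dev/null || return 0

    for delay in $delays; do
        sleep "$delay"
        # Signal 0 only probes for existence; nothing is delivered.
        kill -0 "$pid" 2>/dev/null || return 0
    done

    # The process is still alive after all delays were exhausted.
    return 1
}
```

A caller would run it in the foreground, for example `stop_and_wait "$(cat "$pidfile")" TERM || return 1`, where `$pidfile` is a placeholder for wherever the daemon's pid is stored, and only continue with the next test step after it returns.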


Files

osd-scrub-repair.txt.gz (856 KB), Loïc Dachary, 11/24/2016 07:00 AM
osd-crush.txt.gz (674 KB), Loïc Dachary, 11/24/2016 08:04 AM

Actions #2

Updated by Loïc Dachary over 7 years ago

Blocked by http://tracker.ceph.com/issues/18019; waiting for it to be resolved so that the tests are stable again.

Actions #3

Updated by Loïc Dachary over 7 years ago

osd-crush.sh experienced a similar, unexplained termination (see the attached logs).

Actions #4

Updated by Loïc Dachary over 7 years ago

I thought Jenkins might have been rebooted, but the osd-crush.sh and osd-scrub-repair.sh failures happened hours apart.

Actions #5

Updated by Loïc Dachary over 7 years ago

  • Status changed from In Progress to Can't reproduce

Actions #6

Updated by Kefu Chai about 7 years ago

I spotted it again:

/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/osd/osd-scrub-repair.sh:150: corrupt_and_repair_one:  rados --pool ecpool get SOMETHING td/osd-scrub-repair/COPY
/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/osd/osd-scrub-repair.sh: line 132: 20631 Terminated              rados --pool $poolname get SOMETHING $dir/COPY
/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/osd/osd-scrub-repair.sh:150: corrupt_and_repair_one:  return 1
2017-04-18 09:39:52.460333 7f4e62f9fb00  1 journal _open td/osd-scrub-repair/3/journal fd 25: 104857600 bytes, block size 4096 bytes, directio = 1, aio = 0
2017-04-18 09:39:52.460608 7f4e62f9fb00  1 filestore(td/osd-scrub-repair/3) upgrade
2017-04-18 09:39:52.460692 7f4e62f9fb00 -1 filestore(td/osd-scrub-repair/3) could not find #-1:7b3f43c4:::osd_superblock:0# in index: (2) No such file or directory
2017-04-18 09:39:52.521025 7f4e62f9fb00  1 journal close td/osd-scrub-repair/3/journal
2017-04-18 09:39:52.523247 7f4e62f9fb00 -1 created object store td/osd-scrub-repair/3 for osd.3 fsid 030953c5-acc6-4470-a57b-301b0d90ec27
2017-04-18 09:39:52.523286 7f4e62f9fb00 -1 auth: error reading file: td/osd-scrub-repair/3/keyring: can't open td/osd-scrub-repair/3/keyring: (2) No such file or directory
2017-04-18 09:39:52.523414 7f4e62f9fb00 -1 created new key in keyring td/osd-scrub-repair/3/keyring
2017-04-18 09:39:52.728703 7f2b38f62b00  0 ceph version 12.0.0-2702-g540e725 (540e7255e8ad650639353cb8461dfef832051d28), process ceph-osd, pid 7022
2017-04-18 09:39:52.728789 7f2b38f62b00  5 object store type is filestore
/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/osd/osd-scrub-repair.sh:171: corrupt_and_repair_erasure_coded:  return 1
/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/osd/osd-scrub-repair.sh:241: TEST_corrupt_and_repair_jerasure:  return 1
/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/osd/osd-scrub-repair.sh:45: run:  return 1

This time it happened in "TEST_corrupt_and_repair_jerasure".
