Bug #13804

closed

rbd/copy.sh: hangs with features/layering.yaml msgr-failures/many.yaml (hammer)

Added by Loïc Dachary over 8 years ago. Updated over 8 years ago.

Status: Can't reproduce
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

http://pulpito.ceph.com/loic-2015-11-12_15:36:25-rbd-hammer-backports---basic-multi/1147551/

2015-11-15T01:43:53.961 INFO:tasks.workunit.client.0.plana24.stderr:+ rbd create --image-format 1 -s 1 test1
2015-11-15T01:43:54.105 INFO:tasks.workunit.client.0.plana24.stderr:+ rados rm -p rbd test1.rbd
2015-11-15T01:43:54.179 INFO:tasks.workunit.client.0.plana24.stderr:+ rbd rm test1
2015-11-15T01:43:54.252 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 01:43:54.247359 7f6cd51a8840 -1 librbd::ImageCtx: error finding header: (2) No such file or directory
2015-11-15T01:43:54.283 INFO:tasks.workunit.client.0.plana24.stderr:
Removing image: 100% complete...done.
2015-11-15T01:43:54.288 INFO:tasks.workunit.client.0.plana24.stderr:+ rbd ls
2015-11-15T01:43:54.288 INFO:tasks.workunit.client.0.plana24.stderr:+ wc -l
2015-11-15T01:43:54.289 INFO:tasks.workunit.client.0.plana24.stderr:+ grep ^0$
2015-11-15T01:43:54.366 INFO:tasks.workunit.client.0.plana24.stdout:0
2015-11-15T01:43:54.367 INFO:tasks.workunit.client.0.plana24.stderr:+ [ 1 -eq 0 ]
2015-11-15T01:43:54.367 INFO:tasks.workunit.client.0.plana24.stderr:+ rbd create --new-format -s 1 test2
2015-11-15T01:43:54.535 INFO:tasks.workunit.client.0.plana24.stderr:+ rbd snap create test2@snap
2015-11-15T01:43:54.986 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 01:43:54.980533 7f7f504f5700  0 -- 10.214.131.16:0/977191717 >> 10.214.131.16:6800/6988 pipe(0x4616010 sd=11 :59520 s=2 pgs=575 cs=1 l=1 c=0x4614dc0).injecting socket failure
2015-11-15T01:43:54.986 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 01:43:54.981061 7f7f62cef840  0 -- 10.214.131.16:0/977191717 submit_message osd_op(client.4816.0:13 rbd_header.12872a487cb0 [watch unwatch cookie 73403024] 1.18ad0156 ondisk+write+known_if_redirected e19) v5 remote, 10.214.131.16:6800/6988, failed lossy con, dropping message 0x4604150
2015-11-15T01:48:27.606 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 01:48:27.600351 7f7f527fc700  0 monclient: hunting for new mon
2015-11-15T01:58:27.619 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 01:58:27.612928 7f7f581f2700  0 -- 10.214.131.16:0/977191717 >> 10.214.131.16:6791/0 pipe(0x7f7f400145d0 sd=12 :56232 s=2 pgs=269 cs=1 l=1 c=0x7f7f40008f90).injecting socket failure
2015-11-15T01:58:27.619 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 01:58:27.613172 7f7f527fc700  0 monclient: hunting for new mon
2015-11-15T01:58:55.086 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 01:58:55.081019 7f7f62ced700  0 -- 10.214.131.16:0/977191717 >> 10.214.131.16:6800/6988 pipe(0x7f7f40014680 sd=11 :0 s=1 pgs=0 cs=0 l=1 c=0x7f7f4001c540).injecting socket failure
2015-11-15T01:58:55.087 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 01:58:55.081143 7f7f62ced700  0 -- 10.214.131.16:0/977191717 >> 10.214.131.16:6800/6988 pipe(0x7f7f40014680 sd=11 :0 s=1 pgs=0 cs=0 l=1 c=0x7f7f4001c540).fault
2015-11-15T02:00:57.826 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 02:00:57.820936 7f7f527fc700  0 monclient: hunting for new mon
2015-11-15T02:13:54.732 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 02:13:54.726151 7f7f581f2700  0 -- 10.214.131.16:0/977191717 >> 10.214.131.16:6804/6997 pipe(0x7f7f400236e0 sd=9 :52720 s=1 pgs=0 cs=0 l=1 c=0x7f7f40000ff0).fault
2015-11-15T02:21:47.648 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 02:21:47.641871 7f7f503f4700  0 -- 10.214.131.16:0/977191717 >> 10.214.131.16:6789/0 pipe(0x7f7f40010330 sd=12 :56540 s=2 pgs=269 cs=1 l=1 c=0x7f7f4001cb40).injecting socket failure
2015-11-15T02:21:47.648 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 02:21:47.642052 7f7f527fc700  0 monclient: hunting for new mon
2015-11-15T02:24:57.651 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 02:24:57.645925 7f7f527fc700  0 monclient: hunting for new mon
2015-11-15T02:52:27.685 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 02:52:27.680893 7f7f527fc700  0 monclient: hunting for new mon
2015-11-15T02:59:57.694 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 02:59:57.689881 7f7f527fc700  0 monclient: hunting for new mon
2015-11-15T03:08:17.705 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 03:08:17.700055 7f7f62ced700  0 -- 10.214.131.16:0/977191717 >> 10.214.131.16:6791/0 pipe(0x7f7f400145d0 sd=12 :58120 s=2 pgs=341 cs=1 l=1 c=0x7f7f40004af0).injecting socket failure
2015-11-15T03:08:17.705 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 03:08:17.700237 7f7f527fc700  0 monclient: hunting for new mon
2015-11-15T03:16:47.715 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 03:16:47.710627 7f7f62ced700  0 -- 10.214.131.16:0/977191717 >> 10.214.131.16:6789/0 pipe(0x7f7f4001e830 sd=10 :58354 s=2 pgs=383 cs=1 l=1 c=0x7f7f40008e70).injecting socket failure
2015-11-15T03:16:47.716 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 03:16:47.710802 7f7f527fc700  0 monclient: hunting for new mon
2015-11-15T03:16:47.716 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 03:16:47.711610 7f7f503f4700  0 -- 10.214.131.16:0/977191717 >> 10.214.131.16:6790/0 pipe(0x7f7f4000bf30 sd=9 :0 s=1 pgs=0 cs=0 l=1 c=0x7f7f400097e0).injecting socket failure
2015-11-15T03:16:47.717 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 03:16:47.711701 7f7f503f4700  0 -- 10.214.131.16:0/977191717 >> 10.214.131.16:6790/0 pipe(0x7f7f4000bf30 sd=9 :0 s=1 pgs=0 cs=0 l=1 c=0x7f7f400097e0).fault
2015-11-15T03:19:17.718 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 03:19:17.713744 7f7f62ced700  0 -- 10.214.131.16:0/977191717 >> 10.214.131.16:6790/0 pipe(0x7f7f4000bf30 sd=9 :41838 s=2 pgs=372 cs=1 l=1 c=0x7f7f400097e0).injecting socket failure
2015-11-15T03:19:17.719 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 03:19:17.714113 7f7f527fc700  0 monclient: hunting for new mon
2015-11-15T03:49:47.756 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 03:49:47.751349 7f7f62ced700  0 -- 10.214.131.16:0/977191717 >> 10.214.131.16:6791/0 pipe(0x7f7f4001d9e0 sd=10 :58654 s=2 pgs=366 cs=1 l=1 c=0x7f7f400025b0).injecting socket failure
2015-11-15T03:49:47.756 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 03:49:47.751532 7f7f527fc700  0 monclient: hunting for new mon
2015-11-15T04:03:27.772 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 04:03:27.768277 7f7f62ced700  0 -- 10.214.131.16:0/977191717 >> 10.214.131.16:6790/0 pipe(0x7f7f40010330 sd=10 :42634 s=2 pgs=391 cs=1 l=1 c=0x7f7f40018560).injecting socket failure
2015-11-15T04:03:27.773 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 04:03:27.768493 7f7f527fc700  0 monclient: hunting for new mon
2015-11-15T04:28:27.802 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 04:28:27.798108 7f7f62ced700  0 -- 10.214.131.16:0/977191717 >> 10.214.131.16:6789/0 pipe(0x7f7f40023660 sd=10 :59764 s=2 pgs=440 cs=1 l=1 c=0x7f7f40021c80).injecting socket failure
2015-11-15T04:28:27.803 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 04:28:27.798431 7f7f527fc700  0 monclient: hunting for new mon
2015-11-15T04:39:42.217 INFO:tasks.workunit:Stopping ['rbd/copy.sh'] on client.0...

The mons did not crash, and their logs do not show anything obviously wrong.
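To check how often failure injection fired on the client while the job was stuck, the following is a small shell sketch. It counts the "injecting socket failure" lines and the "monclient: hunting for new mon" lines in a log file. It assumes the job's log has been downloaded locally. The function name summarize_msgr_failures and the file name teuthology.log used below are hypothetical, not part of teuthology.

```shell
#!/bin/sh
# Hypothetical helper: count messenger failure-injection events and
# monitor hunts in a locally saved teuthology log.
summarize_msgr_failures() {
    log="$1"
    # grep -c prints the number of matching lines (0 when nothing matches).
    injected=$(grep -c 'injecting socket failure' "$log")
    hunts=$(grep -c 'monclient: hunting for new mon' "$log")
    echo "injected=$injected hunts=$hunts"
}
```

Usage: `summarize_msgr_failures teuthology.log`. A high number of injections paired with repeated hunts, and no other errors in the mon logs, points at the injection itself rather than a mon-side problem.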

Actions #1

Updated by Loïc Dachary over 8 years ago

  • Subject changed from rbd/copy.sh: hangs forever (hammer) to rbd/copy.sh: hangs with features/layering.yaml msgr-failures/many.yaml (hammer)
Actions #2

Updated by Loïc Dachary over 8 years ago

Re-run the failed job

filter='rbd/cli/{base/install.yaml cachepool/small.yaml clusters/fixed-1.yaml features/layering.yaml fs/btrfs.yaml msgr-failures/many.yaml workloads/rbd_cli_copy.yaml}'
teuthology-suite --priority 101 --suite rbd --filter="$filter" --suite-branch hammer --distro ubuntu --email loic@dachary.org --ceph hammer-backports --machine-type plana,burnupi,mira

Re-run the failed job 10 times

filter='rbd/cli/{base/install.yaml cachepool/small.yaml clusters/fixed-1.yaml features/layering.yaml fs/btrfs.yaml msgr-failures/many.yaml workloads/rbd_cli_copy.yaml}'
teuthology-suite --num 10 --priority 101 --suite rbd --filter="$filter" --suite-branch hammer --distro ubuntu --email loic@dachary.org --ceph hammer-backports --machine-type plana,burnupi,mira
Actions #3

Updated by Loïc Dachary over 8 years ago

  • Assignee deleted (Loïc Dachary)

Wait a few weeks before declaring this "Can't reproduce".

Actions #4

Updated by Josh Durgin over 8 years ago

Looks to me like the ms failure injection just got unlucky and failed auth enough to make the client back off and eventually hit the teuthology timeout. The auth is successful on the monitor's side each time.
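One way to look at the client side of this is to measure the time between successive "monclient: hunting for new mon" messages. The following is a minimal sketch, assuming all timestamps fall on the same day, as they do in the excerpt above. The function name mon_hunt_gaps is hypothetical. It reads the leading teuthology timestamp of each matching line and prints the gap to the previous one in whole seconds.

```shell
#!/bin/sh
# Hypothetical helper: print the gap, in whole seconds, between consecutive
# "hunting for new mon" lines. Assumes every timestamp is on the same day.
mon_hunt_gaps() {
    grep 'monclient: hunting for new mon' "$1" |
    awk '{
        # $1 looks like 2015-11-15T01:48:27.606
        split($1, dt, "T")
        split(dt[2], t, ":")
        s = t[1] * 3600 + t[2] * 60 + int(t[3])
        if (NR > 1) print s - prev
        prev = s
    }'
}
```

On the hunt lines in the log excerpt above, this yields gaps between 150 and 1830 seconds, consistent with a client that kept reconnecting slowly until the teuthology timeout stopped the workunit.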

Actions #5

Updated by Josh Durgin over 8 years ago

  • Status changed from Need More Info to Can't reproduce

Can reopen if it recurs. Seems to be just failure injection getting unlucky.
