Bug #13804
rbd/copy.sh: hangs with features/layering.yaml msgr-failures/many.yaml (hammer)
Status: Can't reproduce
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
http://pulpito.ceph.com/loic-2015-11-12_15:36:25-rbd-hammer-backports---basic-multi/1147551/
2015-11-15T01:43:53.961 INFO:tasks.workunit.client.0.plana24.stderr:+ rbd create --image-format 1 -s 1 test1
2015-11-15T01:43:54.105 INFO:tasks.workunit.client.0.plana24.stderr:+ rados rm -p rbd test1.rbd
2015-11-15T01:43:54.179 INFO:tasks.workunit.client.0.plana24.stderr:+ rbd rm test1
2015-11-15T01:43:54.252 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 01:43:54.247359 7f6cd51a8840 -1 librbd::ImageCtx: error finding header: (2) No such file or directory
2015-11-15T01:43:54.283 INFO:tasks.workunit.client.0.plana24.stderr: Removing image: 100% complete...done.
2015-11-15T01:43:54.288 INFO:tasks.workunit.client.0.plana24.stderr:+ rbd ls
2015-11-15T01:43:54.288 INFO:tasks.workunit.client.0.plana24.stderr:+ wc -l
2015-11-15T01:43:54.289 INFO:tasks.workunit.client.0.plana24.stderr:+ grep ^0$
2015-11-15T01:43:54.366 INFO:tasks.workunit.client.0.plana24.stdout:0
2015-11-15T01:43:54.367 INFO:tasks.workunit.client.0.plana24.stderr:+ [ 1 -eq 0 ]
2015-11-15T01:43:54.367 INFO:tasks.workunit.client.0.plana24.stderr:+ rbd create --new-format -s 1 test2
2015-11-15T01:43:54.535 INFO:tasks.workunit.client.0.plana24.stderr:+ rbd snap create test2@snap
2015-11-15T01:43:54.986 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 01:43:54.980533 7f7f504f5700 0 -- 10.214.131.16:0/977191717 >> 10.214.131.16:6800/6988 pipe(0x4616010 sd=11 :59520 s=2 pgs=575 cs=1 l=1 c=0x4614dc0).injecting socket failure
2015-11-15T01:43:54.986 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 01:43:54.981061 7f7f62cef840 0 -- 10.214.131.16:0/977191717 submit_message osd_op(client.4816.0:13 rbd_header.12872a487cb0 [watch unwatch cookie 73403024] 1.18ad0156 ondisk+write+known_if_redirected e19) v5 remote, 10.214.131.16:6800/6988, failed lossy con, dropping message 0x4604150
2015-11-15T01:48:27.606 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 01:48:27.600351 7f7f527fc700 0 monclient: hunting for new mon
2015-11-15T01:58:27.619 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 01:58:27.612928 7f7f581f2700 0 -- 10.214.131.16:0/977191717 >> 10.214.131.16:6791/0 pipe(0x7f7f400145d0 sd=12 :56232 s=2 pgs=269 cs=1 l=1 c=0x7f7f40008f90).injecting socket failure
2015-11-15T01:58:27.619 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 01:58:27.613172 7f7f527fc700 0 monclient: hunting for new mon
2015-11-15T01:58:55.086 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 01:58:55.081019 7f7f62ced700 0 -- 10.214.131.16:0/977191717 >> 10.214.131.16:6800/6988 pipe(0x7f7f40014680 sd=11 :0 s=1 pgs=0 cs=0 l=1 c=0x7f7f4001c540).injecting socket failure
2015-11-15T01:58:55.087 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 01:58:55.081143 7f7f62ced700 0 -- 10.214.131.16:0/977191717 >> 10.214.131.16:6800/6988 pipe(0x7f7f40014680 sd=11 :0 s=1 pgs=0 cs=0 l=1 c=0x7f7f4001c540).fault
2015-11-15T02:00:57.826 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 02:00:57.820936 7f7f527fc700 0 monclient: hunting for new mon
2015-11-15T02:13:54.732 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 02:13:54.726151 7f7f581f2700 0 -- 10.214.131.16:0/977191717 >> 10.214.131.16:6804/6997 pipe(0x7f7f400236e0 sd=9 :52720 s=1 pgs=0 cs=0 l=1 c=0x7f7f40000ff0).fault
2015-11-15T02:21:47.648 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 02:21:47.641871 7f7f503f4700 0 -- 10.214.131.16:0/977191717 >> 10.214.131.16:6789/0 pipe(0x7f7f40010330 sd=12 :56540 s=2 pgs=269 cs=1 l=1 c=0x7f7f4001cb40).injecting socket failure
2015-11-15T02:21:47.648 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 02:21:47.642052 7f7f527fc700 0 monclient: hunting for new mon
2015-11-15T02:24:57.651 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 02:24:57.645925 7f7f527fc700 0 monclient: hunting for new mon
2015-11-15T02:52:27.685 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 02:52:27.680893 7f7f527fc700 0 monclient: hunting for new mon
2015-11-15T02:59:57.694 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 02:59:57.689881 7f7f527fc700 0 monclient: hunting for new mon
2015-11-15T03:08:17.705 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 03:08:17.700055 7f7f62ced700 0 -- 10.214.131.16:0/977191717 >> 10.214.131.16:6791/0 pipe(0x7f7f400145d0 sd=12 :58120 s=2 pgs=341 cs=1 l=1 c=0x7f7f40004af0).injecting socket failure
2015-11-15T03:08:17.705 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 03:08:17.700237 7f7f527fc700 0 monclient: hunting for new mon
2015-11-15T03:16:47.715 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 03:16:47.710627 7f7f62ced700 0 -- 10.214.131.16:0/977191717 >> 10.214.131.16:6789/0 pipe(0x7f7f4001e830 sd=10 :58354 s=2 pgs=383 cs=1 l=1 c=0x7f7f40008e70).injecting socket failure
2015-11-15T03:16:47.716 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 03:16:47.710802 7f7f527fc700 0 monclient: hunting for new mon
2015-11-15T03:16:47.716 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 03:16:47.711610 7f7f503f4700 0 -- 10.214.131.16:0/977191717 >> 10.214.131.16:6790/0 pipe(0x7f7f4000bf30 sd=9 :0 s=1 pgs=0 cs=0 l=1 c=0x7f7f400097e0).injecting socket failure
2015-11-15T03:16:47.717 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 03:16:47.711701 7f7f503f4700 0 -- 10.214.131.16:0/977191717 >> 10.214.131.16:6790/0 pipe(0x7f7f4000bf30 sd=9 :0 s=1 pgs=0 cs=0 l=1 c=0x7f7f400097e0).fault
2015-11-15T03:19:17.718 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 03:19:17.713744 7f7f62ced700 0 -- 10.214.131.16:0/977191717 >> 10.214.131.16:6790/0 pipe(0x7f7f4000bf30 sd=9 :41838 s=2 pgs=372 cs=1 l=1 c=0x7f7f400097e0).injecting socket failure
2015-11-15T03:19:17.719 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 03:19:17.714113 7f7f527fc700 0 monclient: hunting for new mon
2015-11-15T03:49:47.756 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 03:49:47.751349 7f7f62ced700 0 -- 10.214.131.16:0/977191717 >> 10.214.131.16:6791/0 pipe(0x7f7f4001d9e0 sd=10 :58654 s=2 pgs=366 cs=1 l=1 c=0x7f7f400025b0).injecting socket failure
2015-11-15T03:49:47.756 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 03:49:47.751532 7f7f527fc700 0 monclient: hunting for new mon
2015-11-15T04:03:27.772 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 04:03:27.768277 7f7f62ced700 0 -- 10.214.131.16:0/977191717 >> 10.214.131.16:6790/0 pipe(0x7f7f40010330 sd=10 :42634 s=2 pgs=391 cs=1 l=1 c=0x7f7f40018560).injecting socket failure
2015-11-15T04:03:27.773 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 04:03:27.768493 7f7f527fc700 0 monclient: hunting for new mon
2015-11-15T04:28:27.802 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 04:28:27.798108 7f7f62ced700 0 -- 10.214.131.16:0/977191717 >> 10.214.131.16:6789/0 pipe(0x7f7f40023660 sd=10 :59764 s=2 pgs=440 cs=1 l=1 c=0x7f7f40021c80).injecting socket failure
2015-11-15T04:28:27.803 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 04:28:27.798431 7f7f527fc700 0 monclient: hunting for new mon
2015-11-15T04:39:42.217 INFO:tasks.workunit:Stopping ['rbd/copy.sh'] on client.0...
The mons did not crash and their logs do not show anything obviously wrong.
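To get a quick picture of how long the client sat idle between retries, a log like the one above can be summarized with a small script. This is only a sketch: the function name `summarize` and the three substrings it counts are choices made here for illustration, and the regular expression assumes the teuthology prefix format exactly as it appears in this log. The three sample lines are copied from the log.

```python
import re
from datetime import datetime

# Teuthology prefix as seen in the log above, e.g.
# "2015-11-15T01:48:27.606 INFO:tasks.workunit.client.0.plana24.stderr:..."
PREFIX = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}) INFO:")

# Substrings counted per line (chosen for this sketch, not an official list).
EVENTS = ("injecting socket failure", "hunting for new mon", ").fault")


def summarize(lines):
    """Return (event counts, longest gap in seconds between consecutive entries)."""
    counts = {event: 0 for event in EVENTS}
    stamps = []
    for line in lines:
        match = PREFIX.match(line)
        if not match:
            continue  # skip anything that is not a teuthology log entry
        stamps.append(datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f"))
        for event in EVENTS:
            if event in line:
                counts[event] += 1
    gaps = [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
    return counts, max(gaps, default=0.0)


sample = [
    "2015-11-15T01:48:27.606 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 01:48:27.600351 7f7f527fc700 0 monclient: hunting for new mon",
    "2015-11-15T01:58:27.619 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 01:58:27.612928 7f7f581f2700 0 -- 10.214.131.16:0/977191717 >> 10.214.131.16:6791/0 pipe(0x7f7f400145d0 sd=12 :56232 s=2 pgs=269 cs=1 l=1 c=0x7f7f40008f90).injecting socket failure",
    "2015-11-15T01:58:27.619 INFO:tasks.workunit.client.0.plana24.stderr:2015-11-15 01:58:27.613172 7f7f527fc700 0 monclient: hunting for new mon",
]
counts, longest_gap = summarize(sample)
print(counts, longest_gap)
```

Run against the full teuthology.log, the longest gap gives a rough idea of how long the client waited between attempts.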
Updated by Loïc Dachary over 8 years ago
- Subject changed from rbd/copy.sh: hangs forever (hammer) to rbd/copy.sh: hangs with features/layering.yaml msgr-failures/many.yaml (hammer)
Updated by Loïc Dachary over 8 years ago
Re-run the failed job:
filter='rbd/cli/{base/install.yaml cachepool/small.yaml clusters/fixed-1.yaml features/layering.yaml fs/btrfs.yaml msgr-failures/many.yaml workloads/rbd_cli_copy.yaml}' teuthology-suite --priority 101 --suite rbd --filter="$filter" --suite-branch hammer --distro ubuntu --email loic@dachary.org --ceph hammer-backports --machine-type plana,burnupi,mira
Re-run the failed job 10 times:
filter='rbd/cli/{base/install.yaml cachepool/small.yaml clusters/fixed-1.yaml features/layering.yaml fs/btrfs.yaml msgr-failures/many.yaml workloads/rbd_cli_copy.yaml}' teuthology-suite --num 10 --priority 101 --suite rbd --filter="$filter" --suite-branch hammer --distro ubuntu --email loic@dachary.org --ceph hammer-backports --machine-type plana,burnupi,mira
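The two commands above differ only in `--num 10`. If the re-run needs to be scripted, the argument list can be assembled as in the sketch below. The helper name `rerun_command` and the `FACETS` list are hypothetical; every flag and value is copied from the commands above, nothing else is assumed about teuthology-suite.

```python
import shlex

# Facets of the failed job, copied from the filter used above.
FACETS = [
    "base/install.yaml",
    "cachepool/small.yaml",
    "clusters/fixed-1.yaml",
    "features/layering.yaml",
    "fs/btrfs.yaml",
    "msgr-failures/many.yaml",
    "workloads/rbd_cli_copy.yaml",
]


def rerun_command(num=None):
    """Build the teuthology-suite argv; pass num to schedule the job several times."""
    job_filter = "rbd/cli/{" + " ".join(FACETS) + "}"
    argv = ["teuthology-suite"]
    if num is not None:
        argv += ["--num", str(num)]
    argv += [
        "--priority", "101",
        "--suite", "rbd",
        "--filter=" + job_filter,
        "--suite-branch", "hammer",
        "--distro", "ubuntu",
        "--email", "loic@dachary.org",
        "--ceph", "hammer-backports",
        "--machine-type", "plana,burnupi,mira",
    ]
    return argv


# Print a shell-quoted command line for the 10x re-run.
print(" ".join(shlex.quote(arg) for arg in rerun_command(num=10)))
```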
Updated by Loïc Dachary over 8 years ago
- Assignee deleted (Loïc Dachary)
Wait a few weeks before declaring this "Can't reproduce"
Updated by Josh Durgin over 8 years ago
Looks to me like the ms failure injection just got unlucky and failed auth enough times to make the client back off and eventually hit the teuthology timeout. Auth succeeds on the monitor's side each time.
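A toy model of that explanation, for intuition only: each attempt fails with some probability, the wait after each failure doubles up to a cap, and the run gives up once a timeout is reached. All names and numbers here (`time_to_connect`, `first_delay`, `max_delay`, the example probabilities) are hypothetical and are not taken from the Ceph client or teuthology code.

```python
import random


def time_to_connect(fail_prob, timeout, first_delay=1.0, max_delay=600.0, rng=None):
    """Return seconds waited before an attempt succeeds, or None if the timeout hits.

    Hypothetical model: every attempt fails independently with probability
    fail_prob, and the wait after each failure doubles up to max_delay.
    """
    rng = rng or random.Random()
    elapsed, delay = 0.0, first_delay
    while elapsed < timeout:
        if rng.random() >= fail_prob:
            return elapsed  # this attempt got through
        elapsed += delay
        delay = min(delay * 2, max_delay)
    return None  # the harness timeout would fire here


# Estimate how often a run times out for a given failure probability.
rng = random.Random(13804)
runs = [time_to_connect(0.9, timeout=3 * 3600, rng=rng) for _ in range(1000)]
print("timed out:", sum(r is None for r in runs), "of", len(runs))
```

Once the delay reaches the cap, only a handful of attempts fit into the remaining time, so a short streak of injected failures is enough to run out the clock. That is consistent with the hang being rare and hard to reproduce.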
Updated by Josh Durgin over 8 years ago
- Status changed from Need More Info to Can't reproduce
Can reopen if it recurs. Seems to be just failure injection getting unlucky.