Bug #2600
closed
osd: crazy long watch timeout?
Added by Sage Weil almost 12 years ago.
Updated almost 12 years ago.
Description
2012-06-17T14:44:56.132 INFO:teuthology.task.workunit.client.0.err:Iteration 9992
2012-06-17T14:44:56.149 INFO:teuthology.task.workunit.client.0.err:Iteration 9993
2012-06-17T14:44:56.171 INFO:teuthology.task.workunit.client.0.err:Iteration 9994
2012-06-17T14:44:56.191 INFO:teuthology.task.workunit.client.0.err:Iteration 9995
2012-06-17T14:44:56.213 INFO:teuthology.task.workunit.client.0.err:Iteration 9996
2012-06-17T14:44:56.246 INFO:teuthology.task.workunit.client.0.err:Iteration 9997
2012-06-17T14:44:56.266 INFO:teuthology.task.workunit.client.0.err:Iteration 9998
2012-06-17T14:44:56.288 INFO:teuthology.task.workunit.client.0.err:Iteration 9999
2012-06-17T17:14:59.491 INFO:teuthology.task.workunit.client.0.out:[ OK ] WatchStress.Stress1 (9662328 ms)
2012-06-17T17:14:59.491 INFO:teuthology.task.workunit.client.0.out:[----------] 1 test from WatchStress (9662328 ms total)
2012-06-17T17:14:59.492 INFO:teuthology.task.workunit.client.0.out:
2012-06-17T17:14:59.492 INFO:teuthology.task.workunit.client.0.out:[----------] Global test environment tear-down
2012-06-17T17:14:59.492 INFO:teuthology.task.workunit.client.0.out:[==========] 1 test from 1 test case ran. (9662328 ms total)
2012-06-17T17:14:59.492 INFO:teuthology.task.workunit.client.0.out:[ PASSED ] 1 test.
that's 150 minutes.. maybe there is a minutes vs second thing going on?
also, i don't think it should block at all, if things are working properly.
ubuntu@teuthology:/a/sage-2012-06-17_14:10:40-regression-next-master-basic/1142
Possibly related to #2476
In another recurrence, there are no objecter requests:
$ ceph --admin-daemon /tmp/cephtest/asok.client.0 objecter_requests
{ "ops": [],
"linger_ops": [],
"pool_ops": [],
"pool_stat_ops": [],
"statfs_ops": []}
I think we'll need to turn up logging to figure out what's going on.
- Priority changed from High to Urgent
- Target version set to v0.50
another instance:
ubuntu@teuthology:/a/elder-2012-07-06_14:31:38-regression-stable-linux-3.4.4-ceph-basic/6999$ cat config.yaml
kernel: &id001
kdb: true
sha1: c92a3ead0da1f13f5c971bba4eaa041ed22bb06e
nuke-on-error: true
overrides:
ceph:
fs: btrfs
log-whitelist:
- slow request
sha1: 23d31d3e2aa7f2b474a7b8e9d40deb245d8be9de
roles:
- - mon.a
- mon.c
- osd.0
- osd.1
- osd.2
- - mon.b
- mds.a
- osd.3
- osd.4
- osd.5
- - client.0
targets:
ubuntu@plana40.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDEwyNlwC9Utqf3PCjL2JR4wwDkzpdEJuW93DOW82vYVisYEGod454JwXeNkjqzTUk6tXeRoUM9f/C6sZS3LFgHcMYt6m0sxP8DC4qU+q0YxCw9zLY8bXKe4DDjijM62h/SnyqyOWIh9amGT7wRwZEHBV1BKvZbNxQIJ7ESkuKsk/tJfWKhq7dSw6E/+MZ4yQtXvTyaJ3pK96Hq2uoUkawv+FxXBrzG3FtTTYA8gqA1SIiV3erEIQuBK/WD74i5yK4rwpfGTo7jNc0V6wrwO1BKFj/OGjSC+2LSAkBgf8WLe6UL/dHr3bBEyzm0V4xMf5Iqb8JGvkaXNEfbFqzKC2Wv
ubuntu@plana42.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDLK24JC5ni+UfmxncggOIxjHNsF5J/y/iureYMSpWSQLUsVIudQstka+qcojWrVfcG21W/2kokrvLHqLWyd1ar27ovQE3pMv+nG8jxL+OpEpR72QxAH5g+e7Uv3VewKVfssbyOvtOYWIVfp3z7GtpVlCO9V0DuUgbFyoiNih93kZGpIEkv2QA6j0YA65JntISwMwvXGF+VNilUOERHKXLdmLHKIbZ5aAjxGCJMs8fhsVTfPu9l1nEXklK7bgRa7395BJa48dfONdoESoZZrr4IwE3C6acc+NIw+qGwSnNweAVFvH9C+rbN6gAvW9ax0MMeK5TveDiwaTbEB6Po5UKF
ubuntu@plana67.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDqiZsiT7h3fNR9yzwK2WToaotO4olIxmVdh+aSf3ILwEpHjFYbWXymL0C77hn0MdGRbaOzWOSMMng3MAHKy9xR3/CGNXXqO7iEK1fJOvSfmypkvJDyrMY/RuSvdifcXJyREvFsSK6cdmRpO235ODhfui4FC5BLmgv/VvasH/1Ur4ALfe7UE9L+cU4VeoJdl082oYeo1nn1beERgaypX67MXepG2NKbEY77jG5FXbGVpKWmsgIEWiiX8p6+afTOP+8cGsM3vsAG7nTJeFVKkEHc7A8cPkT4l/iXKjSiwWAtU5NV0QmRC/1ad78+xTOWNzJaTrIxoKuuGpB+DjdvrJgN
tasks:
- internal.lock_machines: 3
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock: null
- ceph: null
- ceph-fuse: null
- workunit:
clients:
client.0:
- rados/stress_watch.sh
ubuntu@teuthology:/a/elder-2012-07-06_14:31:38-regression-stable-linux-3.4.4-ceph-basic/6999$ tail teuthology.log
2012-07-06T13:07:34.479 INFO:teuthology.task.workunit.client.0.err:Iteration 9990
2012-07-06T13:07:34.493 INFO:teuthology.task.workunit.client.0.err:Iteration 9991
2012-07-06T13:07:34.509 INFO:teuthology.task.workunit.client.0.err:Iteration 9992
2012-07-06T13:07:34.528 INFO:teuthology.task.workunit.client.0.err:Iteration 9993
2012-07-06T13:07:34.543 INFO:teuthology.task.workunit.client.0.err:Iteration 9994
2012-07-06T13:07:34.560 INFO:teuthology.task.workunit.client.0.err:Iteration 9995
2012-07-06T13:07:34.577 INFO:teuthology.task.workunit.client.0.err:Iteration 9996
2012-07-06T13:07:34.593 INFO:teuthology.task.workunit.client.0.err:Iteration 9997
2012-07-06T13:07:34.610 INFO:teuthology.task.workunit.client.0.err:Iteration 9998
2012-07-06T13:07:34.627 INFO:teuthology.task.workunit.client.0.err:Iteration 9999
ubuntu@teuthology:/a/elder-2012-07-06_14:31:38-regression-stable-linux-3.4.4-ceph-basic/6999$ date
Fri Jul 6 15:50:52 PDT 2012
Oh.
2012-07-09 21:08:28.098888 7fdd12ffd700 1 -- 10.214.131.3:0/256001452 --> 10.214.131.4:6803/31761 -- osd_op(client.4877.0:1 foo [watch 1~0] 3.7fc1f406) v4 -- ?+0 0x7fdd08721530 con 0x7fdd087210e0
2012-07-09 21:08:28.098902 7fdd12ffd700 5 client.4877.objecter 0 unacked, 1 uncommitted
2012-07-09 21:08:28.098953 7fdd20397700 -1 -- 10.214.131.3:0/256001452 >> 10.214.131.4:6803/31761 pipe(0x7fdd0871fcb0 sd=-1 pgs=0 cs=0 l=0).connect couldn't created socket Too many open files
2012-07-09 21:08:28.098993 7fdd20397700 -1 -- 10.214.131.3:0/256001452 >> 10.214.131.4:6803/31761 pipe(0x7fdd0871fcb0 sd=-1 pgs=0 cs=0 l=0).connect couldn't created socket Too many open files
....
2012-07-09 21:37:18.158734 7fda9759e700 10 client.4877.objecter tick
2012-07-09 21:37:18.158746 7fda9759e700 2 client.4877.objecter tid 1 on osd.4 is laggy
2012-07-09 21:37:18.158751 7fda9759e700 10 client.4877.objecter pinging osd that serves lingering tid 1 (osd.4)
2012-07-09 21:37:18.158754 7fda9759e700 10 client.4877.objecter maybe_request_map subscribing (onetime) to next osd map
2012-07-09 21:37:18.158759 7fda9759e700 1 -- 10.214.131.3:0/256001452 --> 10.214.131.4:6803/31761 -- ping v1 -- ?+0 0x7fdbc82f9d30 con 0x7fdd087210e0
2012-07-09 21:37:18.158874 7fdd20397700 -1 -- 10.214.131.3:0/256001452 >> 10.214.131.4:6803/31761 pipe(0x7fdd0871fcb0 sd=-1 pgs=0 cs=0 l=0).connect couldn't created socket Too many open files
- Status changed from New to In Progress
- Assignee set to Sage Weil
- Status changed from In Progress to Resolved
Also available in: Atom
PDF