Bug #2600
osd: crazy long watch timeout?
Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
2012-06-17T14:44:56.132 INFO:teuthology.task.workunit.client.0.err:Iteration 9992
2012-06-17T14:44:56.149 INFO:teuthology.task.workunit.client.0.err:Iteration 9993
2012-06-17T14:44:56.171 INFO:teuthology.task.workunit.client.0.err:Iteration 9994
2012-06-17T14:44:56.191 INFO:teuthology.task.workunit.client.0.err:Iteration 9995
2012-06-17T14:44:56.213 INFO:teuthology.task.workunit.client.0.err:Iteration 9996
2012-06-17T14:44:56.246 INFO:teuthology.task.workunit.client.0.err:Iteration 9997
2012-06-17T14:44:56.266 INFO:teuthology.task.workunit.client.0.err:Iteration 9998
2012-06-17T14:44:56.288 INFO:teuthology.task.workunit.client.0.err:Iteration 9999
2012-06-17T17:14:59.491 INFO:teuthology.task.workunit.client.0.out:[ OK ] WatchStress.Stress1 (9662328 ms)
2012-06-17T17:14:59.491 INFO:teuthology.task.workunit.client.0.out:[----------] 1 test from WatchStress (9662328 ms total)
2012-06-17T17:14:59.492 INFO:teuthology.task.workunit.client.0.out:
2012-06-17T17:14:59.492 INFO:teuthology.task.workunit.client.0.out:[----------] Global test environment tear-down
2012-06-17T17:14:59.492 INFO:teuthology.task.workunit.client.0.out:[==========] 1 test from 1 test case ran. (9662328 ms total)
2012-06-17T17:14:59.492 INFO:teuthology.task.workunit.client.0.out:[ PASSED ] 1 test.
that's 150 minutes.. maybe there is a minutes vs. seconds mixup going on?
also, I don't think it should block at all if things are working properly.
ubuntu@teuthology:/a/sage-2012-06-17_14:10:40-regression-next-master-basic/1142
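The gap in the log can be checked mechanically. A quick sketch, using the two timestamps and the gtest total quoted above (assuming the teuthology timestamps are ISO 8601):

```python
from datetime import datetime

# Timestamps from the log above: last "Iteration 9999" line vs. the [ OK ] line.
start = datetime.fromisoformat("2012-06-17T14:44:56")
end = datetime.fromisoformat("2012-06-17T17:14:59")
wall_minutes = (end - start).total_seconds() / 60  # stall between last iteration and OK

# gtest's own total for WatchStress.Stress1, which includes the whole run.
gtest_minutes = 9662328 / 1000 / 60
```

The stall alone is ~150 minutes of wall time, and gtest's 9662328 ms total is ~161 minutes, so nearly all of the test's runtime is the final hang.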
History
#1 Updated by Josh Durgin over 11 years ago
Possibly related to #2476
#2 Updated by Josh Durgin over 11 years ago
In another recurrence, there are no objecter requests:
$ ceph --admin-daemon /tmp/cephtest/asok.client.0 objecter_requests
{ "ops": [], "linger_ops": [], "pool_ops": [], "pool_stat_ops": [], "statfs_ops": []}
I think we'll need to turn up logging to figure out what's going on.
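For what it's worth, a dump like the one above can be machine-checked for in-flight work rather than eyeballed; a small sketch, feeding in the verbatim admin-socket output from this comment:

```python
import json

# Verbatim objecter_requests output captured above.
raw = '{ "ops": [], "linger_ops": [], "pool_ops": [], "pool_stat_ops": [], "statfs_ops": []}'
reqs = json.loads(raw)

# The objecter is idle only if every request category is empty;
# a stuck watch would normally show up under "linger_ops".
idle = all(len(v) == 0 for v in reqs.values())
```

Here `idle` comes out true, which is what makes this recurrence odd: the client is blocked with nothing outstanding in the objecter.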
#3 Updated by Sage Weil about 11 years ago
- Priority changed from High to Urgent
#4 Updated by Sage Weil about 11 years ago
- Target version set to v0.50
#5 Updated by Sage Weil about 11 years ago
another instance:
ubuntu@teuthology:/a/elder-2012-07-06_14:31:38-regression-stable-linux-3.4.4-ceph-basic/6999$ cat config.yaml
kernel: &id001
  kdb: true
  sha1: c92a3ead0da1f13f5c971bba4eaa041ed22bb06e
nuke-on-error: true
overrides:
  ceph:
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: 23d31d3e2aa7f2b474a7b8e9d40deb245d8be9de
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
targets:
  ubuntu@plana40.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDEwyNlwC9Utqf3PCjL2JR4wwDkzpdEJuW93DOW82vYVisYEGod454JwXeNkjqzTUk6tXeRoUM9f/C6sZS3LFgHcMYt6m0sxP8DC4qU+q0YxCw9zLY8bXKe4DDjijM62h/SnyqyOWIh9amGT7wRwZEHBV1BKvZbNxQIJ7ESkuKsk/tJfWKhq7dSw6E/+MZ4yQtXvTyaJ3pK96Hq2uoUkawv+FxXBrzG3FtTTYA8gqA1SIiV3erEIQuBK/WD74i5yK4rwpfGTo7jNc0V6wrwO1BKFj/OGjSC+2LSAkBgf8WLe6UL/dHr3bBEyzm0V4xMf5Iqb8JGvkaXNEfbFqzKC2Wv
  ubuntu@plana42.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDLK24JC5ni+UfmxncggOIxjHNsF5J/y/iureYMSpWSQLUsVIudQstka+qcojWrVfcG21W/2kokrvLHqLWyd1ar27ovQE3pMv+nG8jxL+OpEpR72QxAH5g+e7Uv3VewKVfssbyOvtOYWIVfp3z7GtpVlCO9V0DuUgbFyoiNih93kZGpIEkv2QA6j0YA65JntISwMwvXGF+VNilUOERHKXLdmLHKIbZ5aAjxGCJMs8fhsVTfPu9l1nEXklK7bgRa7395BJa48dfONdoESoZZrr4IwE3C6acc+NIw+qGwSnNweAVFvH9C+rbN6gAvW9ax0MMeK5TveDiwaTbEB6Po5UKF
  ubuntu@plana67.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDqiZsiT7h3fNR9yzwK2WToaotO4olIxmVdh+aSf3ILwEpHjFYbWXymL0C77hn0MdGRbaOzWOSMMng3MAHKy9xR3/CGNXXqO7iEK1fJOvSfmypkvJDyrMY/RuSvdifcXJyREvFsSK6cdmRpO235ODhfui4FC5BLmgv/VvasH/1Ur4ALfe7UE9L+cU4VeoJdl082oYeo1nn1beERgaypX67MXepG2NKbEY77jG5FXbGVpKWmsgIEWiiX8p6+afTOP+8cGsM3vsAG7nTJeFVKkEHc7A8cPkT4l/iXKjSiwWAtU5NV0QmRC/1ad78+xTOWNzJaTrIxoKuuGpB+DjdvrJgN
tasks:
- internal.lock_machines: 3
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock: null
- ceph: null
- ceph-fuse: null
- workunit:
    clients:
      client.0:
      - rados/stress_watch.sh
ubuntu@teuthology:/a/elder-2012-07-06_14:31:38-regression-stable-linux-3.4.4-ceph-basic/6999$ tail teuthology.log
2012-07-06T13:07:34.479 INFO:teuthology.task.workunit.client.0.err:Iteration 9990
2012-07-06T13:07:34.493 INFO:teuthology.task.workunit.client.0.err:Iteration 9991
2012-07-06T13:07:34.509 INFO:teuthology.task.workunit.client.0.err:Iteration 9992
2012-07-06T13:07:34.528 INFO:teuthology.task.workunit.client.0.err:Iteration 9993
2012-07-06T13:07:34.543 INFO:teuthology.task.workunit.client.0.err:Iteration 9994
2012-07-06T13:07:34.560 INFO:teuthology.task.workunit.client.0.err:Iteration 9995
2012-07-06T13:07:34.577 INFO:teuthology.task.workunit.client.0.err:Iteration 9996
2012-07-06T13:07:34.593 INFO:teuthology.task.workunit.client.0.err:Iteration 9997
2012-07-06T13:07:34.610 INFO:teuthology.task.workunit.client.0.err:Iteration 9998
2012-07-06T13:07:34.627 INFO:teuthology.task.workunit.client.0.err:Iteration 9999
ubuntu@teuthology:/a/elder-2012-07-06_14:31:38-regression-stable-linux-3.4.4-ceph-basic/6999$ date
Fri Jul 6 15:50:52 PDT 2012
#6 Updated by Sage Weil about 11 years ago
Oh.
2012-07-09 21:08:28.098888 7fdd12ffd700 1 -- 10.214.131.3:0/256001452 --> 10.214.131.4:6803/31761 -- osd_op(client.4877.0:1 foo [watch 1~0] 3.7fc1f406) v4 -- ?+0 0x7fdd08721530 con 0x7fdd087210e0
2012-07-09 21:08:28.098902 7fdd12ffd700 5 client.4877.objecter 0 unacked, 1 uncommitted
2012-07-09 21:08:28.098953 7fdd20397700 -1 -- 10.214.131.3:0/256001452 >> 10.214.131.4:6803/31761 pipe(0x7fdd0871fcb0 sd=-1 pgs=0 cs=0 l=0).connect couldn't created socket Too many open files
2012-07-09 21:08:28.098993 7fdd20397700 -1 -- 10.214.131.3:0/256001452 >> 10.214.131.4:6803/31761 pipe(0x7fdd0871fcb0 sd=-1 pgs=0 cs=0 l=0).connect couldn't created socket Too many open files
....
2012-07-09 21:37:18.158734 7fda9759e700 10 client.4877.objecter tick
2012-07-09 21:37:18.158746 7fda9759e700 2 client.4877.objecter tid 1 on osd.4 is laggy
2012-07-09 21:37:18.158751 7fda9759e700 10 client.4877.objecter pinging osd that serves lingering tid 1 (osd.4)
2012-07-09 21:37:18.158754 7fda9759e700 10 client.4877.objecter maybe_request_map subscribing (onetime) to next osd map
2012-07-09 21:37:18.158759 7fda9759e700 1 -- 10.214.131.3:0/256001452 --> 10.214.131.4:6803/31761 -- ping v1 -- ?+0 0x7fdbc82f9d30 con 0x7fdd087210e0
2012-07-09 21:37:18.158874 7fdd20397700 -1 -- 10.214.131.3:0/256001452 >> 10.214.131.4:6803/31761 pipe(0x7fdd0871fcb0 sd=-1 pgs=0 cs=0 l=0).connect couldn't created socket Too many open files
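So the messenger can't create a socket because the process has exhausted its file descriptors (EMFILE), and every reconnect and ping to osd.4 fails the same way until the watch eventually times out. The failure mode itself is easy to reproduce outside Ceph; a minimal sketch (assumes a POSIX host, and a deliberately tightened `RLIMIT_NOFILE` standing in for the test's fd leak):

```python
import errno
import resource
import socket

# Tighten the soft fd limit to simulate a process that has leaked fds.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (64, hard))

leaked, err = [], None
try:
    for _ in range(128):                 # more attempts than the limit allows
        leaked.append(socket.socket())   # each socket consumes one fd
except OSError as e:
    err = e                              # socket() fails: "Too many open files"
finally:
    for s in leaked:                     # clean up and restore the limit
        s.close()
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
```

`socket()` fails with `errno.EMFILE`, which is exactly the "couldn't created socket Too many open files" line in the pipe log above: the objecter keeps marking the op laggy and retrying, but no new connection can ever be opened.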
#7 Updated by Sage Weil about 11 years ago
- Status changed from New to In Progress
#8 Updated by Sage Weil about 11 years ago
- Assignee set to Sage Weil
#9 Updated by Sage Weil about 11 years ago
- Status changed from In Progress to Resolved