Bug #8320
heartbeat timeouts too low for vps machines
Status: Resolved
Priority: Urgent
% Done: 0%
Source: Q/A
Severity: 3 - minor
Description
There are several of these "wrongly marked me down" failures in this suite/run, and valgrind does not seem to be enabled in the orig.config.yaml:
archive_path: /var/lib/teuthworker/archive/teuthology-2014-05-08_19:33:19-upgrade:dumpling-x:parallel-firefly---basic-vps/244048
branch: firefly
description: upgrade/dumpling-x/parallel/{0-cluster/start.yaml 1-dumpling-install/cuttlefish-dumpling.yaml 2-workload/rados_loadgenbig.yaml 3-upgrade-sequence/upgrade-mon-osd-mds.yaml 4-final-upgrade/client.yaml 5-final-workload/rbd_cls.yaml distros/rhel_6.4.yaml}
email: null
job_id: '244048'
last_in_suite: false
machine_type: vps
name: teuthology-2014-05-08_19:33:19-upgrade:dumpling-x:parallel-firefly---basic-vps
nuke-on-error: true
os_type: rhel
os_version: '6.4'
overrides:
  admin_socket:
    branch: firefly
  ceph:
    conf:
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
        mon warn on legacy crush tunables: false
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
    log-whitelist:
    - slow request
    - scrub mismatch
    - ScrubResult
    sha1: db8873b69c73b40110bf1512c114e4a0395671ab
  ceph-deploy:
    branch:
      dev: firefly
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: db8873b69c73b40110bf1512c114e4a0395671ab
  s3tests:
    branch: master
  workunit:
    sha1: db8873b69c73b40110bf1512c114e4a0395671ab
owner: scheduled_teuthology@teuthology
roles:
- - mon.a
  - mds.a
  - osd.0
  - osd.1
- - mon.b
  - mon.c
  - osd.2
  - osd.3
- - client.0
  - client.1
suite: upgrade:dumpling-x:parallel
targets:
  ubuntu@vpm115.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAszawQx1jNRmtPq2Gj3fqH1SfmgAYOBuSowujH1lTj1aXGprZh+mVaxty2o6gutS5bAbK7PwNXiSwcvR7dB9OwbioUcTMYCjOd1t5+I68Q9iYMP0bAH5DPr94LkVSkbLyI5sJEEjGs/fS0YhgTP79w7IQW8YeGuhst+P/BiV4+jbFqAUEgxqakfGhE4PgyN+GpAweRubGkIp1deDyKfhQJHcuuoAVey9MDRe9/4WmCYKcU3DQjMCKgUoYYV8Czdulmo883MHKTfS7v1aN6KEjOg6As9rsBb79LxYYtZkjjB6pV8WquPayaeaBcQu5zk0WA9ask0vbgasVlbkAgt8zWQ==
  ubuntu@vpm116.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAmdfTdH1+YVKSWTzmjeCdoQaPsdWO57KaVPUp+bz7HsB5YZcxhp1TJ8sPRHfcOCUlHF4SOMtZPTWGmYAPiZHchI78utbaNQtyY6jY64QXRXUag+j+FCoGk7fYlHSX9grDe6gY71I49ueVF+691ii4k3uYE+cCLP6DuOaXlFwo94zM0anNag9eyNdxS6uzm6/e4vUIUchUUUojZRUPLdBQNIw4bQNpG2K4n+mCqkO1NGlVgchXzkGWrCImguc+DUnXHGwsVEjwdf564x5fqpSv73pRZe3GAVQIbV3HAL0BnHefG3Dzdx2iZZQ5UleHvW1PpQWiXZCpniVSQB/crSUOiw==
  ubuntu@vpm117.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAvG0Msn4lUl1VCBvILd9Kto1yRa98FSoMUh3wH9pBeQVEhVnWLJzfn03zcmI4n7BvyJRnabC5VvVlt30BPNrQlHBWVZGGSRjxYh6QYvLO+NVtj+8ooJJ4SckdZ+hyUlTKYYvhq7cy/p4K9KmYfX6drdghcH7vdO0GcizyAr6BwF6779tBs6dZ+JUo8efFRc+pmNJfp1OJxpyVouijlV2FqtLCmz79G7/poZZXgllOHSONgOJq2zzxMf2dxEbYNa/1cn1i3iSes83J6xgn/oH8vlKpvgDbsiX3hTL21QQamK3bsX4JITtE6qYyrereWMtwCL4L5+PP9qIk1HZLRIEVwQ==
tasks:
- internal.lock_machines:
  - 3
  - vps
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.serialize_remote_roles: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.sudo: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock.check: null
- install:
    branch: cuttlefish
- print: '**** done cuttlefish install'
- ceph:
    fs: xfs
- print: '**** done ceph'
- install.upgrade:
    all:
      branch: dumpling
- ceph.restart: null
- parallel:
  - workload
  - upgrade-sequence
- print: '**** done parallel'
- install.upgrade:
    client.0: null
- print: '**** done install.upgrade'
- workunit:
    clients:
      client.1:
      - cls/test_cls_rbd.sh
teuthology_branch: firefly
upgrade-sequence:
  sequential:
  - install.upgrade:
      mon.a: null
      mon.b: null
  - ceph.restart:
      daemons:
      - mon.a
      wait-for-healthy: false
      wait-for-osds-up: true
  - sleep:
      duration: 60
  - ceph.restart:
      daemons:
      - mon.b
      wait-for-healthy: false
      wait-for-osds-up: true
  - sleep:
      duration: 60
  - ceph.restart:
    - mon.c
  - sleep:
      duration: 60
  - ceph.restart:
    - osd.0
  - sleep:
      duration: 60
  - ceph.restart:
    - osd.1
  - sleep:
      duration: 60
  - ceph.restart:
    - osd.2
  - sleep:
      duration: 60
  - ceph.restart:
    - osd.3
  - sleep:
      duration: 60
  - ceph.restart:
    - mds.a
verbose: true
worker_log: /var/lib/teuthworker/archive/worker_logs/worker.vps.19330
workload:
  sequential:
  - workunit:
      branch: dumpling
      clients:
        client.0:
        - rados/load-gen-big.sh

description: upgrade/dumpling-x/parallel/{0-cluster/start.yaml 1-dumpling-install/cuttlefish-dumpling.yaml 2-workload/rados_loadgenbig.yaml 3-upgrade-sequence/upgrade-mon-osd-mds.yaml 4-final-upgrade/client.yaml 5-final-workload/rbd_cls.yaml distros/rhel_6.4.yaml}
duration: 3378.6129529476166
failure_reason: '"2014-05-09 03:27:49.176216 osd.0 10.214.138.182:6808/6617 305 : [WRN] map e11 wrongly marked me down" in cluster log'
flavor: basic
owner: scheduled_teuthology@teuthology
success: false
Updated by Sage Weil almost 10 years ago
- Status changed from New to 12
- Source changed from other to Q/A
From the logs it looks like the OSD just stalls and does nothing. I'm chalking it up to limited RAM on the VPS nodes and swapping: an OSD that stalls past the heartbeat grace stops answering its peers, who then report it down to the monitors, hence the "wrongly marked me down" warning when it wakes up.
Updated by Sage Weil almost 10 years ago
- Subject changed from "[WRN] map e11 wrongly marked me down" in upgrade:dumpling-x:parallel-firefly---basic-vps suite to heartbeat timeouts too low for vps machines
- Priority changed from Normal to Urgent
Making this ticket about increasing the heartbeat timeouts when running on vps machines.
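The timeout in question is Ceph's osd heartbeat grace option: the number of seconds an OSD may go without answering a peer's heartbeats before that peer reports it down to the monitors (20s by default at the time). In teuthology terms, raising it means adding a ceph conf override to the vps jobs; a minimal sketch, assuming the standard overrides mechanism (the global section placement and the 40s value here are illustrative, not quoted from this comment):

overrides:
  ceph:
    conf:
      global:
        # A VPS node that stalls (e.g. while swapping) for longer than
        # this many seconds is reported down by its heartbeat peers;
        # the 20s default is what these jobs were tripping over.
        osd heartbeat grace: 40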
Updated by Sage Weil almost 10 years ago
- Status changed from 12 to Resolved
Added ~teuthology/vps.yaml and passed it as an argument to all of the vps-scheduled suites. It sets the heartbeat grace to 40s (up from the default of 20s).
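The file itself is not quoted in the ticket; presumably it carries the kind of override sketched above as a standalone fragment that gets merged into every vps-scheduled job. A hypothetical reconstruction:

# ~teuthology/vps.yaml -- hypothetical reconstruction, not quoted in
# this ticket. Merged into every vps-scheduled job, it raises the OSD
# heartbeat grace from the 20s default to 40s.
overrides:
  ceph:
    conf:
      global:
        osd heartbeat grace: 40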