Bug #15052
closed
Rebalance vps instances to allocate more RAM
Added by Zack Cerza about 8 years ago. Updated almost 8 years ago.
Description
I accidentally deleted the original ticket (#14985) so I'll have to recreate this from my email history. The original ticket was filed by Yuri with this description:
VPS machines are neither used nor trusted because of their low memory assignment.
We need to decide whether the number of VMs per host should be reduced in order to address this issue.
Updated by Zack Cerza about 8 years ago
- Category set to Infrastructure Service
3/8/16
Issue #14985 has been updated by Tamilarasi muthamizhan.
Hi Yuri, I brought this topic up in the weekly leads meeting, and everyone agreed with the decision to reconfigure the VM hosts to create fewer VMs than now (e.g. 2 instead of 4), so as to increase the memory on each VM and make them more usable for testing our nightlies.
Updated by Zack Cerza about 8 years ago
3/10/16
Issue #14985 has been updated by Yuri Weinstein.
Note: list of jobs with 'bad_alloc' error on vps
teuthology@teuthology:/a$ find teuthology-2016-0*-vps -name teuthology.log | xargs grep 'bad_alloc'
teuthology-2016-01-11_17:10:02-upgrade:infernalis-x-jewel-distro-basic-vps/24770/teuthology.log:2016-01-11T23:08:37.017 INFO:tasks.ceph.mon.c.vpm193.stderr:terminate called after throwing an instance of 'ceph::buffer::bad_alloc'
teuthology-2016-01-11_17:10:02-upgrade:infernalis-x-jewel-distro-basic-vps/24770/teuthology.log:2016-01-11T23:08:37.018 INFO:tasks.ceph.mon.c.vpm193.stderr: what(): buffer::bad_alloc
teuthology-2016-01-27_18:49:02-rados-hammer-distro-basic-vps/46868/teuthology.log:2016-01-27T20:01:17.543 INFO:teuthology.orchestra.run.vpm050.stderr:terminate called after throwing an instance of 'ceph::buffer::bad_alloc'
teuthology-2016-01-28_09:00:02-rados-hammer-distro-basic-vps/47871/teuthology.log:2016-01-28T09:48:02.248 INFO:teuthology.orchestra.run.vpm080.stderr:terminate called after throwing an instance of 'ceph::buffer::bad_alloc'
teuthology-2016-01-28_09:00:02-rados-hammer-distro-basic-vps/47871/teuthology.log:2016-01-28T09:48:02.249 INFO:teuthology.orchestra.run.vpm080.stderr: what(): buffer::bad_alloc
teuthology-2016-01-28_15:17:14-rados-hammer-distro-basic-vps/48402/teuthology.log:2016-01-28T16:20:26.224 INFO:teuthology.orchestra.run.vpm088.stderr:terminate called after throwing an instance of 'ceph::buffer::bad_alloc'
teuthology-2016-01-28_15:17:14-rados-hammer-distro-basic-vps/48402/teuthology.log:2016-01-28T16:20:26.224 INFO:teuthology.orchestra.run.vpm088.stderr: what(): buffer::bad_alloc
teuthology-2016-01-30_09:00:02-rados-hammer-distro-basic-vps/51698/teuthology.log:2016-01-30T09:42:05.372 INFO:teuthology.orchestra.run.vpm093.stderr:terminate called after throwing an instance of 'ceph::buffer::bad_alloc'
teuthology-2016-01-30_09:00:02-rados-hammer-distro-basic-vps/51698/teuthology.log:2016-01-30T09:42:05.379 INFO:teuthology.orchestra.run.vpm093.stderr: what(): buffer::bad_alloc
teuthology-2016-01-31_09:00:02-rados-hammer-distro-basic-vps/53523/teuthology.log:2016-01-31T09:32:18.046 INFO:teuthology.orchestra.run.vpm045.stderr:ceph::buffer::bad_alloc'
teuthology-2016-01-31_09:00:02-rados-hammer-distro-basic-vps/53523/teuthology.log:2016-01-31T09:32:18.048 INFO:teuthology.orchestra.run.vpm045.stderr: what(): buffer::bad_alloc
teuthology-2016-02-01_09:00:02-rados-hammer-distro-basic-vps/92/teuthology.log:2016-02-01T10:43:34.378 INFO:teuthology.orchestra.run.vpm080.stderr:terminate called after throwing an instance of 'ceph::buffer::bad_alloc'
teuthology-2016-02-01_09:00:02-rados-hammer-distro-basic-vps/92/teuthology.log:2016-02-01T10:43:34.459 INFO:teuthology.orchestra.run.vpm080.stderr: what(): buffer::bad_alloc
teuthology-2016-02-02_09:00:01-rados-hammer-distro-basic-vps/2514/teuthology.log:2016-02-02T09:45:58.118 INFO:tasks.ceph.osd.5.vpm109.stderr:terminate called after throwing an instance of 'std::bad_alloc'
teuthology-2016-02-02_09:00:01-rados-hammer-distro-basic-vps/2514/teuthology.log:2016-02-02T09:45:58.125 INFO:tasks.ceph.osd.5.vpm109.stderr: what(): std::bad_alloc
teuthology-2016-02-02_09:00:01-rados-hammer-distro-basic-vps/2596/teuthology.log:2016-02-02T10:33:54.267 INFO:teuthology.orchestra.run.vpm106.stderr:terminate called after throwing an instance of 'ceph::buffer::bad_alloc'
teuthology-2016-02-02_09:00:01-rados-hammer-distro-basic-vps/2596/teuthology.log:2016-02-02T10:33:55.310 INFO:teuthology.orchestra.run.vpm106.stderr: what(): buffer::bad_alloc
teuthology-2016-02-03_09:00:02-rados-hammer-distro-basic-vps/4506/teuthology.log:2016-02-03T10:43:43.988 INFO:teuthology.orchestra.run.vpm131.stderr:terminate called after throwing an instance of 'ceph::buffer::bad_alloc'
teuthology-2016-02-03_09:00:02-rados-hammer-distro-basic-vps/4506/teuthology.log:2016-02-03T10:43:43.989 INFO:teuthology.orchestra.run.vpm131.stderr: what(): buffer::bad_alloc
teuthology-2016-02-05_09:00:01-rados-hammer-distro-basic-vps/7942/teuthology.log:2016-02-05T10:34:39.397 INFO:teuthology.orchestra.run.vpm096.stderr:terminate called after throwing an instance of 'ceph::buffer::bad_alloc*** Caught signal (Aborted) **
teuthology-2016-02-05_09:00:01-rados-hammer-distro-basic-vps/7942/teuthology.log:2016-02-05T10:34:39.450 INFO:teuthology.orchestra.run.vpm096.stderr: what(): buffer::bad_alloc
teuthology-2016-02-06_09:00:02-rados-hammer-distro-basic-vps/9692/teuthology.log:2016-02-06T10:42:44.444 INFO:teuthology.orchestra.run.vpm176.stderr:terminate called after throwing an instance of 'ceph::buffer::bad_alloc'
teuthology-2016-02-06_09:00:02-rados-hammer-distro-basic-vps/9692/teuthology.log:2016-02-06T10:42:44.539 INFO:teuthology.orchestra.run.vpm176.stderr: what(): buffer::bad_alloc
teuthology-2016-02-07_09:00:01-rados-hammer-distro-basic-vps/10177/teuthology.log:2016-02-07T10:44:56.557 INFO:teuthology.orchestra.run.vpm130.stderr:terminate called after throwing an instance of 'ceph::buffer::bad_alloc'
teuthology-2016-02-07_09:00:01-rados-hammer-distro-basic-vps/10177/teuthology.log:2016-02-07T10:44:56.935 INFO:teuthology.orchestra.run.vpm130.stderr: what(): buffer::bad_alloc
teuthology-2016-02-08_09:00:02-rados-hammer-distro-basic-vps/136/teuthology.log:2016-02-08T15:45:36.330 INFO:teuthology.orchestra.run.vpm057.stderr:terminate called after throwing an instance of 'ceph::buffer::bad_alloc'
teuthology-2016-02-08_09:00:02-rados-hammer-distro-basic-vps/136/teuthology.log:2016-02-08T15:45:36.332 INFO:teuthology.orchestra.run.vpm057.stderr: what(): buffer::bad_alloc
teuthology-2016-02-12_09:00:02-rados-hammer-distro-basic-vps/7433/teuthology.log:2016-02-12T10:31:34.529 INFO:teuthology.orchestra.run.vpm123.stderr:terminate called after throwing an instance of 'ceph::buffer::bad_alloc'
teuthology-2016-02-12_09:00:02-rados-hammer-distro-basic-vps/7433/teuthology.log:2016-02-12T10:31:34.529 INFO:teuthology.orchestra.run.vpm123.stderr: what(): buffer::bad_alloc
teuthology-2016-02-13_02:10:10-upgrade:infernalis-x-jewel-distro-basic-vps/8916/teuthology.log:2016-02-13T05:11:50.472 INFO:tasks.ceph.osd.1.vpm077.stderr:terminate called after throwing an instance of 'std::bad_alloc'
teuthology-2016-02-13_02:10:10-upgrade:infernalis-x-jewel-distro-basic-vps/8916/teuthology.log:2016-02-13T05:11:50.472 INFO:tasks.ceph.osd.1.vpm077.stderr: what(): std::bad_alloc*** Caught signal (Aborted) **
teuthology-2016-02-13_09:00:02-rados-hammer-distro-basic-vps/9065/teuthology.log:2016-02-13T10:24:43.859 INFO:teuthology.orchestra.run.vpm113.stderr:terminate called after throwing an instance of 'ceph::buffer::bad_alloc'
teuthology-2016-02-13_09:00:02-rados-hammer-distro-basic-vps/9065/teuthology.log:2016-02-13T10:24:43.860 INFO:teuthology.orchestra.run.vpm113.stderr: what(): buffer::bad_alloc
teuthology-2016-02-15_09:00:01-rados-hammer-distro-basic-vps/9701/teuthology.log:2016-02-15T10:17:01.052 INFO:teuthology.orchestra.run.vpm100.stderr:terminate called after throwing an instance of 'ceph::buffer::bad_alloc'
teuthology-2016-02-15_09:00:01-rados-hammer-distro-basic-vps/9701/teuthology.log:2016-02-15T10:17:01.052 INFO:teuthology.orchestra.run.vpm100.stderr: what(): buffer::bad_alloc
teuthology-2016-02-16_09:00:01-rados-hammer-distro-basic-vps/11997/teuthology.log:2016-02-16T10:15:25.262 INFO:teuthology.orchestra.run.vpm008.stderr:terminate called after throwing an instance of 'ceph::buffer::bad_alloc'
teuthology-2016-02-16_09:00:01-rados-hammer-distro-basic-vps/11997/teuthology.log:2016-02-16T10:15:25.264 INFO:teuthology.orchestra.run.vpm008.stderr: what(): buffer::bad_alloc
teuthology-2016-02-17_09:00:02-rados-hammer-distro-basic-vps/13326/teuthology.log:2016-02-17T12:40:32.275 INFO:teuthology.orchestra.run.vpm013.stderr:terminate called after throwing an instance of 'ceph::buffer::bad_alloc'
teuthology-2016-02-17_09:00:02-rados-hammer-distro-basic-vps/13326/teuthology.log:2016-02-17T12:40:32.278 INFO:teuthology.orchestra.run.vpm013.stderr: what(): buffer::bad_alloc
teuthology-2016-02-19_09:00:02-rados-hammer-distro-basic-vps/17025/teuthology.log:2016-02-19T10:40:20.704 INFO:teuthology.orchestra.run.vpm033.stderr:terminate called after throwing an instance of 'ceph::buffer::bad_alloc'
teuthology-2016-02-19_09:00:02-rados-hammer-distro-basic-vps/17025/teuthology.log:2016-02-19T10:40:20.705 INFO:teuthology.orchestra.run.vpm033.stderr:buffer::bad_alloc
teuthology-2016-02-20_09:00:02-rados-hammer-distro-basic-vps/18557/teuthology.log:2016-02-20T10:29:07.392 INFO:teuthology.orchestra.run.vpm103.stderr:terminate called after throwing an instance of 'ceph::buffer::bad_alloc'
teuthology-2016-02-20_09:00:02-rados-hammer-distro-basic-vps/18557/teuthology.log:2016-02-20T10:29:07.557 INFO:teuthology.orchestra.run.vpm103.stderr: what(): buffer::bad_alloc
teuthology-2016-02-22_09:00:02-rados-hammer-distro-basic-vps/20610/teuthology.log:2016-02-22T10:06:52.020 INFO:teuthology.orchestra.run.vpm154.stderr:terminate called after throwing an instance of 'ceph::buffer::bad_alloc'
teuthology-2016-02-22_09:00:02-rados-hammer-distro-basic-vps/20610/teuthology.log:2016-02-22T10:06:52.020 INFO:teuthology.orchestra.run.vpm154.stderr: what(): buffer::bad_alloc
teuthology-2016-02-23_09:00:02-rados-hammer-distro-basic-vps/23070/teuthology.log:2016-02-23T10:16:39.817 INFO:teuthology.orchestra.run.vpm077.stderr:terminate called after throwing an instance of 'ceph::buffer::bad_alloc'
teuthology-2016-02-23_09:00:02-rados-hammer-distro-basic-vps/23070/teuthology.log:2016-02-23T10:16:39.818 INFO:teuthology.orchestra.run.vpm077.stderr: what(): buffer::bad_alloc
teuthology-2016-02-25_09:00:02-rados-hammer-distro-basic-vps/27069/teuthology.log:2016-02-25T10:20:51.434 INFO:teuthology.orchestra.run.vpm081.stderr:terminate called after throwing an instance of 'ceph::buffer::bad_alloc'
teuthology-2016-02-25_09:00:02-rados-hammer-distro-basic-vps/27069/teuthology.log:2016-02-25T10:20:51.434 INFO:teuthology.orchestra.run.vpm081.stderr: what(): buffer::bad_alloc
teuthology-2016-02-26_09:00:01-rados-hammer-distro-basic-vps/29212/teuthology.log:2016-02-26T10:25:02.558 INFO:teuthology.orchestra.run.vpm074.stderr:terminate called after throwing an instance of 'ceph::buffer::bad_alloc'
teuthology-2016-02-26_09:00:01-rados-hammer-distro-basic-vps/29212/teuthology.log:2016-02-26T10:25:02.559 INFO:teuthology.orchestra.run.vpm074.stderr: what(): buffer::bad_alloc
teuthology-2016-02-27_09:00:02-rados-hammer-distro-basic-vps/31054/teuthology.log:2016-02-28T23:51:39.974 INFO:teuthology.orchestra.run.vpm137.stderr:terminate called after throwing an instance of 'ceph::buffer::bad_alloc'
teuthology-2016-02-28_09:00:02-rados-hammer-distro-basic-vps/31676/teuthology.log:2016-02-29T01:59:10.164 INFO:teuthology.orchestra.run.vpm148.stderr:terminate called after throwing an instance of 'ceph::buffer::bad_alloc'
teuthology-2016-02-28_09:00:02-rados-hammer-distro-basic-vps/31676/teuthology.log:2016-02-29T01:59:10.200 INFO:teuthology.orchestra.run.vpm148.stderr: what(): buffer::bad_alloc
teuthology-2016-02-29_09:00:11-rados-hammer-distro-basic-vps/34261/teuthology.log:2016-02-29T14:56:22.072 INFO:teuthology.orchestra.run.vpm032.stderr:ceph::buffer::bad_alloc'
teuthology-2016-02-29_09:00:11-rados-hammer-distro-basic-vps/34261/teuthology.log:2016-02-29T14:56:22.072 INFO:teuthology.orchestra.run.vpm032.stderr: what(): buffer::bad_alloc
teuthology-2016-03-01_09:00:02-rados-hammer-distro-basic-vps/36219/teuthology.log:2016-03-01T10:44:24.134 INFO:teuthology.orchestra.run.vpm089.stderr:terminate called after throwing an instance of 'ceph::buffer::bad_alloc'
teuthology-2016-03-01_09:00:02-rados-hammer-distro-basic-vps/36219/teuthology.log:2016-03-01T10:44:24.279 INFO:teuthology.orchestra.run.vpm089.stderr: what(): buffer::bad_alloc
teuthology-2016-03-02_09:00:12-rados-hammer-distro-basic-vps/38214/teuthology.log:2016-03-02T10:30:41.156 INFO:teuthology.orchestra.run.vpm004.stderr:terminate called after throwing an instance of 'ceph::buffer::bad_alloc'
teuthology-2016-03-02_09:00:12-rados-hammer-distro-basic-vps/38214/teuthology.log:2016-03-02T10:30:41.158 INFO:teuthology.orchestra.run.vpm004.stderr: what(): buffer::bad_alloc
teuthology-2016-03-05_09:00:02-rados-hammer-distro-basic-vps/42161/teuthology.log:2016-03-05T10:21:59.479 INFO:teuthology.orchestra.run.vpm195.stderr:terminate called after throwing an instance of 'ceph::buffer::bad_alloc'
teuthology-2016-03-05_09:00:02-rados-hammer-distro-basic-vps/42161/teuthology.log:2016-03-05T10:21:59.483 INFO:teuthology.orchestra.run.vpm195.stderr: what(): buffer::bad_alloc
teuthology-2016-03-06_09:00:02-rados-hammer-distro-basic-vps/43351/teuthology.log:2016-03-06T10:26:59.222 INFO:teuthology.orchestra.run.vpm036.stderr:terminate called after throwing an instance of 'ceph::buffer::bad_alloc'
teuthology-2016-03-06_09:00:02-rados-hammer-distro-basic-vps/43351/teuthology.log:2016-03-06T10:26:59.223 INFO:teuthology.orchestra.run.vpm036.stderr: what(): buffer::bad_alloc
teuthology-2016-03-07_09:00:01-rados-hammer-distro-basic-vps/44729/teuthology.log:2016-03-07T10:20:43.427 INFO:teuthology.orchestra.run.vpm121.stderr:terminate called after throwing an instance of 'ceph::buffer::bad_alloc'
teuthology-2016-03-07_09:00:01-rados-hammer-distro-basic-vps/44729/teuthology.log:2016-03-07T10:20:43.427 INFO:teuthology.orchestra.run.vpm121.stderr: what(): buffer::bad_alloc
teuthology-2016-03-08_09:00:01-rados-hammer-distro-basic-vps/46753/teuthology.log:2016-03-08T10:20:45.652 INFO:teuthology.orchestra.run.vpm108.stderr:terminate called after throwing an instance of 'ceph::buffer::bad_alloc'
teuthology-2016-03-08_09:00:01-rados-hammer-distro-basic-vps/46753/teuthology.log:2016-03-08T10:20:45.652 INFO:teuthology.orchestra.run.vpm108.stderr: what(): buffer::bad_alloc
teuthology-2016-03-09_09:00:02-rados-hammer-distro-basic-vps/48794/teuthology.log:2016-03-09T10:20:51.589 INFO:teuthology.orchestra.run.vpm067.stderr:terminate called after throwing an instance of 'ceph::buffer::bad_alloc'
teuthology-2016-03-09_09:00:02-rados-hammer-distro-basic-vps/48794/teuthology.log:2016-03-09T10:20:51.676 INFO:teuthology.orchestra.run.vpm067.stderr: what(): buffer::bad_alloc
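Most jobs show up twice in the grep output above (once for the "terminate called" line and once for the "what()" line), so counting raw hits overstates the number of failing jobs. The following is a minimal sketch, not part of the original ticket, of one way to reduce such output to one entry per job. The function name `failed_jobs` and the two regular expressions are illustrative assumptions based only on the line layout visible in the output above.

```python
import re
from collections import Counter

# Assumed layout of a grep hit: "<run-name>/<job-id>/teuthology.log:<log line>"
HIT_RE = re.compile(r"^(?P<run>[^/\s]+)/(?P<job>\d+)/teuthology\.log:")
# Assumed layout of the logger name: "...<something>.vpmNNN.stderr:"
HOST_RE = re.compile(r"\.(vpm\d+)\.stderr")


def failed_jobs(grep_output):
    """Map (run_name, job_id) -> host for every job with at least one hit."""
    jobs = {}
    for line in grep_output.splitlines():
        hit = HIT_RE.match(line.strip())
        host = HOST_RE.search(line)
        if hit and host:
            # Keep the first host seen for the job; later hits are duplicates.
            jobs.setdefault((hit.group("run"), hit.group("job")), host.group(1))
    return jobs


# Three lines copied from the output above: two hits for job 24770, one for 46868.
SAMPLE = "\n".join([
    "teuthology-2016-01-11_17:10:02-upgrade:infernalis-x-jewel-distro-basic-vps/24770/teuthology.log:2016-01-11T23:08:37.017 INFO:tasks.ceph.mon.c.vpm193.stderr:terminate called after throwing an instance of 'ceph::buffer::bad_alloc'",
    "teuthology-2016-01-11_17:10:02-upgrade:infernalis-x-jewel-distro-basic-vps/24770/teuthology.log:2016-01-11T23:08:37.018 INFO:tasks.ceph.mon.c.vpm193.stderr: what(): buffer::bad_alloc",
    "teuthology-2016-01-27_18:49:02-rados-hammer-distro-basic-vps/46868/teuthology.log:2016-01-27T20:01:17.543 INFO:teuthology.orchestra.run.vpm050.stderr:terminate called after throwing an instance of 'ceph::buffer::bad_alloc'",
])

jobs = failed_jobs(SAMPLE)
per_host = Counter(jobs.values())
print(len(jobs), "distinct jobs;", dict(per_host))
```

Piping the real grep output into `failed_jobs` would give a per-job and per-host view, which may help tell a memory shortage that affects all VMs from a problem confined to a few hosts.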
Updated by Zack Cerza about 8 years ago
3/10/16
Issue #14985 has been updated by Yuri Weinstein.
For testing - this job seems to be reliably hanging on vps
http://qa-proxy.ceph.com/teuthology/teuthology-2016-03-09_15:31:25-rbd-master-distro-basic-vps/49795/teuthology.log
Updated by Zack Cerza about 8 years ago
3/10/16
Issue #14985 has been updated by Yuri Weinstein.
Another good job for low memory testing:
/a/teuthology-2016-03-09_19:14:02-rbd-master-distro-basic-vps/49966/teuthology.log:2016-03-09T20:03:45.487 INFO:teuthology.orchestra.run.vpm148.stderr:ImportError: librados.so.2: failed to map segment from shared object: Cannot allocate memory
Updated by Zack Cerza about 8 years ago
3/10/16
Issue #14985 has been updated by Yuri Weinstein.
Zack Cerza wrote:
Yuri Weinstein wrote:
For testing - this job seems to be reliably hanging on vps
http://qa-proxy.ceph.com/teuthology/teuthology-2016-03-09_15:31:25-rbd-master-distro-basic-vps/49795/teuthology.log
I don't see anything about "bad alloc" here - is there an indicator that it's memory-related that you saw?
True, Zack, but it "reliably" hangs. I am testing another job now that will hopefully produce a better error.
Updated by Zack Cerza about 8 years ago
3/10/16
Issue #14985 has been updated by Yuri Weinstein.
The job below produces a 'ceph::buffer::bad_alloc' error:
interactive-on-error: true
kernel:
  kdb: true
  sha1: distro
last_in_suite: false
machine_type: vps
name: teuthology-2016-03-09_09:00:02-rados-hammer-distro-basic-vps
nuke-on-error: true
openstack:
- machine:
    cpus: 1
    disk: 40
    ram: 15000
  volumes:
    count: 0
    size: 1
os_type: ubuntu
overrides:
  admin_socket:
    branch: hammer
  ceph:
    conf:
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
    log-whitelist:
    - slow request
    sha1: 7083829c227403a77fcf35a2376dc02e4c9693c8
  ceph-deploy:
    branch:
      dev-commit: 7083829c227403a77fcf35a2376dc02e4c9693c8
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: 7083829c227403a77fcf35a2376dc02e4c9693c8
  workunit:
    sha1: 7083829c227403a77fcf35a2376dc02e4c9693c8
owner: scheduled_teuthology@teuthology
priority: 1000
roles:
- - mon.0
  - osd.0
  - osd.1
  - client.0
sha1: 7083829c227403a77fcf35a2376dc02e4c9693c8
suite: rados
suite_branch: hammer
suite_path: /var/lib/teuthworker/src/ceph-qa-suite_hammer
suite_sha1: 9cbd6a467b43d48f86db66ddd0dba2c91c12d28e
tasks:
- install: null
- exec:
    client.0:
    - ceph_test_async_driver
    - ceph_test_msgr
teuthology_branch: master
Updated by Zack Cerza about 8 years ago
I manually enabled sar and told it to sample every second for these tests. Bear in mind that teuthology.front uses PST whereas the VMs use UTC.
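Because of that clock difference, a timestamp read on one side has to be shifted before it can be matched against the other. Here is a small sketch of the conversion, assuming the timestamp being converted was written in PST (a fixed UTC-8 offset, no daylight saving) and uses the `YYYY-MM-DDTHH:MM:SS.mmm` layout seen in the teuthology.log excerpts in this ticket. The helper name `pst_to_utc` is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: plain PST, i.e. a fixed UTC-8 offset with no DST adjustment.
PST = timezone(timedelta(hours=-8), "PST")


def pst_to_utc(stamp):
    """Convert a 'YYYY-MM-DDTHH:MM:SS.mmm' timestamp written in PST to UTC."""
    local = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%f").replace(tzinfo=PST)
    return local.astimezone(timezone.utc)


# A timestamp taken from the bad_alloc listing earlier in this ticket.
print(pst_to_utc("2016-03-09T10:20:51.589").isoformat())
# prints 2016-03-09T18:20:51.589000+00:00
```

Under these assumptions, an event logged at 10:20:51 on the PST side would line up with sar samples around 18:20:51 UTC on the VM.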
Testing on a VM with 2GB RAM failed as expected:
http://qa-proxy.ceph.com/teuthology/issue_15052/2gig/teuthology.log
http://qa-proxy.ceph.com/teuthology/issue_15052/2gig/sar.txt
http://qa-proxy.ceph.com/teuthology/issue_15052/2gig/sa10
Testing on a VM with 4GB RAM failed as well:
http://qa-proxy.ceph.com/teuthology/issue_15052/4gig/teuthology.log
http://qa-proxy.ceph.com/teuthology/issue_15052/4gig/sar.txt
http://qa-proxy.ceph.com/teuthology/issue_15052/4gig/sa10
Testing on a VM with 8GB RAM passed:
http://qa-proxy.ceph.com/teuthology/issue_15052/8gig/teuthology.log
http://qa-proxy.ceph.com/teuthology/issue_15052/8gig/sar.txt
http://qa-proxy.ceph.com/teuthology/issue_15052/8gig/sa10
The sa10 files are sar's binary logs. They can be queried to, e.g., show only memory stats via: sar -f /path/to/sa10 -r
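Once the memory report has been dumped to text, finding the moment of lowest free memory is a simple scan. The sketch below assumes the usual sysstat layout in which a header row names the columns and `kbmemfree` is the first column after the timestamp. The exact columns vary between sysstat versions, and the sample text here is invented for illustration rather than taken from the sar.txt files linked above.

```python
def min_free_kb(sar_text):
    """Return (timestamp, kbmemfree) for the sample with the least free memory.

    Assumes 'kbmemfree' is the first column after the timestamp, so every
    token before it on a data row is part of the timestamp.
    """
    idx = None
    lowest = None
    for line in sar_text.splitlines():
        tokens = line.split()
        if not tokens or tokens[0] == "Average:":
            continue  # skip blank lines and the summary row
        if "kbmemfree" in tokens:
            idx = tokens.index("kbmemfree")  # header row (may repeat)
            continue
        if idx is None or len(tokens) <= idx:
            continue  # banner line or something that is not a data row
        try:
            value = int(tokens[idx])
        except ValueError:
            continue
        if lowest is None or value < lowest[1]:
            lowest = (" ".join(tokens[:idx]), value)
    return lowest


# Hypothetical sample in the shape of `sar -r` output; all values are made up.
SAMPLE = """\
Linux 3.13.0 (vpm000)  03/10/16  _x86_64_  (1 CPU)

06:20:49 PM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit
06:20:50 PM   1204316    843948     41.20     20480    512000    900000     43.94
06:20:51 PM     88120   1960144     95.70      1024     40960   2400000    117.17
06:20:52 PM    310552   1737712     84.84      2048     61440   2100000    102.52
Average:       534329   1513935     73.91      7851    204800   1800000     87.88
"""

print(min_free_kb(SAMPLE))
```

Reading the column position from the header rather than hard-coding it keeps the scan working for both 12-hour and 24-hour timestamp formats.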
Updated by Zack Cerza almost 8 years ago
We've decided to move to 4GB VMs. All even-numbered VMs have been marked down. The teuthology change (just merged) is here:
https://github.com/ceph/teuthology/pull/868
Updated by Zack Cerza almost 8 years ago
- Status changed from In Progress to Resolved
Updated by David Galloway almost 8 years ago
Should we leave this open until we decide what to do with the unused disks?
Updated by David Galloway over 7 years ago
- Related to Bug #17049: "Cannot allocate memory" aka "rbd: error: image still has watchers" in rbd-master-distro-basic-vps added