Bug #14457

closed

tcmalloc oom bug

Added by Yuri Weinstein over 8 years ago. Updated about 8 years ago.

Status: Duplicate
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: upgrade/infernalis-x
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Run: http://pulpito.ceph.com/teuthology-2016-01-20_14:48:08-upgrade:infernalis-x-jewel-distro-basic-vps/
Jobs: 35541, 35542
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2016-01-20_14:48:08-upgrade:infernalis-x-jewel-distro-basic-vps/35541/teuthology.log

2016-01-20T18:17:07.358 INFO:tasks.ceph:Waiting until ceph osds are all up...
2016-01-20T18:17:07.358 INFO:teuthology.orchestra.run.vpm122:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd dump --format=json'
2016-01-20T18:17:08.771 INFO:tasks.ceph.mon.b.vpm122.stdout:starting mon.b rank 1 at 172.21.2.122:6790/0 mon_data /var/lib/ceph/mon/ceph-b fsid 3fda6002-eea7-4a91-a94b-63e0a1a801c0
2016-01-20T18:17:09.273 INFO:teuthology.misc.health.vpm122.stderr:2016-01-21 02:17:09.269914 7f6a706a8700 -1 asok(0x7f6a68000f80) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/ceph-client.admin.23496.asok': (13) Permission denied

Per IRC chat, suspect either missing sudo or changed ceph cli behavior.
Dan's assessment:

so /var/run/ceph is owned by ceph.ceph and is 770 dmick @ 10:39
it is indeed the ceph osd dump command that's failing to create a socket in /var/run/ceph
/var/run/ceph is also that ownership/permission on one of the LRC hosts, so I doubt it's new
but...it certainly used to be the case that you could run as nonroot as long as you had access to the ceph.conf and keyring files 10:41
hm
and it still is the case there
but that ceph command doesn't try to open a client-admin socket 10:43
so perhaps that's new behavior

interestingly, that message is apparently just a warning, and not fatal
and this would be librados
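
For illustration only, here is a minimal Python sketch of the permission check described in the chat above: whether a given user could create a socket file in a directory that is owned by ceph.ceph with mode 770. The function name and the numeric uid/gid values are hypothetical (the ticket does not give them), and the sketch only models the classic owner/group/other bits, not ACLs or capabilities.

```python
import stat


def may_create_in(dir_mode, dir_uid, dir_gid, uid, gids):
    """Return True if a process with (uid, gids) may create an entry
    in a directory with the given mode and ownership.

    Creating a file or socket needs both write and search (execute)
    permission on the directory itself.
    """
    if uid == 0:
        return True  # root bypasses the permission bits
    if uid == dir_uid:
        w, x = stat.S_IWUSR, stat.S_IXUSR
    elif dir_gid in gids:
        w, x = stat.S_IWGRP, stat.S_IXGRP
    else:
        w, x = stat.S_IWOTH, stat.S_IXOTH
    return bool(dir_mode & w) and bool(dir_mode & x)


# Hypothetical ids: 167 stands in for the ceph user and group,
# 1000 for an unprivileged test user such as "ubuntu".
CEPH_ID = 167
print(may_create_in(0o770, CEPH_ID, CEPH_ID, 1000, {1000}))           # not in group ceph
print(may_create_in(0o770, CEPH_ID, CEPH_ID, 1000, {1000, CEPH_ID}))  # member of group ceph
print(may_create_in(0o770, CEPH_ID, CEPH_ID, 0, {0}))                 # root, e.g. via sudo
```

Under these assumptions, an unprivileged user outside the ceph group cannot bind a socket under /var/run/ceph, which would be consistent with the "(13) Permission denied" message and with the "missing sudo" suspicion.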

Related issues 1 (0 open, 1 closed)

Related to Ceph - Bug #13522: Apparent deadlock between tcmalloc getting a stacktrace and dlopen allocating memory (Resolved, 10/18/2015)

Actions #1

Updated by Yuri Weinstein over 8 years ago

  • Description updated (diff)
Actions #2

Updated by Yuri Weinstein over 8 years ago

  • Description updated (diff)
Actions #3

Updated by Yuri Weinstein over 8 years ago

Run: http://pulpito.ceph.com/teuthology-2016-01-21_13:17:25-upgrade:infernalis-x-jewel-distro-basic-vps/

Further debugging by Josh (using gdb on the osd 2 process) revealed that several threads of that process were stuck in tcmalloc.

So the action there is probably to get the updated tcmalloc installed on the VPS machines, so that we can set the tcmalloc environment variable to mitigate this problem.
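
As a rough sketch of how such a check could be repeated, the Python snippet below scans a saved gdb backtrace and lists the threads whose stacks mention tcmalloc. It assumes the backtrace was captured with something like `gdb -p <pid> -batch -ex "thread apply all bt"`; the helper name and the sample text are hypothetical and hand-written, not taken from the actual debugging session.

```python
import re

# Header line gdb prints before each thread's frames in "thread apply all bt".
THREAD_HEADER = re.compile(r"^Thread (\d+) \(")


def threads_in_tcmalloc(backtrace_text):
    """Return the sorted gdb thread numbers that have at least one frame
    mentioning tcmalloc (in the function name or the library name)."""
    hits = set()
    current = None
    for line in backtrace_text.splitlines():
        match = THREAD_HEADER.match(line)
        if match:
            current = int(match.group(1))
        elif (current is not None
              and line.lstrip().startswith("#")
              and "tcmalloc" in line):
            hits.add(current)
    return sorted(hits)


# Hypothetical sample in the usual gdb layout, for illustration only.
SAMPLE = """\
Thread 3 (Thread 0x7f0000000003 (LWP 103)):
#0  0x0000000000000001 in tcmalloc::ThreadCache::Allocate () from libtcmalloc.so.4
#1  0x0000000000000002 in operator new () from libtcmalloc.so.4

Thread 2 (Thread 0x7f0000000002 (LWP 102)):
#0  0x0000000000000003 in pthread_cond_wait () from libpthread.so.0

Thread 1 (Thread 0x7f0000000001 (LWP 101)):
#0  0x0000000000000004 in tcmalloc::CentralFreeList::Populate () from libtcmalloc.so.4
"""

print(threads_in_tcmalloc(SAMPLE))
```

A thread that merely passes through tcmalloc is also counted, so comparing two dumps taken a few seconds apart would be needed to tell a stuck thread from a busy one.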
Actions #4

Updated by Yuri Weinstein over 8 years ago

  • Project changed from Ceph to teuthology
Actions #5

Updated by Samuel Just over 8 years ago

  • Priority changed from Normal to Urgent
Actions #6

Updated by Dan Mick over 8 years ago

What is this updated tcmalloc you talk about? Do we need something later than the distro package?

Actions #7

Updated by Yuri Weinstein over 8 years ago

  • Project changed from teuthology to Ceph
Actions #8

Updated by Samuel Just about 8 years ago

  • Subject changed from "failed: AdminSocket::bind_and_listen..Permission denied" in upgrade:infernalis-x-jewel-distro-basic-vps to tcmalloc oom bug
Actions #10

Updated by Sage Weil about 8 years ago

  • Status changed from New to Need More Info
  • Priority changed from Urgent to High

Waiting for a VPS with more memory to see if this is low-memory related.

Actions #11

Updated by Josh Durgin about 8 years ago

  • Status changed from Need More Info to Duplicate
Actions #12

Updated by Sage Weil about 8 years ago

  • Related to Bug #13522: Apparent deadlock between tcmalloc getting a stacktrace and dlopen allocating memory added