Bug #13522

Apparent deadlock between tcmalloc getting a stacktrace and dlopen allocating memory

Added by Loic Dachary almost 2 years ago. Updated 3 months ago.

Status: Resolved
Priority: Urgent
Assignee: -
Category: -
Target version: -
Start date: 10/18/2015
% Done: 0%
Source: other
Regression: No
Severity: 3 - minor
Needs Doc: No

Related issues

Related to Ceph - Bug #14457: tcmalloc oom bug Duplicate 01/21/2016
Related to Ceph - Bug #16103: ceph-disk: workaround gperftool hang Resolved 06/01/2016

History

#1 Updated by Loic Dachary almost 2 years ago

  • Description updated (diff)

#2 Updated by Loic Dachary almost 2 years ago

  • Description updated (diff)

#3 Updated by Loic Dachary almost 2 years ago

  • Description updated (diff)

#4 Updated by Loic Dachary almost 2 years ago

  • Description updated (diff)

#5 Updated by Loic Dachary almost 2 years ago

  • Status changed from In Progress to Can't reproduce

#6 Updated by Nathan Cutler over 1 year ago

  • Duplicated by Bug #8433: SSHException: Key-exchange timed out waiting for key negotiation added

#7 Updated by Nathan Cutler over 1 year ago

  • Duplicated by deleted (Bug #8433: SSHException: Key-exchange timed out waiting for key negotiation)

#8 Updated by Nathan Cutler over 1 year ago

  • Duplicates Bug #8433: SSHException: Key-exchange timed out waiting for key negotiation added

#9 Updated by Nathan Cutler over 1 year ago

I can reproduce this.

#10 Updated by Loic Dachary over 1 year ago

  • Status changed from Can't reproduce to Verified

#11 Updated by Loic Dachary over 1 year ago

When I looked at the VMs while the test was failing, they had plenty of RAM and disk space. I think getting past the re-keying problem reveals another problem. It happens at

  - ceph.restart:
      daemons:
      - mon.b
      wait-for-healthy: false
      wait-for-osds-up: true
  - print: '**** done ceph.restart mon.b 6-next-mon'

which is not supposed to lose an OSD or even restart one. Yet, one of them goes down. Running another test and setting up an alarm to look into the logs while it happens.

ceph-workbench ceph-qa-suite --verbose --simultaneous-jobs 10 --suite upgrade/infernalis-x --suite-branch master --ceph jewel --ceph-git-url https://github.com/ceph/ceph --filter 'upgrade:infernalis-x/stress-split/{0-cluster/start.yaml 1-infernalis-install/infernalis.yaml 2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/{rbd-cls.yaml rbd-import-export.yaml readwrite.yaml snaps-few-objects.yaml} 6-next-mon/monb.yaml 7-workload/{radosbench.yaml rbd_api.yaml} 8-next-mon/monc.yaml 9-workload/{rbd-python.yaml rgw-swift.yaml snaps-many-objects.yaml} distros/centos_7.2.yaml}'

After ~7h, on the last workload:

2016-03-21T15:58:12.906 INFO:teuthology.task.print:**** done swift 9-workload
...
2016-03-21T16:49:44.947 INFO:tasks.rados.rados.0.target167114242166.stderr:
2016-03-21T16:49:44.948 DEBUG:paramiko.transport:[chan 108] EOF received (108)
2016-03-21T16:49:44.949 DEBUG:paramiko.transport:[chan 108] EOF sent (108)
2016-03-21T16:49:44.950 INFO:tasks.ceph.ceph_manager:removing pool_name unique_pool_3
2016-03-21T16:49:44.951 INFO:teuthology.orchestra.run.target167114242167:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage rados rmpool unique_pool_3 unique_pool_3 --yes-i-really-really-mean-it'
2016-03-21T16:49:44.951 DEBUG:paramiko.transport:[chan 5212] Max packet in: 32768 bytes
2016-03-21T16:49:44.953 DEBUG:paramiko.transport:Rekeying (hit 173137 packets, 2100587824 bytes received)
2016-03-21T16:49:44.953 DEBUG:paramiko.transport:[chan 5212] Max packet out: 32768 bytes
2016-03-21T16:49:44.953 DEBUG:paramiko.transport:Secsh channel 5212 opened.
2016-03-21T16:49:44.962 DEBUG:paramiko.transport:Rekeying (hit 173138 packets, 2100587888 bytes received)
2016-03-21T16:49:44.963 DEBUG:paramiko.transport:Rekeying (hit 173139 packets, 2100587936 bytes received)
2016-03-21T16:49:44.963 DEBUG:paramiko.transport:[chan 5212] Sesch channel 5212 request ok
2016-03-21T16:49:44.964 DEBUG:paramiko.transport:[chan 5212] EOF sent (5212)
2016-03-21T16:49:45.184 DEBUG:paramiko.transport:Rekeying (hit 173140 packets, 2100588240 bytes received)
2016-03-21T16:49:45.184 INFO:teuthology.orchestra.run.target167114242167.stderr:2016-03-21 16:49:45.230131 7efd7cf01a40 -1 asok(0x7efd7d61ac60) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/ceph-client.admin.16638.asok': (13) Permission denied
2016-03-21T16:49:45.314 DEBUG:paramiko.transport:Rekeying (hit 173141 packets, 2100588336 bytes received)
2016-03-21T16:49:45.315 INFO:teuthology.orchestra.run.target167114242167.stdout:successfully deleted pool unique_pool_3
2016-03-21T16:49:45.316 DEBUG:paramiko.transport:Rekeying (hit 173142 packets, 2100588416 bytes received)
2016-03-21T16:49:45.316 DEBUG:paramiko.transport:Rekeying (hit 173143 packets, 2100588464 bytes received)
2016-03-21T16:49:45.316 DEBUG:paramiko.transport:[chan 5212] EOF received (5212)
2016-03-21T16:49:45.317 DEBUG:paramiko.transport:Rekeying (hit 173144 packets, 2100588512 bytes received)
2016-03-21T16:49:45.317 DEBUG:teuthology.run_tasks:Unwinding manager swift
2016-03-21T16:49:45.336 DEBUG:teuthology.run_tasks:Unwinding manager rgw
2016-03-21T16:49:45.350 INFO:tasks.rgw:Stopping apache...
2016-03-21T16:49:45.350 DEBUG:paramiko.transport:[chan 98] EOF sent (98)
2016-03-21T16:49:45.361 DEBUG:paramiko.transport:[chan 98] EOF received (98)
2016-03-21T16:49:45.362 INFO:teuthology.misc:Shutting down rgw daemons...
2016-03-21T16:49:45.363 DEBUG:paramiko.transport:[chan 95] EOF sent (95)
2016-03-21T16:49:45.363 DEBUG:tasks.rgw.client.0:waiting for process to exit
2016-03-21T16:49:45.405 INFO:tasks.rgw.client.0.target167114242166.stdout:2016-03-21 16:49:45.452149 7f5d66993880 -1 shutting down
2016-03-21T16:49:45.809 DEBUG:paramiko.transport:[chan 95] EOF received (95)
2016-03-21T16:49:46.819 DEBUG:paramiko.transport:Sending global request "keepalive@lag.net" 
2016-03-21T16:49:47.828 DEBUG:paramiko.transport:Sending global request "keepalive@lag.net" 
2016-03-21T16:49:48.837 DEBUG:paramiko.transport:Sending global request "keepalive@lag.net" 
2016-03-21T16:49:49.846 DEBUG:paramiko.transport:Sending global request "keepalive@lag.net" 
2016-03-21T16:49:50.855 DEBUG:paramiko.transport:Sending global request "keepalive@lag.net" 
2016-03-21T16:49:51.364 INFO:tasks.rgw.client.0:Stopped
2016-03-21T16:49:51.365 INFO:teuthology.orchestra.run.target167114242166:Running: 'rm -f /home/ubuntu/cephtest/rgw.opslog.client.0.sock'
2016-03-21T16:49:51.365 DEBUG:paramiko.transport:[chan 109] Max packet in: 32768 bytes
2016-03-21T16:49:51.366 DEBUG:paramiko.transport:[chan 109] Max packet out: 32768 bytes
2016-03-21T16:49:51.367 DEBUG:paramiko.transport:Secsh channel 109 opened.
2016-03-21T16:49:51.373 DEBUG:paramiko.transport:[chan 109] Sesch channel 109 request ok
2016-03-21T16:49:51.373 DEBUG:paramiko.transport:[chan 109] EOF sent (109)
2016-03-21T16:49:51.444 DEBUG:paramiko.transport:[chan 109] EOF received (109)
2016-03-21T16:49:51.445 INFO:tasks.rgw:Removing apache config...
2016-03-21T16:49:51.446 INFO:teuthology.orchestra.run.target167114242166:Running: 'rm -f /home/ubuntu/cephtest/apache/apache.client.0.conf && rm -f /home/ubuntu/cephtest/apache/htdocs.client.0/rgw.fcgi'
2016-03-21T16:49:51.446 DEBUG:paramiko.transport:[chan 110] Max packet in: 32768 bytes
2016-03-21T16:49:51.447 DEBUG:paramiko.transport:[chan 110] Max packet out: 32768 bytes
2016-03-21T16:49:51.448 DEBUG:paramiko.transport:Secsh channel 110 opened.
2016-03-21T16:49:51.453 DEBUG:paramiko.transport:[chan 110] Sesch channel 110 request ok
2016-03-21T16:49:51.453 DEBUG:paramiko.transport:[chan 110] EOF sent (110)
2016-03-21T16:49:51.515 DEBUG:paramiko.transport:[chan 110] EOF received (110)
2016-03-21T16:49:51.516 INFO:tasks.rgw:Cleaning up apache directories...
2016-03-21T16:49:51.516 INFO:teuthology.orchestra.run.target167114242166:Running: 'rm -rf /home/ubuntu/cephtest/apache/tmp.client.0 && rmdir /home/ubuntu/cephtest/apache/htdocs.client.0'
2016-03-21T16:49:51.517 DEBUG:paramiko.transport:[chan 111] Max packet in: 32768 bytes
2016-03-21T16:49:51.518 DEBUG:paramiko.transport:[chan 111] Max packet out: 32768 bytes
2016-03-21T16:49:51.519 DEBUG:paramiko.transport:Secsh channel 111 opened.
2016-03-21T16:49:51.523 DEBUG:paramiko.transport:[chan 111] Sesch channel 111 request ok
2016-03-21T16:49:51.523 DEBUG:paramiko.transport:[chan 111] EOF sent (111)
2016-03-21T16:49:51.581 DEBUG:paramiko.transport:[chan 111] EOF received (111)
2016-03-21T16:49:51.583 INFO:teuthology.orchestra.run.target167114242166:Running: 'rmdir /home/ubuntu/cephtest/apache'
2016-03-21T16:49:51.583 DEBUG:paramiko.transport:[chan 112] Max packet in: 32768 bytes
2016-03-21T16:49:51.584 DEBUG:paramiko.transport:[chan 112] Max packet out: 32768 bytes
2016-03-21T16:49:51.585 DEBUG:paramiko.transport:Secsh channel 112 opened.
2016-03-21T16:49:51.590 DEBUG:paramiko.transport:[chan 112] Sesch channel 112 request ok
2016-03-21T16:49:51.590 DEBUG:paramiko.transport:[chan 112] EOF sent (112)
2016-03-21T16:49:51.645 DEBUG:paramiko.transport:[chan 112] EOF received (112)
2016-03-21T16:49:51.646 DEBUG:teuthology.run_tasks:Unwinding manager ceph.restart
2016-03-21T16:49:51.665 DEBUG:teuthology.run_tasks:Unwinding manager ceph.restart
2016-03-21T16:49:51.679 DEBUG:teuthology.run_tasks:Unwinding manager ceph.restart
2016-03-21T16:49:51.693 DEBUG:teuthology.run_tasks:Unwinding manager thrashosds
2016-03-21T16:49:51.708 INFO:tasks.thrashosds:joining thrashosds
2016-03-21T16:49:52.615 DEBUG:paramiko.transport:Sending global request "keepalive@lag.net" 
2016-03-21T16:49:52.632 DEBUG:paramiko.transport:Sending global request "keepalive@lag.net" 
...
2016-03-21T17:32:32.622 DEBUG:paramiko.transport:Sending global request "keepalive@lag.net" 
2016-03-21T17:32:32.720 DEBUG:paramiko.transport:Sending global request "keepalive@lag.net" 

Meanwhile, on the machine running osd.5:

[ubuntu@target167114242169 ~]$ sudo ceph osd tree
ID WEIGHT  TYPE NAME              UP/DOWN REWEIGHT PRIMARY-AFFINITY 
-1 6.00000 root default                                             
-3 6.00000     rack localrack                                       
-2 6.00000         host localhost                                   
 0 1.00000             osd.0           up  0.27551          0.56276 
 1 1.00000             osd.1           up        0                0 
 2 1.00000             osd.2           up  1.00000          1.00000 
 3 1.00000             osd.3           up  1.00000          0.32735 
 4 1.00000             osd.4           up  1.00000          0.12149 
 5 1.00000             osd.5         down        0          1.00000 
[ubuntu@target167114242169 ~]$ ps fauwwwx
...
root     15035  0.0  0.0 142860  4668 ?        Ss   11:58   0:00  \_ sshd: ubuntu [priv]
ubuntu   15040  0.0  0.0 142976  2456 ?        S    11:58   0:09  |   \_ sshd: ubuntu@notty
root      4938  0.0  0.0 195512  2792 ?        Ss   15:42   0:00  |       \_ sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-osd -f -i 3
root      4989  0.0  0.0 151604  6112 ?        S    15:42   0:04  |       |   \_ /usr/bin/python /bin/daemon-helper kill ceph-osd -f -i 3
root      4992  5.3  2.1 927168 169208 ?       Ssl  15:42   6:20  |       |       \_ ceph-osd -f -i 3
root     13890  0.0  0.0 195512  2788 ?        Ss   15:51   0:00  |       \_ sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-osd -f -i 4
root     13940  0.0  0.0 151604  6112 ?        S    15:51   0:03  |       |   \_ /usr/bin/python /bin/daemon-helper kill ceph-osd -f -i 4
root     13944  4.2  1.9 881596 151844 ?       Ssl  15:51   4:39  |       |       \_ ceph-osd -f -i 4
root     17571  0.0  0.0 195512  2788 ?        Ss   15:55   0:00  |       \_ sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-osd -f -i 5
root     17621  0.0  0.0 151604  6108 ?        S    15:55   0:03  |       |   \_ /usr/bin/python /bin/daemon-helper kill ceph-osd -f -i 5
root     17625  0.0  0.2 325260 17400 ?        Ssl  15:55   0:05  |       |       \_ ceph-osd -f -i 5
root     17754  0.0  0.0 195512  2784 ?        Ss   15:55   0:00  |       \_ sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --admin-daemon /var/run/ceph/ceph-osd.5.asok dump_ops_in_flight
root     17779  0.0  0.1 209116 11132 ?        S    15:55   0:00  |           \_ python /bin/ceph --admin-daemon /var/run/ceph/ceph
[ubuntu@target167114242169 ~]$ sudo lsof -p 17625
ceph-osd 17625 root    0r   CHR                1,3        0t0      1028 /dev/null
ceph-osd 17625 root    1w  FIFO                0,8        0t0    366541 pipe
ceph-osd 17625 root    2w  FIFO                0,8        0t0    366542 pipe
ceph-osd 17625 root    3w   REG              253,1 8250976636 109618888 /var/log/ceph/ceph-osd.5.log
ceph-osd 17625 root    4u  IPv4             367639        0t0       TCP *:6800 (LISTEN)
ceph-osd 17625 root    5u  IPv4             367640        0t0       TCP *:acnet (LISTEN)
ceph-osd 17625 root    6u  IPv4             367641        0t0       TCP *:6802 (LISTEN)
ceph-osd 17625 root    7u  IPv4             367642        0t0       TCP *:6803 (LISTEN)
ceph-osd 17625 root    8r  FIFO                0,8        0t0    367643 pipe
ceph-osd 17625 root    9w  FIFO                0,8        0t0    367643 pipe
ceph-osd 17625 root   10u  unix 0xffff880225ac7480        0t0    367644 /var/run/ceph/ceph-osd.5.asok
ceph-osd 17625 root   11uW  REG              253,1         37 109618889 /var/lib/ceph/osd/ceph-5/fsid
ceph-osd 17625 root   12r   DIR              253,1       4096 109618885 /var/lib/ceph/osd/ceph-5
ceph-osd 17625 root   13r   DIR              253,1      24576     11018 /var/lib/ceph/osd/ceph-5/current
ceph-osd 17625 root   14u   REG              253,1          7     11019 /var/lib/ceph/osd/ceph-5/current/commit_op_seq
ceph-osd 17625 root   15w   REG              253,1        319   8646712 /var/lib/ceph/osd/ceph-5/current/omap/LOG
ceph-osd 17625 root   16uW  REG              253,1          0   8646713 /var/lib/ceph/osd/ceph-5/current/omap/LOCK
ceph-osd 17625 root   17u   REG              253,1          0  10240874 /var/lib/ceph/osd/ceph-5/current/omap/000352.log
ceph-osd 17625 root   18u   REG              253,1      65536  10240876 /var/lib/ceph/osd/ceph-5/current/omap/MANIFEST-000350
ceph-osd 17625 root   19u   REG              253,1  104857600 109618892 /var/lib/ceph/osd/ceph-5/journal
ceph-osd 17625 root   20u   REG              253,1        417  17531111 /var/lib/ceph/osd/ceph-5/current/meta/osd\usuperblock__0_23C2FCDE__none
ceph-osd 17625 root   21r   DIR              253,1       4096  75684106 /usr/lib64/rados-classes
ceph-osd 17625 root   22u  unix 0xffff8802287a61c0        0t0    366796 /var/run/ceph/ceph-osd.5.asok
[ubuntu@target167114242169 ~]$ tail /var/log/ceph/ceph-osd.5.log 
2016-03-21 15:55:36.350651 7f9b1a781900 10 _load_class log from /usr/lib64/rados-classes/libcls_log.so
2016-03-21 15:55:36.351549 7f9b1a781900 10 register_class log status 3
2016-03-21 15:55:36.351551 7f9b1a781900 10 register_cxx_method log.add flags 3 0x7f9b0a130380
2016-03-21 15:55:36.351553 7f9b1a781900 10 register_cxx_method log.list flags 1 0x7f9b0a131810
2016-03-21 15:55:36.351554 7f9b1a781900 10 register_cxx_method log.trim flags 3 0x7f9b0a130e50
2016-03-21 15:55:36.351556 7f9b1a781900 10 register_cxx_method log.info flags 1 0x7f9b0a12fc90
2016-03-21 15:55:36.351557 7f9b1a781900 10 _load_class log success
2016-03-21 15:55:36.351559 7f9b1a781900 10 open_all_classes found numops
2016-03-21 15:55:36.351561 7f9b1a781900 10 _get_class adding new class name numops 0x7f9b1ee1cda8
2016-03-21 15:55:36.351562 7f9b1a781900 10 _load_class numops from /usr/lib64/rados-classes/libcls_numops.so
[ubuntu@target167114242169 ~]$ sudo ceph --admin-daemon /var/run/ceph/ceph-osd.5.asok dump_ops_in_flight
(hangs)
[ubuntu@target167114242169 ~]$ sudo strace -p 17625
Process 17625 attached
restart_syscall(<... resuming interrupted call ...>) = -1 ETIMEDOUT (Connection timed out)
futex(0x7f9b19af1820, FUTEX_WAIT_PRIVATE, 2, {0, 118126640}) = -1 ETIMEDOUT (Connection timed out)
futex(0x7f9b19af1820, FUTEX_WAIT_PRIVATE, 2, {0, 149946576}) = -1 ETIMEDOUT (Connection timed out)
futex(0x7f9b19af1820, FUTEX_WAIT_PRIVATE, 2, {0, 183111664}) = -1 ETIMEDOUT (Connection timed out)
futex(0x7f9b19af1820, FUTEX_WAIT_PRIVATE, 2, {0, 61817392}) = -1 ETIMEDOUT (Connection timed out)
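The repeating timed futex waits above are presumably tcmalloc's spinlock backoff: each `FUTEX_WAIT_PRIVATE` call with a short timeout is one retry of `SpinLock::SlowLock` on a lock that is never released, so the process is livelocked rather than blocked in a single syscall. A minimal sketch of that spin-with-timed-wait shape (names and timings illustrative, not tcmalloc's actual code):

```python
import threading

# Hypothetical model of SpinLock::SlowLock / SpinLockDelay: when the fast
# path fails, the waiter sleeps with a short timeout and retries, which
# shows up in strace as repeated timed futex waits ending in ETIMEDOUT.

lock = threading.Lock()

def slow_lock(max_iters=5, delay=0.01):
    """Spin with timed waits; return the number of timed-out waits."""
    timeouts = 0
    for _ in range(max_iters):
        if lock.acquire(timeout=delay):  # one futex(..., FUTEX_WAIT, timeout)
            lock.release()
            return timeouts
        timeouts += 1                    # one "= -1 ETIMEDOUT" line in strace
    return timeouts

lock.acquire()       # simulate a holder that never releases
print(slow_lock())   # every wait times out, as in osd.5's strace -> 5
```

Under this reading, the strace loop alone does not identify the bug; it only says a lock is contended forever. The gdb backtraces in the next comments show who wants it.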

#12 Updated by Loic Dachary over 1 year ago

ceph-workbench ceph-qa-suite --dry-run --verbose --suite upgrade/infernalis-x --suite-branch master --ceph jewel --ceph-git-url https://github.com/ceph/ceph

#13 Updated by Samuel Just over 1 year ago

(gdb) bt
#0 0x00007f9b198ba573 in base::internal::SpinLockDelay(int volatile*, int, int) () from /lib64/libtcmalloc.so.4
#1 0x00007f9b198ba447 in SpinLock::SlowLock() () from /lib64/libtcmalloc.so.4
#2 0x00007f9b198ab078 in tcmalloc::CentralFreeList::Populate() () from /lib64/libtcmalloc.so.4
#3 0x00007f9b198ab148 in tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**) () from /lib64/libtcmalloc.so.4
#4 0x00007f9b198ab1dd in tcmalloc::CentralFreeList::RemoveRange(void**, void**, int) () from /lib64/libtcmalloc.so.4
#5 0x00007f9b198ae235 in tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long) () from /lib64/libtcmalloc.so.4
#6 0x00007f9b198bd31b in tc_calloc () from /lib64/libtcmalloc.so.4
#7 0x00007f9b1a58a9ae in _dl_check_map_versions () from /lib64/ld-linux-x86-64.so.2
#8 0x00007f9b1a58dd36 in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#9 0x00007f9b1a5891b4 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#10 0x00007f9b1a58d1ab in _dl_open () from /lib64/ld-linux-x86-64.so.2
#11 0x00007f9b18f0402b in dlopen_doit () from /lib64/libdl.so.2
#12 0x00007f9b1a5891b4 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#13 0x00007f9b18f0462d in _dlerror_run () from /lib64/libdl.so.2
#14 0x00007f9b18f040c1 in dlopen@@GLIBC_2.2.5 () from /lib64/libdl.so.2
#15 0x00007f9b1ab3f3fb in ClassHandler::_load_class(ClassHandler::ClassData*) ()
#16 0x00007f9b1ab3f7ff in ClassHandler::open_class(std::string const&, ClassHandler::ClassData**) ()
#17 0x00007f9b1ab3fad1 in ClassHandler::open_all_classes() ()
#18 0x00007f9b1aadc34c in OSD::init() ()
#19 0x00007f9b1aa5e1c8 in main ()

#14 Updated by Samuel Just over 1 year ago

(gdb) thread apply all bt

Thread 23 (Thread 0x7f9b16e6a700 (LWP 17626)):
#0 0x00007f9b191126d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f9b1afc1dfb in ceph::log::Log::entry() ()
#2 0x00007f9b1910edc5 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f9b179b528d in clone () from /lib64/libc.so.6

Thread 22 (Thread 0x7f9b15ef6700 (LWP 17633)):
#0 0x00007f9b19112a82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f9b1b09625a in CephContextServiceThread::entry() ()
#2 0x00007f9b1910edc5 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f9b179b528d in clone () from /lib64/libc.so.6

Thread 21 (Thread 0x7f9b156f5700 (LWP 17634)):
#0 0x00007f9b198ba573 in base::internal::SpinLockDelay(int volatile*, int, int) () from /lib64/libtcmalloc.so.4
#1 0x00007f9b198ba447 in SpinLock::SlowLock() () from /lib64/libtcmalloc.so.4
#2 0x00007f9b198ab078 in tcmalloc::CentralFreeList::Populate() () from /lib64/libtcmalloc.so.4
#3 0x00007f9b198ab148 in tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**) () from /lib64/libtcmalloc.so.4
#4 0x00007f9b198ab1dd in tcmalloc::CentralFreeList::RemoveRange(void**, void**, int) () from /lib64/libtcmalloc.so.4
#5 0x00007f9b198ae235 in tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long) () from /lib64/libtcmalloc.so.4
#6 0x00007f9b198be718 in tc_new () from /lib64/libtcmalloc.so.4
#7 0x00007f9b1b0c71aa in boost::spirit::classic::rule<boost::spirit::classic::scanner<__gnu_cxx::__normal_iterator<char const*, std::string>, boost::spirit::classic::scanner_policies<boost::spirit::classic::skipper_iteration_policy<boost::spirit::classic::iteration_policy>, boost::spirit::classic::match_policy, boost::spirit::classic::action_policy> >, boost::spirit::classic::nil_t, boost::spirit::classic::nil_t>& boost::spirit::classic::rule<boost::spirit::classic::scanner<__gnu_cxx::__normal_iterator<char const*, std::string>, boost::spirit::classic::scanner_policies<boost::spirit::classic::skipper_iteration_policy<boost::spirit::classic::iteration_policy>, boost::spirit::classic::match_policy, boost::spirit::classic::action_policy> >, boost::spirit::classic::nil_t, boost::spirit::classic::nil_t>::operator=<boost::spirit::classic::alternative<boost::spirit::classic::alternative<boost::spirit::classic::alternative<boost::spirit::classic::alternative<boost::spirit::classic::alternative<boost::spirit::classic::alternative<boost::spirit::classic::action<boost::spirit::classic::rule<boost::spirit::classic::scanner<__gnu_cxx::__normal_iterator<char const*, std::string>, boost::spirit::classic::scanner_policies<boost::spirit::classic::skipper_iteration_policy<boost::spirit::classic::iteration_policy>, boost::spirit::classic::match_policy, boost::spirit::classic::action_policy> >, boost::spirit::classic::nil_t, boost::spirit::classic::nil_t>, boost::function<void (_gnu_cxx::_normal_iterator<char const*, std::string>, _gnu_cxx::_normal_iterator<char const*, std::string>)> >, boost::spirit::classic::rule<boost::spirit::classic::scanner<__gnu_cxx::__normal_iterator<char const*, std::string>, boost::spirit::classic::scanner_policies<boost::spirit::classic::skipper_iteration_policy<boost::spirit::classic::iteration_policy>, boost::spirit::classic::match_policy, boost::spirit::classic::action_policy> >, boost::spirit::classic::nil_t, boost::spirit::classic::nil_t> >, 
boost::spirit::classic::rule<boost::spirit::classic::scanner<__gnu_cxx::__normal_iterator<char const*, std::string>, boost::spirit::classic::scanner_policies<boost::spirit::classic::skipper_iteration_policy<boost::spirit::classic::iteration_policy>, boost::spirit::classic::match_policy, boost::spirit::classic::action_policy> >, boost::spirit::classic::nil_t, boost::spirit::classic::nil_t> >, boost::spirit::classic::rule<boost::spirit::classic::scanner<__gnu_cxx::__normal_iterator<char const*, std::string>, boost::spirit::classic::scanner_policies<boost::spirit::classic::skipper_iteration_policy<boost::spirit::classic::iteration_policy>, boost::spirit::classic::match_policy, boost::spirit::classic::action_policy> >, boost::spirit::classic::nil_t, boost::spirit::classic::nil_t> >, boost::spirit::classic::action<boost::spirit::classic::strlit<char const*>, boost::function<void (_gnu_cxx::_normal_iterator<char const*, std::string>, _gnu_cxx::_normal_iterator<char const*, std::string>)> > >, boost::spirit::classic::action<boost::spirit::classic::strlit<char const*>, boost::function<void (_gnu_cxx::_normal_iterator<char const*, std::string>, _gnu_cxx::_normal_iterator<char const*, std::string>)> > >, boost::spirit::classic::action<boost::spirit::classic::strlit<char const*>, boost::function<void (_gnu_cxx::_normal_iterator<char const*, std::string>, _gnu_cxx::_normal_iterator<char const*, std::string>)> > > >(boost::spirit::classic::alternative<boost::spirit::classic::alternative<boost::spirit::classic::alternative<boost::spirit::classic::alternative<boost::spirit::classic::alternative<boost::spirit::classic::alternative<boost::spirit::classic::action<boost::spirit::classic::rule<boost::spirit::classic::scanner<__gnu_cxx::__normal_iterator<char const*, std::string>, boost::spirit::classic::scanner_policies<boost::spirit::classic::skipper_iteration_policy<boost::spirit::classic::iteration_policy>, boost::spirit::classic::match_policy, 
boost::spirit::classic::action_policy> >, boost::spirit::classic::nil_t, boost::spirit::classic::nil_t>, boost::function<void (_gnu_cxx::_normal_iterator<char const*, std::string>, _gnu_cxx::_normal_iterator<char const*, std::string>)> >, boost::spirit::classic::rule<boost::spirit::classic::scanner<__gnu_cxx::__normal_iterator<char const*, std::string>, boost::spirit::classic::scanner_policies<boost::spirit::classic::skipper_iteration_policy<boost::spirit::classic::iteration_policy>, boost::spirit::classic::match_policy, boost::spirit::classic::action_policy> >, boost::spirit::classic::nil_t, boost::spirit::classic::nil_t> >, boost::spirit::classic::rule<boost::spirit::classic::scanner<__gnu_cxx::__normal_iterator<char const*, std::string>, boost::spirit::classic::scanner_policies<boost::spirit::classic::skipper_iteration_policy<boost::spirit::classic::iteration_policy>, boost::spirit::classic::match_policy, boost::spirit::classic::action_policy> >, boost::spirit::classic::nil_t, boost::spirit::classic::nil_t> >, boost::spirit::classic::rule<boost::spirit::classic::scanner<__gnu_cxx::__normal_iterator<char const*, std::string>, boost::spirit::classic::scanner_policies<boost::spirit::classic::skipper_iteration_policy<boost::spirit::classic::iteration_policy>, boost::spirit::classic::match_policy, boost::spirit::classic::action_policy> >, boost::spirit::classic::nil_t, boost::spirit::classic::nil_t> >, boost::spirit::classic::action<boost::spirit::classic::strlit<char const*>, boost::function<void (_gnu_cxx::_normal_iterator<char const*, std::string>, _gnu_cxx::_normal_iterator<char const*, std::string>)> > >, boost::spirit::classic::action<boost::spirit::classic::strlit<char const*>, boost::function<void (_gnu_cxx::_normal_iterator<char const*, std::string>, _gnu_cxx::_normal_iterator<char const*, std::string>)> > >, boost::spirit::classic::action<boost::spirit::classic::strlit<char const*>, boost::function<void (_gnu_cxx::_normal_iterator<char const*, std::string>, 
_gnu_cxx::_normal_iterator<char const*, std::string>)> > > const&) ()
#8 0x00007f9b1b0cf15e in json_spirit::Json_grammer<json_spirit::Value_impl<json_spirit::Config_map<std::string> >, _gnu_cxx::_normal_iterator<char const*, std::string> >::definition<boost::spirit::classic::scanner<__gnu_cxx::__normal_iterator<char const*, std::string>, boost::spirit::classic::scanner_policies<boost::spirit::classic::skipper_iteration_policy<boost::spirit::classic::iteration_policy>, boost::spirit::classic::match_policy, boost::spirit::classic::action_policy> > >::definition(json_spirit::Json_grammer<json_spirit::Value_impl<json_spirit::Config_map<std::string> >, _gnu_cxx::_normal_iterator<char const*, std::string> > const&) ()
#9 0x00007f9b1b0d00b8 in json_spirit::Json_grammer<json_spirit::Value_impl<json_spirit::Config_map<std::string> >, _gnu_cxx::_normal_iterator<char const*, std::string> >::definition<boost::spirit::classic::scanner<__gnu_cxx::__normal_iterator<char const*, std::string>, boost::spirit::classic::scanner_policies<boost::spirit::classic::skipper_iteration_policy<boost::spirit::classic::iteration_policy>, boost::spirit::classic::match_policy, boost::spirit::classic::action_policy> > >& boost::spirit::classic::impl::get_definition<json_spirit::Json_grammer<json_spirit::Value_impl<json_spirit::Config_map<std::string> >, _gnu_cxx::_normal_iterator<char const*, std::string> >, boost::spirit::classic::parser_context<boost::spirit::classic::nil_t>, boost::spirit::classic::scanner<__gnu_cxx::__normal_iterator<char const*, std::string>, boost::spirit::classic::scanner_policies<boost::spirit::classic::skipper_iteration_policy<boost::spirit::classic::iteration_policy>, boost::spirit::classic::match_policy, boost::spirit::classic::action_policy> > >(boost::spirit::classic::grammar<json_spirit::Json_grammer<json_spirit::Value_impl<json_spirit::Config_map<std::string> >, _gnu_cxx::_normal_iterator<char const*, std::string> >, boost::spirit::classic::parser_context<boost::spirit::classic::nil_t> > const*) ()
#10 0x00007f9b1b0d03a4 in _gnu_cxx::_normal_iterator<char const*, std::string> json_spirit::read_range_or_throw<__gnu_cxx::__normal_iterator<char const*, std::string>, json_spirit::Value_impl<json_spirit::Config_map<std::string> > >(_gnu_cxx::_normal_iterator<char const*, std::string>, _gnu_cxx::_normal_iterator<char const*, std::string>, json_spirit::Value_impl<json_spirit::Config_map<std::string> >&) ()
#11 0x00007f9b1b0d050c in bool json_spirit::read_range<__gnu_cxx::__normal_iterator<char const*, std::string>, json_spirit::Value_impl<json_spirit::Config_map<std::string> > >(_gnu_cxx::_normal_iterator<char const*, std::string>&, _gnu_cxx::_normal_iterator<char const*, std::string>, json_spirit::Value_impl<json_spirit::Config_map<std::string> >&) ()
#12 0x00007f9b1b0bb54d in json_spirit::read(std::string const&, json_spirit::Value_impl<json_spirit::Config_map<std::string> >&) ()
#13 0x00007f9b1b09d78e in cmdmap_from_json(std::vector<std::string, std::allocator<std::string> >, std::map<std::string, boost::variant<std::string, bool, long, double, std::vector<std::string, std::allocator<std::string> >, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_>, std::less<std::string>, std::allocator<std::pair<std::string const, boost::variant<std::string, bool, long, double, std::vector<std::string, std::allocator<std::string> >, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> > > >*, std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >&) ()
#14 0x00007f9b1b06d237 in AdminSocket::do_accept() ()
#15 0x00007f9b1b06f560 in AdminSocket::entry() ()
#16 0x00007f9b1910edc5 in start_thread () from /lib64/libpthread.so.0
#17 0x00007f9b179b528d in clone () from /lib64/libc.so.6

Thread 20 (Thread 0x7f9b1437b700 (LWP 17636)):
#0 0x00007f9b191126d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f9b1b063c97 in SafeTimer::timer_thread() ()
#2 0x00007f9b1b06529d in SafeTimerThread::entry() ()
#3 0x00007f9b1910edc5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f9b179b528d in clone () from /lib64/libc.so.6

Thread 19 (Thread 0x7f9b13b7a700 (LWP 17637)):
#0 0x00007f9b191126d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f9b1b052eff in SimpleMessenger::reaper_entry() ()
#2 0x00007f9b1b05aacd in SimpleMessenger::ReaperThread::entry() ()
#3 0x00007f9b1910edc5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f9b179b528d in clone () from /lib64/libc.so.6

Thread 18 (Thread 0x7f9b13379700 (LWP 17638)):
#0 0x00007f9b191126d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f9b1b052eff in SimpleMessenger::reaper_entry() ()
#2 0x00007f9b1b05aacd in SimpleMessenger::ReaperThread::entry() ()
#3 0x00007f9b1910edc5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f9b179b528d in clone () from /lib64/libc.so.6

Thread 17 (Thread 0x7f9b12b78700 (LWP 17639)):
#0 0x00007f9b191126d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f9b1b052eff in SimpleMessenger::reaper_entry() ()
#2 0x00007f9b1b05aacd in SimpleMessenger::ReaperThread::entry() ()
#3 0x00007f9b1910edc5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f9b179b528d in clone () from /lib64/libc.so.6

Thread 16 (Thread 0x7f9b12377700 (LWP 17640)):
#0 0x00007f9b191126d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f9b1b052eff in SimpleMessenger::reaper_entry() ()
#2 0x00007f9b1b05aacd in SimpleMessenger::ReaperThread::entry() ()
#3 0x00007f9b1910edc5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f9b179b528d in clone () from /lib64/libc.so.6

Thread 15 (Thread 0x7f9b11b76700 (LWP 17641)):
#0 0x00007f9b191126d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f9b1b052eff in SimpleMessenger::reaper_entry() ()
#2 0x00007f9b1b05aacd in SimpleMessenger::ReaperThread::entry() ()
#3 0x00007f9b1910edc5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f9b179b528d in clone () from /lib64/libc.so.6

Thread 14 (Thread 0x7f9b11375700 (LWP 17642)):
#0 0x00007f9b191126d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f9b1b052eff in SimpleMessenger::reaper_entry() ()
#2 0x00007f9b1b05aacd in SimpleMessenger::ReaperThread::entry() ()
#3 0x00007f9b1910edc5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f9b179b528d in clone () from /lib64/libc.so.6

Thread 13 (Thread 0x7f9b10b74700 (LWP 17643)):
#0 0x00007f9b19114f4d in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f9b19110d02 in _L_lock_791 () from /lib64/libpthread.so.0
#2 0x00007f9b19110c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007f9b1b02b638 in Mutex::Lock(bool) ()
#4 0x00007f9b1b0637d2 in SafeTimer::timer_thread() ()
#5 0x00007f9b1b06529d in SafeTimerThread::entry() ()
#6 0x00007f9b1910edc5 in start_thread () from /lib64/libpthread.so.0
#7 0x00007f9b179b528d in clone () from /lib64/libc.so.6

Thread 12 (Thread 0x7f9b10373700 (LWP 17644)):
#0 0x00007f9b191126d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f9b1b063c97 in SafeTimer::timer_thread() ()
#2 0x00007f9b1b06529d in SafeTimerThread::entry() ()
#3 0x00007f9b1910edc5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f9b179b528d in clone () from /lib64/libc.so.6

Thread 11 (Thread 0x7f9b0fb72700 (LWP 17645)):
#0 0x00007f9b191126d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f9b1b063c97 in SafeTimer::timer_thread() ()
#2 0x00007f9b1b06529d in SafeTimerThread::entry() ()
#3 0x00007f9b1910edc5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f9b179b528d in clone () from /lib64/libc.so.6

Thread 10 (Thread 0x7f9b0f371700 (LWP 17646)):
#0 0x00007f9b191126d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f9b1adebbbf in WBThrottle::get_next_should_flush(boost::tuples::tuple<ghobject_t, std::shared_ptr<FDCache::FD>, WBThrottle::PendingWB, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type>*) ()
#2 0x00007f9b1adec242 in WBThrottle::entry() ()
#3 0x00007f9b1910edc5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f9b179b528d in clone () from /lib64/libc.so.6

Thread 9 (Thread 0x7f9b0eb70700 (LWP 17647)):
#0 0x00007f9b198ba573 in base::internal::SpinLockDelay(int volatile*, int, int) () from /lib64/libtcmalloc.so.4
#1 0x00007f9b198ba447 in SpinLock::SlowLock() () from /lib64/libtcmalloc.so.4
#2 0x00007f9b198ab078 in tcmalloc::CentralFreeList::Populate() () from /lib64/libtcmalloc.so.4
#3 0x00007f9b198ab148 in tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**) () from /lib64/libtcmalloc.so.4
#4 0x00007f9b198ab1dd in tcmalloc::CentralFreeList::RemoveRange(void**, void**, int) () from /lib64/libtcmalloc.so.4
#5 0x00007f9b198ae235 in tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long) () from /lib64/libtcmalloc.so.4
#6 0x00007f9b198be718 in tc_new () from /lib64/libtcmalloc.so.4
#7 0x00007f9b1afc13b4 in ceph::log::Log::create_entry(int, int) ()
#8 0x00007f9b1ad236d6 in FileStore::sync_entry() ()
#9 0x00007f9b1ad4c45d in FileStore::SyncThread::entry() ()
#10 0x00007f9b1910edc5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f9b179b528d in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f9b0e36f700 (LWP 17648)):
#0 0x00007f9b19114f4d in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f9b19110d1d in _L_lock_840 () from /lib64/libpthread.so.0
#2 0x00007f9b19110c3a in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007f9b1a57b029 in tls_get_addr_tail () from /lib64/ld-linux-x86-64.so.2
#4 0x00007f9b198ba849 in GetStackTrace_libunwind(void**, int, int) () from /lib64/libtcmalloc.so.4
#5 0x00007f9b198bb0be in GetStackTrace(void**, int, int) () from /lib64/libtcmalloc.so.4
#6 0x00007f9b198ac314 in tcmalloc::PageHeap::GrowHeap(unsigned long) () from /lib64/libtcmalloc.so.4
#7 0x00007f9b198ac633 in tcmalloc::PageHeap::New(unsigned long) () from /lib64/libtcmalloc.so.4
#8 0x00007f9b198aaf64 in tcmalloc::CentralFreeList::Populate() () from /lib64/libtcmalloc.so.4
#9 0x00007f9b198ab148 in tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**) () from /lib64/libtcmalloc.so.4
#10 0x00007f9b198ab1dd in tcmalloc::CentralFreeList::RemoveRange(void**, void**, int) () from /lib64/libtcmalloc.so.4
#11 0x00007f9b198ae235 in tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long) () from /lib64/libtcmalloc.so.4
#12 0x00007f9b198be718 in tc_new () from /lib64/libtcmalloc.so.4
#13 0x00007f9b1afc13b4 in ceph::log::Log::create_entry(int, int) ()
#14 0x00007f9b1af0c9f7 in FileJournal::do_write(ceph::buffer::list&) ()
#15 0x00007f9b1af0e5dd in FileJournal::write_thread_entry() ()
#16 0x00007f9b1ad442bd in FileJournal::Writer::entry() ()
#17 0x00007f9b1910edc5 in start_thread () from /lib64/libpthread.so.0
#18 0x00007f9b179b528d in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f9b0db6e700 (LWP 17649)):
#0 0x00007f9b191126d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f9b1afa9bbc in Finisher::finisher_thread_entry() ()
#2 0x00007f9b1910edc5 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f9b179b528d in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f9b0d36d700 (LWP 17650)):
#0 0x00007f9b19112a82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f9b1b06b2b2 in ThreadPool::worker(ThreadPool::WorkThread*) ()
#2 0x00007f9b1b06c530 in ThreadPool::WorkThread::entry() ()
#3 0x00007f9b1910edc5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f9b179b528d in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f9b0cb6c700 (LWP 17651)):
#0 0x00007f9b19112a82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f9b1b06b2b2 in ThreadPool::worker(ThreadPool::WorkThread*) ()
#2 0x00007f9b1b06c530 in ThreadPool::WorkThread::entry() ()
#3 0x00007f9b1910edc5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f9b179b528d in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f9b0c36b700 (LWP 17652)):
#0 0x00007f9b191126d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f9b1afa9bbc in Finisher::finisher_thread_entry() ()
#2 0x00007f9b1910edc5 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f9b179b528d in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f9b0bb6a700 (LWP 17653)):
#0 0x00007f9b191126d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f9b1afa9bbc in Finisher::finisher_thread_entry() ()
#2 0x00007f9b1910edc5 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f9b179b528d in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f9b0b369700 (LWP 17654)):
#0 0x00007f9b191126d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f9b1b063c97 in SafeTimer::timer_thread() ()
#2 0x00007f9b1b06529d in SafeTimerThread::entry() ()
#3 0x00007f9b1910edc5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f9b179b528d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f9b1a781900 (LWP 17625)):
#0 0x00007f9b198ba573 in base::internal::SpinLockDelay(int volatile*, int, int) () from /lib64/libtcmalloc.so.4
#1 0x00007f9b198ba447 in SpinLock::SlowLock() () from /lib64/libtcmalloc.so.4
#2 0x00007f9b198ab078 in tcmalloc::CentralFreeList::Populate() () from /lib64/libtcmalloc.so.4
#3 0x00007f9b198ab148 in tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**) () from /lib64/libtcmalloc.so.4
#4 0x00007f9b198ab1dd in tcmalloc::CentralFreeList::RemoveRange(void**, void**, int) () from /lib64/libtcmalloc.so.4
#5 0x00007f9b198ae235 in tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long) () from /lib64/libtcmalloc.so.4
#6 0x00007f9b198bd31b in tc_calloc () from /lib64/libtcmalloc.so.4
#7 0x00007f9b1a58a9ae in _dl_check_map_versions () from /lib64/ld-linux-x86-64.so.2
#8 0x00007f9b1a58dd36 in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#9 0x00007f9b1a5891b4 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#10 0x00007f9b1a58d1ab in _dl_open () from /lib64/ld-linux-x86-64.so.2
#11 0x00007f9b18f0402b in dlopen_doit () from /lib64/libdl.so.2
#12 0x00007f9b1a5891b4 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#13 0x00007f9b18f0462d in _dlerror_run () from /lib64/libdl.so.2
#14 0x00007f9b18f040c1 in dlopen@@GLIBC_2.2.5

#15 Updated by Samuel Just over 1 year ago

Thread 8 (Thread 0x7f9b0e36f700 (LWP 17648)):
#0 0x00007f9b19114f4d in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f9b19110d1d in _L_lock_840 () from /lib64/libpthread.so.0
#2 0x00007f9b19110c3a in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007f9b1a57b029 in tls_get_addr_tail () from /lib64/ld-linux-x86-64.so.2
#4 0x00007f9b198ba849 in GetStackTrace_libunwind(void**, int, int) () from /lib64/libtcmalloc.so.4
#5 0x00007f9b198bb0be in GetStackTrace(void**, int, int) () from /lib64/libtcmalloc.so.4
#6 0x00007f9b198ac314 in tcmalloc::PageHeap::GrowHeap(unsigned long) () from /lib64/libtcmalloc.so.4
#7 0x00007f9b198ac633 in tcmalloc::PageHeap::New(unsigned long) () from /lib64/libtcmalloc.so.4
#8 0x00007f9b198aaf64 in tcmalloc::CentralFreeList::Populate() () from /lib64/libtcmalloc.so.4
#9 0x00007f9b198ab148 in tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**) () from /lib64/libtcmalloc.so.4
#10 0x00007f9b198ab1dd in tcmalloc::CentralFreeList::RemoveRange(void**, void**, int) () from /lib64/libtcmalloc.so.4
#11 0x00007f9b198ae235 in tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long) () from /lib64/libtcmalloc.so.4
#12 0x00007f9b198be718 in tc_new () from /lib64/libtcmalloc.so.4
#13 0x00007f9b1afc13b4 in ceph::log::Log::create_entry(int, int) ()
#14 0x00007f9b1af0c9f7 in FileJournal::do_write(ceph::buffer::list&) ()
#15 0x00007f9b1af0e5dd in FileJournal::write_thread_entry() ()
#16 0x00007f9b1ad442bd in FileJournal::Writer::entry() ()
#17 0x00007f9b1910edc5 in start_thread () from /lib64/libpthread.so.0
#18 0x00007f9b179b528d in clone () from /lib64/libc.so.6

So it failed to allocate a logging message entry?

#16 Updated by Samuel Just over 1 year ago

http://osxr.org:8080/glibc/source/elf/dl-tls.c?v=glibc-2.17#0696

#0 0x00007f9b198ba573 in base::internal::SpinLockDelay(int volatile*, int, int) () from /lib64/libtcmalloc.so.4
#1 0x00007f9b198ba447 in SpinLock::SlowLock() () from /lib64/libtcmalloc.so.4
#2 0x00007f9b198ab078 in tcmalloc::CentralFreeList::Populate() () from /lib64/libtcmalloc.so.4
#3 0x00007f9b198ab148 in tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**) () from /lib64/libtcmalloc.so.4
#4 0x00007f9b198ab1dd in tcmalloc::CentralFreeList::RemoveRange(void**, void**, int) () from /lib64/libtcmalloc.so.4
#5 0x00007f9b198ae235 in tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long) () from /lib64/libtcmalloc.so.4
#6 0x00007f9b198bd31b in tc_calloc () from /lib64/libtcmalloc.so.4
#7 0x00007f9b1a58a9ae in _dl_check_map_versions () from /lib64/ld-linux-x86-64.so.2
#8 0x00007f9b1a58dd36 in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#9 0x00007f9b1a5891b4 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#10 0x00007f9b1a58d1ab in _dl_open () from /lib64/ld-linux-x86-64.so.2
#11 0x00007f9b18f0402b in dlopen_doit () from /lib64/libdl.so.2
#12 0x00007f9b1a5891b4 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#13 0x00007f9b18f0462d in _dlerror_run () from /lib64/libdl.so.2
#14 0x00007f9b18f040c1 in dlopen@@GLIBC_2.2.5 () from /lib64/libdl.so.2
#15 0x00007f9b1ab3f3fb in ClassHandler::_load_class(ClassHandler::ClassData*) ()
#16 0x00007f9b1ab3f7ff in ClassHandler::open_class(std::string const&, ClassHandler::ClassData**) ()
#17 0x00007f9b1ab3fad1 in ClassHandler::open_all_classes() ()
#18 0x00007f9b1aadc34c in OSD::init() ()
#19 0x00007f9b1aa5e1c8 in main ()

Suspicious, tls_get_addr_tail has special handling for when it's called concurrently with dlopen.

#17 Updated by Samuel Just over 1 year ago

glibc is 2.17

#18 Updated by Samuel Just over 1 year ago

This looks like a circular lock problem between glibc (dlopen trying to call into tcmalloc) and tcmalloc (calling into glibc via libunwind and blocking on state held by dlopen in tls_get_addr_tail?).

#19 Updated by Samuel Just over 1 year ago

https://github.com/gperftools/gperftools/releases

I guess in the new version of gperftools they have an --enable-emergency-malloc configure flag.

#20 Updated by Samuel Just over 1 year ago

Hmm, more information.

#0 0x00007f9b19114f4d in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f9b19110d1d in _L_lock_840 () from /lib64/libpthread.so.0
#2 0x00007f9b19110c3a in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007f9b1a57b029 in tls_get_addr_tail () from /lib64/ld-linux-x86-64.so.2
#4 0x00007f9b198ba849 in GetStackTrace_libunwind(void**, int, int) () from /lib64/libtcmalloc.so.4
#5 0x00007f9b198bb0be in GetStackTrace(void**, int, int) () from /lib64/libtcmalloc.so.4
#6 0x00007f9b198ac314 in tcmalloc::PageHeap::GrowHeap(unsigned long) () from /lib64/libtcmalloc.so.4
#7 0x00007f9b198ac633 in tcmalloc::PageHeap::New(unsigned long) () from /lib64/libtcmalloc.so.4

It looks like a crash, but the call to GetStackTrace is in page_heap.cc::RecordGrowth, which appears to be informational, not a crash (it's called in the normal path of PageHeap::GrowHeap). So, tcmalloc here is just fine. It continues to look like a deadlock between threads 8 and 1. Probably time to get a libc master on this.

#21 Updated by Samuel Just over 1 year ago

  • Project changed from teuthology to Ceph
  • Subject changed from openstack: SSHException: Key-exchange timed out to Apparent deadlock between tcmalloc getting a stacktrace and dlopen allocating memory

#22 Updated by Loic Dachary over 1 year ago

  • Description updated (diff)

#23 Updated by Carlos O'Donell over 1 year ago

Samuel Just wrote:

Hmm, more information.

#0 0x00007f9b19114f4d in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f9b19110d1d in _L_lock_840 () from /lib64/libpthread.so.0
#2 0x00007f9b19110c3a in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007f9b1a57b029 in tls_get_addr_tail () from /lib64/ld-linux-x86-64.so.2
#4 0x00007f9b198ba849 in GetStackTrace_libunwind(void**, int, int) () from /lib64/libtcmalloc.so.4
#5 0x00007f9b198bb0be in GetStackTrace(void**, int, int) () from /lib64/libtcmalloc.so.4
#6 0x00007f9b198ac314 in tcmalloc::PageHeap::GrowHeap(unsigned long) () from /lib64/libtcmalloc.so.4
#7 0x00007f9b198ac633 in tcmalloc::PageHeap::New(unsigned long) () from /lib64/libtcmalloc.so.4

It looks like a crash, but the call to GetStackTrace is in page_heap.cc::RecordGrowth, which appears to be informational, not a crash (it's called in the normal path of PageHeap::GrowHeap). So, tcmalloc here is just fine. It continues to look like a deadlock between threads 8 and 1. Probably time to get a libc master on this.

I'm a senior upstream glibc developer and project steward, and I'll try to provide some detail here and some possible solutions. I am neither a Ceph nor a TCMalloc expert, but some friends have asked me to comment here and I'm very happy to do that. The glibc community is here to help :-)

The deadlock in tcmalloc is most likely because tcmalloc fails to initialize all TLS variables which are required for logging, and this creates a deadlock between the dynamic loader loading a shared library and tcmalloc's own internal spinlocks.

There are many intricate details of how the implementation supports thread local storage, lazy allocation, and dynamic library loading that must all be understood in order to implement a robust and high performance memory allocator. There is no guarantee that all C/C++ code can be run from a malloc given the caller's context. Malloc is used by the entire runtime and may be called in places where it is not safe to dynamically load other libraries (tcmalloc expects this via assert/logging dlopen) or even call other C runtime functions (tcmalloc appears to call anything in the C/C++ runtime). It is sufficiently complex that upstream we have coined the term "synchronously-reentrant safe" (SR-safe) to refer to those functions which malloc might be able to call safely. We have discussed upstream that libdl's entire API should be SR-safe, to allow interposed mallocs to load libraries that contain helper routines, but nothing further is set in stone. Consider that malloc itself is trying to call back into the same runtime that provides its own services. Extreme care must be taken.

In this particular case it's a lock ordering issue and tcmalloc needs fixing to avoid this problematic ordering.

At a high level I believe the deadlock is this:

T1
-> dlopen (hold the load lock for shared library loading consistency)
-> new
T8
-> new
-> tcmalloc::CentralFreeList (lock the free list spinlock)
T1
-> tcmalloc:CentralFreeList (wait on free list spinlock)
T8
-> tls_get_addr_tail (initialize TLS variable)
This TLS variable is likely from tcmalloc (lazy initialized) or libunwind.
-> pthread_mutex_lock (wait on dlopen to finish).
TLS lazy initialization is waiting for the in-progress
dlopen which may impact the correctness of the TLS variable placement.

T1 is waiting for the free list spinlock while holding the dlopen lock.
T8 is waiting for the dlopen lock while holding the free list spinlock.

I can't reproduce this because in all of my tests RecordGrowth is called early on in the application execution and thus initializes the required TLS variables before other threads run. Something about your mode of execution here means RecordGrowth is not called early. The race window between the two racing threads is quite small, but given enough attempts you'll hit it.

In summary:

  • In tcmalloc it is unsafe to call dlopen and a memory allocation in parallel for the first time from different threads when the allocation would grow the PageHeap (and call RecordGrowth). Subsequent parallel calls are safe as the lazy TLS initialization, part of the logging, is complete and has no dependencies on a consistent dynamic loader state.

Workarounds:

(1) Initialize TLS variables early.
(1a) Ensure that you load and touch all TLS variables required by the implementation before allowing anything else to proceed e.g. calling the logging early just to exercise the code paths required.
(1b) Figure out why RecordGrowth is not called early under the singular starting thread before other threads are started (large thread cache size?), and find a way to get it called.

(2) Simplify the logging in tcmalloc.
(2a) Don't collect backtraces. Simplify RecordGrowth by removing the backtrace recording.
(2b) Careful with backtraces during error conditions. Simplify the ASSERT macro used in GrowHeap. These will also deadlock.
(2c) Avoid `static __thread int recursive;` in gperftools/src/stacktrace_libunwind-inl.h to avoid the TLS initialization.

I would take this upstream to gperftools to talk about the problem.

In glibc's malloc we avoid these problems by using IE-model TLS variables, which need no dynamic initialization (a limited global distribution-wide resource), and by limiting and working with the developers of the libraries we load (gcc's libgcc_s.so has no TLS and is used for backtracing).

In the future glibc is going to try to make it easier for users to write custom allocators, particularly by avoiding deferred TLS allocations (doing them at dlopen time). This is desirable because it makes first use of TLS variables async-signal-safe (a problem which we've tried to correct once already), and dlopen can return reasonable errors about out-of-memory conditions (for which there is no standards-compliant way when a variable is accessed). We haven't done this yet because it's not easy. The core dynamic loader is at the bottom of a very large software stack and we need to make sure everything keeps working as expected. Until then, you have to understand some intricate details of the runtime APIs you are interposing. Even then we'll likely have a dlopen flag to revert to the old lazy-allocation behaviour, so both must be supported for legacy applications.

#24 Updated by Aliaksei Kandratsenka over 1 year ago

Hi. Upstream gperftools maintainer here.

Indeed, it looks like the issue is that the thread doing dlopen took some glibc lock, called into tcmalloc, and is waiting for one of the central free list locks. So far that's fine. Meanwhile, another thread took the central free list lock and triggered heap growth sampling as part of requesting more memory from the kernel. So it is capturing a backtrace, and as part of the TLS usage there it tries to take the glibc lock held by the first thread.

One quick recommendation is to link to libtcmalloc_minimal rather than libtcmalloc. The former is just malloc; the latter adds support for malloc sampling, heap growth sampling, the heap checker, the heap profiler, etc. Only the full malloc ever captures backtraces. So if you only need a fast malloc and nothing else, just link to tcmalloc_minimal; it is a tiny bit faster too. Note that, with the exception of this seemingly easily fixable bug, our stack trace capturing in libtcmalloc is believed to be safe (but stack trace capturing in our CPU profiler can still occasionally deadlock for reasons unrelated to this ticket).

It looks like the issue is indeed that the 'recursive' variable isn't initial-exec like the rest of our TLS variables. So I agree with Carlos' opinion above. This is easy to fix. Created a ticket at: https://github.com/gperftools/gperftools/issues/786

Also if you have any issues with tcmalloc, feel free to escalate to us.

#25 Updated by Sage Weil over 1 year ago

  • Status changed from Verified to Pending Upstream
  • Priority changed from Normal to Urgent

#26 Updated by Yuri Weinstein over 1 year ago

Possibly the same problem:
http://qa-proxy.ceph.com/teuthology/teuthology-2016-04-11_02:10:01-upgrade:hammer-x-jewel-distro-basic-vps/120647/teuthology.log

2016-04-11T04:06:13.432 INFO:tasks.ceph.mon.b:Started
2016-04-11T04:06:13.433 INFO:tasks.ceph:Waiting until ceph osds are all up...
2016-04-11T04:06:13.433 INFO:teuthology.orchestra.run.vpm100:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd dump --format=json'
2016-04-11T04:06:14.671 INFO:tasks.ceph.mon.b.vpm100.stdout:starting mon.b rank 1 at 172.21.2.100:6790/0 mon_data /var/lib/ceph/mon/ceph-b fsid 7afdb138-0cd5-44be-bfdd-49823a70e523
2016-04-11T04:06:15.181 INFO:teuthology.misc.health.vpm100.stderr:2016-04-11 11:06:15.180668 7fde3fbe1700 -1 asok(0x7fde38001680) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/ceph-client.admin.10889.asok': (13) Permission denied
2016-04-11T04:06:15.345 DEBUG:teuthology.misc:4 of 6 OSDs are up

#27 Updated by Sage Weil over 1 year ago

#28 Updated by Haomai Wang over 1 year ago

We recently hit this bug frequently on centos7. The ceph QA lab seems to hit this problem only rarely; could it be related to the tcmalloc or libunwind version?

#29 Updated by Loic Dachary over 1 year ago

  • Assignee deleted (Loic Dachary)

#30 Updated by Haomai Wang over 1 year ago

After switching to tcmalloc_minimal, no deadlock has happened again.

#31 Updated by Sage Weil about 1 year ago

hit this on smithi015 (centos7). gperftools-libs-2.4-7.el7.x86_64

#32 Updated by Ken Dreyer about 1 year ago

I've opened https://bugzilla.redhat.com/1339710 to request that the RHEL 7 gperftools maintainers take that patch ( https://github.com/gperftools/gperftools/commit/7852eeb75b9375cf52a7da01be044da6e915dd08 ) in their downstream package.

#33 Updated by Loic Dachary about 1 year ago

Daniel reports this happens fairly frequently on RHEL VMs. Although we could time out ceph-osd mkfs and retry, I'm not sure how to assert that it took too long because of this bug rather than because of something else. Is it reasonable to assume that ceph-osd mkfs can only time out because of this bug? At least until https://bugzilla.redhat.com/show_bug.cgi?id=1339710 is resolved.

#34 Updated by Loic Dachary about 1 year ago

When ceph-osd --mkfs blocks because of this, https://github.com/ceph/ceph/pull/9343 may be an acceptable workaround

#35 Updated by Loic Dachary about 1 year ago

  • Related to Bug #16103: ceph-disk: workaround gperftool hang added

#36 Updated by Ken Dreyer 9 months ago

  • Needs Doc set to No

RHBZ #1339710 is resolved in RHEL 7.3, and CentOS 7.3 (in progress) will have the updated gperftools as well.

#37 Updated by Sage Weil 9 months ago

This appears to trigger very frequently with ceph-objectstore-tool. :(

#38 Updated by Ken Dreyer 9 months ago

David copied CentOS CR's gperftools packages to lab-extras today in order to test. (see ticket #18094)

#39 Updated by Sage Weil 9 months ago

  • Priority changed from Urgent to Immediate

quick way to tell if this is (probably) the root cause:

grep ceph-objectstore-tool teuthology.log
verify that the last machine is centos

#40 Updated by Sage Weil 9 months ago

Just saw this on xenial! oh goodie.

ii  libgoogle-perftools4                      2.4-0ubuntu5                             amd64        libraries for CPU and heap analysis, plus an efficient thread-caching malloc
ii  libtcmalloc-minimal4                      2.4-0ubuntu5                             amd64        efficient thread-caching malloc

#42 Updated by Sage Weil 8 months ago

  • Priority changed from Immediate to Urgent

#43 Updated by Sage Weil 8 months ago

  • Priority changed from Urgent to Immediate

http://pulpito.ceph.com/sage-2016-12-20_03:05:39-rados-wip-sage-testing---basic-smithi/

8 hung jobs out of 236 on my last rados run! :(

xenial and centos both.

#44 Updated by Sage Weil 6 months ago

still see this on xenial: /a/sage-2017-02-14_06:55:13-rados-wip-pg-split-interval---basic-smithi/813242

#45 Updated by Nathan Cutler 6 months ago

That this is occurring on xenial is disturbing, because it seems to have gperftools 2.4:

(virtualenv) ubuntu@teuthology:~$ apt search libgoogle-perftools4
Sorting... Done
Full Text Search... Done
libgoogle-perftools4/xenial,now 2.4-0ubuntu5 amd64 [installed,automatic]
  libraries for CPU and heap analysis, plus an efficient thread-caching malloc

#46 Updated by Sage Weil 6 months ago

  • Priority changed from Immediate to Urgent

#47 Updated by Sage Weil 4 months ago

  • Status changed from Pending Upstream to Resolved

#48 Updated by Sage Weil 4 months ago

  • Status changed from Resolved to Pending Upstream

just saw this again on xenial:

Thread 4 (Thread 0x7fb16d3af700 (LWP 93913)):
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007fb170e81e92 in __GI___pthread_mutex_lock (mutex=0x7fb1729bc948 <_rtld_global+2312>) at ../nptl/pthread_mutex_lock.c:115
#2  0x00007fb1727a8eed in tls_get_addr_tail (ti=0x7fb171c83c30, dtv=0x55bd3d224490, the_map=0x7fb1729a6000) at dl-tls.c:765
#3  0x00007fb171a6d459 in ?? () from /usr/lib/libtcmalloc.so.4
#4  0x00007fb171a6dcee in GetStackTrace(void**, int, int) () from /usr/lib/libtcmalloc.so.4
#5  0x00007fb171a5f0c0 in tcmalloc::PageHeap::GrowHeap(unsigned long) () from /usr/lib/libtcmalloc.so.4
#6  0x00007fb171a5f423 in tcmalloc::PageHeap::New(unsigned long) () from /usr/lib/libtcmalloc.so.4
#7  0x00007fb171a5dd34 in tcmalloc::CentralFreeList::Populate() () from /usr/lib/libtcmalloc.so.4
#8  0x00007fb171a5df28 in tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**) () from /usr/lib/libtcmalloc.so.4
#9  0x00007fb171a5dfbf in tcmalloc::CentralFreeList::RemoveRange(void**, void**, int) () from /usr/lib/libtcmalloc.so.4
#10 0x00007fb171a60faa in tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long) () from /usr/lib/libtcmalloc.so.4
#11 0x00007fb171a52289 in ?? () from /usr/lib/libtcmalloc.so.4
#12 0x00007fb171a72d52 in tc_posix_memalign () from /usr/lib/libtcmalloc.so.4
#13 0x000055bd32dd3eb5 in ceph::buffer::list::append(char const*, unsigned int) ()
#14 0x000055bd32df9656 in GetdescsHook::call(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, boost::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::vector<long, std::allocator<long> > >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, boost::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::vector<long, std::allocator<long> > > > > >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::list&) ()
#15 0x000055bd32df632b in AdminSocket::do_accept() ()
#16 0x000055bd32df78d8 in AdminSocket::entry() ()
#17 0x00007fb170e7f70a in start_thread (arg=0x7fb16d3af700) at pthread_create.c:333
#18 0x00007fb16fef682d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 3 (Thread 0x7fb16dbb0700 (LWP 93912)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
#1  0x000055bd32efe1f0 in CephContextServiceThread::entry() ()
#2  0x00007fb170e7f70a in start_thread (arg=0x7fb16dbb0700) at pthread_create.c:333
#3  0x00007fb16fef682d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 2 (Thread 0x7fb16eb4c700 (LWP 93911)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x000055bd32e1fc7b in ceph::logging::Log::entry() ()
#2  0x00007fb170e7f70a in start_thread (arg=0x7fb16eb4c700) at pthread_create.c:333
#3  0x00007fb16fef682d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 1 (Thread 0x7fb17299bc80 (LWP 93910)):
#0  0x00007fb171a6d159 in base::internal::SpinLockDelay(int volatile*, int, int) () from /usr/lib/libtcmalloc.so.4
#1  0x00007fb171a6d026 in SpinLock::SlowLock() () from /usr/lib/libtcmalloc.so.4
#2  0x00007fb171a5de38 in tcmalloc::CentralFreeList::Populate() () from /usr/lib/libtcmalloc.so.4
#3  0x00007fb171a5df28 in tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**) () from /usr/lib/libtcmalloc.so.4
#4  0x00007fb171a5dfbf in tcmalloc::CentralFreeList::RemoveRange(void**, void**, int) () from /usr/lib/libtcmalloc.so.4
#5  0x00007fb171a60faa in tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long) () from /usr/lib/libtcmalloc.so.4
#6  0x00007fb171a72a4b in tc_calloc () from /usr/lib/libtcmalloc.so.4
#7  0x00007fb1727a04fa in do_lookup_unique (undef_map=0x55bd3d1c9700, ref=0x7fb16c5a9028, strtab=0x55bd324ca418 "", sym=<optimized out>, type_class=4, result=0x7ffc82b1bf80, map=0x7fb1729bd168, new_hash=2818826075, 
    undef_name=0x7fb16c6331b5 "_ZZN5boost9function1IvdE9assign_toINS_3_bi6bind_tIvNS_4_mfi3mf1IvN11json_spirit16Semantic_actionsINS7_10Value_implINS7_10Config_mapINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEEENS_6spirit7c"...) at dl-lookup.c:268
#8  do_lookup_x (undef_name=undef_name@entry=0x7fb16c6331b5 "_ZZN5boost9function1IvdE9assign_toINS_3_bi6bind_tIvNS_4_mfi3mf1IvN11json_spirit16Semantic_actionsINS7_10Value_implINS7_10Config_mapINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEEENS_6spirit7c"..., new_hash=new_hash@entry=2818826075, 
    old_hash=old_hash@entry=0x7ffc82b1bf70, ref=0x7fb16c5a9028, result=result@entry=0x7ffc82b1bf80, scope=<optimized out>, i=<optimized out>, version=0x0, flags=1, skip=0x0, type_class=4, undef_map=0x55bd3d1c9700) at dl-lookup.c:540
#9  0x00007fb1727a094f in _dl_lookup_symbol_x (undef_name=0x7fb16c6331b5 "_ZZN5boost9function1IvdE9assign_toINS_3_bi6bind_tIvNS_4_mfi3mf1IvN11json_spirit16Semantic_actionsINS7_10Value_implINS7_10Config_mapINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEEENS_6spirit7c"..., 
    undef_map=undef_map@entry=0x55bd3d1c9700, ref=ref@entry=0x7ffc82b1c0d0, symbol_scope=symbol_scope@entry=0x55bd3d1c9a58, version=0x0, type_class=4, flags=1, skip_map=0x0) at dl-lookup.c:829
#10 0x00007fb1727a25ad in elf_machine_rela (skip_ifunc=0, reloc_addr_arg=0x7fb16c9616a8, version=<optimized out>, sym=0x7fb16c5a9028, reloc=0x7fb16c669a70, map=0x55bd3d1c9700) at ../sysdeps/x86_64/dl-machine.h:301
#11 elf_dynamic_do_Rela (skip_ifunc=0, lazy=<optimized out>, nrelative=<optimized out>, relsize=<optimized out>, reladdr=<optimized out>, map=0x55bd3d1c9700) at do-rel.h:137
#12 _dl_relocate_object (scope=<optimized out>, reloc_mode=reloc_mode@entry=0, consider_profiling=<optimized out>, consider_profiling@entry=0) at dl-reloc.c:258
#13 0x00007fb1727ab681 in dl_open_worker (a=a@entry=0x7ffc82b1c3a0) at dl-open.c:435
#14 0x00007fb1727a6394 in _dl_catch_error (objname=objname@entry=0x7ffc82b1c390, errstring=errstring@entry=0x7ffc82b1c398, mallocedp=mallocedp@entry=0x7ffc82b1c38f, operate=operate@entry=0x7fb1727ab300 <dl_open_worker>, args=args@entry=0x7ffc82b1c3a0) at dl-error.c:187
#15 0x00007fb1727aabd9 in _dl_open (file=0x55bd3d16dcc0 "/usr/lib/ceph/erasure-code/libec_lrc.so", mode=-2147483646, 
    caller_dlopen=0x55bd3323b016 <ceph::ErasureCodePluginRegistry::load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ceph::ErasureCodePlugin**, std::ostream*)+406>, nsid=-2, 
    argc=<optimized out>, argv=<optimized out>, env=0x55bd3d07a000) at dl-open.c:660
#16 0x00007fb170c74f09 in dlopen_doit (a=a@entry=0x7ffc82b1c5d0) at dlopen.c:66
#17 0x00007fb1727a6394 in _dl_catch_error (objname=0x55bd3d07e0b0, errstring=0x55bd3d07e0b8, mallocedp=0x55bd3d07e0a8, operate=0x7fb170c74eb0 <dlopen_doit>, args=0x7ffc82b1c5d0) at dl-error.c:187
#18 0x00007fb170c75571 in _dlerror_run (operate=operate@entry=0x7fb170c74eb0 <dlopen_doit>, args=args@entry=0x7ffc82b1c5d0) at dlerror.c:163
#19 0x00007fb170c74fa1 in __dlopen (file=<optimized out>, mode=<optimized out>) at dlopen.c:87
#20 0x000055bd3323b016 in ceph::ErasureCodePluginRegistry::load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ceph::ErasureCodePlugin**, std::ostream*) ()
#21 0x000055bd3323b6b5 in ceph::ErasureCodePluginRegistry::preload(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::ostream*) ()
#22 0x000055bd32dce3f6 in global_init_preload_erasure_code(CephContext const*) ()
#23 0x000055bd327a7c41 in main ()
(gdb) q
A debugging session is active.

        Inferior 1 [process 93910] will be detached.

Quit anyway? (y or n) y
Detaching from program: /usr/bin/ceph-osd, process 93910
root@smithi088:~# dpkg -l | grep tcmal
ii  libtcmalloc-minimal4                      2.4-0ubuntu5                             amd64        efficient thread-caching malloc

#49 Updated by Sage Weil 4 months ago

The xenial package presumably does not backport gperftools commit 7852eeb75b9375cf52a7da01be044da6e915dd08, unlike the rhel/centos package.

#50 Updated by Sage Weil 3 months ago

  • Status changed from Pending Upstream to Resolved
<jamespage> sage: hey - that google-perftools fix is now in Xenial - hopefully that should resolve the testing issues you've been seeing!
<jamespage> (had a post summit blitz on SRU's this week)

#51 Updated by Sage Weil 3 months ago

just saw this on smithi081, which had 2.4-0ubuntu5, but apt update + apt install libgoogle-perftools4 upgraded to 2.4-0ubuntu5.16.04.1.
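
A quick way to check whether a node still has the broken package; the package name and fixed version string (2.4-0ubuntu5.16.04.1) are taken from the comments above, but the check itself is only an illustrative sketch:

```shell
# Hypothetical check: is the installed gperftools at least the fixed
# Ubuntu revision? Falls back to "0" if the package is absent.
installed=$(dpkg-query -W -f='${Version}' libgoogle-perftools4 2>/dev/null || echo 0)
fixed="2.4-0ubuntu5.16.04.1"
# sort -V orders version strings; if the fixed version sorts first,
# the installed one is >= fixed
lowest=$(printf '%s\n%s\n' "$installed" "$fixed" | sort -V | head -n1)
if [ "$lowest" = "$fixed" ]; then
    echo "OK: $installed includes the dlopen deadlock fix"
else
    echo "STALE: $installed predates $fixed; apt update && apt install libgoogle-perftools4"
fi
```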

#52 Updated by David Galloway 3 months ago

Sage Weil wrote:

just saw this on smithi081, which had 2.4-0ubuntu5, but apt update + apt install libgoogle-perftools4 upgraded to 2.4-0ubuntu5.16.04.1.

https://github.com/ceph/ceph-cm-ansible/pull/317

#53 Updated by David Galloway 3 months ago

David Galloway wrote:

Sage Weil wrote:

just saw this on smithi081, which had 2.4-0ubuntu5, but apt update + apt install libgoogle-perftools4 upgraded to 2.4-0ubuntu5.16.04.1.

https://github.com/ceph/ceph-cm-ansible/pull/317

Merged
