https://tracker.ceph.com/
https://tracker.ceph.com/favicon.ico
2013-01-11T09:37:33Z
Ceph
Ceph - Bug #3789: OSD core dump and down OSD on CentOS cluster
https://tracker.ceph.com/issues/3789?journal_id=15655
2013-01-11T09:37:33Z
Sage Weil
sage@newdream.net
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Need More Info</i></li></ul><p>Check dmesg, or VM responsiveness. This triggers when a call to sync(2) takes more than... 2 minutes? I forget how long. It's there as a safety for when the kernel or the underlying fs is hung.</p>
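<p>For reference, the watchdog described here matches the FileStore sync timeout: the cores attached later in this ticket abort in SyncEntryTimeout::finish() at os/FileStore.cc. As far as I can tell the interval is governed by the <code>filestore commit timeout</code> option; below is a sketch of a ceph.conf fragment for loosening it while debugging a slow underlying filesystem. The option name and the value are my assumption from the 0.56-era FileStore options, not something verified against this cluster:</p>

```
[osd]
    ; assumed knob behind the sync(2) watchdog Sage describes;
    ; raising it only hides a slow or hung filesystem, it does not fix it
    filestore commit timeout = 600
```

<p>Raising the timeout is a debugging aid only; if sync(2) is genuinely taking minutes, the underlying disk or kernel is the real problem.</p>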
Ceph - Bug #3789: OSD core dump and down OSD on CentOS cluster
https://tracker.ceph.com/issues/3789?journal_id=15668
2013-01-11T10:16:01Z
Anonymous
<ul></ul><p>backtrace of core.0.14401 from centos3: <br />Core was generated by `/usr/bin/ceph-osd -i 8 --pid-file /var/run/ceph/osd.8.pid -c /etc/ceph/ceph.con'.<br />Program terminated with signal 6, Aborted.<br />#0 0x00007faa9c2e13cb in raise () from /lib64/libpthread.so.0<br />Missing separate debuginfos, use: debuginfo-install ceph-0.56.1-0.el6.x86_64<br />(gdb) bt<br />#0 0x00007faa9c2e13cb in raise () from /lib64/libpthread.so.0<br />#1 0x000000000078c557 in reraise_fatal (signum=6) at global/signal_handler.cc:58<br />#2 handle_fatal_signal (signum=6) at global/signal_handler.cc:104<br />#3 <signal handler called><br />#4 0x00007faa9afae8a5 in raise () from /lib64/libc.so.6<br />#5 0x00007faa9afb0085 in abort () from /lib64/libc.so.6<br />#6 0x00007faa9b866a5d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib64/libstdc++.so.6<br />#7 0x00007faa9b864be6 in ?? () from /usr/lib64/libstdc++.so.6<br />#8 0x00007faa9b864c13 in std::terminate() () from /usr/lib64/libstdc++.so.6<br />#9 0x00007faa9b864d0e in __cxa_throw () from /usr/lib64/libstdc++.so.6<br />#10 0x0000000000837839 in ceph::__ceph_assert_fail (assertion=0x2da4d50 "\001", file=0x7faa8011b230 "\360\255\023\200\252\177", line=3294, func=0x9360c0 "virtual void SyncEntryTimeout::finish(int)")<br /> at common/assert.cc:77<br />#11 0x00000000007313ef in SyncEntryTimeout::finish (this=<value optimized out>, r=<value optimized out>) at os/FileStore.cc:3294<br />#12 0x000000000084f053 in SafeTimer::timer_thread (this=0x2dc6a68) at common/Timer.cc:105<br />#13 0x000000000085121d in SafeTimerThread::entry (this=<value optimized out>) at common/Timer.cc:38<br />#14 0x00007faa9c2d9851 in start_thread () from /lib64/libpthread.so.0<br />#15 0x00007faa9b06367d in clone () from /lib64/libc.so.6</p>
<p>dmesg output from centos3:</p>
<p>hrtimer: interrupt took 3666286 ns<br />INFO: task ceph-osd:14160 blocked for more than 120 seconds.<br />"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.<br />ceph-osd D 0000000000000000 0 14160 1 0x00000080<br /> ffff88001b4fbc98 0000000000000082 0000000000010287 ffff88001b4fbc00<br /> ffff88001b4fbc68 ffffffff810a45a0 ffff88001b4fbca0 ffff8800187da040<br /> ffff8800187da5f8 ffff88001b4fbfd8 000000000000fb88 ffff8800187da5f8<br />Call Trace:<br /> [<ffffffff810a45a0>] ? exit_robust_list+0x90/0x160<br /> [<ffffffff81070085>] exit_mm+0x95/0x180<br /> [<ffffffff810704cf>] do_exit+0x15f/0x870<br /> [<ffffffff81070c38>] do_group_exit+0x58/0xd0<br /> [<ffffffff81085866>] get_signal_to_deliver+0x1f6/0x460<br /> [<ffffffff8100a2d5>] do_signal+0x75/0x800<br /> [<ffffffff81060a83>] ? wake_up_new_task+0xd3/0x120<br /> [<ffffffff8106a873>] ? do_fork+0x133/0x460<br /> [<ffffffff810a6dbb>] ? sys_futex+0x7b/0x170<br /> [<ffffffff8100aaf0>] do_notify_resume+0x90/0xc0<br /> [<ffffffff8100b3c1>] int_signal+0x12/0x17<br />INFO: task ceph-osd:14163 blocked for more than 120 seconds.<br />"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.<br />ceph-osd D 0000000000000000 0 14163 1 0x00000080<br /> ffff88001b483c98 0000000000000082 ffffffff81ecb198 ffff88001b4a8080<br /> ffff88001b483c68 ffffffff810a45a0 0000000000000001 ffff88001b483c98<br /> ffff88001b4a8638 ffff88001b483fd8 000000000000fb88 ffff88001b4a8638<br />Call Trace:<br /> [<ffffffff810a45a0>] ? exit_robust_list+0x90/0x160<br /> [<ffffffff81070085>] exit_mm+0x95/0x180<br /> [<ffffffff810704cf>] do_exit+0x15f/0x870<br /> [<ffffffff8111673e>] ? generic_file_aio_write+0xbe/0xe0<br /> [<ffffffff81070c38>] do_group_exit+0x58/0xd0<br /> [<ffffffff81085866>] get_signal_to_deliver+0x1f6/0x460<br /> [<ffffffff8100a2d5>] do_signal+0x75/0x800<br /> [<ffffffff8121fd8b>] ? selinux_file_permission+0xfb/0x150<br /> [<ffffffff810a6dbb>] ? 
sys_futex+0x7b/0x170<br /> [<ffffffff8100aaf0>] do_notify_resume+0x90/0xc0<br /> [<ffffffff8100b3c1>] int_signal+0x12/0x17<br />INFO: task ceph-osd:14166 blocked for more than 120 seconds.<br />"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.<br />ceph-osd D 0000000000000000 0 14166 1 0x00000080<br /> ffff88001d885c98 0000000000000082 ffff88001d885c38 ffffffff814fe99e<br /> ffff88001d885c68 ffffffff810a45a0 ffff88001d885c68 ffff88001d885c48<br /> ffff88001ae19af8 ffff88001d885fd8 000000000000fb88 ffff88001ae19af8<br />Call Trace:<br /> [<ffffffff814fe99e>] ? __wait_on_bit+0x7e/0x90<br /> [<ffffffff810a45a0>] ? exit_robust_list+0x90/0x160<br /> [<ffffffff81070085>] exit_mm+0x95/0x180<br /> [<ffffffff810704cf>] do_exit+0x15f/0x870<br /> [<ffffffff81070c38>] do_group_exit+0x58/0xd0<br /> [<ffffffff81085866>] get_signal_to_deliver+0x1f6/0x460<br /> [<ffffffff8100a2d5>] do_signal+0x75/0x800<br /> [<ffffffff810097cc>] ? __switch_to+0x1ac/0x320<br /> [<ffffffff814fd830>] ? thread_return+0x4e/0x76e<br /> [<ffffffff8100aaf0>] do_notify_resume+0x90/0xc0<br /> [<ffffffff8100bb5c>] retint_signal+0x48/0x8c<br />INFO: task ceph-osd:14171 blocked for more than 120 seconds.<br />"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.<br />ceph-osd D 0000000000000000 0 14171 1 0x00000080<br /> ffff88001b4b3c98 0000000000000082 0000000000010287 ffff88001b4b3c00<br /> ffff88001b4b3c68 ffffffff810a45a0 ffffea0000000000 ffff88001b4b3b88<br /> ffff88000bb30638 ffff88001b4b3fd8 000000000000fb88 ffff88000bb30638<br />Call Trace:<br /> [<ffffffff810a45a0>] ? exit_robust_list+0x90/0x160<br /> [<ffffffff81070085>] exit_mm+0x95/0x180<br /> [<ffffffff810704cf>] do_exit+0x15f/0x870<br /> [<ffffffff81060280>] ? wake_up_state+0x10/0x20<br /> [<ffffffff81070c38>] do_group_exit+0x58/0xd0<br /> [<ffffffff81085866>] get_signal_to_deliver+0x1f6/0x460<br /> [<ffffffff8100a2d5>] do_signal+0x75/0x800<br /> [<ffffffff8100988e>] ? 
__switch_to+0x26e/0x320<br /> [<ffffffff814fd830>] ? thread_return+0x4e/0x76e<br /> [<ffffffff8100aaf0>] do_notify_resume+0x90/0xc0<br /> [<ffffffff8100b3c1>] int_signal+0x12/0x17<br />INFO: task ceph-osd:14417 blocked for more than 120 seconds.<br />"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.<br />ceph-osd D 0000000000000000 0 14417 1 0x00000080<br /> ffff88001b0ddc98 0000000000000082 ffffffff81ec9370 ffff88001b410080<br /> ffff88001b0ddc68 ffffffff810a45a0 ffff88001b0ddca0 ffff88001b410080<br /> ffff88001b410638 ffff88001b0ddfd8 000000000000fb88 ffff88001b410638<br />Call Trace:<br /> [<ffffffff810a45a0>] ? exit_robust_list+0x90/0x160<br /> [<ffffffff81070085>] exit_mm+0x95/0x180<br /> [<ffffffff810704cf>] do_exit+0x15f/0x870<br /> [<ffffffff81070c38>] do_group_exit+0x58/0xd0<br /> [<ffffffff81085866>] get_signal_to_deliver+0x1f6/0x460<br /> [<ffffffff8100a2d5>] do_signal+0x75/0x800<br /> [<ffffffff81191d8f>] ? __d_free+0x3f/0x60<br /> [<ffffffff8119a330>] ? mntput_no_expire+0x30/0x110<br /> [<ffffffff810a6dbb>] ? sys_futex+0x7b/0x170<br /> [<ffffffff8100aaf0>] do_notify_resume+0x90/0xc0<br /> [<ffffffff8100b3c1>] int_signal+0x12/0x17<br />INFO: task ceph-osd:14418 blocked for more than 120 seconds.<br />"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.<br />ceph-osd D 0000000000000000 0 14418 1 0x00000080<br /> ffff88001b4abc98 0000000000000082 ffffffff81eca040 ffff880010f64ae0<br /> ffff88001b4abc68 ffffffff810a45a0 ffff88001b4abca0 ffff880010f64ae0<br /> ffff880010f65098 ffff88001b4abfd8 000000000000fb88 ffff880010f65098<br />Call Trace:<br /> [<ffffffff810a45a0>] ? exit_robust_list+0x90/0x160<br /> [<ffffffff81070085>] exit_mm+0x95/0x180<br /> [<ffffffff810704cf>] do_exit+0x15f/0x870<br /> [<ffffffff81070c38>] do_group_exit+0x58/0xd0<br /> [<ffffffff81085866>] get_signal_to_deliver+0x1f6/0x460<br /> [<ffffffff8100a2d5>] do_signal+0x75/0x800<br /> [<ffffffff81191d8f>] ? 
__d_free+0x3f/0x60<br /> [<ffffffff8119a330>] ? mntput_no_expire+0x30/0x110<br /> [<ffffffff810a6dbb>] ? sys_futex+0x7b/0x170<br /> [<ffffffff8100aaf0>] do_notify_resume+0x90/0xc0<br /> [<ffffffff8100b3c1>] int_signal+0x12/0x17<br />INFO: task ceph-osd:14419 blocked for more than 120 seconds.<br />"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.<br />ceph-osd D 0000000000000000 0 14419 1 0x00000080<br /> ffff88001b515c98 0000000000000082 0000000000010287 ffff88001b515c00<br /> ffff88001b515c68 ffffffff810a45a0 ffff88001b515ca0 ffff88001f2ecaa0<br /> ffff88001f2ed058 ffff88001b515fd8 000000000000fb88 ffff88001f2ed058<br />Call Trace:<br /> [<ffffffff810a45a0>] ? exit_robust_list+0x90/0x160<br /> [<ffffffff81070085>] exit_mm+0x95/0x180<br /> [<ffffffff810704cf>] do_exit+0x15f/0x870<br /> [<ffffffff81070c38>] do_group_exit+0x58/0xd0<br /> [<ffffffff81085866>] get_signal_to_deliver+0x1f6/0x460<br /> [<ffffffff8100a2d5>] do_signal+0x75/0x800<br /> [<ffffffff81191d8f>] ? __d_free+0x3f/0x60<br /> [<ffffffff8119a330>] ? mntput_no_expire+0x30/0x110<br /> [<ffffffff810a6dbb>] ? sys_futex+0x7b/0x170<br /> [<ffffffff8100aaf0>] do_notify_resume+0x90/0xc0<br /> [<ffffffff8100b3c1>] int_signal+0x12/0x17<br />INFO: task ceph-osd:14420 blocked for more than 120 seconds.<br />"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.<br />ceph-osd D 0000000000000000 0 14420 1 0x00000080<br /> ffff88001b4cdc98 0000000000000082 0000000000010287 ffff88001b4cdc00<br /> ffff88001b4cdc68 ffffffff810a45a0 ffff88001b4cdca0 ffff88001f2ed500<br /> ffff88001f2edab8 ffff88001b4cdfd8 000000000000fb88 ffff88001f2edab8<br />Call Trace:<br /> [<ffffffff810a45a0>] ? 
exit_robust_list+0x90/0x160<br /> [<ffffffff81070085>] exit_mm+0x95/0x180<br /> [<ffffffff810704cf>] do_exit+0x15f/0x870<br /> [<ffffffff81070c38>] do_group_exit+0x58/0xd0<br /> [<ffffffff81085866>] get_signal_to_deliver+0x1f6/0x460<br /> [<ffffffff8100a2d5>] do_signal+0x75/0x800<br /> [<ffffffff81191d8f>] ? __d_free+0x3f/0x60<br /> [<ffffffff8119a330>] ? mntput_no_expire+0x30/0x110<br /> [<ffffffff810a6dbb>] ? sys_futex+0x7b/0x170<br /> [<ffffffff8100aaf0>] do_notify_resume+0x90/0xc0<br /> [<ffffffff8100b3c1>] int_signal+0x12/0x17<br />INFO: task ceph-osd:14421 blocked for more than 120 seconds.<br />"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.<br />ceph-osd D 0000000000000000 0 14421 1 0x00000080<br /> ffff88001b03bc98 0000000000000082 0000000000010287 ffff88001b03bc00<br /> ffff88001b03bc68 ffffffff810a45a0 ffffea0000141c48 ffffea0000000000<br /> ffff88001b0d7af8 ffff88001b03bfd8 000000000000fb88 ffff88001b0d7af8<br />Call Trace:<br /> [<ffffffff810a45a0>] ? exit_robust_list+0x90/0x160<br /> [<ffffffff81070085>] exit_mm+0x95/0x180<br /> [<ffffffff810704cf>] do_exit+0x15f/0x870<br /> [<ffffffff811852f7>] ? pipe_read+0x2a7/0x4e0<br /> [<ffffffff81070c38>] do_group_exit+0x58/0xd0<br /> [<ffffffff81085866>] get_signal_to_deliver+0x1f6/0x460<br /> [<ffffffff8100a2d5>] do_signal+0x75/0x800<br /> [<ffffffff8121fd4f>] ? selinux_file_permission+0xbf/0x150<br /> [<ffffffff8100aaf0>] do_notify_resume+0x90/0xc0<br /> [<ffffffff8100b3c1>] int_signal+0x12/0x17<br />INFO: task ceph-osd:14422 blocked for more than 120 seconds.<br />"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.<br />ceph-osd D 0000000000000000 0 14422 1 0x00000080<br /> ffff880019e4bc98 0000000000000082 ffffffff81eca900 ffff88001d4a0aa0<br /> ffff880019e4bc68 ffffffff810a45a0 ffff880019e4bca0 ffff88001d4a0aa0<br /> ffff88001d4a1058 ffff880019e4bfd8 000000000000fb88 ffff88001d4a1058<br />Call Trace:<br /> [<ffffffff810a45a0>] ? 
exit_robust_list+0x90/0x160<br /> [<ffffffff81070085>] exit_mm+0x95/0x180<br /> [<ffffffff810704cf>] do_exit+0x15f/0x870<br /> [<ffffffff81070c38>] do_group_exit+0x58/0xd0<br /> [<ffffffff81085866>] get_signal_to_deliver+0x1f6/0x460<br /> [<ffffffff8100a2d5>] do_signal+0x75/0x800<br /> [<ffffffff8100988e>] ? __switch_to+0x26e/0x320<br /> [<ffffffff814fd830>] ? thread_return+0x4e/0x76e<br /> [<ffffffff810a6dbb>] ? sys_futex+0x7b/0x170<br /> [<ffffffff8100aaf0>] do_notify_resume+0x90/0xc0<br /> [<ffffffff8100b3c1>] int_signal+0x12/0x17<br />[root@centos3 core]#</p>
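<p>The hung-task spew above is uniform enough to summarize mechanically rather than by eyeballing. A small sketch, with a made-up helper name, that counts how many ceph-osd tasks a saved dmesg capture reports as blocked:</p>

```shell
#!/bin/sh
# Hypothetical helper: count the "blocked for more than 120 seconds"
# reports for ceph-osd tasks in a saved dmesg capture.
count_hung_osds() {
    grep -c 'INFO: task ceph-osd:.* blocked for more than 120 seconds' "$1"
}

# Abbreviated sample mirroring the output above
cat > /tmp/dmesg.sample <<'EOF'
hrtimer: interrupt took 3666286 ns
INFO: task ceph-osd:14160 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task ceph-osd:14163 blocked for more than 120 seconds.
EOF

count_hung_osds /tmp/dmesg.sample   # prints 2
```

<p>Note that the <code>echo 0 > /proc/sys/kernel/hung_task_timeout_secs</code> line quoted in the messages only silences the warning; it does not unstick the tasks.</p>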
Ceph - Bug #3789: OSD core dump and down OSD on CentOS cluster
https://tracker.ceph.com/issues/3789?journal_id=15672
2013-01-11T10:23:44Z
Anonymous
<ul></ul><p>From the dmesg output it looks like you are right, Sage: the node is low on resources.</p>
<p>centos1 core# gdb /usr/bin/ceph-osd core.0.26177</p>
<p>Core was generated by `/usr/bin/ceph-osd -i 2 --pid-file /var/run/ceph/osd.2.pid -c /etc/ceph/ceph.con'.<br />Program terminated with signal 6, Aborted.<br />#0 0x00007fb42de008a5 in raise () from /lib64/libc.so.6<br />Missing separate debuginfos, use: debuginfo-install ceph-0.56.1-0.el6.x86_64<br />(gdb) bt<br />#0 0x00007fb42de008a5 in raise () from /lib64/libc.so.6<br />#1 0x00007fb42de02085 in abort () from /lib64/libc.so.6<br />#2 0x00007fb42e6b8971 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib64/libstdc++.so.6<br />#3 0x00007fb42e6b6be6 in ?? () from /usr/lib64/libstdc++.so.6<br />#4 0x00007fb42e6b6c13 in std::terminate() () from /usr/lib64/libstdc++.so.6<br />#5 0x00007fb42e6b6d0e in __cxa_throw () from /usr/lib64/libstdc++.so.6<br />#6 0x0000000000837839 in ceph::__ceph_assert_fail(char const*, char const*, int, char const*) ()<br />#7 0x00000000007c4b4b in ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long) ()<br />#8 0x00000000007c4ee7 in ceph::HeartbeatMap::is_healthy() ()<br />#9 0x00000000007c5148 in ceph::HeartbeatMap::check_touch_file() ()<br />#10 0x000000000084d8ad in CephContextServiceThread::entry() ()<br />#11 0x00007fb42f12b851 in start_thread () from /lib64/libpthread.so.0<br />#12 0x00007fb42deb567d in clone () from /lib64/libc.so.6</p>
<p>dmesg:<br />hrtimer: interrupt took 3717906 ns<br />INFO: task ceph-osd:26177 blocked for more than 120 seconds.<br />"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.<br />ceph-osd D 0000000000000000 0 26177 1 0x00000080<br /> ffff88001d039c98 0000000000000082 0000000000010287 ffff88001d039c00<br /> ffff88001d039c68 ffffffff810a45a0 ffff88001d039ca0 ffff880000b9a080<br /> ffff880000b9a638 ffff88001d039fd8 000000000000fb88 ffff880000b9a638<br />Call Trace:<br /> [<ffffffff810a45a0>] ? exit_robust_list+0x90/0x160<br /> [<ffffffff81070085>] exit_mm+0x95/0x180<br /> [<ffffffff810704cf>] do_exit+0x15f/0x870<br /> [<ffffffff81070c38>] do_group_exit+0x58/0xd0<br /> [<ffffffff81085866>] get_signal_to_deliver+0x1f6/0x460<br /> [<ffffffff8100a2d5>] do_signal+0x75/0x800<br /> [<ffffffff81060a83>] ? wake_up_new_task+0xd3/0x120<br /> [<ffffffff8106a873>] ? do_fork+0x133/0x460<br /> [<ffffffff810a6dbb>] ? sys_futex+0x7b/0x170<br /> [<ffffffff8100aaf0>] do_notify_resume+0x90/0xc0<br /> [<ffffffff8100b3c1>] int_signal+0x12/0x17<br />INFO: task ceph-osd:26180 blocked for more than 120 seconds.<br />"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.<br />ceph-osd D 0000000000000000 0 26180 1 0x00000080<br /> ffff88001f2d9c98 0000000000000082 0000000000000001 ffff88001ac57cf0<br /> 0000000000000000 0000000000000000 ffff88001f2d9c18 ffffffff81060262<br /> ffff88001d0fbab8 ffff88001f2d9fd8 000000000000fb88 ffff88001d0fbab8<br />Call Trace:<br /> [<ffffffff81060262>] ? default_wake_function+0x12/0x20<br /> [<ffffffff81070085>] exit_mm+0x95/0x180<br /> [<ffffffff810704cf>] do_exit+0x15f/0x870<br /> [<ffffffff8111673e>] ? generic_file_aio_write+0xbe/0xe0<br /> [<ffffffff81070c38>] do_group_exit+0x58/0xd0<br /> [<ffffffff81085866>] get_signal_to_deliver+0x1f6/0x460<br /> [<ffffffff8100a2d5>] do_signal+0x75/0x800<br /> [<ffffffff8121fd8b>] ? selinux_file_permission+0xfb/0x150<br /> [<ffffffff8117b0e2>] ? 
vfs_write+0x132/0x1a0<br /> [<ffffffff8100aaf0>] do_notify_resume+0x90/0xc0<br /> [<ffffffff8100b3c1>] int_signal+0x12/0x17<br />INFO: task ceph-osd:26185 blocked for more than 120 seconds.<br />"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.<br />ceph-osd D 0000000000000000 0 26185 1 0x00000080<br /> ffff88001f61fc98 0000000000000082 0000000000010287 ffff88001f61fc00<br /> ffff88001f61fc68 ffffffff810a45a0 ffffea0000000000 ffff88001f61fb88<br /> ffff88001ad39098 ffff88001f61ffd8 000000000000fb88 ffff88001ad39098<br />Call Trace:<br /> [<ffffffff810a45a0>] ? exit_robust_list+0x90/0x160<br /> [<ffffffff81070085>] exit_mm+0x95/0x180<br /> [<ffffffff810704cf>] do_exit+0x15f/0x870<br /> [<ffffffff81060280>] ? wake_up_state+0x10/0x20<br /> [<ffffffff81070c38>] do_group_exit+0x58/0xd0<br /> [<ffffffff81085866>] get_signal_to_deliver+0x1f6/0x460<br /> [<ffffffff8100a2d5>] do_signal+0x75/0x800<br /> [<ffffffff810097cc>] ? __switch_to+0x1ac/0x320<br /> [<ffffffff814fd830>] ? thread_return+0x4e/0x76e<br /> [<ffffffff8100aaf0>] do_notify_resume+0x90/0xc0<br /> [<ffffffff8100b3c1>] int_signal+0x12/0x17<br />INFO: task ceph-osd:26815 blocked for more than 120 seconds.<br />"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.<br />ceph-osd D 0000000000000000 0 26815 1 0x00000080<br /> ffff88001d56dc98 0000000000000082 0000000000010287 ffff88001d56dc00<br /> ffff88001d56dc68 ffffffff810a45a0 ffff88001d56dca0 ffff88001f465540<br /> ffff88001f465af8 ffff88001d56dfd8 000000000000fb88 ffff88001f465af8<br />Call Trace:<br /> [<ffffffff810a45a0>] ? exit_robust_list+0x90/0x160<br /> [<ffffffff81070085>] exit_mm+0x95/0x180<br /> [<ffffffff810704cf>] do_exit+0x15f/0x870<br /> [<ffffffff81070c38>] do_group_exit+0x58/0xd0<br /> [<ffffffff81085866>] get_signal_to_deliver+0x1f6/0x460<br /> [<ffffffff8100a2d5>] do_signal+0x75/0x800<br /> [<ffffffff81191d8f>] ? __d_free+0x3f/0x60<br /> [<ffffffff8119a330>] ? 
mntput_no_expire+0x30/0x110<br /> [<ffffffff810a6dbb>] ? sys_futex+0x7b/0x170<br /> [<ffffffff8100aaf0>] do_notify_resume+0x90/0xc0<br /> [<ffffffff8100b3c1>] int_signal+0x12/0x17<br />INFO: task ceph-osd:26816 blocked for more than 120 seconds.<br />"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.<br />ceph-osd D 0000000000000000 0 26816 1 0x00000080<br /> ffff88001f5bbc98 0000000000000082 0000000000010287 ffff88001f5bbc00<br /> ffff88001f5bbc68 ffffffff810a45a0 ffff88001fa885f8 ffff88001f5bbc98<br /> ffff88000c9a4638 ffff88001f5bbfd8 000000000000fb88 ffff88000c9a4638<br />Call Trace:<br /> [<ffffffff810a45a0>] ? exit_robust_list+0x90/0x160<br /> [<ffffffff81070085>] exit_mm+0x95/0x180<br /> [<ffffffff810704cf>] do_exit+0x15f/0x870<br /> [<ffffffff81070c38>] do_group_exit+0x58/0xd0<br /> [<ffffffff81085866>] get_signal_to_deliver+0x1f6/0x460<br /> [<ffffffff8100a2d5>] do_signal+0x75/0x800<br /> [<ffffffff81191d8f>] ? __d_free+0x3f/0x60<br /> [<ffffffff8119a330>] ? mntput_no_expire+0x30/0x110<br /> [<ffffffff810a6dbb>] ? sys_futex+0x7b/0x170<br /> [<ffffffff8100aaf0>] do_notify_resume+0x90/0xc0<br /> [<ffffffff8100b3c1>] int_signal+0x12/0x17<br />INFO: task ceph-osd:26817 blocked for more than 120 seconds.<br />"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.<br />ceph-osd D 0000000000000000 0 26817 1 0x00000080<br /> ffff88001ad0dc98 0000000000000082 0000000000010287 ffff88001ad0dc00<br /> ffff88001ad0dc68 ffffffff810a45a0 ffff88001ad0dca0 ffff88000c9a4ae0<br /> ffff88000c9a5098 ffff88001ad0dfd8 000000000000fb88 ffff88000c9a5098<br />Call Trace:<br /> [<ffffffff810a45a0>] ? exit_robust_list+0x90/0x160<br /> [<ffffffff81070085>] exit_mm+0x95/0x180<br /> [<ffffffff810704cf>] do_exit+0x15f/0x870<br /> [<ffffffff81070c38>] do_group_exit+0x58/0xd0<br /> [<ffffffff81085866>] get_signal_to_deliver+0x1f6/0x460<br /> [<ffffffff8100a2d5>] do_signal+0x75/0x800<br /> [<ffffffff81191d8f>] ? 
__d_free+0x3f/0x60<br /> [<ffffffff8119a330>] ? mntput_no_expire+0x30/0x110<br /> [<ffffffff810a6dbb>] ? sys_futex+0x7b/0x170<br /> [<ffffffff8100aaf0>] do_notify_resume+0x90/0xc0<br /> [<ffffffff8100b3c1>] int_signal+0x12/0x17<br />INFO: task ceph-osd:26818 blocked for more than 120 seconds.<br />"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.<br />ceph-osd D 0000000000000000 0 26818 1 0x00000080<br /> ffff88001f7e1c98 0000000000000082 0000000000010287 ffff88001f7e1c00<br /> ffff88001f7e1c68 ffffffff810a45a0 ffff880002216680 ffff88001f7e1c98<br /> ffff880008791ab8 ffff88001f7e1fd8 000000000000fb88 ffff880008791ab8<br />Call Trace:<br /> [<ffffffff810a45a0>] ? exit_robust_list+0x90/0x160<br /> [<ffffffff81070085>] exit_mm+0x95/0x180<br /> [<ffffffff810704cf>] do_exit+0x15f/0x870<br /> [<ffffffff81070c38>] do_group_exit+0x58/0xd0<br /> [<ffffffff81085866>] get_signal_to_deliver+0x1f6/0x460<br /> [<ffffffff8100a2d5>] do_signal+0x75/0x800<br /> [<ffffffff81191d8f>] ? __d_free+0x3f/0x60<br /> [<ffffffff8119a330>] ? mntput_no_expire+0x30/0x110<br /> [<ffffffff810a6dbb>] ? sys_futex+0x7b/0x170<br /> [<ffffffff8100aaf0>] do_notify_resume+0x90/0xc0<br /> [<ffffffff8100b3c1>] int_signal+0x12/0x17<br />INFO: task ceph-osd:26819 blocked for more than 120 seconds.<br />"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.<br />ceph-osd D 0000000000000000 0 26819 1 0x00000080<br /> ffff88001acc9c98 0000000000000082 0000000000010287 ffff88001acc9c00<br /> ffff88001acc9c68 ffffffff810a45a0 ffffea0000142738 ffffea0000000000<br /> ffff880008791058 ffff88001acc9fd8 000000000000fb88 ffff880008791058<br />Call Trace:<br /> [<ffffffff810a45a0>] ? exit_robust_list+0x90/0x160<br /> [<ffffffff81070085>] exit_mm+0x95/0x180<br /> [<ffffffff810704cf>] do_exit+0x15f/0x870<br /> [<ffffffff811852f7>] ? 
pipe_read+0x2a7/0x4e0<br /> [<ffffffff81070c38>] do_group_exit+0x58/0xd0<br /> [<ffffffff81085866>] get_signal_to_deliver+0x1f6/0x460<br /> [<ffffffff8100a2d5>] do_signal+0x75/0x800<br /> [<ffffffff8121fd4f>] ? selinux_file_permission+0xbf/0x150<br /> [<ffffffff8100aaf0>] do_notify_resume+0x90/0xc0<br /> [<ffffffff8100b3c1>] int_signal+0x12/0x17<br />INFO: task ceph-osd:26820 blocked for more than 120 seconds.<br />"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.<br />ceph-osd D 0000000000000000 0 26820 1 0x00000080<br /> ffff88001d0c1c98 0000000000000082 0000000000010287 ffff88001d0c1c00<br /> ffff88001d0c1c68 ffffffff810a45a0 ffff88001d0c1d18 ffff88001d0c1c98<br /> ffff88001d01c638 ffff88001d0c1fd8 000000000000fb88 ffff88001d01c638<br />Call Trace:<br /> [<ffffffff810a45a0>] ? exit_robust_list+0x90/0x160<br /> [<ffffffff81070085>] exit_mm+0x95/0x180<br /> [<ffffffff810704cf>] do_exit+0x15f/0x870<br /> [<ffffffff81070c38>] do_group_exit+0x58/0xd0<br /> [<ffffffff81085866>] get_signal_to_deliver+0x1f6/0x460<br /> [<ffffffff8100a2d5>] do_signal+0x75/0x800<br /> [<ffffffff810e0f55>] ? call_rcu_sched+0x15/0x20<br /> [<ffffffff810e0f6e>] ? call_rcu+0xe/0x10<br /> [<ffffffff8119a330>] ? mntput_no_expire+0x30/0x110<br /> [<ffffffff810a6dbb>] ? sys_futex+0x7b/0x170<br /> [<ffffffff8100aaf0>] do_notify_resume+0x90/0xc0<br /> [<ffffffff8100b3c1>] int_signal+0x12/0x17<br />INFO: task ceph-osd:26821 blocked for more than 120 seconds.<br />"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.<br />ceph-osd D 0000000000000000 0 26821 1 0x00000080<br /> ffff88001acf9c98 0000000000000082 0000000000010287 ffff88001acf9c00<br /> ffff88001acf9c68 ffffffff810a45a0 ffff88001acf9ca0 ffff88001f703500<br /> ffff88001f703ab8 ffff88001acf9fd8 000000000000fb88 ffff88001f703ab8<br />Call Trace:<br /> [<ffffffff810a45a0>] ? 
exit_robust_list+0x90/0x160<br /> [<ffffffff81070085>] exit_mm+0x95/0x180<br /> [<ffffffff810704cf>] do_exit+0x15f/0x870<br /> [<ffffffff81070c38>] do_group_exit+0x58/0xd0<br /> [<ffffffff81085866>] get_signal_to_deliver+0x1f6/0x460<br /> [<ffffffff8100a2d5>] do_signal+0x75/0x800<br /> [<ffffffff810a6dbb>] ? sys_futex+0x7b/0x170<br /> [<ffffffff8100aaf0>] do_notify_resume+0x90/0xc0<br /> [<ffffffff8100b3c1>] int_signal+0x12/0x17<br />[root@centos1 core]#</p>
Ceph - Bug #3789: OSD core dump and down OSD on CentOS cluster
https://tracker.ceph.com/issues/3789?journal_id=15673
2013-01-11T10:27:18Z
Anonymous
<ul></ul><p>All of the core files have a similar backtrace.<br />Again, Sage, it looks like you are right: the nodes are low on resources.</p>
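<p>Since every node looks memory-starved, it can help to pull the biggest resident-set consumer out of the oom-killer task table (the <code>[ pid ] uid tgid total_vm rss ...</code> block that appears in these dmesg dumps). A sketch with a made-up helper name; rss is the fifth column once the bracketed pids are normalized:</p>

```shell
#!/bin/sh
# Hypothetical helper: from a saved oom-killer task dump, print the name
# of the process with the largest resident set size (rss).
biggest_rss() {
    # strip the [ ] around pids so the columns line up, then track max rss
    tr -d '[]' < "$1" | awk 'NF == 9 && $1+0 > 0 {
            if ($5+0 > max+0) { max = $5; name = $9 }
        } END { print name }'
}

# Abbreviated sample mirroring the table in the dump
cat > /tmp/oom.sample <<'EOF'
[ 8060]     0  8060   384151    62840   0       0             0 ceph-mon
[ 8174]     0  8174   259649    12390   0       0             0 ceph-osd
[17882]    89 17882    19687      215   0       0             0 pickup
EOF

biggest_rss /tmp/oom.sample   # prints ceph-mon
```

<p>In the dump below the kernel reached the same conclusion and killed ceph-mon (score 644), which is consistent with swap being fully exhausted (Free swap = 0kB).</p>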
<p>dmesg:<br />hrtimer: interrupt took 5259323 ns<br />INFO: task ceph-osd:5038 blocked for more than 120 seconds.<br />"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.<br />ceph-osd D 0000000000000000 0 5038 1 0x00000080<br /> ffff88001cc59d18 0000000000000082 ffff88001cc59c88 ffffffff8116b670<br /> ffff88001cc59c98 ffff8800000116c0 0000000000000000 0000000000000000<br /> ffff8800025d7af8 ffff88001cc59fd8 000000000000fb88 ffff8800025d7af8<br />Call Trace:<br /> [<ffffffff8116b670>] ? mem_cgroup_get_reclaim_stat_from_page+0x20/0x70<br /> [<ffffffff8127865d>] ? rb_insert_color+0x9d/0x160<br /> [<ffffffff814fe6a5>] schedule_timeout+0x215/0x2e0<br /> [<ffffffff81054a04>] ? check_preempt_wakeup+0x1a4/0x260<br /> [<ffffffff810632c4>] ? enqueue_task_fair+0x64/0x100<br /> [<ffffffff814fe323>] wait_for_common+0x123/0x180<br /> [<ffffffff81060250>] ? default_wake_function+0x0/0x20<br /> [<ffffffff814fe43d>] wait_for_completion+0x1d/0x20<br /> [<ffffffff811a46b8>] sync_inodes_sb+0x88/0x190<br /> [<ffffffff811aa212>] __sync_filesystem+0x82/0x90<br /> [<ffffffff811aa41b>] sync_filesystem+0x4b/0x70<br /> [<ffffffff811aa490>] sys_syncfs+0x50/0x80<br /> [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b<br />ceph-osd invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0<br />ceph-osd cpuset=/ mems_allowed=0<br />Pid: 17403, comm: ceph-osd Not tainted 2.6.32-279.el6.x86_64 #1<br />Call Trace:<br /> [<ffffffff810c4971>] ? cpuset_print_task_mems_allowed+0x91/0xb0<br /> [<ffffffff811170e0>] ? dump_header+0x90/0x1b0<br /> [<ffffffff812146fc>] ? security_real_capable_noaudit+0x3c/0x70<br /> [<ffffffff81117562>] ? oom_kill_process+0x82/0x2a0<br /> [<ffffffff811174a1>] ? select_bad_process+0xe1/0x120<br /> [<ffffffff811179a0>] ? out_of_memory+0x220/0x3c0<br /> [<ffffffff811276be>] ? 
__alloc_pages_nodemask+0x89e/0x940<br /> [<ffffffff8115c1da>] ? alloc_pages_current+0xaa/0x110<br /> [<ffffffff811144e7>] ? __page_cache_alloc+0x87/0x90<br /> [<ffffffff8112a10b>] ? __do_page_cache_readahead+0xdb/0x210<br /> [<ffffffff8118fdd0>] ? __pollwait+0x0/0xf0<br /> [<ffffffff8112a261>] ? ra_submit+0x21/0x30<br /> [<ffffffff81115813>] ? filemap_fault+0x4c3/0x500<br /> [<ffffffff8118fec0>] ? pollwake+0x0/0x60<br /> [<ffffffff8113ec14>] ? __do_fault+0x54/0x510<br /> [<ffffffff8113f1c7>] ? handle_pte_fault+0xf7/0xb50<br /> [<ffffffff81060280>] ? wake_up_state+0x10/0x20<br /> [<ffffffff810a3bc0>] ? wake_futex+0x40/0x60<br /> [<ffffffff810a43fe>] ? futex_wake+0x10e/0x120<br /> [<ffffffff8113fe04>] ? handle_mm_fault+0x1e4/0x2b0<br /> [<ffffffff810a6340>] ? do_futex+0x100/0xb00<br /> [<ffffffff81044479>] ? __do_page_fault+0x139/0x480<br /> [<ffffffff8142874b>] ? sys_recvfrom+0x16b/0x180<br /> [<ffffffff8100988e>] ? __switch_to+0x26e/0x320<br /> [<ffffffff81012bd9>] ? read_tsc+0x9/0x20<br /> [<ffffffff8109cd39>] ? ktime_get_ts+0xa9/0xe0<br /> [<ffffffff814fd830>] ? thread_return+0x4e/0x76e<br /> [<ffffffff8150326e>] ? do_page_fault+0x3e/0xa0<br /> [<ffffffff81500625>] ? 
page_fault+0x25/0x30<br />Mem-Info:<br />Node 0 DMA per-cpu:<br />CPU 0: hi: 0, btch: 1 usd: 0<br />Node 0 DMA32 per-cpu:<br />CPU 0: hi: 186, btch: 31 usd: 30<br />active_anon:52525 inactive_anon:52561 isolated_anon:0<br /> active_file:218 inactive_file:321 isolated_file:0<br /> unevictable:0 dirty:0 writeback:1 unstable:0<br /> free:1189 slab_reclaimable:2305 slab_unreclaimable:12054<br /> mapped:206 shmem:0 pagetables:1320 bounce:0<br />Node 0 DMA free:2044kB min:84kB low:104kB high:124kB active_anon:6640kB inactive_anon:6724kB active_file:24kB inactive_file:108kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15368kB mlocked:0kB dirty:0kB writeback:0kB mapped:60kB shmem:0kB slab_reclaimable:92kB slab_unreclaimable:56kB kernel_stack:0kB pagetables:68kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:275 all_unreclaimable? yes<br />lowmem_reserve[]: 0 489 489 489<br />Node 0 DMA32 free:2712kB min:2784kB low:3480kB high:4176kB active_anon:203460kB inactive_anon:203520kB active_file:848kB inactive_file:1176kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:500896kB mlocked:0kB dirty:0kB writeback:4kB mapped:764kB shmem:0kB slab_reclaimable:9128kB slab_unreclaimable:48160kB kernel_stack:2448kB pagetables:5212kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:704 all_unreclaimable? 
no<br />lowmem_reserve[]: 0 0 0 0<br />Node 0 DMA: 1*4kB 1*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2044kB<br />Node 0 DMA32: 176*4kB 3*8kB 2*16kB 1*32kB 0*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2712kB<br />6002 total pagecache pages<br />5455 pages in swap cache<br />Swap cache stats: add 1182169, delete 1176714, find 251320/357333<br />Free swap = 0kB<br />Total swap = 1015800kB<br />131055 pages RAM<br />5413 pages reserved<br />692 pages shared<br />120653 pages non-shared<br />[ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name<br />[ 383] 0 383 2865 0 0 -17 -1000 udevd<br />[ 920] 0 920 6914 27 0 -17 -1000 auditd<br />[ 936] 0 936 62796 49 0 0 0 rsyslogd<br />[ 970] 0 970 16016 0 0 -17 -1000 sshd<br />[ 1471] 0 1471 19667 34 0 0 0 master<br />[ 1479] 0 1479 29309 31 0 0 0 crond<br />[ 1485] 89 1485 19730 18 0 0 0 qmgr<br />[ 1496] 0 1496 1014 1 0 0 0 mingetty<br />[ 1498] 0 1498 1014 1 0 0 0 mingetty<br />[ 1500] 0 1500 1014 1 0 0 0 mingetty<br />[ 1502] 0 1502 1014 1 0 0 0 mingetty<br />[ 1507] 0 1507 3095 0 0 -17 -1000 udevd<br />[ 1508] 0 1508 3095 0 0 -17 -1000 udevd<br />[ 1509] 0 1509 1014 1 0 0 0 mingetty<br />[ 1511] 0 1511 1014 1 0 0 0 mingetty<br />[ 8060] 0 8060 384151 62840 0 0 0 ceph-mon<br />[ 8174] 0 8174 259649 12390 0 0 0 ceph-osd<br />[ 8242] 0 8242 252767 12542 0 0 0 ceph-osd<br />[ 8304] 0 8304 258688 12090 0 0 0 ceph-osd<br />[17882] 89 17882 19687 215 0 0 0 pickup<br />Out of memory: Kill process 8060 (ceph-mon) score 644 or sacrifice child<br />Killed process 8060, UID 0, (ceph-mon) total-vm:1536604kB, anon-rss:250868kB, file-rss:492kB<br />INFO: task ceph-osd:8304 blocked for more than 120 seconds.<br />"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.<br />ceph-osd D 0000000000000000 0 8304 1 0x00000080<br /> ffff88001f709c98 0000000000000086 0000000000010287 ffff88001f709c00<br /> ffff88001f709c68 ffffffff810a45a0 ffff88001f709ca0 
ffff88001da65540<br /> ffff88001da65af8 ffff88001f709fd8 000000000000fb88 ffff88001da65af8<br />Call Trace:<br /> [<ffffffff810a45a0>] ? exit_robust_list+0x90/0x160<br /> [<ffffffff81070085>] exit_mm+0x95/0x180<br /> [<ffffffff810704cf>] do_exit+0x15f/0x870<br /> [<ffffffff81099015>] ? sched_clock_local+0x25/0x90<br /> [<ffffffff81070c38>] do_group_exit+0x58/0xd0<br /> [<ffffffff81085866>] get_signal_to_deliver+0x1f6/0x460<br /> [<ffffffff8100a2d5>] do_signal+0x75/0x800<br /> [<ffffffff81060a83>] ? wake_up_new_task+0xd3/0x120<br /> [<ffffffff8106a873>] ? do_fork+0x133/0x460<br /> [<ffffffff810a6dbb>] ? sys_futex+0x7b/0x170<br /> [<ffffffff8100aaf0>] do_notify_resume+0x90/0xc0<br /> [<ffffffff8100b3c1>] int_signal+0x12/0x17<br />INFO: task ceph-osd:8305 blocked for more than 120 seconds.<br />"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.<br />ceph-osd D 0000000000000000 0 8305 1 0x00000080<br /> ffff88001da8dc98 0000000000000086 ffffffff81ecb5d0 ffff880003ac9500<br /> ffff88001da8dc68 ffffffff810a45a0 ffff88001da8dca0 ffff880003ac9500<br /> ffff880003ac9ab8 ffff88001da8dfd8 000000000000fb88 ffff880003ac9ab8<br />Call Trace:<br /> [<ffffffff810a45a0>] ? exit_robust_list+0x90/0x160<br /> [<ffffffff81070085>] exit_mm+0x95/0x180<br /> [<ffffffff810704cf>] do_exit+0x15f/0x870<br /> [<ffffffff8111673e>] ? generic_file_aio_write+0xbe/0xe0<br /> [<ffffffff81070c38>] do_group_exit+0x58/0xd0<br /> [<ffffffff81085866>] get_signal_to_deliver+0x1f6/0x460<br /> [<ffffffff8100a2d5>] do_signal+0x75/0x800<br /> [<ffffffff8121fd8b>] ? selinux_file_permission+0xfb/0x150<br /> [<ffffffff810a6dbb>] ? 
sys_futex+0x7b/0x170<br /> [<ffffffff8100aaf0>] do_notify_resume+0x90/0xc0<br /> [<ffffffff8100b3c1>] int_signal+0x12/0x17<br />INFO: task ceph-osd:8306 blocked for more than 120 seconds.<br />"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.<br />ceph-osd D 0000000000000000 0 8306 1 0x00000080<br /> ffff88001d779c98 0000000000000086 ffffffff81eca720 ffff88001d69c080<br /> ffff88001d779c68 ffffffff810a45a0 ffff88001d779ca0 ffff88001d69c080<br /> ffff88001d69c638 ffff88001d779fd8 000000000000fb88 ffff88001d69c638<br />Call Trace:<br /> [<ffffffff810a45a0>] ? exit_robust_list+0x90/0x160<br /> [<ffffffff81070085>] exit_mm+0x95/0x180<br /> [<ffffffff810704cf>] do_exit+0x15f/0x870<br /> [<ffffffff81070c38>] do_group_exit+0x58/0xd0<br /> [<ffffffff81085866>] get_signal_to_deliver+0x1f6/0x460<br /> [<ffffffff8100a2d5>] do_signal+0x75/0x800<br /> [<ffffffff8100988e>] ? __switch_to+0x26e/0x320<br /> [<ffffffff814fd830>] ? thread_return+0x4e/0x76e<br /> [<ffffffff810a6dbb>] ? sys_futex+0x7b/0x170<br /> [<ffffffff8100aaf0>] do_notify_resume+0x90/0xc0<br /> [<ffffffff8100b3c1>] int_signal+0x12/0x17<br />INFO: task ceph-osd:8307 blocked for more than 120 seconds.<br />"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.<br />ceph-osd D 0000000000000000 0 8307 1 0x00000080<br /> ffff88001f68fc98 0000000000000086 0000000000010287 ffff88001f68fc00<br /> ffff88001f68fc68 ffffffff810a45a0 ffffea0000000000 ffff88001f68fb88<br /> ffff88001f56daf8 ffff88001f68ffd8 000000000000fb88 ffff88001f56daf8<br />Call Trace:<br /> [<ffffffff810a45a0>] ? exit_robust_list+0x90/0x160<br /> [<ffffffff81070085>] exit_mm+0x95/0x180<br /> [<ffffffff810704cf>] do_exit+0x15f/0x870<br /> [<ffffffff81060280>] ? wake_up_state+0x10/0x20<br /> [<ffffffff81070c38>] do_group_exit+0x58/0xd0<br /> [<ffffffff81085866>] get_signal_to_deliver+0x1f6/0x460<br /> [<ffffffff8100a2d5>] do_signal+0x75/0x800<br /> [<ffffffff810097cc>] ? 
__switch_to+0x1ac/0x320<br /> [<ffffffff814fd830>] ? thread_return+0x4e/0x76e<br /> [<ffffffff8100aaf0>] do_notify_resume+0x90/0xc0<br /> [<ffffffff8100b3c1>] int_signal+0x12/0x17<br />INFO: task ceph-osd:8373 blocked for more than 120 seconds.<br />"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.<br />ceph-osd D 0000000000000000 0 8373 1 0x00000080<br /> ffff8800024f1c98 0000000000000086 ffffffff81ec9cf8 ffff88001f5c0080<br /> ffff8800024f1c68 ffffffff810a45a0 ffff88001f64f800 ffff8800024f1c98<br /> ffff88001f5c0638 ffff8800024f1fd8 000000000000fb88 ffff88001f5c0638<br />Call Trace:<br /> [<ffffffff810a45a0>] ? exit_robust_list+0x90/0x160<br /> [<ffffffff81070085>] exit_mm+0x95/0x180<br /> [<ffffffff810704cf>] do_exit+0x15f/0x870<br /> [<ffffffff81070c38>] do_group_exit+0x58/0xd0<br /> [<ffffffff81085866>] get_signal_to_deliver+0x1f6/0x460<br /> [<ffffffff8100a2d5>] do_signal+0x75/0x800<br /> [<ffffffff81191d8f>] ? __d_free+0x3f/0x60<br /> [<ffffffff8119a330>] ? mntput_no_expire+0x30/0x110<br /> [<ffffffff810a6dbb>] ? sys_futex+0x7b/0x170<br /> [<ffffffff8100aaf0>] do_notify_resume+0x90/0xc0<br /> [<ffffffff8100b3c1>] int_signal+0x12/0x17<br />INFO: task ceph-osd:8374 blocked for more than 120 seconds.<br />"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.<br />ceph-osd D 0000000000000000 0 8374 1 0x00000080<br /> ffff88001cc57c98 0000000000000086 0000000000010287 ffff88001cc57c00<br /> ffff88001cc57c68 ffffffff810a45a0 ffff88001cc57ca0 ffff88001cd7f500<br /> ffff88001cd7fab8 ffff88001cc57fd8 000000000000fb88 ffff88001cd7fab8<br />Call Trace:<br /> [<ffffffff810a45a0>] ? exit_robust_list+0x90/0x160<br /> [<ffffffff81070085>] exit_mm+0x95/0x180<br /> [<ffffffff810704cf>] do_exit+0x15f/0x870<br /> [<ffffffff81070c38>] do_group_exit+0x58/0xd0<br /> [<ffffffff81085866>] get_signal_to_deliver+0x1f6/0x460<br /> [<ffffffff8100a2d5>] do_signal+0x75/0x800<br /> [<ffffffff81191d8f>] ? 
__d_free+0x3f/0x60<br /> [<ffffffff8119a330>] ? mntput_no_expire+0x30/0x110<br /> [<ffffffff810a6dbb>] ? sys_futex+0x7b/0x170<br /> [<ffffffff8100aaf0>] do_notify_resume+0x90/0xc0<br /> [<ffffffff8100b3c1>] int_signal+0x12/0x17<br />INFO: task ceph-osd:8375 blocked for more than 120 seconds.<br />"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.<br />ceph-osd D 0000000000000000 0 8375 1 0x00000080<br /> ffff88001dbf9c98 0000000000000086 ffffffff81ecaa18 ffff88001f43c080<br /> ffff88001dbf9c68 ffffffff810a45a0 ffff88001dbf9ca0 ffff88001f43c080<br /> ffff88001f43c638 ffff88001dbf9fd8 000000000000fb88 ffff88001f43c638<br />Call Trace:<br /> [<ffffffff810a45a0>] ? exit_robust_list+0x90/0x160<br /> [<ffffffff81070085>] exit_mm+0x95/0x180<br /> [<ffffffff810704cf>] do_exit+0x15f/0x870<br /> [<ffffffff81060280>] ? wake_up_state+0x10/0x20<br /> [<ffffffff81070c38>] do_group_exit+0x58/0xd0<br /> [<ffffffff81085866>] get_signal_to_deliver+0x1f6/0x460<br /> [<ffffffff8100a2d5>] do_signal+0x75/0x800<br /> [<ffffffff81191d8f>] ? __d_free+0x3f/0x60<br /> [<ffffffff8119a330>] ? mntput_no_expire+0x30/0x110<br /> [<ffffffff810a6dbb>] ? sys_futex+0x7b/0x170<br /> [<ffffffff8100aaf0>] do_notify_resume+0x90/0xc0<br /> [<ffffffff8100b3c1>] int_signal+0x12/0x17<br />INFO: task ceph-osd:8376 blocked for more than 120 seconds.<br />"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.<br />ceph-osd D 0000000000000000 0 8376 1 0x00000080<br /> ffff88001cc15c98 0000000000000086 ffffffff81ec9488 ffff88001f5c1540<br /> ffff88001cc15c68 ffffffff810a45a0 ffff88001cc15ca0 ffff88001f5c1540<br /> ffff88001f5c1af8 ffff88001cc15fd8 000000000000fb88 ffff88001f5c1af8<br />Call Trace:<br /> [<ffffffff810a45a0>] ? 
exit_robust_list+0x90/0x160<br /> [<ffffffff81070085>] exit_mm+0x95/0x180<br /> [<ffffffff810704cf>] do_exit+0x15f/0x870<br /> [<ffffffff81070c38>] do_group_exit+0x58/0xd0<br /> [<ffffffff81085866>] get_signal_to_deliver+0x1f6/0x460<br /> [<ffffffff8100a2d5>] do_signal+0x75/0x800<br /> [<ffffffff81191d8f>] ? __d_free+0x3f/0x60<br /> [<ffffffff8119a330>] ? mntput_no_expire+0x30/0x110<br /> [<ffffffff810a6dbb>] ? sys_futex+0x7b/0x170<br /> [<ffffffff8100aaf0>] do_notify_resume+0x90/0xc0<br /> [<ffffffff8100b3c1>] int_signal+0x12/0x17<br />INFO: task ceph-osd:8377 blocked for more than 120 seconds.<br />"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.<br />ceph-osd D 0000000000000000 0 8377 1 0x00000080<br /> ffff88001dabbc98 0000000000000086 0000000000010287 ffff88001dabbc00<br /> ffff88001dabbc68 ffffffff810a45a0 ffffea00005e2458 ffffea0000000000<br /> ffff88001dba1098 ffff88001dabbfd8 000000000000fb88 ffff88001dba1098<br />Call Trace:<br /> [<ffffffff810a45a0>] ? exit_robust_list+0x90/0x160<br /> [<ffffffff81070085>] exit_mm+0x95/0x180<br /> [<ffffffff810704cf>] do_exit+0x15f/0x870<br /> [<ffffffff811852f7>] ? pipe_read+0x2a7/0x4e0<br /> [<ffffffff81070c38>] do_group_exit+0x58/0xd0<br /> [<ffffffff81085866>] get_signal_to_deliver+0x1f6/0x460<br /> [<ffffffff8100a2d5>] do_signal+0x75/0x800<br /> [<ffffffff8121fd4f>] ? selinux_file_permission+0xbf/0x150<br /> [<ffffffff8100aaf0>] do_notify_resume+0x90/0xc0<br /> [<ffffffff8100b3c1>] int_signal+0x12/0x17</p>
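The dmesg excerpt above is a textbook low-memory picture: ceph-osd blocked for more than 120 seconds inside sys_syncfs, "Free swap = 0kB", and the OOM killer sacrificing ceph-mon (pid 8060). A minimal triage sketch for pulling that evidence out of a saved kernel log (plain Linux tooling, nothing Ceph-specific; /tmp/dmesg.txt is a hypothetical capture):

```shell
# On the affected node you would capture the real log with: dmesg > /tmp/dmesg.txt
# Here we use a small sample of the lines quoted above so the sketch is runnable.
cat > /tmp/dmesg.txt <<'EOF'
Out of memory: Kill process 8060 (ceph-mon) score 644 or sacrifice child
Killed process 8060, UID 0, (ceph-mon) total-vm:1536604kB, anon-rss:250868kB, file-rss:492kB
INFO: task ceph-osd:8304 blocked for more than 120 seconds.
EOF

# Which processes did the OOM killer pick?
grep -iE 'out of memory|killed process' /tmp/dmesg.txt

# How many hung-task stalls were logged? (threshold is 120s by default,
# tunable via /proc/sys/kernel/hung_task_timeout_secs)
grep -c 'blocked for more than 120 seconds' /tmp/dmesg.txt
```

On a healthy node the first grep prints nothing; here it flags ceph-mon, which explains the monitor dropping out alongside the OSDs.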
<p># gdb /usr/bin/ceph-osd core.0.8174<br />Core was generated by `/usr/bin/ceph-osd -i 3 --pid-file /var/run/ceph/osd.3.pid -c /tmp/ceph.conf.263'.<br />Program terminated with signal 6, Aborted.<br />#0 0x00007f2c157503cb in raise () from /lib64/libpthread.so.0<br />Missing separate debuginfos, use: debuginfo-install ceph-0.56.1-0.el6.x86_64<br />(gdb) bt<br />#0 0x00007f2c157503cb in raise () from /lib64/libpthread.so.0<br />#1 0x000000000078c557 in ?? ()<br />#2 <signal handler called><br />#3 0x00007f2c1441d8a5 in raise () from /lib64/libc.so.6<br />#4 0x00007f2c1441f085 in abort () from /lib64/libc.so.6<br />#5 0x00007f2c14cd5a5d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib64/libstdc++.so.6<br />#6 0x00007f2c14cd3be6 in ?? () from /usr/lib64/libstdc++.so.6<br />#7 0x00007f2c14cd3c13 in std::terminate() () from /usr/lib64/libstdc++.so.6<br />#8 0x00007f2c14cd3d0e in __cxa_throw () from /usr/lib64/libstdc++.so.6<br />#9 0x0000000000837839 in ceph::__ceph_assert_fail(char const*, char const*, int, char const*) ()<br />#10 0x00000000007c4b4b in ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long) ()<br />#11 0x00000000007c4ee7 in ceph::HeartbeatMap::is_healthy() ()<br />#12 0x00000000007c5148 in ceph::HeartbeatMap::check_touch_file() ()<br />#13 0x000000000084d8ad in CephContextServiceThread::entry() ()<br />#14 0x00007f2c15748851 in start_thread () from /lib64/libpthread.so.0<br />#15 0x00007f2c144d267d in clone () from /lib64/libc.so.6</p>
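Frames #9–#12 are the interesting part: the abort comes from ceph::__ceph_assert_fail() inside ceph::HeartbeatMap::_check(), i.e. the OSD's internal watchdog killed the process because a worker thread (here, stuck behind the hung syncfs above) missed its suicide deadline — exactly the safety check Sage described. While the underlying memory starvation is being fixed, the watchdog thresholds can be raised as a stopgap. A hedged sketch only: the option names below are from the 0.56.x (bobtail) era and should be verified against `ceph-osd --show-config` for your build before relying on them:

```ini
; /etc/ceph/ceph.conf -- stopgap only; the real fix is the memory pressure above
[osd]
    ; default 60s: how long a filestore op thread may stall before a health warning
    filestore op thread timeout = 60
    ; default 180s: how long before the heartbeat watchdog aborts the OSD
    filestore op thread suicide timeout = 300
```

Restart the affected ceph-osd daemons after changing these.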
Ceph - Bug #3789: OSD core dump and down OSD on CentOS cluster
https://tracker.ceph.com/issues/3789?journal_id=15674
2013-01-11T10:28:45Z
Anonymous
<ul></ul><p>Deb Barba wrote:</p>
<blockquote>
<p>all core files have similar backtrace.<br />again, Sage, looks like you are right, low resources</p>
<p>gdb /usr/bin/ceph-osd core.0.8174<br />Core was generated by `/usr/bin/ceph-osd -i 3 --pid-file /var/run/ceph/osd.3.pid -c /tmp/ceph.conf.263'.<br />Program terminated with signal 6, Aborted.<br />#0 0x00007f2c157503cb in raise () from /lib64/libpthread.so.0<br />Missing separate debuginfos, use: debuginfo-install ceph-0.56.1-0.el6.x86_64<br />(gdb) bt<br />#0 0x00007f2c157503cb in raise () from /lib64/libpthread.so.0<br />#1 0x000000000078c557 in ?? ()<br />#2 <signal handler called><br />#3 0x00007f2c1441d8a5 in raise () from /lib64/libc.so.6<br />#4 0x00007f2c1441f085 in abort () from /lib64/libc.so.6<br />#5 0x00007f2c14cd5a5d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib64/libstdc++.so.6<br />#6 0x00007f2c14cd3be6 in ?? () from /usr/lib64/libstdc++.so.6<br />#7 0x00007f2c14cd3c13 in std::terminate() () from /usr/lib64/libstdc++.so.6<br />#8 0x00007f2c14cd3d0e in __cxa_throw () from /usr/lib64/libstdc++.so.6<br />#9 0x0000000000837839 in ceph::__ceph_assert_fail(char const*, char const*, int, char const*) ()<br />#10 0x00000000007c4b4b in ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long) ()<br />#11 0x00000000007c4ee7 in ceph::HeartbeatMap::is_healthy() ()<br />#12 0x00000000007c5148 in ceph::HeartbeatMap::check_touch_file() ()<br />#13 0x000000000084d8ad in CephContextServiceThread::entry() ()<br />#14 0x00007f2c15748851 in start_thread () from /lib64/libpthread.so.0<br />#15 0x00007f2c144d267d in clone () from /lib64/libc.so.6</p>
</blockquote>
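The kernel hung-task reports quoted above can be summarized from a saved dmesg log to see which tasks were blocked for more than 120 seconds. A minimal sketch; the inline sample log stands in for real `dmesg` output:

```shell
# Count how often each task was flagged by the hung-task watchdog.
# The inline sample below stands in for real `dmesg` output.
log='INFO: task ceph-osd:8377 blocked for more than 120 seconds.
INFO: task flush-8:0:1258 blocked for more than 120 seconds.'
printf '%s\n' "$log" | grep 'blocked for more than' | awk '{print $3}' | sort | uniq -c
```

The same watchdog is tunable at runtime: `echo 0 > /proc/sys/kernel/hung_task_timeout_secs` disables the report, as the message itself notes.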
Ceph - Bug #3789: OSD core dump and down OSD on CentOS cluster
https://tracker.ceph.com/issues/3789?journal_id=15713
2013-01-11T14:01:22Z
Anonymous
<ul><li><strong>Status</strong> changed from <i>Need More Info</i> to <i>Won't Fix</i></li></ul><p>dmesg shows it was a lack of resources.</p>
<p>Upping the memory on these VMs from 512M to 2G; will retest.</p>
<p>Since it appears this was a resource problem, I will close this bug.</p>
<p>Do we have any mechanism that I am missing that notifies the end user when crashes like this occur, so they can fix their cluster before a critical number of resources have failed?</p>
Ceph - Bug #3789: OSD core dump and down OSD on CentOS cluster
https://tracker.ceph.com/issues/3789?journal_id=15715
2013-01-11T14:26:52Z
Sage Weil
sage@newdream.net
<ul></ul><p>There is 'ceph health', and a nagios plugin that runs it. A similarly trivial plugin can probably be written for other monitoring systems; I'm not sure what else people actually use these days.</p>
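A plugin of the kind described above can be sketched in a few lines of shell. This is a sketch, not the shipped nagios plugin; the HEALTH_OK / HEALTH_WARN output prefixes are assumed from the bobtail-era `ceph health` CLI and may differ on other versions:

```shell
# Map 'ceph health' output to nagios-style exit codes (0=OK, 1=WARN, 2=CRIT).
# Assumption: the CLI prefixes its output with HEALTH_OK / HEALTH_WARN.
health_to_exit() {
    case "$1" in
        HEALTH_OK*)   echo "OK: $1";       return 0 ;;
        HEALTH_WARN*) echo "WARNING: $1";  return 1 ;;
        *)            echo "CRITICAL: $1"; return 2 ;;
    esac
}
# Real invocation (needs a reachable monitor and a client keyring):
#   health_to_exit "$(ceph health)"; exit $?
```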