<p><strong>Ceph rgw - Bug #19956: radosgw Segmentation fault when service is restarted during lifecycle processing</strong><br />
<a href="https://tracker.ceph.com/issues/19956">https://tracker.ceph.com/issues/19956</a></p>
<p><em>Updated by Ben Hines (bhines@gmail.com) on 2017-05-17T04:47:35Z (<a href="https://tracker.ceph.com/issues/19956?journal_id=91118">journal 91118</a>)</em></p>
<p>Yikes, the same fault also happened when I pushed a new period map, which caused the 'realm reloader' to trigger. That makes this seem more serious.</p>
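Pausing the frontends stops new requests, but the lifecycle worker (the `lifecycle_thr` thread in the log below) is a separate thread; unless the reloader also signals and joins it before tearing shared state down, the worker can race the teardown. A minimal sketch of the safe ordering, with illustrative names only (not the actual radosgw code):

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Illustrative stand-in for a background worker like RGWLC::LCWorker.
// All names here are hypothetical; this only sketches the teardown race.
struct Worker {
  std::atomic<bool> down{false};
  std::atomic<int> cycles{0};
  std::thread th;

  void start() {
    th = std::thread([this] {
      // Do at least one pass, then re-check the shutdown flag every
      // cycle instead of assuming shared state stays valid forever.
      do {
        ++cycles;
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
      } while (!down.load());
    });
  }

  // The reloader must call this BEFORE freeing anything the worker uses:
  // signal first, then join, so the thread is provably gone.
  void stop() {
    down.store(true);
    th.join();
  }
};

int reload_cycle() {
  Worker w;
  w.start();
  std::this_thread::sleep_for(std::chrono::milliseconds(10));
  w.stop();  // only after join() returns is it safe to delete shared state
  return w.cycles.load();
}
```

If the reloader instead frees or nulls state while the worker is mid-cycle, the worker dereferences dead pointers, which would be consistent with the `this=0x0` in the backtrace further down.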
<pre>
2017-05-16 21:41:53.691726 7f653addf700 1 rgw realm reloader: Frontends paused
2017-05-16 21:41:53.691938 7f66397dc700 0 ERROR: failed to clone shard, completion_mgr.get_next() returned ret=-125
2017-05-16 21:41:54.615538 7f66377d8700 -1 *** Caught signal (Segmentation fault) **
in thread 7f66377d8700 thread_name:lifecycle_thr
ceph version 11.2.0 (f223e27eeb35991352ebc1f67423d4ebc252adb7)
1: (()+0x50bdca) [0x7f66e2845dca]
2: (()+0xf370) [0x7f66d7a95370]
3: (RGWGC::add_chain(librados::ObjectWriteOperation&, cls_rgw_obj_chain&, std::string const&)+0x62) [0x7f66e278dd62]
4: (RGWGC::send_chain(cls_rgw_obj_chain&, std::string const&, bool)+0x5f) [0x7f66e278de3f]
5: (RGWRados::Object::complete_atomic_modification()+0xb9) [0x7f66e264ca29]
6: (RGWRados::Object::Delete::delete_obj()+0xd70) [0x7f66e269c2a0]
</pre>
<p><em>Updated by Ben Hines (bhines@gmail.com) on 2017-05-23T21:18:18Z (<a href="https://tracker.ceph.com/issues/19956?journal_id=91464">journal 91464</a>)</em></p>
<p>I got a coredump with symbols for this. Let me know if you want the dump; here is the stack with full symbols:</p>
<pre>
#0  0x00007f6a6cb1723b in raise () from /lib64/libpthread.so.0
#1  0x00007f6a778b9e95 in reraise_fatal (signum=11) at /usr/src/debug/ceph-11.2.0/src/global/signal_handler.cc:72
#2  handle_fatal_signal (signum=11) at /usr/src/debug/ceph-11.2.0/src/global/signal_handler.cc:134
#3  &lt;signal handler called&gt;
#4  RGWGC::add_chain (this=this@entry=0x0, op=..., chain=..., tag="default.68996150.61684839") at /usr/src/debug/ceph-11.2.0/src/rgw/rgw_gc.cc:58
#5  0x00007f6a77801e3f in RGWGC::send_chain (this=0x0, chain=..., tag="default.68996150.61684839", sync=sync@entry=false) at /usr/src/debug/ceph-11.2.0/src/rgw/rgw_gc.cc:64
#6  0x00007f6a776c0a29 in RGWRados::Object::complete_atomic_modification (this=0x7f69cc8578d0) at /usr/src/debug/ceph-11.2.0/src/rgw/rgw_rados.cc:7870
#7  0x00007f6a777102a0 in RGWRados::Object::Delete::delete_obj (this=this@entry=0x7f69cc857840) at /usr/src/debug/ceph-11.2.0/src/rgw/rgw_rados.cc:8295
#8  0x00007f6a77710ce8 in RGWRados::delete_obj (this=&lt;optimized out&gt;, obj_ctx=..., bucket_info=..., obj=..., versioning_status=0, bilog_flags=&lt;optimized out&gt;, expiration_time=...) at /usr/src/debug/ceph-11.2.0/src/rgw/rgw_rados.cc:8330
#9  0x00007f6a77607ced in rgw_remove_object (store=0x7f6a810fe000, bucket_info=..., bucket=..., key=...) at /usr/src/debug/ceph-11.2.0/src/rgw/rgw_bucket.cc:519
#10 0x00007f6a7780c971 in RGWLC::bucket_lc_process (this=this@entry=0x7f6a81959c00, shard_id=":globalcache307:default.42048218.11") at /usr/src/debug/ceph-11.2.0/src/rgw/rgw_lc.cc:283
#11 0x00007f6a7780d928 in RGWLC::process (this=this@entry=0x7f6a81959c00, index=&lt;optimized out&gt;, max_lock_secs=max_lock_secs@entry=60) at /usr/src/debug/ceph-11.2.0/src/rgw/rgw_lc.cc:482
#12 0x00007f6a7780ddc1 in RGWLC::process (this=0x7f6a81959c00) at /usr/src/debug/ceph-11.2.0/src/rgw/rgw_lc.cc:412
#13 0x00007f6a7780e033 in RGWLC::LCWorker::entry (this=0x7f6a81a820d0) at /usr/src/debug/ceph-11.2.0/src/rgw/rgw_lc.cc:51
#14 0x00007f6a6cb0fdc5 in start_thread () from /lib64/libpthread.so.0
#15 0x00007f6a6b37073d in clone () from /lib64/libc.so.6
</pre>
<p><em>Updated by Ben Hines (bhines@gmail.com) on 2017-05-23T22:18:21Z (<a href="https://tracker.ceph.com/issues/19956?journal_id=91467">journal 91467</a>)</em></p>
<pre>
rgw_gc.cc:58:
    cls_rgw_gc_set_entry(op, cct->_conf->rgw_gc_obj_min_wait, info);
</pre>
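For what it's worth, the fault at that line is probably not about the value of the setting at all: frame #4 of the backtrace shows `this=this@entry=0x0`, meaning `send_chain`/`add_chain` were invoked through a null `RGWGC*`, and the segfault is the `cct` dereference. (The `ret=-125` in the log is `-ECANCELED`, i.e. the machinery was already shutting down.) A minimal sketch of that failure mode and the kind of null guard that avoids it, with hypothetical names, not the actual fix in radosgw:

```cpp
#include <cerrno>
#include <string>

// Hypothetical stand-ins: a config-holding GC object and the store that
// owns it. During shutdown or realm reload the store's gc pointer may be
// nulled while another thread still tries to use it.
struct FakeGC {
  int obj_min_wait = 300;  // plays the role of cct->_conf->rgw_gc_obj_min_wait

  int add_chain(const std::string& tag) {
    // Reading a member through `this` is the first real dereference;
    // with this == nullptr, this is exactly where the segfault lands.
    return obj_min_wait >= 0 ? 0 : -EINVAL;
  }
};

struct FakeStore {
  FakeGC* gc = nullptr;  // nulled during teardown

  int send_chain(const std::string& tag) {
    if (gc == nullptr) {
      // Guard missing in the crashing path: fail softly instead of
      // calling a method through a null pointer.
      return -ECANCELED;
    }
    return gc->add_chain(tag);
  }
};
```

With `gc == nullptr` the guarded `send_chain` returns `-ECANCELED`; once a GC object is attached it succeeds.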
<p>Could it be unhappy with my 'gc obj min wait' setting, possibly? Here's my config:</p>
<pre>
keyring = /etc/ceph/ceph.client.radosgw.keyring
rgw thread pool size = 800
debug civetweb = 20
rgw frontends = civetweb enable_keep_alive=yes port=80 num_threads=500 error_log_file=/var/log/ceph/civetweb.error.log access_log_file=/var/log/ceph/civetweb.access.log
rgw num rados handles = 32
rgw cache lru size = 30000
rgw cache enabled = true
rgw override bucket index max shards = 23
rgw enable apis = s3
debug rgw = 20
rgw lifecycle work time = 00:01-23:59
rgw enable ops log = False
rgw gc max objs = 2647
rgw lc max objs = 2647
rgw gc obj min wait = 300
rgw gc processor period = 600
rgw gc processor max time = 600
</pre>
<p><em>Updated by wei qiaomiao (wei.qiaomiao@zte.com.cn) on 2017-07-22T00:31:38Z (<a href="https://tracker.ceph.com/issues/19956?journal_id=95720">journal 95720</a>)</em></p>
<p>I think this PR: <a class="external" href="http://tracker.ceph.com/issues/19956">http://tracker.ceph.com/issues/19956</a> will help you.</p>
<p><em>Updated by Ben Hines (bhines@gmail.com) on 2017-07-22T01:48:25Z (<a href="https://tracker.ceph.com/issues/19956?journal_id=95721">journal 95721</a>)</em></p>
<p>wei qiaomiao wrote:</p>
<blockquote>
<p>I think this PR: <a class="external" href="http://tracker.ceph.com/issues/19956">http://tracker.ceph.com/issues/19956</a> will help you.</p>
</blockquote>
<p>The link just goes back here. Which PR? If it's in Luminous, I can wait. Thanks! :)</p>
<p><em>Updated by Orit Wasserman (owasserm@redhat.com) on 2017-07-23T07:29:52Z (<a href="https://tracker.ceph.com/issues/19956?journal_id=95726">journal 95726</a>)</em></p>
<ul><li><strong>Priority</strong> changed from <i>Normal</i> to <i>High</i></li></ul>
<p><em>Updated by Ben Hines (bhines@gmail.com) on 2017-07-25T06:13:03Z (<a href="https://tracker.ceph.com/issues/19956?journal_id=95780">journal 95780</a>)</em></p>
<p>Looks like the fix is <a class="external" href="https://github.com/ceph/ceph/pull/16495">https://github.com/ceph/ceph/pull/16495</a>, and this was duplicated by <a class="external" href="http://tracker.ceph.com/issues/20756">http://tracker.ceph.com/issues/20756</a>.</p>
<p><em>Updated by Yuri Weinstein (yweinste@redhat.com) on 2017-07-27T14:23:27Z (<a href="https://tracker.ceph.com/issues/19956?journal_id=95925">journal 95925</a>)</em></p>
<p>Merged <a class="external" href="https://github.com/ceph/ceph/pull/16495">https://github.com/ceph/ceph/pull/16495</a>.</p>
<p><em>Updated by Casey Bodley (cbodley@redhat.com) on 2017-07-27T17:37:00Z (<a href="https://tracker.ceph.com/issues/19956?journal_id=95952">journal 95952</a>)</em></p>
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Pending Backport</i></li><li><strong>Backport</strong> set to <i>jewel kraken</i></li></ul>
<p><em>Updated by Casey Bodley (cbodley@redhat.com) on 2017-07-27T17:37:48Z (<a href="https://tracker.ceph.com/issues/19956?journal_id=95953">journal 95953</a>)</em></p>
<ul><li><strong>Backport</strong> changed from <i>jewel kraken</i> to <i>kraken</i></li></ul>
<p><em>Updated by Casey Bodley (cbodley@redhat.com) on 2017-07-27T17:53:47Z (<a href="https://tracker.ceph.com/issues/19956?journal_id=95961">journal 95961</a>)</em></p>
<ul><li><strong>Related to</strong> <i><a class="issue tracker-1 status-5 priority-4 priority-default closed" href="/issues/20755">Bug #20755</a>: radosgw crash when service is restarted during lifecycle processing</i> added</li></ul>
<p><em>Updated by Casey Bodley (cbodley@redhat.com) on 2017-07-27T17:53:58Z (<a href="https://tracker.ceph.com/issues/19956?journal_id=95963">journal 95963</a>)</em></p>
<ul><li><strong>Related to</strong> deleted (<i><a class="issue tracker-1 status-5 priority-4 priority-default closed" href="/issues/20755">Bug #20755</a>: radosgw crash when service is restarted during lifecycle processing</i>)</li></ul>
<p><em>Updated by Casey Bodley (cbodley@redhat.com) on 2017-07-27T17:54:06Z (<a href="https://tracker.ceph.com/issues/19956?journal_id=95965">journal 95965</a>)</em></p>
<ul><li><strong>Duplicated by</strong> <i><a class="issue tracker-1 status-5 priority-4 priority-default closed" href="/issues/20755">Bug #20755</a>: radosgw crash when service is restarted during lifecycle processing</i> added</li></ul>
<p><em>Updated by Casey Bodley (cbodley@redhat.com) on 2017-07-27T17:54:20Z (<a href="https://tracker.ceph.com/issues/19956?journal_id=95967">journal 95967</a>)</em></p>
<ul><li><strong>Duplicated by</strong> <i><a class="issue tracker-1 status-5 priority-4 priority-default closed" href="/issues/20756">Bug #20756</a>: radosgw daemon crash when service is restarted during lifecycle processing</i> added</li></ul>
<p><em>Updated by Nathan Cutler (ncutler@suse.cz) on 2017-07-28T06:17:36Z (<a href="https://tracker.ceph.com/issues/19956?journal_id=96071">journal 96071</a>)</em></p>
<ul><li><strong>Copied to</strong> <i><a class="issue tracker-9 status-6 priority-4 priority-default closed" href="/issues/20836">Backport #20836</a>: kraken: radosgw Segmentation fault when service is restarted during lifecycle processing</i> added</li></ul>
<p><em>Updated by Nathan Cutler (ncutler@suse.cz) on 2017-09-05T15:25:40Z (<a href="https://tracker.ceph.com/issues/19956?journal_id=98290">journal 98290</a>)</em></p>
<ul><li><strong>Status</strong> changed from <i>Pending Backport</i> to <i>Resolved</i></li></ul>