https://tracker.ceph.com/https://tracker.ceph.com/favicon.ico2016-05-17T23:18:49ZCeph rgw - Bug #15915: rgw command is consuming all the cpu timehttps://tracker.ceph.com/issues/15915?journal_id=709632016-05-17T23:18:49ZRussell Islammisla011@fiu.edu
<ul></ul><p>Above output is from top command.</p> rgw - Bug #15915: rgw command is consuming all the cpu timehttps://tracker.ceph.com/issues/15915?journal_id=710202016-05-18T17:44:26ZRussell Islammisla011@fiu.edu
<ul></ul><p>More info:<br />After configuring the multisite object gateway, radosgw uses almost 100% of the CPU while syncing is in progress.</p> rgw - Bug #15915: rgw command is consuming all the cpu timehttps://tracker.ceph.com/issues/15915?journal_id=710212016-05-18T17:51:05ZYehuda Sadehyehuda@redhat.com
<ul></ul><p>what version are you using?</p> rgw - Bug #15915: rgw command is consuming all the cpu timehttps://tracker.ceph.com/issues/15915?journal_id=710222016-05-18T18:00:58ZRussell Islammisla011@fiu.edu
<ul></ul><p>Latest version: Jewel 10.2.1</p> rgw - Bug #15915: rgw command is consuming all the cpu timehttps://tracker.ceph.com/issues/15915?journal_id=710332016-05-18T21:35:00ZRussell Islammisla011@fiu.edu
<ul></ul><p>More info: It also takes a long time to stop the service.</p>
<p>systemctl stop ceph-radosgw@</p> rgw - Bug #15915: rgw command is consuming all the cpu timehttps://tracker.ceph.com/issues/15915?journal_id=710502016-05-19T09:22:00ZNathan Cutlerncutler@suse.cz
<ul></ul><p>In another ticket (<a class="issue tracker-1 status-10 priority-4 priority-default closed" title="Bug: 10.2.1 more pidfile permission problems (Duplicate)" href="https://tracker.ceph.com/issues/15907">#15907</a>) there is a situation where the old sysvinit script is getting run, I think because the user ran <code>systemctl start ceph</code> (which has the unintended effect of running <code>/etc/init.d/ceph</code> via systemd-sysvinit). Maybe something similar is happening in this situation.</p>
<p>You could check <code>ps aux | grep ceph</code> for lines like the one described in <a class="external" href="http://tracker.ceph.com/issues/15907#note-2">http://tracker.ceph.com/issues/15907#note-2</a></p>
<p>Is the behavior different if you use <code>systemctl start ceph-radosgw.target</code> to start RGW?</p> rgw - Bug #15915: rgw command is consuming all the cpu timehttps://tracker.ceph.com/issues/15915?journal_id=710512016-05-19T09:22:45ZNathan Cutlerncutler@suse.cz
<ul></ul><p>And use <code>systemctl stop ceph-radosgw.target</code> to stop, of course.</p> rgw - Bug #15915: rgw command is consuming all the cpu timehttps://tracker.ceph.com/issues/15915?journal_id=710632016-05-19T17:02:50ZRussell Islammisla011@fiu.edu
<ul></ul><p>I started the daemon with systemctl start ceph-radosgw.target. Still almost 100% of the cpu is occupied by radosgw.</p>
<p>ps aux | grep ceph</p>
<p>root 4950 0.0 0.7 158912 7336 ? Ss May18 0:54 python /usr/sbin/ceph-create-keys --cluster ceph --id ceph-us-west-1<br />ceph 5265 0.0 4.0 356888 41184 ? Ssl May18 0:06 /usr/bin/ceph-mon -f --cluster ceph --id ceph-us-west --setuser ceph --setgroup ceph<br />ceph 6390 0.0 7.3 893100 74616 ? Ssl May18 0:30 /usr/bin/ceph-osd -f --cluster ceph --id 0 --setuser ceph --setgroup ceph<br />ceph 11114 95.4 3.4 2437208 34872 ? Ssl 09:59 1:25 /usr/bin/radosgw -f --cluster ceph --name client.rgw.ceph-us-west --setuser ceph --setgroup ceph<br />root 11849 0.0 0.0 112632 948 pts/0 R+ 10:01 0:00 grep --color=auto ceph</p>
<p>Question: What is the difference between "systemctl start ceph-radosgw.target" and "systemctl start ceph-radosgw@rgw."?<br />Do we need both of them?</p> rgw - Bug #15915: rgw command is consuming all the cpu timehttps://tracker.ceph.com/issues/15915?journal_id=713152016-05-23T16:59:27ZRussell Islammisla011@fiu.edu
<ul></ul><p>Could anyone confirm if this is normal behavior?</p> rgw - Bug #15915: rgw command is consuming all the cpu timehttps://tracker.ceph.com/issues/15915?journal_id=713302016-05-24T09:24:56ZJiaying Renmikulely@gmail.com
<ul><li><strong>File</strong> <a href="/attachments/download/2343/out.png">out.png</a> <a class="icon-only icon-magnifier" title="View" href="/attachments/2343/out.png">View</a> added</li></ul><p>Hi~ Yehuda:</p>
<p>I've seen the same issue. My environment:</p>
<p>[mikulely@localhost src]$ uname -a<br />Linux localhost.localdomain 3.10.0-327.el7.x86_64 #1 SMP Thu Nov 19 22:10:57 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux<br />[mikulely@localhost src]$ ceph -v<br />ceph version 10.0.0-7743-g10f9a1d (10f9a1d1b38b8aeac029cb7332ee67fc8e80eb6e)</p>
<p>My setup steps:</p>
<p>[mikulely@localhost src]$ pwd<br />/home/mikulely/ceph/src<br />[mikulely@localhost src]$ python test/rgw/test_multi.py --num-zones 2</p>
<p>After this setup, the radosgw process is at over 160% CPU according to htop.</p>
<p>After encountering this, I enabled the oprofile option and recompiled; the profile result is attached. Is there anything I can do to help investigate further?</p> rgw - Bug #15915: rgw command is consuming all the cpu timehttps://tracker.ceph.com/issues/15915?journal_id=719072016-06-01T23:47:40ZRussell Islammisla011@fiu.edu
<ul></ul><p>If this is not a bug, better close it.</p> rgw - Bug #15915: rgw command is consuming all the cpu timehttps://tracker.ceph.com/issues/15915?journal_id=720302016-06-03T19:06:09ZCasey Bodleycbodley@redhat.com
<ul></ul><p>Jiaying Ren wrote:</p>
<blockquote>
<p>After encountering this, I enabled the oprofile option and recompiled; the profile result is attached. Is there anything I can do to help investigate further?</p>
</blockquote>
<p>Thanks for the profile data. If you're still able to reproduce this, could you turn on --debug-rgw=20 and see what shows up in the radosgw logs? If we're spinning somewhere, it will probably be spamming the logs with repeated output. That output should help us narrow down the cause.</p> rgw - Bug #15915: rgw command is consuming all the cpu timehttps://tracker.ceph.com/issues/15915?journal_id=724982016-06-13T10:04:20ZBenoit Petitbpetit@b0rk.in
<ul></ul><p>Hi,</p>
<p>I am facing exactly the same problem. I have two RGW instances in a multisite setup with the following characteristics:</p>
<p>CentOS Linux release 7.1.1503 (Core)<br />Linux cephrgw-lab-01-ber 3.10.0-229.el7.x86_64<br />Running radosgw with the following command: /usr/bin/radosgw -d --cluster ceph --debug_ms 5 --name client.rgw.cephrgw-lab-01-ber --setuser ceph --setgroup ceph --debug-rgw=20 > rgw.log 2>&1<br />ceph version: ceph version 10.2.1 (3a66dd4f30852819c1bdaa8ec23c795d4ad77269)</p>
<p>I've attached the logs (--debug-rgw=20).</p>
<p>Please tell me if I should open a separate ticket (and sorry if I should have done so already).</p>
<p>Thanks for your time.</p> rgw - Bug #15915: rgw command is consuming all the cpu timehttps://tracker.ceph.com/issues/15915?journal_id=724992016-06-13T10:06:51ZBenoit Petitbpetit@b0rk.in
<ul><li><strong>File</strong> <a href="/attachments/download/2367/rgw.log.log">rgw.log.log</a> <a class="icon-only icon-magnifier" title="View" href="/attachments/2367/rgw.log.log">View</a> added</li></ul><p>It's better with the log file</p> rgw - Bug #15915: rgw command is consuming all the cpu timehttps://tracker.ceph.com/issues/15915?journal_id=726572016-06-14T18:54:18ZOrit Wassermanowasserm@redhat.com
<ul><li><strong>Assignee</strong> set to <i>Casey Bodley</i></li></ul> rgw - Bug #15915: rgw command is consuming all the cpu timehttps://tracker.ceph.com/issues/15915?journal_id=729182016-06-16T15:07:00ZBenoit Petitbpetit@b0rk.in
<ul></ul><p>Just in case it could help, I've attached a performance record captured with perf record (perf version 3.10.0-327.18.2.el7.x86_64.debug on centos 7) on the radosgw pid. It can be read with perf report -i perf.data.</p>
<p>Thanks,</p> rgw - Bug #15915: rgw command is consuming all the cpu timehttps://tracker.ceph.com/issues/15915?journal_id=729202016-06-16T15:12:45ZBenoit Petitbpetit@b0rk.in
<ul></ul><p>Hm, nope. I can't upload it: it is hard to get a record smaller than 2 MB, and I get a "request entity too large" error as soon as my attachment exceeds 1 MB.</p>
<p>Here it is: <a class="external" href="https://framadrop.org/r/xdOZIgBRxA#0CvhDDDOw1nFXjc6lRw89jf5A099pPpNItFkGIg/JdE=">https://framadrop.org/r/xdOZIgBRxA#0CvhDDDOw1nFXjc6lRw89jf5A099pPpNItFkGIg/JdE=</a></p>
<p>Thanks</p> rgw - Bug #15915: rgw command is consuming all the cpu timehttps://tracker.ceph.com/issues/15915?journal_id=740672016-07-08T00:00:00ZRussell Islammisla011@fiu.edu
<ul></ul><p>This is still in version 10.2.2. Can we get some update on this?</p> rgw - Bug #15915: rgw command is consuming all the cpu timehttps://tracker.ceph.com/issues/15915?journal_id=748102016-07-14T19:03:15ZCasey Bodleycbodley@redhat.com
<ul></ul><p>Russell Islam wrote:</p>
<blockquote>
<p>This is still in version 10.2.2. Can we get some update on this?</p>
</blockquote>
<p>We've still been unable to reproduce this in testing, though we have seen issues with older versions of libcurl; can you provide the version you're running? (curl --version)</p> rgw - Bug #15915: rgw command is consuming all the cpu timehttps://tracker.ceph.com/issues/15915?journal_id=748122016-07-14T20:34:12ZRussell Islammisla011@fiu.edu
<ul></ul><p>[root@ceph-client7 ceph-config]# curl --version<br />curl 7.29.0 (x86_64-redhat-linux-gnu) libcurl/7.29.0 NSS/3.19.1 Basic ECC zlib/1.2.7 libidn/1.28 libssh2/1.4.3<br />Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smtp smtps telnet tftp <br />Features: AsynchDNS GSS-Negotiate IDN IPv6 Largefile NTLM NTLM_WB SSL libz</p> rgw - Bug #15915: rgw command is consuming all the cpu timehttps://tracker.ceph.com/issues/15915?journal_id=748902016-07-18T14:05:25ZCasey Bodleycbodley@redhat.com
<ul></ul><p>Russell Islam wrote:</p>
<blockquote>
<p>[root@ceph-client7 ceph-config]# curl --version<br />curl 7.29.0 (x86_64-redhat-linux-gnu) libcurl/7.29.0 NSS/3.19.1 Basic ECC zlib/1.2.7 libidn/1.28 libssh2/1.4.3<br />Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smtp smtps telnet tftp <br />Features: AsynchDNS GSS-Negotiate IDN IPv6 Largefile NTLM NTLM_WB SSL libz</p>
</blockquote>
<p>Thank you. 7.29 is the version we had some downstream issues with in RHEL. We make heavy use of curl_multi_wait(), and 7.29 is missing some fixes that were leading to deadlocks in our case. Would you be willing to test with a more recent version of curl? If not, I can set up a centos vm and give it a try.</p> rgw - Bug #15915: rgw command is consuming all the cpu timehttps://tracker.ceph.com/issues/15915?journal_id=749142016-07-18T16:49:18ZRussell Islammisla011@fiu.edu
<ul></ul><p>Thanks for the update. I will test this issue with a later version of curl and keep you posted here.</p> rgw - Bug #15915: rgw command is consuming all the cpu timehttps://tracker.ceph.com/issues/15915?journal_id=751602016-07-21T20:21:12ZRussell Islammisla011@fiu.edu
<ul></ul><p>Tested with a later version of curl (7.43 in my case). That got rid of the issue.</p> rgw - Bug #15915: rgw command is consuming all the cpu timehttps://tracker.ceph.com/issues/15915?journal_id=751612016-07-21T20:29:05ZCasey Bodleycbodley@redhat.com
<ul></ul><p>Russell Islam wrote:</p>
<blockquote>
<p>Tested with a later version of curl (7.43 in my case). That got rid of the issue.</p>
</blockquote>
<p>Good to know, thank you very much for testing. That means we can reproduce by running against older versions of libcurl to get to the bottom of this.</p> rgw - Bug #15915: rgw command is consuming all the cpu timehttps://tracker.ceph.com/issues/15915?journal_id=751622016-07-21T20:42:52ZRussell Islammisla011@fiu.edu
<ul></ul><blockquote>
<p>Good to know, thank you very much for testing. That means we can reproduce by running against older versions of libcurl to get to the bottom of this.</p>
</blockquote>
<p>Yes. You are right.</p> rgw - Bug #15915: rgw command is consuming all the cpu timehttps://tracker.ceph.com/issues/15915?journal_id=757562016-08-02T13:21:58ZCasey Bodleycbodley@redhat.com
<ul><li><strong>Related to</strong> <i><a class="issue tracker-1 status-3 priority-5 priority-high3 closed" href="/issues/16695">Bug #16695</a>: radosgw Consumes too much CPU time to synchronize metadata or data between multisite</i> added</li></ul> rgw - Bug #15915: rgw command is consuming all the cpu timehttps://tracker.ceph.com/issues/15915?journal_id=780122016-09-07T17:19:25ZCasey Bodleycbodley@redhat.com
<ul><li><strong>Related to</strong> <i><a class="issue tracker-1 status-3 priority-4 priority-default closed" href="/issues/17052">Bug #17052</a>: unittest_http_manager times out</i> added</li></ul> rgw - Bug #15915: rgw command is consuming all the cpu timehttps://tracker.ceph.com/issues/15915?journal_id=875492017-03-14T17:56:05ZCasey Bodleycbodley@redhat.com
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Resolved</i></li></ul>
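<p>The diagnosis in this thread came down to the libcurl version: libcurl 7.29.0 showed the CPU spin and 7.43 did not. As a rough illustration only, the check could be sketched in Python as below. The function names and the three labels are hypothetical, and the thread does not say which libcurl release actually fixed the <code>curl_multi_wait()</code> problem, so 7.43 is treated merely as "a version one reporter found to work", not as a confirmed minimum. Treating versions older than 7.29 as suspect is likewise an assumption based on the remark about "older versions of libcurl".</p>

```python
import re

# Versions taken from the thread: 7.29.0 showed the problem, 7.43 did not.
# ASSUMPTION: the exact fixing release is not stated, so anything in
# between is reported as "unknown" rather than guessed at.
KNOWN_BAD = (7, 29, 0)
KNOWN_GOOD = (7, 43, 0)


def parse_libcurl_version(curl_version_output):
    """Extract the libcurl version tuple from `curl --version` output."""
    match = re.search(r"libcurl/(\d+)\.(\d+)(?:\.(\d+))?", curl_version_output)
    if match is None:
        raise ValueError("no libcurl version found in the given output")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def classify(version):
    """Label a libcurl version using only the two data points in the thread."""
    if version <= KNOWN_BAD:
        return "suspect"       # at or below the version that showed the spin
    if version >= KNOWN_GOOD:
        return "reported-ok"   # at or above the version a reporter tested
    return "unknown"           # the thread gives no information here


if __name__ == "__main__":
    # First line of the `curl --version` output posted in the thread.
    sample = ("curl 7.29.0 (x86_64-redhat-linux-gnu) libcurl/7.29.0 "
              "NSS/3.19.1 Basic ECC zlib/1.2.7 libidn/1.28 libssh2/1.4.3")
    version = parse_libcurl_version(sample)
    print(version, classify(version))
```

<p>The sketch reads the <code>libcurl/</code> token rather than the leading <code>curl</code> token because radosgw links against the library, and the two can differ on some systems.</p>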