<p>rgw - Bug #23379: rgw performance regression for luminous 12.2.4 · <a href="https://tracker.ceph.com/issues/23379">https://tracker.ceph.com/issues/23379</a></p>
<hr /><p><strong>Yehuda Sadeh</strong> (yehuda@redhat.com) · 2018-03-15T18:34:51Z · <a href="https://tracker.ceph.com/issues/23379?journal_id=109207">journal 109207</a></p>
<ul><li><strong>Assignee</strong> set to <i>Mark Kogan</i></li></ul>
<hr /><p><strong>wei jin</strong> (wjin.cn@gmail.com) · 2018-03-21T06:34:58Z · <a href="https://tracker.ceph.com/issues/23379?journal_id=109414">journal 109414</a></p>
<ul><li><strong>File</strong> <a href="/attachments/download/3368/rgw.png">rgw.png</a> <a class="icon-only icon-magnifier" title="View" href="/attachments/3368/rgw.png">View</a> added</li></ul><p>We tuned the rados cluster, disabled rgw dynamic resharding, and benchmarked radosgw performance again. We got the same result: qps drops rapidly, but recovers after the gateway is restarted.</p>
<p>We deployed six gateways, and all of them seem fine on their own. So where might the bottleneck or regression be?</p>
<hr /><p><strong>Casey Bodley</strong> (cbodley@redhat.com) · 2018-03-22T17:59:53Z · <a href="https://tracker.ceph.com/issues/23379?journal_id=109534">journal 109534</a></p>
<ul></ul><p>Have you looked at radosgw's memory usage? There was an issue with memory growth in <a class="external" href="https://tracker.ceph.com/issues/23207">https://tracker.ceph.com/issues/23207</a> that may be related.</p>
<hr /><p><strong>wei jin</strong> (wjin.cn@gmail.com) · 2018-03-27T08:36:36Z · <a href="https://tracker.ceph.com/issues/23379?journal_id=109604">journal 109604</a></p>
<ul></ul><p>Casey Bodley wrote:</p>
<blockquote>
<p>Have you looked at radosgw's memory usage? There was an issue with memory growth in <a class="external" href="https://tracker.ceph.com/issues/23207">https://tracker.ceph.com/issues/23207</a> that may be related.</p>
</blockquote>
<p>I tried a flame graph and found that tcmalloc consumes too much time while the regression is happening. However, I am not sure whether other factors affect the performance as well. I will try that patch later to verify.</p>
<hr /><p><strong>wei jin</strong> (wjin.cn@gmail.com) · 2018-03-27T14:56:15Z · <a href="https://tracker.ceph.com/issues/23379?journal_id=109619">journal 109619</a></p>
<ul><li><strong>File</strong> <a href="/attachments/download/3392/throughput%20Graph.png">throughput Graph.png</a> <a class="icon-only icon-magnifier" title="View" href="/attachments/3392/throughput%20Graph.png">View</a> added</li></ul><p>I tried the patch (<a class="external" href="https://github.com/ceph/ceph/pull/20953">https://github.com/ceph/ceph/pull/20953</a>) with one gateway; the problem is still the same.</p>
<hr /><p><strong>Mark Kogan</strong> (mkogan@redhat.com) · 2018-03-28T10:03:46Z · <a href="https://tracker.ceph.com/issues/23379?journal_id=109727">journal 109727</a></p>
<ul></ul><p>@wei jin, could you please share the cosbench workload XML (redacting IP addresses/keys/passwords), so we can see workload parameters like the number of objects and the object sizes? I will try to run this workload on our system and check.</p>
<hr /><p><strong>wei jin</strong> (wjin.cn@gmail.com) · 2018-03-28T10:14:20Z · <a href="https://tracker.ceph.com/issues/23379?journal_id=109728">journal 109728</a></p>
<ul></ul><pre><code class="xml">&lt;?xml version="1.0" encoding="UTF-8" ?&gt;
&lt;workload name="s3-sample" description="sample benchmark for s3"&gt;
  &lt;storage type="s3" config="accesskey=FOO;secretkey=BAR;endpoint=http://IP:PORT" /&gt;
  &lt;workflow&gt;
    &lt;workstage name="init"&gt;
      &lt;work type="init" workers="1" config="cprefix=s3testqwer;containers=r(1,1)" /&gt;
    &lt;/workstage&gt;
    &lt;workstage name="prepare"&gt;
      &lt;work type="prepare" workers="1024" config="cprefix=s3testqwer;containers=r(1,1);objects=r(1,100000000);sizes=c(32)KB" /&gt;
    &lt;/workstage&gt;
  &lt;/workflow&gt;
&lt;/workload&gt;
</code></pre>
<p>Here it is. I tried it yesterday with one gateway using the S3 protocol. A few days ago, I tried six gateways using the Swift client.</p>
<p>It seems both the S3 and Swift clients show the performance regression.</p>
<hr /><p><strong>Mark Kogan</strong> (mkogan@redhat.com) · 2018-03-28T11:17:24Z · <a href="https://tracker.ceph.com/issues/23379?journal_id=109730">journal 109730</a></p>
<ul></ul><p>Thank you,</p>
<p>To summarize the workload:<br />1024 cosbench workers are writing 32 KB objects into a single bucket.</p>
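For scale, here is a quick back-of-the-envelope on the prepare stage (my arithmetic, derived from the workload XML above, not figures from the report):

```python
# Upper bound of data written by the cosbench prepare stage:
# objects=r(1,100000000) at sizes=c(32)KB, all into one bucket.
objects = 100_000_000
obj_bytes = 32 * 1024

total_bytes = objects * obj_bytes
print(f"up to {total_bytes / 2**40:.2f} TiB in a single bucket")  # ≈ 2.98 TiB
```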
<p>Could you please also share the ceph.conf, so we can see the configured thread counts and other tunables?</p>
<hr /><p><strong>wei jin</strong> (wjin.cn@gmail.com) · 2018-03-28T11:38:03Z · <a href="https://tracker.ceph.com/issues/23379?journal_id=109732">journal 109732</a></p>
<ul></ul><p>[global]<br />fsid = c105ca4e-b864-46d1-8ab2-74e0dd8b966a<br />public_network = 10.15.49.0/24<br />mon_initial_members = n15-049-194, n15-049-208, n15-049-222<br />mon_host = 10.15.49.194,10.15.49.208,10.15.49.222</p>
<p>auth_cluster_required = cephx<br />auth_service_required = cephx<br />auth_client_required = cephx</p>
<p>osd_pool_default_size = 3<br />osd_pool_default_min_size = 1</p>
<p>mon_osd_down_out_interval = 2592000</p>
<p>rgw_override_bucket_index_max_shards = 65521<br />rgw_dynamic_resharding = false<br />rgw_bucket_index_max_aio = 256<br />rgw_num_rados_handles = 8<br />rgw_thread_pool_size = 512<br />rgw_cache_lru_size = 1000000<br />rgw_gc_max_objs = 1000<br />rgw_gc_obj_min_wait = 0<br />rgw_gc_processor_period = 0<br />#rgw_max_chunk_size = 4194304</p>
<p>debug_asok = 0/0<br />debug_auth = 0/0<br />debug_bluefs = 0/0<br />debug_bluestore = 0/0<br />debug_buffer = 0/0<br />debug_civetweb = 0/0<br />debug_client = 0/0<br />debug_context = 0/0<br />debug_crush = 0/0<br />debug_filer = 0/0<br />debug_filestore = 0/0<br />debug_finisher = 0/0<br />debug_heartbeatmap = 0/0<br />debug_journal = 0/0<br />debug_journaler = 0/0<br />debug_lockdep = 0/0<br />debug_log = 0<br />debug_mds = 0/0<br />debug_mds_balancer = 0/0<br />debug_mds_locker = 0/0<br />debug_mds_log = 0/0<br />debug_mds_log_expire = 0/0<br />debug_mds_migrator = 0/0<br />debug_mon = 0/0<br />debug_monc = 0/0<br />debug_ms = 0/0<br />debug_objclass = 0/0<br />debug_objectcatcher = 0/0<br />debug_objecter = 0/0<br />debug_osd = 0/0<br />debug_paxos = 0/0<br />debug_perfcounter = 0/0<br />debug_rados = 0/0<br />debug_rbd = 0/0<br />debug_rgw = 0/0<br />debug_rocksdb = 0/0<br />debug_throttle = 0/0<br />debug_timer = 0/0<br />debug_tp = 0/0<br />debug_zs = 0/0</p>
<p>[mon]<br />mon_clock_drift_allowed = 1<br />mon_osd_full_ratio = 0.90<br />mon_osd_nearfull_ratio = 0.75</p>
<p>[mds]<br />mds_log_max_expiring = 200</p>
<p>mds_cache_size = 100000000<br />mds_client_prealloc_inos = 100000</p>
<p>mds_beacon_grace = 200<br />mds_beacon_interval = 10<br />mds_session_timeout = 300<br />mds_reconnect_timeout = 100</p>
<p>[osd]<br />ms_dispatch_throttle_bytes = 1048576000<br />objecter_inflight_ops = 10000<br />objecter_inflight_op_bytes = 1048576000<br />osd_client_message_cap = 10000<br />osd_client_message_size_cap = 1048576000<br />osd_max_write_size = 512</p>
<p>osd_num_op_tracker_shard = 64</p>
<p>osd_scrub_during_recovery = false<br />osd_scrub_sleep = 2<br />osd_scrub_min_interval = 2592000<br />osd_scrub_max_interval = 5184000<br />osd_scrub_begin_hour= 2<br />osd_scrub_end_hour= 8<br />osd_scrub_load_threshold = 5<br />osd_recovery_max_active = 1</p>
<p>osd_op_thread_timeout = 280<br />osd_op_thread_suicide_timeout = 300<br />osd_recovery_thread_timeout = 280<br />osd_recovery_thread_suicide_timeout = 300</p>
<p>osd_op_threads = 4<br />osd_disk_threads = 2<br />osd_op_num_threads_per_shard = 2<br />osd_op_num_shards = 8</p>
<p>osd_pg_object_context_cache_count = 10000<br />osd_map_cache_size = 1024</p>
<p>bluestore_min_alloc_size_ssd = 32768<br />bluestore_bluefs_balance_interval = 30<br />bluestore_cache_trim_interval = 60<br />bluestore_cache_size_ssd = 10737418240 #10g<br />bluestore_throttle_bytes = 53687091200 #512m<br />bluestore_throttle_deferred_bytes = 107374182400 #1g<br />bluestore_cache_kv_max = 2147483648 #2g</p>
<p>bluestore_rocksdb_options =compression=kNoCompression,max_write_buffer_number=32,min_write_buffer_number_to_merge=2,recycle_log_file_num=32,compaction_style=kCompactionStyleLevel,write_buffer_size=67108864,target_file_size_base=67108864,max_background_compactions=31,level0_file_num_compaction_trigger=8,level0_slowdown_writes_trigger=32,level0_stop_writes_trigger=64,num_levels=7,max_bytes_for_level_base=536870912,compaction_threads=32,flusher_threads=8</p>
<p>bluefs_buffered_io = true<br />bluestore_csum_type = none</p>
<p>#bluestore_min_alloc_size = 65536<br />#bluestore_extent_map_shard_max_size = 200<br />#bluestore_extent_map_shard_target_size = 100<br />#bluestore_extent_map_shard_min_size = 50</p>
<p>[client.rgw.n15-049-194]<br />rgw_frontends = "civetweb port=80 num_threads=512 enable_keep_alive=yes request_timeout_ms=50000"</p>
<hr /><p><strong>wei jin</strong> (wjin.cn@gmail.com) · 2018-03-28T11:48:20Z · <a href="https://tracker.ceph.com/issues/23379?journal_id=109734">journal 109734</a></p>
<ul></ul><p>rgw_gc_max_objs = 1000<br />rgw_gc_obj_min_wait = 0<br />rgw_gc_processor_period = 0</p>
<p>The gc-related configs can be ignored; I just wanted to test trim speed (my requirement is described here: <a class="external" href="https://github.com/ceph/ceph/pull/20546">https://github.com/ceph/ceph/pull/20546</a>). Resetting them to their default values reproduces the issue too.</p>
<hr /><p><strong>Mark Kogan</strong> (mkogan@redhat.com) · 2018-03-28T12:11:09Z · <a href="https://tracker.ceph.com/issues/23379?journal_id=109735">journal 109735</a></p>
<ul></ul><p>Thank you very much,</p>
<p>I see that there are 512 RGW threads and 1024 cosbench workers,</p>
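As a simple illustration of the imbalance being pointed out (a sketch using the counts quoted above):

```python
# With more cosbench workers than RGW frontend threads, requests queue
# inside the gateway instead of being served immediately.
workers = 1024      # cosbench workers in the workload
rgw_threads = 512   # rgw_thread_pool_size / civetweb num_threads

print(workers / rgw_threads)  # 2.0 concurrent requests per thread on average
```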
<p>Is it possible to test with a reduced number of cosbench workers (fewer than the number of RGW threads), for example 500?</p>
<hr /><p><strong>wei jin</strong> (wjin.cn@gmail.com) · 2018-03-28T12:37:47Z · <a href="https://tracker.ceph.com/issues/23379?journal_id=109736">journal 109736</a></p>
<ul><li><strong>File</strong> <a href="/attachments/download/3394/throughput%20Graph%20(1).png">throughput Graph (1).png</a> <a class="icon-only icon-magnifier" title="View" href="/attachments/3394/throughput%20Graph%20(1).png">View</a> added</li></ul><p>1. commented out the gc-related configs<br />2. restarted all daemons (mon/osd/rgw)<br />3. changed the cosbench workers to 500 and created a new bucket for the test</p>
<p>The regression still reproduced; after restarting the rgw daemon, qps recovered again :(</p>
<hr /><p><strong>Mark Kogan</strong> (mkogan@redhat.com) · 2018-03-28T14:43:33Z · <a href="https://tracker.ceph.com/issues/23379?journal_id=109743">journal 109743</a></p>
<ul></ul><p>Thank you very much for testing and providing all the information,</p>
<p>I will run the workload on my test system, to try to reproduce and investigate.</p>
<p>Compared to which version is this a regression? (Jewel?)</p>
<hr /><p><strong>wei jin</strong> (wjin.cn@gmail.com) · 2018-03-29T03:14:54Z · <a href="https://tracker.ceph.com/issues/23379?journal_id=109807">journal 109807</a></p>
<ul></ul><p>Mark Kogan wrote:</p>
<blockquote>
<p>Compared to which version is the regression? (Jewel ?)</p>
</blockquote>
<p>Actually, this is my first time testing luminous (a newly deployed cluster, not an upgrade). All OSDs use SSDs, 250+ in total.</p>
<p>It might be more accurate to say 'performance issue' instead of 'performance regression' in the title; sorry for the confusion.</p>
<hr /><p><strong>Matt Benjamin</strong> (mbenjamin@redhat.com) · 2018-04-05T18:14:27Z · <a href="https://tracker.ceph.com/issues/23379?journal_id=110253">journal 110253</a></p>
<ul></ul><p>rgw_num_rados_handles > 1 is not advisable (after the change, you need to scale up objecter_inflight_ops and objecter_inflight_op_bytes to compensate)</p>
<hr /><p><strong>wei jin</strong> (wjin.cn@gmail.com) · 2018-04-08T12:05:38Z · <a href="https://tracker.ceph.com/issues/23379?journal_id=110436">journal 110436</a></p>
<ul></ul><p>Matt Benjamin wrote:</p>
<blockquote>
<p>rgw_num_rados_handles > 1 is not advisable (after change, need to scale up inflight_ops and inflight_op_bytes to compensate)</p>
</blockquote>
<p>Tried with rgw_num_rados_handles = 1; still reproduced the performance issue.</p>
<hr /><p><strong>Mark Kogan</strong> (mkogan@redhat.com) · 2018-04-10T12:09:16Z · <a href="https://tracker.ceph.com/issues/23379?journal_id=110774">journal 110774</a></p>
<ul></ul><p>I checked whether this reproduces on two different clusters<br />(one with HDDs and one with SSDs),<br />using the provided cosbench workload and those of the provided ceph.conf [rgw] parameters that I could configure on my cluster.</p>
<p>On my clusters, I did not see the performance degradation as described in this bug.</p>
<p>The customizations in the provided ceph.conf are very extensive;<br />the cause may be configuration parameters that I cannot use on my setup.</p>
<p>I would check whether it also happens with a relatively small number of cosbench workers, like 100,<br />and if so, bisect the ceph.conf by commenting out various tunings and re-checking;<br />it is possible that some parameters are over-tuned.</p>
<hr /><p><strong>wei jin</strong> (wjin.cn@gmail.com) · 2018-04-10T13:45:16Z · <a href="https://tracker.ceph.com/issues/23379?journal_id=110784">journal 110784</a></p>
<ul></ul><p>How long have you been running cosbench?</p>
<p>The issue might be related to memory usage, according to the flame graph I gathered before.<br />I tuned TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES in /etc/default/ceph and compared 128 MB (the default) with 256 MB; 256 MB was much better, but the issue reproduces with both.</p>
<hr /><p><strong>Mark Kogan</strong> (mkogan@redhat.com) · 2018-04-12T14:29:22Z · <a href="https://tracker.ceph.com/issues/23379?journal_id=111023">journal 111023</a></p>
<ul></ul><p>Very interesting. Please tell me how you generate the flame graph; I will perform the same check on my system and update.</p>
<hr /><p><strong>wei jin</strong> (wjin.cn@gmail.com) · 2018-04-12T14:43:54Z · <a href="https://tracker.ceph.com/issues/23379?journal_id=111024">journal 111024</a></p>
<ul></ul><pre><code class="text">git clone https://github.com/brendangregg/FlameGraph
perf record -e cpu-clock --call-graph dwarf -p 3481456 -- sleep 30   # 3481456 is the gateway pid
perf script | ./FlameGraph/stackcollapse-perf.pl > rgw-perf.out
./FlameGraph/flamegraph.pl rgw-perf.out > rgw.svg
</code></pre>
<hr /><p><strong>Mark Kogan</strong> (mkogan@redhat.com) · 2018-05-29T14:58:46Z · <a href="https://tracker.ceph.com/issues/23379?journal_id=114262">journal 114262</a></p>
<ul></ul><p>I was able to reproduce this and traced it back to the "rgw_cache_expiry_interval" parameter, whose default value is 900 seconds.</p>
<p>If possible, please verify that increasing the value in ceph.conf, as below, mitigates the issue on your system:<br />rgw_cache_expiry_interval = 9000</p>
<hr /><p><strong>Mark Kogan</strong> (mkogan@redhat.com) · 2018-05-30T08:34:31Z · <a href="https://tracker.ceph.com/issues/23379?journal_id=114305">journal 114305</a></p>
<ul></ul><p>Following internal discussion and verification on the test system:<br />while we are debugging the issue, it is currently recommended to disable cache expiry by<br />setting the interval to 0 in ceph.conf:</p>
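As a ceph.conf fragment (the client section name here is copied from the configuration shared earlier in this thread; substitute your own rgw section, and a radosgw restart after the change is assumed):

```ini
[client.rgw.n15-049-194]
# temporary workaround while the bug is investigated:
# 0 disables cache entry expiry (the luminous default is 900 seconds)
rgw_cache_expiry_interval = 0
```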
<p>rgw_cache_expiry_interval = 0</p>
<hr /><p><strong>wei jin</strong> (wjin.cn@gmail.com) · 2018-05-30T08:39:36Z · <a href="https://tracker.ceph.com/issues/23379?journal_id=114306">journal 114306</a></p>
<ul></ul><p>Mark Kogan wrote:</p>
<blockquote>
<p>Following internal discussion and verification on the test system<br />While we are debugging the issue its currently recommended to disable the cache expiry by<br />setting the interval to 0 in ceph.conf :</p>
<p>rgw_cache_expiry_interval = 0</p>
</blockquote>
<p>Thanks. I was just wondering why not simply set it to zero.<br />Sorry, I have no test cluster available to verify it right now; I may try it later.</p>
<hr /><p><strong>Mark Kogan</strong> (mkogan@redhat.com) · 2018-06-05T09:10:17Z · <a href="https://tracker.ceph.com/issues/23379?journal_id=114646">journal 114646</a></p>
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li></ul><p>Fix PR: <a class="external" href="https://github.com/ceph/ceph/pull/22410">https://github.com/ceph/ceph/pull/22410</a></p>
<hr /><p><strong>Casey Bodley</strong> (cbodley@redhat.com) · 2018-06-22T13:47:01Z · <a href="https://tracker.ceph.com/issues/23379?journal_id=115577">journal 115577</a></p>
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Pending Backport</i></li><li><strong>Backport</strong> set to <i>luminous mimic</i></li></ul>
<hr /><p><strong>Nathan Cutler</strong> (ncutler@suse.cz) · 2018-06-22T15:54:39Z · <a href="https://tracker.ceph.com/issues/23379?journal_id=115597">journal 115597</a></p>
<ul><li><strong>Copied to</strong> <i><a class="issue tracker-9 status-3 priority-4 priority-default closed" href="/issues/24632">Backport #24632</a>: luminous: rgw performance regression for luminous 12.2.4</i> added</li></ul>
<hr /><p><strong>Nathan Cutler</strong> (ncutler@suse.cz) · 2018-06-22T15:54:41Z · <a href="https://tracker.ceph.com/issues/23379?journal_id=115599">journal 115599</a></p>
<ul><li><strong>Copied to</strong> <i><a class="issue tracker-9 status-3 priority-4 priority-default closed" href="/issues/24633">Backport #24633</a>: mimic: rgw performance regression for luminous 12.2.4</i> added</li></ul>
<hr /><p><strong>Nathan Cutler</strong> (ncutler@suse.cz) · 2018-08-22T13:08:32Z · <a href="https://tracker.ceph.com/issues/23379?journal_id=119265">journal 119265</a></p>
<ul><li><strong>Status</strong> changed from <i>Pending Backport</i> to <i>Resolved</i></li></ul>