Ceph - Bug #698: cosd memory usage with large number of pools
https://tracker.ceph.com/issues/698

Updated by Sage Weil (sage@newdream.net) on 2011-01-11T10:51:51Z
https://tracker.ceph.com/issues/698?journal_id=2164
<ul><li><strong>Category</strong> set to <i>OSD</i></li><li><strong>Assignee</strong> set to <i>Colin McCabe</i></li><li><strong>Target version</strong> set to <i>v0.24.2</i></li></ul>

Updated by Sage Weil (sage@newdream.net) on 2011-01-28T10:50:38Z
https://tracker.ceph.com/issues/698?journal_id=2305
<ul><li><strong>Target version</strong> changed from <i>v0.24.2</i> to <i>v0.24.3</i></li></ul>

Updated by Greg Farnum (gfarnum@redhat.com) on 2011-01-31T12:06:59Z
https://tracker.ceph.com/issues/698?journal_id=2327
<ul><li><strong>Target version</strong> changed from <i>v0.24.3</i> to <i>19</i></li></ul>

Updated by Sage Weil (sage@newdream.net) on 2011-02-04T14:15:17Z
https://tracker.ceph.com/issues/698?journal_id=2367
<ul><li><strong>Target version</strong> changed from <i>19</i> to <i>v0.26</i></li></ul>

Updated by Sage Weil (sage@newdream.net) on 2011-02-04T14:15:39Z
https://tracker.ceph.com/issues/698?journal_id=2368
<ul><li><strong>Assignee</strong> deleted (<del><i>Colin McCabe</i></del>)</li></ul>

Updated by Sage Weil (sage@newdream.net) on 2011-02-04T14:16:02Z
https://tracker.ceph.com/issues/698?journal_id=2369
<ul><li><strong>Subject</strong> changed from <i>huge cosd memory usage with large number of objects</i> to <i>cosd memory usage with large number of pools</i></li></ul>

Updated by Greg Farnum (gfarnum@redhat.com) on 2011-02-09T10:49:11Z
https://tracker.ceph.com/issues/698?journal_id=2402
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li><li><strong>Assignee</strong> set to <i>Greg Farnum</i></li></ul><p>Taking a look now!</p>

Updated by Greg Farnum (gfarnum@redhat.com) on 2011-02-10T08:38:11Z
https://tracker.ceph.com/issues/698?journal_id=2422
<p>I ran the OSD through massif, and in one of my tests, creating 2000 pools (with the default 8 PGs each), 1/3 of the ending memory was used up by MPGStatsAck messages. Best guess is that the single pg_stat_queue_lock is slowing things down too much and the messages are stacking up.</p>
<p>Continuing to look for other potential issues.</p>

Updated by Greg Farnum (gfarnum@redhat.com) on 2011-02-14T09:23:31Z
https://tracker.ceph.com/issues/698?journal_id=2458
<p>It seems the messages stacking up are just a result of massif slowing down operations too much.</p>
<p>According to massif, each PG takes ~14KB of memory when empty and without any peers.<br />There are some hash_maps taking up a bunch of unnecessary space, though -- I converted a couple to maps and removed an unused one. This should be a significant decrease, but I haven't tested by how much.</p>
<p>I'm probably going to have to proceed with tcmalloc heap profiling rather than massif, since massif's performance overhead is causing problems. Hopefully I can at least run tests via massif with peering to see how that impacts PG memory use, since massif is a bit more precise at showing exactly where the memory use is going.</p>

Updated by Greg Farnum (gfarnum@redhat.com) on 2011-02-16T21:22:51Z
https://tracker.ceph.com/issues/698?journal_id=2477
<p>Hmm, I just tried to run one of my tests using 2 OSDs instead of one for the first time. It looks like it's still possible for them to get backed up on PG messages from the monitor -- I tried to send heapdump commands, and I think minutes passed between when the monitor "sent" one and when the OSD "received" it (according to the dout logs). Will update tomorrow, probably with a new bug.</p>

Updated by Greg Farnum (gfarnum@redhat.com) on 2011-02-17T09:24:17Z
https://tracker.ceph.com/issues/698?journal_id=2483
<p>So, yeah. All the heapdump commands were sent <strong>after</strong> ceph -w reported the PGs were all active+clean, although I didn't check cosd CPU usage.<br /><pre>gregf@kai:~/testing/ceph/src/out$ grep heapdump mon.a | grep osd0 | less
2011-02-16 21:26:03.188484 7f6b53290710 mon.a@0(leader) e1 send_command osd0 10.0.1.205:6800/14185[heapdump]
2011-02-16 21:26:03.188499 7f6b53290710 mon.a@0(leader) e1 try_send_message mon_command(heapdump v 463) v1 to osd0 10.0.1.205:6800/14185
2011-02-16 21:26:03.188519 7f6b53290710 -- 10.0.1.205:6789/0 --> osd0 10.0.1.205:6800/14185 -- mon_command(heapdump v 463) v1 -- ?+0 0x7f6b480432a0
2011-02-16 21:26:43.467015 7f6b53290710 mon.a@0(leader) e1 send_command osd0 10.0.1.205:6800/14185[heapdump]
2011-02-16 21:26:43.467036 7f6b53290710 mon.a@0(leader) e1 try_send_message mon_command(heapdump v 463) v1 to osd0 10.0.1.205:6800/14185
2011-02-16 21:26:43.467057 7f6b53290710 -- 10.0.1.205:6789/0 --> osd0 10.0.1.205:6800/14185 -- mon_command(heapdump v 463) v1 -- ?+0 0x7f6b44062910
2011-02-16 21:27:17.594580 7f6b53290710 mon.a@0(leader) e1 send_command osd0 10.0.1.205:6800/14185[heapdump]
2011-02-16 21:27:17.594596 7f6b53290710 mon.a@0(leader) e1 try_send_message mon_command(heapdump v 463) v1 to osd0 10.0.1.205:6800/14185
2011-02-16 21:27:17.594616 7f6b53290710 -- 10.0.1.205:6789/0 --> osd0 10.0.1.205:6800/14185 -- mon_command(heapdump v 463) v1 -- ?+0 0x7f6b44062910
2011-02-16 21:27:31.665497 7f6b53290710 mon.a@0(leader) e1 send_command osd0 10.0.1.205:6800/14185[heapdump]
2011-02-16 21:27:31.665512 7f6b53290710 mon.a@0(leader) e1 try_send_message mon_command(heapdump v 463) v1 to osd0 10.0.1.205:6800/14185
2011-02-16 21:27:31.665533 7f6b53290710 -- 10.0.1.205:6789/0 --> osd0 10.0.1.205:6800/14185 -- mon_command(heapdump v 463) v1 -- ?+0 0x7f6b4c05d790
2011-02-16 21:29:31.038685 7f6b53290710 mon.a@0(leader) e1 send_command osd0 10.0.1.205:6800/14185[heapdump]
2011-02-16 21:29:31.038700 7f6b53290710 mon.a@0(leader) e1 try_send_message mon_command(heapdump v 463) v1 to osd0 10.0.1.205:6800/14185
2011-02-16 21:29:31.038720 7f6b53290710 -- 10.0.1.205:6789/0 --> osd0 10.0.1.205:6800/14185 -- mon_command(heapdump v 463) v1 -- ?+0 0x7f6b48017550
</pre><br /><pre>gregf@kai:~/testing/ceph/src/out$ grep heapdump osd.0 | less
2011-02-16 21:37:59.379852 7f8999b57710 -- 10.0.1.205:6800/14185 <== mon0 10.0.1.205:6789/0 1163 ==== mon_command(heapdump v 463) v1 ==== 50+0+0 (3078000875 0 0) 0x2c91700 con 0x19e1500
2011-02-16 21:37:59.379885 7f8999b57710 osd0 462 _dispatch 0x2c91700 mon_command(heapdump v 463) v1
2011-02-16 21:37:59.379907 7f8999b57710 osd0 462 handle_command args: [heapdump]
2011-02-16 21:37:59.478496 7f8999b57710 -- 10.0.1.205:6800/14185 <== mon0 10.0.1.205:6789/0 1171 ==== mon_command(heapdump v 463) v1 ==== 50+0+0 (3078000875 0 0) 0x258f8c0 con 0x19e1500
2011-02-16 21:37:59.478528 7f8999b57710 osd0 462 _dispatch 0x258f8c0 mon_command(heapdump v 463) v1
2011-02-16 21:37:59.478551 7f8999b57710 osd0 462 handle_command args: [heapdump]
2011-02-16 21:37:59.578440 7f8999b57710 -- 10.0.1.205:6800/14185 <== mon0 10.0.1.205:6789/0 1179 ==== mon_command(heapdump v 463) v1 ==== 50+0+0 (3078000875 0 0) 0x47ffa80 con 0x19e1500
2011-02-16 21:37:59.578472 7f8999b57710 osd0 462 _dispatch 0x47ffa80 mon_command(heapdump v 463) v1
2011-02-16 21:37:59.578494 7f8999b57710 osd0 462 handle_command args: [heapdump]
2011-02-16 21:37:59.611745 7f8999b57710 -- 10.0.1.205:6800/14185 <== mon0 10.0.1.205:6789/0 1182 ==== mon_command(heapdump v 463) v1 ==== 50+0+0 (3078000875 0 0) 0x47ff540 con 0x19e1500
2011-02-16 21:37:59.611777 7f8999b57710 osd0 462 _dispatch 0x47ff540 mon_command(heapdump v 463) v1
2011-02-16 21:37:59.611799 7f8999b57710 osd0 462 handle_command args: [heapdump]
2011-02-16 21:38:02.505711 7f8999b57710 -- 10.0.1.205:6800/14185 <== mon0 10.0.1.205:6789/0 1219 ==== mon_command(heapdump v 463) v1 ==== 50+0+0 (3078000875 0 0) 0x2cbc000 con 0x19e1500
2011-02-16 21:38:02.505751 7f8999b57710 osd0 463 _dispatch 0x2cbc000 mon_command(heapdump v 463) v1
2011-02-16 21:38:02.505774 7f8999b57710 osd0 463 handle_command args: [heapdump]</pre></p>
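The delay is visible directly in the timestamps quoted above. As a quick sketch of the arithmetic (using the first monitor send and the first OSD receive from the two log excerpts):

```python
from datetime import datetime

# First heapdump send by the monitor and first receive by the OSD,
# copied from the log excerpts above.
fmt = "%Y-%m-%d %H:%M:%S.%f"
mon_sent = datetime.strptime("2011-02-16 21:26:03.188484", fmt)
osd_received = datetime.strptime("2011-02-16 21:37:59.379852", fmt)

delay = osd_received - mon_sent
print(delay)  # about 11 minutes 56 seconds of queueing on the OSD side
```

So the OSD really was nearly twelve minutes behind the monitor, consistent with the commands piling up in its dispatch queue.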
<p>Presumably this is related to DongJin's issue <a class="issue tracker-1 status-10 priority-4 priority-default closed" title="Bug: 1). PG bits don't get recognized and 2). Takes too long for OSDs to boot up. (Duplicate)" href="https://tracker.ceph.com/issues/810">#810</a>, and maybe to Jim Schutt's trouble on the list with heartbeating on large clusters.</p>
<p>Have we done something to make PGs and peering more expensive recently? Or is the system just not well-tested with large numbers of PGs like this, and others are hitting it because the system doesn't choose intelligent default PG counts for large clusters?</p>

Updated by Greg Farnum (gfarnum@redhat.com) on 2011-02-18T15:25:09Z
https://tracker.ceph.com/issues/698?journal_id=2494
<p>I merged code which removes some hash_maps in <a class="changeset" title="Merge branch 'pool_memory'" href="https://tracker.ceph.com/projects/ceph/repository/revisions/a7929c5e265d1b6502733ee9525fd93bbcfc739e">a7929c5e265d1b6502733ee9525fd93bbcfc739e</a>; this takes per-PG memory use from ~14KB to ~4KB in the case of empty PGs.</p>
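For a sense of scale, a back-of-the-envelope sketch using the 2000-pool test described earlier (the ~14KB and ~4KB per-PG figures are the rough massif estimates quoted above, not exact measurements):

```python
# Rough per-OSD estimate of empty-PG overhead, before and after the
# hash_map cleanup. Pool and PG counts match the 2000-pool test above.
pools = 2000
pgs_per_pool = 8                 # default pg_num in these tests
pgs = pools * pgs_per_pool       # 16000 PGs

before_mb = pgs * 14 / 1024      # ~219MB at ~14KB per empty PG
after_mb = pgs * 4 / 1024        # ~62.5MB at ~4KB per empty PG

print(f"{pgs} PGs: ~{before_mb:.0f}MB before, ~{after_mb:.1f}MB after")
```

That is, the cleanup cuts the idle-PG footprint of this configuration from a couple hundred megabytes to a few tens of megabytes per OSD.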
<p>I will review this on an ongoing basis, but for now issues with memory fragmentation and map churn are far more significant than the actual PG memory use.</p>

Updated by Greg Farnum (gfarnum@redhat.com) on 2011-02-27T19:42:00Z
https://tracker.ceph.com/issues/698?journal_id=2524
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Resolved</i></li></ul><p>I'm closing this, since it's become apparent that the actual memory use issues are less about the in-memory objects and more about message spam, peering, and general problems with odd memory use, all of which we're working on!</p>

Updated by Sage Weil (sage@newdream.net) on 2011-03-13T14:39:26Z
https://tracker.ceph.com/issues/698?journal_id=2740
<ul><li><strong>Story points</strong> set to <i>3</i></li><li><strong>Position</strong> set to <i>1</i></li><li><strong>Position</strong> changed from <i>1</i> to <i>543</i></li></ul>