Bug #10697 (Closed)
dumpling: valgrind leak ReplicatedPG::get_object_context
Added by Loïc Dachary over 9 years ago. Updated over 9 years ago.
% Done: 0%
Source: other
Backport: dumpling
Severity: 3 - minor
Description
http://pulpito.ceph.com/loic-2015-01-29_15:41:06-rados-dumpling-backports---basic-multi/730305/
reports that an object allocated by ReplicatedPG::get_object_context has been leaked (see the attached v.txt).
Updated by Loïc Dachary over 9 years ago
Although the OSD does not crash, I think the leak is real.
In the osd.3 logs I see:
2015-01-29 23:44:36.622160 1a12d700 10 osd.3 pg_epoch: 8 pg[1.f( empty local-les=8 n=0 ec=1 les/c 8/8 4/4/4) [3,4] r=0 lpr=4 mlcod 0'0 active+clean] create_object_context 0x22b7f340 6b2cdaff/1.00000000/head//1 0
2015-01-29 23:44:36.637867 1a12d700 10 osd.3 pg_epoch: 8 pg[1.f( empty local-les=8 n=0 ec=1 les/c 8/8 4/4/4) [3,4] r=0 lpr=4 mlcod 0'0 active+clean] populate_obc_watchers 6b2cdaff/1.00000000/head//1
2015-01-29 23:44:36.644112 1a12d700 20 osd.3 pg_epoch: 8 pg[1.f( empty local-les=8 n=0 ec=1 les/c 8/8 4/4/4) [3,4] r=0 lpr=4 mlcod 0'0 active+clean] ReplicatedPG::check_blacklisted_obc_watchers for obc 6b2cdaff/1.00000000/head//1
2015-01-29 23:44:36.654598 1a12d700 10 osd.3 pg_epoch: 8 pg[1.f( empty local-les=8 n=0 ec=1 les/c 8/8 4/4/4) [3,4] r=0 lpr=4 mlcod 0'0 active+clean] find_object_context 6b2cdaff/1.00000000/head//1 @head
2015-01-29 23:44:36.668981 1a12d700 10 osd.3 pg_epoch: 8 pg[1.f( empty local-les=8 n=0 ec=1 les/c 8/8 4/4/4) [3,4] r=0 lpr=4 mlcod 0'0 active+clean] do_op mode is idle(wr=0)
2015-01-29 23:44:36.678771 1a12d700 10 osd.3 pg_epoch: 8 pg[1.f( empty local-les=8 n=0 ec=1 les/c 8/8 4/4/4) [3,4] r=0 lpr=4 mlcod 0'0 active+clean] do_op mode now rmw(wr=0)
2015-01-29 23:44:36.710681 1a12d700 10 osd.3 pg_epoch: 8 pg[1.f( empty local-les=8 n=0 ec=1 les/c 8/8 4/4/4) [3,4] r=0 lpr=4 mlcod 0'0 active+clean] do_op 6b2cdaff/1.00000000/head//1 [tmapput 0~606] ov 0'0 av 8'1 snapc 0=[] snapset 0=[]:[]
2015-01-29 23:44:36.726848 1a12d700 10 osd.3 pg_epoch: 8 pg[1.f( empty local-les=8 n=0 ec=1 les/c 8/8 4/4/4) [3,4] r=0 lpr=4 mlcod 0'0 active+clean] do_osd_op 6b2cdaff/1.00000000/head//1 [tmapput 0~606]
2015-01-29 23:44:36.733591 1a12d700 10 osd.3 pg_epoch: 8 pg[1.f( empty local-les=8 n=0 ec=1 les/c 8/8 4/4/4) [3,4] r=0 lpr=4 mlcod 0'0 active+clean] do_osd_op tmapput 0~606
2015-01-29 23:44:36.744351 1a12d700 10 osd.3 pg_epoch: 8 pg[1.f( empty local-les=8 n=0 ec=1 les/c 8/8 4/4/4) [3,4] r=0 lpr=4 mlcod 0'0 active+clean] tmapput key .ceph_head
2015-01-29 23:44:36.758805 1a12d700 10 osd.3 pg_epoch: 8 pg[1.f( empty local-les=8 n=0 ec=1 les/c 8/8 4/4/4) [3,4] r=0 lpr=4 mlcod 0'0 active+clean] do_osd_op 6b2cdaff/1.00000000/head//1 [writefull 0~606]
2015-01-29 23:44:36.760314 1a12d700 10 osd.3 pg_epoch: 8 pg[1.f( empty local-les=8 n=0 ec=1 les/c 8/8 4/4/4) [3,4] r=0 lpr=4 mlcod 0'0 active+clean] do_osd_op writefull 0~606
2015-01-29 23:44:36.786023 1a12d700 15 osd.3 pg_epoch: 8 pg[1.f( empty local-les=8 n=0 ec=1 les/c 8/8 4/4/4) [3,4] r=0 lpr=4 mlcod 0'0 active+clean] do_osd_op_effects on session 0x229c76b0
2015-01-29 23:44:36.794864 1a12d700 20 osd.3 pg_epoch: 8 pg[1.f( empty local-les=8 n=0 ec=1 les/c 8/8 4/4/4) [3,4] r=0 lpr=4 mlcod 0'0 active+clean] make_writeable 6b2cdaff/1.00000000/head//1 snapset=0x22b7ea30 snapc=0=[]
2015-01-29 23:44:36.810427 1a12d700 20 osd.3 pg_epoch: 8 pg[1.f( empty local-les=8 n=0 ec=1 les/c 8/8 4/4/4) [3,4] r=0 lpr=4 mlcod 0'0 active+clean] make_writeable 6b2cdaff/1.00000000/head//1 done, snapset=0=[]:[]+head
2015-01-29 23:44:36.825370 1a12d700 15 filestore(/var/lib/ceph/osd/ceph-3) getattr 1.f_head/6b2cdaff/1.00000000/snapdir//1 '_'
2015-01-29 23:44:36.828922 14121700 1 -- 10.214.132.26:6802/1426 <== mds.0 10.214.132.26:6800/1429 2 ==== osd_op(mds.0.1:23 606.00000000 [tmapup 0~0] 1.f89eaaf4 RETRY=1 e8) v4 ==== 155+0+227 (2563151455 0 3808907663) 0x22b44ae0 con 0x2294a2f0
2015-01-29 23:44:36.829890 14121700 10 osd.3 8 do_waiters -- start
2015-01-29 23:44:36.829969 14121700 10 osd.3 8 do_waiters -- finish
2015-01-29 23:44:36.830033 14121700 20 osd.3 8 _dispatch 0x22b44ae0 osd_op(mds.0.1:23 606.00000000 [tmapup 0~0] 1.f89eaaf4 RETRY=1 e8) v4
2015-01-29 23:44:36.831723 14121700 15 osd.3 8 require_same_or_newer_map 8 (i am 8) 0x22b44ae0
2015-01-29 23:44:36.831829 14121700 20 osd.3 8 _share_map_incoming mds.0 10.214.132.26:6800/1429 8
2015-01-29 23:44:36.832611 14121700 15 osd.3 8 enqueue_op 0x22b8f3e0 prio 64 cost 227 latency 0.655201 osd_op(mds.0.1:23 606.00000000 [tmapup 0~0] 1.f89eaaf4 RETRY=1 e8) v4
2015-01-29 23:44:36.833074 14121700 10 osd.3 8 do_waiters -- start
2015-01-29 23:44:36.833146 14121700 10 osd.3 8 do_waiters -- finish
2015-01-29 23:44:36.835185 1a12d700 10 filestore(/var/lib/ceph/osd/ceph-3) error opening file /var/lib/ceph/osd/ceph-3/current/1.f_head/1.00000000__snapdir_6B2CDAFF__1 with flags=2: (2) No such file or directory
2015-01-29 23:44:36.835484 1a12d700 10 filestore(/var/lib/ceph/osd/ceph-3) getattr 1.f_head/6b2cdaff/1.00000000/snapdir//1 '_' = -2
2015-01-29 23:44:36.838203 1a12d700 10 osd.3 pg_epoch: 8 pg[1.f( empty local-les=8 n=0 ec=1 les/c 8/8 4/4/4) [3,4] r=0 lpr=4 mlcod 0'0 active+clean] set mtime to 2015-01-29 23:44:22.165396
2015-01-29 23:44:36.869767 1a12d700 10 osd.3 pg_epoch: 8 pg[1.f( empty local-les=8 n=0 ec=1 les/c 8/8 4/4/4) [3,4] r=0 lpr=4 mlcod 0'0 active+clean] final snapset 0=[]:[]+head in 6b2cdaff/1.00000000/head//1
which starts with
create_object_context 0x22b7f340 6b2cdaff/1.00000000/head//1
which shows that the code path goes through http://workbench.dachary.org/ceph/ceph/blob/dumpling/src/osd/ReplicatedPG.cc#L4585, the line incriminated by valgrind in the trace above. A few log lines later, we have
final snapset 0=[]:[]+head in 6b2cdaff/1.00000000/head//1
which means http://workbench.dachary.org/ceph/ceph/blob/dumpling/src/osd/ReplicatedPG.cc#L3827 has been called:
ctx->snapset_obc = get_object_context(snapoid, true);
therefore the existing snapset_obc has been leaked. I must confess that I don't fully understand the implications of the patch, and in particular why the test
} else if (ctx->new_snapset.clones.size() && (!ctx->snapset_obc || !ctx->snapset_obc->obs.exists)) {
instead of
} else if (ctx->new_snapset.clones.size()) {
is necessary.
Updated by Loïc Dachary over 9 years ago
- Status changed from New to Duplicate