Bug #10697

closed

dumpling: valgrind leak ReplicatedPG::get_object_context

Added by Loïc Dachary about 9 years ago. Updated about 9 years ago.

Status: Duplicate
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport: dumpling
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The run at http://pulpito.ceph.com/loic-2015-01-29_15:41:06-rados-dumpling-backports---basic-multi/730305/

reports that an object allocated by ReplicatedPG::get_object_context has been leaked (see v.txt).


Files

v.txt (5.25 KB): valgrind output, added by Loïc Dachary, 01/30/2015 02:35 PM

Related issues: 1 (0 open, 1 closed)

Is duplicate of Ceph - Bug #10262: osd/osd_types.h: 2944: FAILED assert(rwstate.empty()) (Resolved, Samuel Just, 12/06/2014)

#1

Updated by Loïc Dachary about 9 years ago

  • Backport set to dumpling
#2

Updated by Loïc Dachary about 9 years ago

Although the OSD does not crash, I think the leak is real.

In the osd.3 logs I see:

2015-01-29 23:44:36.622160 1a12d700 10 osd.3 pg_epoch: 8 pg[1.f( empty local-les=8 n=0 ec=1 les/c 8/8 4/4/4) [3,4] r=0 lpr=4 mlcod 0'0 active+clean] create_object_context 0x22b7f340 6b2cdaff/1.00000000/head//1 0
2015-01-29 23:44:36.637867 1a12d700 10 osd.3 pg_epoch: 8 pg[1.f( empty local-les=8 n=0 ec=1 les/c 8/8 4/4/4) [3,4] r=0 lpr=4 mlcod 0'0 active+clean] populate_obc_watchers 6b2cdaff/1.00000000/head//1
2015-01-29 23:44:36.644112 1a12d700 20 osd.3 pg_epoch: 8 pg[1.f( empty local-les=8 n=0 ec=1 les/c 8/8 4/4/4) [3,4] r=0 lpr=4 mlcod 0'0 active+clean] ReplicatedPG::check_blacklisted_obc_watchers for obc 6b2cdaff/1.00000000/head//1
2015-01-29 23:44:36.654598 1a12d700 10 osd.3 pg_epoch: 8 pg[1.f( empty local-les=8 n=0 ec=1 les/c 8/8 4/4/4) [3,4] r=0 lpr=4 mlcod 0'0 active+clean] find_object_context 6b2cdaff/1.00000000/head//1 @head
2015-01-29 23:44:36.668981 1a12d700 10 osd.3 pg_epoch: 8 pg[1.f( empty local-les=8 n=0 ec=1 les/c 8/8 4/4/4) [3,4] r=0 lpr=4 mlcod 0'0 active+clean] do_op mode is idle(wr=0)
2015-01-29 23:44:36.678771 1a12d700 10 osd.3 pg_epoch: 8 pg[1.f( empty local-les=8 n=0 ec=1 les/c 8/8 4/4/4) [3,4] r=0 lpr=4 mlcod 0'0 active+clean] do_op mode now rmw(wr=0)
2015-01-29 23:44:36.710681 1a12d700 10 osd.3 pg_epoch: 8 pg[1.f( empty local-les=8 n=0 ec=1 les/c 8/8 4/4/4) [3,4] r=0 lpr=4 mlcod 0'0 active+clean] do_op 6b2cdaff/1.00000000/head//1 [tmapput 0~606] ov 0'0 av 8'1 snapc 0=[] snapset 0=[]:[]
2015-01-29 23:44:36.726848 1a12d700 10 osd.3 pg_epoch: 8 pg[1.f( empty local-les=8 n=0 ec=1 les/c 8/8 4/4/4) [3,4] r=0 lpr=4 mlcod 0'0 active+clean] do_osd_op 6b2cdaff/1.00000000/head//1 [tmapput 0~606]
2015-01-29 23:44:36.733591 1a12d700 10 osd.3 pg_epoch: 8 pg[1.f( empty local-les=8 n=0 ec=1 les/c 8/8 4/4/4) [3,4] r=0 lpr=4 mlcod 0'0 active+clean] do_osd_op  tmapput 0~606
2015-01-29 23:44:36.744351 1a12d700 10 osd.3 pg_epoch: 8 pg[1.f( empty local-les=8 n=0 ec=1 les/c 8/8 4/4/4) [3,4] r=0 lpr=4 mlcod 0'0 active+clean] tmapput key .ceph_head
2015-01-29 23:44:36.758805 1a12d700 10 osd.3 pg_epoch: 8 pg[1.f( empty local-les=8 n=0 ec=1 les/c 8/8 4/4/4) [3,4] r=0 lpr=4 mlcod 0'0 active+clean] do_osd_op 6b2cdaff/1.00000000/head//1 [writefull 0~606]
2015-01-29 23:44:36.760314 1a12d700 10 osd.3 pg_epoch: 8 pg[1.f( empty local-les=8 n=0 ec=1 les/c 8/8 4/4/4) [3,4] r=0 lpr=4 mlcod 0'0 active+clean] do_osd_op  writefull 0~606
2015-01-29 23:44:36.786023 1a12d700 15 osd.3 pg_epoch: 8 pg[1.f( empty local-les=8 n=0 ec=1 les/c 8/8 4/4/4) [3,4] r=0 lpr=4 mlcod 0'0 active+clean] do_osd_op_effects on session 0x229c76b0
2015-01-29 23:44:36.794864 1a12d700 20 osd.3 pg_epoch: 8 pg[1.f( empty local-les=8 n=0 ec=1 les/c 8/8 4/4/4) [3,4] r=0 lpr=4 mlcod 0'0 active+clean] make_writeable 6b2cdaff/1.00000000/head//1 snapset=0x22b7ea30  snapc=0=[]
2015-01-29 23:44:36.810427 1a12d700 20 osd.3 pg_epoch: 8 pg[1.f( empty local-les=8 n=0 ec=1 les/c 8/8 4/4/4) [3,4] r=0 lpr=4 mlcod 0'0 active+clean] make_writeable 6b2cdaff/1.00000000/head//1 done, snapset=0=[]:[]+head
2015-01-29 23:44:36.825370 1a12d700 15 filestore(/var/lib/ceph/osd/ceph-3) getattr 1.f_head/6b2cdaff/1.00000000/snapdir//1 '_'
2015-01-29 23:44:36.828922 14121700  1 -- 10.214.132.26:6802/1426 <== mds.0 10.214.132.26:6800/1429 2 ==== osd_op(mds.0.1:23 606.00000000 [tmapup 0~0] 1.f89eaaf4 RETRY=1 e8) v4 ==== 155+0+227 (2563151455 0 3808907663) 0x22b44ae0 con 0x2294a2f0
2015-01-29 23:44:36.829890 14121700 10 osd.3 8 do_waiters -- start
2015-01-29 23:44:36.829969 14121700 10 osd.3 8 do_waiters -- finish
2015-01-29 23:44:36.830033 14121700 20 osd.3 8 _dispatch 0x22b44ae0 osd_op(mds.0.1:23 606.00000000 [tmapup 0~0] 1.f89eaaf4 RETRY=1 e8) v4
2015-01-29 23:44:36.831723 14121700 15 osd.3 8 require_same_or_newer_map 8 (i am 8) 0x22b44ae0
2015-01-29 23:44:36.831829 14121700 20 osd.3 8 _share_map_incoming mds.0 10.214.132.26:6800/1429 8
2015-01-29 23:44:36.832611 14121700 15 osd.3 8 enqueue_op 0x22b8f3e0 prio 64 cost 227 latency 0.655201 osd_op(mds.0.1:23 606.00000000 [tmapup 0~0] 1.f89eaaf4 RETRY=1 e8) v4
2015-01-29 23:44:36.833074 14121700 10 osd.3 8 do_waiters -- start
2015-01-29 23:44:36.833146 14121700 10 osd.3 8 do_waiters -- finish
2015-01-29 23:44:36.835185 1a12d700 10 filestore(/var/lib/ceph/osd/ceph-3) error opening file /var/lib/ceph/osd/ceph-3/current/1.f_head/1.00000000__snapdir_6B2CDAFF__1 with flags=2: (2) No such file or directory
2015-01-29 23:44:36.835484 1a12d700 10 filestore(/var/lib/ceph/osd/ceph-3) getattr 1.f_head/6b2cdaff/1.00000000/snapdir//1 '_' = -2
2015-01-29 23:44:36.838203 1a12d700 10 osd.3 pg_epoch: 8 pg[1.f( empty local-les=8 n=0 ec=1 les/c 8/8 4/4/4) [3,4] r=0 lpr=4 mlcod 0'0 active+clean]  set mtime to 2015-01-29 23:44:22.165396
2015-01-29 23:44:36.869767 1a12d700 10 osd.3 pg_epoch: 8 pg[1.f( empty local-les=8 n=0 ec=1 les/c 8/8 4/4/4) [3,4] r=0 lpr=4 mlcod 0'0 active+clean]  final snapset 0=[]:[]+head in 6b2cdaff/1.00000000/head//1

which starts with
create_object_context 0x22b7f340 6b2cdaff/1.00000000/head//1

which shows that the code path goes through http://workbench.dachary.org/ceph/ceph/blob/dumpling/src/osd/ReplicatedPG.cc#L4585 which is the line incriminated by valgrind in the attached trace (v.txt). A few log lines later we have
final snapset 0=[]:[]+head in 6b2cdaff/1.00000000/head//1

which means http://workbench.dachary.org/ceph/ceph/blob/dumpling/src/osd/ReplicatedPG.cc#L3827 has been called:
ctx->snapset_obc = get_object_context(snapoid, true);

Therefore the existing snapset_obc has been leaked. I must confess that I don't fully understand the implications of the patch, and in particular why the test
  } else if (ctx->new_snapset.clones.size() &&
         (!ctx->snapset_obc || !ctx->snapset_obc->obs.exists)) {

instead of
  } else if (ctx->new_snapset.clones.size()) {

is necessary.
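To make the reasoning above concrete, here is a minimal C++ sketch, assuming snapset_obc is a raw pointer whose reference must be released by hand. ObjectContext, OpContext, get_object_context, put_object_context, maybe_load_snapset_obc and contexts_left_after_op below are hypothetical, simplified stand-ins, not the dumpling code. The sketch only shows that, with the guard quoted above, the assignment still runs when a context is already held but its obs.exists is false (which is at least consistent with the snapdir getattr returning -2 in the log), and that overwriting the pointer without a matching release leaves one context allocated.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins; not the real Ceph dumpling types.
struct ObjectState {
  bool exists = false;
};

struct ObjectContext {
  int ref = 0;  // manual reference count
  ObjectState obs;
};

struct SnapSet {
  std::vector<int> clones;  // only size() matters in this sketch
};

struct OpContext {
  SnapSet new_snapset;
  ObjectContext* snapset_obc = nullptr;  // raw, manually counted pointer
};

// Number of contexts allocated and not yet freed.
int live_contexts = 0;

// Hypothetical: allocate a context holding one reference for the caller.
ObjectContext* get_object_context() {
  ObjectContext* obc = new ObjectContext;
  obc->ref = 1;
  ++live_contexts;
  return obc;
}

// Hypothetical: drop one reference, freeing the context when it reaches zero.
void put_object_context(ObjectContext* obc) {
  assert(obc->ref > 0);
  if (--obc->ref == 0) {
    delete obc;
    --live_contexts;
  }
}

// The guard quoted in the report, followed by the assignment.
// When release_first is false, an already-held context is overwritten.
void maybe_load_snapset_obc(OpContext* ctx, bool release_first) {
  if (ctx->new_snapset.clones.size() &&
      (!ctx->snapset_obc || !ctx->snapset_obc->obs.exists)) {
    if (release_first && ctx->snapset_obc)
      put_object_context(ctx->snapset_obc);
    ctx->snapset_obc = get_object_context();
  }
}

// Simulates one op that already holds a snapset context, then runs the
// guarded branch and the op's final release. Returns how many contexts
// are still allocated afterwards (nonzero means a leak).
int contexts_left_after_op(bool held_exists, bool release_first) {
  live_contexts = 0;
  OpContext ctx;
  ctx.new_snapset.clones.push_back(1);
  ctx.snapset_obc = get_object_context();
  ctx.snapset_obc->obs.exists = held_exists;
  maybe_load_snapset_obc(&ctx, release_first);
  if (ctx.snapset_obc)
    put_object_context(ctx.snapset_obc);  // op cleanup releases what it holds
  return live_contexts;
}
```

With held_exists set to false and release_first set to false, one context remains allocated after the op, which is the kind of leak valgrind reports. Releasing the held reference before reassigning is one way to avoid it in this model. It is not necessarily the fix adopted upstream; this issue was closed as a duplicate of #10262.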

#3

Updated by Samuel Just about 9 years ago

Ah, I think you are right.

#4

Updated by Loïc Dachary about 9 years ago

  • Status changed from New to Duplicate