Bug #45916

open

cls_lock: unlimited shared lock created by libradosstriper api let node crash

Added by Zhenyi Shu almost 4 years ago. Updated almost 4 years ago.

Status:
Fix Under Review
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
rados
Component(RADOS):
librados
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Background: Ceph Luminous is running in our production environment, and a service uses the libradosstriper API to access Ceph.

We found that when multiple instances read or write the same object at the same time, the extended attribute named 'lock.striper.lock' grows larger and larger until the node eventually crashes. We also found that this bug caused the RocksDB size to reach 2 TB in our production environment.

The following logs were captured in a development environment:

2020-06-08 06:45:50.744365 7f793e352700 10 osd.0 pg_epoch: 7 pg[1.2( v 7'240865 (7'239300,7'240865] local-lis/les=5/6 n=7 ec=4/4 lis/c 5/5 les/c/f 6/6/0 5/5/5) [0] r=0 lpr=5 crt=7'240865 lcod 7'240864 mlcod 7'240864 active+clean] do_osd_op 1:503f3c5a:::3.blk.0000000000000000:head [setxattr lock.striper.lock (37818)]
2020-06-08 06:45:50.744378 7f793e352700 10 osd.0 pg_epoch: 7 pg[1.2( v 7'240865 (7'239300,7'240865] local-lis/les=5/6 n=7 ec=4/4 lis/c 5/5 les/c/f 6/6/0 5/5/5) [0] r=0 lpr=5 crt=7'240865 lcod 7'240864 mlcod 7'240864 active+clean] do_osd_op  setxattr lock.striper.lock (37818)
2020-06-08 06:45:50.746581 7f793e352700 10 osd.0 pg_epoch: 7 pg[1.2( v 7'240866 (7'239300,7'240866] local-lis/les=5/6 n=7 ec=4/4 lis/c 5/5 les/c/f 6/6/0 5/5/5) [0] r=0 lpr=5 crt=7'240866 lcod 7'240865 mlcod 7'240865 active+clean] do_osd_op 1:503f3c5a:::3.blk.0000000000000000:head [setxattr lock.striper.lock (37710)]
2020-06-08 06:45:50.746594 7f793e352700 10 osd.0 pg_epoch: 7 pg[1.2( v 7'240866 (7'239300,7'240866] local-lis/les=5/6 n=7 ec=4/4 lis/c 5/5 les/c/f 6/6/0 5/5/5) [0] r=0 lpr=5 crt=7'240866 lcod 7'240865 mlcod 7'240865 active+clean] do_osd_op  setxattr lock.striper.lock (37710)
2020-06-08 06:45:50.749465 7f793e352700 10 osd.0 pg_epoch: 7 pg[1.2( v 7'240867 (7'239300,7'240867] local-lis/les=5/6 n=7 ec=4/4 lis/c 5/5 les/c/f 6/6/0 5/5/5) [0] r=0 lpr=5 crt=7'240867 lcod 7'240866 mlcod 7'240866 active+clean] do_osd_op 1:503f3c5a:::3.blk.0000000000000000:head [setxattr lock.striper.lock (37818)]
2020-06-08 06:45:50.749477 7f793e352700 10 osd.0 pg_epoch: 7 pg[1.2( v 7'240867 (7'239300,7'240867] local-lis/les=5/6 n=7 ec=4/4 lis/c 5/5 les/c/f 6/6/0 5/5/5) [0] r=0 lpr=5 crt=7'240867 lcod 7'240866 mlcod 7'240866 active+clean] do_osd_op  setxattr lock.striper.lock (37818)
2020-06-08 06:45:50.751825 7f793e352700 10 osd.0 pg_epoch: 7 pg[1.2( v 7'240868 (7'239300,7'240868] local-lis/les=5/6 n=7 ec=4/4 lis/c 5/5 les/c/f 6/6/0 5/5/5) [0] r=0 lpr=5 crt=7'240868 lcod 7'240867 mlcod 7'240867 active+clean] do_osd_op 1:503f3c5a:::3.blk.0000000000000000:head [setxattr lock.striper.lock (37710)]
2020-06-08 06:45:50.751837 7f793e352700 10 osd.0 pg_epoch: 7 pg[1.2( v 7'240868 (7'239300,7'240868] local-lis/les=5/6 n=7 ec=4/4 lis/c 5/5 les/c/f 6/6/0 5/5/5) [0] r=0 lpr=5 crt=7'240868 lcod 7'240867 mlcod 7'240867 active+clean] do_osd_op  setxattr lock.striper.lock (37710)
2020-06-08 06:45:50.754369 7f793e352700 10 osd.0 pg_epoch: 7 pg[1.2( v 7'240869 (7'239300,7'240869] local-lis/les=5/6 n=7 ec=4/4 lis/c 5/5 les/c/f 6/6/0 5/5/5) [0] r=0 lpr=5 crt=7'240869 lcod 7'240868 mlcod 7'240868 active+clean] do_osd_op 1:503f3c5a:::3.blk.0000000000000000:head [setxattr lock.striper.lock (37818)]
2020-06-08 06:45:50.754381 7f793e352700 10 osd.0 pg_epoch: 7 pg[1.2( v 7'240869 (7'239300,7'240869] local-lis/les=5/6 n=7 ec=4/4 lis/c 5/5 les/c/f 6/6/0 5/5/5) [0] r=0 lpr=5 crt=7'240869 lcod 7'240868 mlcod 7'240868 active+clean] do_osd_op  setxattr lock.striper.lock (37818)
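
To track how large the attribute gets over time, the setxattr sizes can be pulled out of an OSD log like the one above. The following is a minimal sketch, not part of the original report: the function name `xattr_sizes` and the abbreviated sample lines are illustrative assumptions, and the pattern only covers the log format shown in this excerpt. Each op is logged twice, so the sketch matches only the bracketed op-summary line to avoid double counting.

```python
import re

# Matches only the bracketed op-summary form, e.g.
#   ... do_osd_op <object>:head [setxattr lock.striper.lock (37818)]
# The follow-up line for the same op has no brackets and is skipped.
PATTERN = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) .*"
    r"\[setxattr lock\.striper\.lock \((?P<size>\d+)\)\]"
)


def xattr_sizes(lines):
    """Return a list of (timestamp, size_in_bytes), one entry per setxattr op."""
    sizes = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            sizes.append((match.group("ts"), int(match.group("size"))))
    return sizes


# Sample lines abbreviated from the excerpt above; "..." stands in for the
# PG state text, which this sketch does not parse.
sample = [
    "2020-06-08 06:45:50.744365 7f793e352700 10 osd.0 ... do_osd_op "
    "1:503f3c5a:::3.blk.0000000000000000:head [setxattr lock.striper.lock (37818)]",
    "2020-06-08 06:45:50.744378 7f793e352700 10 osd.0 ... do_osd_op  "
    "setxattr lock.striper.lock (37818)",
    "2020-06-08 06:45:50.746581 7f793e352700 10 osd.0 ... do_osd_op "
    "1:503f3c5a:::3.blk.0000000000000000:head [setxattr lock.striper.lock (37710)]",
]

for ts, size in xattr_sizes(sample):
    print(ts, size)
```

In the excerpt above the reported size alternates between 37818 and 37710 bytes, a difference of 108 bytes. That may correspond to a single locker entry being added and removed, but the report does not confirm this.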
#2

Updated by Neha Ojha almost 4 years ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 35467