Project

General

Profile

Bug #19131

snap trim blocked behind ec read, never woken, on kraken-x upgrade

Added by Sage Weil 9 months ago. Updated 3 months ago.

Status:
Rejected
Priority:
Immediate
Assignee:
-
Category:
-
Target version:
-
Start date:
03/02/2017
Due date:
% Done:

0%

Source:
Tags:
Backport:
kraken,jewel
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Release:
Needs Doc:
No

Description

/a/sage-2017-03-02_17:08:17-upgrade:kraken-x-master---basic-smithi/875222
/a/sage-2017-03-02_17:08:17-upgrade:kraken-x-master---basic-smithi/875227

2017-03-02 17:50:59.353188 7f4e209d7700 10 osd.1 pg_epoch: 1467 pg[2.as0( v 1466'1019 (0'0,1466'1019] local-les=1032 n=5 ec=1031 les/c/f 1032/1032/0 1031/1031/1031) [1,5,0,2] r=0 lpr=1031 crt=1466'1019 lcod 1456'1017 mlcod 1456'1017 active+clean+snaptrim snaptrimq=[1ad~1,1b3~1]] 2:55ed5d96:::smithi15412456-42:1b0 old_snaps 1ab,1ad,1ae,1af old snapset 1b0=[1af,1ae,1ad,1ab,1a7]:[1a7,1b0]+head
2017-03-02 17:50:59.353208 7f4e209d7700 20 osd.1 pg_epoch: 1467 pg[2.as0( v 1466'1019 (0'0,1466'1019] local-les=1032 n=5 ec=1031 les/c/f 1032/1032/0 1031/1031/1031) [1,5,0,2] r=0 lpr=1031 crt=1466'1019 lcod 1456'1017 mlcod 1456'1017 active+clean+snaptrim snaptrimq=[1ad~1,1b3~1]] simple_opc_create 2:55ed5d96:::smithi15412456-42:1b0
2017-03-02 17:50:59.353229 7f4e209d7700 10 osd.1 pg_epoch: 1467 pg[2.as0( v 1466'1019 (0'0,1466'1019] local-les=1032 n=5 ec=1031 les/c/f 1032/1032/0 1031/1031/1031) [1,5,0,2] r=0 lpr=1031 crt=1466'1019 lcod 1456'1017 mlcod 1456'1017 active+clean+snaptrim snaptrimq=[1ad~1,1b3~1]] trim_object: Unable to get a wlock on 2:55ed5d96:::smithi15412456-42:1b0
2017-03-02 17:50:59.353245 7f4e209d7700 10 osd.1 pg_epoch: 1467 pg[2.as0( v 1466'1019 (0'0,1466'1019] local-les=1032 n=5 ec=1031 les/c/f 1032/1032/0 1031/1031/1031) [1,5,0,2] r=0 lpr=1031 crt=1466'1019 lcod 1456'1017 mlcod 1456'1017 active+clean+snaptrim snaptrimq=[1ad~1,1b3~1]] SnapTrimmer state<Trimming/AwaitAsyncWork>: could not get write lock on obj 2:55ed5d96:::smithi15412456-42:1b0
2017-03-02 17:50:59.353259 7f4e209d7700 10 osd.1 pg_epoch: 1467 pg[2.as0( v 1466'1019 (0'0,1466'1019] local-les=1032 n=5 ec=1031 les/c/f 1032/1032/0 1031/1031/1031) [1,5,0,2] r=0 lpr=1031 crt=1466'1019 lcod 1456'1017 mlcod 1456'1017 active+clean+snaptrim snaptrimq=[1ad~1,1b3~1]] SnapTrimmer state<Trimming/AwaitAsyncWork>: waiting for it to clear

but when ec read completes,
2017-03-02 17:50:59.356683 7f4e231dc700 20 osd.1 pg_epoch: 1467 pg[2.as0( v 1466'1019 (0'0,1466'1019] local-les=1032 n=5 ec=1031 les/c/f 1032/1032/0 1031/1031/1031) [1,5,0,2] r=0 lpr=1031 crt=1466'1019 lcod 1456'1017 mlcod 1456'1017 active+clean+snaptrim snaptrimq=[1ad~1,1b3~1]] handle_sub_read_reply have shard=1
2017-03-02 17:50:59.356701 7f4e231dc700 20 osd.1 pg_epoch: 1467 pg[2.as0( v 1466'1019 (0'0,1466'1019] local-les=1032 n=5 ec=1031 les/c/f 1032/1032/0 1031/1031/1031) [1,5,0,2] r=0 lpr=1031 crt=1466'1019 lcod 1456'1017 mlcod 1456'1017 active+clean+snaptrim snaptrimq=[1ad~1,1b3~1]] handle_sub_read_reply Complete: ReadOp(tid=10602, to_read={2:55ed5d96:::smithi15412456-42:1b0=read_request_t(to_read=[0,2935008,0], need=0(2),1(0),5(1), want_attrs=0)}, complete={2:55ed5d96:::smithi15412456-42:1b0=read_result_t(r=0, errors={}, noattrs, returned=(0, 2935008, [0(2),978336, 1(0),978336, 5(1),978336]))}, priority=127, obj_to_source={2:55ed5d96:::smithi15412456-42:1b0=0(2),1(0),5(1)}, source_to_obj={0(2)=2:55ed5d96:::smithi15412456-42:1b0,1(0)=2:55ed5d96:::smithi15412456-42:1b0,5(1)=2:55ed5d96:::smithi15412456-42:1b0}, in_progress=)
2017-03-02 17:50:59.358607 7f4e231dc700 20 osd.1 pg_epoch: 1467 pg[2.as0( v 1466'1019 (0'0,1466'1019] local-les=1032 n=5 ec=1031 les/c/f 1032/1032/0 1031/1031/1031) [1,5,0,2] r=0 lpr=1031 crt=1466'1019 lcod 1456'1017 mlcod 1456'1017 active+clean+snaptrim snaptrimq=[1ad~1,1b3~1]] objects_read_async_cb: got: {2:55ed5d96:::smithi15412456-42:1b0=0,{0~2935008(2935008)}}
2017-03-02 17:50:59.358638 7f4e231dc700 20 osd.1 pg_epoch: 1467 pg[2.as0( v 1466'1019 (0'0,1466'1019] local-les=1032 n=5 ec=1031 les/c/f 1032/1032/0 1031/1031/1031) [1,5,0,2] r=0 lpr=1031 crt=1466'1019 lcod 1456'1017 mlcod 1456'1017 active+clean+snaptrim snaptrimq=[1ad~1,1b3~1]] objects_read_async_cb: cache: ExtentCache(
2017-03-02 17:50:59.361307 7f4e231dc700 15 osd.1 pg_epoch: 1467 pg[2.as0( v 1466'1019 (0'0,1466'1019] local-les=1032 n=5 ec=1031 les/c/f 1032/1032/0 1031/1031/1031) [1,5,0,2] r=0 lpr=1031 crt=1466'1019 lcod 1456'1017 mlcod 1456'1017 active+clean+snaptrim snaptrimq=[1ad~1,1b3~1]] log_op_stats osd_op(client.26616.0:5071 2.69bab7aa 2:55ed5d96:::smithi15412456-42:1ae [read 0~2935008] snapc 0=[] ack+read+known_if_redirected e1467) v7 inb 0 outb 2935008 lat 0.039182

snaptrimmer is not woken up.

This appears to be because snaptrimmer is never woken on read lock release:

  void put_lock_type(
    ObjectContext::RWState::State type,
    list<OpRequestRef> *to_wake,
    bool *requeue_recovery,
    bool *requeue_snaptrimmer) {
    switch (type) {
    case ObjectContext::RWState::RWWRITE:
      return put_write(to_wake, requeue_recovery, requeue_snaptrimmer);
    case ObjectContext::RWState::RWREAD:
      return put_read(to_wake);
    case ObjectContext::RWState::RWEXCL:
      return put_excl(to_wake, requeue_recovery, requeue_snaptrimmer);
    default:
      assert(0 == "invalid lock type");
    }
  }

what is weird is that this doesn't seem to happen with rados/thrash-erasure-code, but happened 4 times on upgrade/kraken-x uprade: http://pulpito.ceph.com/sage-2017-03-02_17:08:17-upgrade:kraken-x-master---basic-smithi/


Related issues

Copied to Ceph - Backport #19207: kraken: snap trim blocked behind ec read, never woken, on kraken-x upgrade Rejected
Copied to Ceph - Backport #19208: jewel: snap trim blocked behind ec read, never woken, on kraken-x upgrade Resolved

History

#1 Updated by Sage Weil 9 months ago

  • Status changed from New to Need Review

#2 Updated by Sage Weil 9 months ago

  • Status changed from Need Review to Pending Backport
  • Backport set to kraken,jewel

#3 Updated by Jan Fajerski 9 months ago

  • Copied to Backport #19207: kraken: snap trim blocked behind ec read, never woken, on kraken-x upgrade added

#4 Updated by Jan Fajerski 9 months ago

  • Copied to Backport #19208: jewel: snap trim blocked behind ec read, never woken, on kraken-x upgrade added

#5 Updated by Nathan Cutler 3 months ago

  • Status changed from Pending Backport to Rejected

Kraken is EOL.

Also available in: Atom PDF