Actions
Bug #19131
closedsnap trim blocked behind ec read, never woken, on kraken-x upgrade
Status:
Rejected
Priority:
Immediate
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
kraken,jewel
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
/a/sage-2017-03-02_17:08:17-upgrade:kraken-x-master---basic-smithi/875222
/a/sage-2017-03-02_17:08:17-upgrade:kraken-x-master---basic-smithi/875227
2017-03-02 17:50:59.353188 7f4e209d7700 10 osd.1 pg_epoch: 1467 pg[2.as0( v 1466'1019 (0'0,1466'1019] local-les=1032 n=5 ec=1031 les/c/f 1032/1032/0 1031/1031/1031) [1,5,0,2] r=0 lpr=1031 crt=1466'1019 lcod 1456'1017 mlcod 1456'1017 active+clean+snaptrim snaptrimq=[1ad~1,1b3~1]] 2:55ed5d96:::smithi15412456-42:1b0 old_snaps 1ab,1ad,1ae,1af old snapset 1b0=[1af,1ae,1ad,1ab,1a7]:[1a7,1b0]+head 2017-03-02 17:50:59.353208 7f4e209d7700 20 osd.1 pg_epoch: 1467 pg[2.as0( v 1466'1019 (0'0,1466'1019] local-les=1032 n=5 ec=1031 les/c/f 1032/1032/0 1031/1031/1031) [1,5,0,2] r=0 lpr=1031 crt=1466'1019 lcod 1456'1017 mlcod 1456'1017 active+clean+snaptrim snaptrimq=[1ad~1,1b3~1]] simple_opc_create 2:55ed5d96:::smithi15412456-42:1b0 2017-03-02 17:50:59.353229 7f4e209d7700 10 osd.1 pg_epoch: 1467 pg[2.as0( v 1466'1019 (0'0,1466'1019] local-les=1032 n=5 ec=1031 les/c/f 1032/1032/0 1031/1031/1031) [1,5,0,2] r=0 lpr=1031 crt=1466'1019 lcod 1456'1017 mlcod 1456'1017 active+clean+snaptrim snaptrimq=[1ad~1,1b3~1]] trim_object: Unable to get a wlock on 2:55ed5d96:::smithi15412456-42:1b0 2017-03-02 17:50:59.353245 7f4e209d7700 10 osd.1 pg_epoch: 1467 pg[2.as0( v 1466'1019 (0'0,1466'1019] local-les=1032 n=5 ec=1031 les/c/f 1032/1032/0 1031/1031/1031) [1,5,0,2] r=0 lpr=1031 crt=1466'1019 lcod 1456'1017 mlcod 1456'1017 active+clean+snaptrim snaptrimq=[1ad~1,1b3~1]] SnapTrimmer state<Trimming/AwaitAsyncWork>: could not get write lock on obj 2:55ed5d96:::smithi15412456-42:1b0 2017-03-02 17:50:59.353259 7f4e209d7700 10 osd.1 pg_epoch: 1467 pg[2.as0( v 1466'1019 (0'0,1466'1019] local-les=1032 n=5 ec=1031 les/c/f 1032/1032/0 1031/1031/1031) [1,5,0,2] r=0 lpr=1031 crt=1466'1019 lcod 1456'1017 mlcod 1456'1017 active+clean+snaptrim snaptrimq=[1ad~1,1b3~1]] SnapTrimmer state<Trimming/AwaitAsyncWork>: waiting for it to clear
but when ec read completes,
2017-03-02 17:50:59.356683 7f4e231dc700 20 osd.1 pg_epoch: 1467 pg[2.as0( v 1466'1019 (0'0,1466'1019] local-les=1032 n=5 ec=1031 les/c/f 1032/1032/0 1031/1031/1031) [1,5,0,2] r=0 lpr=1031 crt=1466'1019 lcod 1456'1017 mlcod 1456'1017 active+clean+snaptrim snaptrimq=[1ad~1,1b3~1]] handle_sub_read_reply have shard=1 2017-03-02 17:50:59.356701 7f4e231dc700 20 osd.1 pg_epoch: 1467 pg[2.as0( v 1466'1019 (0'0,1466'1019] local-les=1032 n=5 ec=1031 les/c/f 1032/1032/0 1031/1031/1031) [1,5,0,2] r=0 lpr=1031 crt=1466'1019 lcod 1456'1017 mlcod 1456'1017 active+clean+snaptrim snaptrimq=[1ad~1,1b3~1]] handle_sub_read_reply Complete: ReadOp(tid=10602, to_read={2:55ed5d96:::smithi15412456-42:1b0=read_request_t(to_read=[0,2935008,0], need=0(2),1(0),5(1), want_attrs=0)}, complete={2:55ed5d96:::smithi15412456-42:1b0=read_result_t(r=0, errors={}, noattrs, returned=(0, 2935008, [0(2),978336, 1(0),978336, 5(1),978336]))}, priority=127, obj_to_source={2:55ed5d96:::smithi15412456-42:1b0=0(2),1(0),5(1)}, source_to_obj={0(2)=2:55ed5d96:::smithi15412456-42:1b0,1(0)=2:55ed5d96:::smithi15412456-42:1b0,5(1)=2:55ed5d96:::smithi15412456-42:1b0}, in_progress=) 2017-03-02 17:50:59.358607 7f4e231dc700 20 osd.1 pg_epoch: 1467 pg[2.as0( v 1466'1019 (0'0,1466'1019] local-les=1032 n=5 ec=1031 les/c/f 1032/1032/0 1031/1031/1031) [1,5,0,2] r=0 lpr=1031 crt=1466'1019 lcod 1456'1017 mlcod 1456'1017 active+clean+snaptrim snaptrimq=[1ad~1,1b3~1]] objects_read_async_cb: got: {2:55ed5d96:::smithi15412456-42:1b0=0,{0~2935008(2935008)}} 2017-03-02 17:50:59.358638 7f4e231dc700 20 osd.1 pg_epoch: 1467 pg[2.as0( v 1466'1019 (0'0,1466'1019] local-les=1032 n=5 ec=1031 les/c/f 1032/1032/0 1031/1031/1031) [1,5,0,2] r=0 lpr=1031 crt=1466'1019 lcod 1456'1017 mlcod 1456'1017 active+clean+snaptrim snaptrimq=[1ad~1,1b3~1]] objects_read_async_cb: cache: ExtentCache( 2017-03-02 17:50:59.361307 7f4e231dc700 15 osd.1 pg_epoch: 1467 pg[2.as0( v 1466'1019 (0'0,1466'1019] local-les=1032 n=5 ec=1031 les/c/f 1032/1032/0 1031/1031/1031) [1,5,0,2] r=0 lpr=1031 crt=1466'1019 lcod 1456'1017 mlcod 1456'1017 active+clean+snaptrim snaptrimq=[1ad~1,1b3~1]] log_op_stats osd_op(client.26616.0:5071 2.69bab7aa 2:55ed5d96:::smithi15412456-42:1ae [read 0~2935008] snapc 0=[] ack+read+known_if_redirected e1467) v7 inb 0 outb 2935008 lat 0.039182
snaptrimmer is not woken up.
This appears to be because snaptrimmer is never woken on read lock release:
void put_lock_type( ObjectContext::RWState::State type, list<OpRequestRef> *to_wake, bool *requeue_recovery, bool *requeue_snaptrimmer) { switch (type) { case ObjectContext::RWState::RWWRITE: return put_write(to_wake, requeue_recovery, requeue_snaptrimmer); case ObjectContext::RWState::RWREAD: return put_read(to_wake); case ObjectContext::RWState::RWEXCL: return put_excl(to_wake, requeue_recovery, requeue_snaptrimmer); default: assert(0 == "invalid lock type"); } }
what is weird is that this doesn't seem to happen with rados/thrash-erasure-code, but happened 4 times on upgrade/kraken-x uprade: http://pulpito.ceph.com/sage-2017-03-02_17:08:17-upgrade:kraken-x-master---basic-smithi/
Updated by Sage Weil about 7 years ago
- Status changed from New to Fix Under Review
Updated by Sage Weil about 7 years ago
- Status changed from Fix Under Review to Pending Backport
- Backport set to kraken,jewel
Updated by Jan Fajerski about 7 years ago
- Copied to Backport #19207: kraken: snap trim blocked behind ec read, never woken, on kraken-x upgrade added
Updated by Jan Fajerski about 7 years ago
- Copied to Backport #19208: jewel: snap trim blocked behind ec read, never woken, on kraken-x upgrade added
Updated by Nathan Cutler over 6 years ago
- Status changed from Pending Backport to Rejected
Kraken is EOL.
Actions