Project

General

Profile

Bug #20919

osd: replica read can trigger cache promotion

Added by Sage Weil over 6 years ago. Updated almost 6 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
luminous, jewel
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2017-08-04T17:36:21.692 INFO:tasks.ceph.osd.0.smithi037.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.1.2-163-g554a790/rpm/el7/BUILD/ceph-12.1.2-163-g554a790/src/osd/PG.h: In function 'eversion_t PG::get_next_version() const' thread 7fc309876700 time 2017-08-04 17:36:21.701531
2017-08-04T17:36:21.693 INFO:tasks.ceph.osd.0.smithi037.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.1.2-163-g554a790/rpm/el7/BUILD/ceph-12.1.2-163-g554a790/src/osd/PG.h: 2370: FAILED assert(at_version > info.last_update)
2017-08-04T17:36:21.698 INFO:tasks.ceph.osd.0.smithi037.stderr: ceph version 12.1.2-163-g554a790 (554a790e4fabff2ed21e9f86f79719b77bbd105d) luminous (rc)
2017-08-04T17:36:21.698 INFO:tasks.ceph.osd.0.smithi037.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x7fc3314ab580]
2017-08-04T17:36:21.698 INFO:tasks.ceph.osd.0.smithi037.stderr: 2: (PG::get_next_version() const+0x117) [0x7fc3310f5ea7]
2017-08-04T17:36:21.699 INFO:tasks.ceph.osd.0.smithi037.stderr: 3: (PrimaryLogPG::finish_promote(int, PrimaryLogPG::CopyResults*, std::shared_ptr<ObjectContext>)+0x252) [0x7fc3310b9912]
2017-08-04T17:36:21.699 INFO:tasks.ceph.osd.0.smithi037.stderr: 4: (PromoteCallback::finish(boost::tuples::tuple<int, PrimaryLogPG::CopyResults*, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type>)+0x50) [0x7fc33111f660]
2017-08-04T17:36:21.699 INFO:tasks.ceph.osd.0.smithi037.stderr: 5: (PrimaryLogPG::process_copy_chunk(hobject_t, unsigned long, int)+0x76c) [0x7fc3310c239c]
2017-08-04T17:36:21.699 INFO:tasks.ceph.osd.0.smithi037.stderr: 6: (C_Copyfrom::finish(int)+0xb1) [0x7fc331121391]
2017-08-04T17:36:21.699 INFO:tasks.ceph.osd.0.smithi037.stderr: 7: (Context::complete(int)+0x9) [0x7fc330f67219]
2017-08-04T17:36:21.700 INFO:tasks.ceph.osd.0.smithi037.stderr: 8: (Finisher::finisher_thread_entry()+0x1c5) [0x7fc3314a8f05]
2017-08-04T17:36:21.700 INFO:tasks.ceph.osd.0.smithi037.stderr: 9: (()+0x7dc5) [0x7fc32e294dc5]
2017-08-04T17:36:21.700 INFO:tasks.ceph.osd.0.smithi037.stderr: 10: (clone()+0x6d) [0x7fc32d38873d]

/a/sage-2017-08-04_15:57:43-rbd-wip-sage-testing2-20170804a-distro-basic-smithi/1482943

Related issues

Copied to RADOS - Backport #22402: luminous: osd: replica read can trigger cache promotion Resolved
Copied to RADOS - Backport #22403: jewel: osd: replica read can trigger cache promotion Resolved

History

#1 Updated by Sage Weil over 6 years ago

  • Project changed from Ceph to RADOS

#2 Updated by Sage Weil over 6 years ago

  • Subject changed from osd/PG.h: 2370: FAILED assert(at_version > info.last_update) to osd: replica read can trigger cache promotion
  • Priority changed from High to Urgent

a replica was servicing a read and tried to do a cache promotion:

   -37> 2017-08-04 17:36:21.700080 7fc30e880700 20 osd.0 pg_epoch: 95 pg[12.2( v 95'3 (0'0,95'3] local-lis/les=88/89 n=2 ec=88/88 lis/c 88/88 les/c/f 89/89/0 88/88/88) [3,0] r=1 lpr=89 luod=0'0 crt=95'3 lcod 95'2 active] do_op: op osd_op(client.4224.0:2408 12.2 12:52b8f85c:::rbd_data.10803bc4e8cd.0000000000000000:19 [read 0~4194304] snapc 0=[] ondisk+read+localize_reads+known_if_redirected e95) v8
   -36> 2017-08-04 17:36:21.700088 7fc30e880700 20 osd.0 pg_epoch: 95 pg[12.2( v 95'3 (0'0,95'3] local-lis/les=88/89 n=2 ec=88/88 lis/c 88/88 les/c/f 89/89/0 88/88/88) [3,0] r=1 lpr=89 luod=0'0 crt=95'3 lcod 95'2 active] op_has_sufficient_caps session=0x7fc33be21200 pool=12 (test-librbd-smithi037-320057-10 ) owner=0 need_read_cap=1 need_write_cap=0 classes=[] -> yes
   -35> 2017-08-04 17:36:21.700098 7fc30e880700 10 osd.0 pg_epoch: 95 pg[12.2( v 95'3 (0'0,95'3] local-lis/les=88/89 n=2 ec=88/88 lis/c 88/88 les/c/f 89/89/0 88/88/88) [3,0] r=1 lpr=89 luod=0'0 crt=95'3 lcod 95'2 active] do_op osd_op(client.4224.0:2408 12.2 12:52b8f85c:::rbd_data.10803bc4e8cd.0000000000000000:19 [read 0~4194304] snapc 0=[] ondisk+read+localize_reads+known_if_redirected e95) v8 may_read -> read-ordered flags ondisk+read+localize_reads+known_if_redirected
   -34> 2017-08-04 17:36:21.700107 7fc30e880700 15 filestore(/var/lib/ceph/osd/ceph-0) getattr(4392): 12.2_head/#12:52b8f85c:::rbd_data.10803bc4e8cd.0000000000000000:head# 'snapset'
   -33> 2017-08-04 17:36:21.700127 7fc30e880700 10 filestore(/var/lib/ceph/osd/ceph-0) error opening file /var/lib/ceph/osd/ceph-0/current/12.2_head/rbd\udata.10803bc4e8cd.0000000000000000__head_3A1F1D4A__c with flags=2: (2) No such file or directory
   -32> 2017-08-04 17:36:21.700132 7fc30e880700 10 filestore(/var/lib/ceph/osd/ceph-0) getattr(4426): 12.2_head/#12:52b8f85c:::rbd_data.10803bc4e8cd.0000000000000000:head# 'snapset' = -2
   -31> 2017-08-04 17:36:21.700135 7fc30e880700 15 filestore(/var/lib/ceph/osd/ceph-0) getattr(4392): 12.2_head/#12:52b8f85c:::rbd_data.10803bc4e8cd.0000000000000000:snapdir# 'snapset'
   -30> 2017-08-04 17:36:21.700146 7fc30e880700 10 filestore(/var/lib/ceph/osd/ceph-0) error opening file /var/lib/ceph/osd/ceph-0/current/12.2_head/rbd\udata.10803bc4e8cd.0000000000000000__snapdir_3A1F1D4A__c with flags=2: (2) No such file or directory
   -29> 2017-08-04 17:36:21.700149 7fc30e880700 10 filestore(/var/lib/ceph/osd/ceph-0) getattr(4426): 12.2_head/#12:52b8f85c:::rbd_data.10803bc4e8cd.0000000000000000:snapdir# 'snapset' = -2
   -28> 2017-08-04 17:36:21.700152 7fc30e880700 20 osd.0 pg_epoch: 95 pg[12.2( v 95'3 (0'0,95'3] local-lis/les=88/89 n=2 ec=88/88 lis/c 88/88 les/c/f 89/89/0 88/88/88) [3,0] r=1 lpr=89 luod=0'0 crt=95'3 lcod 95'2 active] find_object_context 12:52b8f85c:::rbd_data.10803bc4e8cd.0000000000000000:19 no snapset
   -27> 2017-08-04 17:36:21.700159 7fc30e880700 25 osd.0 pg_epoch: 95 pg[12.2( v 95'3 (0'0,95'3] local-lis/les=88/89 n=2 ec=88/88 lis/c 88/88 les/c/f 89/89/0 88/88/88) [3,0] r=1 lpr=89 luod=0'0 crt=95'3 lcod 95'2 active] maybe_handle_cache_detail (no obc) missing_oid 12:52b8f85c:::rbd_data.10803bc4e8cd.0000000000000000:head must_promote 0 in_hit_set 0
   -26> 2017-08-04 17:36:21.700166 7fc30e880700 10 osd.0 pg_epoch: 95 pg[12.2( v 95'3 (0'0,95'3] local-lis/les=88/89 n=2 ec=88/88 lis/c 88/88 les/c/f 89/89/0 88/88/88) [3,0] r=1 lpr=89 luod=0'0 crt=95'3 lcod 95'2 active] get_object_context: obc NOT found in cache: 12:52b8f85c:::rbd_data.10803bc4e8cd.0000000000000000:head
   -25> 2017-08-04 17:36:21.700171 7fc30e880700 15 filestore(/var/lib/ceph/osd/ceph-0) getattr(4392): 12.2_head/#12:52b8f85c:::rbd_data.10803bc4e8cd.0000000000000000:head# '_'
   -24> 2017-08-04 17:36:21.700181 7fc30e880700 10 filestore(/var/lib/ceph/osd/ceph-0) error opening file /var/lib/ceph/osd/ceph-0/current/12.2_head/rbd\udata.10803bc4e8cd.0000000000000000__head_3A1F1D4A__c with flags=2: (2) No such file or directory
   -23> 2017-08-04 17:36:21.700185 7fc30e880700 10 filestore(/var/lib/ceph/osd/ceph-0) getattr(4426): 12.2_head/#12:52b8f85c:::rbd_data.10803bc4e8cd.0000000000000000:head# '_' = -2
   -22> 2017-08-04 17:36:21.700187 7fc30e880700 10 osd.0 pg_epoch: 95 pg[12.2( v 95'3 (0'0,95'3] local-lis/les=88/89 n=2 ec=88/88 lis/c 88/88 les/c/f 89/89/0 88/88/88) [3,0] r=1 lpr=89 luod=0'0 crt=95'3 lcod 95'2 active] get_object_context: no obc for soid 12:52b8f85c:::rbd_data.10803bc4e8cd.0000000000000000:head but can_create
   -21> 2017-08-04 17:36:21.700193 7fc30e880700 15 filestore(/var/lib/ceph/osd/ceph-0) getattr(4392): 12.2_head/#12:52b8f85c:::rbd_data.10803bc4e8cd.0000000000000000:snapdir# 'snapset'
   -20> 2017-08-04 17:36:21.700201 7fc30e880700 10 filestore(/var/lib/ceph/osd/ceph-0) error opening file /var/lib/ceph/osd/ceph-0/current/12.2_head/rbd\udata.10803bc4e8cd.0000000000000000__snapdir_3A1F1D4A__c with flags=2: (2) No such file or directory
   -19> 2017-08-04 17:36:21.700204 7fc30e880700 10 filestore(/var/lib/ceph/osd/ceph-0) getattr(4426): 12.2_head/#12:52b8f85c:::rbd_data.10803bc4e8cd.0000000000000000:snapdir# 'snapset' = -2
   -18> 2017-08-04 17:36:21.700211 7fc30e880700 10 osd.0 pg_epoch: 95 pg[12.2( v 95'3 (0'0,95'3] local-lis/les=88/89 n=2 ec=88/88 lis/c 88/88 les/c/f 89/89/0 88/88/88) [3,0] r=1 lpr=89 luod=0'0 crt=95'3 lcod 95'2 active] create_object_context 0x7fc33c439440 12:52b8f85c:::rbd_data.10803bc4e8cd.0000000000000000:head 
   -17> 2017-08-04 17:36:21.700216 7fc30e880700 10 osd.0 pg_epoch: 95 pg[12.2( v 95'3 (0'0,95'3] local-lis/les=88/89 n=2 ec=88/88 lis/c 88/88 les/c/f 89/89/0 88/88/88) [3,0] r=1 lpr=89 luod=0'0 crt=95'3 lcod 95'2 active] populate_obc_watchers 12:52b8f85c:::rbd_data.10803bc4e8cd.0000000000000000:head
   -16> 2017-08-04 17:36:21.700220 7fc30e880700 20 osd.0 pg_epoch: 95 pg[12.2( v 95'3 (0'0,95'3] local-lis/les=88/89 n=2 ec=88/88 lis/c 88/88 les/c/f 89/89/0 88/88/88) [3,0] r=1 lpr=89 luod=0'0 crt=95'3 lcod 95'2 active] PrimaryLogPG::check_blacklisted_obc_watchers for obc 12:52b8f85c:::rbd_data.10803bc4e8cd.0000000000000000:head
   -15> 2017-08-04 17:36:21.700224 7fc30e880700 10 osd.0 pg_epoch: 95 pg[12.2( v 95'3 (0'0,95'3] local-lis/les=88/89 n=2 ec=88/88 lis/c 88/88 les/c/f 89/89/0 88/88/88) [3,0] r=1 lpr=89 luod=0'0 crt=95'3 lcod 95'2 active] get_object_context: 0x7fc33c439440 12:52b8f85c:::rbd_data.10803bc4e8cd.0000000000000000:head rwstate(none n=0 w=0) oi: 12:52b8f85c:::rbd_data.10803bc4e8cd.0000000000000000:head(0'0 unknown.0.0:0 s 0 uv 0 alloc_hint [0 0 0]) ssc: 0x7fc33c1326c0 snapset: 0=[]:[]
   -14> 2017-08-04 17:36:21.700232 7fc30e880700 10 osd.0 pg_epoch: 95 pg[12.2( v 95'3 (0'0,95'3] local-lis/les=88/89 n=2 ec=88/88 lis/c 88/88 les/c/f 89/89/0 88/88/88) [3,0] r=1 lpr=89 luod=0'0 crt=95'3 lcod 95'2 active] start_copy 12:52b8f85c:::rbd_data.10803bc4e8cd.0000000000000000:head from 12:52b8f85c:::rbd_data.10803bc4e8cd.0000000000000000:head @3 v0 flags 30 mirror_snapset
   -13> 2017-08-04 17:36:21.700239 7fc30e880700 10 osd.0 pg_epoch: 95 pg[12.2( v 95'3 (0'0,95'3] local-lis/les=88/89 n=2 ec=88/88 lis/c 88/88 les/c/f 89/89/0 88/88/88) [3,0] r=1 lpr=89 luod=0'0 crt=95'3 lcod 95'2 active] _copy_some 0x7fc33c439440 0x7fc33c82a398
   -12> 2017-08-04 17:36:21.700270 7fc30e880700  1 -- 172.21.15.37:0/316573 --> 172.21.15.35:6809/14816 -- osd_op(unknown.0.11:5 3.2 3:52b8f85c:::rbd_data.10803bc4e8cd.0000000000000000:snapdir [list-snaps] snapc 0=[] ondisk+read+rwordered+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected e95) v8 -- 0x7fc33b39db00 con 0
   -11> 2017-08-04 17:36:21.700290 7fc30e880700  1 -- 172.21.15.37:0/316573 --> 172.21.15.35:6809/14816 -- osd_op(unknown.0.11:6 3.2 3:52b8f85c:::rbd_data.10803bc4e8cd.0000000000000000:head [copy-get max 8388608] snapc 0=[] ondisk+read+rwordered+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected e95) v8 -- 0x7fc33be48900 con 0
   -10> 2017-08-04 17:36:21.700298 7fc30e880700 10 osd.0 pg_epoch: 95 pg[12.2( v 95'3 (0'0,95'3] local-lis/les=88/89 n=2 ec=88/88 lis/c 88/88 les/c/f 89/89/0 88/88/88) [3,0] r=1 lpr=89 luod=0'0 crt=95'3 lcod 95'2 active] wait_for_blocked_object 12:52b8f85c:::rbd_data.10803bc4e8cd.0000000000000000:head 0x7fc33bdf0b80
    -9> 2017-08-04 17:36:21.700313 7fc30e880700 10 osd.0 95 dequeue_op 0x7fc33bdf0b80 finish
    -8> 2017-08-04 17:36:21.700316 7fc30e880700 20 osd.0 op_wq(2) _process empty q, waiting
    -7> 2017-08-04 17:36:21.701275 7fc32b1b5700  1 -- 172.21.15.37:0/316573 <== osd.5 172.21.15.35:6809/14816 1 ==== osd_op_reply(5 rbd_data.10803bc4e8cd.0000000000000000 [list-snaps] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v8 ==== 182+0+0 (4112046353 0 0) 0x7fc33be48900 con 0x7fc33b44a800
    -6> 2017-08-04 17:36:21.701472 7fc32b1b5700  1 -- 172.21.15.37:0/316573 <== osd.5 172.21.15.35:6809/14816 2 ==== osd_op_reply(6 rbd_data.10803bc4e8cd.0000000000000000 [copy-get max 8388608] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v8 ==== 182+0+103 (1547587044 0 1324605519) 0x7fc33be48900 con 0x7fc33b44a800
    -5> 2017-08-04 17:36:21.701495 7fc309876700 10 osd.0 pg_epoch: 95 pg[12.2( v 95'3 (0'0,95'3] local-lis/les=88/89 n=2 ec=88/88 lis/c 88/88 les/c/f 89/89/0 88/88/88) [3,0] r=1 lpr=89 luod=0'0 crt=95'3 lcod 95'2 active] process_copy_chunk 12:52b8f85c:::rbd_data.10803bc4e8cd.0000000000000000:head tid 6 (2) No such file or directory
    -4> 2017-08-04 17:36:21.701507 7fc309876700 20 osd.0 pg_epoch: 95 pg[12.2( v 95'3 (0'0,95'3] local-lis/les=88/89 n=2 ec=88/88 lis/c 88/88 les/c/f 89/89/0 88/88/88) [3,0] r=1 lpr=89 luod=0'0 crt=95'3 lcod 95'2 active] process_copy_chunk complete r = (2) No such file or directory
    -3> 2017-08-04 17:36:21.701512 7fc309876700 10 osd.0 pg_epoch: 95 pg[12.2( v 95'3 (0'0,95'3] local-lis/les=88/89 n=2 ec=88/88 lis/c 88/88 les/c/f 89/89/0 88/88/88) [3,0] r=1 lpr=89 luod=0'0 crt=95'3 lcod 95'2 active] finish_promote 12:52b8f85c:::rbd_data.10803bc4e8cd.0000000000000000:head r=-2 uv0
    -2> 2017-08-04 17:36:21.701518 7fc309876700 10 osd.0 pg_epoch: 95 pg[12.2( v 95'3 (0'0,95'3] local-lis/les=88/89 n=2 ec=88/88 lis/c 88/88 les/c/f 89/89/0 88/88/88) [3,0] r=1 lpr=89 luod=0'0 crt=95'3 lcod 95'2 active] finish_promote whiteout 12:52b8f85c:::rbd_data.10803bc4e8cd.0000000000000000:head
    -1> 2017-08-04 17:36:21.701522 7fc309876700 20 osd.0 pg_epoch: 95 pg[12.2( v 95'3 (0'0,95'3] local-lis/les=88/89 n=2 ec=88/88 lis/c 88/88 les/c/f 89/89/0 88/88/88) [3,0] r=1 lpr=89 luod=0'0 crt=95'3 lcod 95'2 active] simple_opc_create 12:52b8f85c:::rbd_data.10803bc4e8cd.0000000000000000:head
     0> 2017-08-04 17:36:21.703638 7fc309876700 -1 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.1.2-163-g554a790/rpm/el7/BUILD/ceph-12.1.2-163-g554a790/src/osd/PG.h: In function 'eversion_t PG::get_next_version() const' thread 7fc309876700 time 2017-08-04 17:36:21.701531
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.1.2-163-g554a790/rpm/el7/BUILD/ceph-12.1.2-163-g554a790/src/osd/PG.h: 2370: FAILED assert(at_version > info.last_update)

(note the r=1).

#3 Updated by Sage Weil over 6 years ago

  • Status changed from 12 to Fix Under Review

#4 Updated by Josh Durgin over 6 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to luminous, jewel

#5 Updated by Nathan Cutler over 6 years ago

  • Copied to Backport #22402: luminous: osd: replica read can trigger cache promotion added

#6 Updated by Nathan Cutler over 6 years ago

  • Copied to Backport #22403: jewel: osd: replica read can trigger cache promotion added

#7 Updated by Nathan Cutler almost 6 years ago

  • Status changed from Pending Backport to Resolved

Also available in: Atom PDF