Bug #64258

open

osd/PrimaryLogPG.cc: FAILED ceph_assert(inserted)

Added by Laura Flores 3 months ago. Updated about 1 month ago.

Status:
Fix Under Review
Priority:
Normal
Category:
-
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

/a/lflores-2024-01-29_16:26:06-rados-wip-lflores-testing-2024-01-29-0542-distro-default-smithi/7537140/

2024-01-29T18:29:08.075 INFO:tasks.ceph.osd.2.smithi052.stderr:./src/osd/PrimaryLogPG.cc: 10384: FAILED ceph_assert(inserted)
2024-01-29T18:29:08.075 INFO:tasks.ceph.osd.2.smithi052.stderr:
2024-01-29T18:29:08.075 INFO:tasks.ceph.osd.2.smithi052.stderr: ceph version 19.0.0-948-g65f72cc3 (65f72cc35f227ce82933fc2e9fe05d39cc2c921c) squid (dev)
2024-01-29T18:29:08.075 INFO:tasks.ceph.osd.2.smithi052.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x118) [0x562f3203c450]
2024-01-29T18:29:08.075 INFO:tasks.ceph.osd.2.smithi052.stderr: 2: ceph-osd(+0x3f0607) [0x562f3203c607]
2024-01-29T18:29:08.076 INFO:tasks.ceph.osd.2.smithi052.stderr: 3: ceph-osd(+0x396c18) [0x562f31fe2c18]
2024-01-29T18:29:08.076 INFO:tasks.ceph.osd.2.smithi052.stderr: 4: (cls_cxx_gather(void*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, char const*, char const*, ceph::buffer::v15_2_0::list&)+0x32a) [0x562f321284da]
2024-01-29T18:29:08.076 INFO:tasks.ceph.osd.2.smithi052.stderr: 5: /usr/lib/rados-classes/libcls_test_remote_reads.so(+0x3841) [0x7f7d0d9a9841]
2024-01-29T18:29:08.076 INFO:tasks.ceph.osd.2.smithi052.stderr: 6: ceph-osd(+0x5b7b7d) [0x562f32203b7d]
2024-01-29T18:29:08.076 INFO:tasks.ceph.osd.2.smithi052.stderr: 7: (PrimaryLogPG::do_osd_ops(PrimaryLogPG::OpContext*, std::vector<OSDOp, std::allocator<OSDOp> >&)+0x5c3) [0x562f3229cea3]
2024-01-29T18:29:08.076 INFO:tasks.ceph.osd.2.smithi052.stderr: 8: (PrimaryLogPG::prepare_transaction(PrimaryLogPG::OpContext*)+0x9e) [0x562f322afd7e]
2024-01-29T18:29:08.076 INFO:tasks.ceph.osd.2.smithi052.stderr: 9: (PrimaryLogPG::execute_ctx(PrimaryLogPG::OpContext*)+0x37d) [0x562f3228844d]
2024-01-29T18:29:08.076 INFO:tasks.ceph.osd.2.smithi052.stderr: 10: (PrimaryLogPG::do_op(boost::intrusive_ptr<OpRequest>&)+0x30e9) [0x562f3228c9f9]
2024-01-29T18:29:08.076 INFO:tasks.ceph.osd.2.smithi052.stderr: 11: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x197) [0x562f321c5487]
2024-01-29T18:29:08.076 INFO:tasks.ceph.osd.2.smithi052.stderr: 12: (ceph::osd::scheduler::PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x69) [0x562f323d6c29]
2024-01-29T18:29:08.076 INFO:tasks.ceph.osd.2.smithi052.stderr: 13: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xab3) [0x562f321d0183]
2024-01-29T18:29:08.076 INFO:tasks.ceph.osd.2.smithi052.stderr: 14: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x293) [0x562f326bfb33]
2024-01-29T18:29:08.076 INFO:tasks.ceph.osd.2.smithi052.stderr: 15: ceph-osd(+0xa74094) [0x562f326c0094]
2024-01-29T18:29:08.076 INFO:tasks.ceph.osd.2.smithi052.stderr: 16: /lib/x86_64-linux-gnu/libc.so.6(+0x94b43) [0x7f7d1b897b43]
2024-01-29T18:29:08.077 INFO:tasks.ceph.osd.2.smithi052.stderr: 17: /lib/x86_64-linux-gnu/libc.so.6(+0x126a00) [0x7f7d1b929a00]

Actions #1

Updated by Nitzan Mordechai 3 months ago

  • Assignee set to Nitzan Mordechai
Actions #2

Updated by Laura Flores 3 months ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 55410
Actions #3

Updated by Radoslaw Zarzynski 3 months ago

Bump up.

Actions #4

Updated by Samuel Just 3 months ago

Nitzan was able to reproduce this locally; the following analysis is based on those logs.

osd.1 and osd.2 crashed on the same op, osd.1 first:

osd.1:

    -1> 2024-02-14T08:55:56.299+0000 7f47c6fe9700 -1 ../src/osd/PrimaryLogPG.cc: In function 'int PrimaryLogPG::start_cls_gather(PrimaryLogPG::OpContext*, std::map<std::__cxx11::basic_string<char>, ceph::buffer::v15_2_0::list>*, const string&, const char*, const char*, ceph::bufferlist&)' thread 7f47c6fe9700 time 2024-02-14T08:55:56.296161+0000
../src/osd/PrimaryLogPG.cc: 10390: FAILED ceph_assert(inserted)

osd.2:

    -2> 2024-02-14T08:57:46.522+0000 7fa2486cd700 -1 ../src/osd/PrimaryLogPG.cc: In function 'int PrimaryLogPG::start_cls_gather(PrimaryLogPG::OpContext*, std::map<std::__cxx11::basic_string<char>, ceph::buffer::v15_2_0::list>*, const string&, const char*, const char*, ceph::bufferlist&)' thread 7fa2486cd700 time 2024-02-14T08:57:46.519023+0000
../src/osd/PrimaryLogPG.cc: 10390: FAILED ceph_assert(inserted)

osd.1's final log lines for the op:

2024-02-14T08:55:56.294+0000 7f47e9c1f700  1 -- [v2:172.21.5.153:6811/51816011,v1:172.21.5.153:6812/51816011] <== client.4156 172.21.5.153:0/3838631962 1 ==== osd_op(client.4156.0:302 3.2 3.38501e82 (undecoded) ondisk+retry+read+known_if_redirected+supports_pool_eio e77) v8 ==== 219+0+7855 (crc 0 0 0) 0x598c500 con 0x5e23800
2024-02-14T08:55:56.294+0000 7f47e9c1f700 15 osd.1 77 enqueue_op osd_op(client.4156.0:302 3.2 3.38501e82 (undecoded) ondisk+retry+read+known_if_redirected+supports_pool_eio e77) v8 prio 63 type 42 cost 7855 latency 0.000015 epoch 77 osd_op(client.4156.0:302 3.2 3.38501e82 (undecoded) ondisk+retry+read+known_if_redirected+supports_pool_eio e77) v8
2024-02-14T08:55:56.294+0000 7f47e9c1f700 20 osd.1 op_wq(2) _enqueue OpSchedulerItem(3.2 PGOpItem(op=osd_op(client.4156.0:302 3.2 3.38501e82 (undecoded) ondisk+retry+read+known_if_redirected+supports_pool_eio e77) v8) class_id 3 prio 63 cost 7855 e77)
2024-02-14T08:55:56.294+0000 7f47caff1700 20 osd.1 op_wq(2) _process 3.2 to_process <> waiting <> waiting_peering {}
2024-02-14T08:55:56.294+0000 7f47caff1700 20 osd.1 op_wq(2) _process OpSchedulerItem(3.2 PGOpItem(op=osd_op(client.4156.0:302 3.2 3.38501e82 (undecoded) ondisk+retry+read+known_if_redirected+supports_pool_eio e77) v8) class_id 3 prio 63 qos_cost 143371 cost 7855 e77) queued
2024-02-14T08:55:56.294+0000 7f47c6fe9700 10 osd.1 pg_epoch: 77 pg[3.2( v 56'83 (0'0,56'83] local-lis/les=48/49 n=9 ec=48/44 lis/c=48/48 les/c/f=49/49/0 sis=48) [1,2,0] r=0 lpr=48 crt=56'83 lcod 56'82 mlcod 56'82 active+clean] get_object_context: found obc in cache: obc(3:41780a1c:::tgt_object:head(dne) rwstate(write n=2 w=0))
2024-02-14T08:55:56.294+0000 7f47c6fe9700 10 osd.1 pg_epoch: 77 pg[3.2( v 56'83 (0'0,56'83] local-lis/les=48/49 n=9 ec=48/44 lis/c=48/48 les/c/f=49/49/0 sis=48) [1,2,0] r=0 lpr=48 crt=56'83 lcod 56'82 mlcod 56'82 active+clean] get_object_context: obc(3:41780a1c:::tgt_object:head(dne) rwstate(write n=2 w=0)) oi: 3:41780a1c:::tgt_object:head(0'0 unknown.0.0:0 s 0 uv 0 alloc_hint [0 0 0]) exists: 0 ssc(3:41780a1c:::tgt_object:snapdir snapset: 0=[]:{} ref: 1 registered: 1 exists: 0)
2024-02-14T08:55:56.299+0000 7f47c6fe9700 -1 ../src/osd/PrimaryLogPG.cc: In function 'int PrimaryLogPG::start_cls_gather(PrimaryLogPG::OpContext*, std::map<std::__cxx11::basic_string<char>, ceph::buffer::v15_2_0::list>*, const string&, const char*, const char*, ceph::bufferlist&)' thread 7f47c6fe9700 time 2024-02-14T08:55:56.296161+0000
../src/osd/PrimaryLogPG.cc: 10390: FAILED ceph_assert(inserted)

This is a little bit confusing. The first few lines are the arrival of client.4156.0:302, but the get_object_context calls are actually a continuation of a prior op:

2024-02-14T08:55:56.293+0000 7f47e9c1f700  1 -- [v2:172.21.5.153:6811/51816011,v1:172.21.5.153:6812/51816011] <== client.4156 172.21.5.153:0/3838631962 1 ==== osd_op(client.4156.0:302 3.2 3.38501e82 (undecoded) ondisk+retry+read+known_if_redirected+supports_pool_eio e73) v8 ==== 219+0+7855 (crc 0 0 0) 0x4c90780 con 0x58bfc00
2024-02-14T08:55:56.293+0000 7f47e9c1f700 15 osd.1 77 enqueue_op osd_op(client.4156.0:302 3.2 3.38501e82 (undecoded) ondisk+retry+read+known_if_redirected+supports_pool_eio e73) v8 prio 63 type 42 cost 7855 latency 0.000018 epoch 73 osd_op(client.4156.0:302 3.2 3.38501e82 (undecoded) ondisk+retry+read+known_if_redirected+supports_pool_eio e73) v8
2024-02-14T08:55:56.293+0000 7f47e9c1f700 20 osd.1 op_wq(2) _enqueue OpSchedulerItem(3.2 PGOpItem(op=osd_op(client.4156.0:302 3.2 3.38501e82 (undecoded) ondisk+retry+read+known_if_redirected+supports_pool_eio e73) v8) class_id 3 prio 63 cost 7855 e73)
2024-02-14T08:55:56.293+0000 7f47c6fe9700 20 osd.1 op_wq(2) _process 3.2 to_process <> waiting <> waiting_peering {}
2024-02-14T08:55:56.293+0000 7f47c6fe9700 20 osd.1 op_wq(2) _process OpSchedulerItem(3.2 PGOpItem(op=osd_op(client.4156.0:302 3.2 3.38501e82 (undecoded) ondisk+retry+read+known_if_redirected+supports_pool_eio e73) v8) class_id 3 prio 63 qos_cost 143371 cost 7855 e73) queued
2024-02-14T08:55:56.293+0000 7f47c6fe9700 20 osd.1 op_wq(2) _process 3.2 to_process <OpSchedulerItem(3.2 PGOpItem(op=osd_op(client.4156.0:302 3.2 3.38501e82 (undecoded) ondisk+retry+read+known_if_redirected+supports_pool_eio e73) v8) class_id 3 prio 63 qos_cost 143371 cost 7855 e73)> waiting <> waiting_peering {}
2024-02-14T08:55:56.293+0000 7f47c6fe9700 20 osd.1 op_wq(2) _process OpSchedulerItem(3.2 PGOpItem(op=osd_op(client.4156.0:302 3.2 3.38501e82 (undecoded) ondisk+retry+read+known_if_redirected+supports_pool_eio e73) v8) class_id 3 prio 63 qos_cost 143371 cost 7855 e73) pg 0x4c95000
2024-02-14T08:55:56.293+0000 7f47c6fe9700 10 osd.1 77 dequeue_op osd_op(client.4156.0:302 3.2 3.38501e82 (undecoded) ondisk+retry+read+known_if_redirected+supports_pool_eio e73) v8 prio 63 cost 7855 latency 0.000133 osd_op(client.4156.0:302 3.2 3.38501e82 (undecoded) ondisk+retry+read+known_if_redirected+supports_pool_eio e73) v8 pg pg[3.2( v 56'83 (0'0,56'83] local-lis/les=48/49 n=9 ec=48/44 lis/c=48/48 les/c/f=49/49/0 sis=48) [1,2,0] r=0 lpr=48 crt=56'83 lcod 56'82 mlcod 56'82 active+clean]
...
2024-02-14T08:55:56.293+0000 7f47c6fe9700 10 osd.1 pg_epoch: 77 pg[3.2( v 56'83 (0'0,56'83] local-lis/les=48/49 n=9 ec=48/44 lis/c=48/48 les/c/f=49/49/0 sis=48) [1,2,0] r=0 lpr=48 crt=56'83 lcod 56'82 mlcod 56'82 active+clean] _handle_message: osd_op(client.4156.0:302 3.2 3.38501e82 (undecoded) ondisk+retry+read+known_if_redirected+supports_pool_eio e73) v8
...
2024-02-14T08:55:56.293+0000 7f47c6fe9700 10 osd.1 pg_epoch: 77 pg[3.2( v 56'83 (0'0,56'83] local-lis/les=48/49 n=9 ec=48/44 lis/c=48/48 les/c/f=49/49/0 sis=48) [1,2,0] r=0 lpr=48 crt=56'83 lcod 56'82 mlcod 56'82 active+clean] do_op osd_op(client.4156.0:302 3.2 3:41780a1c:::tgt_object:head [call test_remote_reads.test_gather in=7855b] snapc 0=[] RETRY=2 ondisk+retry+read+known_if_redirected+supports_pool_eio e73) v8 may_write may_read -> write-ordered flags ondisk+retry+read+known_if_redirected+supports_pool_eio

Note that this prior op is also client.4156.0:302, with the RETRY=2 tag visible on the first line after it is decoded. We can predict that the message that arrived last, above, would actually have had RETRY=3 had the OSD survived long enough to decode it. As to why this is happening:

2024-02-14T08:55:56.293+0000 7f47e9c1f700  0 -- [v2:172.21.5.153:6811/51816011,v1:172.21.5.153:6812/51816011] >> 172.21.5.153:0/3838631962 conn(0x58bfc00 msgr2=0x5ada680 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1)._try_send injecting socket failure

Due to

2024-02-14T08:55:35.980+0000 7f47e6c19700 20 set_mon_vals ms_inject_socket_failures = 100 (no change)

To summarize the situation so far:
- Client<->OSD connections are being reset, causing ops to be resent.
- The op that crashed osd.1 was actually the second retry of that op.

The next question is what happened to the previous two incarnations:

2024-02-14T08:55:24.262+0000 7f47e9c1f700  1 -- [v2:172.21.5.153:6811/51816011,v1:172.21.5.153:6812/51816011] <== client.4156 172.21.5.153:0/3838631962 1 ==== osd_op(client.4156.0:302 3.2 3.38501e82 (undecoded) ondisk+retry+read+known_if_redirected+supports_pool_eio e56) v8 ==== 219+0+7855 (crc 0 0 0) 0x598ca00 con 0x551b000
...
2024-02-14T08:55:24.262+0000 7f47c6fe9700 20 osd.1 pg_epoch: 59 pg[3.2( v 56'83 (0'0,56'83] local-lis/les=48/49 n=9 ec=48/44 lis/c=48/48 les/c/f=49/49/0 sis=48) [1,2,0] r=0 lpr=48 crt=56'83 lcod 56'82 mlcod 56'82 active+clean] do_op: op osd_op(client.4156.0:302 3.2 3:41780a1c:::tgt_object:head [call test_remote_reads.test_gather in=7855b] snapc 0=[] RETRY=1 ondisk+retry+read+known_if_redirected+supports_pool_eio e56) v8

2024-02-14T08:55:21.256+0000 7f47e841c700  1 -- [v2:172.21.5.153:6811/51816011,v1:172.21.5.153:6812/51816011] <== client.4156 172.21.5.153:0/3838631962 2 ==== osd_op(client.4156.0:302 3.2 3.38501e82 (undecoded) ondisk+read+known_if_redirected+supports_pool_eio e56) v8 ==== 219+0+7855 (crc 0 0 0) 0x5ef0f00 con 0x567c400
...
2024-02-14T08:55:21.256+0000 7f47caff1700 20 osd.1 pg_epoch: 56 pg[3.2( v 56'83 (0'0,56'83] local-lis/les=48/49 n=9 ec=48/44 lis/c=48/48 les/c/f=49/49/0 sis=48) [1,2,0] r=0 lpr=48 crt=56'83 lcod 56'82 mlcod 56'82 active+clean]  discard osd_op(client.4156.0:302 3.2 3.38501e82 (undecoded) ondisk+read+known_if_redirected+supports_pool_eio e56) v8

That last one was discarded because of a disconnect; see PG.cc::can_discard_op:

...
  auto m = op->get_req<MOSDOp>();
  if (cct->_conf->osd_discard_disconnected_ops && OSD::op_is_discardable(m)) {
    dout(20) << " discard " << *m << dendl;
    return true;
  }
...
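
In other words, since osd_discard_disconnected_ops was in effect here and the connection that delivered the op had already been reset by the injected socket failures, the OSD simply dropped the op and relied on the client to resend it on the replacement connection -- which is exactly what produced the RETRY'd resends above.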

So the op that arrived at 2024-02-14T08:55:24.262 (RETRY=1) is most likely why the map already has an entry:

2024-02-14T08:55:24.262+0000 7f47e9c1f700  1 -- [v2:172.21.5.153:6811/51816011,v1:172.21.5.153:6812/51816011] <== client.4156 172.21.5.153:0/3838631962 1 ==== osd_op(client.4156.0:302 3.2 3.38501e82 (undecoded) ondisk+retry+read+known_if_redirected+supports_pool_eio e56) v8 ==== 219+0+7855 (crc 0 0 0) 0x598ca00 con 0x551b000
2024-02-14T08:55:24.262+0000 7f47e9c1f700 15 osd.1 59 enqueue_op osd_op(client.4156.0:302 3.2 3.38501e82 (undecoded) ondisk+retry+read+known_if_redirected+supports_pool_eio e56) v8 prio 63 type 42 cost 7855 latency 0.000031 epoch 56 osd_op(client.4156.0:302 3.2 3.38501e82 (undecoded) ondisk+retry+read+known_if_redirected+supports_pool_eio e56) v8
2024-02-14T08:55:24.262+0000 7f47e9c1f700 20 osd.1 op_wq(2) _enqueue OpSchedulerItem(3.2 PGOpItem(op=osd_op(client.4156.0:302 3.2 3.38501e82 (undecoded) ondisk+retry+read+known_if_redirected+supports_pool_eio e56) v8) class_id 3 prio 63 cost 7855 e56)
2024-02-14T08:55:24.262+0000 7f47c6fe9700 20 osd.1 op_wq(2) _process 3.2 to_process <> waiting <> waiting_peering {}
2024-02-14T08:55:24.262+0000 7f47c6fe9700 20 osd.1 op_wq(2) _process OpSchedulerItem(3.2 PGOpItem(op=osd_op(client.4156.0:302 3.2 3.38501e82 (undecoded) ondisk+retry+read+known_if_redirected+supports_pool_eio e56) v8) class_id 3 prio 63 qos_cost 143371 cost 7855 e56) queued
2024-02-14T08:55:24.262+0000 7f47c6fe9700 20 osd.1 op_wq(2) _process 3.2 to_process <OpSchedulerItem(3.2 PGOpItem(op=osd_op(client.4156.0:302 3.2 3.38501e82 (undecoded) ondisk+retry+read+known_if_redirected+supports_pool_eio e56) v8) class_id 3 prio 63 qos_cost 143371 cost 7855 e56)> waiting <> waiting_peering {}
2024-02-14T08:55:24.262+0000 7f47c6fe9700 20 osd.1 op_wq(2) _process OpSchedulerItem(3.2 PGOpItem(op=osd_op(client.4156.0:302 3.2 3.38501e82 (undecoded) ondisk+retry+read+known_if_redirected+supports_pool_eio e56) v8) class_id 3 prio 63 qos_cost 143371 cost 7855 e56) pg 0x4c95000
2024-02-14T08:55:24.262+0000 7f47c6fe9700 10 osd.1 59 dequeue_op osd_op(client.4156.0:302 3.2 3.38501e82 (undecoded) ondisk+retry+read+known_if_redirected+supports_pool_eio e56) v8 prio 63 cost 7855 latency 0.000160 osd_op(client.4156.0:302 3.2 3.38501e82 (undecoded) ondisk+retry+read+known_if_redirected+supports_pool_eio e56) v8 pg pg[3.2( v 56'83 (0'0,56'83] local-lis/les=48/49 n=9 ec=48/44 lis/c=48/48 les/c/f=49/49/0 sis=48) [1,2,0] r=0 lpr=48 crt=56'83 lcod 56'82 mlcod 56'82 active+clean]
...
2024-02-14T08:55:24.262+0000 7f47c6fe9700  1 -- [v2:172.21.5.153:6811/51816011,v1:172.21.5.153:6812/51816011] --> 172.21.5.153:0/3838631962 -- osd_map(57..59 src has 1..59) v4 -- 0x5ac1180 con 0x551b000
2024-02-14T08:55:24.262+0000 7f47c6fe9700 10 osd.1 pg_epoch: 59 pg[3.2( v 56'83 (0'0,56'83] local-lis/les=48/49 n=9 ec=48/44 lis/c=48/48 les/c/f=49/49/0 sis=48) [1,2,0] r=0 lpr=48 crt=56'83 lcod 56'82 mlcod 56'82 active+clean] _handle_message: osd_op(client.4156.0:302 3.2 3.38501e82 (undecoded) ondisk+retry+read+known_if_redirected+supports_pool_eio e56) v8
2024-02-14T08:55:24.262+0000 7f47c6fe9700 20 osd.1 pg_epoch: 59 pg[3.2( v 56'83 (0'0,56'83] local-lis/les=48/49 n=9 ec=48/44 lis/c=48/48 les/c/f=49/49/0 sis=48) [1,2,0] r=0 lpr=48 crt=56'83 lcod 56'82 mlcod 56'82 active+clean] do_op: op osd_op(client.4156.0:302 3.2 3:41780a1c:::tgt_object:head [call test_remote_reads.test_gather in=7855b] snapc 0=[] RETRY=1 ondisk+retry+read+known_if_redirected+supports_pool_eio e56) v8
2024-02-14T08:55:24.262+0000 7f47c6fe9700 20 osd.1 pg_epoch: 59 pg[3.2( v 56'83 (0'0,56'83] local-lis/les=48/49 n=9 ec=48/44 lis/c=48/48 les/c/f=49/49/0 sis=48) [1,2,0] r=0 lpr=48 crt=56'83 lcod 56'82 mlcod 56'82 active+clean] op_has_sufficient_caps session=0x5b00960 pool=3 (test-rados-api-folio03-2714962-1 ) pool_app_metadata={rados={}} need_read_cap=0 need_write_cap=0 classes=[class test_remote_reads method test_gather rd 1 wr 1 allowed 1] -> yes
2024-02-14T08:55:24.262+0000 7f47c6fe9700 10 osd.1 pg_epoch: 59 pg[3.2( v 56'83 (0'0,56'83] local-lis/les=48/49 n=9 ec=48/44 lis/c=48/48 les/c/f=49/49/0 sis=48) [1,2,0] r=0 lpr=48 crt=56'83 lcod 56'82 mlcod 56'82 active+clean] do_op osd_op(client.4156.0:302 3.2 3:41780a1c:::tgt_object:head [call test_remote_reads.test_gather in=7855b] snapc 0=[] RETRY=1 ondisk+retry+read+known_if_redirected+supports_pool_eio e56) v8 may_write may_read -> write-ordered flags ondisk+retry+read+known_if_redirected+supports_pool_eio
2024-02-14T08:55:24.262+0000 7f47c6fe9700 10 osd.1 pg_epoch: 59 pg[3.2( v 56'83 (0'0,56'83] local-lis/les=48/49 n=9 ec=48/44 lis/c=48/48 les/c/f=49/49/0 sis=48) [1,2,0] r=0 lpr=48 crt=56'83 lcod 56'82 mlcod 56'82 active+clean] get_object_context: obc NOT found in cache: 3:41780a1c:::tgt_object:head
2024-02-14T08:55:24.262+0000 7f47c6fe9700 15 bluestore(/home/nmordech/ceph_crimson/ceph/build/dev/osd1) getattr 3.2_head #3:41780a1c:::tgt_object:head# _
2024-02-14T08:55:24.262+0000 7f47c6fe9700 20 bluestore(/home/nmordech/ceph_crimson/ceph/build/dev/osd1).collection(3.2_head 0x5377a40) get_onode oid #3:41780a1c:::tgt_object:head# key 0x7F800000000000000341780A1C'!tgt_object!='0xFFFFFFFFFFFFFFFEFFFFFFFFFFFFFFFF6F
2024-02-14T08:55:24.262+0000 7f47c6fe9700 20 bluestore(/home/nmordech/ceph_crimson/ceph/build/dev/osd1).collection(3.2_head 0x5377a40)  r -2 v.len 0
2024-02-14T08:55:24.262+0000 7f47c6fe9700 10 bluestore(/home/nmordech/ceph_crimson/ceph/build/dev/osd1) getattr 3.2_head #3:41780a1c:::tgt_object:head# _ = -2
2024-02-14T08:55:24.262+0000 7f47c6fe9700 10 osd.1 pg_epoch: 59 pg[3.2( v 56'83 (0'0,56'83] local-lis/les=48/49 n=9 ec=48/44 lis/c=48/48 les/c/f=49/49/0 sis=48) [1,2,0] r=0 lpr=48 crt=56'83 lcod 56'82 mlcod 56'82 active+clean] get_object_context: no obc for soid 3:41780a1c:::tgt_object:head but can_create
2024-02-14T08:55:24.262+0000 7f47c6fe9700 10 osd.1 pg_epoch: 59 pg[3.2( v 56'83 (0'0,56'83] local-lis/les=48/49 n=9 ec=48/44 lis/c=48/48 les/c/f=49/49/0 sis=48) [1,2,0] r=0 lpr=48 crt=56'83 lcod 56'82 mlcod 56'82 active+clean] create_object_context 0x6652580 3:41780a1c:::tgt_object:head
2024-02-14T08:55:24.262+0000 7f47c6fe9700 10 osd.1 pg_epoch: 59 pg[3.2( v 56'83 (0'0,56'83] local-lis/les=48/49 n=9 ec=48/44 lis/c=48/48 les/c/f=49/49/0 sis=48) [1,2,0] r=0 lpr=48 crt=56'83 lcod 56'82 mlcod 56'82 active+clean] populate_obc_watchers 3:41780a1c:::tgt_object:head
2024-02-14T08:55:24.262+0000 7f47c6fe9700 20 osd.1 pg_epoch: 59 pg[3.2( v 56'83 (0'0,56'83] local-lis/les=48/49 n=9 ec=48/44 lis/c=48/48 les/c/f=49/49/0 sis=48) [1,2,0] r=0 lpr=48 crt=56'83 lcod 56'82 mlcod 56'82 active+clean] PrimaryLogPG::check_blocklisted_obc_watchers for obc 3:41780a1c:::tgt_object:head
2024-02-14T08:55:24.262+0000 7f47c6fe9700 10 osd.1 pg_epoch: 59 pg[3.2( v 56'83 (0'0,56'83] local-lis/les=48/49 n=9 ec=48/44 lis/c=48/48 les/c/f=49/49/0 sis=48) [1,2,0] r=0 lpr=48 crt=56'83 lcod 56'82 mlcod 56'82 active+clean] get_object_context: obc(3:41780a1c:::tgt_object:head(dne) rwstate(none n=0 w=0)) oi: 3:41780a1c:::tgt_object:head(0'0 unknown.0.0:0 s 0 uv 0 alloc_hint [0 0 0]) ssc(3:41780a1c:::tgt_object:snapdir snapset: 0=[]:{} ref: 1 registered: 1 exists: 0)
2024-02-14T08:55:24.262+0000 7f47c6fe9700 10 osd.1 pg_epoch: 59 pg[3.2( v 56'83 (0'0,56'83] local-lis/les=48/49 n=9 ec=48/44 lis/c=48/48 les/c/f=49/49/0 sis=48) [1,2,0] r=0 lpr=48 crt=56'83 lcod 56'82 mlcod 56'82 active+clean] find_object_context 3:41780a1c:::tgt_object:head @head oi=3:41780a1c:::tgt_object:head(0'0 unknown.0.0:0 s 0 uv 0 alloc_hint [0 0 0])
2024-02-14T08:55:24.262+0000 7f47c6fe9700 25 osd.1 pg_epoch: 59 pg[3.2( v 56'83 (0'0,56'83] local-lis/les=48/49 n=9 ec=48/44 lis/c=48/48 les/c/f=49/49/0 sis=48) [1,2,0] r=0 lpr=48 crt=56'83 lcod 56'82 mlcod 56'82 active+clean] do_op oi 3:41780a1c:::tgt_object:head(0'0 unknown.0.0:0 s 0 uv 0 alloc_hint [0 0 0])
2024-02-14T08:55:24.262+0000 7f47c6fe9700 20 osd.1 pg_epoch: 59 pg[3.2( v 56'83 (0'0,56'83] local-lis/les=48/49 n=9 ec=48/44 lis/c=48/48 les/c/f=49/49/0 sis=48) [1,2,0] r=0 lpr=48 crt=56'83 lcod 56'82 mlcod 56'82 active+clean] do_op obc obc(3:41780a1c:::tgt_object:head(dne) rwstate(write n=1 w=0))
2024-02-14T08:55:24.262+0000 7f47c6fe9700 10 osd.1 pg_epoch: 59 pg[3.2( v 56'83 (0'0,56'83] local-lis/les=48/49 n=9 ec=48/44 lis/c=48/48 les/c/f=49/49/0 sis=48) [1,2,0] r=0 lpr=48 crt=56'83 lcod 56'82 mlcod 56'82 active+clean] execute_ctx 0x4c86400
2024-02-14T08:55:24.262+0000 7f47c6fe9700 10 osd.1 pg_epoch: 59 pg[3.2( v 56'83 (0'0,56'83] local-lis/les=48/49 n=9 ec=48/44 lis/c=48/48 les/c/f=49/49/0 sis=48) [1,2,0] r=0 lpr=48 crt=56'83 lcod 56'82 mlcod 56'82 active+clean] execute_ctx 3:41780a1c:::tgt_object:head [call test_remote_reads.test_gather in=7855b] ov 0'0 av 59'84 snapc 0=[] snapset 0=[]:{}
2024-02-14T08:55:24.262+0000 7f47c6fe9700 10 osd.1 pg_epoch: 59 pg[3.2( v 56'83 (0'0,56'83] local-lis/les=48/49 n=9 ec=48/44 lis/c=48/48 les/c/f=49/49/0 sis=48) [1,2,0] r=0 lpr=48 crt=56'83 lcod 56'82 mlcod 56'82 active+clean] do_osd_op 3:41780a1c:::tgt_object:head [call test_remote_reads.test_gather in=7855b]
2024-02-14T08:55:24.262+0000 7f47c6fe9700 10 osd.1 pg_epoch: 59 pg[3.2( v 56'83 (0'0,56'83] local-lis/les=48/49 n=9 ec=48/44 lis/c=48/48 les/c/f=49/49/0 sis=48) [1,2,0] r=0 lpr=48 crt=56'83 lcod 56'82 mlcod 56'82 active+clean] do_osd_op  call test_remote_reads.test_gather in=7855b
2024-02-14T08:55:24.262+0000 7f47c6fe9700 10 osd.1 pg_epoch: 59 pg[3.2( v 56'83 (0'0,56'83] local-lis/les=48/49 n=9 ec=48/44 lis/c=48/48 les/c/f=49/49/0 sis=48) [1,2,0] r=0 lpr=48 crt=56'83 lcod 56'82 mlcod 56'82 active+clean] call method test_remote_reads.test_gather
2024-02-14T08:55:24.263+0000 7f47c6fe9700 10 osd.1 pg_epoch: 59 pg[3.2( v 56'83 (0'0,56'83] local-lis/les=48/49 n=9 ec=48/44 lis/c=48/48 les/c/f=49/49/0 sis=48) [1,2,0] r=0 lpr=48 crt=56'83 lcod 56'82 mlcod 56'82 active+clean] get_object_context: found obc in cache: obc(3:41780a1c:::tgt_object:head(dne) rwstate(write n=1 w=0))
2024-02-14T08:55:24.263+0000 7f47c6fe9700 10 osd.1 pg_epoch: 59 pg[3.2( v 56'83 (0'0,56'83] local-lis/les=48/49 n=9 ec=48/44 lis/c=48/48 les/c/f=49/49/0 sis=48) [1,2,0] r=0 lpr=48 crt=56'83 lcod 56'82 mlcod 56'82 active+clean] get_object_context: obc(3:41780a1c:::tgt_object:head(dne) rwstate(write n=1 w=0)) oi: 3:41780a1c:::tgt_object:head(0'0 unknown.0.0:0 s 0 uv 0 alloc_hint [0 0 0]) exists: 0 ssc(3:41780a1c:::tgt_object:snapdir snapset: 0=[]:{} ref: 1 registered: 1 exists: 0)

We can see from this output that the interval did not change between RETRY=1, when we initially serviced the request, and the retry on which we crashed. The RETRY=1 incarnation was apparently still in progress when that later retry arrived. This explains why the entry was already populated -- the original request was still in progress.
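
For illustration, here is a minimal sketch of the failure mode, assuming start_cls_gather() records in-flight gathers in a per-PG map keyed by the target object and asserts that the insert created a new entry (GatherState, in_flight_gathers, and start_cls_gather_sketch below are hypothetical stand-ins, not the real PrimaryLogPG members):

#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins for the PG-internal bookkeeping; the real member and
// type names in PrimaryLogPG differ.
struct GatherState {
  int outstanding_subops = 0;  // remote reads still pending for this gather
};

std::map<std::string, GatherState> in_flight_gathers;  // keyed by target object

// Sketch of the insert pattern behind "FAILED ceph_assert(inserted)":
// start_cls_gather() effectively assumes at most one gather per object is in
// flight at any time.
void start_cls_gather_sketch(const std::string& target_object, int num_srcs) {
  auto [it, inserted] = in_flight_gathers.emplace(target_object,
                                                  GatherState{num_srcs});
  // A retried (or otherwise concurrent) op that reaches this point while the
  // original gather is still waiting on its remote reads finds the key already
  // present, gets inserted == false, and trips the assert.
  assert(inserted);
}

int main() {
  start_cls_gather_sketch("tgt_object", 2);  // original op: entry created
  start_cls_gather_sketch("tgt_object", 2);  // retry while still in flight: aborts here
}

Running this aborts on the second call, mirroring the retried op hitting ceph_assert(inserted) while the RETRY=1 gather was still outstanding.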

I see two related problems here:
1. We need to correctly deal with retries of ops that use the cls_gather interface.
2. We need to correctly deal with pipelined cls_gather operations. Even without a retry, it is entirely possible for multiple cls_gather operations to be in flight concurrently on the same object, since the method is marked as write rather than read+write.

The above is deceptively complex to fix and test. This feature was introduced as an experiment, but I'm not aware of any serious users at this time. Thus, I suggest:
1. Remove this unit test from CI and move it to an experimental suite that doesn't block PR merges.
2. Mark this feature deprecated for squid. Either someone steps up to finish the implementation and maintain it, or it gets removed in T.

Actions #5

Updated by Radoslaw Zarzynski 2 months ago

  • Status changed from Fix Under Review to In Progress
Actions #6

Updated by Nitzan Mordechai 2 months ago

  • Pull request ID changed from 55410 to 55923
Actions #7

Updated by Nitzan Mordechai 2 months ago

  • Status changed from In Progress to Fix Under Review
Actions #8

Updated by Radoslaw Zarzynski 2 months ago

Note from bug scrub: reviewed.

Actions #9

Updated by Radoslaw Zarzynski about 1 month ago

Sent to QA.
