Project

General

Profile

Bug #43813

objecter doesn't send osd_op

Added by Sage Weil 8 months ago. Updated 8 months ago.

Status:
Pending Backport
Priority:
Urgent
Assignee:
Category:
-
Target version:
% Done:

0%

Source:
Tags:
Backport:
nautilus,mimic
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature:

Description

/a/sage-2020-01-24_01:55:08-rados-wip-sage4-testing-2020-01-23-1347-distro-basic-smithi/4697914

ceph-client.admin.35079.log


Related issues

Copied to RADOS - Backport #43991: mimic: objecter doesn't send osd_op Need More Info
Copied to RADOS - Backport #43992: nautilus: objecter doesn't send osd_op Need More Info

History

#1 Updated by Sage Weil 8 months ago

current thinking: paused = true

    if (op->should_resend) {
      if (!op->session->is_homeless() && !op->target.paused) {
    logger->inc(l_osdc_op_resend);
    _send_op(op);
      }
    } else {
      _op_cancel_map_check(op);
      _cancel_linger_op(op);
    }

since the log does show the _send_op call
2020-01-24T08:24:34.267+0000 7f5825e63700 15 client.5253.objecter _session_op_assign 1 10
...should happen here...
2020-01-24T08:24:34.267+0000 7f5825e63700 20 client.5253.objecter put_session s=0x7f580801a530 osd=1 4

#2 Updated by Sage Weil 8 months ago

  • Status changed from New to Need More Info

maybe this will help debug: https://github.com/ceph/ceph/pull/32850

#3 Updated by Sage Weil 8 months ago

  • Project changed from Ceph to RADOS

#4 Updated by Sage Weil 8 months ago

  • Target version set to v15.0.0

#5 Updated by Sage Weil 8 months ago

/a/sage-2020-01-30_22:27:29-rados-wip-sage-testing-2020-01-30-1230-distro-basic-smithi/4719487

2020-01-31T00:16:55.694+0000 7f12ae1c8700  3 client.4531.objecter handle_osd_map got epochs [146,146] > 145
2020-01-31T00:16:55.694+0000 7f12ae1c8700  3 client.4531.objecter handle_osd_map decoding incremental epoch 146
2020-01-31T00:16:55.694+0000 7f12ae1c8700 10 client.4531.objecter  checking op 4
2020-01-31T00:16:55.694+0000 7f12ae1c8700 20 client.4531.objecter _calc_target epoch 146 base foo1 @32 precalc_pgid 0 pgid 0.0 is_write
2020-01-31T00:16:55.694+0000 7f12ae1c8700 20 client.4531.objecter _calc_target target foo1 @32 -> pgid 32.be9754b3
2020-01-31T00:16:55.694+0000 7f12ae1c8700 10 client.4531.objecter _calc_target paused 0 -> 1   ************************************
2020-01-31T00:16:55.694+0000 7f12ae1c8700 10 client.4531.objecter _maybe_request_map subscribing (onetime) to next osd map
...
2020-01-31T00:16:57.577+0000 7f12ae1c8700  3 client.4531.objecter handle_osd_map got epochs [147,147] > 146
2020-01-31T00:16:57.577+0000 7f12ae1c8700  3 client.4531.objecter handle_osd_map decoding incremental epoch 147
2020-01-31T00:16:57.577+0000 7f12ae1c8700 10 client.4531.objecter  checking op 4
2020-01-31T00:16:57.577+0000 7f12ae1c8700 20 client.4531.objecter _calc_target epoch 147 base foo1 @32 precalc_pgid 0 pgid 0.0 is_write
2020-01-31T00:16:57.577+0000 7f12ae1c8700 20 client.4531.objecter _calc_target target foo1 @32 -> pgid 32.be9754b3
2020-01-31T00:16:57.577+0000 7f12ae1c8700 10 client.4531.objecter _calc_target  raw pgid 32.be9754b3 -> actual 32.13 acting [3,7] primary 3
2020-01-31T00:16:57.577+0000 7f12ae1c8700 20 client.4531.objecter put_session s=0x556669f6d760 osd=7 3
2020-01-31T00:16:57.577+0000 7f12ae1c8700 15 client.4531.objecter _session_op_remove 7 4
...
2020-01-31T00:16:57.577+0000 7f12ae1c8700 15 client.4531.objecter _session_op_assign 3 4
2020-01-31T00:16:57.577+0000 7f12ae1c8700 20 client.4531.objecter put_session s=0x7f129c063ab0 osd=3 4

the pool is full:
pool 32 'test-rados-api-smithi012-24964-3' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode off last_change 167 flags hashpspool,full,full_quota max_bytes 4096 stripe_width 0 application rados

the client is here LibRadosAio.PoolQuotaPP:
    ASSERT_EQ(0, ioctx.aio_operate("foo" + stringify(n),
                                   completion.get(), &op,
                                   librados::OPERATION_FULL_TRY));
    completion->wait_for_complete();  <<---- here
    int r = completion->get_return_value();

so the op has FULL_TRY set, but it's not getting sent.

#6 Updated by Sage Weil 8 months ago

  • Status changed from Need More Info to In Progress
  • Assignee set to Sage Weil

#7 Updated by Sage Weil 8 months ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 33020

#8 Updated by Sage Weil 8 months ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to nautilus

#9 Updated by Sage Weil 8 months ago

  • Backport changed from nautilus to nautilus,mimic

#10 Updated by Nathan Cutler 8 months ago

#11 Updated by Nathan Cutler 8 months ago

Also available in: Atom PDF