Project

General

Profile

Actions

Bug #43813

closed

objecter doesn't send osd_op

Added by Sage Weil about 4 years ago. Updated over 1 year ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
% Done:

0%

Source:
Tags:
backport_processed
Backport:
nautilus,mimic
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

/a/sage-2020-01-24_01:55:08-rados-wip-sage4-testing-2020-01-23-1347-distro-basic-smithi/4697914

ceph-client.admin.35079.log


Related issues 3 (0 open3 closed)

Has duplicate RADOS - Bug #38064: librados::OPERATION_FULL_TRY not completely implemented, test LibRadosAio.PoolQuotaPP brokenDuplicate

Actions
Copied to RADOS - Backport #43991: mimic: objecter doesn't send osd_opRejectedActions
Copied to RADOS - Backport #43992: nautilus: objecter doesn't send osd_opRejectedActions
Actions #1

Updated by Sage Weil about 4 years ago

current thinking: paused = true

    if (op->should_resend) {
      if (!op->session->is_homeless() && !op->target.paused) {
    logger->inc(l_osdc_op_resend);
    _send_op(op);
      }
    } else {
      _op_cancel_map_check(op);
      _cancel_linger_op(op);
    }

since the log does show the _send_op call
2020-01-24T08:24:34.267+0000 7f5825e63700 15 client.5253.objecter _session_op_assign 1 10
...should happen here...
2020-01-24T08:24:34.267+0000 7f5825e63700 20 client.5253.objecter put_session s=0x7f580801a530 osd=1 4

Actions #2

Updated by Sage Weil about 4 years ago

  • Status changed from New to Need More Info

maybe this will help debug: https://github.com/ceph/ceph/pull/32850

Actions #3

Updated by Sage Weil about 4 years ago

  • Project changed from Ceph to RADOS
Actions #4

Updated by Sage Weil about 4 years ago

  • Target version set to v15.0.0
Actions #5

Updated by Sage Weil about 4 years ago

/a/sage-2020-01-30_22:27:29-rados-wip-sage-testing-2020-01-30-1230-distro-basic-smithi/4719487

2020-01-31T00:16:55.694+0000 7f12ae1c8700  3 client.4531.objecter handle_osd_map got epochs [146,146] > 145
2020-01-31T00:16:55.694+0000 7f12ae1c8700  3 client.4531.objecter handle_osd_map decoding incremental epoch 146
2020-01-31T00:16:55.694+0000 7f12ae1c8700 10 client.4531.objecter  checking op 4
2020-01-31T00:16:55.694+0000 7f12ae1c8700 20 client.4531.objecter _calc_target epoch 146 base foo1 @32 precalc_pgid 0 pgid 0.0 is_write
2020-01-31T00:16:55.694+0000 7f12ae1c8700 20 client.4531.objecter _calc_target target foo1 @32 -> pgid 32.be9754b3
2020-01-31T00:16:55.694+0000 7f12ae1c8700 10 client.4531.objecter _calc_target paused 0 -> 1   ************************************
2020-01-31T00:16:55.694+0000 7f12ae1c8700 10 client.4531.objecter _maybe_request_map subscribing (onetime) to next osd map
...
2020-01-31T00:16:57.577+0000 7f12ae1c8700  3 client.4531.objecter handle_osd_map got epochs [147,147] > 146
2020-01-31T00:16:57.577+0000 7f12ae1c8700  3 client.4531.objecter handle_osd_map decoding incremental epoch 147
2020-01-31T00:16:57.577+0000 7f12ae1c8700 10 client.4531.objecter  checking op 4
2020-01-31T00:16:57.577+0000 7f12ae1c8700 20 client.4531.objecter _calc_target epoch 147 base foo1 @32 precalc_pgid 0 pgid 0.0 is_write
2020-01-31T00:16:57.577+0000 7f12ae1c8700 20 client.4531.objecter _calc_target target foo1 @32 -> pgid 32.be9754b3
2020-01-31T00:16:57.577+0000 7f12ae1c8700 10 client.4531.objecter _calc_target  raw pgid 32.be9754b3 -> actual 32.13 acting [3,7] primary 3
2020-01-31T00:16:57.577+0000 7f12ae1c8700 20 client.4531.objecter put_session s=0x556669f6d760 osd=7 3
2020-01-31T00:16:57.577+0000 7f12ae1c8700 15 client.4531.objecter _session_op_remove 7 4
...
2020-01-31T00:16:57.577+0000 7f12ae1c8700 15 client.4531.objecter _session_op_assign 3 4
2020-01-31T00:16:57.577+0000 7f12ae1c8700 20 client.4531.objecter put_session s=0x7f129c063ab0 osd=3 4

the pool is full:
pool 32 'test-rados-api-smithi012-24964-3' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode off last_change 167 flags hashpspool,full,full_quota max_bytes 4096 stripe_width 0 application rados

the client is here LibRadosAio.PoolQuotaPP:
    ASSERT_EQ(0, ioctx.aio_operate("foo" + stringify(n),
                                   completion.get(), &op,
                                   librados::OPERATION_FULL_TRY));
    completion->wait_for_complete();  <<---- here
    int r = completion->get_return_value();

so the op has FULL_TRY set, but it's not getting sent.
Actions #6

Updated by Sage Weil about 4 years ago

  • Status changed from Need More Info to In Progress
  • Assignee set to Sage Weil
Actions #7

Updated by Sage Weil about 4 years ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 33020
Actions #8

Updated by Sage Weil about 4 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to nautilus
Actions #9

Updated by Sage Weil about 4 years ago

  • Backport changed from nautilus to nautilus,mimic
Actions #10

Updated by Nathan Cutler about 4 years ago

Actions #11

Updated by Nathan Cutler about 4 years ago

Actions #12

Updated by Josh Durgin over 3 years ago

  • Has duplicate Bug #38064: librados::OPERATION_FULL_TRY not completely implemented, test LibRadosAio.PoolQuotaPP broken added
Actions #13

Updated by Backport Bot over 1 year ago

  • Tags set to backport_processed
Actions #14

Updated by Konstantin Shalygin over 1 year ago

  • Status changed from Pending Backport to Resolved
Actions

Also available in: Atom PDF