Bug #36177 (closed): rados rm --force-full is blocked when cluster is in full status

Added by Honggang Yang over 5 years ago. Updated over 5 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
Administration/Usability
Target version:
-
% Done:

0%

Source:
Community (dev)
Tags:
Backport:
mimic,luminous
Regression:
No
Severity:
1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
rados tool
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

# rados -p datapool -c ceph.conf rm --force-full benchmark_data_ip-172-31-16-244_5986_object332364
2018-05-17 06:42:36.433 7fc8e67fc700 10 client.?.objecter ms_handle_connect 0x562ff527c500
2018-05-17 06:42:36.433 7fc8e67fc700 10 client.?.objecter resend_mon_ops
2018-05-17 06:42:36.437 7fc9032f8a40 10 client.4229.objecter _maybe_request_map subscribing (onetime) to next osd map
2018-05-17 06:42:36.437 7fc8e67fc700 10 client.4229.objecter ms_dispatch 0x562ff5130c00 osd_map(11..11 src has 1..11) v4
2018-05-17 06:42:36.437 7fc8e67fc700  3 client.4229.objecter handle_osd_map got epochs [11,11] > 0
2018-05-17 06:42:36.437 7fc8e67fc700  3 client.4229.objecter handle_osd_map decoding full epoch 11
2018-05-17 06:42:36.437 7fc8e67fc700 10 client.4229.objecter _maybe_request_map subscribing (onetime) to next osd map
2018-05-17 06:42:36.437 7fc8e67fc700 20 client.4229.objecter dump_active .. 0 homeless
2018-05-17 06:42:36.437 7fc8e67fc700 10 client.4229.objecter ms_handle_connect 0x7fc8d8002ce0
2018-05-17 06:42:36.437 7fc9032f8a40 10 client.4229.objecter _op_submit op 0x562ff52815e0
2018-05-17 06:42:36.437 7fc9032f8a40 20 client.4229.objecter _calc_target epoch 11 base benchmark_data_ip-172-31-16-244_5986_object332364 @1 precalc_pgid
2018-05-17 06:42:36.437 7fc9032f8a40 20 client.4229.objecter _calc_target target benchmark_data_ip-172-31-16-244_5986_object332364 @1 -> pgid 1.c1414b68
2018-05-17 06:42:36.437 7fc9032f8a40 10 client.4229.objecter _calc_target  raw pgid 1.c1414b68 -> actual 1.168 acting [0] primary 0
2018-05-17 06:42:36.437 7fc9032f8a40 20 client.4229.objecter _get_session s=0x562ff5281cb0 osd=0 3
2018-05-17 06:42:36.437 7fc9032f8a40 10 client.4229.objecter _op_submit oid benchmark_data_ip-172-31-16-244_5986_object332364 '@1' '@1' [delete] tid 1 os
2018-05-17 06:42:36.437 7fc9032f8a40 20 client.4229.objecter get_session s=0x562ff5281cb0 osd=0 3
2018-05-17 06:42:36.437 7fc9032f8a40 15 client.4229.objecter _session_op_assign 0 1
// <------ send the request
2018-05-17 06:42:36.437 7fc9032f8a40 15 client.4229.objecter _send_op 1 to 1.168 on osd.0

OSD log:

2018-05-17 06:54:36.559 7f82eb2bd700  1 -- 127.0.0.1:6801/5723 >> - conn(0x56171db24700 :6801 s=STATE_ACCEPTING pgs=0 cs=0 l=0)._process_connection sd=41
2018-05-17 06:54:36.559 7f82eb2bd700 10 osd.0 11  new session 0x561726f20500 con=0x56171db24700 addr=127.0.0.1:0/281423892
2018-05-17 06:54:36.559 7f82eb2bd700  2 -- 127.0.0.1:6801/5723 >> 127.0.0.1:0/281423892 conn(0x56171db24700 :6801 s=STATE_ACCEPTING_WAIT_SEQ pgs=3 cs=1 l=1)._handle_connect_msg accept write reply msg done
2018-05-17 06:54:36.559 7f82eb2bd700  2 -- 127.0.0.1:6801/5723 >> 127.0.0.1:0/281423892 conn(0x56171db24700 :6801 s=STATE_ACCEPTING_WAIT_SEQ pgs=3 cs=1 l=1)._process_connection accept get newly_acked_seq 0
2018-05-17 06:54:36.559 7f82eb2bd700  5 -- 127.0.0.1:6801/5723 >> 127.0.0.1:0/281423892 conn(0x56171db24700 :6801 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DI
2018-05-17 06:54:36.559 7f82eb2bd700  1 -- 127.0.0.1:6801/5723 <== client.4234 127.0.0.1:0/281423892 1 ==== osd_op(client.4234.0:1 1.168 1.c1414b68 (unde
2018-05-17 06:54:36.559 7f82eb2bd700 15 osd.0 11 enqueue_op 0x561726973800 prio 63 cost 0 latency 0.000051 epoch 11 osd_op(client.4234.0:1 1.168 1.c1414b
2018-05-17 06:54:36.559 7f82cea22700 10 osd.0 11 dequeue_op 0x561726973800 prio 63 cost 0 latency 0.000103 osd_op(client.4234.0:1 1.168 1.c1414b68 (undec
2018-05-17 06:54:36.559 7f82cea22700 10 osd.0 pg_epoch: 11 pg[1.168( v 10'1529 (0'0,10'1529] local-lis/les=6/7 n=1529 ec=6/6 lis/c 6/6 les/c/f 7/7/11 6/6
2018-05-17 06:54:36.559 7f82cea22700 10 osd.0 pg_epoch: 11 pg[1.168( v 10'1529 (0'0,10'1529] local-lis/les=6/7 n=1529 ec=6/6 lis/c 6/6 les/c/f 7/7/11 6/6
// <-------- op is dropped here
2018-05-17 06:54:36.559 7f82cea22700 10 osd.0 pg_epoch: 11 pg[1.168( v 10'1529 (0'0,10'1529] local-lis/les=6/7 n=1529 ec=6/6 lis/c 6/6 les/c/f 7/7/11 6/6
2018-05-17 06:54:36.559 7f82cea22700 10 osd.0 11 dequeue_op 0x561726973800 finish

Checking the OSD code (the failsafe-full check in PrimaryLogPG::do_op):

// mds should have stopped writing before this point.
// We can't allow OSD to become non-startable even if mds
// could be writing as part of file removals.
if (write_ordered && osd->check_failsafe_full(get_dpp()) &&
    !m->has_flag(CEPH_OSD_FLAG_FULL_TRY)) {
  dout(10) << __func__ << " fail-safe full check failed, dropping request." << dendl;
  return;
}

So, if we set the CEPH_OSD_FLAG_FULL_TRY flag on the rm request, it will be processed.
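
For illustration, here is a minimal librados sketch of the same idea, assuming a luminous-or-later librados where IoCtx::set_osdmap_full_try() is available (the conf path, pool name, and object name are taken from the command and logs above; link with -lrados):

#include <rados/librados.hpp>
#include <iostream>

int main() {
  librados::Rados cluster;
  cluster.init(nullptr);                 // default client identity
  cluster.conf_read_file("ceph.conf");   // same conf as the rados command above
  if (cluster.connect() < 0) {
    std::cerr << "connect failed" << std::endl;
    return 1;
  }

  librados::IoCtx ioctx;
  if (cluster.ioctx_create("datapool", ioctx) < 0) {
    std::cerr << "ioctx_create failed" << std::endl;
    return 1;
  }

  // Ask the objecter to set CEPH_OSD_FLAG_FULL_TRY on subsequent ops,
  // so the OSD's failsafe-full check lets the delete through instead
  // of silently dropping it.
  ioctx.set_osdmap_full_try();

  int r = ioctx.remove("benchmark_data_ip-172-31-16-244_5986_object332364");
  std::cout << "remove returned " << r << std::endl;

  cluster.shutdown();
  return 0;
}

The intent of the fix is that rados rm --force-full sets this flag itself, rather than sending a plain delete that the OSD drops.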


Related issues (2): 0 open, 2 closed

Copied to RADOS - Backport #36435: mimic: rados rm --force-full is blocked when cluster is in full status (Resolved, Nathan Cutler)
Copied to RADOS - Backport #36436: luminous: rados rm --force-full is blocked when cluster is in full status (Resolved, Nathan Cutler)
#2 - Updated by Greg Farnum over 5 years ago

  • Project changed from Ceph to RADOS
  • Category changed from ceph cli to Administration/Usability
  • Status changed from New to Fix Under Review
  • Component(RADOS) rados tool added
#3 - Updated by Sage Weil over 5 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to mimic,luminous
#4 - Updated by Nathan Cutler over 5 years ago

  • Copied to Backport #36435: mimic: rados rm --force-full is blocked when cluster is in full status added
#5 - Updated by Nathan Cutler over 5 years ago

  • Copied to Backport #36436: luminous: rados rm --force-full is blocked when cluster is in full status added
#6 - Updated by Nathan Cutler over 5 years ago

  • Status changed from Pending Backport to Resolved