Bug #15025

New added OSD always down when full flag is set

Added by Libin Wu about 8 years ago. Updated almost 7 years ago.

Status: Resolved
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport: jewel, kraken
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: ceph-deploy
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When some OSDs in the cluster are full, for example:
/dev/vdb1 15717356 15278696 438660 98% /var/lib/ceph/osd/ceph-0
/dev/vdc1 15717356 15276416 440940 98% /var/lib/ceph/osd/ceph-3

the full flag is set in the osdmap:
osdmap e106: 5 osds: 2 up, 2 in
flags full
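
As a rough illustration of why these two OSDs count as full, the sketch below recomputes the used ratio from the df lines above and compares it with a threshold. The value 0.95 is an assumption here (it is Ceph's default mon_osd_full_ratio; this cluster's setting is not given in the report), and Ceph derives usage from its own statistics rather than from df, so this is only an approximation. The helper names are hypothetical.

```python
FULL_RATIO = 0.95  # assumed threshold; the cluster's actual setting is not in the report

DF_LINES = """\
/dev/vdb1 15717356 15278696 438660 98% /var/lib/ceph/osd/ceph-0
/dev/vdc1 15717356 15276416 440940 98% /var/lib/ceph/osd/ceph-3
"""


def used_ratio(df_line):
    """Return (mount point, used / (used + available)) for one df line."""
    _dev, _total_kb, used_kb, avail_kb, _pct, mount = df_line.split()
    return mount, int(used_kb) / (int(used_kb) + int(avail_kb))


def over_full_ratio(df_text, ratio=FULL_RATIO):
    """Map each mount point to True if its used ratio reaches the threshold."""
    return {m: r >= ratio for m, r in map(used_ratio, df_text.splitlines())}


for mount, is_full in over_full_ratio(DF_LINES).items():
    print(mount, "full" if is_full else "ok")
```

Both data directories come out at roughly 97% used, which is above the assumed threshold.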

If new OSDs are then added with ceph-deploy (prepare and activate) to relieve the problem, the OSD service starts and runs, but its state always stays down:
ID WEIGHT  TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.04997 root default
-2 0.04997     host ceph01
 0 0.00999         osd.0        up  1.00000          1.00000
 1 0.00999         osd.1      down        0          1.00000
 3 0.00999         osd.3        up  1.00000          1.00000
 4 0.00999         osd.4      down        0          1.00000
 2 0.00999         osd.2      down        0          1.00000

In the output above, osd.2 and osd.4 are the OSDs that were added after the osdmap was flagged full.

In my tests, the problem exists on both 0.80.11 and 0.94.5.


Related issues

Copied to Ceph - Backport #19484: jewel: New added OSD always down when full flag is set Resolved
Copied to Ceph - Backport #19485: kraken: New added OSD always down when full flag is set Resolved

History

#1 Updated by Libin Wu about 8 years ago

The following analysis is based on the logs and the code:
1. The new osd.4 starts up with osdmap epoch 0 and sends an MMonSubscribe message to the monitor, with start 0 and the flag CEPH_SUBSCRIBE_ONETIME.
2. The monitor handles this request in handle_subscribe (suppose the current osdmap epoch is 280) and sends a full osdmap, epoch 280, to osd.4.
3. The OSD receives this message and handles it in Objecter::handle_osd_map. Because the full flag is set in the osdmap, it calls Objecter::maybe_request_map. This adds a new record <"osdmap", <281, >> to the sub_have map and sends an MMonSubscribe message to the monitor with start 281 and without CEPH_SUBSCRIBE_ONETIME. Because the flag is not CEPH_SUBSCRIBE_ONETIME, the record stays in the sub_have map permanently.

4. Next, in OSD::handle_osd_map, the OSD finds that the received osdmap range [280, 280] is useless (its own epoch is 0, so it needs [0, 280]) and that it must subscribe to the osdmap again, with a call like OSD::osdmap_subscribe(1, true).
But there is already an "osdmap" record in the sub_have map, and its start, 281, is newer than 1, so the record is not updated. Once again, only an MMonSubscribe message with start 281 is sent to the monitor.

5. The monitor receives these requests, but the requested osdmap epoch is newer than any it has, so it does not send any osdmap to osd.4.

6. osd.4 never gets a chance to call start_boot, so the OSD stays down.
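
The six steps above can be replayed with a small toy model. This is only a sketch of the behaviour described in the analysis, not Ceph's actual code. The class SubTable, the functions monitor_reply and simulate, and the numeric value of the flag are all hypothetical names and choices made for illustration. The branch where the monitor answers a request for start 1 with epochs 1 to 280 is an assumption that goes beyond what the analysis states.

```python
from dataclasses import dataclass

CEPH_SUBSCRIBE_ONETIME = 1  # illustrative value for this model


@dataclass
class SubItem:
    start: int  # first epoch wanted
    flags: int  # 0 means a persistent subscription


class SubTable:
    """Toy model of the subscription table (the sub_have map in the analysis)."""

    def __init__(self):
        self.sub_have = {}

    def want(self, what, start, flags):
        # Unconditional subscribe, as in step 3.
        self.sub_have[what] = SubItem(start, flags)

    def want_increment(self, what, start, flags):
        # Only replace the record if the requested start is newer (step 4).
        cur = self.sub_have.get(what)
        if cur is None or cur.start < start:
            self.sub_have[what] = SubItem(start, flags)
            return True
        return False

    def satisfied(self, what):
        # A one-time subscription is dropped once the monitor has answered.
        cur = self.sub_have.get(what)
        if cur is not None and cur.flags & CEPH_SUBSCRIBE_ONETIME:
            del self.sub_have[what]


def monitor_reply(mon_epoch, start):
    """Epoch range the monitor sends for a subscription, or None."""
    if start > mon_epoch:
        return None                    # step 5: nothing to send yet
    if start == 0:
        return (mon_epoch, mon_epoch)  # step 2: latest full map only
    return (start, mon_epoch)          # assumed: maps from start onward


def simulate(mon_epoch=280, full_flag=True):
    subs = SubTable()
    subs.want("osdmap", 0, CEPH_SUBSCRIBE_ONETIME)                   # step 1
    first = monitor_reply(mon_epoch, subs.sub_have["osdmap"].start)  # step 2
    subs.satisfied("osdmap")
    if full_flag:
        subs.want("osdmap", first[1] + 1, 0)                         # step 3
    subs.want_increment("osdmap", 1, CEPH_SUBSCRIBE_ONETIME)         # step 4
    # force_request: resend whatever the table currently holds.
    second = monitor_reply(mon_epoch, subs.sub_have["osdmap"].start)
    return subs.sub_have["osdmap"].start, second


print(simulate(full_flag=True))   # (281, None)
print(simulate(full_flag=False))  # (1, (1, 280))
```

With the full flag set, the persistent record with start 281 blocks the request for epoch 1, so the model's monitor never replies. Without the full flag, the same sequence in this model leaves a request for start 1, which the monitor can answer.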

#3 Updated by Nathan Cutler about 8 years ago

  • Status changed from New to Fix Under Review

#5 Updated by Yuri Weinstein about 7 years ago

  • Status changed from Fix Under Review to Pending Backport

#6 Updated by Nathan Cutler about 7 years ago

  • Backport set to jewel, kraken

#7 Updated by Nathan Cutler almost 7 years ago

  • Copied to Backport #19484: jewel: New added OSD always down when full flag is set added

#8 Updated by Nathan Cutler almost 7 years ago

  • Copied to Backport #19485: kraken: New added OSD always down when full flag is set added

#9 Updated by Nathan Cutler almost 7 years ago

  • Status changed from Pending Backport to Resolved
