Bug #12454

I shut down all the OSDs, but "ceph osd tree" still displays an OSD as up

Added by huanwen ren over 8 years ago. Updated almost 7 years ago.

Status:
Won't Fix
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I have 2 nodes; each node has 3 OSDs and 1 mon.
But when I shut down all the OSDs, "ceph osd tree" still displays one OSD as up.

I think all the OSDs should be in the down state.

no.1.jpg (35.8 KB) huanwen ren, 07/24/2015 02:59 AM

no.2.jpg (104 KB) huanwen ren, 07/24/2015 02:59 AM

info1.jpg (167 KB) huanwen ren, 07/27/2015 03:43 AM

ceph osd dump.txt (2.09 KB) huanwen ren, 07/27/2015 06:13 AM

History

#1 Updated by huang jun over 8 years ago

What does 'ceph -s' show? How many OSDs are up?
Or you can wait a few minutes to see if the status corrects itself.

#2 Updated by huanwen ren over 8 years ago

huang jun wrote:

What does 'ceph -s' show? How many OSDs are up?
Or you can wait a few minutes to see if the status corrects itself.

Yes, I waited for 5 minutes.
Please see the attachments.

#3 Updated by huang jun over 8 years ago

If we stop an OSD, it should send a message to the mon so the mon can mark it down in the newest osdmap.
What does 'ceph osd dump' show?
Or can you reproduce it with debug_mon=20 set?
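
For reference, one way to raise the mon debug level (a sketch; mon.node173 is the mon id seen in the log below, adjust for your own mon name):

# raise mon verbosity at runtime (reverts when the mon restarts)
ceph tell mon.node173 injectargs '--debug-mon 20'

# or persist it in the [mon] section of ceph.conf on the mon host and restart the mon
debug mon = 20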

#4 Updated by huanwen ren over 8 years ago

ceph-mon.node173.log has this record:

"2015-07-27 14:14:21.239896 7fdad6b76700 5 mon.node173@0(leader).osd e196 can_mark_down current up_ratio 0.166667 < min 0.3, will not mark osd.0 down"

#5 Updated by huang jun over 8 years ago

That's strange; I just stopped all my OSDs, and 'ceph -s' and 'ceph osd tree' show all OSDs as down.
Can you paste your mon log?

#6 Updated by huanwen ren over 8 years ago

2015-07-27 14:13:50.849930 7fdad6375700  1 -- 10.118.202.173:6789/0 --> 10.118.202.173:6789/0 -- log(1 entries) v1 -- ?+0 0x5ba4a40 con 0x4ae0f20
2015-07-27 14:13:50.849944 7fdad6375700 10 mon.node173@0(leader).osd e194 should_propose
2015-07-27 14:13:50.849949 7fdad6375700 10 -- 10.118.202.173:6789/0 dispatch_throttle_release 184 to dispatch throttler 184/104857600
2015-07-27 14:13:50.849966 7fdad6375700  1 -- 10.118.202.173:6789/0 <== mon.0 10.118.202.173:6789/0 0 ==== log(1 entries) v1 ==== 0+0+0 (0 0 0) 0x5ba4a40 con 0x4ae0f20
2015-07-27 14:13:50.849974 7fdad6375700 20 mon.node173@0(leader) e1 have connection
2015-07-27 14:13:50.849976 7fdad6375700 20 mon.node173@0(leader) e1 ms_dispatch existing session MonSession: mon.0 10.118.202.173:6789/0 is openallow * for mon.0 10.118.202.173:6789/0
2015-07-27 14:13:50.849983 7fdad6375700 20 mon.node173@0(leader) e1  caps allow *
2015-07-27 14:13:50.850006 7fdad6375700 10 mon.node173@0(leader).log v65522 preprocess_query log(1 entries) v1 from mon.0 10.118.202.173:6789/0
2015-07-27 14:13:50.850012 7fdad6375700 10 mon.node173@0(leader).log v65522 preprocess_log log(1 entries) v1 from mon.0
2015-07-27 14:13:50.850016 7fdad6375700 20 is_capable service=log command= write on cap allow *
2015-07-27 14:13:50.850018 7fdad6375700 20  allow so far , doing grant allow *
2015-07-27 14:13:50.850019 7fdad6375700 20  allow all
2015-07-27 14:13:50.850024 7fdad6375700 10 mon.node173@0(leader).log v65522 prepare_update log(1 entries) v1 from mon.0 10.118.202.173:6789/0
2015-07-27 14:13:50.850029 7fdad6375700 10 mon.node173@0(leader).log v65522 prepare_log log(1 entries) v1 from mon.0
2015-07-27 14:13:50.850032 7fdad6375700 10 mon.node173@0(leader).log v65522  logging 2015-07-27 14:13:50.849915 mon.0 10.118.202.173:6789/0 1772 : cluster [INF] osd.1 marked itself down
2015-07-27 14:13:51.238430 7fdad6b76700 11 mon.node173@0(leader) e1 tick
2015-07-27 14:13:51.238453 7fdad6b76700  5 mon.node173@0(leader).osd e194 can_mark_down current up_ratio 0.166667 < min 0.3, will not mark osd.0 down
2015-07-27 14:13:51.238470 7fdad6b76700  5 mon.node173@0(leader).osd e194 can_mark_down current up_ratio 0.166667 < min 0.3, will not mark osd.1 down
2015-07-27 14:13:51.238477 7fdad6b76700 10 mon.node173@0(leader).pg v6576 check_down_pgs
2015-07-27 14:13:51.238489 7fdad6b76700 10 mon.node173@0(leader).pg v6576 v6576: 320 pgs: 141 active+undersized+degraded, 40 stale+active+remapped, 86 active+remapped, 53 stale+active+undersized+degraded; 0 bytes data, 137 GB used, 98663 MB / 234 GB avail
2015-07-27 14:13:51.238524 7fdad6b76700 10 mon.node173@0(leader).osd e194 e194: 6 osds: 2 up, 3 in
2015-07-27 14:13:51.238530 7fdad6b76700 20 mon.node173@0(leader).osd e194 osd.2 laggy halflife 3600 decay_k -0.000192541 down for 2.273915 decay 0.999562
2015-07-27 14:13:51.238620 7fdad6b76700 10 mon.node173@0(leader).osd e194  min_last_epoch_clean 147
2015-07-27 14:13:51.238623 7fdad6b76700 10 mon.node173@0(leader).log v65522 log
2015-07-27 14:13:51.238626 7fdad6b76700 10 mon.node173@0(leader).auth v95 auth
2015-07-27 14:13:51.238632 7fdad6b76700 20 mon.node173@0(leader) e1 sync_trim_providers
2015-07-27 14:13:51.288419 7fdad6b76700 10 mon.node173@0(leader).osd e194 encode_pending e 195
2015-07-27 14:13:51.288440 7fdad6b76700  2 mon.node173@0(leader).osd e194  osd.1 DOWN
2015-07-27 14:13:51.320572 7fdad6b76700  1 -- 10.118.202.173:6789/0 --> mon.1 10.118.202.181:6789/0 -- paxos(begin lc 72959 fc 0 pn 2100 opn 0) v3 -- ?+0 0x5736180
2015-07-27 14:13:51.320605 7fdad6b76700 10 mon.node173@0(leader).log v65522 encode_full log v 65522
2015-07-27 14:13:51.320633 7fdad4a71700 10 -- 10.118.202.173:6789/0 >> 10.118.202.181:6789/0 pipe(0x4b48b00 sd=13 :59276 s=2 pgs=79190 cs=1 l=0 c=0x4ae27e0).writer: state = open policy.server=0
2015-07-27 14:13:51.320664 7fdad6b76700 10 mon.node173@0(leader).log v65522 encode_pending v65523
2015-07-27 14:13:51.320723 7fdad4a71700 10 -- 10.118.202.173:6789/0 >> 10.118.202.181:6789/0 pipe(0x4b48b00 sd=13 :59276 s=2 pgs=79190 cs=1 l=0 c=0x4ae27e0).writer: state = open policy.server=0
2015-07-27 14:13:51.349439 7fdad416f700 10 -- 10.118.202.173:6789/0 >> 10.118.202.181:6789/0 pipe(0x4b48b00 sd=13 :59276 s=2 pgs=79190 cs=1 l=0 c=0x4ae27e0).reader got ack seq 8754 >= 8754 on 0x5736180 paxos(begin lc 72959 fc 0 pn 2100 opn 0) v3
2015-07-27 14:13:51.349470 7fdad416f700 10 -- 10.118.202.173:6789/0 >> 10.118.202.181:6789/0 pipe(0x4b48b00 sd=13 :59276 s=2 pgs=79190 cs=1 l=0 c=0x4ae27e0).reader wants 80 from dispatch throttler 0/104857600
2015-07-27 14:13:51.349484 7fdad416f700 10 -- 10.118.202.173:6789/0 >> 10.118.202.181:6789/0 pipe(0x4b48b00 sd=13 :59276 s=2 pgs=79190 cs=1 l=0 c=0x4ae27e0).aborted = 0
2015-07-27 14:13:51.349550 7fdad416f700 10 -- 10.118.202.173:6789/0 >> 10.118.202.181:6789/0 pipe(0x4b48b00 sd=13 :59276 s=2 pgs=79190 cs=1 l=0 c=0x4ae27e0).reader got message 608993566 0x5736180 paxos(accept lc 72959 fc 0 pn 2100 opn 0) v3
2015-07-27 14:13:51.349583 7fdad6375700  1 -- 10.118.202.173:6789/0 <== mon.1 10.118.202.181:6789/0 608993566 ==== paxos(accept lc 72959 fc 0 pn 2100 opn 0) v3 ==== 80+0+0 (4139454105 0 0) 0x5736180 con 0x4ae27e0
2015-07-27 14:13:51.349596 7fdad6375700 20 mon.node173@0(leader) e1 have connection
2015-07-27 14:13:51.349584 7fdad4a71700 10 -- 10.118.202.173:6789/0 >> 10.118.202.181:6789/0 pipe(0x4b48b00 sd=13 :59276 s=2 pgs=79190 cs=1 l=0 c=0x4ae27e0).writer: state = open policy.server=0
2015-07-27 14:13:51.349607 7fdad6375700 20 mon.node173@0(leader) e1 ms_dispatch existing session MonSession: mon.1 10.118.202.181:6789/0 is openallow * for mon.1 10.118.202.181:6789/0
2015-07-27 14:13:51.349613 7fdad6375700 20 mon.node173@0(leader) e1  caps allow *
2015-07-27 14:13:51.349615 7fdad6375700 20 is_capable service=mon command= read on cap allow *
2015-07-27 14:13:51.349611 7fdad4a71700 10 -- 10.118.202.173:6789/0 >> 10.118.202.181:6789/0 pipe(0x4b48b00 sd=13 :59276 s=2 pgs=79190 cs=1 l=0 c=0x4ae27e0).write_ack 608993566
2015-07-27 14:13:51.349617 7fdad6375700 20  allow so far , doing grant allow *
2015-07-27 14:13:51.349619 7fdad6375700 20  allow all
2015-07-27 14:13:51.349620 7fdad6375700 20 is_capable service=mon command= exec on cap allow *
2015-07-27 14:13:51.349622 7fdad6375700 20  allow so far , doing grant allow *
2015-07-27 14:13:51.349623 7fdad6375700 20  allow all
2015-07-27 14:13:51.349625 7fdad4a71700 10 -- 10.118.202.173:6789/0 >> 10.118.202.181:6789/0 pipe(0x4b48b00 sd=13 :59276 s=2 pgs=79190 cs=1 l=0 c=0x4ae27e0).writer: state = open policy.server=0
2015-07-27 14:13:51.349637 7fdad6375700 10 -- 10.118.202.173:6789/0 dispatch_throttle_release 80 to dispatch throttler 80/104857600
2015-07-27 14:13:51.370580 7fdad84f3700  1 -- 10.118.202.173:6789/0 --> mon.1 10.118.202.181:6789/0 -- paxos(commit lc 72960 fc 0 pn 2100 opn 0) v3 -- ?+0 0x4ee6180
2015-07-27 14:13:51.370616 7fdad84f3700 10 mon.node173@0(leader) e1 refresh_from_paxos
2015-07-27 14:13:51.370637 7fdad4a71700 10 -- 10.118.202.173:6789/0 >> 10.118.202.181:6789/0 pipe(0x4b48b00 sd=13 :59276 s=2 pgs=79190 cs=1 l=0 c=0x4ae27e0).writer: state = open policy.server=0
2015-07-27 14:13:51.370703 7fdad84f3700 15 mon.node173@0(leader).osd e194 update_from_paxos paxos e 195, my e 194
2015-07-27 14:13:51.370720 7fdad4a71700 10 -- 10.118.202.173:6789/0 >> 10.118.202.181:6789/0 pipe(0x4b48b00 sd=13 :59276 s=2 pgs=79190 cs=1 l=0 c=0x4ae27e0).writer: state = open policy.server=0
2015-07-27 14:13:51.370745 7fdad84f3700  7 mon.node173@0(leader).osd e194 update_from_paxos  applying incremental 195
2015-07-27 14:13:51.370829 7fdad84f3700  1 mon.node173@0(leader).osd e195 e195: 6 osds: 1 up, 3 in
2015-07-27 14:13:51.395524 7fdad84f3700 10 mon.node173@0(leader).osd e195  adding osd.1 to down_pending_out map
2015-07-27 14:13:51.395545 7fdad84f3700 10 mon.node173@0(leader).pg v6576 check_osd_map -- osdmap not readable, waiting
2015-07-27 14:13:51.395549 7fdad84f3700 10 mon.node173@0(leader).osd e195 check_subs
2015-07-27 14:13:51.395553 7fdad84f3700 10 mon.node173@0(leader).osd e195 committed, telling random osd.0 10.118.202.181:6800/6244 all about it
2015-07-27 14:13:51.395561 7fdad84f3700 10 mon.node173@0(leader).osd e195 build_incremental [194..195]

#7 Updated by huang jun over 8 years ago

Yes, you can set mon_osd_min_up_ratio=0.
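
For example (a sketch; the runtime change reverts when the mon restarts, and the mon id node173 is taken from the log above):

# apply at runtime, repeating for each mon in the cluster
ceph tell mon.node173 injectargs '--mon-osd-min-up-ratio 0'

# or persist it in the [mon] section of ceph.conf and restart the mons
mon osd min up ratio = 0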

#8 Updated by huanwen ren over 8 years ago

Oh, yes.
Thank you.

#9 Updated by Sage Weil almost 7 years ago

  • Status changed from New to Won't Fix

#10 Updated by Sage Weil almost 7 years ago

It may take up to 10 minutes for the mon to notice if all OSDs are down.

#11 Updated by Greg Farnum almost 7 years ago

The monitor has to time them out and has some limits on marking down whole trees now.
