
Bug #46137

Monitor leader is marking multiple OSDs down

Added by Prayank Saxena over 3 years ago. Updated almost 3 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
ceph-disk
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

My Ceph cluster consists of 5 MONs and 58 data nodes with 1302 OSDs in total (HDDs), running 12.2.8 Luminous (stable) with Filestore.

We had a large store.db on all 5 MON nodes: 18 GiB each, above the warning threshold of 15 GiB.
On Friday we increased mon_data_size_warn from 15 GiB to 30 GiB and restarted the MON service in a rolling fashion. Since the restart, the store.db size has dropped to around 2 GiB each, but we are now seeing the following entries in /var/log/ceph/ceph.log on the monitor node:

2020-06-22 11:59:01.538165 mon.nvmbd1cgy130d00 mon.0 10.137.78.218:6789/0 332447 : cluster [INF] osd.385 marked down after no beacon for 300.784393 seconds
2020-06-22 11:59:01.538222 mon.nvmbd1cgy130d00 mon.0 10.137.78.218:6789/0 332448 : cluster [INF] osd.405 marked down after no beacon for 300.959673 seconds
2020-06-22 11:59:01.538288 mon.nvmbd1cgy130d00 mon.0 10.137.78.218:6789/0 332449 : cluster [INF] osd.510 marked down after no beacon for 300.717532 seconds
2020-06-22 11:59:01.538337 mon.nvmbd1cgy130d00 mon.0 10.137.78.218:6789/0 332450 : cluster [INF] osd.524 marked down after no beacon for 300.784415 seconds
2020-06-22 11:59:01.538385 mon.nvmbd1cgy130d00 mon.0 10.137.78.218:6789/0 332451 : cluster [INF] osd.688 marked down after no beacon for 300.738115 seconds
2020-06-22 11:59:01.538411 mon.nvmbd1cgy130d00 mon.0 10.137.78.218:6789/0 332452 : cluster [INF] osd.715 marked down after no beacon for 300.682862 seconds
2020-06-22 11:59:01.538449 mon.nvmbd1cgy130d00 mon.0 10.137.78.218:6789/0 332453 : cluster [INF] osd.868 marked down after no beacon for 300.771166 seconds
2020-06-22 11:59:01.538506 mon.nvmbd1cgy130d00 mon.0 10.137.78.218:6789/0 332454 : cluster [INF] osd.1091 marked down after no beacon for 300.945268 seconds
2020-06-22 11:59:01.538526 mon.nvmbd1cgy130d00 mon.0 10.137.78.218:6789/0 332455 : cluster [INF] osd.1096 marked down after no beacon for 300.933589 seconds
2020-06-22 11:59:01.538565 mon.nvmbd1cgy130d00 mon.0 10.137.78.218:6789/0 332456 : cluster [INF] osd.1195 marked down after no beacon for 300.763656 seconds
2020-06-22 11:59:01.538593 mon.nvmbd1cgy130d00 mon.0 10.137.78.218:6789/0 332457 : cluster [INF] osd.1252 marked down after no beacon for 300.759198 seconds
2020-06-22 11:59:01.538619 mon.nvmbd1cgy130d00 mon.0 10.137.78.218:6789/0 332458 : cluster [INF] osd.1284 marked down after no beacon for 300.774742 seconds
2020-06-22 11:59:01.538648 mon.nvmbd1cgy130d00 mon.0 10.137.78.218:6789/0 332459 : cluster [INF] osd.1328 marked down after no beacon for 300.784440 seconds
2020-06-22 11:59:01.538673 mon.nvmbd1cgy130d00 mon.0 10.137.78.218:6789/0 332460 : cluster [INF] osd.1355 marked down after no beacon for 300.776401 seconds
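
A quick way to see which OSDs the leader has marked down for missing beacons is to filter the cluster log quoted above. A sketch (the log path defaults to the one from this report; `CEPH_LOG` is an assumed override for convenience):

```shell
# List the distinct OSDs marked down for missing beacons.
# CEPH_LOG defaults to the cluster log path quoted in this report.
log="${CEPH_LOG:-/var/log/ceph/ceph.log}"
if [ -r "$log" ]; then
  grep 'marked down after no beacon' "$log" \
    | awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^osd\.[0-9]+$/) print $i }' \
    | sort -u
else
  echo "log not readable: $log" >&2
fi
```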

Logs from the leader (10.137.78.218):
Jun 21 03:36:05 nvmbd1cgy130d00 ceph-mon: 2020-06-21 03:36:05.831788 7f0760c3d700 -1 mon.nvmbd1cgy130d00@0(leader).osd e1091247 no beacon from osd.182 since 2020-06-21 03:31:04.899937, 300.931732 seconds ago. marking down
Jun 21 03:36:05 nvmbd1cgy130d00 ceph-mon: 2020-06-21 03:36:05.831809 7f0760c3d700 -1 mon.nvmbd1cgy130d00@0(leader).osd e1091247 no beacon from osd.188 since 2020-06-21 03:31:04.809882, 301.021786 seconds ago. marking down
Jun 21 03:36:05 nvmbd1cgy130d00 ceph-mon: 2020-06-21 03:36:05.831833 7f0760c3d700 -1 mon.nvmbd1cgy130d00@0(leader).osd e1091247 no beacon from osd.206 since 2020-06-21 03:31:04.863162, 300.968507 seconds ago. marking down
Jun 21 03:36:05 nvmbd1cgy130d00 ceph-mon: 2020-06-21 03:36:05.831866 7f0760c3d700 -1 mon.nvmbd1cgy130d00@0(leader).osd e1091247 no beacon from osd.282 since 2020-06-21 03:31:04.825000, 301.006669 seconds ago. marking down
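
On the store.db growth mentioned above: the store size can be checked, and the store compacted without a restart, on each mon node. A sketch assuming the default mon data directory layout and cluster name "ceph" (adjust the mon id and path for this deployment):

```shell
# Report the monitor store size on this node, if present.
mon_store="/var/lib/ceph/mon/ceph-$(hostname -s)/store.db"
if [ -d "$mon_store" ]; then
  du -sh "$mon_store"
else
  echo "no mon store at $mon_store"
fi
# Compact a running monitor's store without restarting it:
#   ceph tell mon.nvmbd1cgy130d00 compact
# (Or set "mon compact on start = true" in [mon] to compact at each start;
# this conf currently has it set to false.)
```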

Below is the ceph.conf from a MON node:
# Please do not change this file directly since it is managed by Ansible and will be overwritten
[global]
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
# debug_lockdep = 0/0
# debug_context = 0/0
# debug_crush = 0/0
# debug_buffer = 0/0
# debug_timer = 0/0
# debug_filer = 0/0
# debug_objecter = 0/0
# debug_rados = 0/0
# debug_rbd = 0/0
# debug_journaler = 0/0
# debug_objectcatcher = 0/0
# debug_client = 0/0
# debug_osd = 0/0
# debug_optracker = 0/0
# debug_objclass = 0/0
# debug_filestore = 0/0
# debug_journal = 0/0
# debug_ms = 0/5
# debug_monc = 0/0
# debug_tp = 0/0
# debug_auth = 1/5
# debug_finisher = 0/0
# debug_heartbeatmap = 1/5
# debug_perfcounter = 0/0
# debug_asok = 0/0
# debug_throttle = 0/0
# debug_mon = 1/5
# debug_paxos = 1/5
# debug_rgw = 0/0
# debug_none = 0/0
# debug_mds = 0/0
# debug_mds_balancer = 0/0
# debug_mds_locker = 0/0
# debug_mds_log = 0/0
# debug_mds_log_expire = 0/0
# debug_mds_migrator = 0/0
# debug_striper = 0/1
# debug_rbd_replay = 0/0
# debug_objectcacher = 0/0
# debug_keyvaluestore = 0/0
# debug_crypto = 0/0
# debug_civetweb = 0/0
# debug_javaclient = 0/0
# debug_xio = 0/0
log_file = /home/ceph/logs/$cluster-$name.log
mon_initial_members = NVMBD1CGK190D00,nvmbd1cgy050d00,nvmbd1cgy070d00,nvmbd1cgy090d00,nvmbd1cgy130d00
mon addr = 10.137.81.13:6789/0,10.137.78.226:6789/0,10.137.78.232:6789/0,10.137.78.228:6789/0,10.137.78.218:6789/0
cephx require signatures = False # Kernel RBD does NOT support signatures!
cephx cluster require signatures = True
cephx service require signatures = False
fsid = 1a26e029-3734-4b0e-b86e-ca2778d0c990
max open files = 131072
osd pool default pg num = 128
osd pool default pgp num = 128
osd pool default size = 3
osd pool default min size = 1
osd pool default crush rule = 0
# Disable in-memory logs

[client]
rbd cache = true
rbd cache writethrough until flush = true
rbd concurrent management ops = 20
admin socket = /var/run/ceph/rbd-clients/$cluster-$type.$id.$pid.$cctid.asok # must be writable by QEMU and allowed by SELinux or AppArmor
log file = /var/log/rbd-clients/qemu-guest-$pid.log # must be writable by QEMU and allowed by SELinux or AppArmor
rbd default map options = rw
rbd default features = 3 # sum features digits
rbd default format = 2

[mon]
mon data size warn = 32212254720
mon warn on legacy crush tunables = false
mon osd down out interval = 600
mon osd min down reporters = 24
mon clock drift allowed = 0.15
mon clock drift warn backoff = 30
mon osd full ratio = 0.95
mon osd nearfull ratio = 0.9
mon osd report timeout = 300
mon pg warn max per osd = 0
mon osd allow primary affinity = true
mon pg warn max object skew = 10
mon osd adjust heartbeat grace = false
mon osd adjust down out interval = false
mon compact on start = false
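
The ~300-second "no beacon" intervals in the log match `mon osd report timeout = 300` above. In Luminous, OSDs send a beacon to the mon roughly every `osd_beacon_report_interval` seconds, which defaults to 300 and is not overridden in this conf, so a timeout equal to the beacon interval leaves essentially no slack before the leader marks an OSD down. A hedged sketch of a safer setting (900 is the Luminous shipped default; verify against the docs for your exact version before applying):

```ini
[mon]
# Sketch, not a verified fix: keep the report timeout several beacon
# intervals long so a single delayed beacon cannot mark an OSD down.
mon osd report timeout = 900
```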

[mon.nvmbd1cgy050d00]
host = nvmbd1cgy050d00
mon addr = 10.137.78.226
[mon.nvmbd1cgy070d00]
host = nvmbd1cgy070d00
mon addr = 10.137.78.232
[mon.nvmbd1cgy090d00]
host = nvmbd1cgy090d00
mon addr = 10.137.78.228
[mon.nvmbd1cgy130d00]
host = nvmbd1cgy130d00
mon addr = 10.137.78.218
[mon.NVMBD1CGK190D00]
host = NVMBD1CGK190D00
mon addr = 10.137.81.13

[osd]
osd mkfs type = xfs
osd mkfs options xfs = -f -i size=2048
osd mount options xfs = noatime,largeio,inode64,swalloc
osd journal size = 10240
cluster_network = 192.168.228.0/26,192.168.211.0/24
public_network = 10.137.78.0/23,10.137.81.0/24
osd mon heartbeat interval = 120 # Performance tuning
filestore merge threshold = 40
filestore split multiple = 8
osd enable_op_tracker = true
osd max scrubs = 1
osd scrub thread suicide timeout = 300
filestore op thread suicide timeout = 300
osd op thread suicide timeout = 300
keyvaluestore op thread suicide timeout = 300
osd op threads = 16
osd disk threads = 8 # Recovery tuning
osd recovery max active = 5
osd max backfills = 2
osd backfill full ratio = 0.87
osd recovery op priority = 2
osd recovery max chunk = 1048576
osd recovery threads = 1
osd objectstore = filestore
osd crush update on start = true # Deep scrub impact
osd scrub sleep = 0.1
osd disk thread ioprio class = idle
osd disk thread ioprio priority = 0
osd scrub chunk max = 5
osd deep scrub stride = 1048576
osd heartbeat grace = 120

History

#1 Updated by Prayank Saxena over 3 years ago

Every few minutes, multiple OSDs go down and come back up, which triggers data recovery. This repeats every few minutes in a loop. Can anyone help me understand why I am facing this issue, and is there a solution?
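
As a stop-gap while the root cause is investigated, operators sometimes freeze the mon's down/out handling to break this kind of recovery loop. A sketch, not a fix: the commands are echoed here as a dry run, and the flags must be unset once the cluster is stable, since they suppress legitimate failure handling too:

```shell
# Temporary cluster flags to stop the mark-down/mark-out churn.
# Echo makes this a dry run; remove it to execute on a live cluster.
for flag in nodown noout; do
  echo "ceph osd set $flag"
done
# ...investigate beacons / network / mon load, then revert:
for flag in nodown noout; do
  echo "ceph osd unset $flag"
done
```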

#2 Updated by Greg Farnum almost 3 years ago

  • Project changed from Ceph to RADOS
  • Category deleted (Monitor)
