Support #22956

Ceph cluster single node in error state

Added by Nitesh Sharma about 6 years ago. Updated about 6 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Tags:
Reviewed:
Affected Versions:
Pull request ID:

Description

I have deployed a Ceph cluster on a single node from the command line on RHOS 7.4, following this document:
https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/2/html-single/installation_guide_for_red_hat_enterprise_linux/index#osd_bootstrapping

I tried it three times, thinking I might be doing something wrong, but all steps appear to have been performed correctly. No errors in the logs.

In the audit log I see:
--------------------------------------

2018-02-08 00:38:56.741341 mon.0 192.168.4.148:6789/0 32 : audit [INF] from='client.? 192.168.4.148:0/2041677954' entity='client.admin' cmd=[{"prefix": "osd crush add", "args": ["host=cephserver"], "id": 0, "weight": 1.0}]: dispatch
2018-02-08 00:38:56.793342 mon.0 192.168.4.148:6789/0 33 : audit [INF] from='client.? 192.168.4.148:0/2041677954' entity='client.admin' cmd='[{"prefix": "osd crush add", "args": ["host=cephserver"], "id": 0, "weight": 1.0}]': finished
2018-02-08 00:40:04.283666 mon.0 192.168.4.148:6789/0 36 : audit [INF] from='client.? 192.168.4.148:0/4262743190' entity='osd.0' cmd=[{"prefix": "osd crush create-or-move", "args": ["host=cephserver", "root=default"], "id": 0, "weight": 0.488}]: dispatch
----------------------------------

Is this the error?

OSD logs:
------------------------------------------------------------------------------

2018-02-08 00:37:26.085824 7fd888611a40 -1 created new key in keyring /var/lib/ceph/osd/ceph-0/keyring
2018-02-08 00:40:04.321558 7fd17fc3ea40 0 set uid:gid to 167:167 (ceph:ceph)
2018-02-08 00:40:04.321581 7fd17fc3ea40 0 ceph version 10.2.7-48.el7cp (cf7751bcd460c757e596d3ee2991884e13c37b96), process ceph-osd, pid 11571
2018-02-08 00:40:04.322319 7fd17fc3ea40 0 pidfile_write: ignore empty --pid-file
2018-02-08 00:40:04.355459 7fd17fc3ea40 0 filestore(/var/lib/ceph/osd/ceph-0) backend xfs (magic 0x58465342)
2018-02-08 00:40:04.355783 7fd17fc3ea40 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2018-02-08 00:40:04.355797 7fd17fc3ea40 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
2018-02-08 00:40:04.355815 7fd17fc3ea40 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: splice is supported
2018-02-08 00:40:04.356162 7fd17fc3ea40 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2018-02-08 00:40:04.356206 7fd17fc3ea40 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_feature: extsize is disabled by conf
2018-02-08 00:40:04.356888 7fd17fc3ea40 1 leveldb: Recovering log #5
2018-02-08 00:40:04.356934 7fd17fc3ea40 1 leveldb: Level-0 table #7: started
2018-02-08 00:40:04.357472 7fd17fc3ea40 1 leveldb: Level-0 table #7: 146 bytes OK
2018-02-08 00:40:04.358583 7fd17fc3ea40 1 leveldb: Delete type=0 #5

2018-02-08 00:40:04.358625 7fd17fc3ea40 1 leveldb: Delete type=3 #4

2018-02-08 00:40:04.358743 7fd17fc3ea40 0 filestore(/var/lib/ceph/osd/ceph-0) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2018-02-08 00:40:04.358872 7fd17fc3ea40 1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
2018-02-08 00:40:04.358880 7fd17fc3ea40 1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 18: 5368709120 bytes, block size 4096 bytes, directio = 1, aio = 0
2018-02-08 00:40:04.377764 7fd17fc3ea40 1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 18: 5368709120 bytes, block size 4096 bytes, directio = 1, aio = 0
2018-02-08 00:40:04.378052 7fd17fc3ea40 1 filestore(/var/lib/ceph/osd/ceph-0) upgrade
2018-02-08 00:40:04.379084 7fd17fc3ea40 0 <cls> cls/cephfs/cls_cephfs.cc:202: loading cephfs_size_scan
2018-02-08 00:40:04.379246 7fd17fc3ea40 0 <cls> cls/hello/cls_hello.cc:305: loading cls_hello
2018-02-08 00:40:04.384628 7fd17fc3ea40 0 osd.0 0 crush map has features 2199057072128, adjusting msgr requires for clients
2018-02-08 00:40:04.384637 7fd17fc3ea40 0 osd.0 0 crush map has features 2199057072128 was 8705, adjusting msgr requires for mons
2018-02-08 00:40:04.384640 7fd17fc3ea40 0 osd.0 0 crush map has features 2199057072128, adjusting msgr requires for osds
2018-02-08 00:40:04.384674 7fd17fc3ea40 0 osd.0 0 load_pgs
2018-02-08 00:40:04.384690 7fd17fc3ea40 0 osd.0 0 load_pgs opened 0 pgs
2018-02-08 00:40:04.384700 7fd17fc3ea40 0 osd.0 0 using 0 op queue with priority op cut off at 64.
2018-02-08 00:40:04.385545 7fd17fc3ea40 -1 osd.0 0 log_to_monitors {default=true}
2018-02-08 00:40:04.388402 7fd17fc3ea40 0 osd.0 0 done with init, starting boot process
2018-02-08 00:40:05.340232 7fd17098b700 0 osd.0 6 crush map has features 2200130813952, adjusting msgr requires for clients
2018-02-08 00:40:05.340240 7fd17098b700 0 osd.0 6 crush map has features 2200130813952 was 2199057080833, adjusting msgr requires for mons
2018-02-08 00:40:05.340246 7fd17098b700 0 osd.0 6 crush map has features 2200130813952, adjusting msgr requires for osds
----------------------------------------------------------

mon logs:
-----------------------------------------------------------

2018-02-08 00:43:05.222962 7f5938acc700 1 mon.cephserver@0(leader).log v40 check_sub sending message to client.? 192.168.4.148:0/1548506271 with 0 entries (version 40)
2018-02-08 00:43:50.004946 7f5937aca700 0 mon.cephserver@0(leader).data_health(3) update_stats avail 98% total 93138 MB, used 1503 MB, avail 91635 MB
2018-02-08 00:44:50.005137 7f5937aca700 0 mon.cephserver@0(leader).data_health(3) update_stats avail 98% total 93138 MB, used 1503 MB, avail 91635 MB
2018-02-08 00:45:50.005293 7f5937aca700 0 mon.cephserver@0(leader).data_health(3) update_stats avail 98% total 93138 MB, used 1503 MB, avail 91635 MB
2018-02-08 00:46:50.005463 7f5937aca700 0 mon.cephserver@0(leader).data_health(3) update_stats avail 98% total 93138 MB, used 1503 MB, avail 91635 MB
2018-02-08 00:47:50.005661 7f5937aca700 0 mon.cephserver@0(leader).data_health(3) update_stats avail 98% total 93138 MB, used 1503 MB, avail 91635 MB
2018-02-08 00:48:50.005852 7f5937aca700 0 mon.cephserver@0(leader).data_health(3) update_stats avail 98% total 93138 MB, used 1503 MB, avail 91635 MB
2018-02-08 00:49:22.151908 7f59372c9700 0 mon.cephserver@0(leader) e1 handle_command mon_command({"prefix": "status"} v 0) v1
2018-02-08 00:49:22.151943 7f59372c9700 0 log_channel(audit) log [DBG] : from='client.? 192.168.4.148:0/2177931169' entity='client.admin' cmd=[{"prefix": "status"}]: dispatch
2018-02-08 00:49:22.153115 7f59372c9700 1 mon.cephserver@0(leader).log v40 check_sub sending message to client.? 192.168.4.148:0/2177931169 with 1 entries (version 40)
2018-02-08 00:49:22.204363 7f5938acc700 1 mon.cephserver@0(leader).log v41 check_sub sending message to client.? 192.168.4.148:0/2177931169 with 0 entries (version 41)
2018-02-08 00:49:50.006043 7f5937aca700 0 mon.cephserver@0(leader).data_health(3) update_stats avail 98% total 93138 MB, used 1503 MB, avail 91635 MB
2018-02-08 00:50:50.006261 7f5937aca700 0 mon.cephserver@0(leader).data_health(3) update_stats avail 98% total 93138 MB, used 1503 MB, avail 91635 MB
2018-02-08 00:51:50.006434 7f5937aca700 0 mon.cephserver@0(leader).data_health(3) update_stats avail 98% total 93138 MB, used 1503 MB, avail 91635 MB
2018-02-08 00:52:50.006600 7f5937aca700 0 mon.cephserver@0(leader).data_health(3) update_stats avail 98% total 93138 MB, used 1503 MB, avail 91635 MB
2018-02-08 00:53:50.006774 7f5937aca700 0 mon.cephserver@0(leader).data_health(3) update_stats avail 98% total 93138 MB, used 1503 MB, avail 91635 MB
2018-02-08 00:54:50.006956 7f5937aca700 0 mon.cephserver@0(leader).data_health(3) update_stats avail 98% total 93138 MB, used 1503 MB, avail 91635 MB
2018-02-08 00:55:50.007124 7f5937aca700 0 mon.cephserver@0(leader).data_health(3) update_stats avail 98% total 93138 MB, used 1503 MB, avail 91635 MB
2018-02-08 00:56:50.007264 7f5937aca700 0 mon.cephserver@0(leader).data_health(3) update_stats avail 98% total 93138 MB, used 1503 MB, avail 91635 MB
2018-02-08 00:57:50.007422 7f5937aca700 0 mon.cephserver@0(leader).data_health(3) update_stats avail 98% total 93138 MB, used 1503 MB, avail 91635 MB
2018-02-08 00:58:27.593376 7f59372c9700 0 mon.cephserver@0(leader) e1 handle_command mon_command({"prefix": "status"} v 0) v1
2018-02-08 00:58:27.593412 7f59372c9700 0 log_channel(audit) log [DBG] : from='client.? 192.168.4.148:0/1268865295' entity='client.admin' cmd=[{"prefix": "status"}]: dispatch
2018-02-08 00:58:27.594759 7f59372c9700 1 mon.cephserver@0(leader).log v41 check_sub sending message to client.? 192.168.4.148:0/1268865295 with 1 entries (version 41)
2018-02-08 00:58:27.646094 7f5938acc700 1 mon.cephserver@0(leader).log v42 check_sub sending message to client.? 192.168.4.148:0/1268865295 with 0 entries (version 42)
2018-02-08 00:58:50.007588 7f5937aca700 0 mon.cephserver@0(leader).data_health(3) update_stats avail 98% total 93138 MB, used 1503 MB, avail 91635 MB
2018-02-08 00:59:50.007755 7f5937aca700 0 mon.cephserver@0(leader).data_health(3) update_stats avail 98% total 93138 MB, used 1503 MB, avail 91635 MB
2018-02-08 01:00:50.007933 7f5937aca700 0 mon.cephserver@0(leader).data_health(3) update_stats avail 98% total 93138 MB, used 1503 MB, avail 91635 MB
2018-02-08 01:01:50.008137 7f5937aca700 0 mon.cephserver@0(leader).data_health(3) update_stats avail 98% total 93138 MB, used 1503 MB, avail 91635 MB
2018-02-08 01:02:50.008320 7f5937aca700 0 mon.cephserver@0(leader).data_health(3) update_stats avail 98% total 93138 MB, used 1503 MB, avail 91635 MB
2018-02-08 01:03:50.008494 7f5937aca700 0 mon.cephserver@0(leader).data_health(3) update_stats avail 98% total 93138 MB, used 1503 MB, avail 91635 MB
2018-02-08 01:04:50.008673 7f5937aca700 0 mon.cephserver@0(leader).data_health(3) update_stats avail 98% total 93138 MB, used 1503 MB, avail 91635 MB
2018-02-08 01:05:50.008854 7f5937aca700 0 mon.cephserver@0(leader).data_health(3) update_stats avail 98% total 93138 MB, used 1503 MB, avail 91635 MB
2018-02-08 01:06:50.009037 7f5937aca700 0 mon.cephserver@0(leader).data_health(3) update_stats avail 98% total 93138 MB, used 1503 MB, avail 91635 MB
2018-02-08 01:07:50.009217 7f5937aca700 0 mon.cephserver@0(leader).data_health(3) update_stats avail 98% total 93138 MB, used 1503 MB, avail 91635 MB
2018-02-08 01:08:50.009414 7f5937aca700 0 mon.cephserver@0(leader).data_health(3) update_stats avail 98% total 93138 MB, used 1503 MB, avail 91635 MB


ceph -s
---------------------------------

[root@cephserver ~]# ceph -s
cluster a7f64266-0894-4f1e-a635-d0aeaca0e993
health HEALTH_ERR
64 pgs are stuck inactive for more than 300 seconds
64 pgs degraded
64 pgs stuck degraded
64 pgs stuck inactive
64 pgs stuck unclean
64 pgs stuck undersized
64 pgs undersized
monmap e1: 1 mons at {cephserver=192.168.4.148:6789/0}
election epoch 3, quorum 0 cephserver
osdmap e7: 1 osds: 1 up, 1 in
flags sortbitwise,require_jewel_osds,recovery_deletes
pgmap v10: 64 pgs, 1 pools, 0 bytes data, 0 objects
5152 MB used, 494 GB / 499 GB avail
64 undersized+degraded+peered


[root@cephserver ~]# ceph osd tree
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 1.00000 root default
-2 1.00000 host cephserver
0 1.00000 osd.0 up 1.00000 1.00000

Any help is appreciated here.


Files

ceph.audit.log (4.09 KB) ceph.audit.log Nitesh Sharma, 02/08/2018 06:18 AM
ceph-mon.c.log (31.2 KB) ceph-mon.c.log Nitesh Sharma, 02/08/2018 06:18 AM
ceph-mon.cephserver.log (30.7 KB) ceph-mon.cephserver.log Nitesh Sharma, 02/08/2018 06:18 AM
ceph-osd.0.log (7.26 KB) ceph-osd.0.log Nitesh Sharma, 02/08/2018 06:18 AM
Actions #1

Updated by Nathan Cutler about 6 years ago

  • Tracker changed from Tasks to Support
  • Project changed from Stable releases to Ceph
Actions #2

Updated by Nitesh Sharma about 6 years ago

[global]
fsid = a7f64266-0894-4f1e-a635-d0aeaca0e993
mon initial members = cephserver
mon host = 192.168.4.148
public network = 192.168.0.0/16
#cluster network = 192.168.0.0/16
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
[osd]
journal aio = true
journal dio = true
journal block align = true
journal force aio = true
debug osd = 20
debug filestore = 20
debug ms = 1
osd journal size = 1024

[osd.0]
osd host = cephserver
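
With this configuration, the default pool replica count (3) and the default CRUSH failure domain (host) cannot be satisfied by a single host with one OSD, which is what leaves all 64 PGs undersized and degraded. A minimal sketch of the `[global]` additions commonly used for single-node test clusters (these take effect for pools created after the change; an existing pool has to be adjusted at runtime instead):

```ini
[global]
# Keep only one replica per object (no redundancy; test clusters only)
osd pool default size = 1
osd pool default min size = 1
# Choose replica locations at the OSD level instead of the host level,
# so a single host can satisfy placement
osd crush chooseleaf type = 0
```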

Actions #3

Updated by Patrick Donnelly about 6 years ago

  • Status changed from New to Closed

The issue is that you have insufficient OSDs for the redundancy requirement of your pool (probably replication 3). These types of issues should be raised on the mailing list (if you are using upstream) or via support channels if you are using RHCS.
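
For an already-created pool, the replica count can be lowered at runtime so the PGs can go active+clean on a single OSD. A sketch, assuming the default `rbd` pool that a Jewel-era install creates (no redundancy; suitable for test clusters only):

```shell
# Reduce the replica count of the existing pool to match the single OSD
ceph osd pool set rbd size 1
ceph osd pool set rbd min_size 1

# Verify that the PGs become active+clean
ceph -s
```

These commands require a running cluster with admin credentials, so they are shown for reference rather than as a self-contained runnable example.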
