Bug #13257
OSD fsid file has been tampered with
Description
We have ceph242 nodes; on one of them, osd.60 failed to start during the start-up process.
The log shows that ceph-disk hit a "Too many lines" error while reading the fsid file.
Looking at the fsid file on the disk backing osd.60, we found that log output had been written into it.
sdk1--->osd.60
/var/lib/ceph/tmp/mnt.jEleYY with options noatime,inode64,nouuid
Sep 22 21:31:40 ceph2 kernel: [ 232.441741] XFS (sdk1): Mounting Filesystem
Sep 22 21:31:40 ceph2 kernel: [ 232.552147] XFS (sdk1): Starting recovery (logdev: internal)
Sep 22 21:31:40 ceph2 kernel: [ 232.587603] XFS (sdk1): Ending recovery (logdev: internal)
Sep 22 21:31:40 ceph2 ceph[2741]: libust[8328/8328]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:375)
Sep 22 21:31:40 ceph2 ceph[2741]: ERROR:ceph-disk:Failed to activate
Sep 22 21:31:40 ceph2 ceph[2741]: WARNING:ceph-disk:Unmounting /var/lib/ceph/tmp/mnt.jEleYY
Sep 22 21:31:40 ceph2 ceph[2741]: ceph-disk: Error: File is corrupt: /var/lib/ceph/tmp/mnt.jEleYY/fsid: Too many lines: 2f259a2c-c5de-400e-95a6-e13d0288ac3e
Sep 22 21:31:40 ceph2 ceph[2741]: SG_IO: bad/missing sense data, sb[]: 70 00 05 00 00 00 00 18 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Sep 22 21:31:40 ceph2 ceph[2741]: SG_IO: bad/missing sense data, sb[]: 70 00 05 00 00 00 00 18 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Sep 22 21:31:40 ceph2 ceph[2741]: 2015-09-22 17:04:11.656248 7fe41ce3d900 -1 osd.60 448119 set_disk_tp_priority(22) Invalid argument: osd_disk_thread_ioprio_class is but only the following values are allowed: idle, be or rt
fsid's info
[root@ceph2 /media]# pwd
/media
[root@ceph2 /media]# cat whoami
60
[root@ceph2 /media]# head fsid
2f259a2c-c5de-400e-95a6-e13d0288ac3e
SG_IO: bad/missing sense data, sb[]: 70 00 05 00 00 00 00 18 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
SG_IO: bad/missing sense data, sb[]: 70 00 05 00 00 00 00 18 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2015-09-22 17:04:11.656248 7fe41ce3d900 -1 osd.60 448119 set_disk_tp_priority(22) Invalid argument: osd_disk_thread_ioprio_class is but only the following values are allowed: idle, be or rt
2015-09-22 17:51:45.783415 7fe413b40700 -1 osd.60 449442 heartbeat_check: no reply from osd.34 since back 2015-09-22 17:51:25.590335 front 2015-09-22 17:51:25.590335 (cutoff 2015-09-22 17:51:25.783409)
2015-09-22 17:51:46.783569 7fe413b40700 -1 osd.60 449442 heartbeat_check: no reply from osd.34 since back 2015-09-22 17:51:25.590335 front 2015-09-22 17:51:25.590335 (cutoff 2015-09-22 17:51:26.783563)
2015-09-22 17:52:02.035795 7fe413b40700 -1 osd.60 449456 heartbeat_check: no reply from osd.37 since back 2015-09-22 17:51:41.998089 front 2015-09-22 17:51:41.998089 (cutoff 2015-09-22 17:51:42.035793)
2015-09-22 17:52:03.036250 7fe413b40700 -1 osd.60 449457 heartbeat_check: no reply from osd.37 since back 2015-09-22 17:51:41.998089 front 2015-09-22 17:51:41.998089 (cutoff 2015-09-22 17:51:43.036248)
2015-09-22 17:52:04.036404 7fe413b40700 -1 osd.60 449458 heartbeat_check: no reply from osd.37 since back 2015-09-22 17:51:41.998089 front 2015-09-22 17:51:41.998089 (cutoff 2015-09-22 17:51:44.036399)
2015-09-22 17:52:05.036847 7fe413b40700 -1 osd.60 449459 heartbeat_check: no reply from osd.37 since back 2015-09-22 17:51:41.998089 front 2015-09-22 17:51:41.998089 (cutoff 2015-09-22 17:51:45.036845)
[root@ceph2 /media]# 2015-09-23 17:01 -rw-r--r-- 1 root root 3 8? 11 11:35 whoami
[root@ceph2 /media]# tail -f fsid
2015-09-22 20:02:01.640506 7fe3f885b700 -1 osd.60 450045 heartbeat_check: no reply from osd.33 since back 2015-09-22 20:01:41.322865 front 2015-09-22 20:01:41.322865 (cutoff 2015-09-22 20:01:41.640504)
2015-09-22 20:02:01.640562 7fe3f885b700 -1 osd.60 450045 heartbeat_check: no reply from osd.37 since back 2015-09-22 20:01:41.322865 front 2015-09-22 20:01:41.322865 (cutoff 2015-09-22 20:01:41.640504)
2015-09-22 20:02:01.729442 7fe413b40700 -1 osd.60 450045 heartbeat_check: no reply from osd.33 since back 2015-09-22 20:01:41.322865 front 2015-09-22 20:01:41.322865 (cutoff 2015-09-22 20:01:41.729440)
2015-09-22 20:02:01.729458 7fe413b40700 -1 osd.60 450045 heartbeat_check: no reply from osd.37 since back 2015-09-22 20:01:41.322865 front 2015-09-22 20:01:41.322865 (cutoff 2015-09-22 20:01:41.729440)
2015-09-22 20:02:02.729596 7fe413b40700 -1 osd.60 450045 heartbeat_check: no reply from osd.33 since back 2015-09-22 20:01:41.322865 front 2015-09-22 20:01:41.322865 (cutoff 2015-09-22 20:01:42.729591)
2015-09-22 20:02:02.729656 7fe413b40700 -1 osd.60 450045 heartbeat_check: no reply from osd.37 since back 2015-09-22 20:01:41.322865 front 2015-09-22 20:01:41.322865 (cutoff 2015-09-22 20:01:42.729591)
2015-09-22 20:02:03.342778 7fe3f885b700 -1 osd.60 450045 heartbeat_check: no reply from osd.33 since back 2015-09-22 20:01:41.322865 front 2015-09-22 20:01:41.322865 (cutoff 2015-09-22 20:01:43.342776)
2015-09-22 20:02:03.342788 7fe3f885b700 -1 osd.60 450045 heartbeat_check: no reply from osd.37 since back 2015-09-22 20:01:41.322865 front 2015-09-22 20:01:41.322865 (cutoff 2015-09-22 20:01:43.342776)
2015-09-22 20:02:03.729905 7fe413b40700 -1 osd.60 450045 heartbeat_check: no reply from osd.33 since back 2015-09-22 20:01:41.322865 front 2015-09-22 20:01:41.322865 (cutoff 2015-09-22 20:01:43.729902)
2015-09-22 20:02:03.729930 7fe413b40700 -1 osd.60 450045 heartbeat_check: no reply from osd.37 since back 2015-09-22 20:01:41.322865 front 2015-09-22 20:01:41.322865 (cutoff 2015-09-22 20:01:43.729902)
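The "Too many lines" error in the log is what one would expect from a reader that requires the fsid file to hold exactly one line. The following is a minimal Python sketch of such a check, written for illustration only and not taken from ceph-disk; `read_one_line` and `TooManyLinesError` are hypothetical names.

```python
class TooManyLinesError(Exception):
    """Raised when a file expected to hold one line holds more."""


def read_one_line(path):
    """Return the single line stored in path, without its newline.

    Hypothetical helper, not ceph-disk's code: it raises
    TooManyLinesError if anything follows the first line, and puts the
    first line in the message, as the error quoted in this report does.
    """
    with open(path, "r") as f:
        first = f.readline()
        rest = f.read()
    if rest:
        raise TooManyLinesError("Too many lines: " + first.rstrip("\n"))
    return first.rstrip("\n")
```

With an fsid file like the one shown by `head fsid` above, a check of this kind would reject the file even though its first line is still a well-formed UUID.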
Updated by huanwen ren over 8 years ago
huanwen ren wrote:
We have ceph242 nodes; on one of them, osd.60 failed to start during the start-up process.
The log shows that ceph-disk hit a "Too many lines" error while reading the fsid file.
Looking at the fsid file on the disk backing osd.60, we found that log output had been written into it.
sdk1--->osd.60
[...]
fsid's info
[...]
Updated by Sage Weil over 8 years ago
- Status changed from New to Rejected
Sep 22 21:31:40 ceph2 ceph[2741]: SG_IO: bad/missing sense data, sb[]: 70 00 05 00 00 00 00 18 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Sep 22 21:31:40 ceph2 ceph[2741]: SG_IO: bad/missing sense data, sb[]: 70 00 05 00 00 00 00 18 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
that's a bad disk.
Updated by huanwen ren over 8 years ago
But the disk error does not affect the OSD itself.
After I removed the stray log output from the fsid file, osd.60 booted normally.
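A cleanup like the one described here could be sketched as follows. This is only an illustration under the assumption that the first line of the file is still the valid fsid, as it was in the `head fsid` output; `keep_first_line` is a hypothetical helper, not part of Ceph.

```python
import shutil
import uuid


def keep_first_line(path):
    """Rewrite path so it holds only its first line, which must be a UUID.

    Hypothetical helper. A copy of the original file is saved as
    path + ".bak" before anything is overwritten.
    """
    with open(path, "r") as f:
        first = f.readline().strip()
    uuid.UUID(first)  # raises ValueError if the first line is not a UUID
    shutil.copy2(path, path + ".bak")
    with open(path, "w") as f:
        f.write(first + "\n")
    return first
```

The backup copy keeps the stray log lines available for later inspection, since they may help explain how the output ended up in the file.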