Bug #13257

closed

Osd fsid file has been tampered with

Added by huanwen ren over 8 years ago. Updated over 8 years ago.

Status:
Rejected
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We have a ceph242 node on which osd.60 did not start successfully during node start-up.
The log shows that ceph-disk hit a "Too many lines" error while reading the fsid file.
Looking at the fsid file on the disk backing osd.60, we found that log output had been written into it.

sdk1--->osd.60

/var/lib/ceph/tmp/mnt.jEleYY with options noatime,inode64,nouuid
Sep 22 21:31:40 ceph2 kernel: [  232.441741] XFS (sdk1): Mounting Filesystem
Sep 22 21:31:40 ceph2 kernel: [  232.552147] XFS (sdk1): Starting recovery (logdev: internal)
Sep 22 21:31:40 ceph2 kernel: [  232.587603] XFS (sdk1): Ending recovery (logdev: internal)
Sep 22 21:31:40 ceph2 ceph[2741]: libust[8328/8328]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:375)
Sep 22 21:31:40 ceph2 ceph[2741]: ERROR:ceph-disk:Failed to activate
Sep 22 21:31:40 ceph2 ceph[2741]: WARNING:ceph-disk:Unmounting /var/lib/ceph/tmp/mnt.jEleYY
Sep 22 21:31:40 ceph2 ceph[2741]: ceph-disk: Error: File is corrupt: /var/lib/ceph/tmp/mnt.jEleYY/fsid: Too many lines: 2f259a2c-c5de-400e-95a6-e13d0288ac3e
Sep 22 21:31:40 ceph2 ceph[2741]: SG_IO: bad/missing sense data, sb[]:  70 00 05 00 00 00 00 18 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Sep 22 21:31:40 ceph2 ceph[2741]: SG_IO: bad/missing sense data, sb[]:  70 00 05 00 00 00 00 18 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Sep 22 21:31:40 ceph2 ceph[2741]: 2015-09-22 17:04:11.656248 7fe41ce3d900 -1 osd.60 448119 set_disk_tp_priority(22) Invalid argument: osd_disk_thread_ioprio_class is  but only the following values are allowed: idle, be or rt
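The "Too many lines" failure above is what a strict one-line file reader produces when extra text follows the first line. Below is a minimal sketch of such a check. It is not the actual ceph-disk code; the function and exception names are hypothetical, and the error text is only modeled on the log message above.

```python
import os


class TruncatedLineError(Exception):
    """The file does not end with a newline."""


class TooManyLinesError(Exception):
    """The file contains more than one line."""


def read_one_line(parent, name):
    """Return the single line stored in parent/name, without its newline.

    Returns None if the file does not exist. Raises if the file is not
    exactly one newline-terminated line. (Hypothetical helper.)
    """
    path = os.path.join(parent, name)
    try:
        with open(path, "rb") as f:
            data = f.read().decode("utf-8", errors="replace")
    except FileNotFoundError:
        return None

    if not data.endswith("\n"):
        raise TruncatedLineError("File is corrupt: {}: Missing newline".format(path))

    line = data[:-1]
    if "\n" in line:
        # Report only the first line; the rest is the unexpected content.
        first = line.split("\n", 1)[0]
        raise TooManyLinesError(
            "File is corrupt: {}: Too many lines: {}".format(path, first)
        )
    return line
```

With a file like the fsid shown in this report (a UUID followed by log lines), a reader of this shape rejects the file even though its first line is still a valid fsid.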

Contents of the fsid file

[root@ceph2 /media]# pwd
/media
[root@ceph2 /media]# cat whoami 
60
[root@ceph2 /media]# head fsid
2f259a2c-c5de-400e-95a6-e13d0288ac3e
SG_IO: bad/missing sense data, sb[]:  70 00 05 00 00 00 00 18 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
SG_IO: bad/missing sense data, sb[]:  70 00 05 00 00 00 00 18 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2015-09-22 17:04:11.656248 7fe41ce3d900 -1 osd.60 448119 set_disk_tp_priority(22) Invalid argument: osd_disk_thread_ioprio_class is  but only the following values are allowed: idle, be or rt
2015-09-22 17:51:45.783415 7fe413b40700 -1 osd.60 449442 heartbeat_check: no reply from osd.34 since back 2015-09-22 17:51:25.590335 front 2015-09-22 17:51:25.590335 (cutoff 2015-09-22 17:51:25.783409)
2015-09-22 17:51:46.783569 7fe413b40700 -1 osd.60 449442 heartbeat_check: no reply from osd.34 since back 2015-09-22 17:51:25.590335 front 2015-09-22 17:51:25.590335 (cutoff 2015-09-22 17:51:26.783563)
2015-09-22 17:52:02.035795 7fe413b40700 -1 osd.60 449456 heartbeat_check: no reply from osd.37 since back 2015-09-22 17:51:41.998089 front 2015-09-22 17:51:41.998089 (cutoff 2015-09-22 17:51:42.035793)
2015-09-22 17:52:03.036250 7fe413b40700 -1 osd.60 449457 heartbeat_check: no reply from osd.37 since back 2015-09-22 17:51:41.998089 front 2015-09-22 17:51:41.998089 (cutoff 2015-09-22 17:51:43.036248)
2015-09-22 17:52:04.036404 7fe413b40700 -1 osd.60 449458 heartbeat_check: no reply from osd.37 since back 2015-09-22 17:51:41.998089 front 2015-09-22 17:51:41.998089 (cutoff 2015-09-22 17:51:44.036399)
2015-09-22 17:52:05.036847 7fe413b40700 -1 osd.60 449459 heartbeat_check: no reply from osd.37 since back 2015-09-22 17:51:41.998089 front 2015-09-22 17:51:41.998089 (cutoff 2015-09-22 17:51:45.036845)
[root@ceph2 /media]# 

2015-09-23 17:01

-rw-r--r--   1 root root      3 Aug 11 11:35 whoami
[root@ceph2 /media]# tail -f fsid
2015-09-22 20:02:01.640506 7fe3f885b700 -1 osd.60 450045 heartbeat_check: no reply from osd.33 since back 2015-09-22 20:01:41.322865 front 2015-09-22 20:01:41.322865 (cutoff 2015-09-22 20:01:41.640504)
2015-09-22 20:02:01.640562 7fe3f885b700 -1 osd.60 450045 heartbeat_check: no reply from osd.37 since back 2015-09-22 20:01:41.322865 front 2015-09-22 20:01:41.322865 (cutoff 2015-09-22 20:01:41.640504)
2015-09-22 20:02:01.729442 7fe413b40700 -1 osd.60 450045 heartbeat_check: no reply from osd.33 since back 2015-09-22 20:01:41.322865 front 2015-09-22 20:01:41.322865 (cutoff 2015-09-22 20:01:41.729440)
2015-09-22 20:02:01.729458 7fe413b40700 -1 osd.60 450045 heartbeat_check: no reply from osd.37 since back 2015-09-22 20:01:41.322865 front 2015-09-22 20:01:41.322865 (cutoff 2015-09-22 20:01:41.729440)
2015-09-22 20:02:02.729596 7fe413b40700 -1 osd.60 450045 heartbeat_check: no reply from osd.33 since back 2015-09-22 20:01:41.322865 front 2015-09-22 20:01:41.322865 (cutoff 2015-09-22 20:01:42.729591)
2015-09-22 20:02:02.729656 7fe413b40700 -1 osd.60 450045 heartbeat_check: no reply from osd.37 since back 2015-09-22 20:01:41.322865 front 2015-09-22 20:01:41.322865 (cutoff 2015-09-22 20:01:42.729591)
2015-09-22 20:02:03.342778 7fe3f885b700 -1 osd.60 450045 heartbeat_check: no reply from osd.33 since back 2015-09-22 20:01:41.322865 front 2015-09-22 20:01:41.322865 (cutoff 2015-09-22 20:01:43.342776)
2015-09-22 20:02:03.342788 7fe3f885b700 -1 osd.60 450045 heartbeat_check: no reply from osd.37 since back 2015-09-22 20:01:41.322865 front 2015-09-22 20:01:41.322865 (cutoff 2015-09-22 20:01:43.342776)
2015-09-22 20:02:03.729905 7fe413b40700 -1 osd.60 450045 heartbeat_check: no reply from osd.33 since back 2015-09-22 20:01:41.322865 front 2015-09-22 20:01:41.322865 (cutoff 2015-09-22 20:01:43.729902)
2015-09-22 20:02:03.729930 7fe413b40700 -1 osd.60 450045 heartbeat_check: no reply from osd.37 since back 2015-09-22 20:01:41.322865 front 2015-09-22 20:01:41.322865 (cutoff 2015-09-22 20:01:43.729902)

Actions #1

Updated by huanwen ren over 8 years ago

huanwen ren wrote:

We have a ceph242 node on which osd.60 did not start successfully during node start-up.

The log shows that ceph-disk hit a "Too many lines" error while reading the fsid file.
Looking at the fsid file on the disk backing osd.60, we found that log output had been written into it.

sdk1--->osd.60
[...]

Contents of the fsid file
[...]

Actions #2

Updated by huanwen ren over 8 years ago

Correction: osd.60 did not start successfully.

Actions #3

Updated by Sage Weil over 8 years ago

  • Status changed from New to Rejected

Sep 22 21:31:40 ceph2 ceph[2741]: SG_IO: bad/missing sense data, sb[]:  70 00 05 00 00 00 00 18 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Sep 22 21:31:40 ceph2 ceph[2741]: SG_IO: bad/missing sense data, sb[]:  70 00 05 00 00 00 00 18 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

that's a bad disk.
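For anyone who wants to read the sense buffer quoted above, here is a minimal sketch that pulls the standard fields out of fixed-format SCSI sense data (response code in byte 0, sense key in the low nibble of byte 2, ASC and ASCQ in bytes 12 and 13). The function name and the small lookup table are illustrative assumptions. The sketch only decodes the bytes and draws no conclusion about the health of the disk.

```python
# Partial table of SCSI sense keys, for illustration only.
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
}


def decode_fixed_sense(hex_dump):
    """Decode fixed-format sense data given as space-separated hex bytes.

    (Hypothetical helper; descriptor-format sense data is not handled.)
    """
    sb = bytes.fromhex(hex_dump)  # spaces between byte pairs are skipped
    if len(sb) < 14:
        raise ValueError("need at least 14 bytes of sense data")
    response_code = sb[0] & 0x7F
    if response_code not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    sense_key = sb[2] & 0x0F
    return {
        "response_code": response_code,
        "sense_key": sense_key,
        "sense_key_name": SENSE_KEYS.get(sense_key, "UNKNOWN"),
        "asc": sb[12],
        "ascq": sb[13],
    }


logged = "70 00 05 00 00 00 00 18 00 00 00 00 20 00 00 00"
fields = decode_fixed_sense(logged)
print(fields)
```

For the buffer in this report the fields come out as sense key 0x5 (ILLEGAL REQUEST) with ASC 0x20 and ASCQ 0x00.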

Actions #4

Updated by huanwen ren over 8 years ago

But the disk error does not affect the OSD.
After I removed the stray log output from the fsid file, osd.60 booted normally.
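The manual clean-up described above can be sketched as a small script: keep the first line only if it is a well-formed UUID, save a backup, and rewrite the file as a single line. This is a hedged illustration of the workaround, not a Ceph tool. The function name and the `.bak` suffix are assumptions, and it should only be run against an unmounted or stopped OSD's data directory.

```python
import shutil
import uuid


def repair_one_line_uuid_file(path):
    """Rewrite path so it holds only its first line, if that line is a UUID.

    A copy of the original file is kept at path + ".bak".
    Raises ValueError if the first line is not a canonical UUID, in which
    case the file is left untouched. (Hypothetical helper.)
    """
    with open(path, "r", errors="replace") as f:
        first = f.readline().strip()

    # uuid.UUID accepts several spellings; insist on the canonical form.
    if str(uuid.UUID(first)) != first.lower():
        raise ValueError("first line is not a canonical UUID: {!r}".format(first))

    shutil.copy2(path, path + ".bak")  # keep the damaged file for inspection
    with open(path, "w") as f:
        f.write(first + "\n")
    return first
```

Keeping the backup preserves the stray log lines, which may help in working out which process wrote them into the file.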
