Bug #42297 (closed): ceph-bluestore-tool repair osd error

Added by Jiang Yu over 4 years ago. Updated over 4 years ago.

Status: Rejected
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: upgrade/nautilus-p2p
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hello everyone,

I upgraded from ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)
to ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable).

1. Upgrade steps:
(1) yum install ceph
(2) restart each service (a restart sketch follows the status output below)
(3) Then check the cluster status:
[root@ceph2 ~]# ceph -s
cluster:
id: c4051efa-1997-43ef-8497-fb02bdf08233
health: HEALTH_WARN
1 filesystem is degraded
noout flag(s) set
Legacy BlueStore stats reporting detected on 6 OSD

services:
mon: 3 daemons, quorum ceph1,ceph3,ceph2 (age 6h)
mgr: ceph2(active, since 6h), standbys: ceph3, ceph1
mds: cephfs:1/1 {0=ceph1=up:replay} 2 up:standby
osd: 6 osds: 6 up, 6 in
flags noout
rgw: 3 daemons active (ceph1, ceph2, ceph3)
data:
pools: 7 pools, 176 pgs
objects: 245 objects, 5.8 KiB
usage: 6.1 GiB used, 293 GiB / 299 GiB avail
pgs: 176 active+clean
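
For step (2), restarting the daemons on each node was roughly the following (a sketch only; the systemd target names and the mon-first ordering are assumptions about a typical package-based deployment, not taken from the original report):

[root@ceph1 ~]# systemctl restart ceph-mon.target       # monitors first
[root@ceph1 ~]# systemctl restart ceph-mgr.target
[root@ceph1 ~]# systemctl restart ceph-osd.target       # then the OSDs on this node
[root@ceph1 ~]# systemctl restart ceph-mds.target
[root@ceph1 ~]# systemctl restart ceph-radosgw.target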

(4) To address the 'Legacy BlueStore stats reporting detected on 6 OSD' warning, I checked these ceph-users mailing list threads:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-July/035889.html
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-July/036010.html
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-July/036002.html

(5) And performed a repair:
[root@ceph1 ~]# systemctl stop
[root@ceph1 ~]# ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-1/
2019-10-14 15:39:53.940 7f87c8114f80 -1 bluestore(/var/lib/ceph/osd/ceph-1) fsck error: legacy statfs record found, removing
2019-10-14 15:39:53.940 7f87c8114f80 -1 bluestore(/var/lib/ceph/osd/ceph-1) fsck error: missing Pool StatFS record for pool 2
2019-10-14 15:39:53.940 7f87c8114f80 -1 bluestore(/var/lib/ceph/osd/ceph-1) fsck error: missing Pool StatFS record for pool ffffffffffffffff
repair success
[root@ceph1 ~]# systemctl start
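
For reference, the complete per-OSD sequence presumably looked like the following (a sketch; the ceph-osd@1 unit name is an assumption, since the unit was omitted from the systemctl commands above, and the noout flag was already set cluster-wide here):

[root@ceph1 ~]# systemctl stop ceph-osd@1                                    # assumed unit name for osd.1
[root@ceph1 ~]# ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-1/  # rewrites the statfs records in the new per-pool format
[root@ceph1 ~]# systemctl start ceph-osd@1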

Problem:

2019-10-14 15:39:53.940 7f87c8114f80 -1 bluestore(/var/lib/ceph/osd/ceph-1) fsck error: legacy statfs record found, removing
2019-10-14 15:39:53.940 7f87c8114f80 -1 bluestore(/var/lib/ceph/osd/ceph-1) fsck error: missing Pool StatFS record for pool 2
2019-10-14 15:39:53.940 7f87c8114f80 -1 bluestore(/var/lib/ceph/osd/ceph-1) fsck error: missing Pool StatFS record for pool ffffffffffffffff

Errors occurred during the execution of ceph-bluestore-tool repair. Do these errors affect my system?

pool info:
[root@ceph1 ~]# ceph osd dump | grep pool
pool 1 'cephfs_data' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode warn last_change 25 flags hashpspool stripe_width 0 application cephfs
pool 2 'cephfs_metadata' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode warn last_change 26 flags hashpspool stripe_width 0 application cephfs
pool 3 '.rgw.root' replicated size 1 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode warn last_change 28 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 4 'default.rgw.control' replicated size 1 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode warn last_change 31 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 5 'default.rgw.meta' replicated size 1 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode warn last_change 34 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 6 'default.rgw.log' replicated size 1 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode warn last_change 36 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 7 'rbd' replicated size 1 min_size 1 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode warn last_change 45 lfor 0/0/43 flags hashpspool stripe_width 0
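
Once every OSD has been repaired this way, the warning should clear on its own; a way to confirm it (standard CLI commands, assuming no other health issues) is:

[root@ceph1 ~]# ceph health detail   # the legacy BlueStore stats warning should no longer be listed
[root@ceph1 ~]# ceph df detail       # per-pool usage now comes from the rebuilt per-pool statfs records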

Actions #1

Updated by Jiang Yu over 4 years ago

This seems to be because my cluster had the '1 filesystem is degraded' warning. After I restored the filesystem, the repair succeeded.

[root@ceph1 ~]# systemctl stop
[root@ceph1 ~]# ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-1/
Repair success
[root@ceph1 ~]# systemctl start
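
Checking that the filesystem had finished replay before retrying could look like this (a sketch using the standard CLI; the exact output depends on the cluster):

[root@ceph1 ~]# ceph fs status cephfs   # rank 0 should report up:active rather than up:replay
[root@ceph1 ~]# ceph -s                 # the '1 filesystem is degraded' warning should be gone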

Actions #2

Updated by Jiang Yu over 4 years ago

Jiang Yu wrote:

This seems to be because my cluster had the '1 filesystem is degraded' warning. After I restored the filesystem, the repair succeeded.

[root@ceph1 ~]# systemctl stop
[root@ceph1 ~]# ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-1/
Repair success
[root@ceph1 ~]# systemctl start

Sorry, the above problem still occurs. The errors did not appear this time only because the command was executed twice; the following errors do not occur on the second pass:
2019-10-14 15:39:53.940 7f87c8114f80 -1 bluestore(/var/lib/ceph/osd/ceph-1) fsck error: legacy statfs record found, removing
2019-10-14 15:39:53.940 7f87c8114f80 -1 bluestore(/var/lib/ceph/osd/ceph-1) fsck error: missing Pool StatFS record for pool 2
2019-10-14 15:39:53.940 7f87c8114f80 -1 bluestore(/var/lib/ceph/osd/ceph-1) fsck error: missing Pool StatFS record for pool ffffffffffffffff

Actions #3

Updated by Jiang Yu over 4 years ago

Jiang Yu wrote:

Jiang Yu wrote:

This seems to be because my cluster had the '1 filesystem is degraded' warning. After I restored the filesystem, the repair succeeded.

[root@ceph1 ~]# systemctl stop
[root@ceph1 ~]# ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-1/
Repair success
[root@ceph1 ~]# systemctl start

Sorry, the above problem still occurs. The errors did not appear this time only because the command was executed twice; the following errors do not occur on the second pass:
2019-10-14 15:39:53.940 7f87c8114f80 -1 bluestore(/var/lib/ceph/osd/ceph-1) fsck error: legacy statfs record found, removing
2019-10-14 15:39:53.940 7f87c8114f80 -1 bluestore(/var/lib/ceph/osd/ceph-1) fsck error: missing Pool StatFS record for pool 2
2019-10-14 15:39:53.940 7f87c8114f80 -1 bluestore(/var/lib/ceph/osd/ceph-1) fsck error: missing Pool StatFS record for pool ffffffffffffffff

Does this affect my system?

Actions #4

Updated by Igor Fedotov over 4 years ago

Hi Yu!
Are you getting these errors during repair only, or do they appear afterwards as well?
They are expected during repair because of the legacy stats in the DB, and the repair run is required precisely to fix them, so they should not appear after a successful repair.
If that's not the case, we should proceed with additional troubleshooting...
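
One way to double-check after the repair is a read-only pass with fsck, which should report no errors (a sketch; fsck does not modify the store, but the OSD still has to be stopped, and the ceph-osd@1 unit name is an assumption):

[root@ceph1 ~]# systemctl stop ceph-osd@1
[root@ceph1 ~]# ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-1/   # read-only check; should complete without fsck errors
[root@ceph1 ~]# systemctl start ceph-osd@1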

Actions #5

Updated by Jiang Yu over 4 years ago

Igor Fedotov wrote:

Hi Yu!
Are you getting these errors during repair only, or do they appear afterwards as well?
They are expected during repair because of the legacy stats in the DB, and the repair run is required precisely to fix them, so they should not appear after a successful repair.
If that's not the case, we should proceed with additional troubleshooting...

Hi Igor Fedotov,

I encountered this problem right after upgrading the cluster to 14.2.4, when I ran the repair for the first time:
`ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-1/`

When I tried to reproduce the problem by running the same command a second time, the errors no longer appeared.

Actions #6

Updated by Igor Fedotov over 4 years ago

I think that's not a bug.
The first time, the tool reported some errors in the DB because of the legacy stats layout left over from the original deployment. The tool then cleaned that up, so now you get no errors.
So everything works exactly as expected.
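
As a side note, if repairing every OSD immediately is inconvenient, the warning itself can be silenced until the OSDs are repaired (per the Nautilus health-check documentation; treat the exact option name as something to verify against your docs):

[root@ceph1 ~]# ceph config set global bluestore_warn_on_legacy_statfs false   # mutes the warning only; it does not fix the stats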

Mind closing the ticket?

Actions #7

Updated by Jiang Yu over 4 years ago

Igor Fedotov wrote:

I think that's not a bug.
The first time, the tool reported some errors in the DB because of the legacy stats layout left over from the original deployment. The tool then cleaned that up, so now you get no errors.
So everything works exactly as expected.

Mind closing the ticket?

Ok, no problem

Actions #8

Updated by Igor Fedotov over 4 years ago

  • Status changed from New to Rejected