Bug #5173 (closed): ceph scrub found missing pg object

Added by Ivan Kudryavtsev almost 11 years ago. Updated almost 11 years ago.

Status: Can't reproduce
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Severity: 3 - minor

Description

I'm using ceph version 0.56.4 (63b0f854d1cef490624de5d6cf9039735c7de5ca).
All data is replicated three times (pool size = 3).

This showed up in the log during the last day:

2013-05-26 07:25:45.627279 7f2279192700 0 log [ERR] : 2.df osd.35 missing 128ef5df/rb.0.3573.238e1f29.00000010d0cd/head//2
2013-05-26 07:25:45.627283 7f2279192700 0 log [ERR] : 2.df osd.11 missing 128ef5df/rb.0.3573.238e1f29.00000010d0cd/head//2
2013-05-26 07:38:23.290418 7f2279192700 0 log [ERR] : 2.df deep-scrub stat mismatch, got 8101/8102 objects, 0/0 clones, 9758007408/9758011504 bytes.
2013-05-26 07:38:23.290472 7f2279192700 0 log [ERR] : 2.df deep-scrub 1 missing, 0 inconsistent objects
2013-05-26 07:38:23.290476 7f2279192700 0 log [ERR] : 2.df deep-scrub 3 errors

ceph health detail
HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
pg 2.df is active+clean+inconsistent, acting [35,11,18]
1 scrub errors

As far as I can understand from the log, the object could not be found on 2 of the 3 nodes. How can this be? The probability of two OSDs failing simultaneously is very small, isn't it? And how do I fix it?
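
For reference, the generic sequence to re-check and then repair an inconsistent PG is roughly the following (just a sketch; the PG id is the one reported by ceph health detail above, and the repair step is what ended up being run, see the comments below):

# show which PGs are inconsistent and which OSDs are acting for them
ceph health detail

# optional: dump the full PG state
ceph pg 2.df query

# re-run the deep scrub to confirm the errors are still present
ceph pg deep-scrub 2.df

# ask the primary to repair the PG from the other replicas
ceph pg repair 2.df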

I went looking for the object file on the OSDs and found:

root@ceph-osd-3-1:/srv/ceph/osd35/current/2.df_head/DIR_F/DIR_D/DIR_5# ls rb.0.3573.238e1f29.00000010d0cd__head_128EF5DF__2
rb.0.3573.238e1f29.00000010d0cd__head_128EF5DF__2

root@ceph-osd-3-1:/srv/ceph/osd35/current/2.df_head/DIR_F/DIR_D/DIR_5# stat rb.0.3573.238e1f29.00000010d0cd__head_128EF5DF__2
File: «rb.0.3573.238e1f29.00000010d0cd__head_128EF5DF__2»
Size: 4096 Blocks: 16 IO Block: 4096 regular file
Device: 8d0h/2256d Inode: 2240333280 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2013-05-26 14:19:31.917155860 +0700
Modify: 2013-05-26 14:19:32.353155849 +0700
Change: 2013-05-26 14:19:32.369155851 +0700

root@ceph-osd-2-1:/srv/ceph/osd18/current/2.df_head/DIR_F/DIR_D/DIR_5/DIR_F# stat rb.0.3573.238e1f29.00000010d0cd__head_128EF5DF__2
File: «rb.0.3573.238e1f29.00000010d0cd__head_128EF5DF__2»
Size: 4096 Blocks: 16 IO Block: 4096 regular file
Device: fe08h/65032d Inode: 1373027321 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2013-04-06 19:55:14.751128701 +0700
Modify: 2013-04-06 19:55:14.751128701 +0700
Change: 2013-04-06 19:55:14.751128701 +0700

root@ceph-osd-1-1:/srv/ceph/osd11/current/2.df_head/DIR_F/DIR_D/DIR_5/DIR_F# stat rb.0.3573.238e1f29.00000010d0cd__head_128EF5DF__2
File: «rb.0.3573.238e1f29.00000010d0cd__head_128EF5DF__2»
Size: 4096 Blocks: 16 IO Block: 4096 regular file
Device: fe03h/65027d Inode: 1342187995 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2013-05-26 14:19:45.324109104 +0700
Modify: 2013-05-26 14:19:45.356109105 +0700
Change: 2013-05-26 14:19:45.360109105 +0700

It exists on all OSD devices. What's wrong?
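
Since the DIR_* nesting is not the same on every OSD (note the extra DIR_F on osd18 and osd11), the easiest way to locate the replica on a host is something like this (paths follow my layout above):

# find the on-disk file for the object reported by the scrub error;
# the filename ends with the hash from the log line (128ef5df -> 128EF5DF)
find /srv/ceph/osd35/current/2.df_head \
    -name 'rb.0.3573.238e1f29.00000010d0cd__head_128EF5DF__2' \
    -exec stat {} \;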

Actions #1

Updated by Ivan Kudryavtsev almost 11 years ago

All copies have the same md5 sum:

620f0b67a91f7f74151bc5be745b7110
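
Checked roughly like this (hostnames and paths as above; find is used because the DIR_* nesting differs per OSD):

# compare checksums of the replica on each OSD host
for h in ceph-osd-3-1 ceph-osd-2-1 ceph-osd-1-1; do
    ssh "$h" "find /srv/ceph/osd*/current/2.df_head -name 'rb.0.3573.238e1f29.00000010d0cd__head_128EF5DF__2' | xargs md5sum"
done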
Actions #2

Updated by Ivan Kudryavtsev almost 11 years ago

Ran ceph pg repair 2.df

Finally, I unmounted all OSDs one by one, checked the XFS filesystems, and mounted them back with barriers enabled (they were mounted with nobarrier before).
After remounting, I ran the repair again and it worked; the PG was repaired.
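
Roughly the sequence per OSD (a sketch; the init command and the device name are just examples for my setup, adjust to yours):

# stop the OSD daemon (sysvinit here; use whatever your init system provides)
service ceph stop osd.35

# unmount and check XFS (xfs_repair needs the fs unmounted; -n is a read-only check)
umount /srv/ceph/osd35
xfs_repair -n /dev/sdX1    # /dev/sdX1 is a placeholder for the OSD data disk
xfs_repair /dev/sdX1       # only if the check found problems

# mount it back with barriers enabled (it was nobarrier before)
mount -t xfs -o barrier /dev/sdX1 /srv/ceph/osd35

# start the OSD again
service ceph start osd.35

# repeated for osd.11 and osd.18, then re-ran the repair
ceph pg repair 2.df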

Actions #3

Updated by Sage Weil almost 11 years ago

  • Status changed from New to Can't reproduce