Bug #5173: ceph scrub found missing pg object
Status: Closed (Can't reproduce)
Description
I'm using ceph version 0.56.4 (63b0f854d1cef490624de5d6cf9039735c7de5ca)
All data is replicated three times (pool size = 3).
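For reference, the replication factor can be confirmed from the osdmap (pg 2.df belongs to pool 2):

ceph osd dump | grep 'pool 2'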
Yesterday I got the following in the log:
2013-05-26 07:25:45.627279 7f2279192700 0 log [ERR] : 2.df osd.35 missing 128ef5df/rb.0.3573.238e1f29.00000010d0cd/head//2
2013-05-26 07:25:45.627283 7f2279192700 0 log [ERR] : 2.df osd.11 missing 128ef5df/rb.0.3573.238e1f29.00000010d0cd/head//2
2013-05-26 07:38:23.290418 7f2279192700 0 log [ERR] : 2.df deep-scrub stat mismatch, got 8101/8102 objects, 0/0 clones, 9758007408/9758011504 bytes.
2013-05-26 07:38:23.290472 7f2279192700 0 log [ERR] : 2.df deep-scrub 1 missing, 0 inconsistent objects
2013-05-26 07:38:23.290476 7f2279192700 0 log [ERR] : 2.df deep-scrub 3 errors
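Note the arithmetic in the stat mismatch above: 8102 - 8101 = 1 object and 9758011504 - 9758007408 = 4096 bytes, i.e. exactly one 4096-byte object is unaccounted for, which matches the file size in the stat output below.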
ceph health detail
HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
pg 2.df is active+clean+inconsistent, acting [35,11,18]
1 scrub errors
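The PG-to-OSD mapping can be double-checked with the standard pg map command:

ceph pg map 2.df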
As far as I understand the log, the object could not be found on two of the three nodes. How can that be? The probability of two OSDs failing simultaneously is very small, isn't it? And how do I fix it?
I went looking for the file on the OSDs and found:
root@ceph-osd-3-1:/srv/ceph/osd35/current/2.df_head/DIR_F/DIR_D/DIR_5# ls rb.0.3573.238e1f29.00000010d0cd__head_128EF5DF__2
rb.0.3573.238e1f29.00000010d0cd__head_128EF5DF__2
root@ceph-osd-3-1:/srv/ceph/osd35/current/2.df_head/DIR_F/DIR_D/DIR_5# stat rb.0.3573.238e1f29.00000010d0cd__head_128EF5DF__2
File: «rb.0.3573.238e1f29.00000010d0cd__head_128EF5DF__2»
Size: 4096 Blocks: 16 IO Block: 4096 regular file
Device: 8d0h/2256d Inode: 2240333280 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2013-05-26 14:19:31.917155860 +0700
Modify: 2013-05-26 14:19:32.353155849 +0700
Change: 2013-05-26 14:19:32.369155851 +0700
root@ceph-osd-2-1:/srv/ceph/osd18/current/2.df_head/DIR_F/DIR_D/DIR_5/DIR_F# stat rb.0.3573.238e1f29.00000010d0cd__head_128EF5DF__2
File: «rb.0.3573.238e1f29.00000010d0cd__head_128EF5DF__2»
Size: 4096 Blocks: 16 IO Block: 4096 regular file
Device: fe08h/65032d Inode: 1373027321 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2013-04-06 19:55:14.751128701 +0700
Modify: 2013-04-06 19:55:14.751128701 +0700
Change: 2013-04-06 19:55:14.751128701 +0700
root@ceph-osd-1-1:/srv/ceph/osd11/current/2.df_head/DIR_F/DIR_D/DIR_5/DIR_F# stat rb.0.3573.238e1f29.00000010d0cd__head_128EF5DF__2
File: «rb.0.3573.238e1f29.00000010d0cd__head_128EF5DF__2»
Size: 4096 Blocks: 16 IO Block: 4096 regular file
Device: fe03h/65027d Inode: 1342187995 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2013-05-26 14:19:45.324109104 +0700
Modify: 2013-05-26 14:19:45.356109105 +0700
Change: 2013-05-26 14:19:45.360109105 +0700
So the object file exists on all three OSDs. What's wrong?
Updated by Ivan Kudryavtsev almost 11 years ago
All three copies have the same md5 sum:
620f0b67a91f7f74151bc5be745b7110
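A minimal sketch of how that comparison can be scripted (assuming passwordless root ssh; hostnames and paths are taken from the stat listings above; note that osd35 keeps the file one directory level higher than osd18 and osd11):

OBJ=rb.0.3573.238e1f29.00000010d0cd__head_128EF5DF__2
ssh ceph-osd-3-1 md5sum /srv/ceph/osd35/current/2.df_head/DIR_F/DIR_D/DIR_5/$OBJ
ssh ceph-osd-2-1 md5sum /srv/ceph/osd18/current/2.df_head/DIR_F/DIR_D/DIR_5/DIR_F/$OBJ
ssh ceph-osd-1-1 md5sum /srv/ceph/osd11/current/2.df_head/DIR_F/DIR_D/DIR_5/DIR_F/$OBJ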
Updated by Ivan Kudryavtsev almost 11 years ago
I ran ceph pg repair 2.df.
Finally, I unmounted the OSD filesystems one by one, checked the XFS on each, and remounted them with barriers enabled (they were mounted with nobarrier before; without barriers a crash can lose writes the OSD believed were committed, which would explain replicas appearing to go missing).
After remounting, I ran the repair again and it worked: the PG was repaired.
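A sketch of that per-OSD cycle, assuming XFS data disks and the sysvinit ceph script; the device name here is illustrative, not from this report:

/etc/init.d/ceph stop osd.35           # stop the daemon before touching the fs
umount /srv/ceph/osd35
xfs_repair -n /dev/sdb1                # -n: check only, do not modify the fs
mount -o rw,noatime,barrier /dev/sdb1 /srv/ceph/osd35
/etc/init.d/ceph start osd.35
# repeat for osd.18 and osd.11, then repair and re-check:
ceph pg repair 2.df
ceph health detail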
Updated by Sage Weil almost 11 years ago
- Status changed from New to Can't reproduce