Bug #12738
scrub bogus results when missing a clone
Description
2015-08-20 20:15:47.150457 mon.0 10.12.2.2:6789/0 1313 : cluster [INF] pgmap v5060458: 2048 pgs: 1 active+clean+scrubbing+deep+inconsistent+repair, 1 active+clean+scrubbing+deep+inconsistent, 2046 active+clean; 6933 GB data, 19127 GB used, 150 TB / 169 TB avail; 3745 kB/s rd, 10543 kB/s wr, 559 op/s
2015-08-20 20:15:43.295815 osd.19 10.12.2.6:6838/1861727 295 : cluster [INF] 2.490 repair starts
2015-08-20 20:15:44.865367 osd.19 10.12.2.6:6838/1861727 296 : cluster [ERR] repair 2.490 3fac9490/rbd_data.eb5f22eb141f2.00000000000004ba/snapdir//2 missing clones
2015-08-20 20:15:44.865606 osd.19 10.12.2.6:6838/1861727 297 : cluster [ERR] repair 2.490 2d7b9490/rbd_data.18f92c3d1b58ba.0000000000006167/head//2 expected clone 3fac9490/rbd_data.eb5f22eb141f2.00000000000004ba/141//2
2015-08-20 20:15:44.865670 osd.19 10.12.2.6:6838/1861727 298 : cluster [ERR] repair 2.490 68c89490/rbd_data.16796a3d1b58ba.0000000000000047/head//2 expected clone 2d7b9490/rbd_data.18f92c3d1b58ba.0000000000006167/141//2
2015-08-20 20:15:44.865817 osd.19 10.12.2.6:6838/1861727 299 : cluster [ERR] repair 2.490 ded49490/rbd_data.11a25c7934d3d4.0000000000008a8a/head//2 expected clone 68c89490/rbd_data.16796a3d1b58ba.0000000000000047/141//2
2015-08-20 20:15:44.865881 osd.19 10.12.2.6:6838/1861727 300 : cluster [ERR] repair 2.490 bac19490/rbd_data.1238e82ae8944a.000000000000032e/head//2 expected clone ded49490/rbd_data.11a25c7934d3d4.0000000000008a8a/141//2
2015-08-20 20:15:44.865965 osd.19 10.12.2.6:6838/1861727 301 : cluster [ERR] repair 2.490 83319490/rbd_data.18e0c63d1b58ba.000000000000beb0/head//2 expected clone bac19490/rbd_data.1238e82ae8944a.000000000000032e/141//2
2015-08-20 20:15:44.866127 osd.19 10.12.2.6:6838/1861727 302 : cluster [ERR] repair 2.490 c3c09490/rbd_data.1238e82ae8944a.0000000000000c2b/head//2 expected clone 83319490/rbd_data.18e0c63d1b58ba.000000000000beb0/141//2
2015-08-20 20:15:44.866268 osd.19 10.12.2.6:6838/1861727 303 : cluster [ERR] repair 2.490 a0cd8490/rbd_data.e4b68613183f2.0000000000010bd9/head//2 expected clone c3c09490/rbd_data.1238e82ae8944a.0000000000000c2b/141//2
2015-08-20 20:15:44.866360 osd.19 10.12.2.6:6838/1861727 304 : cluster [ERR] repair 2.490 77ad8490/rbd_data.1238e82ae8944a.0000000000000890/head//2 expected clone a0cd8490/rbd_data.e4b68613183f2.0000000000010bd9/141//2
2015-08-20 20:15:44.866402 osd.19 10.12.2.6:6838/1861727 305 : cluster [ERR] repair 2.490 4f5d8490/rbd_data.19f7e72ae8944a.0000000000000baf/head//2 expected clone 77ad8490/rbd_data.1238e82ae8944a.0000000000000890/141//2
2015-08-20 20:15:44.866553 osd.19 10.12.2.6:6838/1861727 306 : cluster [ERR] repair 2.490 aa5a8490/rbd_data.18e0c63d1b58ba.0000000000031ff2/head//2 expected clone 4f5d8490/rbd_data.19f7e72ae8944a.0000000000000baf/141//2
2015-08-20 20:15:44.866685 osd.19 10.12.2.6:6838/1861727 307 : cluster [ERR] repair 2.490 ae0a8490/rbd_data.1a13562ae8944a.0000000000011861/head//2 expected clone aa5a8490/rbd_data.18e0c63d1b58ba.0000000000031ff2/141//2
2015-08-20 20:15:44.866759 osd.19 10.12.2.6:6838/1861727 308 : cluster [ERR] repair 2.490 66398490/rbd_data.16610e3d1b58ba.000000000000166e/head//2 expected clone ae0a8490/rbd_data.1a13562ae8944a.0000000000011861/141//2
2015-08-20 20:15:44.866783 osd.19 10.12.2.6:6838/1861727 309 : cluster [ERR] repair 2.490 5a688490/rbd_data.18e0c63d1b58ba.00000000000124b9/head//2 expected clone 66398490/rbd_data.16610e3d1b58ba.000000000000166e/141//2
2015-08-20 20:15:44.866797 osd.19 10.12.2.6:6838/1861727 310 : cluster [ERR] repair 2.490 6d178490/rbd_data.1238e82ae8944a.000000000000168f/head//2 expected clone 5a688490/rbd_data.18e0c63d1b58ba.00000000000124b9/141//2
2015-08-20 20:15:44.866818 osd.19 10.12.2.6:6838/1861727 311 : cluster [ERR] repair 2.490 a9b68490/rbd_data.18e0c63d1b58ba.00000000000319bb/head//2 expected clone 6d178490/rbd_data.1238e82ae8944a.000000000000168f/141//2
2015-08-20 20:15:44.866832 osd.19 10.12.2.6:6838/1861727 312 : cluster [ERR] repair 2.490 fde58490/rbd_data.18f92c3d1b58ba.000000000000a4b9/head//2 expected clone a9b68490/rbd_data.18e0c63d1b58ba.00000000000319bb/141//2
I think it's some kind of silly off-by-one in ReplicatedPG::_scrub.
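The shift is visible in the log itself: each "expected clone" message names the object reported on the previous line, not the object it is attached to. A minimal Python sketch for checking that pattern follows. It assumes object references have the form `<hash>/<name>/<snap>//<pool>` as in the excerpt above; the helper names `parse` and `shifted_pairs` are made up for illustration and are not part of Ceph.

```python
import re

# One object reference looks like "<hash>/<name>/<snap>//<pool>" (assumed
# from the log excerpt, not taken from Ceph's own formatting code).
OBJ_RE = re.compile(r"([0-9a-f]{8})/([^/\s]+)/([^/\s]+)//(\d+)")

def parse(line):
    """Return (reported, expected); each is (hash, name, snap) or None."""
    refs = [m[:3] for m in OBJ_RE.findall(line)]
    reported = refs[0] if refs else None
    expected = refs[1] if len(refs) > 1 else None
    return reported, expected

def shifted_pairs(lines):
    """Count lines whose expected clone names the previous line's object."""
    parsed = [parse(line) for line in lines]
    hits = 0
    for (prev_rep, _), (_, cur_exp) in zip(parsed, parsed[1:]):
        # Compare hash and name only; the snap field differs (head vs 141).
        if prev_rep and cur_exp and prev_rep[:2] == cur_exp[:2]:
            hits += 1
    return hits

# First three error messages from the report, timestamps stripped.
LOG = [
    "repair 2.490 3fac9490/rbd_data.eb5f22eb141f2.00000000000004ba/snapdir//2 missing clones",
    "repair 2.490 2d7b9490/rbd_data.18f92c3d1b58ba.0000000000006167/head//2 expected clone 3fac9490/rbd_data.eb5f22eb141f2.00000000000004ba/141//2",
    "repair 2.490 68c89490/rbd_data.16796a3d1b58ba.0000000000000047/head//2 expected clone 2d7b9490/rbd_data.18f92c3d1b58ba.0000000000006167/141//2",
]

print(shifted_pairs(LOG))  # prints 2: both "expected clone" lines are shifted by one
```

On the full excerpt, every "expected clone" line follows the same pattern, which is consistent with one genuinely missing clone being carried forward onto each subsequent head.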
Related issues
Associated revisions
osd: Make the _scrub routine produce good output and detect errors properly
Catch decode errors so osd doesn't crash on corrupt OI_ATTR or SS_ATTR
Use boost::optional<> to make current state clearer
Create next_clone as needed using head/curclone
Add equivalent logic after getting to end of scrubmap.objects
Fixes: #12738
Signed-off-by: David Zafman <dzafman@redhat.com>
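To illustrate the last three points of the commit message, here is a toy Python model of clone tracking during a scrub pass. It is only a sketch under stated assumptions, not the Ceph implementation: the entry layout, the head-first ordering, and the name `check_clones` are hypothetical. `Optional` stands in for `boost::optional<>`, the pending list is rebuilt from each head, and the same check runs once more after the object list ends.

```python
from typing import List, Optional, Tuple

# Toy model (assumption): an entry is (name, snap, clones). snap is "head" or
# a clone id; clones lists the clone ids a head claims to have, in scan order.
Entry = Tuple[str, str, List[str]]

def check_clones(entries: List[Entry]) -> List[str]:
    errors: List[str] = []
    head: Optional[str] = None   # None means "no head seen yet"
    pending: List[str] = []      # clones still expected for `head`

    def flush() -> None:
        # Report every clone the current head still expects, against that head.
        for snap in pending:
            errors.append(f"{head} missing clone {snap}")
        pending.clear()

    for name, snap, clones in entries:
        if snap == "head":
            flush()              # finish the previous head before moving on
            head = name
            pending[:] = clones
        elif head != name or snap not in pending:
            errors.append(f"{name}/{snap} unexpected clone")
        else:
            # Expected clones skipped over before this one are missing.
            while pending[0] != snap:
                errors.append(f"{head} missing clone {pending.pop(0)}")
            pending.pop(0)
    flush()                      # equivalent check once the object list ends
    return errors

SAMPLE: List[Entry] = [
    ("a", "head", ["2", "1"]),
    ("a", "2", []),
    ("b", "head", ["1"]),
    ("b", "1", []),
    ("c", "head", ["1"]),
]
print(check_clones(SAMPLE))  # ['a missing clone 1', 'c missing clone 1']
```

In this model a missing clone is reported once, against the head that owns it, and does not leak onto the next object, which is the behaviour the bug report found lacking.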
osd: Make the _scrub routine produce good output and detect errors properly
Catch decode errors so osd doesn't crash on corrupt OI_ATTR or SS_ATTR
Use boost::optional<> to make current state clearer
Create next_clone as needed using head/curclone
Add equivalent logic after getting to end of scrubmap.objects
Fixes: #12738
Signed-off-by: David Zafman <dzafman@redhat.com>
(cherry picked from commit a23036c6fd7de5d1dbc2bd30c967c0be51d94ca5)
Conflicts:
src/osd/ReplicatedPG.cc (no num_objects_pinned in hammer)
src/osd/ReplicatedPG.h (no get_temp_recovery_object() in hammer)
History
#1 Updated by Samuel Just over 8 years ago
- Status changed from New to 12
#2 Updated by David Zafman over 8 years ago
- Assignee set to David Zafman
#3 Updated by David Zafman over 8 years ago
- Status changed from 12 to In Progress
#4 Updated by David Zafman over 8 years ago
- Status changed from In Progress to Fix Under Review
Part of pull request #5783
0ac3dbb7ad2eedf570c56afa8fda327b78492388
#5 Updated by David Zafman over 8 years ago
- Status changed from Fix Under Review to Resolved
a23036c6fd7de5d1dbc2bd30c967c0be51d94ca5
#6 Updated by Ken Dreyer over 8 years ago
- Status changed from Resolved to Pending Backport
- Backport changed from hammer,firefly to hammer
#7 Updated by Loïc Dachary over 8 years ago
- Copied to Backport #14077: hammer: scrub bogus results when missing a clone added
#8 Updated by Loïc Dachary about 8 years ago
- Status changed from Pending Backport to Resolved