Bug #17859

filestore: can get stuck in an unbounded loop during scrub

Added by Sage Weil 4 months ago. Updated 4 months ago.

Status: Resolved
Priority: Urgent
Assignee: -
Category: OSD
Target version: -
Start date: 11/10/2016
Due date:
% Done: 0%
Source: Community (user)
Tags:
Backport: jewel
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Release:
Needs Doc: No

Description

In list_by_hash_{bitwise,nibblewise} there is a condition

    if (cmp_bitwise(j->second, end) >= 0) {
      if (next)
        *next = ghobject_t::get_max();
      return 0;
    }

If we set *next to max, the caller doesn't break out of its loop and continues on to iterate over every subsequent subdir in the collection. This can be very slow, or it can take long enough that the OSD hits its suicide timeout.

This was added during a refactor in 921c4586f165ce39c17ef8b579c548dc8f6f4500. I'm pretty sure it should just set *next = j->second instead.
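
To illustrate the difference, here is a minimal, self-contained sketch of the pattern. It is not the actual HashIndex code: objects are modeled as plain ints, kMax stands in for ghobject_t::get_max(), and list_subdir and list_collection are hypothetical names. It assumes the caller treats a max *next as "keep going" and stops only on a non-max value, which is the behaviour described above.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Toy stand-in for ghobject_t::get_max(); not the real Ceph type.
constexpr int kMax = INT_MAX;

// One leaf directory: object keys in sorted order.
using Subdir = std::vector<int>;

// Hypothetical leaf listing: append keys < end to *out.
// On reaching a key >= end, report the stop position through *next:
//   buggy == true  mimics "*next = ghobject_t::get_max()"
//   buggy == false mimics the proposed "*next = j->second"
void list_subdir(const Subdir& objs, int end, bool buggy,
                 int* next, std::vector<int>* out) {
  assert(next && out);
  for (int key : objs) {
    if (key >= end) {
      *next = buggy ? kMax : key;
      return;
    }
    out->push_back(key);
  }
  *next = kMax;  // subdir exhausted, nothing more to list here
}

// Hypothetical caller: walks subdirs in order and stops only when a
// subdir reports a non-max next. Returns the number of subdirs visited.
std::size_t list_collection(const std::vector<Subdir>& subdirs, int end,
                            bool buggy, int* next, std::vector<int>* out) {
  assert(next && out);
  std::size_t visited = 0;
  *next = kMax;
  for (const Subdir& d : subdirs) {
    ++visited;
    int sub_next = kMax;
    list_subdir(d, end, buggy, &sub_next, out);
    if (sub_next != kMax) {  // only a non-max next ends the walk
      *next = sub_next;
      break;
    }
  }
  return visited;
}
```

With subdirs {1,2}, {3,4}, {5,6}, {7,8} and end = 4, both variants collect 1, 2, 3. The buggy variant visits all four subdirs and leaves next at max, while the fixed variant stops after two subdirs with next = 4. In a large collection the extra walk is what makes the listing slow.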

Big thanks to mistur on IRC for helping narrow this down.


Related issues

Duplicated by Bug #17781: All OSDs restart randomly on "hit timeout suicide" when scrub activate (Duplicate, 11/02/2016)
Copied to Backport #17915: jewel: filestore: can get stuck in an unbounded loop during scrub (Resolved)

History

#1 Updated by Sage Weil 4 months ago

  • Subject changed from filestore: can get stuck in infinite loop with dirs of size exactly max objects to filestore: can get stuck in an unbounded loop with dirs of size exactly max objects

#2 Updated by Sage Weil 4 months ago

  • Subject changed from filestore: can get stuck in an unbounded loop with dirs of size exactly max objects to filestore: can get stuck in an unbounded loop during scrub

#3 Updated by Sage Weil 4 months ago

  • Description updated (diff)

#4 Updated by Sage Weil 4 months ago

  • Duplicated by Bug #17781: All OSDs restart randomly on "hit timeout suicide" when scrub activate added

#5 Updated by Sage Weil 4 months ago

  • Status changed from Verified to Pending Backport
  • Priority changed from Immediate to Urgent

#6 Updated by Loic Dachary 4 months ago

  • Copied to Backport #17915: jewel: filestore: can get stuck in an unbounded loop during scrub added

#8 Updated by Loic Dachary 4 months ago

  • Status changed from Pending Backport to Resolved
