Bug #17859

closed

filestore: can get stuck in an unbounded loop during scrub

Added by Sage Weil over 7 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Urgent
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport: jewel
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

In list_by_hash_{bitwise,nibblewise} there is a condition:

    if (cmp_bitwise(j->second, end) >= 0) {
      if (next)
        *next = ghobject_t::get_max();
      return 0;
    }

If we set next to max, the caller doesn't break out of the loop and continues iterating over every subsequent subdir in the collection. This can either be very slow or make the OSD suicide.

This was added during a refactor in 921c4586f165ce39c17ef8b579c548dc8f6f4500. I'm pretty sure it should just set *next = j->second instead.
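
To make the failure mode concrete, here is a minimal toy model of it. This is a sketch, not the filestore code: objects are plain ints, kMax stands in for ghobject_t::get_max(), and list_subdir, list_collection and the buggy flag are hypothetical names invented for illustration. It assumes, as described above, that the caller treats max as "subdir exhausted, keep going".

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Toy stand-in for ghobject_t: objects are ints, and kMax plays the role of
// ghobject_t::get_max(), meaning "this subdir is exhausted, keep going".
constexpr int kMax = std::numeric_limits<int>::max();

// Hypothetical model of the per-subdir listing. Collects objects below `end`.
// On reaching an object >= end it reports a resume point through `next`:
// kMax when `buggy` is true (the behaviour described above), or the object
// itself when `buggy` is false (the proposed *next = j->second).
int list_subdir(const std::vector<int>& objects, int end, bool buggy,
                int* next, std::vector<int>* out) {
  assert(next != nullptr && out != nullptr);
  for (int obj : objects) {
    if (obj >= end) {
      *next = buggy ? kMax : obj;
      return 0;
    }
    out->push_back(obj);
  }
  *next = kMax;  // genuinely exhausted
  return 0;
}

// Hypothetical caller: walks the subdirs in order and stops only when a
// subdir reports a resume point other than kMax.
// Returns the number of subdirs visited.
std::size_t list_collection(const std::vector<std::vector<int>>& subdirs,
                            int end, bool buggy, std::vector<int>* out) {
  std::size_t visited = 0;
  for (const auto& subdir : subdirs) {
    ++visited;
    int next = kMax;
    list_subdir(subdir, end, buggy, &next, out);
    if (next != kMax)
      break;  // a real resume point was reported; stop here
  }
  return visited;
}
```

In this model, with subdirs {1, 2}, {3, 10}, {11, 12}, {13, 14} and end = 5, the buggy variant visits all four subdirs while the fixed variant stops after the second. Both return the same objects, so the only difference is the wasted iteration.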

Big big thanks to mistur on IRC for helping narrow this down.


Related issues 2 (0 open, 2 closed)

Has duplicate Ceph - Bug #17781: All OSDs restart randomly on "hit timeout suicide" when scrub activate (Duplicate, 11/02/2016)

Copied to Ceph - Backport #17915: jewel: filestore: can get stuck in an unbounded loop during scrub (Resolved, Loïc Dachary)