Bug #17859

closed

filestore: can get stuck in an unbounded loop during scrub

Added by Sage Weil over 7 years ago. Updated over 7 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
-
Category:
OSD
Target version:
-
% Done:
0%

Source:
Community (user)
Tags:
Backport:
jewel
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

In list_by_hash_{bitwise,nibblewise} there is a condition:

    if (cmp_bitwise(j->second, end) >= 0) {
      if (next)
        *next = ghobject_t::get_max();
      return 0;
    }

If we set *next to max, the caller doesn't break out of its loop and continues to iterate over every subsequent subdir in the collection. This can either be very slow or can make the OSD suicide.

This was added during a refactor in 921c4586f165ce39c17ef8b579c548dc8f6f4500. I'm pretty sure it should just set *next = j->second instead.
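
To illustrate why the value written to *next matters, here is a minimal toy model, not the actual HashIndex code. It assumes plain ints in place of ghobject_t, a flat list of sorted subdirs, and a caller that stops only when *next is something other than max. The names list_subdir, list_range, kMax and the buggy flag are hypothetical and exist only for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Stand-in for ghobject_t::get_max() (assumption: ints model object keys).
constexpr int kMax = std::numeric_limits<int>::max();

// Lists objects in [start, end) from one sorted subdir.
// When an object >= end is reached, *next says where listing stopped.
// buggy == true mimics setting *next to max; false mimics *next = j->second.
void list_subdir(const std::vector<int>& objs, int start, int end, bool buggy,
                 std::vector<int>* out, int* next) {
  assert(out != nullptr && next != nullptr);
  *next = kMax;  // default: this subdir was exhausted without reaching end
  for (int obj : objs) {
    if (obj < start) continue;
    if (obj >= end) {
      *next = buggy ? kMax : obj;
      return;
    }
    out->push_back(obj);
  }
}

// Hypothetical caller: walks the subdirs in order and stops as soon as a
// subdir reports a concrete *next. Returns the number of subdirs visited.
std::size_t list_range(const std::vector<std::vector<int>>& subdirs, int start,
                       int end, bool buggy, std::vector<int>* out) {
  std::size_t visited = 0;
  for (const auto& dir : subdirs) {
    ++visited;
    int next = kMax;
    list_subdir(dir, start, end, buggy, out, &next);
    if (next != kMax) break;  // only a non-max *next ends the walk
  }
  return visited;
}
```

In this model, listing the range [0, 4) over the five subdirs {1,2}, {3,4}, {5,6}, {7,8}, {9,10} returns the same objects either way, but the buggy variant visits all five subdirs while the variant that sets *next to the first out-of-range object stops after two.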

Big, big thanks to mistur on IRC for helping narrow this down.


Related issues 2 (0 open, 2 closed)

Has duplicate Ceph - Bug #17781: All OSDs restart randomly on "hit timeout suicide" when scrub activate (Duplicate, 11/02/2016)

Copied to Ceph - Backport #17915: jewel: filestore: can get stuck in an unbounded loop during scrub (Resolved, Loïc Dachary)
Actions #1

Updated by Sage Weil over 7 years ago

  • Subject changed from filestore: can get stuck in infinite loop with dirs of size exactly max objects to filestore: can get stuck in an unbounded loop with dirs of size exactly max objects
Actions #2

Updated by Sage Weil over 7 years ago

  • Subject changed from filestore: can get stuck in an unbounded loop with dirs of size exactly max objects to filestore: can get stuck in an unbounded loop during scrub
Actions #3

Updated by Sage Weil over 7 years ago

  • Description updated (diff)
Actions #4

Updated by Sage Weil over 7 years ago

  • Has duplicate Bug #17781: All OSDs restart randomly on "hit timeout suicide" when scrub activate added
Actions #5

Updated by Sage Weil over 7 years ago

  • Status changed from 12 to Pending Backport
  • Priority changed from Immediate to Urgent
Actions #6

Updated by Loïc Dachary over 7 years ago

  • Copied to Backport #17915: jewel: filestore: can get stuck in an unbounded loop during scrub added
Actions #8

Updated by Loïc Dachary over 7 years ago

  • Status changed from Pending Backport to Resolved