Bug #2002 (closed)

osd: racy push/pull for clones

Added by Sage Weil about 12 years ago. Updated about 12 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: OSD
Target version: v0.45
% Done: 0%
Source: Development

Description

There is currently a race where:
- an adjacent clone is missing
- we (calculate some clone overlap? and) start pulling
- the adjacent clone arrives
- we receive the push, calculate a different overlap, and then get confused.
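
To make the race concrete, here is a toy sketch (hypothetical names and logic, not the actual OSD code): if the usable clone overlap is derived from which adjacent clones are locally present, a pull planned before the adjacent clone arrives and a push prepared after it arrives compute different overlaps.

#include <cstdint>
#include <iostream>
#include <set>
#include <string>

// Toy model: the overlap is only usable if the adjacent clone is
// actually present on this replica.
uint64_t usable_overlap(const std::set<std::string>& local_clones,
                        const std::string& adjacent_clone,
                        uint64_t overlap_bytes) {
  return local_clones.count(adjacent_clone) ? overlap_bytes : 0;
}

int main() {
  const std::string adjacent = "obj/snap4";

  std::set<std::string> at_pull_start;                 // clone still missing
  std::set<std::string> at_push_time = {"obj/snap4"};  // clone arrived mid-recovery

  // The pull was planned with overlap 0...
  std::cout << usable_overlap(at_pull_start, adjacent, 4096) << "\n";  // prints 0
  // ...but the push is built assuming overlap 4096: the two sides disagree.
  std::cout << usable_overlap(at_push_time, adjacent, 4096) << "\n";   // prints 4096
}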

Also, we don't recover efficiently when pulling clones in parallel. We should probably serialize on each object_t so that we don't waste disk space. Recovery will probably still be faster.
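
A minimal sketch of the per-object serialization idea, assuming a simple gate keyed by object (object_t is stood in by a string; none of these names exist in the OSD): at most one push/pull per object is in flight, and later clones of the same object queue behind it.

#include <deque>
#include <map>
#include <string>

using object_t = std::string;  // stand-in for Ceph's object_t

// At most one recovery op per object is in flight; later clones of the
// same object queue up and run one at a time.
struct RecoveryGate {
  std::map<object_t, std::deque<object_t>> in_flight;  // object -> queued clones

  // Returns true if the caller may recover this clone now; otherwise the
  // clone is queued behind the op already in flight for this object.
  bool try_start(const object_t& obj, const object_t& clone) {
    auto [it, inserted] = in_flight.try_emplace(obj);
    if (inserted)
      return true;
    it->second.push_back(clone);
    return false;
  }

  // Called when the in-flight op for obj completes; if another clone is
  // queued, it is returned through *next and becomes the in-flight op.
  bool finish(const object_t& obj, object_t* next) {
    auto it = in_flight.find(obj);
    if (it == in_flight.end() || it->second.empty()) {
      if (it != in_flight.end())
        in_flight.erase(it);
      return false;
    }
    *next = it->second.front();
    it->second.pop_front();
    return true;
  }
};

Recovering the clones of one object strictly in sequence would let each later clone be built by cloning from the one just recovered instead of pulling full parallel copies, which is presumably where the disk-space saving comes from.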


Related issues 2 (0 open, 2 closed)

Has duplicate Ceph - Feature #2055: osd: fix up push cloning (Duplicate)
Has duplicate Ceph - Bug #1943: osd: bad clone transaction on journal replay (Duplicate, Sage Weil, 01/14/2012)
#1

Updated by Sage Weil about 12 years ago

osd.log.badpushpull shows the (or similar) badness. The workload was:


kernel:
  branch: master
interactive-on-error: true

roles:
- - mon.a
  - mds.a
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mon.c
  - client.0
  - osd.3
  - osd.4
  - osd.5
tasks:
- ceph:
    btrfs: 1
    log-whitelist:
    - wrongly marked me down or wrong addr
    conf:
      osd:
        debug ms: 1
        debug osd: 20
      mon:
        debug ms: 10
        debug mon: 20
- thrashosds:
- rados:
    clients:
    - client.0
    objects: 500
    op_weights:
      delete: 50
      read: 100
      write: 100
      snap_create: 50
      snap_remove: 50
      snap_rollback: 50
    ops: 4000

on commit 2116f012eddfe3278fcdfeb5a2ddc877491d210d
#2

Updated by Sage Weil about 12 years ago

  • Target version set to v0.43
#3

Updated by Sage Weil about 12 years ago

  • Status changed from New to 7
  • Source set to Development

Re-enabling this in my thrashing tests. If all goes well, I'll re-enable it in master under the assumption that Sam's cleanups addressed the problem.

#4

Updated by Sage Weil about 12 years ago

  • Target version changed from v0.43 to v0.44
#5

Updated by Sage Weil about 12 years ago

  • Status changed from 7 to Resolved

Haven't seen this in forever; looks fixed.

#6

Updated by Sage Weil about 12 years ago

  • Status changed from Resolved to 7
  • Target version changed from v0.44 to v0.45

I take that back; this wasn't enabled in QA. Adding it to the teuthology ceph.conf file.

#7

Updated by Sage Weil about 12 years ago

  • Status changed from 7 to Resolved