Bug #2002: osd: racy push/pull for clones - Ceph - Ceph

Added by Sage Weil about 12 years ago. Updated about 12 years ago.

Status: Resolved
Priority: Normal
Category: OSD
Target version: v0.45
Source: Development

Description

There is currently a race where:
- an adjacent clone is missing
- we (possibly after calculating some clone overlap) start pulling
- we get the adjacent clone
- we get the push, calculate a different overlap, and then get confused.
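
The sequence above can be illustrated with a toy model. This is only a sketch of the kind of mismatch described, not Ceph's code: the class `Puller`, the function `bytes_to_transfer`, and the representation of an overlap as a set of byte offsets are all hypothetical. It assumes that bytes shared with a locally present adjacent clone need not be transferred, and shows how recomputing the overlap when the push arrives can disagree with what was computed when the pull started. Remembering the pull-time result is shown as one possible way to keep the two sides consistent.

```python
# Toy model only: all names and the data layout are hypothetical, not Ceph's.

def bytes_to_transfer(size, overlap_with_neighbor, neighbor_present):
    """Return the set of byte offsets that must come over the wire.

    Assumption: if the adjacent clone is present locally, the overlapping
    bytes can be copied from it; otherwise everything must be transferred.
    """
    everything = set(range(size))
    if neighbor_present:
        return everything - overlap_with_neighbor
    return everything


class Puller:
    def __init__(self, size, overlap):
        self.size = size
        self.overlap = overlap          # offsets shared with the adjacent clone
        self.neighbor_present = False   # the adjacent clone starts out missing
        self.requested = None           # what we asked for when the pull started

    def start_pull(self):
        # Overlap is evaluated against the state at pull time.
        self.requested = bytes_to_transfer(
            self.size, self.overlap, self.neighbor_present)
        return self.requested

    def neighbor_arrives(self):
        # The adjacent clone shows up while the pull is still in flight.
        self.neighbor_present = True

    def expected_on_push_racy(self):
        # Recomputes from the *current* state, which may have changed.
        return bytes_to_transfer(
            self.size, self.overlap, self.neighbor_present)

    def expected_on_push_stable(self):
        # Reuses the value recorded at pull time, so it cannot drift.
        return self.requested


p = Puller(size=8, overlap={0, 1, 2, 3})
req = p.start_pull()
p.neighbor_arrives()
mismatch = p.expected_on_push_racy() != req
```

In this model `mismatch` ends up true because the racy path now expects only the non-overlapping bytes, while the pull asked for the whole object.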

Also, pulling clones in parallel is inefficient. We should probably serialize pulls on each object_t so that we don't waste disk space. Recovery will probably still be faster.
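
A rough sketch of what serializing on each object could look like follows. It is an illustration under stated assumptions, not the OSD's recovery code: `PerObjectPullQueue` and its methods are hypothetical names, objects are identified by a plain string standing in for `object_t`, and clones by an integer. The idea is that at most one clone pull per object is in flight, while pulls for different objects still proceed in parallel.

```python
from collections import defaultdict, deque

# Hypothetical scheduler: at most one in-flight clone pull per object name.


class PerObjectPullQueue:
    def __init__(self):
        self.in_flight = {}                 # object name -> clone being pulled
        self.waiting = defaultdict(deque)   # object name -> queued clones

    def request(self, obj, clone):
        """Return True if the pull may start now, False if it was queued."""
        if obj in self.in_flight:
            self.waiting[obj].append(clone)
            return False
        self.in_flight[obj] = clone
        return True

    def complete(self, obj):
        """Finish the current pull for obj; return the next clone to start, or None."""
        del self.in_flight[obj]
        queue = self.waiting.get(obj)
        if queue:
            nxt = queue.popleft()
            self.in_flight[obj] = nxt
            if not queue:
                del self.waiting[obj]
            return nxt
        self.waiting.pop(obj, None)
        return None


q = PerObjectPullQueue()
started_first = q.request("foo", 1)    # starts immediately
started_second = q.request("foo", 2)   # same object: queued behind clone 1
started_other = q.request("bar", 1)    # different object: starts immediately
next_clone = q.complete("foo")         # clone 2 of "foo" may start now
```

With this ordering, a later clone of the same object is only pulled after its predecessor has landed, which is presumably what would let it reuse overlapping data instead of storing a second copy.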

Related issues 2 (0 open — 2 closed)

Updated by Sage Weil about 12 years ago

sage@metropolis.ceph.dreamhost.com:osd.log.badpushpull

shows this (or a similar) failure. The workload was:

kernel:
  branch: master
interactive-on-error: true

roles:
- - mon.a
  - mds.a
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mon.c
  - client.0
  - osd.3
  - osd.4
  - osd.5
tasks:
- ceph:
    btrfs: 1
    log-whitelist:
    - wrongly marked me down or wrong addr
    conf:
      osd:
        debug ms: 1
        debug osd: 20
      mon:
        debug ms: 10
        debug mon: 20
- thrashosds:
- rados:
    clients:
    - client.0
    objects: 500
    op_weights:
      delete: 50
      read: 100
      write: 100
      snap_create: 50
      snap_remove: 50
      snap_rollback: 50
    ops: 4000

on commit 2116f012eddfe3278fcdfeb5a2ddc877491d210d
