Bug #2061

osd: scrub mismatch

Added by Sage Weil about 8 years ago. Updated about 8 years ago.

Status: Resolved
Priority: High
Assignee: -
Category: OSD
% Done: 0%
Regression: No
Severity: 3 - minor

Description

New one, "[ERR] 0.c scrub stat mismatch, got 6/6 objects, 2/5 clones, 13511948/13511948 bytes."

Workload was

  kernel:
    sha1: eda84b58922928516e6e62af85430b7c9705b6cf
  nuke-on-error: true
  overrides:
    ceph:
      coverage: true
      fs: xfs
      log-whitelist:
      - clocks not synchronized
      - old request
      sha1: b54bac3061666b1c781351154b1f3d78242709ec
  roles:
  - - mon.a
    - osd.0
    - osd.1
    - osd.2
  - - mds.a
    - client.0
    - osd.3
    - osd.4
    - osd.5
  tasks:
  - chef: null
  - ceph:
      log-whitelist:
      - wrongly marked me down or wrong addr
  - thrashosds: null
  - rados:
      clients:
      - client.0
      objects: 50
      op_weights:
        delete: 50
        read: 100
        snap_create: 50
        snap_remove: 50
        snap_rollback: 50
        write: 100
      ops: 4000

and "[ERR] 0.a backfill osd.1 stat mismatch on finish: num_bytes 512000 != expected 516096"

  kernel:
    sha1: eda84b58922928516e6e62af85430b7c9705b6cf
  nuke-on-error: true
  overrides:
    ceph:
      coverage: true
      fs: btrfs
      log-whitelist:
      - clocks not synchronized
      - old request
      sha1: b54bac3061666b1c781351154b1f3d78242709ec
  roles:
  - - mon.a
    - mds.a
    - osd.0
    - osd.1
  - - mon.b
    - mon.c
    - osd.2
  tasks:
  - chef: null
  - ceph:
      conf:
        osd:
          osd min pg log entries: 5
      log-whitelist:
      - wrongly marked me down or wrong addr
  - backfill: null
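The errors above report stats in a got/expected form: "2/5 clones" means the scrub counted 2 clones among the objects actually present, while the PG's recorded stats claimed 5. A minimal sketch of that comparison, with illustrative field names loosely modeled on Ceph's per-PG stat sums (not its real API), might look like:

```python
# Hypothetical sketch of the stat check a scrub performs: counts aggregated
# from the objects found on disk are compared against the stats recorded in
# the PG info, and each disagreeing field yields one "got X/Y" error.
# Field and function names here are illustrative, not Ceph's actual code.

def scrub_stat_mismatches(scrubbed: dict, recorded: dict) -> list[str]:
    """Return one error string per stat field where scrubbed != recorded."""
    errors = []
    for field, label in [("num_objects", "objects"),
                         ("num_object_clones", "clones"),
                         ("num_bytes", "bytes")]:
        got, expected = scrubbed[field], recorded[field]
        if got != expected:
            errors.append(f"stat mismatch, got {got}/{expected} {label}")
    return errors

# The 0.c error above: object and byte counts agree, the clone count does not.
scrubbed = {"num_objects": 6, "num_object_clones": 2, "num_bytes": 13511948}
recorded = {"num_objects": 6, "num_object_clones": 5, "num_bytes": 13511948}
print(scrub_stat_mismatches(scrubbed, recorded))
```

A transient disagreement like this can arise when the recorded stats are updated (e.g. on clone removal during snap trimming or backfill) out of step with the on-disk objects the scrub walks.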

History

#1 Updated by Sage Weil about 8 years ago

Oooooh, these went away and I was confused. But then I just ran the regression suite against next and hit them again:


12424: collection:thrash clusters:6-osd-2-machine.yaml fs:btrfs.yaml thrashers:default.yaml workloads:snaps-few-objects.yaml
    "2012-02-17 13:53:34.665946 osd.3 10.3.14.174:6800/9147 140 : [ERR] 0.12 scrub stat mismatch, got 14/14 objects, 11/14 clones, 25710376/25710376 bytes." in cluster log
12429: collection:thrash clusters:6-osd-2-machine.yaml fs:xfs.yaml thrashers:default.yaml workloads:snaps-few-objects.yaml
    "2012-02-17 14:17:32.195973 osd.0 10.3.14.175:6806/17153 53 : [ERR] 0.e scrub stat mismatch, got 42/42 objects, 35/37 clones, 71067132/71067132 bytes." in cluster log

I bet these were fixed by the recovery refactor.

#2 Updated by Sage Weil about 8 years ago

  • Status changed from New to Resolved

Pretty sure this was fixed by the recovery refactor... haven't hit it since then.
