Bug #5873: osd: unfound object from thrashing when all osds are up - Ceph


closed


Added by Sage Weil over 10 years ago. Updated over 10 years ago.

Status: Duplicate
Priority: Urgent
Assignee: Samuel Just
Category: OSD
Source: Q/A
Severity: 3 - minor

Description

ubuntu@plana26:~$ ceph osd dump
epoch 24
fsid a235a407-d6c3-40cb-9a2d-e02877f6a370
created 2013-08-03 09:55:28.662129
modified 2013-08-03 09:57:06.624303
flags 

pool 0 'data' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 24 pgp_num 24 last_change 7 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 2 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 34 pgp_num 24 last_change 16 owner 0
pool 2 'rbd' rep size 2 min_size 1 crush_ruleset 2 object_hash rjenkins pg_num 34 pgp_num 24 last_change 13 owner 0

max_osd 6
osd.0 up   out weight 0 up_from 3 up_thru 4 down_at 0 last_clean_interval [0,0) 10.214.131.14:6801/30903 10.214.131.14:6803/30903 10.214.131.14:6805/30903 10.214.131.14:6807/30903 exists,up 01aba9ee-208b-4584-8894-8959bf0f43e6
osd.1 up   out weight 0 up_from 3 up_thru 8 down_at 0 last_clean_interval [0,0) 10.214.131.14:6800/30902 10.214.131.14:6802/30902 10.214.131.14:6804/30902 10.214.131.14:6806/30902 exists,up c098f8ad-e4e5-4418-982f-f09441395ae7
osd.2 up   in  weight 1 up_from 2 up_thru 23 down_at 0 last_clean_interval [0,0) 10.214.131.14:6808/30904 10.214.131.14:6809/30904 10.214.131.14:6810/30904 10.214.131.14:6811/30904 exists,up 8a977a93-30e3-4a4c-a9ea-2c871338bc22
osd.3 up   in  weight 1 up_from 3 up_thru 22 down_at 0 last_clean_interval [0,0) 10.214.132.38:6804/25399 10.214.132.38:6805/25399 10.214.132.38:6806/25399 10.214.132.38:6807/25399 exists,up 96836c48-dabb-4794-863a-8b2e8191d326
osd.4 up   in  weight 1 up_from 3 up_thru 22 down_at 0 last_clean_interval [0,0) 10.214.132.38:6800/25397 10.214.132.38:6801/25397 10.214.132.38:6802/25397 10.214.132.38:6803/25397 exists,up 551d1bf2-8c3c-45bd-b472-aef32b061530
osd.5 up   in  weight 1 up_from 3 up_thru 22 down_at 0 last_clean_interval [0,0) 10.214.132.38:6808/25403 10.214.132.38:6809/25403 10.214.132.38:6810/25403 10.214.132.38:6811/25403 exists,up b6ca68f7-bd89-4e98-a288-00dff5db7ea2
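In this dump all six OSDs are up, but osd.0 and osd.1 are marked out (weight 0), and pools 1 and 2 report pg_num 34 against pgp_num 24. A minimal Python sketch that pulls those facts out of the plain-text dump; the regexes are assumptions written against the lines shown here (trimmed after the fields the parser needs), not a general `ceph osd dump` parser:

```python
import re

# Lines copied from the `ceph osd dump` output above, trimmed after the
# fields this sketch needs (pool pg counts, OSD up/down and in/out state).
OSD_DUMP = """\
pool 0 'data' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 24 pgp_num 24 last_change 7
pool 1 'metadata' rep size 2 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 34 pgp_num 24 last_change 16
pool 2 'rbd' rep size 2 min_size 1 crush_ruleset 2 object_hash rjenkins pg_num 34 pgp_num 24 last_change 13
osd.0 up   out weight 0 up_from 3 up_thru 4
osd.1 up   out weight 0 up_from 3 up_thru 8
osd.2 up   in  weight 1 up_from 2 up_thru 23
osd.3 up   in  weight 1 up_from 3 up_thru 22
osd.4 up   in  weight 1 up_from 3 up_thru 22
osd.5 up   in  weight 1 up_from 3 up_thru 22
"""

# Assumed line shapes, matching only the output pasted in this report.
POOL_RE = re.compile(r"^pool (\d+) '([^']+)'.*\bpg_num (\d+) pgp_num (\d+)")
OSD_RE = re.compile(r"^osd\.(\d+)\s+(up|down)\s+(in|out)\b")


def summarize(dump: str):
    """Return (pools, osds) parsed from plain-text `ceph osd dump` lines."""
    pools = {}  # pool name -> (pg_num, pgp_num)
    osds = {}   # osd id -> (is_up, is_in)
    for line in dump.splitlines():
        m = POOL_RE.match(line)
        if m:
            pools[m.group(2)] = (int(m.group(3)), int(m.group(4)))
            continue
        m = OSD_RE.match(line)
        if m:
            osds[int(m.group(1))] = (m.group(2) == "up", m.group(3) == "in")
    return pools, osds


pools, osds = summarize(OSD_DUMP)
total_pgs = sum(pg for pg, _ in pools.values())
up = sorted(i for i, (is_up, _) in osds.items() if is_up)
out = sorted(i for i, (_, is_in) in osds.items() if not is_in)
lagging = sorted(name for name, (pg, pgp) in pools.items() if pg != pgp)

print(f"{len(up)} up, {len(osds) - len(out)} in; out: {out}")
# 6 up, 4 in; out: [0, 1]
print(f"total pg_num: {total_pgs}; pg_num != pgp_num in: {lagging}")
# total pg_num: 92; pg_num != pgp_num in: ['metadata', 'rbd']
```

The 92-PG total agrees with the `pgmap v806: 92 pgs` line in the `ceph -s` output.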

ubuntu@plana26:~$ ceph -s
  cluster a235a407-d6c3-40cb-9a2d-e02877f6a370
   health HEALTH_WARN 1 pgs recovering; 1 pgs stuck unclean; 27 requests are blocked > 32 sec; recovery 838/29008 degraded (2.889%); 1/14504 unfound (0.007%)
   monmap e1: 3 mons at {a=10.214.131.14:6789/0,b=10.214.132.38:6789/0,c=10.214.131.14:6790/0}, election epoch 6, quorum 0,1,2 a,b,c
   osdmap e24: 6 osds: 6 up, 4 in
    pgmap v806: 92 pgs: 91 active+clean, 1 active+recovering; 57932 MB data, 114 GB used, 2495 GB / 2749 GB avail; 838/29008 degraded (2.889%); 1/14504 unfound (0.007%)
   mdsmap e5: 1/1/1 up {0=a=up:active}
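As a sanity check on the status output: every pool is rep size 2, so the 29008 in the degraded counter is exactly twice the 14504 objects, and both reported percentages follow from the raw counters. A small Python sketch of that arithmetic; treating the degraded denominator as object copies, and rounding to three decimals, are assumptions chosen to match the printed values:

```python
# Counters copied from the `ceph -s` output above.
degraded, degraded_total = 838, 29008
unfound, objects = 1, 14504
rep_size = 2  # every pool in the osd dump is "rep size 2"


def pct(num: int, den: int, places: int = 3) -> str:
    """Format num/den as a percentage rounded to `places` decimals."""
    return f"{100 * num / den:.{places}f}%"


# Assumption: the degraded denominator counts object copies (objects x rep size).
print(degraded_total == objects * rep_size)  # True
print(pct(degraded, degraded_total))         # 2.889%
print(pct(unfound, objects))                 # 0.007%
```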

ubuntu@teuthology:/a/teuthology-2013-08-02_01:00:11-rados-next-testing-basic-plana/93499$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: 05542c395ce50bb1750cc6fead85727903fc3e72
machine_type: plana
nuke-on-error: true
os_type: ubuntu
overrides:
  admin_socket:
    branch: next
  ceph:
    conf:
      global:
        ms inject socket failures: 5000
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
    fs: ext4
    log-whitelist:
    - slow request
    sha1: ef036bd4bc0e79bff8a5805800fbdeb0cc2db6ae
  ceph-deploy:
    branch:
      dev: next
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
  install:
    ceph:
      sha1: ef036bd4bc0e79bff8a5805800fbdeb0cc2db6ae
  s3tests:
    branch: next
  workunit:
    sha1: ef036bd4bc0e79bff8a5805800fbdeb0cc2db6ae
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
  - client.0
tasks:
- chef: null
- clock.check: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    chance_pgnum_grow: 3
    chance_pgpnum_fix: 1
    timeout: 1200
- radosbench:
    clients:
    - client.0
    time: 1800
teuthology_branch: next
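The `thrashosds` task above sets `chance_pgnum_grow: 3` and `chance_pgpnum_fix: 1`. Assuming these act as relative weights in a random action picker (an assumption about teuthology's thrasher, not something this report states), growing pg_num is drawn three times as often as fixing pgp_num, which is consistent with the dump showing pg_num ahead of pgp_num on two pools. The sketch below illustrates weight-proportional selection only; the action names and the four OSD-action weights are hypothetical placeholders:

```python
import random
from typing import Sequence, Tuple

# Weights for grow/fix come from the thrashosds config above; the four OSD
# actions and their weights are HYPOTHETICAL placeholders for illustration.
ACTIONS: Sequence[Tuple[str, float]] = (
    ("out_osd", 1.0),     # hypothetical
    ("in_osd", 1.0),      # hypothetical
    ("kill_osd", 1.0),    # hypothetical
    ("revive_osd", 1.0),  # hypothetical
    ("grow_pgnum", 3.0),  # chance_pgnum_grow: 3
    ("fix_pgpnum", 1.0),  # chance_pgpnum_fix: 1
)


def choose_action(actions: Sequence[Tuple[str, float]], val: float) -> str:
    """Return the action whose cumulative-weight bucket contains val.

    With val drawn uniformly from [0, total), each action is chosen with
    probability weight / total.
    """
    total = sum(w for _, w in actions)
    if not 0 <= val < total:
        raise ValueError("val must lie in [0, total weight)")
    for name, weight in actions:
        if val < weight:
            return name
        val -= weight
    return actions[-1][0]  # guard against floating-point round-off


total = sum(w for _, w in ACTIONS)
share = {name: w / total for name, w in ACTIONS}
print(share["grow_pgnum"] / share["fix_pgpnum"])  # 3.0
print(choose_action(ACTIONS, random.random() * total))
```

Passing the draw in as `val` keeps the selection logic deterministic and testable; the random source stays at the call site.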

The cluster is still running.

Related issues: 1 (0 open, 1 closed)


Updated by Sage Weil over 10 years ago

Updated by Ian Colle over 10 years ago

Updated by Samuel Just over 10 years ago