Bug #5873

closed

osd: unfound object from thrashing when all osds are up

Added by Sage Weil over 10 years ago. Updated over 10 years ago.

Status: Duplicate
Priority: Urgent
Assignee:
Category: OSD
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

ubuntu@plana26:~$ ceph osd dump
epoch 24
fsid a235a407-d6c3-40cb-9a2d-e02877f6a370
created 2013-08-03 09:55:28.662129
modified 2013-08-03 09:57:06.624303
flags 

pool 0 'data' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 24 pgp_num 24 last_change 7 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 2 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 34 pgp_num 24 last_change 16 owner 0
pool 2 'rbd' rep size 2 min_size 1 crush_ruleset 2 object_hash rjenkins pg_num 34 pgp_num 24 last_change 13 owner 0

max_osd 6
osd.0 up   out weight 0 up_from 3 up_thru 4 down_at 0 last_clean_interval [0,0) 10.214.131.14:6801/30903 10.214.131.14:6803/30903 10.214.131.14:6805/30903 10.214.131.14:6807/30903 exists,up 01aba9ee-208b-4584-8894-8959bf0f43e6
osd.1 up   out weight 0 up_from 3 up_thru 8 down_at 0 last_clean_interval [0,0) 10.214.131.14:6800/30902 10.214.131.14:6802/30902 10.214.131.14:6804/30902 10.214.131.14:6806/30902 exists,up c098f8ad-e4e5-4418-982f-f09441395ae7
osd.2 up   in  weight 1 up_from 2 up_thru 23 down_at 0 last_clean_interval [0,0) 10.214.131.14:6808/30904 10.214.131.14:6809/30904 10.214.131.14:6810/30904 10.214.131.14:6811/30904 exists,up 8a977a93-30e3-4a4c-a9ea-2c871338bc22
osd.3 up   in  weight 1 up_from 3 up_thru 22 down_at 0 last_clean_interval [0,0) 10.214.132.38:6804/25399 10.214.132.38:6805/25399 10.214.132.38:6806/25399 10.214.132.38:6807/25399 exists,up 96836c48-dabb-4794-863a-8b2e8191d326
osd.4 up   in  weight 1 up_from 3 up_thru 22 down_at 0 last_clean_interval [0,0) 10.214.132.38:6800/25397 10.214.132.38:6801/25397 10.214.132.38:6802/25397 10.214.132.38:6803/25397 exists,up 551d1bf2-8c3c-45bd-b472-aef32b061530
osd.5 up   in  weight 1 up_from 3 up_thru 22 down_at 0 last_clean_interval [0,0) 10.214.132.38:6808/25403 10.214.132.38:6809/25403 10.214.132.38:6810/25403 10.214.132.38:6811/25403 exists,up b6ca68f7-bd89-4e98-a288-00dff5db7ea2

ubuntu@plana26:~$ ceph -s
  cluster a235a407-d6c3-40cb-9a2d-e02877f6a370
   health HEALTH_WARN 1 pgs recovering; 1 pgs stuck unclean; 27 requests are blocked > 32 sec; recovery 838/29008 degraded (2.889%); 1/14504 unfound (0.007%)
   monmap e1: 3 mons at {a=10.214.131.14:6789/0,b=10.214.132.38:6789/0,c=10.214.131.14:6790/0}, election epoch 6, quorum 0,1,2 a,b,c
   osdmap e24: 6 osds: 6 up, 4 in
    pgmap v806: 92 pgs: 91 active+clean, 1 active+recovering; 57932 MB data, 114 GB used, 2495 GB / 2749 GB avail; 838/29008 degraded (2.889%); 1/14504 unfound (0.007%)
   mdsmap e5: 1/1/1 up {0=a=up:active}

ubuntu@teuthology:/a/teuthology-2013-08-02_01:00:11-rados-next-testing-basic-plana/93499$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: 05542c395ce50bb1750cc6fead85727903fc3e72
machine_type: plana
nuke-on-error: true
os_type: ubuntu
overrides:
  admin_socket:
    branch: next
  ceph:
    conf:
      global:
        ms inject socket failures: 5000
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
    fs: ext4
    log-whitelist:
    - slow request
    sha1: ef036bd4bc0e79bff8a5805800fbdeb0cc2db6ae
  ceph-deploy:
    branch:
      dev: next
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
  install:
    ceph:
      sha1: ef036bd4bc0e79bff8a5805800fbdeb0cc2db6ae
  s3tests:
    branch: next
  workunit:
    sha1: ef036bd4bc0e79bff8a5805800fbdeb0cc2db6ae
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
  - client.0
tasks:
- chef: null
- clock.check: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    chance_pgnum_grow: 3
    chance_pgpnum_fix: 1
    timeout: 1200
- radosbench:
    clients:
    - client.0
    time: 1800
teuthology_branch: next

The cluster is still running.
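
For triage, the degraded and unfound figures in the `ceph -s` health line above can be extracted and cross-checked against the percentages Ceph printed. The following is only a minimal sketch: the `parse_counts` helper and the regular expression are illustrative assumptions, not part of Ceph or teuthology, and the health string is copied from the output in this report.

```python
import re

# Health line copied from the `ceph -s` output in this report.
HEALTH = (
    "HEALTH_WARN 1 pgs recovering; 1 pgs stuck unclean; "
    "27 requests are blocked > 32 sec; "
    "recovery 838/29008 degraded (2.889%); 1/14504 unfound (0.007%)"
)


def parse_counts(health):
    """Hypothetical helper: map 'degraded'/'unfound' to (count, total)."""
    counts = {}
    for n, total, kind in re.findall(r"(\d+)/(\d+) (degraded|unfound)", health):
        counts[kind] = (int(n), int(total))
    return counts


counts = parse_counts(HEALTH)
for kind, (n, total) in counts.items():
    # Recompute the percentage to compare with what the cluster reported.
    print(f"{kind}: {n}/{total} = {100 * n / total:.3f}%")
```

The degraded total (29008) is exactly twice the unfound total (14504), which matches the `rep size 2` pools in the `ceph osd dump` output if the degraded figure counts object copies and the unfound figure counts objects.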

Related issues 1 (0 open, 1 closed)

Is duplicate of Ceph - Bug #5799: SIGABRT in build_push_op -> object_info_t::decode (Resolved, Samuel Just, 07/29/2013)
