Bug #5873: osd: unfound object from thrashing when all osds are up
Status: Closed
% Done: 0%
Source: Q/A
Severity: 3 - minor
Description
ubuntu@plana26:~$ ceph osd dump
epoch 24
fsid a235a407-d6c3-40cb-9a2d-e02877f6a370
created 2013-08-03 09:55:28.662129
modified 2013-08-03 09:57:06.624303
flags
pool 0 'data' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 24 pgp_num 24 last_change 7 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 2 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 34 pgp_num 24 last_change 16 owner 0
pool 2 'rbd' rep size 2 min_size 1 crush_ruleset 2 object_hash rjenkins pg_num 34 pgp_num 24 last_change 13 owner 0
max_osd 6
osd.0 up out weight 0 up_from 3 up_thru 4 down_at 0 last_clean_interval [0,0) 10.214.131.14:6801/30903 10.214.131.14:6803/30903 10.214.131.14:6805/30903 10.214.131.14:6807/30903 exists,up 01aba9ee-208b-4584-8894-8959bf0f43e6
osd.1 up out weight 0 up_from 3 up_thru 8 down_at 0 last_clean_interval [0,0) 10.214.131.14:6800/30902 10.214.131.14:6802/30902 10.214.131.14:6804/30902 10.214.131.14:6806/30902 exists,up c098f8ad-e4e5-4418-982f-f09441395ae7
osd.2 up in weight 1 up_from 2 up_thru 23 down_at 0 last_clean_interval [0,0) 10.214.131.14:6808/30904 10.214.131.14:6809/30904 10.214.131.14:6810/30904 10.214.131.14:6811/30904 exists,up 8a977a93-30e3-4a4c-a9ea-2c871338bc22
osd.3 up in weight 1 up_from 3 up_thru 22 down_at 0 last_clean_interval [0,0) 10.214.132.38:6804/25399 10.214.132.38:6805/25399 10.214.132.38:6806/25399 10.214.132.38:6807/25399 exists,up 96836c48-dabb-4794-863a-8b2e8191d326
osd.4 up in weight 1 up_from 3 up_thru 22 down_at 0 last_clean_interval [0,0) 10.214.132.38:6800/25397 10.214.132.38:6801/25397 10.214.132.38:6802/25397 10.214.132.38:6803/25397 exists,up 551d1bf2-8c3c-45bd-b472-aef32b061530
osd.5 up in weight 1 up_from 3 up_thru 22 down_at 0 last_clean_interval [0,0) 10.214.132.38:6808/25403 10.214.132.38:6809/25403 10.214.132.38:6810/25403 10.214.132.38:6811/25403 exists,up b6ca68f7-bd89-4e98-a288-00dff5db7ea2

ubuntu@plana26:~$ ceph -s
  cluster a235a407-d6c3-40cb-9a2d-e02877f6a370
   health HEALTH_WARN 1 pgs recovering; 1 pgs stuck unclean; 27 requests are blocked > 32 sec; recovery 838/29008 degraded (2.889%); 1/14504 unfound (0.007%)
   monmap e1: 3 mons at {a=10.214.131.14:6789/0,b=10.214.132.38:6789/0,c=10.214.131.14:6790/0}, election epoch 6, quorum 0,1,2 a,b,c
   osdmap e24: 6 osds: 6 up, 4 in
   pgmap v806: 92 pgs: 91 active+clean, 1 active+recovering; 57932 MB data, 114 GB used, 2495 GB / 2749 GB avail; 838/29008 degraded (2.889%); 1/14504 unfound (0.007%)
   mdsmap e5: 1/1/1 up {0=a=up:active}

ubuntu@teuthology:/a/teuthology-2013-08-02_01:00:11-rados-next-testing-basic-plana/93499$ cat orig.config.yaml
kernel:
  kdb: true
  sha1: 05542c395ce50bb1750cc6fead85727903fc3e72
machine_type: plana
nuke-on-error: true
os_type: ubuntu
overrides:
  admin_socket:
    branch: next
  ceph:
    conf:
      global:
        ms inject socket failures: 5000
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
    fs: ext4
    log-whitelist:
    - slow request
    sha1: ef036bd4bc0e79bff8a5805800fbdeb0cc2db6ae
  ceph-deploy:
    branch:
      dev: next
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
  install:
    ceph:
      sha1: ef036bd4bc0e79bff8a5805800fbdeb0cc2db6ae
  s3tests:
    branch: next
  workunit:
    sha1: ef036bd4bc0e79bff8a5805800fbdeb0cc2db6ae
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
  - client.0
tasks:
- chef: null
- clock.check: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    chance_pgnum_grow: 3
    chance_pgpnum_fix: 1
    timeout: 1200
- radosbench:
    clients:
    - client.0
    time: 1800
teuthology_branch: next
cluster is still running
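As a sanity check on the numbers in the HEALTH_WARN line above, the degraded and unfound ratios can be pulled out of the health summary and recomputed. This is only an illustrative sketch; the `parse_ratios` helper and the hard-coded health string (copied from the `ceph -s` output in this report) are not part of any Ceph tooling.

```python
import re

# Health summary copied verbatim from the `ceph -s` output above.
health = ("HEALTH_WARN 1 pgs recovering; 1 pgs stuck unclean; "
          "27 requests are blocked > 32 sec; "
          "recovery 838/29008 degraded (2.889%); 1/14504 unfound (0.007%)")

def parse_ratios(line):
    """Return ((degraded, total), (unfound, total)) object counts.

    Hypothetical helper: matches the "N/M degraded" and "N/M unfound"
    fragments of a health summary line.
    """
    deg = re.search(r"(\d+)/(\d+) degraded", line)
    unf = re.search(r"(\d+)/(\d+) unfound", line)
    return tuple(map(int, deg.groups())), tuple(map(int, unf.groups()))

(deg, deg_total), (unf, unf_total) = parse_ratios(health)
print(f"degraded: {deg}/{deg_total} = {100 * deg / deg_total:.3f}%")
print(f"unfound:  {unf}/{unf_total} = {100 * unf / unf_total:.3f}%")
```

The recomputed percentages (2.889% degraded, 0.007% unfound) agree with what `ceph -s` printed, so the single unfound object is consistent with the reported totals.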