Bug #9209
osd/ECUtil.h: 66: FAILED assert(offset % stripe_width == 0)
Status: Closed
% Done: 100%
Source: Development
Tags:
Backport: firefly
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Using
$ ceph --version
ceph version 0.84-562-g8d40600 (8d406001d9b84d9809d181077c61ad9181934752)
The following teuthology job is scheduled
os_type: ubuntu
os_version: '14.04'
nuke-on-error: false
overrides:
  ceph:
    conf:
      global:
        osd heartbeat grace: 40
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
        mon warn on legacy crush tunables: false
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
    log-whitelist:
    - slow request
    - scrub mismatch
    - ScrubResult
  ceph-deploy:
    branch:
      dev: next
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      branch: master
roles:
- - mon.a
  - osd.0
  - osd.1
  - osd.2
  - osd.3
- - mon.b
  - mon.c
  - osd.4
  - osd.5
  - osd.6
  - osd.7
- - client.0
  - osd.8
  - osd.9
  - osd.10
  - osd.11
  - osd.12
  - osd.13
  - osd.14
  - osd.15
  - osd.16
  - osd.17
suite_path: /home/loic/software/ceph/ceph-qa-suite
tasks:
- install:
    branch: master
- ceph:
    fs: xfs
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    min_in: 10
    timeout: 1200
- rados:
    clients: [client.0]
    ops: 4000
    objects: 500
    ec_pool: true
    erasure_code_profile:
      plugin: jerasure
      k: 6
      m: 2
      technique: reed_sol_van
      ruleset-failure-domain: osd
    op_weights:
      read: 45
      write: 0
      append: 45
      delete: 10
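For reference, the erasure_code_profile above (k=6, m=2) implies the pool geometry that shows up later in this report. A plain arithmetic sketch, not Ceph code, assuming the default min_size of k for an erasure-coded pool:

#include <cstdint>
#include <iostream>

// Plain arithmetic sketch (not Ceph code) of what the jerasure profile
// k=6, m=2 implies: each object is split into k data chunks plus m coding
// chunks, one chunk per OSD in the PG's acting set.
int main() {
  const uint32_t k = 6;  // data chunks, from the profile above
  const uint32_t m = 2;  // coding chunks, from the profile above
  std::cout << "pool size = " << (k + m) << "\n";  // 8 OSDs per PG
  std::cout << "min_size  = " << k << "\n";        // 6, matching the pool dump below
  std::cout << "tolerated OSD losses per PG = " << m << "\n";
}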
The job crashes three OSDs with the following backtrace
osd/ECUtil.h: 66: FAILED assert(offset % stripe_width == 0)
 ceph version 0.84-562-g8d40600 (8d406001d9b84d9809d181077c61ad9181934752)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xb6a24b]
 2: ceph-osd() [0x9ea323]
 3: (ECBackend::start_read_op(int, std::map<hobject_t, ECBackend::read_request_t, std::less<hobject_t>, std::allocator<std::pair<hobject_t const, ECBackend::read_request_t> > >&, std::tr1::shared_ptr<OpRequest>)+0x1019) [0x9f3509]
 4: (ECBackend::dispatch_recovery_messages(RecoveryMessages&, int)+0x624) [0x9f3d54]
 5: (ECBackend::run_recovery_op(PGBackend::RecoveryHandle*, int)+0x2d1) [0x9fb331]
 6: (ReplicatedPG::recover_primary(int, ThreadPool::TPHandle&)+0xaf9) [0x8538c9]
 7: (ReplicatedPG::start_recovery_ops(int, PG::RecoveryCtx*, ThreadPool::TPHandle&, int*)+0x54b) [0x885acb]
 8: (OSD::do_recovery(PG*, ThreadPool::TPHandle&)+0x28b) [0x688a8b]
 9: (OSD::RecoveryWQ::_process(PG*, ThreadPool::TPHandle&)+0x17) [0x6e80e7]
 10: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa46) [0xb5b3d6]
 11: (ThreadPool::WorkThread::entry()+0x10) [0xb5c480]
 12: (()+0x8182) [0x7f9fd315f182]
 13: (clone()+0x6d) [0x7f9fd16cb38d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
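For context, the failing check is the stripe-alignment invariant in osd/ECUtil.h: logical offsets handed to the EC backend are expected to be multiples of the pool's stripe_width. A minimal stand-in sketch of that invariant (not the actual Ceph code), using this pool's stripe_width of 4224:

#include <cassert>
#include <cstdint>

// Minimal stand-in for the invariant asserted at osd/ECUtil.h:66 (a sketch,
// not the actual Ceph code): an offset passed in for an EC read must be
// aligned to stripe_width before it can be translated to a per-chunk offset.
struct stripe_info_t {
  uint64_t stripe_width;   // k * chunk_size; 4224 for this pool
  uint64_t chunk_size;     // per-OSD chunk size; 704 = 4224 / 6

  uint64_t logical_to_chunk_offset(uint64_t offset) const {
    assert(offset % stripe_width == 0);          // the assert that fires here
    return (offset / stripe_width) * chunk_size;
  }
};

int main() {
  stripe_info_t sinfo{4224, 704};
  sinfo.logical_to_chunk_offset(4224 * 3);  // stripe-aligned: fine
  sinfo.logical_to_chunk_offset(4096);      // not a multiple of 4224: aborts
}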
This is presumably a side effect of a failure to get the required number of OSDs for the PG
1.3 8 0 0 0 0 9691968 34 34 active+clean 2014-08-23 17:44:43.640373 45'34 45:117 [2147483647,15,10,13,6,2147483647,12,14] 15 [2147483647,15,10,13,6,2147483647,12,14] 15 0'0 2014-08-23 17:42:27.682870 0'0 2014-08-23 17:42:27.682870
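The 2147483647 entries in the up and acting sets are, assuming the usual CRUSH encoding, CRUSH_ITEM_NONE (0x7fffffff), i.e. slots CRUSH could not fill. A small sketch of the resulting shard count for the acting set shown above:

#include <cstdint>
#include <iostream>
#include <vector>

// Sketch only: 2147483647 is assumed to be CRUSH_ITEM_NONE, the placeholder
// CRUSH emits when it cannot find an OSD for a slot.
static const int32_t CRUSH_ITEM_NONE = 0x7fffffff;  // 2147483647

int main() {
  std::vector<int32_t> acting = {CRUSH_ITEM_NONE, 15, 10, 13, 6,
                                 CRUSH_ITEM_NONE, 12, 14};
  int mapped = 0;
  for (int32_t osd : acting)
    if (osd != CRUSH_ITEM_NONE)
      ++mapped;
  // Pool size is 8 (k=6, m=2) and min_size is 6: the PG is short two shards
  // but still meets min_size, so it stays active while recovery runs.
  std::cout << "mapped " << mapped << " of 8 shards\n";
}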
The PG is mapped using the generated ruleset
$ ceph osd crush rule dump unique_pool_0
{ "rule_id": 1,
  "rule_name": "unique_pool_0",
  "ruleset": 1,
  "type": 3,
  "min_size": 3,
  "max_size": 20,
  "steps": [
        { "op": "set_chooseleaf_tries",
          "num": 5},
        { "op": "take",
          "item": -1,
          "item_name": "default"},
        { "op": "choose_indep",
          "num": 0,
          "type": "osd"},
        { "op": "emit"}]}
for a pool of size 8
pool 1 'unique_pool_0' erasure size 8 min_size 6 crush_ruleset 1 object_hash rjenkins pg_num 26 pgp_num 16 last_change 18 flags hashpspool stripe_width 4224
max_osd 18
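For what it is worth, stripe_width 4224 is consistent with k=6 once the per-chunk size is rounded up for alignment. The sketch below assumes a 4096-byte target stripe width and a 64-byte chunk alignment required by the jerasure plugin; both values are assumptions, not taken from this log:

#include <cstdint>
#include <iostream>

// Rough sketch (assumed constants, not taken from this log) of where the
// pool's stripe_width 4224 could come from: the target stripe width is split
// across k data chunks and each chunk is rounded up to the plugin alignment.
int main() {
  const uint64_t k = 6;                // data chunks, from the EC profile
  const uint64_t target_width = 4096;  // assumed default target stripe width
  const uint64_t alignment = 64;       // assumed per-chunk alignment

  uint64_t chunk = (target_width + k - 1) / k;               // 683
  chunk = ((chunk + alignment - 1) / alignment) * alignment;  // round up: 704
  std::cout << "stripe_width = " << k * chunk << "\n";        // 4224
}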