Bug #9209
osd/ECUtil.h: 66: FAILED assert(offset % stripe_width == 0)
Description
Using
$ ceph --version
ceph version 0.84-562-g8d40600 (8d406001d9b84d9809d181077c61ad9181934752)
The following teuthology job is scheduled:
os_type: ubuntu
os_version: '14.04'
nuke-on-error: false
overrides:
  ceph:
    conf:
      global:
        osd heartbeat grace: 40
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
        mon warn on legacy crush tunables: false
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
    log-whitelist:
    - slow request
    - scrub mismatch
    - ScrubResult
  ceph-deploy:
    branch:
      dev: next
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      branch: master
roles:
- - mon.a
  - osd.0
  - osd.1
  - osd.2
  - osd.3
- - mon.b
  - mon.c
  - osd.4
  - osd.5
  - osd.6
  - osd.7
- - client.0
  - osd.8
  - osd.9
  - osd.10
  - osd.11
  - osd.12
  - osd.13
  - osd.14
  - osd.15
  - osd.16
  - osd.17
suite_path: /home/loic/software/ceph/ceph-qa-suite
tasks:
- install:
    branch: master
- ceph:
    fs: xfs
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    min_in: 10
    timeout: 1200
- rados:
    clients: [client.0]
    ops: 4000
    objects: 500
    ec_pool: true
    erasure_code_profile:
      plugin: jerasure
      k: 6
      m: 2
      technique: reed_sol_van
      ruleset-failure-domain: osd
    op_weights:
      read: 45
      write: 0
      append: 45
      delete: 10
and crashes three OSDs with the following:
osd/ECUtil.h: 66: FAILED assert(offset % stripe_width == 0)
 ceph version 0.84-562-g8d40600 (8d406001d9b84d9809d181077c61ad9181934752)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xb6a24b]
 2: ceph-osd() [0x9ea323]
 3: (ECBackend::start_read_op(int, std::map<hobject_t, ECBackend::read_request_t, std::less<hobject_t>, std::allocator<std::pair<hobject_t const, ECBackend::read_request_t> > >&, std::tr1::shared_ptr<OpRequest>)+0x1019) [0x9f3509]
 4: (ECBackend::dispatch_recovery_messages(RecoveryMessages&, int)+0x624) [0x9f3d54]
 5: (ECBackend::run_recovery_op(PGBackend::RecoveryHandle*, int)+0x2d1) [0x9fb331]
 6: (ReplicatedPG::recover_primary(int, ThreadPool::TPHandle&)+0xaf9) [0x8538c9]
 7: (ReplicatedPG::start_recovery_ops(int, PG::RecoveryCtx*, ThreadPool::TPHandle&, int*)+0x54b) [0x885acb]
 8: (OSD::do_recovery(PG*, ThreadPool::TPHandle&)+0x28b) [0x688a8b]
 9: (OSD::RecoveryWQ::_process(PG*, ThreadPool::TPHandle&)+0x17) [0x6e80e7]
 10: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa46) [0xb5b3d6]
 11: (ThreadPool::WorkThread::entry()+0x10) [0xb5c480]
 12: (()+0x8182) [0x7f9fd315f182]
 13: (clone()+0x6d) [0x7f9fd16cb38d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
This is presumably a side effect of a failure to get the required number of OSDs; the 2147483647 entries in the acting set below are CRUSH_ITEM_NONE, i.e. shards for which no OSD was mapped:
1.3 8 0 0 0 0 9691968 34 34 active+clean 2014-08-23 17:44:43.640373 45'34 45:117 [2147483647,15,10,13,6,2147483647,12,14] 15 [2147483647,15,10,13,6,2147483647,12,14] 15 0'0 2014-08-23 17:42:27.682870 0'0 2014-08-23 17:42:27.682870
using the generated ruleset
$ ceph osd crush rule dump unique_pool_0
{ "rule_id": 1,
  "rule_name": "unique_pool_0",
  "ruleset": 1,
  "type": 3,
  "min_size": 3,
  "max_size": 20,
  "steps": [
        { "op": "set_chooseleaf_tries",
          "num": 5},
        { "op": "take",
          "item": -1,
          "item_name": "default"},
        { "op": "choose_indep",
          "num": 0,
          "type": "osd"},
        { "op": "emit"}]}
for a pool of size 8
pool 1 'unique_pool_0' erasure size 8 min_size 6 crush_ruleset 1 object_hash rjenkins pg_num 26 pgp_num 16 last_change 18 flags hashpspool stripe_width 4224 max_osd 18
Associated revisions
common: ROUND_UP_TO accepts any rounding factor
The ROUND_UP_TO function was limited to rounding factors that are powers
of two. This saves a modulo but it is not used where it would make a
difference. The implementation is changed so it is generic.
http://tracker.ceph.com/issues/9209 Fixes: #9209
Signed-off-by: Loic Dachary <loic-201408@dachary.org>
common: ROUND_UP_TO accepts any rounding factor
The ROUND_UP_TO function was limited to rounding factors that are powers
of two. This saves a modulo but it is not used where it would make a
difference. The implementation is changed so it is generic.
http://tracker.ceph.com/issues/9209 Fixes: #9209
Signed-off-by: Loic Dachary <loic-201408@dachary.org>
(cherry picked from commit 9449520b121fc6ce0c64948386d4ff77f46f4f5f)
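For reference, here is a sketch of the before/after behaviour the commit message describes. The macro bodies below are illustrative, not the exact upstream definitions; the point is that the bit-mask version is only correct when the rounding factor is a power of two:

#include <cassert>

// Mask-based rounding: only correct when d is a power of two.
#define ROUND_UP_TO_MASK(n, d) (((n) + (d) - 1) & ~((d) - 1))

// Generic rounding: one modulo, correct for any d > 0.
#define ROUND_UP_TO(n, d) ((n) % (d) ? ((n) + (d) - (n) % (d)) : (n))

int main() {
  // d = 4096, a power of two: both definitions agree.
  assert(ROUND_UP_TO_MASK(1048577, 4096) == 1052672);
  assert(ROUND_UP_TO(1048577, 4096) == 1052672);

  // d = 4224, the k=6,m=2 stripe width: the mask version returns
  // 1048704, which is not a multiple of 4224 (remainder 1152) --
  // the failure demonstrated by the perl one-liner in note #7.
  assert(ROUND_UP_TO_MASK(1048577, 4224) % 4224 != 0);

  // The generic version rounds correctly: 1051776 == 249 * 4224,
  // matching the extent_requested seen after the fix in note #9.
  assert(ROUND_UP_TO(1048577, 4224) == 1051776);
  return 0;
}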
History
#1 Updated by Loïc Dachary over 9 years ago
The same YAML file run against firefly 0.80.5-171-gca3ac90-1trusty instead of master succeeds.
#2 Updated by Loïc Dachary over 9 years ago
- Status changed from New to 12
The job above passes with k=2,m=1 (in hindsight, presumably because the resulting stripe width stays at the power-of-two default of 4096 = 2 * 2048):
erasure_code_profile:
  plugin: jerasure
  k: 2
  m: 1
  technique: reed_sol_van
  ruleset-failure-domain: osd
#3 Updated by Loïc Dachary over 9 years ago
- Assignee set to Loïc Dachary
The teuthology job reproducing the problem is running on teuthology.front.sepia.com in screen -x -r 17865.loic
#4 Updated by Loïc Dachary over 9 years ago
The stripe width for k=6,m=2 is 4224 instead of the 4096 default. It probably breaks a requirement somewhere.
pool 2 'pool-jerasure' erasure size 8 min_size 6 crush_ruleset 2 object_hash rjenkins pg_num 12 pgp_num 12 last_change 64 flags hashpspool stripe_width 4224
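One plausible derivation of that value (my assumption; the actual chunk-size alignment is computed inside the jerasure plugin and is not quoted in this ticket): the requested 4096-byte stripe is split across the k=6 data chunks, and each chunk is rounded up to the plugin's alignment, yielding 6 chunks of 704 bytes:

#include <cstdio>
#include <cstdint>

int main() {
  const uint64_t k = 6;                // data chunks in the k=6,m=2 profile
  const uint64_t desired_width = 4096; // default erasure-code stripe width
  const uint64_t alignment = 64;       // hypothetical per-chunk alignment

  uint64_t chunk = (desired_width + k - 1) / k;            // ceil(4096/6) = 683
  chunk = (chunk + alignment - 1) / alignment * alignment; // align up -> 704
  std::printf("stripe_width = %llu\n",                     // 6 * 704 = 4224
              (unsigned long long)(k * chunk));
  return 0;
}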
#5 Updated by Loïc Dachary over 9 years ago
2014-08-25 13:15:52.293417 7fa8e74b6700 10 osd.1 pg_epoch: 13 pg[1.7s0( v 13'9 lc 11'1 (0'0,13'9] local-les=13 n=4 ec=10 les/c 13/11 12/12/12) [1,6,11,16,4,2,12,9] r=0 lpr=12 pi=10-11/1 luod=11'7 rops=3 crt=11'7 lcod 0'0 mlcod 0'0 active+recovering+degraded m=3] start_read_op: starting ReadOp(tid=55, to_read={c04f7d07/vpm03011797-46/head//1=read_request_t(to_read=[0,1048576], need=2(5),4(4),6(1),11(2),12(6),16(3), want_attrs=1),3dbfd9e7/vpm03011797-31/head//1=read_request_t(to_read=[0,1048576], need=2(5),4(4),6(1),11(2),12(6),16(3), want_attrs=1),133a4af7/vpm03011797-30/head//1=read_request_t(to_read=[0,1048576], need=2(5),4(4),6(1),11(2),12(6),16(3), want_attrs=1)}, complete={}, priority=10, obj_to_source={}, source_to_obj={}, in_progress=)
to_read=[0,1048576] will call offset_len_to_stripe_bounds and fail, because 1048576 (= 1024*1024) is not a multiple of stripe_width = 4224.
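A minimal check of that arithmetic (a standalone sketch, not the ECUtil.h code itself):

#include <cstdio>
#include <cstdint>

int main() {
  const uint64_t stripe_width = 4224; // from the pool's stripe_width
  const uint64_t len = 1048576;       // the to_read extent, 1024 * 1024

  // 4224 * 248 = 1047552, so the remainder is 1024, not 0 --
  // the condition that makes assert(offset % stripe_width == 0) fail.
  std::printf("%llu %% %llu = %llu\n",
              (unsigned long long)len,
              (unsigned long long)stripe_width,
              (unsigned long long)(len % stripe_width));
  return 0;
}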
#6 Updated by Loïc Dachary over 9 years ago
In the logs, RecoveryOp::IDLE shows
-86> 2014-08-25 13:15:52.292969 7fa8e74b6700 10 osd.1 pg_epoch: 13 pg[1.7s0( v 13'9 lc 11'1 (0'0,13'9] local-les=13 n=4 ec=10 les/c 13/11 12/12/12) [1,6,11,16,4,2,12,9] r=0 lpr=12 pi=10-11/1 luod=11'7 rops=3 crt=11'7 lcod 0'0 mlcod 0'0 active+recovering+degraded m=3] continue_recovery_op: IDLE return RecoveryOp(hoid=3dbfd9e7/vpm03011797-31/head//1 v=11'5 missing_on=1(0) missing_on_shards=0 recovery_info=ObjectRecoveryInfo(3dbfd9e7/vpm03011797-31/head//1@11'5, copy_subset: [], clone_subset: {}) recovery_progress=ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:true) pending_read=0 obc refcount=0 state=READING waiting_on_pushes= extent_requested=0,1048576
with extent_requested=0,1048576 although
$ ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok config get osd_recovery_max_chunk
{ "osd_recovery_max_chunk": "8388608"}
and
pool 1 'unique_pool_0' erasure size 8 min_size 6 crush_ruleset 1 object_hash rjenkins pg_num 16 pgp_num 16 last_change 10 flags hashpspool stripe_width 4224
#7 Updated by Loïc Dachary over 9 years ago
ROUND_UP_TO only works when the rounding factor is a power of two: the bit-mask ~((d)-1) clears the low-order bits correctly only in that case.
$ perl -e '$n = 1 * 1024 * 1024 + 1; $d = 4224; $v = ((($n)+($d)-1) & ~(($d)-1)); print "$n => $v\n" ; print $v % $d ; print "\n"'
1048577 => 1048704
1152
#8 Updated by Loïc Dachary over 9 years ago
Teuthology must have changed the default recovery chunk for the OSDs at runtime because
$ sudo ceph --admin-daemon /var/run/ceph/ceph-osd.2.asok config get osd_recovery_max_chunk
{ "osd_recovery_max_chunk": "1048576"}
#9 Updated by Loïc Dachary over 9 years ago
- Status changed from 12 to Fix Under Review
With the proposed patch the above workload passes. Inspection of the OSD logs shows that recovery progress lines such as
2014-08-25 16:22:58.010631 7fd15ef15700 10 osd.1 pg_epoch: 299 pg[1.9s0( v 166'114 (0'0,166'114] local-les=292 n=3 ec=8 les/c 292/236 291/291/259) [1,12,15,5,10,2,11,6] r=0 lpr=291 pi=8-290/17 rops=3 crt=166'114 mlcod 0'0 active+recovering+degraded] continue_recovery_op: WRITING continue RecoveryOp(hoid=11402609/vpm03019476-166/head//1 v=166'114 missing_on=12(1) missing_on_shards=1 recovery_info=ObjectRecoveryInfo(11402609/vpm03019476-166/head//1@166'114, copy_subset: [], clone_subset: {}) recovery_progress=ObjectRecoveryProgress(!first, data_recovered_to:2103552, data_complete:false, omap_recovered_to:, omap_complete:true) pending_read=0 obc refcount=2 state=IDLE waiting_on_pushes= extent_requested=1051776,1051776
now have extent_requested=1051776,1051776 (and other similar values), where 1051776 = 249 * 4224 is 1024*1024 rounded up to the next multiple of the stripe width.
#10 Updated by Loïc Dachary over 9 years ago
- % Done changed from 0 to 80
- Backport set to firefly
#11 Updated by Loïc Dachary over 9 years ago
- Category set to common
- Target version set to 0.85 cont.
#12 Updated by Loïc Dachary over 9 years ago
#13 Updated by Sage Weil over 9 years ago
- Status changed from Fix Under Review to Pending Backport
#14 Updated by Loïc Dachary over 9 years ago
#15 Updated by Loïc Dachary over 9 years ago
- Status changed from Pending Backport to Resolved
#16 Updated by Loïc Dachary over 9 years ago
- % Done changed from 80 to 100