Project

General

Profile

Bug #3428

osdc/Striper.cc: 275: FAILED assert(p->first == end - p->second.second)

Added by Sage Weil over 11 years ago. Updated over 11 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
librbd
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2012-10-30T15:13:01.009 INFO:teuthology.orchestra.run.err:osdc/Striper.cc: In function 'void Striper::StripedReadResult::assemble_result(CephContext*, ceph::bufferlist&, bool)' thread 7f3c9e7fc700 time 2012-10-30 15:12:52.503629
2012-10-30T15:13:01.009 INFO:teuthology.orchestra.run.err:osdc/Striper.cc: 275: FAILED assert(p->first == end - p->second.second)
2012-10-30T15:13:01.012 INFO:teuthology.orchestra.run.err: ceph version 0.53-541-g184a676 (184a676e64cdcd125f06904aa0fb93727381f1a5)
2012-10-30T15:13:01.013 INFO:teuthology.orchestra.run.err: 1: (Striper::StripedReadResult::assemble_result(CephContext*, ceph::buffer::list&, bool)+0x351) [0x7f3ca9a74551]
2012-10-30T15:13:01.013 INFO:teuthology.orchestra.run.err: 2: (librbd::AioCompletion::complete_request(CephContext*, long)+0x779) [0x7f3ca9a2d639]
2012-10-30T15:13:01.013 INFO:teuthology.orchestra.run.err: 3: (librbd::C_AioRead::finish(int)+0x191) [0x7f3ca9a2dfd1]
2012-10-30T15:13:01.013 INFO:teuthology.orchestra.run.err: 4: (Context::complete(int)+0x12) [0x7f3ca9a2e492]
2012-10-30T15:13:01.014 INFO:teuthology.orchestra.run.err: 5: (librbd::C_CacheRead::finish(int)+0x16) [0x7f3ca9a2ce96]
2012-10-30T15:13:01.014 INFO:teuthology.orchestra.run.err: 6: (Context::complete(int)+0x12) [0x7f3ca9a2e492]
2012-10-30T15:13:01.014 INFO:teuthology.orchestra.run.err: 7: (librbd::ImageCtx::aio_read_from_cache(object_t, ceph::buffer::list*, unsigned long, unsigned long, Context*)+0x2f4) [0x7f3ca9a3fb24]
2012-10-30T15:13:01.014 INFO:teuthology.orchestra.run.err: 8: (librbd::aio_read(librbd::ImageCtx*, std::vector<std::pair<unsigned long, unsigned long>, std::allocator<std::pair<unsigned long, unsigned long> > > const&, char*, ceph::buffer::list*, librbd::AioCompl
etion*)+0x941) [0x7f3ca9a56801]
2012-10-30T15:13:01.014 INFO:teuthology.orchestra.run.err: 9: (librbd::AioRequest::read_from_parent(std::vector<std::pair<unsigned long, unsigned long>, std::allocator<std::pair<unsigned long, unsigned long> > >&)+0x9a) [0x7f3ca9a2ea1a]
2012-10-30T15:13:01.014 INFO:teuthology.orchestra.run.err: 10: (librbd::AbstractWrite::should_complete(int)+0x9d5) [0x7f3ca9a301b5]
2012-10-30T15:13:01.015 INFO:teuthology.orchestra.run.err: 11: (librbd::rados_req_cb(void*, void*)+0x27) [0x7f3ca9a42957]
2012-10-30T15:13:01.015 INFO:teuthology.orchestra.run.err: 12: (librados::C_AioSafe::finish(int)+0x1e) [0x7f3ca8dfbdae]
2012-10-30T15:13:01.015 INFO:teuthology.orchestra.run.err: 13: (Finisher::finisher_thread_entry()+0x218) [0x7f3ca8e64bd8]
2012-10-30T15:13:01.015 INFO:teuthology.orchestra.run.err: 14: (()+0x7e9a) [0x7f3ca8402e9a]
2012-10-30T15:13:01.015 INFO:teuthology.orchestra.run.err: 15: (clone()+0x6d) [0x7f3ca870a4bd]
2012-10-30T15:13:01.016 INFO:teuthology.orchestra.run.err: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

task is

kernel: &id001
  kdb: true
  sha1: 0ed7285e0001b960c888e5455ae982025210ed3d
nuke-on-error: true
overrides:
  ceph:
    conf:
      client:
        rbd cache: true
      global:
        ms inject socket failures: 5000
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: 184a676e64cdcd125f06904aa0fb93727381f1a5
  s3tests:
    branch: master
  workunit:
    sha1: 184a676e64cdcd125f06904aa0fb93727381f1a5
roles:
- - mon.a
  - osd.0
  - osd.1
  - osd.2
- - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
targets:
  ubuntu@plana36.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCe7CpJbnd7W2/n42TTTjDArnVkyZfbRANfmkdgfDM+6AYg6qd9wUhes6LP++eMvhuM96Sz5W4380o8OME0cguG1LkkADbm8pQbPAPZwF1Fj28YxgZKpc2PTPsF+sjOujC+AaXaQ82ffSkLL0oElKZgAiFEGCytSdUNFHZxjztDIOoWlt7kylQCy4sJCEbND8JFwFfeGyyePvMl3CNdbnR7H5GuyIx70iglLBO/XFwArjeOUZ/FboRZWOBivpZQf9IMy8k2rQetzxTyugd7cTVdq1G5N5NeHpbQfv286G2oDaZj1HT252jDF04UP083zMxH1W9gmOoUKzIhl+iXaNLZ
  ubuntu@plana39.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDo+Kh24vRxeTQ6/n5PIIGuxrPHPRO/xMQlwoLHi7mR01cIXJMG5wet7mp2om3/5SZSDcLBHduDKrdWL142Sg5fC0zZPUggbxS7nz/UCjYBzMsOtHEUAU5Gs0KFopOCHXNEveK95ezsroMAD5+jS/IEpiooYCkrR3H+NSvUU0Ae352PlXqV0vamkYzyQyEMmhFE50ALhUXbKMve3d2mxJee5sqVZSBmQTbze9RKUA96t9iiwiheflXbN1i9WHlbBOIue5pZ5fM3/vqPWgaShfFpa0pT56QKJfjyFcDeCLOislo23E5qKAJOi5vn5BoYVtG3niNQpt/YbYGfDEHVeqt9
  ubuntu@plana40.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDEwyNlwC9Utqf3PCjL2JR4wwDkzpdEJuW93DOW82vYVisYEGod454JwXeNkjqzTUk6tXeRoUM9f/C6sZS3LFgHcMYt6m0sxP8DC4qU+q0YxCw9zLY8bXKe4DDjijM62h/SnyqyOWIh9amGT7wRwZEHBV1BKvZbNxQIJ7ESkuKsk/tJfWKhq7dSw6E/+MZ4yQtXvTyaJ3pK96Hq2uoUkawv+FxXBrzG3FtTTYA8gqA1SIiV3erEIQuBK/WD74i5yK4rwpfGTo7jNc0V6wrwO1BKFj/OGjSC+2LSAkBgf8WLe6UL/dHr3bBEyzm0V4xMf5Iqb8JGvkaXNEfbFqzKC2Wv
tasks:
- internal.lock_machines: 3
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    timeout: 1200
- rbd_fsx:
    clients:
    - client.0
    ops: 2000

ubuntu@teuthology:/a/sage-2012-10-30_13:31:12-regression-master-testing-basic/5723
ubuntu@teuthology:/a/sage-2012-10-30_13:31:12-regression-master-testing-basic/5728

Related issues

Related to Ceph - Bug #3567: librbd: striping completions modifying global state outside lock Resolved 12/03/2012

Associated revisions

Revision b35e37fb (diff)
Added by Sage Weil over 11 years ago

osdc/Striper: fix handling for sparse reads in add_partial_sparse_result()

If bl_map begins after the first item in buffer_extents, we want to
skip only the first buffer extent before doing 'continue' to loop to the
next one.

This fixes a crash caused by underflow with a pattern like:

2012-11-20 13:54:30.347861 7f9404ed6700 10 striper add_partial_sparse_result(0x1efa088) 192 covering {12288=192} (offset 2906) to [0,5286,38054,4288]
2012-11-20 13:54:30.347863 7f9404ed6700 20 striper t 0~5286 bl has 192 off 2906
2012-11-20 13:54:30.347866 7f9404ed6700 20 striper s gap 9382, skipping
2012-11-20 13:54:30.347867 7f9404ed6700 20 striper s has 192, copying
2012-11-20 13:54:30.347872 7f9404ed6700 20 striper t 9574~18446744073709547328 bl has 0 off 12480
2012-11-20 13:54:30.347874 7f9404ed6700 20 striper s at end
2012-11-20 13:54:30.347876 7f9404ed6700 20 striper t 38054~4288 bl has 0 off 12480
2012-11-20 13:54:30.347877 7f9404ed6700 20 striper s at end

Dan reproduced this with

./test_librbd_fsx -d -W -R -p 10 -t 1 -S 4 -N 300 rbd fsx

(although I was unable to do so).

Re-fixes #3428.

Reported-and-tested-by: Dan Mick <>
Signed-off-by: Sage Weil <>

History

#1 Updated by Sage Weil over 11 years ago

  • Status changed from 12 to Resolved

fixed in wip-rbd-read branch, commit:40a5cebe26c216443a9c19477287644c22c5a845

#2 Updated by Dan Mick over 11 years ago

Running test_librbd_fsx I encountered this again (with master, including
the changes in wip-rbd-read):

272 read 0x6b5a thru 0x110bf (0xa566 bytes)
osdc/Striper.cc: In function 'void Striper::StripedReadResult::assemble_result(CephContext*, ceph::bufferlist&, bool)' thread 7fa644870700 time 2012-11-19 12:38:29.464006
osdc/Striper.cc: 278: FAILED assert(p->first == end - p->second.second)
ceph version 0.54-669-gde12ae9 (de12ae9803e7aeb19f94aa4c9c587b54b12e5c32)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x95) [0x7fa64a455569]
2: (Striper::StripedReadResult::assemble_result(CephContext*, ceph::buffer::list&, bool)+0x3c5) [0x7fa64b2e440d]
3: (librbd::AioCompletion::finalize(CephContext*, long)+0x1ec) [0x7fa64b27ff5a]
4: (librbd::AioCompletion::complete_request(CephContext*, long)+0x295) [0x7fa64b2805a7]
5: (librbd::C_AioRead::finish(int)+0x3f2) [0x7fa64b2809e4]
6: (Context::complete(int)+0x2b) [0x7fa64b280ea9]
7: (librbd::AioRequest::complete(int)+0x6f) [0x7fa64b280f39]
8: (librbd::rados_req_cb(void*, void*)+0x34) [0x7fa64b2b0e14]
9: (librados::C_AioComplete::finish(int)+0x4d) [0x7fa64a3dde79]
10: (Finisher::finisher_thread_entry()+0x348) [0x7fa64a454554]
11: (Finisher::FinisherThread::entry()+0x1c) [0x7fa64a3c57d0]
12: (Thread::_entry_func(void*)+0x23) [0x7fa64a601011]
13: (()+0x7e9a) [0x7fa6493bee9a]
14: (clone()+0x6d) [0x7fa6496c7cbd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'
Aborted

#3 Updated by Dan Mick over 11 years ago

...and this seems to reproduce it so far:

./test_librbd_fsx -d -W -R -p 10 -t 1 -S 4 -N 300 rbd fsx

#4 Updated by Sage Weil over 11 years ago

  • Status changed from Resolved to In Progress

#5 Updated by Sage Weil over 11 years ago

  • Status changed from In Progress to Resolved

commit:ba1e2cc68b236e2cd79dc2c5821ed47080746a86

#7 Updated by Tamilarasi muthamizhan over 11 years ago

  • Status changed from Resolved to In Progress
  • Assignee changed from Sage Weil to Dan Mick

ubuntu@teuthology:/a/teuthology-2012-12-04_19:00:05-regression-next-testing-basic/8665

#8 Updated by Tamilarasi muthamizhan over 11 years ago

Recent logs: ubuntu@teuthology:/a/teuthology-2012-12-04_19:00:05-regression-next-testing-basic/8664

2012-12-04T23:44:34.023 INFO:teuthology.orchestra.run.err:osdc/Striper.cc: 281: FAILED assert(p->first == end - p->second.second)
2012-12-04T23:44:34.023 INFO:teuthology.orchestra.run.err: ceph version 0.55-135-g4cdc30b (4cdc30b94319ba1ef72bd2d976651dc889793d61)
2012-12-04T23:44:34.024 INFO:teuthology.orchestra.run.err: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x95) [0x7f6dea4b698d]
2012-12-04T23:44:34.024 INFO:teuthology.orchestra.run.err: 2: (Striper::StripedReadResult::assemble_result(CephContext*, ceph::buffer::list&, bool)+0x38f) [0x7f6deb30e725]
2012-12-04T23:44:34.024 INFO:teuthology.orchestra.run.err: 3: (librbd::AioCompletion::finalize(CephContext*, long)+0x1d7) [0x7f6deb2ae8af]
2012-12-04T23:44:34.024 INFO:teuthology.orchestra.run.err: 4: (librbd::AioCompletion::complete_request(CephContext*, long)+0x27a) [0x7f6deb2aeea2]
2012-12-04T23:44:34.024 INFO:teuthology.orchestra.run.err: 5: (librbd::C_AioRead::finish(int)+0x3c8) [0x7f6deb2af2b2]
2012-12-04T23:44:34.024 INFO:teuthology.orchestra.run.err: 6: (Context::complete(int)+0x2b) [0x7f6deb2af707]
2012-12-04T23:44:34.024 INFO:teuthology.orchestra.run.err: 7: (librbd::AioRequest::complete(int)+0x6f) [0x7f6deb2af797]
2012-12-04T23:44:34.024 INFO:teuthology.orchestra.run.err: 8: (librbd::rbd_req_cb(void*, void*)+0x3c) [0x7f6deb2debf0]
2012-12-04T23:44:34.025 INFO:teuthology.orchestra.run.err: 9: (librbd::AioCompletion::complete()+0xd4) [0x7f6deb2af906]
2012-12-04T23:44:34.025 INFO:teuthology.orchestra.run.err: 10: (librbd::AioCompletion::complete_request(CephContext*, long)+0x289) [0x7f6deb2aeeb1]
2012-12-04T23:44:34.025 INFO:teuthology.orchestra.run.err: 11: (librbd::C_AioRead::finish(int)+0x3c8) [0x7f6deb2af2b2]
2012-12-04T23:44:34.025 INFO:teuthology.orchestra.run.err: 12: (Context::complete(int)+0x2b) [0x7f6deb2af707]
2012-12-04T23:44:34.025 INFO:teuthology.orchestra.run.err: 13: (librbd::AioRequest::complete(int)+0x6f) [0x7f6deb2af797]
2012-12-04T23:44:34.025 INFO:teuthology.orchestra.run.err: 14: (librbd::C_CacheRead::finish(int)+0x24) [0x7f6deb2af31e]
2012-12-04T23:44:34.026 INFO:teuthology.orchestra.run.err: 15: (Context::complete(int)+0x2b) [0x7f6deb2af707]
2012-12-04T23:44:34.026 INFO:teuthology.orchestra.run.err: 16: (ObjectCacher::C_RetryRead::finish(int)+0xb4) [0x7f6deb2fd1b6]
2012-12-04T23:44:34.026 INFO:teuthology.orchestra.run.err: 17: (Context::complete(int)+0x2b) [0x7f6deb2af707]
2012-12-04T23:44:34.026 INFO:teuthology.orchestra.run.err: 18: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x267) [0x7f6deb2fa7b5]
2012-12-04T23:44:34.026 INFO:teuthology.orchestra.run.err: 19: (ObjectCacher::bh_read_finish(long, sobject_t, long, unsigned long, ceph::buffer::list&, int)+0x13af) [0x7f6deb2ee785]
2012-12-04T23:44:34.026 INFO:teuthology.orchestra.run.err: 20: (ObjectCacher::C_ReadFinish::finish(int)+0x6e) [0x7f6deb2fcf10]
2012-12-04T23:44:34.027 INFO:teuthology.orchestra.run.err: 21: (Context::complete(int)+0x2b) [0x7f6deb2af707]
2012-12-04T23:44:34.027 INFO:teuthology.orchestra.run.err: 22: (librbd::C_Request::finish(int)+0x142) [0x7f6deb2e73f4]
2012-12-04T23:44:34.027 INFO:teuthology.orchestra.run.err: 23: (librbd::context_cb(void*, void*)+0x40) [0x7f6deb2e693c]
2012-12-04T23:44:34.027 INFO:teuthology.orchestra.run.err: 24: (librados::C_AioComplete::finish(int)+0x4d) [0x7f6dea440b41]
2012-12-04T23:44:34.027 INFO:teuthology.orchestra.run.err: 25: (Finisher::finisher_thread_entry()+0x342) [0x7f6dea4b5afe]
2012-12-04T23:44:34.027 INFO:teuthology.orchestra.run.err: 26: (Finisher::FinisherThread::entry()+0x1c) [0x7f6dea42891a]
2012-12-04T23:44:34.028 INFO:teuthology.orchestra.run.err: 27: (Thread::_entry_func(void*)+0x23) [0x7f6dea6571b9]
2012-12-04T23:44:34.028 INFO:teuthology.orchestra.run.err: 28: (()+0x7e9a) [0x7f6de9424e9a]
2012-12-04T23:44:34.028 INFO:teuthology.orchestra.run.err: 29: (clone()+0x6d) [0x7f6de972c4bd]
2012-12-04T23:44:34.028 INFO:teuthology.orchestra.run.err: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2012-12-04T23:44:34.028 INFO:teuthology.orchestra.run.err:terminate called after throwing an instance of 'ceph::FailedAssertion'
ubuntu@teuthology:/a/teuthology-2012-12-04_19:00:05-regression-next-testing-basic/8664$ cat config.yaml 
kernel: &id001
  kdb: true
  sha1: 9954d934f23883b95fade3eb56448f2169735c72
nuke-on-error: true
overrides:
  ceph:
    conf:
      client:
        rbd cache: true
      global:
        ms inject socket failures: 5000
    fs: ext4
    log-whitelist:
    - slow request
    sha1: 4cdc30b94319ba1ef72bd2d976651dc889793d61
  s3tests:
    branch: next
  workunit:
    sha1: 4cdc30b94319ba1ef72bd2d976651dc889793d61
roles:
- - mon.a
  - osd.0
  - osd.1
  - osd.2
- - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
targets:
  ubuntu@plana16.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDI5A8ibfCXo5uFpfa+hLZoUoMjRl54060vpyvkYSeFtVz0iX7rCJRkZBqHiwz940yqs2+qnoczg/SVucIFMZSv/c14V/8hhRNHDK7n0lADYF5tkTs9fupeAQSHEPytW/RxOLlNappd1rd7M56rVxeUMPgkVJAdJoQnkXyhw9KqizSj4vFej4qSKUMfaOlXwxCNjmUrtyBdPkk6kOgWODtgOn6d39Tva0hwn7jfeFaYc/ixNPFgiO5um2Fz3XHJFPMWPSLt5X3eEn54ZTO33B6ybYIgoYGn0kVBl+Wks/VTQkky5wJnbS5mghhPNK/+U0KJ5rcQlIgMGgTrBBROcFw3
  ubuntu@plana26.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDx0US96hot7gygZ69W4nxJQ9myYnn3I22YtOaSPe+yFWOPJVQUuOST+aw5K6JDcjdO2Gq0aS6s01mgoWpZlO/FVDKss7vZ2KjMp3uPkGMpDZarNbR3QTe5YZYrl7Wfw4pMu4jh92hCWJEzy5nH0H3X2YJhOd5BdOYz0P97qsMSPQGxhlvDBYBhDl9MLgsS3lKm/Js/OPLO+Uf3/SZceCjUqO2m3WsrJSiQJKh8XUWUu3z+6C1Wg6TXSSlA/jdVCiokDg7WYwPN9zMwzzGkGv+GUGHKMZaPGRZb9LQJLTBf/OjwRSgclAVdDc3vnZeYAS5+sDnt2grnJnlBd1rBUj3n
  ubuntu@plana35.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCtCEUU7ZjJxMBOYq9dlPZppwlahU15OIq1a/xqI0YIpAF7ZLf1eis04s6Kzvl45ooKP2J6DW598Lr9zTtFb6j9RoAkgIbIOkWZmLQjeZesi//7B8uXBw7niNzU1iBI83K8dPL2cwTsgwXXtoaD03pW9HCv4ADhH12/CsfoqNPfql7ze4JlI07seHmUqkBuBFxLVgdZDXb+Cjdg2sQnMS5qZo0OGBp2yfBla01Pk8V1sHGg39miF0EDJZyIOL2ziZOcm1f02WVkUAD8MMOtfQbf33DT/ya4Th20YtRMQxNLkUYG6rxcjutzILFKDr7ugbvOwtBhVS7qplqLWAwHdi0p
tasks:
- internal.lock_machines: 3
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    timeout: 1200
- rbd_fsx:
    clients:
    - client.0
    ops: 2000

#9 Updated by Josh Durgin over 11 years ago

  • Status changed from In Progress to Resolved

Also available in: Atom PDF