Bug #3428
osdc/Striper.cc: 275: FAILED assert(p->first == end - p->second.second)
0%
Description
2012-10-30T15:13:01.009 INFO:teuthology.orchestra.run.err:osdc/Striper.cc: In function 'void Striper::StripedReadResult::assemble_result(CephContext*, ceph::bufferlist&, bool)' thread 7f3c9e7fc700 time 2012-10-30 15:12:52.503629 2012-10-30T15:13:01.009 INFO:teuthology.orchestra.run.err:osdc/Striper.cc: 275: FAILED assert(p->first == end - p->second.second) 2012-10-30T15:13:01.012 INFO:teuthology.orchestra.run.err: ceph version 0.53-541-g184a676 (184a676e64cdcd125f06904aa0fb93727381f1a5) 2012-10-30T15:13:01.013 INFO:teuthology.orchestra.run.err: 1: (Striper::StripedReadResult::assemble_result(CephContext*, ceph::buffer::list&, bool)+0x351) [0x7f3ca9a74551] 2012-10-30T15:13:01.013 INFO:teuthology.orchestra.run.err: 2: (librbd::AioCompletion::complete_request(CephContext*, long)+0x779) [0x7f3ca9a2d639] 2012-10-30T15:13:01.013 INFO:teuthology.orchestra.run.err: 3: (librbd::C_AioRead::finish(int)+0x191) [0x7f3ca9a2dfd1] 2012-10-30T15:13:01.013 INFO:teuthology.orchestra.run.err: 4: (Context::complete(int)+0x12) [0x7f3ca9a2e492] 2012-10-30T15:13:01.014 INFO:teuthology.orchestra.run.err: 5: (librbd::C_CacheRead::finish(int)+0x16) [0x7f3ca9a2ce96] 2012-10-30T15:13:01.014 INFO:teuthology.orchestra.run.err: 6: (Context::complete(int)+0x12) [0x7f3ca9a2e492] 2012-10-30T15:13:01.014 INFO:teuthology.orchestra.run.err: 7: (librbd::ImageCtx::aio_read_from_cache(object_t, ceph::buffer::list*, unsigned long, unsigned long, Context*)+0x2f4) [0x7f3ca9a3fb24] 2012-10-30T15:13:01.014 INFO:teuthology.orchestra.run.err: 8: (librbd::aio_read(librbd::ImageCtx*, std::vector<std::pair<unsigned long, unsigned long>, std::allocator<std::pair<unsigned long, unsigned long> > > const&, char*, ceph::buffer::list*, librbd::AioCompl etion*)+0x941) [0x7f3ca9a56801] 2012-10-30T15:13:01.014 INFO:teuthology.orchestra.run.err: 9: (librbd::AioRequest::read_from_parent(std::vector<std::pair<unsigned long, unsigned long>, std::allocator<std::pair<unsigned long, unsigned long> > >&)+0x9a) [0x7f3ca9a2ea1a] 2012-10-30T15:13:01.014 INFO:teuthology.orchestra.run.err: 10: (librbd::AbstractWrite::should_complete(int)+0x9d5) [0x7f3ca9a301b5] 2012-10-30T15:13:01.015 INFO:teuthology.orchestra.run.err: 11: (librbd::rados_req_cb(void*, void*)+0x27) [0x7f3ca9a42957] 2012-10-30T15:13:01.015 INFO:teuthology.orchestra.run.err: 12: (librados::C_AioSafe::finish(int)+0x1e) [0x7f3ca8dfbdae] 2012-10-30T15:13:01.015 INFO:teuthology.orchestra.run.err: 13: (Finisher::finisher_thread_entry()+0x218) [0x7f3ca8e64bd8] 2012-10-30T15:13:01.015 INFO:teuthology.orchestra.run.err: 14: (()+0x7e9a) [0x7f3ca8402e9a] 2012-10-30T15:13:01.015 INFO:teuthology.orchestra.run.err: 15: (clone()+0x6d) [0x7f3ca870a4bd] 2012-10-30T15:13:01.016 INFO:teuthology.orchestra.run.err: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. task is kernel: &id001 kdb: true sha1: 0ed7285e0001b960c888e5455ae982025210ed3d nuke-on-error: true overrides: ceph: conf: client: rbd cache: true global: ms inject socket failures: 5000 fs: btrfs log-whitelist: - slow request sha1: 184a676e64cdcd125f06904aa0fb93727381f1a5 s3tests: branch: master workunit: sha1: 184a676e64cdcd125f06904aa0fb93727381f1a5 roles: - - mon.a - osd.0 - osd.1 - osd.2 - - mds.a - osd.3 - osd.4 - osd.5 - - client.0 targets: ubuntu@plana36.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCe7CpJbnd7W2/n42TTTjDArnVkyZfbRANfmkdgfDM+6AYg6qd9wUhes6LP++eMvhuM96Sz5W4380o8OME0cguG1LkkADbm8pQbPAPZwF1Fj28YxgZKpc2PTPsF+sjOujC+AaXaQ82ffSkLL0oElKZgAiFEGCytSdUNFHZxjztDIOoWlt7kylQCy4sJCEbND8JFwFfeGyyePvMl3CNdbnR7H5GuyIx70iglLBO/XFwArjeOUZ/FboRZWOBivpZQf9IMy8k2rQetzxTyugd7cTVdq1G5N5NeHpbQfv286G2oDaZj1HT252jDF04UP083zMxH1W9gmOoUKzIhl+iXaNLZ ubuntu@plana39.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDo+Kh24vRxeTQ6/n5PIIGuxrPHPRO/xMQlwoLHi7mR01cIXJMG5wet7mp2om3/5SZSDcLBHduDKrdWL142Sg5fC0zZPUggbxS7nz/UCjYBzMsOtHEUAU5Gs0KFopOCHXNEveK95ezsroMAD5+jS/IEpiooYCkrR3H+NSvUU0Ae352PlXqV0vamkYzyQyEMmhFE50ALhUXbKMve3d2mxJee5sqVZSBmQTbze9RKUA96t9iiwiheflXbN1i9WHlbBOIue5pZ5fM3/vqPWgaShfFpa0pT56QKJfjyFcDeCLOislo23E5qKAJOi5vn5BoYVtG3niNQpt/YbYGfDEHVeqt9 ubuntu@plana40.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDEwyNlwC9Utqf3PCjL2JR4wwDkzpdEJuW93DOW82vYVisYEGod454JwXeNkjqzTUk6tXeRoUM9f/C6sZS3LFgHcMYt6m0sxP8DC4qU+q0YxCw9zLY8bXKe4DDjijM62h/SnyqyOWIh9amGT7wRwZEHBV1BKvZbNxQIJ7ESkuKsk/tJfWKhq7dSw6E/+MZ4yQtXvTyaJ3pK96Hq2uoUkawv+FxXBrzG3FtTTYA8gqA1SIiV3erEIQuBK/WD74i5yK4rwpfGTo7jNc0V6wrwO1BKFj/OGjSC+2LSAkBgf8WLe6UL/dHr3bBEyzm0V4xMf5Iqb8JGvkaXNEfbFqzKC2Wv tasks: - internal.lock_machines: 3 - internal.save_config: null - internal.check_lock: null - internal.connect: null - internal.check_conflict: null - kernel: *id001 - internal.base: null - internal.archive: null - internal.coredump: null - internal.syslog: null - internal.timer: null - chef: null - clock: null - ceph: log-whitelist: - wrongly marked me down - objects unfound and apparently lost - thrashosds: timeout: 1200 - rbd_fsx: clients: - client.0 ops: 2000 ubuntu@teuthology:/a/sage-2012-10-30_13:31:12-regression-master-testing-basic/5723 ubuntu@teuthology:/a/sage-2012-10-30_13:31:12-regression-master-testing-basic/5728
Related issues
Associated revisions
osdc/Striper: fix handling for sparse reads in add_partial_sparse_result()
If bl_map begins after the first item in buffer_extents, we want to
skip only the first buffer extent before doing 'continue' to loop to the
next one.
This fixes a crash caused by underflow with a pattern like:
2012-11-20 13:54:30.347861 7f9404ed6700 10 striper add_partial_sparse_result(0x1efa088) 192 covering {12288=192} (offset 2906) to [0,5286,38054,4288]
2012-11-20 13:54:30.347863 7f9404ed6700 20 striper t 0~5286 bl has 192 off 2906
2012-11-20 13:54:30.347866 7f9404ed6700 20 striper s gap 9382, skipping
2012-11-20 13:54:30.347867 7f9404ed6700 20 striper s has 192, copying
2012-11-20 13:54:30.347872 7f9404ed6700 20 striper t 9574~18446744073709547328 bl has 0 off 12480
2012-11-20 13:54:30.347874 7f9404ed6700 20 striper s at end
2012-11-20 13:54:30.347876 7f9404ed6700 20 striper t 38054~4288 bl has 0 off 12480
2012-11-20 13:54:30.347877 7f9404ed6700 20 striper s at end
Dan reproduced this with
./test_librbd_fsx -d -W -R -p 10 -t 1 -S 4 -N 300 rbd fsx
(although I was unable to do so).
Re-fixes #3428.
Reported-and-tested-by: Dan Mick <dan.mick@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
History
#1 Updated by Sage Weil over 11 years ago
- Status changed from 12 to Resolved
fixed in wip-rbd-read branch, commit:40a5cebe26c216443a9c19477287644c22c5a845
#2 Updated by Dan Mick over 11 years ago
Running test_librbd_fsx I encountered this again (with master, including
the changes in wip-rbd-read):
272 read 0x6b5a thru 0x110bf (0xa566 bytes)
osdc/Striper.cc: In function 'void Striper::StripedReadResult::assemble_result(CephContext*, ceph::bufferlist&, bool)' thread 7fa644870700 time 2012-11-19 12:38:29.464006
osdc/Striper.cc: 278: FAILED assert(p->first == end - p->second.second)
ceph version 0.54-669-gde12ae9 (de12ae9803e7aeb19f94aa4c9c587b54b12e5c32)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x95) [0x7fa64a455569]
2: (Striper::StripedReadResult::assemble_result(CephContext*, ceph::buffer::list&, bool)+0x3c5) [0x7fa64b2e440d]
3: (librbd::AioCompletion::finalize(CephContext*, long)+0x1ec) [0x7fa64b27ff5a]
4: (librbd::AioCompletion::complete_request(CephContext*, long)+0x295) [0x7fa64b2805a7]
5: (librbd::C_AioRead::finish(int)+0x3f2) [0x7fa64b2809e4]
6: (Context::complete(int)+0x2b) [0x7fa64b280ea9]
7: (librbd::AioRequest::complete(int)+0x6f) [0x7fa64b280f39]
8: (librbd::rados_req_cb(void*, void*)+0x34) [0x7fa64b2b0e14]
9: (librados::C_AioComplete::finish(int)+0x4d) [0x7fa64a3dde79]
10: (Finisher::finisher_thread_entry()+0x348) [0x7fa64a454554]
11: (Finisher::FinisherThread::entry()+0x1c) [0x7fa64a3c57d0]
12: (Thread::_entry_func(void*)+0x23) [0x7fa64a601011]
13: (()+0x7e9a) [0x7fa6493bee9a]
14: (clone()+0x6d) [0x7fa6496c7cbd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'
Aborted
#3 Updated by Dan Mick over 11 years ago
...and this seems to reproduce it so far:
./test_librbd_fsx -d -W -R -p 10 -t 1 -S 4 -N 300 rbd fsx
#4 Updated by Sage Weil over 11 years ago
- Status changed from Resolved to In Progress
#5 Updated by Sage Weil over 11 years ago
- Status changed from In Progress to Resolved
commit:ba1e2cc68b236e2cd79dc2c5821ed47080746a86
#6 Updated by Sage Weil over 11 years ago
#7 Updated by Tamilarasi muthamizhan over 11 years ago
- Status changed from Resolved to In Progress
- Assignee changed from Sage Weil to Dan Mick
ubuntu@teuthology:/a/teuthology-2012-12-04_19:00:05-regression-next-testing-basic/8665
#8 Updated by Tamilarasi muthamizhan over 11 years ago
Recent logs: ubuntu@teuthology:/a/teuthology-2012-12-04_19:00:05-regression-next-testing-basic/8664
2012-12-04T23:44:34.023 INFO:teuthology.orchestra.run.err:osdc/Striper.cc: 281: FAILED assert(p->first == end - p->second.second) 2012-12-04T23:44:34.023 INFO:teuthology.orchestra.run.err: ceph version 0.55-135-g4cdc30b (4cdc30b94319ba1ef72bd2d976651dc889793d61) 2012-12-04T23:44:34.024 INFO:teuthology.orchestra.run.err: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x95) [0x7f6dea4b698d] 2012-12-04T23:44:34.024 INFO:teuthology.orchestra.run.err: 2: (Striper::StripedReadResult::assemble_result(CephContext*, ceph::buffer::list&, bool)+0x38f) [0x7f6deb30e725] 2012-12-04T23:44:34.024 INFO:teuthology.orchestra.run.err: 3: (librbd::AioCompletion::finalize(CephContext*, long)+0x1d7) [0x7f6deb2ae8af] 2012-12-04T23:44:34.024 INFO:teuthology.orchestra.run.err: 4: (librbd::AioCompletion::complete_request(CephContext*, long)+0x27a) [0x7f6deb2aeea2] 2012-12-04T23:44:34.024 INFO:teuthology.orchestra.run.err: 5: (librbd::C_AioRead::finish(int)+0x3c8) [0x7f6deb2af2b2] 2012-12-04T23:44:34.024 INFO:teuthology.orchestra.run.err: 6: (Context::complete(int)+0x2b) [0x7f6deb2af707] 2012-12-04T23:44:34.024 INFO:teuthology.orchestra.run.err: 7: (librbd::AioRequest::complete(int)+0x6f) [0x7f6deb2af797] 2012-12-04T23:44:34.024 INFO:teuthology.orchestra.run.err: 8: (librbd::rbd_req_cb(void*, void*)+0x3c) [0x7f6deb2debf0] 2012-12-04T23:44:34.025 INFO:teuthology.orchestra.run.err: 9: (librbd::AioCompletion::complete()+0xd4) [0x7f6deb2af906] 2012-12-04T23:44:34.025 INFO:teuthology.orchestra.run.err: 10: (librbd::AioCompletion::complete_request(CephContext*, long)+0x289) [0x7f6deb2aeeb1] 2012-12-04T23:44:34.025 INFO:teuthology.orchestra.run.err: 11: (librbd::C_AioRead::finish(int)+0x3c8) [0x7f6deb2af2b2] 2012-12-04T23:44:34.025 INFO:teuthology.orchestra.run.err: 12: (Context::complete(int)+0x2b) [0x7f6deb2af707] 2012-12-04T23:44:34.025 INFO:teuthology.orchestra.run.err: 13: (librbd::AioRequest::complete(int)+0x6f) [0x7f6deb2af797] 2012-12-04T23:44:34.025 INFO:teuthology.orchestra.run.err: 14: (librbd::C_CacheRead::finish(int)+0x24) [0x7f6deb2af31e] 2012-12-04T23:44:34.026 INFO:teuthology.orchestra.run.err: 15: (Context::complete(int)+0x2b) [0x7f6deb2af707] 2012-12-04T23:44:34.026 INFO:teuthology.orchestra.run.err: 16: (ObjectCacher::C_RetryRead::finish(int)+0xb4) [0x7f6deb2fd1b6] 2012-12-04T23:44:34.026 INFO:teuthology.orchestra.run.err: 17: (Context::complete(int)+0x2b) [0x7f6deb2af707] 2012-12-04T23:44:34.026 INFO:teuthology.orchestra.run.err: 18: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x267) [0x7f6deb2fa7b5] 2012-12-04T23:44:34.026 INFO:teuthology.orchestra.run.err: 19: (ObjectCacher::bh_read_finish(long, sobject_t, long, unsigned long, ceph::buffer::list&, int)+0x13af) [0x7f6deb2ee785] 2012-12-04T23:44:34.026 INFO:teuthology.orchestra.run.err: 20: (ObjectCacher::C_ReadFinish::finish(int)+0x6e) [0x7f6deb2fcf10] 2012-12-04T23:44:34.027 INFO:teuthology.orchestra.run.err: 21: (Context::complete(int)+0x2b) [0x7f6deb2af707] 2012-12-04T23:44:34.027 INFO:teuthology.orchestra.run.err: 22: (librbd::C_Request::finish(int)+0x142) [0x7f6deb2e73f4] 2012-12-04T23:44:34.027 INFO:teuthology.orchestra.run.err: 23: (librbd::context_cb(void*, void*)+0x40) [0x7f6deb2e693c] 2012-12-04T23:44:34.027 INFO:teuthology.orchestra.run.err: 24: (librados::C_AioComplete::finish(int)+0x4d) [0x7f6dea440b41] 2012-12-04T23:44:34.027 INFO:teuthology.orchestra.run.err: 25: (Finisher::finisher_thread_entry()+0x342) [0x7f6dea4b5afe] 2012-12-04T23:44:34.027 INFO:teuthology.orchestra.run.err: 26: (Finisher::FinisherThread::entry()+0x1c) [0x7f6dea42891a] 2012-12-04T23:44:34.028 INFO:teuthology.orchestra.run.err: 27: (Thread::_entry_func(void*)+0x23) [0x7f6dea6571b9] 2012-12-04T23:44:34.028 INFO:teuthology.orchestra.run.err: 28: (()+0x7e9a) [0x7f6de9424e9a] 2012-12-04T23:44:34.028 INFO:teuthology.orchestra.run.err: 29: (clone()+0x6d) [0x7f6de972c4bd] 2012-12-04T23:44:34.028 INFO:teuthology.orchestra.run.err: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. 2012-12-04T23:44:34.028 INFO:teuthology.orchestra.run.err:terminate called after throwing an instance of 'ceph::FailedAssertion' ubuntu@teuthology:/a/teuthology-2012-12-04_19:00:05-regression-next-testing-basic/8664$ cat config.yaml kernel: &id001 kdb: true sha1: 9954d934f23883b95fade3eb56448f2169735c72 nuke-on-error: true overrides: ceph: conf: client: rbd cache: true global: ms inject socket failures: 5000 fs: ext4 log-whitelist: - slow request sha1: 4cdc30b94319ba1ef72bd2d976651dc889793d61 s3tests: branch: next workunit: sha1: 4cdc30b94319ba1ef72bd2d976651dc889793d61 roles: - - mon.a - osd.0 - osd.1 - osd.2 - - mds.a - osd.3 - osd.4 - osd.5 - - client.0 targets: ubuntu@plana16.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDI5A8ibfCXo5uFpfa+hLZoUoMjRl54060vpyvkYSeFtVz0iX7rCJRkZBqHiwz940yqs2+qnoczg/SVucIFMZSv/c14V/8hhRNHDK7n0lADYF5tkTs9fupeAQSHEPytW/RxOLlNappd1rd7M56rVxeUMPgkVJAdJoQnkXyhw9KqizSj4vFej4qSKUMfaOlXwxCNjmUrtyBdPkk6kOgWODtgOn6d39Tva0hwn7jfeFaYc/ixNPFgiO5um2Fz3XHJFPMWPSLt5X3eEn54ZTO33B6ybYIgoYGn0kVBl+Wks/VTQkky5wJnbS5mghhPNK/+U0KJ5rcQlIgMGgTrBBROcFw3 ubuntu@plana26.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDx0US96hot7gygZ69W4nxJQ9myYnn3I22YtOaSPe+yFWOPJVQUuOST+aw5K6JDcjdO2Gq0aS6s01mgoWpZlO/FVDKss7vZ2KjMp3uPkGMpDZarNbR3QTe5YZYrl7Wfw4pMu4jh92hCWJEzy5nH0H3X2YJhOd5BdOYz0P97qsMSPQGxhlvDBYBhDl9MLgsS3lKm/Js/OPLO+Uf3/SZceCjUqO2m3WsrJSiQJKh8XUWUu3z+6C1Wg6TXSSlA/jdVCiokDg7WYwPN9zMwzzGkGv+GUGHKMZaPGRZb9LQJLTBf/OjwRSgclAVdDc3vnZeYAS5+sDnt2grnJnlBd1rBUj3n ubuntu@plana35.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCtCEUU7ZjJxMBOYq9dlPZppwlahU15OIq1a/xqI0YIpAF7ZLf1eis04s6Kzvl45ooKP2J6DW598Lr9zTtFb6j9RoAkgIbIOkWZmLQjeZesi//7B8uXBw7niNzU1iBI83K8dPL2cwTsgwXXtoaD03pW9HCv4ADhH12/CsfoqNPfql7ze4JlI07seHmUqkBuBFxLVgdZDXb+Cjdg2sQnMS5qZo0OGBp2yfBla01Pk8V1sHGg39miF0EDJZyIOL2ziZOcm1f02WVkUAD8MMOtfQbf33DT/ya4Th20YtRMQxNLkUYG6rxcjutzILFKDr7ugbvOwtBhVS7qplqLWAwHdi0p tasks: - internal.lock_machines: 3 - internal.save_config: null - internal.check_lock: null - internal.connect: null - internal.check_conflict: null - kernel: *id001 - internal.base: null - internal.archive: null - internal.coredump: null - internal.syslog: null - internal.timer: null - chef: null - clock: null - ceph: log-whitelist: - wrongly marked me down - objects unfound and apparently lost - thrashosds: timeout: 1200 - rbd_fsx: clients: - client.0 ops: 2000
#9 Updated by Josh Durgin over 11 years ago
- Status changed from In Progress to Resolved