Project

General

Profile

Actions

Bug #8334

closed

osd: bug with pool snaps, ceph_test_rados

Added by Sage Weil almost 10 years ago. Updated almost 10 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

ubuntu@teuthology:/var/lib/teuthworker/archive/sage-2014-05-11_09:34:50-rados-firefly-testing-basic-plana/249051

fails with

2014-05-11T15:54:50.502 INFO:teuthology.task.rados.rados.0.out:[10.214.133.28]: 3352: read oid 469 snap 253
2014-05-11T15:54:50.502 INFO:teuthology.task.rados.rados.0.out:[10.214.133.28]: 3353: snap_create
2014-05-11T15:54:50.570 INFO:teuthology.orchestra.run.err:[10.214.133.28]: dumped all in format json
2014-05-11T15:54:50.598 DEBUG:teuthology.orchestra.run:Running [10.214.133.28]: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph status --format=json-pretty'
2014-05-11T15:54:50.599 INFO:teuthology.task.rados.rados.0.out:[10.214.133.28]: 3352:  expect (ObjNum 31059424 snap 0 seq_num 3691041296)
2014-05-11T15:54:50.640 INFO:teuthology.task.rados.rados.0.err:[10.214.133.28]: ./test/osd/RadosModel.h: In function 'virtual void ReadOp::_finish(TestOp::CallbackInfo*)' thread 7ff2e6ffd700 time 2014-05-11 15:54:50.596480
2014-05-11T15:54:50.641 INFO:teuthology.task.rados.rados.0.err:[10.214.133.28]: ./test/osd/RadosModel.h: 1062: FAILED assert(!old_value.deleted())
2014-05-11T15:54:50.651 INFO:teuthology.task.rados.rados.0.err:[10.214.133.28]:  ceph version 0.80-12-gdb8873b (db8873b69c73b40110bf1512c114e4a0395671ab)
2014-05-11T15:54:50.651 INFO:teuthology.task.rados.rados.0.err:[10.214.133.28]:  1: (ReadOp::_finish(TestOp::CallbackInfo*)+0x169d) [0x41c5ad]
2014-05-11T15:54:50.651 INFO:teuthology.task.rados.rados.0.err:[10.214.133.28]:  2: (librados::C_AioComplete::finish(int)+0x1d) [0x7ff2f109d14d]
2014-05-11T15:54:50.651 INFO:teuthology.task.rados.rados.0.err:[10.214.133.28]:  3: (Context::complete(int)+0x9) [0x7ff2f107a839]
2014-05-11T15:54:50.651 INFO:teuthology.task.rados.rados.0.err:[10.214.133.28]:  4: (Finisher::finisher_thread_entry()+0x1c0) [0x7ff2f112d8f0]
2014-05-11T15:54:50.652 INFO:teuthology.task.rados.rados.0.err:[10.214.133.28]:  5: (()+0x7e9a) [0x7ff2f0cd2e9a]
2014-05-11T15:54:50.652 INFO:teuthology.task.rados.rados.0.err:[10.214.133.28]:  6: (clone()+0x6d) [0x7ff2f04e93fd]

but the osd trace shows

ubuntu@teuthology:/var/lib/teuthworker/archive/sage-2014-05-11_09:34:50-rados-firefly-testing-basic-plana/249051$ grep -h osd_op remote/*/log/*osd* | grep -- -- | grep -- -469 | sort | grep 10.214.133.28:0/1028284
2014-05-11 15:35:57.215248 7f14a1bd6700  1 -- 10.214.133.28:6800/26920 <== client.4139 10.214.133.28:0/1028284 529 ==== osd_op(client.4139.0:1981 plana8828284-469 [write 463584~1] 4.cb53cade ack+ondisk+write e24) v4 ==== 167+0+1 (348395581 0 4101499445) 0x27f5480 con 0x2876160
2014-05-11 15:35:57.215443 7f14a1bd6700  1 -- 10.214.133.28:6800/26920 <== client.4139 10.214.133.28:0/1028284 530 ==== osd_op(client.4139.0:1982 plana8828284-469 [setxattr _header (55),truncate 463585] 4.cb53cade ack+ondisk+write e24) v4 ==== 205+0+62 (1779944182 0 3153413204) 0x27f5d80 con 0x2876160
2014-05-11 15:35:57.215636 7f14a1bd6700  1 -- 10.214.133.28:6800/26920 <== client.4139 10.214.133.28:0/1028284 531 ==== osd_op(client.4139.0:1983 plana8828284-469 [read 0~1] 4.cb53cade ack+read+rwordered e24) v4 ==== 167+0+0 (1552051804 0 0) 0x27d1d80 con 0x2876160
2014-05-11 15:35:57.368831 7f149abc8700  1 -- 10.214.133.28:6800/26920 --> 10.214.133.28:0/1028284 -- osd_op_reply(1981 plana8828284-469 [write 463584~1] v24'708 uv708 ack = 0) v6 -- ?+0 0x2a49780 con 0x2876160
2014-05-11 15:35:57.369010 7f149abc8700  1 -- 10.214.133.28:6800/26920 --> 10.214.133.28:0/1028284 -- osd_op_reply(1981 plana8828284-469 [write 463584~1] v24'708 uv708 ondisk = 0) v6 -- ?+0 0x2e10c80 con 0x2876160
2014-05-11 15:35:57.369518 7f149b3c9700  1 -- 10.214.133.28:6800/26920 --> 10.214.133.28:0/1028284 -- osd_op_reply(1982 plana8828284-469 [setxattr _header (55),truncate 463585] v24'709 uv709 ack = 0) v6 -- ?+0 0x2bf2780 con 0x2876160
2014-05-11 15:35:57.369687 7f149b3c9700  1 -- 10.214.133.28:6800/26920 --> 10.214.133.28:0/1028284 -- osd_op_reply(1982 plana8828284-469 [setxattr _header (55),truncate 463585] v24'709 uv709 ondisk = 0) v6 -- ?+0 0x2cab000 con 0x2876160
2014-05-11 15:35:57.370408 7f149abc8700  1 -- 10.214.133.28:6800/26920 --> 10.214.133.28:0/1028284 -- osd_op_reply(1983 plana8828284-469 [read 0~1] v0'0 uv709 ondisk = 0) v6 -- ?+0 0x3174c80 con 0x2876160
2014-05-11 15:42:35.636606 7fef3e461700  1 -- 10.214.133.28:6800/30041 <== client.4139 10.214.133.28:0/1028284 75 ==== osd_op(client.4139.0:3827 plana8828284-469 [copy-from ver 1978] 4.cb53cade snapc bc=[bc,ba,b6,b4,b2,af,ae,ab,aa,a6,a4,a2,9f,96,94,91,8b,89,83,78,65,62] ack+ondisk+write e249) v4 ==== 343+0+54 (1142955586 0 2801762402) 0x3194480 con 0x29329a0
2014-05-11 15:42:35.636767 7fef3e461700  1 -- 10.214.133.28:6800/30041 <== client.4139 10.214.133.28:0/1028284 76 ==== osd_op(client.4139.0:3828 plana8828284-469 [stat] 4.cb53cade ack+read+rwordered e249) v4 ==== 167+0+0 (541694448 0 0) 0x3194d80 con 0x29329a0
2014-05-11 15:42:36.155926 7fef37c54700  1 -- 10.214.133.28:6800/30041 --> 10.214.133.28:0/1028284 -- osd_op_reply(3827 plana8828284-469 [copy-from ver 1978] v249'2404 uv1978 ack = 0) v6 -- ?+0 0x2705280 con 0x29329a0
2014-05-11 15:42:36.156037 7fef37c54700  1 -- 10.214.133.28:6800/30041 --> 10.214.133.28:0/1028284 -- osd_op_reply(3827 plana8828284-469 [copy-from ver 1978] v249'2404 uv1978 ondisk = 0) v6 -- ?+0 0x33a0500 con 0x29329a0
2014-05-11 15:42:36.156693 7fef37453700  1 -- 10.214.133.28:6800/30041 --> 10.214.133.28:0/1028284 -- osd_op_reply(3828 plana8828284-469 [stat] v0'0 uv1978 ondisk = 0) v6 -- ?+0 0x24faa00 con 0x29329a0
2014-05-11 15:44:01.215327 7fef3e461700  1 -- 10.214.133.28:6800/30041 <== client.4139 10.214.133.28:0/1028284 192 ==== osd_op(client.4139.0:4314 plana8828284-469 [rollback de] 4.cb53cade snapc eb=[eb,ea,e7,e6,e5,e2,de,dd,d1,c8,c6,be,b4,91] ack+ondisk+write e301) v4 ==== 279+0+0 (3344968995 0 0) 0x2f696c0 con 0x29329a0
2014-05-11 15:44:01.713158 7fef37c54700  1 -- 10.214.133.28:6800/30041 --> 10.214.133.28:0/1028284 -- osd_op_reply(4314 plana8828284-469 [rollback de] v301'2834 uv2834 ack = 0) v6 -- ?+0 0x4135c80 con 0x29329a0
2014-05-11 15:44:01.713345 7fef37c54700  1 -- 10.214.133.28:6800/30041 --> 10.214.133.28:0/1028284 -- osd_op_reply(4314 plana8828284-469 [rollback de] v301'2834 uv2834 ondisk = 0) v6 -- ?+0 0x396cc80 con 0x29329a0
2014-05-11 15:46:28.277491 7fef3e461700  1 -- 10.214.133.28:6800/30041 <== client.4139 10.214.133.28:0/1028284 558 ==== osd_op(client.4139.0:5118 plana8828284-469 [delete] 4.cb53cade snapc 13e=[13e,13d,13c,13b,138,137,136,134,132,131,12e,12d,12c,128,126,121] ondisk+write e384) v4 ==== 295+0+0 (295068999 0 0) 0x30d2480 con 0x29329a0
2014-05-11 15:46:28.294883 7fef37453700  1 -- 10.214.133.28:6800/30041 --> 10.214.133.28:0/1028284 -- osd_op_reply(5118 plana8828284-469 [delete] v384'3517 uv3517 ondisk = 0) v6 -- ?+0 0x6e1f500 con 0x29329a0
2014-05-11 15:46:29.561146 7fef3e461700  1 -- 10.214.133.28:6800/30041 <== client.4139 10.214.133.28:0/1028284 561 ==== osd_op(client.4139.0:5126 plana8828284-469 [delete] 4.cb53cade snapc 13e=[13e,13c,13b,138,137,136,134,132,131,12e,12d,12c,128,126,121] ondisk+write e386) v4 ==== 287+0+0 (696255664 0 0) 0x2ab4b40 con 0x29329a0
2014-05-11 15:46:29.561871 7fef37453700  1 -- 10.214.133.28:6800/30041 --> 10.214.133.28:0/1028284 -- osd_op_reply(5126 plana8828284-469 [delete] v0'0 uv3520 ondisk = -2 ((2) No such file or directory)) v6 -- ?+0 0x3bd3280 con 0x29329a0
2014-05-11 15:50:31.986588 7fef3e461700  1 -- 10.214.133.28:6800/30041 <== client.4139 10.214.133.28:0/1028284 552 ==== osd_op(client.4139.0:6254 plana8828284-469 [delete] 4.cb53cade snapc 1cc=[1cc,1cb,1ca,1b8,1b7,1b4,198,196,195,193,18a,16a,166,165,13e] ondisk+write e545) v4 ==== 287+0+0 (647741483 0 0) 0x2b45900 con 0x31b66e0
2014-05-11 15:50:31.999581 7fef37c54700  1 -- 10.214.133.28:6800/30041 --> 10.214.133.28:0/1028284 -- osd_op_reply(6254 plana8828284-469 [delete] v0'0 uv4348 ondisk = -2 ((2) No such file or directory)) v6 -- ?+0 0x770b780 con 0x31b66e0
2014-05-11 15:53:12.660730 7f9681056700  1 -- 10.214.133.29:6800/4663 <== client.4139 10.214.133.28:0/1028284 443 ==== osd_op(client.4139.0:6889 plana8828284-469 [write 177540~1] 4.cb53cade snapc 226=[226,222,220,21f,21e,21d,217,215,214,212,20e,20d,1fc,1f1,1f0,1ee,1e9,1e0,1df,1d8,1cb,16a] ack+ondisk+write e648) v4 ==== 343+0+1 (2651450 0 3037525897) 0x281b240 con 0x40fa6e0
2014-05-11 15:53:12.660903 7f9681056700  1 -- 10.214.133.29:6800/4663 <== client.4139 10.214.133.28:0/1028284 444 ==== osd_op(client.4139.0:6890 plana8828284-469 [setxattr _header (57),truncate 177541] 4.cb53cade snapc 226=[226,222,220,21f,21e,21d,217,215,214,212,20e,20d,1fc,1f1,1f0,1ee,1e9,1e0,1df,1d8,1cb,16a] ack+ondisk+write e648) v4 ==== 381+0+64 (1861084997 0 2806459310) 0x281bd80 con 0x40fa6e0
2014-05-11 15:53:12.661058 7f9681056700  1 -- 10.214.133.29:6800/4663 <== client.4139 10.214.133.28:0/1028284 445 ==== osd_op(client.4139.0:6891 plana8828284-469 [read 0~1] 4.cb53cade ack+read+rwordered e648) v4 ==== 167+0+0 (3499088348 0 0) 0x281b000 con 0x40fa6e0
2014-05-11 15:53:12.766911 7f9683989700  1 -- 10.214.133.29:6800/4663 --> 10.214.133.28:0/1028284 -- osd_op_reply(6889 plana8828284-469 [write 177540~1] v648'4882 uv4882 ondisk = 0) v6 -- ?+0 0x3565a00 con 0x40fa6e0
2014-05-11 15:53:12.767060 7f9683989700  1 -- 10.214.133.29:6800/4663 --> 10.214.133.28:0/1028284 -- osd_op_reply(6890 plana8828284-469 [setxattr _header (57),truncate 177541] v648'4883 uv4883 ondisk = 0) v6 -- ?+0 0x2e47780 con 0x40fa6e0
2014-05-11 15:53:12.952961 7f967a048700  1 -- 10.214.133.29:6800/4663 --> 10.214.133.28:0/1028284 -- osd_op_reply(6891 plana8828284-469 [read 0~1] v0'0 uv4883 ondisk = 0) v6 -- ?+0 0x5a2c000 con 0x40fa6e0
2014-05-11 15:54:50.501917 7f9a5e1cc700  1 -- 10.214.133.29:6806/5064 <== client.4139 10.214.133.28:0/1028284 231 ==== osd_op(client.4139.0:7325 plana8828284-469@1e9 [read 0~0,omap-get-vals-by-keys 0~4,omap-get-keys 0~12,omap-get-vals 0~16,omap-get-header 0~0,getxattrs] 4.cb53cade ack+read e711) v4 ==== 357+0+32 (1353414462 0 2675358983) 0x5424900 con 0x2777580
2014-05-11 15:54:50.592840 7f9a579bf700  1 -- 10.214.133.29:6806/5064 --> 10.214.133.28:0/1028284 -- osd_op_reply(7325 plana8828284-469 [read 0~350594,omap-get-vals-by-keys 0~4,omap-get-keys 0~12,omap-get-vals 0~16,omap-get-header 0~0,getxattrs] v0'0 uv1978 ondisk = 0) v6 -- ?+0 0x5430000 con 0x2777580

either there is a pool snap rollback in there that is a no-op (there is no rollback for pool snaps!) or ceph_test_rados is broken or something.

Actions #1

Updated by Samuel Just almost 10 years ago

  • Status changed from New to 7
Actions #2

Updated by Samuel Just almost 10 years ago

  • Assignee set to Samuel Just
Actions #3

Updated by Samuel Just almost 10 years ago

found another related flaw, testing more

Actions #4

Updated by Samuel Just almost 10 years ago

  • Status changed from 7 to Pending Backport
Actions #5

Updated by Samuel Just almost 10 years ago

  • Status changed from Pending Backport to Resolved
Actions

Also available in: Atom PDF