Actions
Bug #47003
closedceph_test_rados test error. Reponses out of order due to the connection drops data.
Status:
Duplicate
Priority:
Urgent
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
I found logs below:
2020-08-17T01:36:37.872 INFO:tasks.rados.rados.0.smithi158.stdout:780: writing smithi15817201-780 from 692653 to 1179630 tid 1 2020-08-17T01:36:37.878 INFO:tasks.rados.rados.0.smithi158.stdout:780: writing smithi15817201-780 from 1602061 to 2249664 tid 2 2020-08-17T01:36:37.883 INFO:tasks.rados.rados.0.smithi158.stdout:780: writing smithi15817201-780 from 2992629 to 3092653 tid 3 2020-08-17T01:36:38.478 INFO:tasks.rados.rados.0.smithi158.stdout:780: finishing write tid 1 to smithi15817201-780 2020-08-17T01:36:38.478 INFO:tasks.rados.rados.0.smithi158.stdout:780: finishing write tid 3 to smithi15817201-780 2020-08-17T01:36:38.478 INFO:tasks.rados.rados.0.smithi158.stdout:780: finishing write tid 5 to smithi15817201-780 2020-08-17T01:36:38.478 INFO:tasks.rados.rados.0.smithi158.stdout:780: finishing write tid 2 to smithi15817201-780 2020-08-17T01:36:38.479 INFO:tasks.rados.rados.0.smithi158.stderr:Error: finished tid 2 when last_acked_tid was 5 2020-08-17T01:36:38.479 INFO:tasks.rados.rados.0.smithi158.stderr:/build/ceph-16.0.0-4262-g9943626/src/test/osd/RadosModel.h: In function 'virtual void WriteOp::_finish(TestOp::CallbackInfo*)' thread 7f3ab6ffd700 time 2020-08-17T01:36:38.435642+0000 2020-08-17T01:36:38.479 INFO:tasks.rados.rados.0.smithi158.stderr:/build/ceph-16.0.0-4262-g9943626/src/test/osd/RadosModel.h: 907: ceph_abort_msg("abort() called") 2020-08-17T01:36:38.479 INFO:tasks.rados.rados.0.smithi158.stderr: ceph version 16.0.0-4262-g9943626 (9943626c67ac05201c815e9e878fd203ee096d94) pacific (dev) 2020-08-17T01:36:38.479 INFO:tasks.rados.rados.0.smithi158.stderr: 1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xe1) [0x7f3aca390997] 2020-08-17T01:36:38.480 INFO:tasks.rados.rados.0.smithi158.stderr: 2: (WriteOp::_finish(TestOp::CallbackInfo*)+0x59c) [0x55d5add9c94c] 2020-08-17T01:36:38.480 INFO:tasks.rados.rados.0.smithi158.stderr: 3: (write_callback(void*, void*)+0x19) [0x55d5addb9009] 2020-08-17T01:36:38.480 INFO:tasks.rados.rados.0.smithi158.stderr: 4: (()+0x98897) [0x7f3ad3068897] 2020-08-17T01:36:38.480 INFO:tasks.rados.rados.0.smithi158.stderr: 5: (()+0xb2665) [0x7f3ad3082665] 2020-08-17T01:36:38.480 INFO:tasks.rados.rados.0.smithi158.stderr: 6: (()+0xb54ff) [0x7f3ad30854ff] 2020-08-17T01:36:38.481 INFO:tasks.rados.rados.0.smithi158.stderr: 7: (()+0xb5846) [0x7f3ad3085846] 2020-08-17T01:36:38.481 INFO:tasks.rados.rados.0.smithi158.stderr: 8: (()+0xbd6df) [0x7f3ac9c426df] 2020-08-17T01:36:38.481 INFO:tasks.rados.rados.0.smithi158.stderr: 9: (()+0x76db) [0x7f3ac9f156db] 2020-08-17T01:36:38.481 INFO:tasks.rados.rados.0.smithi158.stderr: 10: (clone()+0x3f) [0x7f3ac969da3f]
Then I grep ceph-osd.5.log, which is primary osd's log, and found logs below:
375153:2020-08-17T01:36:37.882+0000 7f551125e700 1 -- v2:172.21.15.66:6800/13577 --> 172.21.15.158:0/3294166163 -- osd_op_reply(3505 smithi15817201-780 [write 692653~486977,stat] v29'170 uv170 ondisk = 0) v8 -- 0x561fb0ce2880 con 0x561fb3824000 375156:2020-08-17T01:36:37.882+0000 7f551125e700 5 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561fb3824000 0x561fb0da3900 crc :-1 s=READY pgs=19 cs=0 l=1 rev1=1 rx=0 tx=0).send_message enqueueing message m=0x561fb0ce2880 type=43 osd_op_reply(3505 smithi15817201-780 [write 692653~486977,stat] v29'170 uv170 ondisk = 0) v8 375170:2020-08-17T01:36:37.882+0000 7f552d12d700 20 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561fb3824000 0x561fb0da3900 crc :-1 s=READY pgs=19 cs=0 l=1 rev1=1 rx=0 tx=0).prepare_send_message m=osd_op_reply(3505 smithi15817201-780 [write 692653~486977,stat] v29'170 uv170 ondisk = 0) v8 375172:2020-08-17T01:36:37.882+0000 7f552d12d700 20 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561fb3824000 0x561fb0da3900 crc :-1 s=READY pgs=19 cs=0 l=1 rev1=1 rx=0 tx=0).prepare_send_message encoding features 4540138292837744639 0x561fb0ce2880 osd_op_reply(3505 smithi15817201-780 [write 692653~486977,stat] v29'170 uv170 ondisk = 0) v8 375174:2020-08-17T01:36:37.882+0000 7f552d12d700 5 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561fb3824000 0x561fb0da3900 crc :-1 s=READY pgs=19 cs=0 l=1 rev1=1 rx=0 tx=0).write_message sending message m=0x561fb0ce2880 seq=91 osd_op_reply(3505 smithi15817201-780 [write 692653~486977,stat] v29'170 uv170 ondisk = 0) v8 376811:2020-08-17T01:36:37.886+0000 7f550d256700 1 -- v2:172.21.15.66:6800/13577 --> 172.21.15.158:0/3294166163 -- osd_op_reply(3506 smithi15817201-780 [write 1602061~647603] v29'171 uv171 ondisk = 0) v8 -- 0x561fb0ce1440 con 0x561fb3824000 376812:2020-08-17T01:36:37.886+0000 7f550d256700 10 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561fb3824000 0x561fb0da3900 unknown :-1 s=CLOSED pgs=19 cs=0 l=1 rev1=1 rx=0 tx=0).send_message connection closed. Drop message 0x561fb0ce1440 376860:2020-08-17T01:36:37.886+0000 7f551125e700 1 -- v2:172.21.15.66:6800/13577 --> 172.21.15.158:0/3294166163 -- osd_op_reply(3507 smithi15817201-780 [write 2992629~100024] v29'172 uv172 ondisk = 0) v8 -- 0x561fb38bc400 con 0x561faac7bc00 376861:2020-08-17T01:36:37.886+0000 7f551125e700 5 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561faac7bc00 0x561facccca00 crc :-1 s=READY pgs=26 cs=0 l=1 rev1=1 rx=0 tx=0).send_message enqueueing message m=0x561fb38bc400 type=43 osd_op_reply(3507 smithi15817201-780 [write 2992629~100024] v29'172 uv172 ondisk = 0) v8 376868:2020-08-17T01:36:37.886+0000 7f552d12d700 20 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561faac7bc00 0x561facccca00 crc :-1 s=READY pgs=26 cs=0 l=1 rev1=1 rx=0 tx=0).prepare_send_message m=osd_op_reply(3507 smithi15817201-780 [write 2992629~100024] v29'172 uv172 ondisk = 0) v8 376869:2020-08-17T01:36:37.886+0000 7f552d12d700 20 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561faac7bc00 0x561facccca00 crc :-1 s=READY pgs=26 cs=0 l=1 rev1=1 rx=0 tx=0).prepare_send_message encoding features 4540138292837744639 0x561fb38bc400 osd_op_reply(3507 smithi15817201-780 [write 2992629~100024] v29'172 uv172 ondisk = 0) v8 376870:2020-08-17T01:36:37.886+0000 7f552d12d700 5 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561faac7bc00 0x561facccca00 crc :-1 s=READY pgs=26 cs=0 l=1 rev1=1 rx=0 tx=0).write_message sending message m=0x561fb38bc400 seq=1 osd_op_reply(3507 smithi15817201-780 [write 2992629~100024] v29'172 uv172 ondisk = 0) v8 376972:2020-08-17T01:36:37.890+0000 7f550d256700 1 -- v2:172.21.15.66:6800/13577 --> 172.21.15.158:0/3294166163 -- osd_op_reply(3506 smithi15817201-780 [write 1602061~647603] v29'171 uv171 ondisk = 0) v8 -- 0x561fb0ce1440 con 0x561faac7bc00 376973:2020-08-17T01:36:37.890+0000 7f550d256700 5 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561faac7bc00 0x561facccca00 crc :-1 s=READY pgs=26 cs=0 l=1 rev1=1 rx=0 tx=0).send_message enqueueing message m=0x561fb0ce1440 type=43 osd_op_reply(3506 smithi15817201-780 [write 1602061~647603] v29'171 uv171 ondisk = 0) v8 376978:2020-08-17T01:36:37.890+0000 7f552d12d700 20 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561faac7bc00 0x561facccca00 crc :-1 s=READY pgs=26 cs=0 l=1 rev1=1 rx=0 tx=0).prepare_send_message m=osd_op_reply(3506 smithi15817201-780 [write 1602061~647603] v29'171 uv171 ondisk = 0) v8 376979:2020-08-17T01:36:37.890+0000 7f552d12d700 20 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561faac7bc00 0x561facccca00 crc :-1 s=READY pgs=26 cs=0 l=1 rev1=1 rx=0 tx=0).prepare_send_message encoding features 4540138292837744639 0x561fb0ce1440 osd_op_reply(3506 smithi15817201-780 [write 1602061~647603] v29'171 uv171 ondisk = 0) v8 376980:2020-08-17T01:36:37.890+0000 7f552d12d700 5 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561faac7bc00 0x561facccca00 crc :-1 s=READY pgs=26 cs=0 l=1 rev1=1 rx=0 tx=0).write_message sending message m=0x561fb0ce1440 seq=3 osd_op_reply(3506 smithi15817201-780 [write 1602061~647603] v29'171 uv171 ondisk = 0) v8
It seems like at the point of sending tid2's data, connection dropped the data and cloed.
Actions