Bug #47003
closed
ceph_test_rados test error: responses out of order because the connection dropped data
Description
I found the logs below:
2020-08-17T01:36:37.872 INFO:tasks.rados.rados.0.smithi158.stdout:780: writing smithi15817201-780 from 692653 to 1179630 tid 1
2020-08-17T01:36:37.878 INFO:tasks.rados.rados.0.smithi158.stdout:780: writing smithi15817201-780 from 1602061 to 2249664 tid 2
2020-08-17T01:36:37.883 INFO:tasks.rados.rados.0.smithi158.stdout:780: writing smithi15817201-780 from 2992629 to 3092653 tid 3
2020-08-17T01:36:38.478 INFO:tasks.rados.rados.0.smithi158.stdout:780: finishing write tid 1 to smithi15817201-780
2020-08-17T01:36:38.478 INFO:tasks.rados.rados.0.smithi158.stdout:780: finishing write tid 3 to smithi15817201-780
2020-08-17T01:36:38.478 INFO:tasks.rados.rados.0.smithi158.stdout:780: finishing write tid 5 to smithi15817201-780
2020-08-17T01:36:38.478 INFO:tasks.rados.rados.0.smithi158.stdout:780: finishing write tid 2 to smithi15817201-780
2020-08-17T01:36:38.479 INFO:tasks.rados.rados.0.smithi158.stderr:Error: finished tid 2 when last_acked_tid was 5
2020-08-17T01:36:38.479 INFO:tasks.rados.rados.0.smithi158.stderr:/build/ceph-16.0.0-4262-g9943626/src/test/osd/RadosModel.h: In function 'virtual void WriteOp::_finish(TestOp::CallbackInfo*)' thread 7f3ab6ffd700 time 2020-08-17T01:36:38.435642+0000
2020-08-17T01:36:38.479 INFO:tasks.rados.rados.0.smithi158.stderr:/build/ceph-16.0.0-4262-g9943626/src/test/osd/RadosModel.h: 907: ceph_abort_msg("abort() called")
2020-08-17T01:36:38.479 INFO:tasks.rados.rados.0.smithi158.stderr: ceph version 16.0.0-4262-g9943626 (9943626c67ac05201c815e9e878fd203ee096d94) pacific (dev)
2020-08-17T01:36:38.479 INFO:tasks.rados.rados.0.smithi158.stderr: 1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xe1) [0x7f3aca390997]
2020-08-17T01:36:38.480 INFO:tasks.rados.rados.0.smithi158.stderr: 2: (WriteOp::_finish(TestOp::CallbackInfo*)+0x59c) [0x55d5add9c94c]
2020-08-17T01:36:38.480 INFO:tasks.rados.rados.0.smithi158.stderr: 3: (write_callback(void*, void*)+0x19) [0x55d5addb9009]
2020-08-17T01:36:38.480 INFO:tasks.rados.rados.0.smithi158.stderr: 4: (()+0x98897) [0x7f3ad3068897]
2020-08-17T01:36:38.480 INFO:tasks.rados.rados.0.smithi158.stderr: 5: (()+0xb2665) [0x7f3ad3082665]
2020-08-17T01:36:38.480 INFO:tasks.rados.rados.0.smithi158.stderr: 6: (()+0xb54ff) [0x7f3ad30854ff]
2020-08-17T01:36:38.481 INFO:tasks.rados.rados.0.smithi158.stderr: 7: (()+0xb5846) [0x7f3ad3085846]
2020-08-17T01:36:38.481 INFO:tasks.rados.rados.0.smithi158.stderr: 8: (()+0xbd6df) [0x7f3ac9c426df]
2020-08-17T01:36:38.481 INFO:tasks.rados.rados.0.smithi158.stderr: 9: (()+0x76db) [0x7f3ac9f156db]
2020-08-17T01:36:38.481 INFO:tasks.rados.rados.0.smithi158.stderr: 10: (clone()+0x3f) [0x7f3ac969da3f]
Then I grepped ceph-osd.5.log, which is the primary OSD's log, and found the entries below:
375153:2020-08-17T01:36:37.882+0000 7f551125e700 1 -- v2:172.21.15.66:6800/13577 --> 172.21.15.158:0/3294166163 -- osd_op_reply(3505 smithi15817201-780 [write 692653~486977,stat] v29'170 uv170 ondisk = 0) v8 -- 0x561fb0ce2880 con 0x561fb3824000
375156:2020-08-17T01:36:37.882+0000 7f551125e700 5 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561fb3824000 0x561fb0da3900 crc :-1 s=READY pgs=19 cs=0 l=1 rev1=1 rx=0 tx=0).send_message enqueueing message m=0x561fb0ce2880 type=43 osd_op_reply(3505 smithi15817201-780 [write 692653~486977,stat] v29'170 uv170 ondisk = 0) v8
375170:2020-08-17T01:36:37.882+0000 7f552d12d700 20 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561fb3824000 0x561fb0da3900 crc :-1 s=READY pgs=19 cs=0 l=1 rev1=1 rx=0 tx=0).prepare_send_message m=osd_op_reply(3505 smithi15817201-780 [write 692653~486977,stat] v29'170 uv170 ondisk = 0) v8
375172:2020-08-17T01:36:37.882+0000 7f552d12d700 20 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561fb3824000 0x561fb0da3900 crc :-1 s=READY pgs=19 cs=0 l=1 rev1=1 rx=0 tx=0).prepare_send_message encoding features 4540138292837744639 0x561fb0ce2880 osd_op_reply(3505 smithi15817201-780 [write 692653~486977,stat] v29'170 uv170 ondisk = 0) v8
375174:2020-08-17T01:36:37.882+0000 7f552d12d700 5 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561fb3824000 0x561fb0da3900 crc :-1 s=READY pgs=19 cs=0 l=1 rev1=1 rx=0 tx=0).write_message sending message m=0x561fb0ce2880 seq=91 osd_op_reply(3505 smithi15817201-780 [write 692653~486977,stat] v29'170 uv170 ondisk = 0) v8
376811:2020-08-17T01:36:37.886+0000 7f550d256700 1 -- v2:172.21.15.66:6800/13577 --> 172.21.15.158:0/3294166163 -- osd_op_reply(3506 smithi15817201-780 [write 1602061~647603] v29'171 uv171 ondisk = 0) v8 -- 0x561fb0ce1440 con 0x561fb3824000
376812:2020-08-17T01:36:37.886+0000 7f550d256700 10 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561fb3824000 0x561fb0da3900 unknown :-1 s=CLOSED pgs=19 cs=0 l=1 rev1=1 rx=0 tx=0).send_message connection closed. Drop message 0x561fb0ce1440
376860:2020-08-17T01:36:37.886+0000 7f551125e700 1 -- v2:172.21.15.66:6800/13577 --> 172.21.15.158:0/3294166163 -- osd_op_reply(3507 smithi15817201-780 [write 2992629~100024] v29'172 uv172 ondisk = 0) v8 -- 0x561fb38bc400 con 0x561faac7bc00
376861:2020-08-17T01:36:37.886+0000 7f551125e700 5 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561faac7bc00 0x561facccca00 crc :-1 s=READY pgs=26 cs=0 l=1 rev1=1 rx=0 tx=0).send_message enqueueing message m=0x561fb38bc400 type=43 osd_op_reply(3507 smithi15817201-780 [write 2992629~100024] v29'172 uv172 ondisk = 0) v8
376868:2020-08-17T01:36:37.886+0000 7f552d12d700 20 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561faac7bc00 0x561facccca00 crc :-1 s=READY pgs=26 cs=0 l=1 rev1=1 rx=0 tx=0).prepare_send_message m=osd_op_reply(3507 smithi15817201-780 [write 2992629~100024] v29'172 uv172 ondisk = 0) v8
376869:2020-08-17T01:36:37.886+0000 7f552d12d700 20 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561faac7bc00 0x561facccca00 crc :-1 s=READY pgs=26 cs=0 l=1 rev1=1 rx=0 tx=0).prepare_send_message encoding features 4540138292837744639 0x561fb38bc400 osd_op_reply(3507 smithi15817201-780 [write 2992629~100024] v29'172 uv172 ondisk = 0) v8
376870:2020-08-17T01:36:37.886+0000 7f552d12d700 5 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561faac7bc00 0x561facccca00 crc :-1 s=READY pgs=26 cs=0 l=1 rev1=1 rx=0 tx=0).write_message sending message m=0x561fb38bc400 seq=1 osd_op_reply(3507 smithi15817201-780 [write 2992629~100024] v29'172 uv172 ondisk = 0) v8
376972:2020-08-17T01:36:37.890+0000 7f550d256700 1 -- v2:172.21.15.66:6800/13577 --> 172.21.15.158:0/3294166163 -- osd_op_reply(3506 smithi15817201-780 [write 1602061~647603] v29'171 uv171 ondisk = 0) v8 -- 0x561fb0ce1440 con 0x561faac7bc00
376973:2020-08-17T01:36:37.890+0000 7f550d256700 5 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561faac7bc00 0x561facccca00 crc :-1 s=READY pgs=26 cs=0 l=1 rev1=1 rx=0 tx=0).send_message enqueueing message m=0x561fb0ce1440 type=43 osd_op_reply(3506 smithi15817201-780 [write 1602061~647603] v29'171 uv171 ondisk = 0) v8
376978:2020-08-17T01:36:37.890+0000 7f552d12d700 20 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561faac7bc00 0x561facccca00 crc :-1 s=READY pgs=26 cs=0 l=1 rev1=1 rx=0 tx=0).prepare_send_message m=osd_op_reply(3506 smithi15817201-780 [write 1602061~647603] v29'171 uv171 ondisk = 0) v8
376979:2020-08-17T01:36:37.890+0000 7f552d12d700 20 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561faac7bc00 0x561facccca00 crc :-1 s=READY pgs=26 cs=0 l=1 rev1=1 rx=0 tx=0).prepare_send_message encoding features 4540138292837744639 0x561fb0ce1440 osd_op_reply(3506 smithi15817201-780 [write 1602061~647603] v29'171 uv171 ondisk = 0) v8
376980:2020-08-17T01:36:37.890+0000 7f552d12d700 5 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561faac7bc00 0x561facccca00 crc :-1 s=READY pgs=26 cs=0 l=1 rev1=1 rx=0 tx=0).write_message sending message m=0x561fb0ce1440 seq=3 osd_op_reply(3506 smithi15817201-780 [write 1602061~647603] v29'171 uv171 ondisk = 0) v8
It seems that at the point of sending tid 2's reply, the connection dropped the message and closed; the reply was later resent on a new connection after tid 3's reply had already gone out.
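The abort itself comes from the test harness's ordering invariant: write replies on a single client session must be acknowledged in ascending tid order. A minimal sketch of that check (hypothetical helper names, simplified from the last_acked_tid check in src/test/osd/RadosModel.h):

```python
class WriteTracker:
    """Hypothetical helper modeling the last_acked_tid invariant
    checked by WriteOp::_finish in src/test/osd/RadosModel.h."""

    def __init__(self):
        self.last_acked_tid = 0

    def on_ack(self, tid):
        # On one session, replies must complete in tid order. An ack
        # for a lower tid arriving after a higher one means the
        # messenger delivered replies out of order.
        if tid < self.last_acked_tid:
            raise AssertionError(
                f"finished tid {tid} when last_acked_tid was "
                f"{self.last_acked_tid}")
        self.last_acked_tid = tid
```

With the sequence from the client log (tids 1, 3, 5 acked, then tid 2), `on_ack(2)` trips the assertion, matching the "finished tid 2 when last_acked_tid was 5" error above.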
Updated by Zhenyi Shu over 3 years ago
The logs below, from grepping "0x561fb3824000", show that the connection was reset.
375180:2020-08-17T01:36:37.882+0000 7f552d12d700 10 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561fb3824000 0x561fb0da3900 crc :-1 s=READY pgs=19 cs=0 l=1 rev1=1 rx=0 tx=0).write_message sending 0x561fb0ce2880 done.
375187:2020-08-17T01:36:37.882+0000 7f552d12d700 20 -- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561fb3824000 msgr2=0x561fb0da3900 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).process
375189:2020-08-17T01:36:37.882+0000 7f552d12d700 20 -- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561fb3824000 msgr2=0x561fb0da3900 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read continue len=32
375190:2020-08-17T01:36:37.882+0000 7f552d12d700 1 -- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561fb3824000 msgr2=0x561fb0da3900 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_bulk peer close file descriptor 93
375191:2020-08-17T01:36:37.882+0000 7f552d12d700 1 -- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561fb3824000 msgr2=0x561fb0da3900 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_until read failed
375193:2020-08-17T01:36:37.882+0000 7f552d12d700 20 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561fb3824000 0x561fb0da3900 crc :-1 s=READY pgs=19 cs=0 l=1 rev1=1 rx=0 tx=0).handle_read_frame_preamble_main r=-1
375195:2020-08-17T01:36:37.882+0000 7f552d12d700 1 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561fb3824000 0x561fb0da3900 crc :-1 s=READY pgs=19 cs=0 l=1 rev1=1 rx=0 tx=0).handle_read_frame_preamble_main read frame preamble failed r=-1 ((1) Operation not permitted)
375196:2020-08-17T01:36:37.882+0000 7f552d12d700 10 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561fb3824000 0x561fb0da3900 crc :-1 s=READY pgs=19 cs=0 l=1 rev1=1 rx=0 tx=0)._fault
375197:2020-08-17T01:36:37.882+0000 7f552d12d700 2 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561fb3824000 0x561fb0da3900 crc :-1 s=READY pgs=19 cs=0 l=1 rev1=1 rx=0 tx=0)._fault on lossy channel, failing
375198:2020-08-17T01:36:37.882+0000 7f552d12d700 1 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561fb3824000 0x561fb0da3900 crc :-1 s=READY pgs=19 cs=0 l=1 rev1=1 rx=0 tx=0).stop
375200:2020-08-17T01:36:37.882+0000 7f552d12d700 5 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561fb3824000 0x561fb0da3900 crc :-1 s=READY pgs=19 cs=0 l=1 rev1=1 rx=0 tx=0).reset_recv_state
375201:2020-08-17T01:36:37.882+0000 7f552d12d700 5 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561fb3824000 0x561fb0da3900 crc :-1 s=READY pgs=19 cs=0 l=1 rev1=1 rx=0 tx=0).reset_security
375202:2020-08-17T01:36:37.882+0000 7f552d12d700 10 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561fb3824000 0x561fb0da3900 unknown :-1 s=READY pgs=19 cs=0 l=1 rev1=1 rx=0 tx=0).discard_out_queue started
375206:2020-08-17T01:36:37.882+0000 7f551c274700 2 osd.5 29 ms_handle_reset con 0x561fb3824000 session 0x561fb2db2000
376811:2020-08-17T01:36:37.886+0000 7f550d256700 1 -- v2:172.21.15.66:6800/13577 --> 172.21.15.158:0/3294166163 -- osd_op_reply(3506 smithi15817201-780 [write 1602061~647603] v29'171 uv171 ondisk = 0) v8 -- 0x561fb0ce1440 con 0x561fb3824000
376812:2020-08-17T01:36:37.886+0000 7f550d256700 10 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561fb3824000 0x561fb0da3900 unknown :-1 s=CLOSED pgs=19 cs=0 l=1 rev1=1 rx=0 tx=0).send_message connection closed. Drop message 0x561fb0ce1440
376988:2020-08-17T01:36:37.890+0000 7f550d256700 15 osd.5 pg_epoch: 29 pg[3.0( v 29'173 (29'170,29'173] local-lis/les=19/20 n=50 ec=19/19 lis/c=19/19 les/c/f=20/20/0 sis=19) [5,7,2] r=0 lpr=19 luod=29'171 crt=29'173 lcod 29'170 mlcod 29'170 active+clean] do_osd_op_effects client.4600 con 0x561fb3824000
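On a lossy (l=1) connection, a fault discards the out-queue instead of replaying it, so a reply enqueued after the reset is simply dropped and has to be resent once the client reconnects; that resend can land behind replies for later tids. A toy simulation of the sequence in the logs (hypothetical code, not Ceph's actual messenger):

```python
def deliver(events):
    """Simulate reply delivery across a lossy-connection reset.

    Each event is (tid, conn_open): if the connection the reply was
    queued on has already closed, the message is silently dropped
    (the "connection closed. Drop message" path); otherwise it
    reaches the client in queue order.
    """
    seen = []
    for tid, conn_open in events:
        if conn_open:
            seen.append(tid)
    return seen

# Mirrors the log: tid 1's reply goes out on the old connection
# (0x561fb3824000), tid 2's reply is dropped because that connection
# just reset, tid 3's reply goes out first on the new connection
# (0x561faac7bc00, seq=1), then tid 2's reply is resent on it (seq=3).
events = [(1, True), (2, False), (3, True), (2, True)]
```

Here `deliver(events)` yields [1, 3, 2]: exactly the out-of-order acknowledgement the test aborts on.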
Updated by Neha Ojha over 3 years ago
- Project changed from Ceph to RADOS
- Priority changed from Normal to Urgent
rados/thrash-erasure-code/{ceph clusters/{fixed-2 openstack} fast/fast mon_election/classic msgr-failures/few objectstore/bluestore-comp-zlib rados recovery-overrides/{more-active-recovery} supported-random-distro$/{centos_8} thrashers/minsize_recovery thrashosds-health workloads/ec-small-objects-many-deletes}
/a/yuriw-2021-01-15_19:06:33-rados-wip-yuri8-testing-master-2021-01-15-0935-distro-basic-smithi/5789308
Updated by Kefu Chai about 3 years ago
/a//kchai-2021-01-28_03:28:19-rados-wip-kefu-testing-2021-01-27-1353-distro-basic-smithi/5834177
Updated by Neha Ojha about 3 years ago
- Is duplicate of Bug #48793: out of order op added