Project

General

Profile

Actions

Bug #47003

closed

ceph_test_rados test error. Reponses out of order due to the connection drops data.

Added by Zhenyi Shu over 3 years ago. Updated about 3 years ago.

Status:
Duplicate
Priority:
Urgent
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

http://pulpito.front.sepia.ceph.com/shuzhenyi-2020-08-17_01:22:26-rados:thrash-wip-shuzhenyi-testing-2020-08-15-1304-distro-basic-smithi/5349936/

I found logs below:

2020-08-17T01:36:37.872 INFO:tasks.rados.rados.0.smithi158.stdout:780:  writing smithi15817201-780 from 692653 to 1179630 tid 1
2020-08-17T01:36:37.878 INFO:tasks.rados.rados.0.smithi158.stdout:780:  writing smithi15817201-780 from 1602061 to 2249664 tid 2
2020-08-17T01:36:37.883 INFO:tasks.rados.rados.0.smithi158.stdout:780:  writing smithi15817201-780 from 2992629 to 3092653 tid 3
2020-08-17T01:36:38.478 INFO:tasks.rados.rados.0.smithi158.stdout:780:  finishing write tid 1 to smithi15817201-780
2020-08-17T01:36:38.478 INFO:tasks.rados.rados.0.smithi158.stdout:780:  finishing write tid 3 to smithi15817201-780
2020-08-17T01:36:38.478 INFO:tasks.rados.rados.0.smithi158.stdout:780:  finishing write tid 5 to smithi15817201-780
2020-08-17T01:36:38.478 INFO:tasks.rados.rados.0.smithi158.stdout:780:  finishing write tid 2 to smithi15817201-780
2020-08-17T01:36:38.479 INFO:tasks.rados.rados.0.smithi158.stderr:Error: finished tid 2 when last_acked_tid was 5
2020-08-17T01:36:38.479 INFO:tasks.rados.rados.0.smithi158.stderr:/build/ceph-16.0.0-4262-g9943626/src/test/osd/RadosModel.h: In function 'virtual void WriteOp::_finish(TestOp::CallbackInfo*)' thread 7f3ab6ffd700 time 2020-08-17T01:36:38.435642+0000
2020-08-17T01:36:38.479 INFO:tasks.rados.rados.0.smithi158.stderr:/build/ceph-16.0.0-4262-g9943626/src/test/osd/RadosModel.h: 907: ceph_abort_msg("abort() called")
2020-08-17T01:36:38.479 INFO:tasks.rados.rados.0.smithi158.stderr: ceph version 16.0.0-4262-g9943626 (9943626c67ac05201c815e9e878fd203ee096d94) pacific (dev)
2020-08-17T01:36:38.479 INFO:tasks.rados.rados.0.smithi158.stderr: 1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xe1) [0x7f3aca390997]
2020-08-17T01:36:38.480 INFO:tasks.rados.rados.0.smithi158.stderr: 2: (WriteOp::_finish(TestOp::CallbackInfo*)+0x59c) [0x55d5add9c94c]
2020-08-17T01:36:38.480 INFO:tasks.rados.rados.0.smithi158.stderr: 3: (write_callback(void*, void*)+0x19) [0x55d5addb9009]
2020-08-17T01:36:38.480 INFO:tasks.rados.rados.0.smithi158.stderr: 4: (()+0x98897) [0x7f3ad3068897]
2020-08-17T01:36:38.480 INFO:tasks.rados.rados.0.smithi158.stderr: 5: (()+0xb2665) [0x7f3ad3082665]
2020-08-17T01:36:38.480 INFO:tasks.rados.rados.0.smithi158.stderr: 6: (()+0xb54ff) [0x7f3ad30854ff]
2020-08-17T01:36:38.481 INFO:tasks.rados.rados.0.smithi158.stderr: 7: (()+0xb5846) [0x7f3ad3085846]
2020-08-17T01:36:38.481 INFO:tasks.rados.rados.0.smithi158.stderr: 8: (()+0xbd6df) [0x7f3ac9c426df]
2020-08-17T01:36:38.481 INFO:tasks.rados.rados.0.smithi158.stderr: 9: (()+0x76db) [0x7f3ac9f156db]
2020-08-17T01:36:38.481 INFO:tasks.rados.rados.0.smithi158.stderr: 10: (clone()+0x3f) [0x7f3ac969da3f]

Then I grep ceph-osd.5.log, which is primary osd's log, and found logs below:

375153:2020-08-17T01:36:37.882+0000 7f551125e700  1 -- v2:172.21.15.66:6800/13577 --> 172.21.15.158:0/3294166163 -- osd_op_reply(3505 smithi15817201-780 [write 692653~486977,stat] v29'170 uv170 ondisk = 0) v8 -- 0x561fb0ce2880 con 0x561fb3824000
375156:2020-08-17T01:36:37.882+0000 7f551125e700  5 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561fb3824000 0x561fb0da3900 crc :-1 s=READY pgs=19 cs=0 l=1 rev1=1 rx=0 tx=0).send_message enqueueing message m=0x561fb0ce2880 type=43 osd_op_reply(3505 smithi15817201-780 [write 692653~486977,stat] v29'170 uv170 ondisk = 0) v8
375170:2020-08-17T01:36:37.882+0000 7f552d12d700 20 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561fb3824000 0x561fb0da3900 crc :-1 s=READY pgs=19 cs=0 l=1 rev1=1 rx=0 tx=0).prepare_send_message m=osd_op_reply(3505 smithi15817201-780 [write 692653~486977,stat] v29'170 uv170 ondisk = 0) v8
375172:2020-08-17T01:36:37.882+0000 7f552d12d700 20 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561fb3824000 0x561fb0da3900 crc :-1 s=READY pgs=19 cs=0 l=1 rev1=1 rx=0 tx=0).prepare_send_message encoding features 4540138292837744639 0x561fb0ce2880 osd_op_reply(3505 smithi15817201-780 [write 692653~486977,stat] v29'170 uv170 ondisk = 0) v8
375174:2020-08-17T01:36:37.882+0000 7f552d12d700  5 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561fb3824000 0x561fb0da3900 crc :-1 s=READY pgs=19 cs=0 l=1 rev1=1 rx=0 tx=0).write_message sending message m=0x561fb0ce2880 seq=91 osd_op_reply(3505 smithi15817201-780 [write 692653~486977,stat] v29'170 uv170 ondisk = 0) v8
376811:2020-08-17T01:36:37.886+0000 7f550d256700  1 -- v2:172.21.15.66:6800/13577 --> 172.21.15.158:0/3294166163 -- osd_op_reply(3506 smithi15817201-780 [write 1602061~647603] v29'171 uv171 ondisk = 0) v8 -- 0x561fb0ce1440 con 0x561fb3824000
376812:2020-08-17T01:36:37.886+0000 7f550d256700 10 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561fb3824000 0x561fb0da3900 unknown :-1 s=CLOSED pgs=19 cs=0 l=1 rev1=1 rx=0 tx=0).send_message connection closed. Drop message 0x561fb0ce1440
376860:2020-08-17T01:36:37.886+0000 7f551125e700  1 -- v2:172.21.15.66:6800/13577 --> 172.21.15.158:0/3294166163 -- osd_op_reply(3507 smithi15817201-780 [write 2992629~100024] v29'172 uv172 ondisk = 0) v8 -- 0x561fb38bc400 con 0x561faac7bc00
376861:2020-08-17T01:36:37.886+0000 7f551125e700  5 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561faac7bc00 0x561facccca00 crc :-1 s=READY pgs=26 cs=0 l=1 rev1=1 rx=0 tx=0).send_message enqueueing message m=0x561fb38bc400 type=43 osd_op_reply(3507 smithi15817201-780 [write 2992629~100024] v29'172 uv172 ondisk = 0) v8
376868:2020-08-17T01:36:37.886+0000 7f552d12d700 20 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561faac7bc00 0x561facccca00 crc :-1 s=READY pgs=26 cs=0 l=1 rev1=1 rx=0 tx=0).prepare_send_message m=osd_op_reply(3507 smithi15817201-780 [write 2992629~100024] v29'172 uv172 ondisk = 0) v8
376869:2020-08-17T01:36:37.886+0000 7f552d12d700 20 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561faac7bc00 0x561facccca00 crc :-1 s=READY pgs=26 cs=0 l=1 rev1=1 rx=0 tx=0).prepare_send_message encoding features 4540138292837744639 0x561fb38bc400 osd_op_reply(3507 smithi15817201-780 [write 2992629~100024] v29'172 uv172 ondisk = 0) v8
376870:2020-08-17T01:36:37.886+0000 7f552d12d700  5 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561faac7bc00 0x561facccca00 crc :-1 s=READY pgs=26 cs=0 l=1 rev1=1 rx=0 tx=0).write_message sending message m=0x561fb38bc400 seq=1 osd_op_reply(3507 smithi15817201-780 [write 2992629~100024] v29'172 uv172 ondisk = 0) v8
376972:2020-08-17T01:36:37.890+0000 7f550d256700  1 -- v2:172.21.15.66:6800/13577 --> 172.21.15.158:0/3294166163 -- osd_op_reply(3506 smithi15817201-780 [write 1602061~647603] v29'171 uv171 ondisk = 0) v8 -- 0x561fb0ce1440 con 0x561faac7bc00
376973:2020-08-17T01:36:37.890+0000 7f550d256700  5 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561faac7bc00 0x561facccca00 crc :-1 s=READY pgs=26 cs=0 l=1 rev1=1 rx=0 tx=0).send_message enqueueing message m=0x561fb0ce1440 type=43 osd_op_reply(3506 smithi15817201-780 [write 1602061~647603] v29'171 uv171 ondisk = 0) v8
376978:2020-08-17T01:36:37.890+0000 7f552d12d700 20 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561faac7bc00 0x561facccca00 crc :-1 s=READY pgs=26 cs=0 l=1 rev1=1 rx=0 tx=0).prepare_send_message m=osd_op_reply(3506 smithi15817201-780 [write 1602061~647603] v29'171 uv171 ondisk = 0) v8
376979:2020-08-17T01:36:37.890+0000 7f552d12d700 20 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561faac7bc00 0x561facccca00 crc :-1 s=READY pgs=26 cs=0 l=1 rev1=1 rx=0 tx=0).prepare_send_message encoding features 4540138292837744639 0x561fb0ce1440 osd_op_reply(3506 smithi15817201-780 [write 1602061~647603] v29'171 uv171 ondisk = 0) v8
376980:2020-08-17T01:36:37.890+0000 7f552d12d700  5 --2- v2:172.21.15.66:6800/13577 >> 172.21.15.158:0/3294166163 conn(0x561faac7bc00 0x561facccca00 crc :-1 s=READY pgs=26 cs=0 l=1 rev1=1 rx=0 tx=0).write_message sending message m=0x561fb0ce1440 seq=3 osd_op_reply(3506 smithi15817201-780 [write 1602061~647603] v29'171 uv171 ondisk = 0) v8

It seems like at the point of sending tid2's data, connection dropped the data and cloed.


Related issues 1 (0 open1 closed)

Is duplicate of RADOS - Bug #48793: out of order opResolvedRonen Friedman

Actions
Actions

Also available in: Atom PDF