Bug #10118

closed

messenger drops messages between osds

Added by Guang Yang over 9 years ago. Updated almost 9 years ago.

Status:
Can't reproduce
Priority:
Urgent
Assignee:
Category:
OSD
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Log snippets before the daemon crash:

2014-10-27 20:27:32.320777 7f2d43661700  0 osd.165 pg_epoch: 1166 pg[3.1e8ds0(
v 1165'13077 lc 1018'6596 (1004'3595,1165'13077] local-les=1150 n=13058 ec=255
les/c 1150/1020 1148/1148/1038) [165,383,266,362,367,46,12,449,187,153,338] r=0
lpr=1148 pi=985-1147/4 crt=1030'11674 mlcod 1018'6596 active+recovery_wait
m=5070]  removing repgather(0x27a84e40 1165'13077 rep_tid=53667
committed?=1 applied?=1 op=osd_op(client.12473.0:48029435
default.12181.368_12227485465_50c2526085_o.jpg [create 0~0,setxattr
user.rgw.idtag (23),writefull 0~1900011,setxattr user.rgw.manifest
(472),setxattr user.rgw.acl (133),setxattr user.rgw.content_type (25),setxattr
user.rgw.etag (33),setxattr user.rgw.x-amz-meta-origin (57)] 3.21a69e8d
ondisk+write e1165) v4)

2014-10-27 20:27:32.352325 7f2d43661700  0 osd.165 pg_epoch: 1166 pg[3.1e8ds0(
v 1165'13077 lc 1018'6596 (1004'3595,1165'13077] local-les=1150 n=13058 ec=255
les/c 1150/1020 1148/1148/1038) [165,383,266,362,367,46,12,449,187,153,338] r=0
lpr=1148 pi=985-1147/4 crt=1030'11674 mlcod 1018'6596 active+recovery_wait
m=5070]    q front is repgather(0x99eaaf00 1165'13076 rep_tid=53646
committed?=0 applied?=1 lock=0 op=osd_op(client.12485.0:46420099
default.12402.391_14252946456_ff2cc332ba_o.jpg [create 0~0,setxattr
user.rgw.idtag (23),writefull 0~5898240,setxattr user.rgw.manifest
(472),setxattr user.rgw.acl (133),setxattr user.rgw.content_type (25),setxattr
user.rgw.etag (33),setxattr user.rgw.x-amz-meta-origin (57)] 3.33e9be8d
ondisk+write e1165) v4)

2014-10-27 20:27:32.431388 7f2d43661700 -1 osd/ReplicatedPG.cc: In function
'void ReplicatedPG::eval_repop(ReplicatedPG::RepGather*)' thread 7f2d43661700
time 2014-10-27 20:27:32.378207
osd/ReplicatedPG.cc: 6670: FAILED assert(repop_queue.front() == repop)

Root cause is under investigation...
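
For readers unfamiliar with the failed assertion, the following is a toy model of the state shown in the two log entries above. It is not the code in osd/ReplicatedPG.cc: the RepGather class, the done() helper, and the body of eval_repop here are hypothetical, and the sketch assumes the assertion expresses a first-in, first-out rule (a finished repop may only be retired when it is at the front of repop_queue). The rep_tid, version, and committed/applied values are copied from the log. One way such a state could arise, in line with the issue title, is a commit acknowledgement for the older op never arriving, but that is an assumption and not a confirmed root cause.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class RepGather:
    """Toy stand-in for a replicated op; field names mirror the log output."""
    rep_tid: int
    version: str
    committed: bool = False
    applied: bool = False

    def done(self) -> bool:
        # Hypothetical completion rule: both flags must be set.
        return self.committed and self.applied


def eval_repop(queue: deque, repop: RepGather) -> None:
    """Retire a finished repop, insisting that it is the oldest one queued."""
    if not repop.done():
        return
    if queue[0] is not repop:
        raise AssertionError(
            "repop_queue.front() == repop failed: "
            f"front rep_tid={queue[0].rep_tid}, finishing rep_tid={repop.rep_tid}"
        )
    queue.popleft()


# Values taken from the two log entries above.
older = RepGather(rep_tid=53646, version="1165'13076", committed=False, applied=True)
newer = RepGather(rep_tid=53667, version="1165'13077", committed=True, applied=True)
repop_queue = deque([older, newer])

try:
    # The newer op is fully done while the older one still waits for its commit.
    eval_repop(repop_queue, newer)
    outcome = "ok"
except AssertionError as exc:
    outcome = str(exc)
print(outcome)
```

In this model the failure disappears once the older op is marked committed and retired first, which is why a missing acknowledgement for rep_tid=53646 is a plausible thing to look for in the messenger logs.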

Ceph version: v0.80.4
Platform: RHEL6.5


Related issues 1 (0 open, 1 closed)

Related to Ceph - Bug #10057: msgr: skipped message on peer reconnect - Can't reproduce - 11/10/2014
