Project

General

Profile

Actions

Bug #9971

closed

OSD crashes again after restarting due to op thread time out at writing pg dirty log

Added by Zhi Zhang over 9 years ago. Updated over 9 years ago.

Status:
Duplicate
Priority:
Normal
Assignee:
-
Category:
OSD
Target version:
-
% Done:

0%

Source:
Community (dev)
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This crashes observed when one OSD was restarted after being down for a long time, it crashed again
because its op thread timed out.

Restarting it multiple times seems to workaround it.

------------------------------------------------
2014-10-30 04:48:35.781346 7f69bdf1d700 10 osd.73 23999 dequeue_op 0x20360c30
prio 63 cost 0 latency 239.712660 MOSDECSubOpWrite(3.1ebcs7 23997
ECSubWrite(tid=636693, reqid=client.128441.0:3087729, at_version=23997'16994,
trim_to=0'0, trim_rollback_to=23914'16165)) v1 pg pg[3.1ebcs7( v 23997'16993 lc
23914'16165 (1109'13061,23997'16993] local-les=23994 n=16903 ec=255 les/c
23994/23932 23993/23993/20676) [387,136,140,60,86,248,419,73,179,66,474] r=7
lpr=23993 pi=255-23992/260 luod=0'0 crt=23992'16986 active m=821]
2014-10-30 04:48:35.781401 7f69bdf1d700 10 osd.73 pg_epoch: 23997 pg[3.1ebcs7(
v 23997'16993 lc 23914'16165 (1109'13061,23997'16993] local-les=23994 n=16903
ec=255 les/c 23994/23932 23993/23993/20676)
[387,136,140,60,86,248,419,73,179,66,474] r=7 lpr=23993 pi=255-23992/260
luod=0'0 crt=23992'16986 active m=821] handle_message:
MOSDECSubOpWrite(3.1ebcs7 23997 ECSubWrite(tid=636693,
reqid=client.128441.0:3087729, at_version=23997'16994, trim_to=0'0,
trim_rollback_to=23914'16165)) v1
2014-10-30 04:48:35.781465 7f69bdf1d700 10 osd.73 pg_epoch: 23997 pg[3.1ebcs7(
v 23997'16993 lc 23914'16165 (1109'13061,23997'16993] local-les=23994 n=16904
ec=255 les/c 23994/23932 23993/23993/20676)
[387,136,140,60,86,248,419,73,179,66,474] r=7 lpr=23993 pi=255-23992/260
luod=0'0 crt=23992'16986 active m=821] append_log log((1109'13061,23997'16993],
crt=23992'16986) [23997'16994 (0'0) modify
c716febc/default.12479.271_4132633829_ef1cacbaaf_o.jpg/head//3 by
client.128441.0:3087729 2014-10-30 04:44:36.000000]
2014-10-30 04:48:35.781511 7f69bdf1d700 10 osd.73 pg_epoch: 23997 pg[3.1ebcs7(
v 23997'16994 lc 23914'16165 (1109'13061,23997'16994] local-les=23994 n=16904
ec=255 les/c 23994/23932 23993/23993/20676)
[387,136,140,60,86,248,419,73,179,66,474] r=7 lpr=23993 pi=255-23992/260
luod=0'0 crt=23992'16986 active m=821] add_log_entry 23997'16994 (0'0) modify
c716febc/default.12479.271_4132633829_ef1cacbaaf_o.jpg/head//3 by
client.128441.0:3087729 2014-10-30 04:44:36.000000
2014-10-30 04:48:35.781541 7f69bdf1d700 10 osd.73 pg_epoch: 23997 pg[3.1ebcs7(
v 23997'16994 lc 23914'16165 (1109'13061,23997'16994] local-les=23994 n=16904
ec=255 les/c 23994/23932 23993/23993/20676)
[387,136,140,60,86,248,419,73,179,66,474] r=7 lpr=23993 pi=255-23992/260
luod=0'0 crt=23992'16986 active m=821] append_log adding 1 keys
2014-10-30 04:48:35.781564 7f69bdf1d700 10 osd.73 pg_epoch: 23997 pg[3.1ebcs7(
v 23997'16994 lc 23914'16165 (1109'13061,23997'16994] local-les=23994 n=16904
ec=255 les/c 23994/23932 23993/23993/20676)
[387,136,140,60,86,248,419,73,179,66,474] r=7 lpr=23993 pi=255-23992/260
luod=0'0 crt=23992'16986 active m=821] append_log: trimming to 23914'16165
entries
2014-10-30 04:48:35.781609 7f69bdf1d700 10 write_log with: dirty_to: 0'0,
dirty_from: 4294967295'18446744073709551615, dirty_divergent_priors: 0,
writeout_from: 23997'16994, trimmed:
...
2014-10-30 04:48:54.120749 7f69ef67e700 1 heartbeat_map is_healthy 'OSD::op_tp
thread 0x7f69bdf1d700' had timed out after 15
...



Related issues 1 (0 open1 closed)

Is duplicate of Ceph - Bug #10157: PGLog::(read|write)_log don't write out rollback_info_trimmed_toResolvedSamuel Just11/20/2014

Actions
Actions

Also available in: Atom PDF