Bug #37911

closed

osd dequeue misorder

Added by Sage Weil over 5 years ago. Updated over 4 years ago.

Status:
Can't reproduce
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2019-01-13 08:09:22.322 7f546c234700  1 -- v1:172.21.15.27:6810/37346 <== osd.1 v1:172.21.15.97:6805/92918 1132 ==== osd_repop(client.4478.0:9420 2.17 e937/936) v2 ==== 1021+0+657 (765309369 0 778914224) 0x55f428e17100 con 0x55f429859400
2019-01-13 08:09:22.322 7f546c234700  1 -- v1:172.21.15.27:6810/37346 <== osd.1 v1:172.21.15.97:6805/92918 1133 ==== pg_backfill(finish 2.17 e 937/937 lb MAX) v3 ==== 951+0+0 (2431142910 0 0) 0x55f42a788300 con 0x55f429859400
2019-01-13 08:09:22.325 7f5446755700 10 osd.6 937 dequeue_op 0x55f429ee5760 prio 127 cost 0 latency 0.002828 pg_backfill(finish 2.17 e 937/937 lb MAX) v3 pg pg[2.17( v 937'925 (322'404,937'925] lb MIN (bitwise) local-lis/les=936/937 n=0 ec=561/17 lis/c 936/858 les/c/f 937/859/0 935/936/936) [6,1]/[1,4] r=-1 lpr=936 pi=[858,936)/1 luod=0'0 lua=914'922 crt=937'925 lcod 914'922 active+remapped mbc={} ps=[304~1,307~1]]
2019-01-13 08:09:22.325 7f5446755700 10 osd.6 937 dequeue_op 0x55f4296f8b00 prio 127 cost 657 latency 0.003203 osd_repop(client.4478.0:9420 2.17 e937/936) v2 pg pg[2.17( v 937'925 (322'404,937'925] local-lis/les=936/937 n=2 ec=561/17 lis/c 936/858 les/c/f 937/859/0 935/936/936) [6,1]/[1,4] r=-1 lpr=936 pi=[858,936)/1 luod=0'0 lua=914'922 crt=937'925 lcod 914'922 active+remapped mbc={} ps=[304~1,307~1]]

/a/sage-2019-01-12_18:28:26-rados-wip-sage-testing-2019-01-12-1028-distro-basic-smithi/3455807

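The symptom here was a scrub mismatch. We merged 2.7 with 2.17, and 2.17 had n=-1 because the backfill completion above was dequeued out of order with respect to a write (the osd_repop arrived first, as message 1132, but was dequeued after the pg_backfill finish, message 1133), which corrupted the pg stats.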
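For anyone trying to spot the same pattern in another OSD log, here is a rough sketch of a checker. It is not part of Ceph; `find_misorders` and both regexes are hypothetical helpers written against the exact line shapes in the excerpt above. It assumes a single connection and a single PG, and it matches ops by their printed payload text, so treat a hit as a hint to look closer rather than proof of a bug.

```python
import re

# Messenger receive line: "<== <peer> <addr> <seq> ==== <payload> ==== <sizes>"
RECV_RE = re.compile(r"<== \S+ \S+ (\d+) ==== (.*?) ==== ")
# OSD dequeue line: "dequeue_op <ptr> prio P cost C latency L <payload> pg pg[...]"
DEQUEUE_RE = re.compile(
    r"dequeue_op \S+ prio \d+ cost \d+ latency [\d.]+ (.*?) pg pg\["
)


def find_misorders(lines):
    """Return (dequeued_earlier, dequeued_later) payload pairs where the op
    dequeued later had a lower receive sequence number than the earlier one."""
    recv_seq = {}   # payload text -> receive sequence number
    newest = None   # (seq, payload) of the highest-seq op dequeued so far
    misorders = []
    for line in lines:
        m = RECV_RE.search(line)
        if m:
            recv_seq[m.group(2)] = int(m.group(1))
            continue
        m = DEQUEUE_RE.search(line)
        if not m or m.group(1) not in recv_seq:
            continue  # not a dequeue line, or an op we never saw arrive
        payload = m.group(1)
        seq = recv_seq[payload]
        if newest is not None and seq < newest[0]:
            misorders.append((newest[1], payload))
        else:
            newest = (seq, payload)
    return misorders


# Shortened copies of the four log lines from the description.
SAMPLE = [
    "-- v1:172.21.15.27:6810/37346 <== osd.1 v1:172.21.15.97:6805/92918 1132 "
    "==== osd_repop(client.4478.0:9420 2.17 e937/936) v2 ==== 1021+0+657",
    "-- v1:172.21.15.27:6810/37346 <== osd.1 v1:172.21.15.97:6805/92918 1133 "
    "==== pg_backfill(finish 2.17 e 937/937 lb MAX) v3 ==== 951+0+0",
    "osd.6 937 dequeue_op 0x55f429ee5760 prio 127 cost 0 latency 0.002828 "
    "pg_backfill(finish 2.17 e 937/937 lb MAX) v3 pg pg[2.17( v 937'925 ]",
    "osd.6 937 dequeue_op 0x55f4296f8b00 prio 127 cost 657 latency 0.003203 "
    "osd_repop(client.4478.0:9420 2.17 e937/936) v2 pg pg[2.17( v 937'925 ]",
]

for earlier, later in find_misorders(SAMPLE):
    print(f"dequeued {earlier!r} before {later!r}, but it was received after")
```

On the sample lines this flags the pg_backfill finish as dequeued ahead of the osd_repop. A real log would need the lines filtered per connection and per PG first, since sequence numbers are only comparable within one connection.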

Actions #1

Updated by Greg Farnum over 4 years ago

  • Status changed from 12 to Can't reproduce
  • Priority changed from Urgent to Normal

There have been pg merge fixes since then...
