Bug #15523

closed

osd: acting_primary not updated on split

Added by Sage Weil about 8 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport: jewel,infernalis,hammer
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

/a/sage-2016-04-15_05:22:29-rados-master-distro-basic-smithi/131192

A PG is stuck stale:

0.129   0       0       0       0       0       0       0       0       stale+active+clean      2016-04-15 14:32:04.537262      0'0     549:7   [0,1,3] 0       [4,1,3] 4       0'0     2016-04-15 14:26:36.590356      0'0     2016-04-15 14:26:36.590356

because the mon wrongly marks it that way:
2016-04-15 14:32:01.120114 7fd35b482700 10 mon.a@0(leader).pg v731 register_pg  will create 0.129 primary 0 acting [0,1,3] parent 0.29 by 1 bits
...
2016-04-15 14:32:01.239533 7fd35b482700 20 mon.a@0(leader).pg v731  refreshing pg 0.129 0:0 creating
...
2016-04-15 14:32:02.706037 7fd359c7f700 15 mon.a@0(leader).pg v733  got 0.129 reported at 548:1 state creating -> peering
(osdmap is 548)
...
2016-04-15 14:32:03.408916 7fd35b482700 20 mon.a@0(leader).pg v733  refreshing pg 0.129 548:1 peering
(osdmap is 548)
...
2016-04-15 14:32:07.711309 7fd359c7f700 15 mon.a@0(leader).pg v738  got 0.129 reported at 549:7 state peering -> active+clean
...
2016-04-15 14:32:07.979124 7fd35b482700 10 mon.a@0(leader).pg v738 check_down_pgs last_osdmap_epoch 552
(note: epoch 552, osd.4 is down, pg not marked stale here, ergo acting_primary != 4.. presumably 0 as intended)
2016-04-15 14:32:07.979584 7fd35b482700 10 mon.a@0(leader).paxosservice(pgmap 1..738) propose_pending
...
2016-04-15 14:32:08.097296 7fd35b482700 20 mon.a@0(leader).pg v738  refreshing pg 0.129 549:7 active+clean
...
2016-04-15 14:32:09.239834 7fd35b482700 10 mon.a@0(leader).pg v739 check_down_pgs last_osdmap_epoch 553
2016-04-15 14:32:09.239984 7fd35b482700 10 mon.a@0(leader).pg v739  marking pg 0.129 stale (acting_primary 4)
(now acting_primary is 4, but shouldn't be)
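
The check that the log lines above point to can be sketched roughly as follows. This is a simplified model for illustration, not the actual mon code: the `PGStat` class and the Python `check_down_pgs` function are hypothetical stand-ins, and the only behavior assumed is what the log shows, namely that a PG is marked stale when its reported acting_primary is a down OSD.

```python
from dataclasses import dataclass


@dataclass
class PGStat:
    """Hypothetical, stripped-down stand-in for the stats the mon keeps per PG."""
    pgid: str
    state: set
    acting_primary: int


def check_down_pgs(pg_stats, down_osds):
    """Mark every PG whose reported acting_primary is down as stale.

    Returns the ids of the PGs that were newly marked.
    """
    marked = []
    for pg in pg_stats:
        if pg.acting_primary in down_osds and "stale" not in pg.state:
            pg.state.add("stale")
            marked.append(pg.pgid)
    return marked


# With osd.4 down, a PG that (wrongly) reports acting_primary 4 gets marked
# stale, even if its real primary, osd.0, is up.
pg = PGStat("0.129", {"active", "clean"}, acting_primary=4)
print(check_down_pgs([pg], down_osds={4}))
```

Under this model the check itself behaves as designed; the wrong result comes from the acting_primary value it is fed.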

Either the OSD corrupted its stats.acting_primary value (it's only set by init and start_peering_interval, neither of which happened between the reports.. maybe some race? memory corruption?), or the mon did something similarly stupid. :/
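
Going by the subject of this ticket ("acting_primary not updated on split"), one way a child PG could end up reporting a wrong acting_primary is sketched below. This is a toy model under stated assumptions, not the OSD code or the actual fix: the `Stats` class and both functions are hypothetical, and the parent's values ([4,1,3], primary 4) are assumed for illustration. Only the child's mapping ([0,1,3], primary 0) comes from the register_pg log line above.

```python
import copy
from dataclasses import dataclass


@dataclass
class Stats:
    """Hypothetical subset of per-PG stats relevant to this ticket."""
    acting: list
    acting_primary: int


def split_child_buggy(parent_stats):
    """Child inherits the parent's stats wholesale; nothing is refreshed."""
    return copy.deepcopy(parent_stats)


def split_child_fixed(parent_stats, child_acting, child_acting_primary):
    """Child inherits the parent's stats, then takes its own mapping."""
    child = copy.deepcopy(parent_stats)
    child.acting = list(child_acting)
    child.acting_primary = child_acting_primary
    return child


# Assumed parent mapping (illustrative only).
parent = Stats(acting=[4, 1, 3], acting_primary=4)

# The buggy child keeps reporting osd.4 as its acting primary.
print(split_child_buggy(parent).acting_primary)
# The fixed child reports the primary from its own mapping.
print(split_child_fixed(parent, [0, 1, 3], 0).acting_primary)
```

If something like the first variant is what happened, the mon would mark the child stale as soon as the inherited primary goes down, which matches the symptom in the log.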


Related issues 3 (0 open, 3 closed)

Copied to Ceph - Backport #15728: jewel: osd: acting_primary not updated on split (Resolved, Abhishek Varshney)
Copied to Ceph - Backport #15729: infernalis: osd: acting_primary not updated on split (Rejected)
Copied to Ceph - Backport #15730: hammer: osd: acting_primary not updated on split (Resolved, Wei-Chung Cheng)
Actions #1

Updated by Sage Weil almost 8 years ago

maybe /a/teuthology-2016-04-24_22:00:02-rados-jewel-distro-basic-smithi/147520

Actions #2

Updated by Samuel Just almost 8 years ago

sjust@teuthology:/a/samuelj-2016-04-28_23:23:57-rados-wip-sam-testing-distro-basic-smithi/155488/remote also possibly

Actions #3

Updated by Sage Weil almost 8 years ago

  • Status changed from Need More Info to Pending Backport
  • Backport set to jewel,infernalis,hammer
Actions #4

Updated by Sage Weil almost 8 years ago

  • Subject changed from osd: info.stat.acting_primary corrupted? to osd: acting_primary not updated on split
Actions #5

Updated by Nathan Cutler almost 8 years ago

  • Copied to Backport #15728: jewel: osd: acting_primary not updated on split added
Actions #6

Updated by Nathan Cutler almost 8 years ago

  • Copied to Backport #15729: infernalis: osd: acting_primary not updated on split added
Actions #7

Updated by Nathan Cutler almost 8 years ago

  • Copied to Backport #15730: hammer: osd: acting_primary not updated on split added
Actions #8

Updated by Loïc Dachary over 7 years ago

  • Status changed from Pending Backport to Resolved