Bug #13116

osd: pg stuck in replay

Added by Sage Weil about 7 years ago. Updated almost 7 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport: hammer
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

"description": "osd_op(mds.0.172:22092466 1000fb5427c.00000000 [create 0~0,setxattr parent (347)] 0.e463883e RETRY=6 ondisk+retry+write+known_if_redirected e648294)",
"initiated_at": "2015-09-15 19:42:36.227613",
"age": 35796.821774,
"duration": 210.344416,
"type_data": [
"delayed",
[ {
"time": "2015-09-15 19:42:36.227613",
"event": "initiated"
}, {
"time": "2015-09-15 19:42:36.377110",
"event": "reached_pg"
}, {
"time": "2015-09-15 19:42:37.338774",
"event": "reached_pg"
}, {
"time": "2015-09-15 19:42:38.260745",
"event": "reached_pg"
}, {
"time": "2015-09-15 19:44:08.373928",
"event": "reached_pg"
}, {
"time": "2015-09-15 19:44:08.541626",
"event": "reached_pg"
}, {
"time": "2015-09-15 19:44:08.635922",
"event": "reached_pg"
}, {
"time": "2015-09-15 19:44:53.262739",
"event": "reached_pg"
}, {
"time": "2015-09-15 19:45:04.467969",
"event": "reached_pg"
}, {
"time": "2015-09-15 19:46:06.571989",
"event": "reached_pg"
}, {
"time": "2015-09-15 19:46:06.572029",
"event": "waiting for replay end"
}
]
]

It is now Wed Sep 16 05:40:28 PDT 2015, roughly 10 hours later (the op's age above is about 35797 seconds), and the pg still appears to be stuck in replay.


Related issues

Copied to Ceph - Backport #13620: osd: pg stuck in replay Resolved

Associated revisions

Revision d18cf51d (diff)
Added by Sage Weil about 7 years ago

osd: fix requeue of replay requests during activating

If the replay period expires while we are still in the activating
state, we can simply insert our list of requests at the front of
the waiting_for_active list.

Fixes: #13116
Signed-off-by: Sage Weil <>
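
The requeue described in this commit message can be sketched as a list splice. This is a minimal illustration, not the actual patch: `Op`, `PgQueues`, `replay_queue`, and `requeue_replay_while_activating` are hypothetical names, and only `waiting_for_active` is taken from the message itself.

```cpp
#include <list>
#include <string>

// Hypothetical stand-in for a queued client request (not Ceph's op type).
using Op = std::string;

// Hypothetical, simplified model of the two queues involved.
struct PgQueues {
    std::list<Op> replay_queue;        // ops held back during the replay period
    std::list<Op> waiting_for_active;  // ops waiting for the PG to become active
};

// Move every held replay op to the front of waiting_for_active.
// splice() keeps the replay ops in their original relative order,
// places them ahead of ops that were already waiting, and leaves
// replay_queue empty.
inline void requeue_replay_while_activating(PgQueues& q) {
    q.waiting_for_active.splice(q.waiting_for_active.begin(), q.replay_queue);
}
```

In this model, replay ops `r1, r2` and an already waiting op `w1` end up queued as `r1, r2, w1`, so the replayed requests are handled first once the PG becomes active.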

Revision 9f3aebee (diff)
Added by Sage Weil about 7 years ago

osd: fix requeue of replay requests during activating

If the replay period expires while we are still in the activating
state, we can simply insert our list of requests at the front of
the waiting_for_active list.

Fixes: #13116
Signed-off-by: Sage Weil <>
(cherry picked from commit d18cf51d9419819cdda3782b188b010969288911)

History

#1 Updated by Sage Weil about 7 years ago

  • Subject changed from hammer: pg stuck in replay to osd: pg stuck in replay
  • Status changed from In Progress to 12
  • Assignee deleted (Sage Weil)
2015-09-15 7ff7eb936700 10 osd.79 pg_epoch: 648335 pg[0.83e( v 648179'693979 (623242'690950,648179'693979] local-les=633480 n=7438 ec=1 les/c 633480/636597 648320/648320/648294) [79,101,3] r=0 lpr=648320 pi=633391-648319/10 crt=648049'693976 lcod 0'0 mlcod 0'0 inactive] activate starting replay interval for 45 until 2015-09-15 19:45:49.466019
...
2015-09-15 19:45:50.351316 7ff80498a700 10 osd.79 648335 check_replay_queue pg[0.83e( v 648179'693979 (623242'690950,648179'693979] local-les=648335 n=7438 ec=1 les/c 633480/636597 648320/648320/648294) [79,101,3] r=0 lpr=648320 pi=633391-648319/10 crt=648049'693976 lcod 0'0 mlcod 0'0 activating+replay+degraded]

the pg isn't active, so it fails this test:
      dout(10) << "check_replay_queue " << *pg << dendl;
      if (pg->is_active() &&
          pg->is_replay() &&
          pg->is_primary() &&
          pg->replay_until == p->second) {
        pg->replay_queued_ops();
      }

we need to requeue everything on waiting_for_active, probably?
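
To make the failure concrete, here is a small stand-alone model of that guard. It is only a sketch under stated assumptions: `PgModel`, the `STATE_*` bit values, and `guard_allowing_activating` are hypothetical and are not Ceph's real types, constants, or fix; the state names simply mirror the ones printed in the log above.

```cpp
#include <cstdint>

// Hypothetical state bits; values are illustrative only.
enum : std::uint32_t {
    STATE_ACTIVE     = 1u << 0,
    STATE_ACTIVATING = 1u << 1,
    STATE_REPLAY     = 1u << 2,
    STATE_DEGRADED   = 1u << 3,
};

// Hypothetical, simplified PG with just the fields the guard reads.
struct PgModel {
    std::uint32_t state = 0;
    bool primary = false;
    std::int64_t replay_until = 0;  // replay expiry, arbitrary time units

    bool is_active() const     { return (state & STATE_ACTIVE) != 0; }
    bool is_activating() const { return (state & STATE_ACTIVATING) != 0; }
    bool is_replay() const     { return (state & STATE_REPLAY) != 0; }
    bool is_primary() const    { return primary; }
};

// Same shape as the condition quoted in this comment.
inline bool guard_as_reported(const PgModel& pg, std::int64_t expiry) {
    return pg.is_active() && pg.is_replay() && pg.is_primary() &&
           pg.replay_until == expiry;
}

// Assumed relaxation: also accept a PG that is still activating.
inline bool guard_allowing_activating(const PgModel& pg, std::int64_t expiry) {
    return (pg.is_active() || pg.is_activating()) && pg.is_replay() &&
           pg.is_primary() && pg.replay_until == expiry;
}
```

With the state from the second log line (activating+replay+degraded, primary), `guard_as_reported` is false, so the queued ops are never released when the replay period expires, while the relaxed guard is true.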

#2 Updated by Sage Weil about 7 years ago

To get good coverage of this case we should set the replay interval for test pools to something short (5 or 10 seconds).

#3 Updated by Sage Weil about 7 years ago

  • Status changed from 12 to Fix Under Review

#4 Updated by Sage Weil about 7 years ago

  • Status changed from Fix Under Review to 7

#5 Updated by Samuel Just about 7 years ago

  • Status changed from 7 to Resolved

#6 Updated by Greg Farnum about 7 years ago

  • Status changed from Resolved to Pending Backport
  • Priority changed from Urgent to Normal

Can we get this put in hammer as well when it's convenient?

#7 Updated by Nathan Cutler about 7 years ago

  • Backport set to hammer

#8 Updated by Nathan Cutler about 7 years ago

#9 Updated by Loïc Dachary almost 7 years ago

  • Status changed from Pending Backport to Resolved
