Bug #3658

closed

osd/mon: stops processing pg stat messages

Added by Sage Weil over 11 years ago. Updated over 11 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Q/A

Description

nuke-on-error: true
overrides:
  ceph:
    branch: next
    conf:
      client:
        debug ms: 1
        log max new: 1
        rbd cache: true
      global:
        ms inject socket failures: 5000
      osd:
        debug ms: 1
        debug osd: 20
    fs: ext4
    log-whitelist:
    - slow request
roles:
- - mon.a
  - osd.0
  - osd.1
  - osd.2
- - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
tasks:
- chef: null
- clock: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    timeout: 1200
- rbd_fsx:
    clients:
    - client.0
    ops: 500

looks like the osd is sending pg stat messages, but at some point the mon stops receiving and acking them. eventually the mon marks the osd down (after the 900s timeout).
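
To make the failure mode concrete, here is a minimal sketch of the timeout behaviour described above: if the monitor has not seen a stat report from an osd for longer than the timeout, it treats the osd as down. This is not Ceph code. The `StatTracker` class, its method names, and the timestamps are hypothetical and chosen for illustration; only the 900s figure comes from the report.

```python
from dataclasses import dataclass, field

# The 900s timeout is the figure quoted in the report; everything else
# here is a hypothetical model, not the monitor's real implementation.
STAT_TIMEOUT_SECONDS = 900.0


@dataclass
class StatTracker:
    """Hypothetical monitor-side record of the last stat report per osd."""

    timeout: float = STAT_TIMEOUT_SECONDS
    last_report: dict = field(default_factory=dict)

    def record_report(self, osd_id: int, now: float) -> None:
        # Called when a stat message from osd_id is received and acked.
        self.last_report[osd_id] = now

    def stale_osds(self, now: float) -> list:
        # osds whose last received report is older than the timeout;
        # in this model these are the ones that would be marked down.
        return sorted(
            osd for osd, seen in self.last_report.items()
            if now - seen > self.timeout
        )


tracker = StatTracker()
tracker.record_report(0, now=0.0)
tracker.record_report(1, now=0.0)
tracker.record_report(1, now=600.0)  # osd.1's reports keep arriving
# osd.0 keeps sending, but in this scenario nothing more is received

print(tracker.stale_osds(now=901.0))  # [0]
```

In this model osd.0 is flagged even though it is healthy and still sending, which matches the symptom: the problem lies in the reports not being processed, not in the osd itself.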

Related issues 1 (0 open, 1 closed)

Related to Ceph - Bug #3661: mon: idle/empty osds marked down after 15 min (Resolved, Sage Weil, 12/20/2012)

Actions #1

Updated by Sage Weil over 11 years ago

see /a/sage-ooo2, /a/sam-ooo3

Actions #2

Updated by Sage Weil over 11 years ago

  • Assignee set to Sage Weil
Actions #3

Updated by Sage Weil over 11 years ago

  • Status changed from 12 to Resolved

pretty sure this was caused by the log bug in combination with 'log max new = 1'; fixed by 50914e7a429acddb981bc3344f51a793280704e6.

still saw this on masseffect, tracking that in #3661
