Bug #12181

closed

test: indep mapping fails because an osd is down

Added by Loïc Dachary almost 9 years ago. Updated over 8 years ago.

Status: Can't reproduce
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When running test-erasure-code.sh, the mapping of pg 2.7 fails:

$ gzip -d < /tmp/bad-report.txt.gz | jq '.pgmap.pg_stats[] | select(.state != "active+clean") | [.pgid, .acting]'
[
  "2.7",
  [
    2147483647,
    0,
    4
  ]
]

The mapping fails because osd.6 is missing from the acting set, as shown by the report from a good run of the same test:

$ gzip -d < /tmp/good-report.txt.gz | jq '.pgmap.pg_stats[] | select(.pgid == "2.7") | .acting'
[
  6,
  0,
  4
]
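
The comparison between the two acting sets can also be scripted. The following is a minimal sketch, not part of the test suite: `diff_acting` is a hypothetical helper, and the two lists are copied by hand from the reports above. It relies on the fact that 2147483647 equals 0x7fffffff, the value the CRUSH code calls `CRUSH_ITEM_NONE`, which stands in for a shard that has no OSD mapped to it.

```python
# Placeholder that appears instead of an OSD id when a shard has no OSD.
# 2147483647 == 0x7fffffff, named CRUSH_ITEM_NONE in the CRUSH code.
CRUSH_ITEM_NONE = 0x7FFFFFFF


def diff_acting(good, bad):
    """Compare two acting sets position by position (hypothetical helper).

    Returns a list of (shard, good_osd, bad_osd) for every position that
    differs. bad_osd is None when the position holds CRUSH_ITEM_NONE.
    """
    changes = []
    for shard, (g, b) in enumerate(zip(good, bad)):
        if g != b:
            changes.append((shard, g, None if b == CRUSH_ITEM_NONE else b))
    return changes


# Acting sets for pg 2.7, copied from the good and bad reports.
good_acting = [6, 0, 4]
bad_acting = [2147483647, 0, 4]

print(diff_acting(good_acting, bad_acting))  # prints [(0, 6, None)]
```

The positions are compared one by one because, with an indep mapping, each position in the acting set corresponds to a specific shard, so a hole at position 0 is not the same as a shorter list.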

The osd map of the bad run shows osd.6 as down and out:
$ gzip -d < /tmp/bad-report.txt.gz | jq '.osdmap.osds[] | select(.osd == 6)'
{
  "osd": 6,
  "uuid": "913da64e-3527-4d06-9441-62e8d1145356",
  "up": 0,
  "in": 0,
  "weight": 0,
  "primary_affinity": 1,
  "last_clean_begin": 0,
  "last_clean_end": 0,
  "up_from": 26,
  "up_thru": 0,
  "down_at": 28,
  "lost_at": 0,
  "public_addr": "127.0.0.1:6889/29036",
  "cluster_addr": "127.0.0.1:6890/29036",
  "heartbeat_back_addr": "127.0.0.1:6891/29036",
  "heartbeat_front_addr": "127.0.0.1:6892/29036",
  "state": [
    "autoout",
    "exists"
  ]
}
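
Instead of querying one OSD at a time, every OSD that is down or out can be listed in one pass. This is only a sketch under assumptions: `unavailable_osds` is a hypothetical helper, and `SAMPLE_REPORT` is a trimmed copy of the osd.6 entry shown above, keeping only the fields the function reads.

```python
import json

# Trimmed copy of the osd.6 entry from the bad report (other fields omitted).
SAMPLE_REPORT = json.loads("""
{"osdmap": {"osds": [
  {"osd": 6, "up": 0, "in": 0, "weight": 0, "up_from": 26, "down_at": 28,
   "state": ["autoout", "exists"]}
]}}
""")


def unavailable_osds(report):
    """Return (osd id, down_at epoch, state) for each OSD that is down or out."""
    return [
        (osd["osd"], osd["down_at"], osd["state"])
        for osd in report["osdmap"]["osds"]
        if osd["up"] == 0 or osd["in"] == 0
    ]


print(unavailable_osds(SAMPLE_REPORT))  # prints [(6, 28, ['autoout', 'exists'])]
```

To run it against the full report, the sample could be replaced with `json.load(gzip.open("/tmp/bad-report.txt.gz", "rt"))` after `import gzip`, assuming the file contains only the JSON document, as the jq commands above imply.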

Nothing in bad.log.gz explains why osd.6 failed. The host running the test may simply have had a problem, although dmesg showed no sign of memory starvation or disk trouble.


Files

bad.log.gz (59.1 KB) - bad.log - Loïc Dachary, 06/27/2015 04:29 PM
bad-report.txt.gz (7.29 KB) - ceph report for the bad run - Loïc Dachary, 06/27/2015 04:29 PM
good-report.txt.gz (7.28 KB) - ceph report for the good run - Loïc Dachary, 06/27/2015 04:29 PM