Bug #1106

closed

crush/osd: inconsistent mapping values

Added by Sage Weil almost 13 years ago. Updated almost 7 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I'm getting different results for the CRUSH mapping on different nodes. The md5sums of the on-disk osdmaps match up. The weird thing is that there isn't a consistently wrong node; they all just vary.

Most inputs don't seem to have the same problem...

root@peon5752:~# dsh -g osd grep handle_advance_map /var/log/ceph/osd\*.log | grep ' 11804 ' | grep 60\\.1
/var/log/ceph/osd.40.log:2011-05-20 17:47:01.379378 7f30565c3700 osd40 11804 pg[60.1( v 705'4 lc 0'0 (0'0,705'4] n=1 ec=705 les=11800 11801/11803/11801) [165]/[165,40] r=1 luod=0'0 active m=1] handle_advance_map [165]/[165,40]
/var/log/ceph/osd.61.log:2011-05-20 17:47:02.700614 7f580eb5d700 osd61 11804 pg[60.1( v 705'4 (0'0,705'4] n=1 ec=705 les=11349 11801/11803/11803) [40,165]/[165,40] r=-1 lcod 0'0 stray] handle_advance_map [40,165]/[165,40]
/var/log/ceph/osd.165.log:2011-05-20 17:47:01.374099 7fb3d9624700 osd165 11804 pg[60.1( v 705'4 (0'0,705'4] n=1 ec=705 les=11803 11801/11803/11801) [165]/[165,40] r=0 lcod 0'0 mlcod 0'0 !hml active+degraded] handle_advance_map [40,165]/[165,40]
root@peon5752:~# dsh -c -g osd grep handle_advance_map /var/log/ceph/osd\*.log | grep ' 11803 ' | grep 60\\.1
/var/log/ceph/osd.40.log:2011-05-20 17:47:00.339477 7f3056dc4700 osd40 11803 pg[60.1( v 705'4 lc 0'0 (0'0,705'4] n=1 ec=705 les=11800 11801/11801/11801) [165] r=-1 stray m=1] handle_advance_map [165]/[165,40]
/var/log/ceph/osd.165.log:2011-05-20 17:47:00.308151 7fb3d9624700 osd165 11803 pg[60.1( v 705'4 (0'0,705'4] n=1 ec=705 les=11800 11801/11801/11801) [165] r=0 lcod 0'0 mlcod 0'0 !hml crashed+degraded+peering] handle_advance_map [165]/[165,40]
/var/log/ceph/osd.61.log:2011-05-20 17:47:02.698845 7f580eb5d700 osd61 11803 pg[60.1( v 705'4 (0'0,705'4] n=1 ec=705 les=11349 11801/11801/11801) [40,165] r=-1 lcod 0'0 stray] handle_advance_map [40,165]/[165,40]
root@peon5752:~# dsh -c -g osd grep handle_advance_map /var/log/ceph/osd\*.log | grep ' 11802 ' | grep 60\\.1
/var/log/ceph/osd.165.log:2011-05-20 17:46:59.384890 7fb3d9624700 osd165 11802 pg[60.1( v 705'4 (0'0,705'4] n=1 ec=705 les=11800 11801/11801/11801) [165] r=0 lcod 0'0 mlcod 0'0 !hml crashed+degraded+peering] handle_advance_map [165]/[165]
/var/log/ceph/osd.40.log:2011-05-20 17:46:59.418198 7f3056dc4700 osd40 11802 pg[60.1( v 705'4 lc 0'0 (0'0,705'4] n=1 ec=705 les=11800 11801/11801/11801) [165] r=-1 stray m=1] handle_advance_map [165]/[165]
/var/log/ceph/osd.61.log:2011-05-20 17:47:02.697952 7f580eb5d700 osd61 11802 pg[60.1( v 705'4 (0'0,705'4] n=1 ec=705 les=11349 11801/11801/11801) [40,165] r=-1 lcod 0'0 stray] handle_advance_map [40,165]/[40,165]
Actions #1

Updated by Sage Weil almost 13 years ago

  • Status changed from New to Resolved

This was because crush max_devices was osdmap.max_osd - 1. Need to add some loud warnings and checks for this.
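
A minimal sketch of the kind of loud check this calls for, assuming only that the loaded CRUSH map exposes its max_devices and the OSDMap its max_osd; the function name, warning message, and numbers below are illustrative, not the actual Ceph implementation:

// Sketch of a sanity check for the condition hit in this ticket; names are
// hypothetical, not Ceph's real API.
#include <iostream>

// Return true if CRUSH can address every OSD id the OSDMap defines.
// With max_devices == max_osd - 1 (the condition in this ticket), the
// highest-numbered OSD falls outside the device range CRUSH knows about.
bool check_crush_device_range(int crush_max_devices, int osdmap_max_osd) {
  if (crush_max_devices < osdmap_max_osd) {
    std::cerr << "WARNING: crush map max_devices (" << crush_max_devices
              << ") < osdmap max_osd (" << osdmap_max_osd
              << "); mappings involving the highest-numbered OSDs may differ"
              << " between nodes" << std::endl;
    return false;
  }
  return true;
}

int main() {
  // Hypothetical numbers matching the symptom above: osd.165 exists, so
  // max_osd is at least 166, while crush max_devices was one short of it.
  int osdmap_max_osd = 166;
  int crush_max_devices = osdmap_max_osd - 1;
  check_crush_device_range(crush_max_devices, osdmap_max_osd);
  return 0;
}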

Actions #2

Updated by Greg Farnum almost 7 years ago

  • Project changed from Ceph to RADOS
  • Category deleted (10)
  • Target version deleted (v0.29)