Bug #3515 (closed)

mon: segfault when 'crush set' with different buckets with the same name

Added by Joao Eduardo Luis over 11 years ago. Updated almost 7 years ago.

Status:
Resolved
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
Development
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Initially reported by user 'LeaChim' on IRC.

Tracked it down to CrushWrapper::insert_item(), which will eventually enter an infinite (afaict) recursion in CrushWrapper::adjust_item_weight().
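
For illustration, here is a minimal sketch of the recursion pattern (hypothetical names and structure, not the actual CrushWrapper internals): a weight change propagates from an item up into its containing bucket, and the recursion terminates only when it reaches a bucket with no parent. If the containment chain ever loops, it never terminates.

#include <cstdio>
#include <map>

// Hypothetical, simplified model of the parent bookkeeping; illustrative only.
struct Hierarchy {
  std::map<int, int> parent;  // item/bucket id -> id of the bucket containing it

  // Propagate a weight change upward, in the spirit of adjust_item_weight().
  void adjust_item_weight(int id, int diff) {
    printf("adjust_item_weight %d diff %d\n", id, diff);
    auto it = parent.find(id);
    if (it == parent.end())
      return;                              // reached a root: recursion stops
    adjust_item_weight(it->second, diff);  // otherwise recurse into the parent
  }
};

int main() {
  Hierarchy h;
  h.parent[0] = -2;                 // osd.0 lives in bucket -2
  h.parent[-2] = -3;                // bucket -2 lives in bucket -3
  h.adjust_item_weight(0, -65536);  // fine: 0 -> -2 -> -3 -> stop
  h.parent[-4] = -4;                // the bug: a bucket containing itself
  h.adjust_item_weight(-4, 65536);  // recurses forever; the stack eventually
                                    // overflows, which is the observed segfault
}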

Take, for instance, the following command:

ceph osd crush set 0 osd.0 1.0 rack=a root=a

It will initially trigger the creation of a new bucket 'a' for rack=a. The recursion then happens while handling the item's placement under root=a, which resolves to the already existing bucket 'a'.
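
As a plausible picture of how the cycle arises (the data structures below are guesses for illustration, not the real CrushWrapper code): a single name-to-id lookup means 'rack=a' creates a bucket named 'a', and 'root=a' then resolves to that very same bucket, so linking the rack under the root makes the bucket its own parent.

#include <cstdio>
#include <map>
#include <string>

// Hypothetical name-to-id lookup; the real CrushWrapper keeps more state,
// but the collision is the same: one name maps to one bucket id.
int get_or_create_bucket(std::map<std::string, int>& name_to_id,
                         const std::string& name, int& next_id) {
  auto it = name_to_id.find(name);
  if (it != name_to_id.end())
    return it->second;         // name already taken: reuse that bucket
  name_to_id[name] = next_id;  // otherwise allocate a fresh bucket id
  return next_id--;
}

int main() {
  std::map<std::string, int> name_to_id;
  int next_id = -4;
  // 'crush set 0 osd.0 1.0 rack=a root=a' asks for two buckets named 'a':
  int rack = get_or_create_bucket(name_to_id, "a", next_id);  // creates -4
  int root = get_or_create_bucket(name_to_id, "a", next_id);  // reuses -4
  printf("rack=%d root=%d\n", rack, root);  // prints rack=-4 root=-4
}

The monitor log bears this out: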

2012-11-21 06:53:09.793660 7f80a93c2700  0 mon.a@0(leader).osd e15 adding/updating crush item id 0 name 'osd.0' weight 1 at location {rack=a,root=a}
2012-11-21 06:53:09.793701 7f80a93c2700  5 update_item item 0 weight 1 name osd.0 loc {rack=a,root=a}
2012-11-21 06:53:09.793706 7f80a93c2700  5 check_item_loc item 0 loc {rack=a,root=a}
2012-11-21 06:53:09.793708 7f80a93c2700  2 warning: did not specify location for 'host' level (levels are {0=osd,1=host,2=rack,3=row,4=room,5=datacenter,6=root})
2012-11-21 06:53:09.793712 7f80a93c2700  5 check_item_loc bucket a dne
2012-11-21 06:53:09.793713 7f80a93c2700  5 remove_item 0
2012-11-21 06:53:09.793715 7f80a93c2700  5 adjust_item_weight 0 weight 0
2012-11-21 06:53:09.793717 7f80a93c2700  5 adjust_item_weight 0 diff -65536
2012-11-21 06:53:09.793718 7f80a93c2700  5 adjust_item_weight -2 weight 65536
2012-11-21 06:53:09.793719 7f80a93c2700  5 adjust_item_weight -2 diff -65536
2012-11-21 06:53:09.793721 7f80a93c2700  5 adjust_item_weight -3 weight 65536
2012-11-21 06:53:09.793722 7f80a93c2700  5 adjust_item_weight -3 diff -65536
2012-11-21 06:53:09.793723 7f80a93c2700  5 adjust_item_weight -1 weight 131072
2012-11-21 06:53:09.793725 7f80a93c2700  5 remove_device removing item 0 from bucket -2
2012-11-21 06:53:09.793728 7f80a93c2700  5 update_item adding 0 weight 1 at {rack=a,root=a}
2012-11-21 06:53:09.793731 7f80a93c2700  5 insert_item item 0 weight 1 name osd.0 loc {rack=a,root=a}
2012-11-21 06:53:09.793743 7f80a93c2700  2 warning: did not specify location for 'host' level (levels are {0=osd,1=host,2=rack,3=row,4=room,5=datacenter,6=root})
2012-11-21 06:53:09.793750 7f80a93c2700  5 insert_item creating bucket a
2012-11-21 06:53:09.793755 7f80a93c2700  2 warning: did not specify location for 'row' level (levels are {0=osd,1=host,2=rack,3=row,4=room,5=datacenter,6=root})
2012-11-21 06:53:09.793762 7f80a93c2700  2 warning: did not specify location for 'room' level (levels are {0=osd,1=host,2=rack,3=row,4=room,5=datacenter,6=root})
2012-11-21 06:53:09.793767 7f80a93c2700  2 warning: did not specify location for 'datacenter' level (levels are {0=osd,1=host,2=rack,3=row,4=room,5=datacenter,6=root})
2012-11-21 06:53:09.793773 7f80a93c2700  5 insert_item adding -4 weight 1 to bucket -4
2012-11-21 06:53:09.793781 7f80a93c2700  5 adjust_item_weight 0 weight 65536
2012-11-21 06:53:09.793785 7f80a93c2700  5 adjust_item_weight 0 diff 65536
2012-11-21 06:53:09.793790 7f80a93c2700  5 adjust_item_weight -4 weight 65536
2012-11-21 06:53:09.793792 7f80a93c2700  5 adjust_item_weight -4 diff 65536
2012-11-21 06:53:09.793794 7f80a93c2700  5 adjust_item_weight -4 weight 131072

[continues until the monitor segfaults]

Note the line 'insert_item adding -4 weight 1 to bucket -4' above: the newly created bucket is inserted into itself, so every subsequent weight adjustment recurses straight back into the same bucket. This won't happen, however, if the buckets already exist:

ubuntu@plana41:~/master-ceph/src$ ./ceph osd crush set 0 osd.0 1.0 root=default rack=default
updated item id 0 name 'osd.0' weight 1 at location {rack=default,root=default} to crush map
ubuntu@plana41:~/master-ceph/src$ ./ceph osd tree

# id    weight    type name    up/down    reweight
-1    3    root default
-3    1        rack localrack
-2    1            host localhost
1    1                osd.1    up    1    
2    1        osd.2    down    0    
0    1        osd.0    up    1    

This is how far I've tracked it. Will put it on hold to deal with pending stuff.
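
The eventual fix isn't recorded in this ticket, but as a purely hypothetical illustration of one kind of guard (not Ceph's actual code), walking the containment chain with a visited set would catch a self-containing bucket before recursing into it:

#include <cassert>
#include <map>
#include <set>

// Hypothetical guard: walk the parent chain, tracking visited ids, and
// report a cycle as soon as any id repeats.
bool parent_chain_is_acyclic(const std::map<int, int>& parent, int id) {
  std::set<int> seen;
  for (auto it = parent.find(id); it != parent.end();
       it = parent.find(it->second)) {
    if (!seen.insert(it->first).second)
      return false;  // id seen twice: the chain loops
  }
  return true;
}

int main() {
  std::map<int, int> parent = {{0, -2}, {-2, -3}, {-4, -4}};
  assert(parent_chain_is_acyclic(parent, 0));    // 0 -> -2 -> -3: fine
  assert(!parent_chain_is_acyclic(parent, -4));  // -4 -> -4: cycle detected
}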

#1

Updated by Sage Weil over 11 years ago

  • Status changed from 12 to Resolved
#2

Updated by Greg Farnum almost 7 years ago

  • Project changed from Ceph to RADOS
  • Category deleted (10)