Bug #4122

closed

osd: possible corruption of osd caps

Added by Josh Durgin about 11 years ago. Updated about 11 years ago.

Status: Resolved
Priority: High
Assignee:
Category: OSD
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport: bobtail
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Three users, all running bobtail, have reported some kind of caps problem that I haven't been able to reproduce (EPERM on some operation dependent on caps). On IRC today, mjevans got osd logs of this happening, and it appears the caps were corrupted:

2013-02-13 19:39:25.467916 7f766fdb4700 10 osd.0 10  session 0x2c8cc60 client.libvirt has caps osdcap[grant(object_prefix rbd^@children  class-read),grant(pool libvirt^@pool^@test rwx)] 'allow class-read object_prefix rbd_children, allow pool libvirt-pool-test rwx'
2013-02-13 19:39:25.468162 7f76787c7700 20 osd.0 10 _dispatch 0x3960240 osd_op(client.4254.0:1 rbd_directory [read 0~0] 3.30a98c1c) v4
2013-02-13 19:39:25.468218 7f76787c7700 15 osd.0 10 require_same_or_newer_map 10 (i am 10) 0x3960240
2013-02-13 19:39:25.468223 7f76787c7700 20 osd.0 10 _share_map_incoming client.4254 10.x.y.z:0/1004293 10
2013-02-13 19:39:25.468232 7f76787c7700 15 osd.0 10 enqueue_op 0x334cb40 prio 63 cost 0 latency 0.000160 osd_op(client.4254.0:1 rbd_directory [read 0~0] 3.30a98c1c) v4
2013-02-13 19:39:25.468266 7f76737bd700 10 osd.0 10 dequeue_op 0x334cb40 prio 63 cost 0 latency 0.000195 osd_op(client.4254.0:1 rbd_directory [read 0~0] 3.30a98c1c) v4 pg pg[3.1c( v 10'2 (0'0,10'2] local-les=8 n=1 ec=4 les/c 8/8 7/7/7) [0] r=0 lpr=7 mlcod 10'2 active+degraded]
2013-02-13 19:39:25.468312 7f76737bd700 20 osd.0 pg_epoch: 10 pg[3.1c( v 10'2 (0'0,10'2] local-les=8 n=1 ec=4 les/c 8/8 7/7/7) [0] r=0 lpr=7 mlcod 10'2 active+degraded] op_has_sufficient_caps pool=3 (libvirt-pool-test) owner=0 need_read_cap=1 need_write_cap=0 need_class_read_cap=0 need_class_write_cap=0 -> NO
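
To triage other logs for the same symptom, a minimal sketch along these lines could flag session lines whose parsed caps contain a NUL. The regular expression is an assumption derived only from the single "has caps" line quoted above, and `corrupted_caps` is a hypothetical helper name, not part of Ceph:

```python
import re

# Matches the "has caps <parsed> '<raw>'" portion of a session log line.
# Assumption: the layout follows the one line quoted in this report.
CAPS_RE = re.compile(r"has caps (?P<parsed>osdcap\[.*\]) '(?P<raw>[^']*)'")


def corrupted_caps(line):
    """Return (parsed, raw) if the parsed caps contain a NUL marker, else None."""
    m = CAPS_RE.search(line)
    if m is None:
        return None
    parsed = m.group("parsed")
    # Pagers and terminals often render a NUL byte as the two characters "^@".
    if "\0" in parsed or "^@" in parsed:
        return parsed, m.group("raw")
    return None


line = (
    "2013-02-13 19:39:25.467916 7f766fdb4700 10 osd.0 10  session 0x2c8cc60 "
    "client.libvirt has caps osdcap[grant(object_prefix rbd^@children  class-read),"
    "grant(pool libvirt^@pool^@test rwx)] "
    "'allow class-read object_prefix rbd_children, allow pool libvirt-pool-test rwx'"
)
print(corrupted_caps(line) is not None)
```

The second element of the returned tuple is the raw cap string as given to the OSD, which makes it easy to compare against the parsed form by eye.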

The later comments on https://bugs.launchpad.net/glance/+bug/1077045 may be the same problem as well.


Related issues 1 (0 open, 1 closed)

Has duplicate: Ceph - Bug #4287: Rbd volumes from pools with hyphen in name are not mappable if cephx is used (Duplicate, 02/27/2013)

Actions #1

Updated by Ian Colle about 11 years ago

  • Assignee set to Josh Durgin
Actions #2

Updated by Sage Weil about 11 years ago

  • Category set to OSD
  • Status changed from New to Need More Info

That output line is logged right after OSDCap::parse() is called, and the parsed result appears to be wrong. My guess is that there is some subtle problem with the version of boost you are using. I've added a unit test for exactly this case in 2ce28ef1d7f95e71e1043912dfa269ea3b0d1599 (current master) and it's passing on all of our build machines.

What OS are you running? Where did you get the packages? What does 'dpkg -l | grep boost' (or equivalent) say?

Actions #3

Updated by Sage Weil about 11 years ago

AHA, fails on quantal:

Testing input 'allow class-read object_prefix rbd_children, allow pool libvirt-pool-test rwx'
test/osd/osdcap.cc:435: Failure
Value of: stringify(cap)
Actual: "osdcap[grant(object_prefix rbd\0children class-read),grant(pool libvirt\0pool\0test rwx)]" 
Expected: test_values[i].output
Which is: "osdcap[grant(object_prefix rbd_children class-read),grant(pool libvirt-pool-test rwx)]" 
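
The difference between the two strings is easy to miss by eye. As a rough illustration (not part of the Ceph test suite), the sketch below compares the actual and expected values from the failure above and lists which characters came back as NUL; `nul_substitutions` is a hypothetical helper name:

```python
def nul_substitutions(actual, expected):
    """List (index, expected_char) wherever `actual` holds NUL but `expected` does not."""
    if len(actual) != len(expected):
        raise ValueError("strings differ in length; not a pure substitution")
    return [
        (i, e)
        for i, (a, e) in enumerate(zip(actual, expected))
        if a == "\0" and e != "\0"
    ]


# Values copied from the gtest failure output above.
actual = "osdcap[grant(object_prefix rbd\0children class-read),grant(pool libvirt\0pool\0test rwx)]"
expected = "osdcap[grant(object_prefix rbd_children class-read),grant(pool libvirt-pool-test rwx)]"

subs = nul_substitutions(actual, expected)
print(len(subs))                        # 3
print(sorted({ch for _, ch in subs}))   # ['-', '_']
```

For this input, the only characters lost are `_` and `-`, which matches the hyphenated pool name and the `rbd_children` prefix in the reports. The check says nothing about why the parser produced NULs.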

http://gitbuilder.sepia.ceph.com/gitbuilder-quantal-amd64/log.cgi?log=2ce28ef1d7f95e71e1043912dfa269ea3b0d1599

Actions #4

Updated by Sage Weil about 11 years ago

  • Status changed from Need More Info to Fix Under Review

wip-4122

Actions #5

Updated by Sage Weil about 11 years ago

  • Status changed from Fix Under Review to Resolved
  • Backport set to bobtail