Project

General

Profile

Actions

Bug #13372

closed

librbd: object map invalid but not flagged as such

Added by Josh Durgin over 8 years ago. Updated over 8 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Jason Dillaman
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
hammer
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

http://pulpito.ceph.com/teuthology-2015-10-03_23:00:08-rbd-infernalis-distro-basic-multi/1087438/

Reproduced by running the same job, and paused to be able to examine the images.

During the rbd cp from testimg1@snap to testimg2, there are some warnings about object map:

2015-10-05T16:53:12.606 INFO:tasks.workunit.client.0.burnupi10.stderr:+ rbd copy testimg1 --snap=snap1 testimg2
2015-10-05T16:53:12.898 INFO:tasks.workunit.client.0.burnupi10.stderr:^MImage copy: 1% complete...^MImage copy: 3% complete...^MImag
e copy: 4% complete...^MImage copy: 6% complete...^MImage copy: 7% complete...^MImage copy: 9% complete...^MImage copy: 10% complete
...^MImage copy: 12% complete...^MImage copy: 14% complete...^MImage copy: 15% complete...^MImage copy: 17% complete...^MImage copy:
 18% complete...^MImage copy: 20% complete...^MImage copy: 21% complete...^MImage copy: 23% complete...^MImage copy: 25% complete...
^MImage copy: 26% complete...^MImage copy: 28% complete...^MImage copy: 29% complete...^MImage copy: 31% complete...^MImage copy: 32
% complete...^MImage copy: 34% complete...^MImage copy: 35% complete...^MImage copy: 37% complete...^MImage copy: 39% complete...^MI
mage copy: 40% complete...^MImage copy: 42% complete...^MImage copy: 43% complete...^MImage copy: 45% complete...^MImage copy: 46% c
omplete...^MImage copy: 48% complete...^MImage copy: 50% complete...^MImage copy: 51% complete...^MImage copy: 53% complete...^MImag
e copy: 54% complete...^MImage copy: 56% complete...^MImage copy: 57% complete...^MImage copy: 59% complete...^MImage copy: 60% comp
lete...^MImage copy: 62% complete...^MImage copy: 64% complete...^MImage copy: 65% complete...^MImage copy: 67% complete...^MImage c
opy: 68% complete...^MImage copy: 70% complete...^MImage copy: 71% complete...^MImage copy: 73% complete...^MImage copy: 75% complet
e...^MImage copy: 76% complete...^MImage copy: 78% complete...^MImage copy: 79% complete...^MImage copy: 81% complete...^MImage copy
: 82% complete...^MImage copy: 84% complete...^MImage copy: 85% complete...^MImage copy: 87% complete...^MImage copy: 89% complete..
.^MImage copy: 90% complete...^MImage copy: 92% complete...^MImage copy: 93% complete...^MImage copy: 95% complete...^MImage copy: 9
6% complete...^MImage copy: 98% complete...^MImage copy: 100% complete...2015-10-05 16:53:12.890085 7fcd0f7fe700 -1 librbd::ObjectMa
p: object map lock not owned by client
2015-10-05T16:53:12.901 INFO:tasks.workunit.client.0.burnupi10.stderr:2015-10-05 16:53:12.893055 7fcd0f7fe700 -1 librbd::ObjectMap: 
object map lock not owned by client
2015-10-05T16:53:12.902 INFO:tasks.workunit.client.0.burnupi10.stderr:2015-10-05 16:53:12.894065 7fcd0f7fe700 -1 librbd::ObjectMap: 
object map lock not owned by client
2015-10-05T16:53:13.037 INFO:tasks.workunit.client.0.burnupi10.stderr:^MImage copy: 100% complete...done.

The rados objects 0 and 2 for both testimg1 and testimg2 match, but the exported versions differ - testimg2 is all zeroes, suggesting an invalid object map that was not marked invalid. The cluster is still up for further investigation on burnupi10 currently.


Related issues 1 (0 open1 closed)

Copied to rbd - Backport #13461: librbd: object map invalid but not flagged as suchResolvedJason DillamanActions
Actions #1

Updated by Josh Durgin over 8 years ago

  • Priority changed from Normal to Urgent
Actions #2

Updated by Jason Dillaman over 8 years ago

  • Status changed from New to In Progress
  • Assignee set to Jason Dillaman
Actions #3

Updated by Jason Dillaman over 8 years ago

The lock is held by the OSD!

# rados -p rbd lock info rbd_object_map.12d323e2e3c0 rbd_lock
{"name":"rbd_lock","type":"exclusive","tag":"","lockers":[{"name":"osd.0","cookie":"","description":"","expiration":"0.000000","addr":"10.214.134.32:0\/12881"}]}

Client requests lock, the assert_lock fails because the locker is incorrectly listed as the OSD, and the unlock fails for the same reason.

ceph-osd.0.log:2015-10-05 16:53:12.764864 7fcfeaeff700  1 -- 10.214.134.32:6800/12881 --> 10.214.134.32:0/3469838287 -- osd_op_reply(35 rbd_object_map.12d323e2e3c0 [call lock.lock] v0'0 uv1022 ondisk = 0) v6 -- ?+0 0x7fd015207b80 con 0x7fd014cca840

ceph-osd.0.log:2015-10-05 16:53:12.890261 7fcfeef07700  1 -- 10.214.134.32:6800/12881 --> 10.214.134.32:0/3469838287 -- osd_op_reply(41 rbd_object_map.12d323e2e3c0 [call lock.assert_locked,call rbd.object_map_update] v0'0 uv0 ondisk = -16 ((16) Device or resource busy)) v6 -- ?+0 0x7fd013c76680 con 0x7fd014cca840

ceph-osd.0.log:2015-10-05 16:53:13.005884 7fcfeef07700  1 -- 10.214.134.32:6800/12881 --> 10.214.134.32:0/3469838287 -- osd_op_reply(47 rbd_object_map.12d323e2e3c0 [call lock.unlock] v0'0 uv1743 ondisk = -2 ((2) No such file or directory)) v6 -- ?+0 0x7fd014cdcc00 con 0x7fd014cca840

This was a cache tier test case, so this is most likely another proxied write issue being discovered.

Actions #4

Updated by Jason Dillaman over 8 years ago

Related to OSD issue: #13380

Actions #5

Updated by Jason Dillaman over 8 years ago

  • Status changed from In Progress to Fix Under Review
  • Backport set to hammer
Actions #6

Updated by Jason Dillaman over 8 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #7

Updated by Jason Dillaman over 8 years ago

  • Status changed from Pending Backport to Resolved
Actions

Also available in: Atom PDF