Bug #2311
rbd: delete + create image led to EEXIST
Description
Here is a sequence, copy-and-pasted:
rbd rm data/905-testdisk.rbd
Removing image: 100% complete...done.
rbd create --size 2048 data/905-testdisk.rbd
rbd rm data/906-testdisk.rbd
Removing image: 100% complete...done.
rbd create --size 2048 data/906-testdisk.rbd
2012-04-18 15:40:55.321093 7fbdcae3d760 librbd: rbd image header 906-testdisk.rbd.rbd already exists
create error: (17) File exists
Error 0 creating data/906-testdisk.rbd
So now I have stopped testing. Two logfiles from an earlier case where the delete of 907-testdisk.rbd failed should be attached...
Hope it helps,
Oliver.
[see attachments on #2178]
Associated revisions
librbd: pass errors removing head back to user
In particular, the OSD may return EBUSY if there are still watchers.
Ignore ENOENT, as that may indicate we are cleaning up a previously
aborted removal.
Fixes: #2311
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
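In rough terms, the fixed removal path behaves like this minimal librados-based sketch; the function name and structure are illustrative, not the actual librbd code:

#include <rados/librados.hpp>
#include <cerrno>
#include <string>

// Delete the image header object and hand any error back to the
// caller, except ENOENT, which may just mean we are finishing a
// previously aborted removal.
int remove_header(librados::IoCtx& io_ctx, const std::string& header_oid)
{
  int r = io_ctx.remove(header_oid);
  if (r == -ENOENT)
    return 0;  // header already gone: treat as success
  // Anything else (e.g. -EBUSY while watchers still hold the header
  // open) is now returned instead of being swallowed, so 'rbd rm'
  // reports the failure rather than claiming success.
  return r;
}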
History
#1 Updated by Sage Weil over 11 years ago
Is it possible there is some other user, or that the logs are from the wrong cluster?
I see:
- client.13507 deletes 906-testdisk.rbd (successfully)
- client.13508 creates 906-testdisk.rbd (presumably successfully)
I went looking for ways that 13508 could have gotten that error; it is only generated at the beginning of the create sequence, before some subsequent work... and I've verified that the subsequent work was done.
What commit are you running? Maybe I'm looking at the wrong version of the code.
#2 Updated by Sage Weil over 11 years ago
- Description updated (diff)
#3 Updated by Oliver Francke over 11 years ago
Congrats on closing the annoying ticket #2178 :-D
Fair enough to have a new one for this issue; here are my last notes from #2178:
---
Hi Sage,
sorry, I was not clear enough. The logfiles provide information for "907-testdisk.rbd...", not "906..."
The version was from the continued tests with the former 0.44-2 wip-reorder...
Hope it helps,
Oliver.
#4 Updated by Sage Weil over 11 years ago
- Status changed from Need More Info to Resolved
This is 'rbd writeback window' at its best. Long live 'rbd cache'!
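For context, 'rbd writeback window' was the older client-side write buffering option; its replacement, the client-side cache, is enabled in the client section of ceph.conf, roughly:

[client]
    rbd cache = true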
#5 Updated by Oliver Francke over 11 years ago
Hi Sage,
uhm, not solved yet as of ceph version 0.45-207-g3053e47 (3053e4773bae93cfa3158882aa4963803862f9b2):
rbd rm data/907-testdisk.rbd
Removing image: 100% complete...done.
rbd create --size 2048 data/907-testdisk.rbd
create error: 2012-04-19 10:42:20.857085 7f5c140c5760 -1 librbd: rbd image header 907-testdisk.rbd.rbd already exists
(17) File exists
Error 0 creating data/907-testdisk.rbd
No entries in the logs.
Oliver.
#6 Updated by Sage Weil over 11 years ago
- Status changed from Resolved to In Progress
Can you generate a log? Ideally with 'debug ms = 1'?
Also, can you attach the output of 'ceph --show-config'?
Thanks!
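For reference, the messenger debugging asked for above can be enabled in ceph.conf before restarting the daemons; a minimal example for the OSDs:

[osd]
    debug ms = 1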
#7 Updated by Sage Weil over 11 years ago
- Status changed from In Progress to Need More Info
#8 Updated by Oliver Francke over 11 years ago
- File ceph--show-config.log.bz2 added
- File ceph.osd.0-906-testdisk-exists.log.bz2 added
- File ceph.osd.1-906-testdisk-exists.log.bz2 added
Well, here we go with some output:
rbd create --size 2048 data/906-testdisk.rbd
create error: 2012-04-24 09:36:01.594379 7f53fc840760 -1 librbd: rbd image header 906-testdisk.rbd.rbd already exists(17) File exists
Error 0 creating data/906-testdisk.rbd
Logfiles and --show-config output are attached.
Hope this helps,
Oliver.
P.S.: Please have a look at http://tracker.newdream.net/issues/2316, because this one is a complete show-stopper for us. Thanks.
#9 Updated by Sage Weil over 11 years ago
- Status changed from Need More Info to Resolved
- Priority changed from Normal to High
Ah, the problem is that the rbd head object has watchers (it is mounted) and the delete request returned EBUSY, but librbd wasn't propagating the error. That second part is fixed... see #2339 for the osd piece.
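To illustrate what 'has watchers' means here: later librados versions expose list_watchers(), so a client can check whether an image's header object (named '<image>.rbd' in the old format seen in the transcripts above) is still held open, in which case a delete would hit EBUSY. A minimal sketch, assuming that newer API:

#include <rados/librados.hpp>
#include <iostream>
#include <list>
#include <string>

// Count the watchers on the header object. A result > 0 means the
// image is still mapped or opened somewhere, so removing it would
// fail with EBUSY; a negative return is an error from librados
// (e.g. -ENOENT if the header is already gone).
int count_header_watchers(librados::IoCtx& io_ctx, const std::string& header_oid)
{
  std::list<obj_watch_t> watchers;
  int r = io_ctx.list_watchers(header_oid, &watchers);
  if (r < 0)
    return r;
  for (std::list<obj_watch_t>::iterator it = watchers.begin();
       it != watchers.end(); ++it)
    std::cout << "watcher " << it->watcher_id << " from " << it->addr
              << std::endl;
  return (int)watchers.size();
}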
#10 Updated by Oliver Francke over 11 years ago
- File ceph.osd.0-908-testdisk-exists.log.bz2 added
- File ceph.osd.1-908-testdisk-exists.log.bz2 added
Hi Sage,
yeah, this time it might have been from a stale VM, but the other tests should have shown that I normally stop all VMs. It happened there, too...
Oops, it just happened again from a definitely stopped VM; it could have been stale up to this morning, though, when I killed all qemu processes...
I just restarted ceph with debug ms = 1; perhaps something valuable will show up in the logs after trying to remove...?
Kind regards,
Oliver.