Bug #6047


mon: Assert and monitor crash when attempting to create pool snapshots while rbd snapshots are in use or have been used on a pool

Added by Oliver Daudey over 10 years ago. Updated over 10 years ago.

Status: Resolved
Priority: Urgent
Category: Monitor
Target version: -
% Done: 0%
Source: Community (user)
Severity: 1 - critical

Description

While playing around on my test-cluster, I ran into a problem that I've seen before, but had never been able to reproduce until now. The use of pool-snapshots and rbd-snapshots seems to be mutually exclusive within the same pool, even if you only ever used one type of snapshot and have since deleted all snapshots of that type. Unfortunately, the condition doesn't appear to be handled gracefully yet, leading, in one case, to monitors crashing. I think this one goes back at least as far as Bobtail and still exists in Dumpling. My cluster is a straightforward one with 3 Debian Squeeze nodes, each running a mon, mds and osd. To reproduce:

  1. ceph osd pool create test 256 256
    pool 'test' created
  2. ceph osd pool mksnap test snapshot
    created pool test snap snapshot
  3. ceph osd pool rmsnap test snapshot
    removed pool test snap snapshot

So far, so good. Now we try to create an rbd-snapshot in the same pool:

  1. rbd --pool=test create --size=102400 image
  2. rbd --pool=test snap create image@snapshot
    rbd: failed to create snapshot: (22) Invalid argument
    2013-08-18 19:27:50.892291 7f983bc10780 -1 librbd: failed to create snap id: (22) Invalid argument

That failed, but at least the cluster is OK. Now we start over again and create the rbd-snapshot first:

  1. ceph osd pool delete test test --yes-i-really-really-mean-it
    pool 'test' deleted
  2. ceph osd pool create test 256 256
    pool 'test' created
  3. rbd --pool=test create --size=102400 image
  4. rbd --pool=test snap create image@snapshot
  5. rbd --pool=test snap ls image
    SNAPID NAME SIZE
    2 snapshot 102400 MB
  6. rbd --pool=test snap rm image@snapshot
  7. ceph osd pool mksnap test snapshot
    2013-08-18 19:35:59.494551 7f48d75a1700 0 monclient: hunting for new mon
    ^CError EINTR: (I pressed CTRL-C)

My leader monitor crashed at that last command, here's the apparent critical point in the logs:

-3> 2013-08-18 19:35:59.315956 7f9b870b1700  1 -- 194.109.43.18:6789/0 <== client.5856 194.109.43.18:0/1030570 8 ==== mon_command({"snap": "snapshot", "prefix": "osd pool mksnap", "pool": "test"} v 0) v1 ==== 107+0+0 (1111983560 0 0) 0x23e4200 con 0x2d202c0
-2> 2013-08-18 19:35:59.316020 7f9b870b1700  0 mon.a@0(leader) e1 handle_command mon_command({"snap": "snapshot", "prefix": "osd pool mksnap", "pool": "test"} v 0) v1
-1> 2013-08-18 19:35:59.316033 7f9b870b1700  1 mon.a@0(leader).paxos(paxos active c 1190049..1190629) is_readable now=2013-08-18 19:35:59.316034 lease_expire=2013-08-18 19:36:03.535809 has v0 lc 1190629
 0> 2013-08-18 19:35:59.317612 7f9b870b1700 -1 osd/osd_types.cc: In function 'void pg_pool_t::add_snap(const char*, utime_t)' thread 7f9b870b1700 time 2013-08-18 19:35:59.316102
osd/osd_types.cc: 682: FAILED assert(!is_unmanaged_snaps_mode())

Related issues: 1 (0 open, 1 closed)

Related to Ceph - Fix #4635: mon: many ops expose uncommitted state (Resolved, Joao Eduardo Luis, 04/02/2013)
