Bug #16005

closed

rbd: client coredumps with io exceeding pool-quota after pool is already full

Added by shun song almost 8 years ago. Updated over 7 years ago.

Status:
Duplicate
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
rbd
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

  1. description: when using rbd, thin provisioning in Ceph makes it easy to create an image whose size is bigger than the pool quota. While writing to such an image, if the I/O offset exceeds the pool quota but is less than the image size, the operation is still allowed by the CLI and librbd, even after the osdmap knows the pool is full. As a result, the rbd client coredumps (a minimal sketch of the librbd call path is included after the steps below).
  2. cat /etc/redhat-release
    Red Hat Enterprise Linux Server release 7.0 (Maipo)
  3. ceph -v
    ceph version 0.94.7.1 (740be021de39c96a7ddfecae3482c177471798fc)
  4. ceph osd pool set-quota xx6 max_bytes 60M
    set-quota max_bytes = 60M for pool xx6
  5. rbd create xx6/xx --size 1024
  6. rbd info xx6/xx
    rbd image 'xx':
    size 1024 MB in 256 objects
    order 22 (4096 kB objects)
    block_name_prefix: rb.0.164fb6.238e1f29
    format: 1
  7. python rbdc.py -i /dev/zero -o rbd:xx6/xx 68157440 (65M) <------ calls rbd.Image.write(data, offset)
    OK!
    Copied 68157440 bytes.
    Elapsed time: 0:00:04.432334.
  8. rados -p xx6 ls
    rb.0.164fb6.238e1f29.00000000000b
    rb.0.164fb6.238e1f29.00000000000c
    rb.0.164fb6.238e1f29.00000000000a
    ...
    rbd_directory
    xx.rbd
    rb.0.164fb6.238e1f29.00000000000e
    rb.0.164fb6.238e1f29.00000000000f
  9. ceph -s
    cluster 72add30f-e9fe-3934-0cb8-d6cd5d952f1e
    health HEALTH_WARN
    too many PGs per OSD (486 > max 300)
    pool rbd has too few pgs
    1 cache pools are missing hit_sets
    pool 'xx6' is full
    monmap e9: 3 mons at {192.9.9.82=192.9.9.82:6789/0,192.9.9.83=192.9.9.83:6789/0,192.9.9.84=192.9.9.84:6789/0}
    election epoch 1244, quorum 0,1,2 192.9.9.82,192.9.9.83,192.9.9.84
    osdmap e351: 16 osds: 16 up, 16 in
    pgmap v8112: 3456 pgs, 7 pools, 288 MB data, 30037 objects
    1736 MB used, 636 GB / 637 GB avail
    3456 active+clean
  10. python rbdc.py -i /dev/zero -o rbd:xx6/xx 73400320 (70M)
    osdc/ObjectCacher.cc: In function 'ObjectCacher::~ObjectCacher()' thread 7f1757e65740 time 2016-05-20 11:06:11.138252
    osdc/ObjectCacher.cc: 551: FAILED assert(i->empty())
    ceph version 0.94.6.1 (941c7dcebfdb2fc612fe99f2b6e4a2cf36247408)
    1: (()+0x175455) [0x7f172db72455]
    2: (()+0x3babf1) [0x7f172ddb7bf1]
    3: (()+0x680e6) [0x7f172da650e6]
    4: (()+0x90354) [0x7f172da8d354]
    5: (rbd_close()+0x2e) [0x7f172da3ecde]
    6: (ffi_call_unix64()+0x4c) [0x7f174c883dac]
    7: (ffi_call()+0x1f5) [0x7f174c8836d5]
    8: (ctypes_callproc()+0x30b) [0x7f174ca96c8b]
    9: (()+0xaa85) [0x7f174ca90a85]
    10: (PyObject_Call()+0x43) [0x7f17578da073]
    11: (PyEval_EvalFrameEx()+0x1d4c) [0x7f175796e34c]
    12: (PyEval_EvalFrameEx()+0x4350) [0x7f1757970950]
    13: (PyEval_EvalCodeEx()+0x7ed) [0x7f17579721ad]
    14: (()+0x6f098) [0x7f17578ff098]
    15: (PyObject_Call()+0x43) [0x7f17578da073]
    16: (()+0x59085) [0x7f17578e9085]
    17: (PyObject_Call()+0x43) [0x7f17578da073]
    18: (PyObject_CallFunctionObjArgs()+0xbc) [0x7f17578da96c]
    19: (PyEval_EvalFrameEx()+0x1674) [0x7f175796dc74]
    20: (PyEval_EvalFrameEx()+0x4350) [0x7f1757970950]
    21: (PyEval_EvalFrameEx()+0x4350) [0x7f1757970950]
    22: (PyEval_EvalCodeEx()+0x7ed) [0x7f17579721ad]
    23: (PyEval_EvalCode()+0x32) [0x7f17579722b2]
    24: (()+0xfb6ef) [0x7f175798b6ef]
    25: (PyRun_FileExFlags()+0x7e) [0x7f175798c8ae]
    26: (PyRun_SimpleFileExFlags()+0xe9) [0x7f175798db39]
    27: (Py_Main()+0xc9f) [0x7f175799eb3f]
    28: (__libc_start_main()+0xf5) [0x7f1756bcbb25]
    29: python() [0x400721]
    NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
    terminate called after throwing an instance of 'ceph::FailedAssertion'
    Aborted (core dumped)
  11. workaround
    Several commits have been posted upstream to address this: f8f33bcaa6111aa1bab1f8246b7d516a60ab23a8, 6eca240b26e112641caa7ed31a5ae58e1fefff26, and fe54bd1bbe5714b2613876bce857234ede26f68a. Unfortunately, these commits were not backported to hammer, so I think it is necessary to backport them to hammer.
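
For reference, here is a minimal sketch of the librbd Python-binding call path that triggers the assert. This is not the reporter's rbdc.py (its source is not attached); the pool name, image name, and offset follow the steps above, and the ceph.conf path is an assumption.

    import rados
    import rbd

    # Assumes the default ceph.conf location; adjust for the cluster under test.
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx('xx6')        # quota-limited pool from step 4
        try:
            image = rbd.Image(ioctx, 'xx')       # 1024 MB image from step 5
            try:
                # Write past the 60M pool quota but still inside the 1024 MB image.
                # The write is accepted (cached) even though the pool is already
                # marked full in the osdmap.
                data = b'\0' * (4 * 1024 * 1024)
                image.write(data, 73400320)
            finally:
                # rbd_close() tears down the ObjectCacher; on hammer the leftover
                # dirty buffers trip "FAILED assert(i->empty())" in
                # ObjectCacher::~ObjectCacher(), matching the backtrace above.
                image.close()
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()
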
#1

Updated by Samuel Just over 7 years ago

  • Priority changed from Urgent to High
#2

Updated by Nathan Cutler over 7 years ago

Two questions:

1. Should this be in the rbd tracker?
2. Is this a duplicate of #12018?

#3

Updated by Samuel Just over 7 years ago

  • Status changed from New to Duplicate