Bug #16005

closed

rbd: client coredumps with io exceeding pool-quota after pool is already full

Added by shun song almost 8 years ago. Updated over 7 years ago.

Status:
Duplicate
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
rbd
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

  1. description: when using rbd, thin provisioning in Ceph makes it easy to create an image whose size is bigger than the pool quota. While writing to such an image, if the I/O offset exceeds the pool quota but is less than the image size, the operation is still allowed by the CLI and librbd, even after the osdmap knows the pool is full. As a result, the rbd client coredumps (a minimal sketch of the librbd call path is included after the steps below).
  2. cat /etc/redhat-release
    Red Hat Enterprise Linux Server release 7.0 (Maipo)
  3. ceph -v
    ceph version 0.94.7.1 (740be021de39c96a7ddfecae3482c177471798fc)
  4. ceph osd pool set-quota xx6 max_bytes 60M
    set-quota max_bytes = 60M for pool xx6
  5. rbd create xx6/xx --size 1024
  6. rbd info xx6/xx
    rbd image 'xx':
    size 1024 MB in 256 objects
    order 22 (4096 kB objects)
    block_name_prefix: rb.0.164fb6.238e1f29
    format: 1
  7. python rbdc.py -i /dev/zero -o rbd:xx6/xx 68157440 (65M) <------ calls rbd.Image.write(data, offset)
    OK!
    Copied 68157440 bytes.
    Elapsed time: 0:00:04.432334.
  8. rados -p xx6 ls
    rb.0.164fb6.238e1f29.00000000000b
    rb.0.164fb6.238e1f29.00000000000c
    rb.0.164fb6.238e1f29.00000000000a
    ...
    rbd_directory
    xx.rbd
    rb.0.164fb6.238e1f29.00000000000e
    rb.0.164fb6.238e1f29.00000000000f
  9. ceph -s
    cluster 72add30f-e9fe-3934-0cb8-d6cd5d952f1e
    health HEALTH_WARN
    too many PGs per OSD (486 > max 300)
    pool rbd has too few pgs
    1 cache pools are missing hit_sets
    pool 'xx6' is full
    monmap e9: 3 mons at {192.9.9.82=192.9.9.82:6789/0,192.9.9.83=192.9.9.83:6789/0,192.9.9.84=192.9.9.84:6789/0}
    election epoch 1244, quorum 0,1,2 192.9.9.82,192.9.9.83,192.9.9.84
    osdmap e351: 16 osds: 16 up, 16 in
    pgmap v8112: 3456 pgs, 7 pools, 288 MB data, 30037 objects
    1736 MB used, 636 GB / 637 GB avail
    3456 active+clean
  10. python rbdc.py -i /dev/zero -o rbd:xx6/xx 73400320 (70M)
    osdc/ObjectCacher.cc: In function 'ObjectCacher::~ObjectCacher()' thread 7f1757e65740 time 2016-05-20 11:06:11.138252
    osdc/ObjectCacher.cc: 551: FAILED assert(i->empty())
    ceph version 0.94.6.1 (941c7dcebfdb2fc612fe99f2b6e4a2cf36247408)
    1: (()+0x175455) [0x7f172db72455]
    2: (()+0x3babf1) [0x7f172ddb7bf1]
    3: (()+0x680e6) [0x7f172da650e6]
    4: (()+0x90354) [0x7f172da8d354]
    5: (rbd_close()+0x2e) [0x7f172da3ecde]
    6: (ffi_call_unix64()+0x4c) [0x7f174c883dac]
    7: (ffi_call()+0x1f5) [0x7f174c8836d5]
    8: (ctypes_callproc()+0x30b) [0x7f174ca96c8b]
    9: (()+0xaa85) [0x7f174ca90a85]
    10: (PyObject_Call()+0x43) [0x7f17578da073]
    11: (PyEval_EvalFrameEx()+0x1d4c) [0x7f175796e34c]
    12: (PyEval_EvalFrameEx()+0x4350) [0x7f1757970950]
    13: (PyEval_EvalCodeEx()+0x7ed) [0x7f17579721ad]
    14: (()+0x6f098) [0x7f17578ff098]
    15: (PyObject_Call()+0x43) [0x7f17578da073]
    16: (()+0x59085) [0x7f17578e9085]
    17: (PyObject_Call()+0x43) [0x7f17578da073]
    18: (PyObject_CallFunctionObjArgs()+0xbc) [0x7f17578da96c]
    19: (PyEval_EvalFrameEx()+0x1674) [0x7f175796dc74]
    20: (PyEval_EvalFrameEx()+0x4350) [0x7f1757970950]
    21: (PyEval_EvalFrameEx()+0x4350) [0x7f1757970950]
    22: (PyEval_EvalCodeEx()+0x7ed) [0x7f17579721ad]
    23: (PyEval_EvalCode()+0x32) [0x7f17579722b2]
    24: (()+0xfb6ef) [0x7f175798b6ef]
    25: (PyRun_FileExFlags()+0x7e) [0x7f175798c8ae]
    26: (PyRun_SimpleFileExFlags()+0xe9) [0x7f175798db39]
    27: (Py_Main()+0xc9f) [0x7f175799eb3f]
    28: (__libc_start_main()+0xf5) [0x7f1756bcbb25]
    29: python() [0x400721]
    NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
    terminate called after throwing an instance of 'ceph::FailedAssertion'
    Aborted (core dumped)
  11. workaround
    Several commits have been posted upstream to address this: f8f33bcaa6111aa1bab1f8246b7d516a60ab23a8, 6eca240b26e112641caa7ed31a5ae58e1fefff26, and fe54bd1bbe5714b2613876bce857234ede26f68a. Unfortunately, these commits were not backported to hammer, so I think it is necessary to backport them to hammer.
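
For reference, here is a minimal sketch of the librbd Python-binding call path that triggers the assert. This is not the reporter's rbdc.py (its source is not attached); the pool name, image name, and offset follow the steps above, and the ceph.conf path is an assumption.

    import rados
    import rbd

    # Assumes the default ceph.conf location; adjust for the cluster under test.
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx('xx6')        # quota-limited pool from step 4
        try:
            image = rbd.Image(ioctx, 'xx')       # 1024 MB image from step 5
            try:
                # Write past the 60M pool quota but still inside the 1024 MB image.
                # The write is accepted (cached) even though the pool is already
                # marked full in the osdmap.
                data = b'\0' * (4 * 1024 * 1024)
                image.write(data, 73400320)
            finally:
                # rbd_close() tears down the ObjectCacher; on hammer the leftover
                # dirty buffers trip "FAILED assert(i->empty())" in
                # ObjectCacher::~ObjectCacher(), matching the backtrace above.
                image.close()
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()
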
#1

Updated by Samuel Just over 7 years ago

  • Priority changed from Urgent to High
#2

Updated by Nathan Cutler over 7 years ago

Two questions:

1. Should this be in the rbd tracker?
2. Is this a duplicate of #12018?

#3

Updated by Samuel Just over 7 years ago

  • Status changed from New to Duplicate