Bug #12449 (closed)

ceph-osd core dumped when writing data to the backing storage pool which has a quota set on its cache pool

Added by runsisi hust over 8 years ago. Updated over 8 years ago.

Status: Duplicate
Priority: Normal
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We have set up cache tiering under the guidance of this link, the only exception being that we have also set a quota on the cache pool.

[root@hust17 /home/runsisi]# ceph osd pool ls detail
pool 0 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 52 flags hashpspool max_bytes 20000000 stripe_width 0
pool 1 'cache' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 125 flags hashpspool,incomplete_clones max_bytes 100000000 tier_of 2 cache_mode writeback target_bytes 100000000 hit_set bloom{false_positive_probability: 0.05, target_size: 0, seed: 0} 3600s x1 stripe_width 0
pool 2 'base' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 63 lfor 63 flags hashpspool tiers 1 read_tier 1 write_tier 1 stripe_width 0

As you can see, the cache pool (named "cache") has a 100 MB quota set on it. When we use the rados utility to put a 200 MB object into the backing pool, one of the three OSDs core dumps. The core dump does not happen if we clear the quota on the cache pool.
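For reference, a setup like the one above can be recreated roughly as follows. The pool names and the 100000000-byte figures are taken from the listing; the remaining commands are the standard cache-tiering steps and are only a sketch, not copied from our actual setup.

ceph osd pool create base 64
ceph osd pool create cache 64
ceph osd tier add base cache
ceph osd tier cache-mode cache writeback
ceph osd tier set-overlay base cache
ceph osd pool set cache hit_set_type bloom
ceph osd pool set cache hit_set_count 1
ceph osd pool set cache hit_set_period 3600
ceph osd pool set cache target_max_bytes 100000000
ceph osd pool set-quota cache max_bytes 100000000

The last command (the quota) is the only step that deviates from the guide; without it the crash does not occur.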

This can be reproduced easily as follows; the attached file is the log of the OSD that core dumped.

[root@hust17 /home/runsisi]# ceph -s
    cluster cbc99ef9-fbc3-41ad-a726-47359f8d84b3
     health HEALTH_OK
     monmap e3: 3 mons at {ceph0=192.168.133.10:6789/0,ceph1=192.168.133.11:6789/0,ceph2=192.168.133.12:6789/0}
            election epoch 6, quorum 0,1,2 ceph0,ceph1,ceph2
     osdmap e132: 3 osds: 3 up, 3 in
      pgmap v413: 192 pgs, 3 pools, 4096 kB data, 1 objects
            323 MB used, 131 GB / 131 GB avail
                 192 active+clean
[root@hust17 /home/runsisi]# 
[root@hust17 /home/runsisi]# ceph osd pool ls detail
pool 0 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 52 flags hashpspool max_bytes 20000000 stripe_width 0
pool 1 'cache' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 125 flags hashpspool,incomplete_clones max_bytes 100000000 tier_of 2 cache_mode writeback target_bytes 100000000 hit_set bloom{false_positive_probability: 0.05, target_size: 0, seed: 0} 3600s x1 stripe_width 0
pool 2 'base' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 63 lfor 63 flags hashpspool tiers 1 read_tier 1 write_tier 1 stripe_width 0

[root@hust17 /home/runsisi]# rados -p base ls
[root@hust17 /home/runsisi]# rados -p cache ls
[root@hust17 /home/runsisi]# ll -h 200m.dat 
-rw-r--r-- 1 root root 200M Jul 23 10:53 200m.dat
[root@hust17 /home/runsisi]# 
[root@hust17 /home/runsisi]# rados -p base put x1 200m.dat 
[root@hust17 /home/runsisi]# rados -p base put x1 200m.dat 
error putting base/x1: (28) No space left on device
[root@hust17 /home/runsisi]# rados -p base put x1 200m.dat 
2015-07-23 20:59:03.262865 7f937a0d7700  0 -- 192.168.133.1:0/1019253 >> 192.168.133.11:6800/20555 pipe(0x3009c90 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x3003170).fault
[root@hust17 /home/runsisi]# ceph osd tree
ID WEIGHT  TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY 
-1 0.11998 root default                                     
-2 0.03999     host ceph2                                   
 0 0.03999         osd.0       up  1.00000          1.00000 
-3 0.03999     host ceph1                                   
 1 0.03999         osd.1     down  1.00000          1.00000 
-4 0.03999     host ceph0                                   
 2 0.03999         osd.2       up  1.00000          1.00000 
[root@hust17 /home/runsisi]# ceph --version
ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
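As noted in the description, the crash does not occur once the quota on the cache pool is cleared. Assuming the same pool names as above, the quota is removed by setting max_bytes back to 0:

ceph osd pool set-quota cache max_bytes 0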

Files

ceph-osd.1.log.tar.gz (183 KB): log of the OSD that core dumped (runsisi hust, 07/23/2015 02:35 PM)

Related issues: 1 (0 open, 1 closed)

Is duplicate of Ceph - Bug #13098: OSD crashed when reached pool's max_bytes quota (Resolved, 09/15/2015)

Actions #1

Updated by runsisi hust over 8 years ago

Sorry for the inconvenience; how can I edit the issue description?

Actions #2

Updated by Kefu Chai over 8 years ago

> Sorry for the inconvenience; how can I edit the issue description?

runsisi,

  1. Click "Update" at the right side of the top banner.
  2. Click "Description" (the small pencil icon) in the "Change properties" form.
Actions #3

Updated by Kefu Chai over 8 years ago

  • Description updated (diff)
Actions #4

Updated by Alexey Sheplyakov over 8 years ago

Looks like a duplicate of #13098 (or rather #13098 is a duplicate of this bug)

Actions #5

Updated by Loïc Dachary over 8 years ago

  • Status changed from New to Duplicate
Actions #6

Updated by Loïc Dachary over 8 years ago

  • Is duplicate of Bug #13098: OSD crashed when reached pool's max_bytes quota added