Feature #10198
PG removal occupy the disk thread several hours
0%
Description
We found an issue after we enable scrubbing/deep-scrubbing when doing recovering. The phenomenon is that all the radosgw threads get hung gradually in few hours, so all the requests get 500 error eventually.
After we turn on scrubing/deep-scrubibing, there are some .rgw.bucket.index pool's PGs starting to scrub/deep-scrub. But during the
whole time of scrubbing those objects, they can't be written. And for our case, those objects here are bucket index objects, so all the writes to those buckets will be blocked at updating bucket index info.
chunky scrub will exit temporarily to wait for replicas to build scrub map. When replicas finishes the job, it will send sub_op_scrub_map OP back to primary OSD. This OP will be processed by op_tp firstly and then queued into scrub_wq, so disk_tp can continue the left work.
Our problem is that after sub_op_scrub_map OP is processed and queued into scrub_wq, disk_tp doesn't continue to process for about 1.5 hrs. So all the writes to those buckets become slow requests, radosgw threads start to hang, eventually spread to all the radosgw threads.
Once chunk scrub exits, disk_tp turns to process other events. For our case, it is processing pg removing event of EC pool. Unfortunately it
takes very long time to process such event, because in pg removing, it seems to loop all the objects under each directory, but 32 to 64 one time.
One simple solution is to use different thread pool to process those different events which might impact each other?
History
#1 Updated by Zhi Zhang about 6 years ago
scrub exits to wait for replicas to build scrub map: ------------------------------------------------ 2014-11-27 04:52:11.336699 7f288658e700 10 osd.469 pg_epoch: 43797 pg[4.65b( v 43775'1003452 (43547'1000451,43775'1003452] local-les=43311 n=16 ec=457 les/c 43311/43311 43260/43310/43310) [469,459,99] r=0 lpr=43310 crt=43769'1003450 lcod 43775'1003451 mlcod 43775'1003451 active+clean+scrubbing+deep] starting a new chunky scrub 2014-11-27 04:52:11.336735 7f288658e700 10 osd.469 pg_epoch: 43797 pg[4.65b( v 43775'1003452 (43547'1000451,43775'1003452] local-les=43311 n=16 ec=457 les/c 43311/43311 43260/43310/43310) [469,459,99] r=0 lpr=43310 crt=43769'1003450 lcod 43775'1003451 mlcod 43775'1003451 active+clean+scrubbing+deep] scrub start 2014-11-27 04:52:11.336758 7f288658e700 10 filestore(/home/y/var/lib/ceph/osd/ceph-469) collection_list_partial: 4.65b_head 2014-11-27 04:52:11.336778 7f288658e700 20 _collection_list_partial 0//0//-1/0/0 5-25 ls.size 0 ... 2014-11-27 04:52:11.337033 7f288658e700 20 prefixes B5608CEF,B5624762,B563D6FD,B56830C7,B5689025,B56AA0EE,B56B1F95,B56B2A1D,B5E35DE5,B5E5B3E6,B5E86E85,B5E8A3D5,B5EC883E,B5ECADEE,B5ED8453,B5EEFABE 2014-11-27 04:52:11.337101 7f288658e700 20 filestore(/home/y/var/lib/ceph/osd/ceph-469) objects: [fec8065b/.dir.default.506102.4121/head//4,2674265b/.dir.default.1177618.958/head//4,df6d365b/.dir.default.1177618.284/head//4,7c03865b/.dir.default.1150130.745/head//4,5209865b/.dir.default.1177609.149/head//4,ee0aa65b/.dir.default.579855.7389/head//4,59f1b65b/.dir.default.1177618.1303/head//4,d1a2b65b/.dir.default.1150124.690/head//4,5ed53e5b/.dir.default.1150127.1797/head//4,6e3b5e5b/.dir.default.341342.2359/head//4,58e68e5b/.dir.default.1177618.297/head//4,5d3a8e5b/.dir.default.506102.318/head//4,e388ce5b/.dir.default.1177612.690/head//4,eedace5b/.dir.default.1177612.21/head//4,3548de5b/.dir.default.7291.5020/head//4,ebafee5b/.dir.default.29329.1440/head//4] 2014-11-27 04:52:11.339286 7f288658e700 10 osd.469 pg_epoch: 43797 pg[4.65b( v 43775'1003452 (43547'1000451,43775'1003452] local-les=43311 n=16 ec=457 les/c 43311/43311 43260/43310/43310) [469,459,99] r=0 lpr=43310 crt=43769'1003450 lcod 43775'1003451 mlcod 43775'1003451 active+clean+scrubbing+deep] scrub requesting scrubmap from osd.99 2014-11-27 04:52:11.339339 7f288658e700 10 osd.469 pg_epoch: 43797 pg[4.65b( v 43775'1003452 (43547'1000451,43775'1003452] local-les=43311 n=16 ec=457 les/c 43311/43311 43260/43310/43310) [469,459,99] r=0 lpr=43310 crt=43769'1003450 lcod 43775'1003451 mlcod 43775'1003451 active+clean+scrubbing+deep] scrub requesting scrubmap from osd.459 2014-11-27 04:52:11.339367 7f288658e700 10 osd.469 pg_epoch: 43797 pg[4.65b( v 43775'1003452 (43547'1000451,43775'1003452] local-les=43311 n=16 ec=457 les/c 43311/43311 43260/43310/43310) [469,459,99] r=0 lpr=43310 crt=43769'1003450 lcod 43775'1003451 mlcod 43775'1003451 active+clean+scrubbing+deep] build_scrub_map 2014-11-27 04:52:11.339379 7f288658e700 10 filestore(/home/y/var/lib/ceph/osd/ceph-469) collection_list_partial: 4.65b_head 2014-11-27 04:52:11.339384 7f288658e700 20 _collection_list_partial 0//0//-1 32-64 ls.size 0 2014-11-27 04:52:11.339588 7f288658e700 20 prefixes B5608CEF,B5624762,B563D6FD,B56830C7,B5689025,B56AA0EE,B56B1F95,B56B2A1D,B5E35DE5,B5E5B3E6,B5E86E85,B5E8A3D5,B5EC883E,B5ECADEE,B5ED8453,B5EEFABE 2014-11-27 04:52:11.339654 7f288658e700 20 filestore(/home/y/var/lib/ceph/osd/ceph-469) objects: [fec8065b/.dir.default.506102.4121/head//4,2674265b/.dir.default.1177618.958/head//4,df6d365b/.dir.default.1177618.284/head//4,7c03865b/.dir.default.1150130.745/head//4,5209865b/.dir.default.1177609.149/head//4,ee0aa65b/.dir.default.579855.7389/head//4,59f1b65b/.dir.default.1177618.1303/head//4,d1a2b65b/.dir.default.1150124.690/head//4,5ed53e5b/.dir.default.1150127.1797/head//4,6e3b5e5b/.dir.default.341342.2359/head//4,58e68e5b/.dir.default.1177618.297/head//4,5d3a8e5b/.dir.default.506102.318/head//4,e388ce5b/.dir.default.1177612.690/head//4,eedace5b/.dir.default.1177612.21/head//4,3548de5b/.dir.default.7291.5020/head//4,ebafee5b/.dir.default.29329.1440/head//4] 2014-11-27 04:52:11.339710 7f288658e700 10 osd.469 pg_epoch: 43797 pg[4.65b( v 43775'1003452 (43547'1000451,43775'1003452] local-les=43311 n=16 ec=457 les/c 43311/43311 43260/43310/43310) [469,459,99] r=0 lpr=43310 crt=43769'1003450 lcod 43775'1003451 mlcod 43775'1003451 active+clean+scrubbing+deep] _scan_list scanning 16 objects deeply ... 2014-11-27 04:52:21.859629 7f288658e700 10 osd.469 pg_epoch: 43797 pg[4.65b( v 43775'1003452 (43547'1000451,43775'1003452] local-les=43311 n=16 ec=457 les/c 43311/43311 43260/43310/43310) [469,459,99] r=0 lpr=43310 crt=43769'1003450 lcod 43775'1003451 mlcod 43775'1003451 active+clean+scrubbing+deep] build_scrub_map_chunk done. 2014-11-27 04:52:21.859720 7f288658e700 10 osd.469 pg_epoch: 43797 pg[4.65b( v 43775'1003452 (43547'1000451,43775'1003452] local-les=43311 n=16 ec=457 les/c 43311/43311 43260/43310/43310) [469,459,99] r=0 lpr=43310 crt=43769'1003450 lcod 43775'1003451 mlcod 43775'1003451 active+clean+scrubbing+deep] wait for replicas to build scrub map ------------------------------------------------ osd_sub_scrub_map OP comes back quickly: ------------------------------------------------ 2014-11-27 04:52:21.893323 7f2888592700 10 osd.469 43799 dequeue_op 0x1730f4a0 prio 127 cost 6420 latency 7.496690 osd_sub_op(unknown.0.0:0 4.65b 0//0//-1 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[]) v11 pg pg[4.65b( v 43775'1003452 (43547'1000451,43775'1003452] local-les=43311 n=16 ec=457 les/c 43311/43311 43260/43310/43310) [469,459,99] r=0 lpr=43310 crt=43769'1003450 lcod 43775'1003451 mlcod 43775'1003451 active+clean+scrubbing+deep] ... 2014-11-27 04:52:21.893511 7f2888592700 10 osd.469 pg_epoch: 43797 pg[4.65b( v 43775'1003452 (43547'1000451,43775'1003452] local-les=43311 n=16 ec=457 les/c 43311/43311 43260/43310/43310) [469,459,99] r=0 lpr=43310 crt=43769'1003450 lcod 43775'1003451 mlcod 43775'1003451 active+clean+scrubbing+deep] map version is 43775'1003452 ... 2014-11-27 04:52:21.893534 7f2888592700 10 osd.469 43799 dequeue_op 0x1730f4a0 finish 2014-11-27 04:52:21.903364 7f2889594700 10 osd.469 43799 dequeue_op 0x117d6d20 prio 127 cost 6420 latency 6.399532 osd_sub_op(unknown.0.0:0 4.65b 0//0//-1 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[]) v11 pg pg[4.65b( v 43775'1003452 (43547'1000451,43775'1003452] local-les=43311 n=16 ec=457 les/c 43311/43311 43260/43310/43310) [469,459,99] r=0 lpr=43310 crt=43769'1003450 lcod 43775'1003451 mlcod 43775'1003451 active+clean+scrubbing+deep] ... 2014-11-27 04:52:21.903423 7f2889594700 10 osd.469 pg_epoch: 43799 pg[4.65b( v 43775'1003452 (43547'1000451,43775'1003452] local-les=43311 n=16 ec=457 les/c 43311/43311 43260/43310/43310) [469,459,99] r=0 lpr=43310 crt=43769'1003450 lcod 43775'1003451 mlcod 43775'1003451 active+clean+scrubbing+deep] got 459 scrub map ... 014-11-27 04:52:21.903483 7f2889594700 10 osd.469 pg_epoch: 43799 pg[4.65b( v 43775'1003452 (43547'1000451,43775'1003452] local-les=43311 n=16 ec=457 les/c 43311/43311 43260/43310/43310) [469,459,99] r=0 lpr=43310 crt=43769'1003450 lcod 43775'1003451 mlcod 43775'1003451 active+clean+scrubbing+deep] map version is 43775'1003452 2014-11-27 04:52:21.903495 7f2889594700 10 osd.469 43799 dequeue_op 0x117d6d20 finish ... ------------------------------------------------ disk_tp to process pg removing event: ------------------------------------------------ 2014-11-27 04:52:11.336758 7f288658e700 10 filestore(/home/y/var/lib/ceph/osd/ceph-469) collection_list_partial: 4.65b_head 2014-11-27 04:52:11.336778 7f288658e700 20 _collection_list_partial 0//0//-1/0/0 5-25 ls.size 0 2014-11-27 04:52:11.337033 7f288658e700 20 prefixes B5608CEF,B5624762,B563D6FD,B56830C7,B5689025,B56AA0EE,B56B1F95,B56B2A1D,B5E35DE5,B5E5B3E6,B5E86E85,B5E8A3D5,B5EC883E,B5ECADEE,B5ED8453,B5EEFABE ... 2014-11-27 06:30:04.060618 7f288658e700 10 filestore(/home/y/var/lib/ceph/osd/ceph-469) collection_list_partial: 3.264s0_head 2014-11-27 06:30:04.060629 7f288658e700 20 _collection_list_partial 3f78e264/default.1177618.14_mon03c008.cos.gq1.yahoo.com_fe4a94032b486ec449788c1cc6265024/head//3/18446744073709551615/0 32-64 ls.size 0 2014-11-27 06:30:04.060688 7f288658e700 20 prefixes 4 2014-11-27 06:30:04.060709 7f288658e700 20 prefixes 46 2014-11-27 06:30:04.060726 7f288658e700 20 prefixes 462 2014-11-27 06:30:04.060758 7f288658e700 20 prefixes 462E 2014-11-27 06:30:04.060810 7f288658e700 20 prefixes 462E8,462E9,462EA,462EB,462EC,462ED,462EE,462EF ... ------------------------------------------------ disk_up continues to process scrub: ------------------------------------------------ 2014-11-27 06:34:46.196360 7f288658e700 10 osd.469 pg_epoch: 43935 pg[4.65b( v 43775'1003452 (43547'1000451,43775'1003452] local-les=43311 n=16 ec=457 les/c 43311/43311 43260/43310/43310) [469,459,99] r=0 lpr=43310 crt=43769'1003450 lcod 43775'1003451 mlcod 43775'1003451 active+clean+scrubbing+deep] scrub_compare_maps has maps, analyzing 2014-11-27 06:34:46.196475 7f288658e700 10 osd.469 pg_epoch: 43935 pg[4.65b( v 43775'1003452 (43547'1000451,43775'1003452] local-les=43311 n=16 ec=457 les/c 43311/43311 43260/43310/43310) [469,459,99] r=0 lpr=43310 crt=43769'1003450 lcod 43775'1003451 mlcod 43775'1003451 active+clean+scrubbing+deep] scrub comparing replica scrub maps 2014-11-27 06:34:46.196488 7f288658e700 2 osd.469 pg_epoch: 43935 pg[4.65b( v 43775'1003452 (43547'1000451,43775'1003452] local-les=43311 n=16 ec=457 les/c 43311/43311 43260/43310/43310) [469,459,99] r=0 lpr=43310 crt=43769'1003450 lcod 43775'1003451 mlcod 43775'1003451 active+clean+scrubbing+deep] scrub osd.469 has 16 items 2014-11-27 06:34:46.196500 7f288658e700 2 osd.469 pg_epoch: 43935 pg[4.65b( v 43775'1003452 (43547'1000451,43775'1003452] local-les=43311 n=16 ec=457 les/c 43311/43311 43260/43310/43310) [469,459,99] r=0 lpr=43310 crt=43769'1003450 lcod 43775'1003451 mlcod 43775'1003451 active+clean+scrubbing+deep] scrub replica 99 has 16 items ... 2014-11-27 06:34:46.196655 7f288658e700 10 osd.469 pg_epoch: 43935 pg[4.65b( v 43775'1003452 (43547'1000451,43775'1003452] local-les=43311 n=16 ec=457 les/c 43311/43311 43260/43310/43310) [469,459,99] r=0 lpr=43310 crt=43769'1003450 lcod 43775'1003451 mlcod 43775'1003451 active+clean+scrubbing+deep] be_select_auth_object: selecting osd 459 for obj fec8065b/.dir.default.506102.4121/head//4 2014-11-27 06:34:46.196669 7f288658e700 10 osd.469 pg_epoch: 43935 pg[4.65b( v 43775'1003452 (43547'1000451,43775'1003452] local-les=43311 n=16 ec=457 les/c 43311/43311 43260/43310/43310) [469,459,99] r=0 lpr=43310 crt=43769'1003450 lcod 43775'1003451 mlcod 43775'1003451 active+clean+scrubbing+deep] be_select_auth_object: selecting osd 469 for obj fec8065b/.dir.default.506102.4121/head//4 ... 2014-11-27 06:34:46.197471 7f288658e700 10 osd.469 pg_epoch: 43935 pg[4.65b( v 43775'1003452 (43547'1000451,43775'1003452] local-les=43311 n=16 ec=457 les/c 43311/43311 43260/43310/43310) [469,459,99] r=0 lpr=43310 crt=43769'1003450 lcod 43775'1003451 mlcod 43775'1003451 active+clean+scrubbing+deep] be_select_auth_object: selecting osd 469 for obj ebafee5b/.dir.default.29329.1440/head//4 2014-11-27 06:34:46.197494 7f288658e700 2 osd.469 pg_epoch: 43935 pg[4.65b( v 43775'1003452 (43547'1000451,43775'1003452] local-les=43311 n=16 ec=457 les/c 43311/43311 43260/43310/43310) [469,459,99] r=0 lpr=43310 crt=43769'1003450 lcod 43775'1003451 mlcod 43775'1003451 active+clean+scrubbing+deep] 2014-11-27 06:34:46.197506 7f288658e700 10 osd.469 pg_epoch: 43935 pg[4.65b( v 43775'1003452 (43547'1000451,43775'1003452] local-les=43311 n=16 ec=457 les/c 43311/43311 43260/43310/43310) [469,459,99] r=0 lpr=43310 crt=43769'1003450 lcod 43775'1003451 mlcod 43775'1003451 active+clean+scrubbing+deep] _scrub 2014-11-27 06:34:46.197580 7f288658e700 10 osd.469 pg_epoch: 43935 pg[4.65b( v 43775'1003452 (43547'1000451,43775'1003452] local-les=43311 n=16 ec=457 les/c 43311/43311 43260/43310/43310) [469,459,99] r=0 lpr=43310 crt=43769'1003450 lcod 43775'1003451 mlcod 43775'1003451 active+clean+scrubbing+deep] _scrub (deep-scrub) finish 2014-11-27 06:34:46.198070 7f288658e700 10 osd.469 pg_epoch: 43935 pg[4.65b( v 43775'1003452 (43547'1000451,43775'1003452] local-les=43311 n=16 ec=457 les/c 43311/43311 43260/43310/43310) [469,459,99] r=0 lpr=43310 crt=43769'1003450 lcod 43775'1003451 mlcod 43775'1003451 active+clean+scrubbing+deep] deep-scrub got 16/16 objects, 0/0 clones, 16/16 dirty, 16/16 omap, 0/0 hit_set_archive, 0/0 bytes. 2014-11-27 06:34:46.198089 7f288658e700 10 osd.469 pg_epoch: 43935 pg[4.65b( v 43775'1003452 (43547'1000451,43775'1003452] local-les=43311 n=16 ec=457 les/c 43311/43311 43260/43310/43310) [469,459,99] r=0 lpr=43310 crt=43769'1003450 lcod 43775'1003451 mlcod 43775'1003451 active+clean+scrubbing+deep] process_inconsistent() checking authoritative 2014-11-27 06:34:46.198102 7f288658e700 0 log [INF] : 4.65b deep-scrub ok ------------------------------------------------
#2 Updated by Samuel Just about 6 years ago
- Project changed from Ceph to rgw
Radosgw creates unbounded size index objects, will change eventually.
#3 Updated by Guang Yang about 6 years ago
Samuel Just wrote:
Radosgw creates unbounded size index objects, will change eventually.
Hi Sam,
It is more like a general problem at OSD side rather than a rgw problem. In short, the problem is that a single object could be non-writable for a couple of hours because the thread taking responsibility to mark it writable was stuck doing pg remove (which is time consuming), not because it is really not writable.
I am thinking about two possibilities to fix:
1. Pull the time critical section into op thread (make the object as writable)
2. Use a dedicated thread pool to handle pg removing (and other time consuming work load)
#4 Updated by Samuel Just about 6 years ago
In that case, two things:
1) move scrubbing into the OpWQ (I'm working on that one)
2) restructure pg removal to only remove N objects a time and requeue (at the front of the removal queue)
#5 Updated by Guang Yang about 6 years ago
- Subject changed from Radosgw threads get hung when hitting a paticular pattern at peering, remapping and deep-scrubbing to PG removal occupy the disk thread several hours
#6 Updated by Sage Weil almost 6 years ago
- Project changed from rgw to Ceph
#7 Updated by Guang Yang almost 6 years ago
Samuel Just wrote:
In that case, two things:
2) restructure pg removal to only remove N objects a time and requeue (at the front of the removal queue)
#8 Updated by Sage Weil almost 6 years ago
- Priority changed from Normal to High
#9 Updated by Samuel Just almost 6 years ago
- Tracker changed from Bug to Feature
- Target version set to v0.94
#10 Updated by Samuel Just almost 6 years ago
- Target version changed from v0.94 to v0.95
#11 Updated by Sage Weil almost 6 years ago
- Target version changed from v0.95 to v9.0.2
#12 Updated by Samuel Just over 5 years ago
- Status changed from New to Resolved