Bug #22113: osd: pg limit on replica test failure - RADOS - Ceph



Added by Sage Weil over 6 years ago. Updated about 6 years ago.

Status: Resolved
Priority: High
Assignee: Kefu Chai
Backport: luminous
Severity: 3 - minor

Description

remote/smithi036/log/ceph-osd.2.log:2017-11-10 18:07:02.551 7fc704b83700  1 -- 172.21.15.36:6806/23262 <== osd.0 172.21.15.36:6810/23263 12 ==== pg_log(1.0 epoch 60 log log((0'0,0'0], crt=0'0) pi ([13,58] intervals=([13,58] acting 0,1)) query_epoch 60) v5 ==== 1073+0+0 (2103684647 0 0) 0x55ef3937ce00 con 0x55ef3865ca00
remote/smithi036/log/ceph-osd.2.log:2017-11-10 18:07:02.551 7fc704b83700 15 osd.2 60 project_pg_history 1.0 from 60 to 60, start ec=13/13 lis/c 13/13 les/c/f 14/14/0 59/59/59
remote/smithi036/log/ceph-osd.2.log:2017-11-10 18:07:02.551 7fc704b83700 20 osd.2 60 _dispatch 0x55ef3937ce00 pg_log(1.0 epoch 60 log log((0'0,0'0], crt=0'0) pi ([13,58] intervals=([13,58] acting 0,1)) query_epoch 60) v5
remote/smithi036/log/ceph-osd.2.log:2017-11-10 18:07:02.551 7fc704b83700 20 osd.2 60 OSD::ms_dispatch: pg_log(1.0 epoch 60 log log((0'0,0'0], crt=0'0) pi ([13,58] intervals=([13,58] acting 0,1)) query_epoch 60) v5
remote/smithi036/log/ceph-osd.2.log:2017-11-10 18:07:02.551 7fc704b83700  5 osd.2 60 maybe_wait_for_max_pg withhold creation of pg 1.0: 1 >= 1
remote/smithi036/log/ceph-osd.2.log:2017-11-10 18:07:02.551 7fc704b83700  7 osd.2 60 handle_pg_log pg_log(1.0 epoch 60 log log((0'0,0'0], crt=0'0) pi ([13,58] intervals=([13,58] acting 0,1)) query_epoch 60) v5 from osd.0

But pg 1.0 never gets created afterwards.

/a/sage-2017-11-10_16:16:51-rados-wip-sage-testing-2017-11-10-0902-distro-basic-smithi/1835134
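The log line "maybe_wait_for_max_pg withhold creation of pg 1.0: 1 >= 1" shows osd.2 deferring a PG creation because it already hosts as many PGs as its limit allows. A minimal C++ sketch of that gating, assuming a simple counter and a FIFO queue of pending creations; PgLimiter, maybe_create and remove_pg are hypothetical names, not Ceph's actual code:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical model of an OSD that caps how many PGs it will host.
// Illustrative only; not Ceph's OSD implementation.
class PgLimiter {
 public:
  explicit PgLimiter(std::size_t max_pgs) : max_pgs_(max_pgs) {}

  // Returns true if the PG is created now. Otherwise the creation is
  // withheld and queued, like "withhold creation of pg 1.0: 1 >= 1".
  bool maybe_create(const std::string& pgid) {
    if (num_pgs_ >= max_pgs_) {
      pending_creates_.push_back(pgid);
      return false;
    }
    ++num_pgs_;
    return true;
  }

  // Called only when the OSD learns (from a newer osdmap) that a PG it
  // hosts is gone: frees a slot, then creates queued PGs while slots remain.
  void remove_pg() {
    assert(num_pgs_ > 0);
    --num_pgs_;
    while (!pending_creates_.empty() && num_pgs_ < max_pgs_) {
      pending_creates_.pop_front();
      ++num_pgs_;
    }
  }

  std::size_t num_pgs() const { return num_pgs_; }
  std::size_t pending() const { return pending_creates_.size(); }

 private:
  std::size_t max_pgs_;
  std::size_t num_pgs_ = 0;
  std::deque<std::string> pending_creates_;
};
```

The relevant property for this bug: queued creations resume only inside remove_pg(), so if the OSD never learns that a hosted PG's pool was deleted, the queue never drains.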

Related issues: 1 (0 open, 1 closed)


Updated by Kefu Chai over 6 years ago

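The reason osd.2 didn't finish creating pg 1.0 is that, after unique_pool_14 (pool 15) was removed by mon.a, the mon didn't send the updated osdmap to osd.2, which was the primary OSD of pg 14.0. Instead, it sent osdmap 61 to a random OSD, osd.1. Since osd.2 never learned that the pool was gone, that PG kept counting against osd.2's PG limit (the "1 >= 1" check in the description), so the creation of pg 1.0 stayed withheld.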

osd.2 didn't receive osdmap 61 (or any later epoch) until after wait_for_clean had timed out, about 14 minutes after the creation was withheld.

osd.2

2017-11-10 18:07:02.551 7fc704b83700  5 osd.2 60 maybe_wait_for_max_pg withhold creation of pg 1.0: 1 >= 1

2017-11-10 18:21:29.675 7fc705b85700 20 osd.2 60 _dispatch 0x55ef39334080 osd_map(61..62 src has 1..62) v4
2017-11-10 18:21:29.675 7fc705b85700  3 osd.2 60 handle_osd_map epochs [61,62], i have 60, src has [1,62]
2017-11-10 18:21:29.676 7fc70cbbd700 10 osd.2 60 _committed_osd_maps 61..62
2017-11-10 18:21:29.676 7fc70cbbd700  7 osd.2 62 consume_map version 62

mon.a

2017-11-10 18:07:17.569 7fa0e18f7700 10 mon.a@0(leader).osd e60 _prepare_remove_pool 15
..
2017-11-10 18:07:17.579 7fa0e40fc700 10 mon.a@0(leader).osd e60 encode_pending e 61
2017-11-10 18:07:17.581 7fa0dd0ee700  1 mon.a@0(leader).osd e61 e61: 4 total, 4 up, 3 in
2017-11-10 18:07:17.581 7fa0dd0ee700 10 mon.a@0(leader).osd e61 check_osdmap_subs
2017-11-10 18:07:17.581 7fa0dd0ee700 10 mon.a@0(leader).osd e61 check_osdmap_sub 0x562ae75a6d00 next 61 (onetime)
2017-11-10 18:07:17.581 7fa0dd0ee700  5 mon.a@0(leader).osd e61 send_incremental [61..61] to client.4097 172.21.15.36:0/1016650811
2017-11-10 18:07:17.581 7fa0dd0ee700 10 mon.a@0(leader).osd e61 build_incremental [61..61]
2017-11-10 18:07:17.581 7fa0dd0ee700 20 mon.a@0(leader).osd e61 build_incremental    inc 61 220 bytes
2017-11-10 18:07:17.581 7fa0dd0ee700  1 -- 172.21.15.36:6789/0 --> 172.21.15.36:0/1016650811 -- osd_map(61..61 src has 1..61) v4 -- 0x562ae7458a00 con 0

2017-11-10 18:07:17.581 7fa0dd0ee700 20 mon.a@0(leader).osd e61 check_pg_creates_sub .. osd.2 172.21.15.36:6805/23262
2017-11-10 18:07:17.581 7fa0dd0ee700 10 mon.a@0(leader).osd e61 committed, telling random osd.1 172.21.15.36:6801/23261 all about it
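The mon.a log shows that, after committing e61, the monitor pushed the new map only to its existing osdmap subscriber (client.4097, with a onetime subscription) and to one randomly chosen OSD, osd.1, leaving osd.2 at e60. A minimal C++ model of that propagation pattern, under stated assumptions: MapPublisher, subscribe, publish and epoch_of are hypothetical names, and the "random" OSD is passed in by the caller so the sketch is deterministic. It is not Ceph's OSDMonitor code.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>

// Hypothetical model of how a monitor pushes a newly committed osdmap
// epoch, based on the mon.a log above; not Ceph's actual OSDMonitor code.
class MapPublisher {
 public:
  MapPublisher(const std::set<int>& osds, uint32_t initial_epoch) {
    for (int id : osds) epoch_[id] = initial_epoch;
  }

  // An OSD with a continuous subscription receives every later epoch.
  void subscribe(int osd) { subscribed_.insert(osd); }

  // Push `epoch` to all subscribed OSDs plus one "random" OSD chosen
  // by the caller. Every other OSD keeps its old epoch.
  void publish(uint32_t epoch, int random_osd) {
    assert(epoch_.count(random_osd) == 1);
    for (int osd : subscribed_) epoch_[osd] = epoch;
    epoch_[random_osd] = epoch;
  }

  uint32_t epoch_of(int osd) const { return epoch_.at(osd); }

 private:
  std::map<int, uint32_t> epoch_;  // latest epoch each OSD has received
  std::set<int> subscribed_;       // OSDs with a continuous subscription
};
```

With OSDs 0 to 3 at epoch 60, publish(61, 1) advances only osd.1; osd.2 stays at 60 until it subscribes or otherwise obtains a newer map.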

I think the current design works fine in general: the objecter subscribes to the mon continuously once it gets a full map, and if an OSD runs into a request that requires a newer osdmap, it requests one from the mon, which doesn't hurt either. Even if an OSD is out of sync and some of the PGs it serves no longer exist, that's fine, because those PGs will be removed eventually once the OSD receives the updated osdmap. It's just a matter of time.

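But this design becomes a problem once free PG slots are a limited resource: a stale PG keeps occupying a slot until its OSD happens to receive the newer map, and any PG creation waiting for that slot stalls in the meantime. We need to subscribe to the monitor continuously while there are any pending PGs, and stop doing so once all pending PGs have been created.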
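The proposed policy can be sketched as a small state machine: the osdmap subscription is continuous exactly while at least one PG creation is pending. This is a minimal C++ sketch under stated assumptions; MapSubscription and PendingCreateTracker are hypothetical names, not Ceph's MonClient API. In Ceph the toggle would presumably correspond to whether the osdmap subscription carries the onetime flag (the "(onetime)" marker in the mon log), but that mapping is an assumption here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an osdmap subscription; not Ceph's MonClient.
class MapSubscription {
 public:
  bool continuous() const { return continuous_; }
  void set_continuous(bool on) { continuous_ = on; }

 private:
  bool continuous_ = false;
};

// Keeps the subscription continuous while any PG creation is pending,
// and drops back to non-continuous once all pending PGs are created.
class PendingCreateTracker {
 public:
  explicit PendingCreateTracker(MapSubscription& sub) : sub_(sub) {}

  // A PG creation was withheld because the OSD is at its PG limit.
  void add_pending() {
    ++pending_;
    update();
  }

  // A previously withheld PG has now been created.
  void created() {
    assert(pending_ > 0);
    --pending_;
    update();
  }

 private:
  void update() { sub_.set_continuous(pending_ > 0); }

  MapSubscription& sub_;
  std::size_t pending_ = 0;
};
```

Tying the subscription to the pending count keeps the common case cheap (no continuous stream of maps to every OSD) while guaranteeing that an OSD blocked on a PG slot keeps hearing about pool deletions that would free one.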
