Bug #23447

closed

collocated storage: losing one node leads to a 24-second client write freeze

Added by Suvendu Mitra about 6 years ago. Updated about 6 years ago.

Status:
Rejected
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We are running Ceph version 12.2.2 with BlueStore, on 3 controller and 3 compute nodes. The Ceph OSDs and Ceph monitors run collocated on the controllers.

A. Test details
1. Launch a Fedora 24 VM and start writing with dd to an attached Cinder volume; in another window, monitor the resulting file size (a minimal command sketch follows this list).
2. Restart one controller (Fri Mar 23 10:41:37 EET 2018) by issuing "sudo reboot -f" from a shell.
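
A minimal sketch of the test commands inside the guest, assuming the attached Cinder volume appears as /dev/vdb (the device name, mount point and dd sizes are my assumptions, not taken from the report):

# inside the Fedora 24 guest; /dev/vdb is the attached Cinder volume (assumed name)
sudo mkfs.xfs /dev/vdb && sudo mkdir -p /mnt/vol && sudo mount /dev/vdb /mnt/vol
# write continuously with direct I/O so stalls are not masked by the page cache
sudo dd if=/dev/zero of=/mnt/vol/testfile bs=1M count=100000 oflag=direct &
# in a second shell, sample the file size once per second
while true; do echo "$(date +%T) $(stat -c %s /mnt/vol/testfile)"; sleep 1; done | tee size_samples.log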

B. Observation
1. The file size stays constant for 14 seconds; after that it grows again.

I have tuned several Ceph parameters, e.g.:
osd heartbeat grace = 3, osd heartbeat interval = 2, mon lease = 1.0, mon election timeout = 2, osd mon heartbeat interval = 10
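
For reference, this is how those values would look in ceph.conf (putting them all under [global] is my own layout; the reporter did not say which sections were used):

[global]
osd heartbeat grace = 3
osd heartbeat interval = 2
mon lease = 1.0
mon election timeout = 2
osd mon heartbeat interval = 10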

My observation is that sometimes the client freezes for less than 3 seconds, but sometimes for more than 20 seconds (a small sketch for measuring the stall length from the size samples follows the list below):

a. 1st try, rebooting controller-2 -> 4-second write freeze
b. 2nd try, rebooting controller-2 -> 2-second write freeze
c. 3rd try, rebooting controller-3 -> 24-second write freeze
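
To quantify the freeze from the per-second size samples (size_samples.log is the hypothetical file produced by the sketch in section A, not one of the attached logs):

# longest run of seconds during which the file size did not change
awk '{ if ($2 == prev) run++; else run = 0; if (run > max) max = run; prev = $2 }
     END { print "longest stall:", max, "seconds" }' size_samples.log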

I have attached the "ceph -w" output and the Ceph logs for all nodes.

[root@controller-1 ~]# ceph -s
cluster:
id: 1c90ccdc-b322-4eb4-80af-7d8bca09206d
health: HEALTH_OK

services:
mon: 3 daemons, quorum controller-1,controller-2,controller-3
mgr: controller-1(active), standbys: controller-2, controller-3
osd: 6 osds: 6 up, 6 in
data:
pools: 4 pools, 736 pgs
objects: 25744 objects, 100 GB
usage: 209 GB used, 2025 GB / 2235 GB avail
pgs: 736 active+clean
io:
client: 5797 B/s wr, 0 op/s rd, 0 op/s wr

[root@controller-1 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 2.18271 root default
-3 0.72757 host controller-1
0 ssd 0.36378 osd.0 up 1.00000 1.00000
4 ssd 0.36378 osd.4 up 1.00000 1.00000
-5 0.72757 host controller-2
1 ssd 0.36378 osd.1 up 1.00000 1.00000
3 ssd 0.36378 osd.3 up 1.00000 1.00000
-7 0.72757 host controller-3
2 ssd 0.36378 osd.2 up 1.00000 1.00000
5 ssd 0.36378 osd.5 up 1.00000 1.00000
[root@controller-1 ~]# ceph mon stat
e1: 3 mons at {controller-1=192.168.1.23:6789/0,controller-2=192.168.1.24:6789/0,controller-3=192.168.1.25:6789/0}, election epoch 38, leader 0 controller-1, quorum 0,1,2 controller-1,controller-2,controller-3
[root@controller-1 ~]#
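
To correlate the freeze window with cluster events, the time at which the rebooted node's OSDs were reported failed and marked down can be read from the cluster log on a surviving monitor (default log path assumed; the exact message wording varies by release, so the pattern may need adjusting):

[root@controller-1 ~]# grep -E "osd\.[0-9]+ .*(failed|boot|marked)" /var/log/ceph/ceph.log

The attached ceph_w_2.log ("ceph -w" during the test) shows the same cluster log stream live.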


Files

ceph_w_2.log (26.6 KB) - Ceph -w output while the test runs - Suvendu Mitra, 03/23/2018 09:05 AM
ceph_log_controller-1_latest.tar.gz (593 KB) - Controller-1 logs - Suvendu Mitra, 03/23/2018 09:09 AM
ceph_log_controller-2_latest.tar.gz (362 KB) - Controller-2 logs - Suvendu Mitra, 03/23/2018 09:09 AM
ceph_log_controller-3_latest.tar.gz (405 KB) - Controller-3 logs - Suvendu Mitra, 03/23/2018 09:10 AM
#1

Updated by Greg Farnum about 6 years ago

  • Status changed from New to Rejected

Please don't open duplicate tickets because you found a response unsatisfactory.

