Support #13055


Problem with disconnect fuse by mds

Added by Sergey Mir over 8 years ago. Updated over 8 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%


Description

Hello. Nobody in the ceph IRC channel (via HexChat) knew the answer, so I'm asking for advice here.
I have ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b) (still 0.94.2 on all OSD hosts).
The Ceph nodes run Linux 3.13.0-61-generic #100-Ubuntu SMP Wed Jul 29 11:21:34 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux.

The problem occurs occasionally: one of the clients mounted via ceph-fuse gets disconnected. When it happened, I once saw one of the MDS daemons change status and then reconnect, returning to the active state.
The LAN is fine (0 dropped packets the whole time) and ports 6800-7300 are open in iptables.

Part of the logs from that time:
2015-09-11 12:26:29.496135 mds.1 (mon01/mds01_ip):6800/3114 154 : cluster [WRN] slow request 34.898807 seconds old, received at 2015-09-11 12:25:54.597245: client_request(client.448075:201 getattr pAsLsXsFs #20000042822 2015-09-11 12:25:54.596793) currently failed to rdlock, waiting
2015-09-11 12:26:33.495364 mds.0 (mon00/mds00_ip):6800/31743 126 : cluster [WRN] slow request 34.053063 seconds old, received at 2015-09-11 12:25:59.442271: client_request(client.448075:414 create #1000067c994/139ec6b20b48c01.jpg 2015-09-11 12:25:59.442081) currently failed to wrlock, waiting
...
2015-09-11 12:25:48.296273 7f0ee7c8b700 0 -- (mon01/mds01_ip):6800/3114 >> (client_ip):0/24507 pipe(0x2fb13a000 sd=32 :6800 s=2 pgs=2 cs=1 l=0 c=0x3042407e0).fault, server, going to standby
2015-09-11 12:26:29.496120 7f0eecaaf700 0 log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 34.898807 secs
2015-09-11 12:26:29.496131 7f0eecaaf700 0 log_channel(cluster) log [WRN] : slow request 34.898807 seconds old, received at 2015-09-11 12:25:54.597245: client_request(client.448075:201 getattr pAsLsXsFs #20000042822 2015-09-11 12:25:54.596793) currently failed to rdlock, waiting
...
2015-09-11 12:25:48.296009 7fb118c05700 0 -- (mon/mds0_ip):6800/31743 >> (client_ip):0/24507 pipe(0x17ce10000 sd=30 :6800 s=2 pgs=3 cs=1 l=0 c=0xc55162c0).fault with nothing to send, going to standby
...
2015-09-11 12:26:33.495358 7fb11d423700 0 log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 34.053063 secs
2015-09-11 12:26:33.495362 7fb11d423700 0 log_channel(cluster) log [WRN] : slow request 34.053063 seconds old, received at 2015-09-11 12:25:59.442271: client_request(client.448075:414 create #1000067c994/139ec6b20b48c01.jpg 2015-09-11 12:25:59.442081) currently failed to wrlock, waiting
...
2015-09-11 12:26:38.517722 7fb11d423700 0 log_channel(cluster) log [WRN] : slow request 32.712400 seconds old, received at 2015-09-11 12:26:05.805263: client_request(client.272050:5206204 create #10000a7cd09/4cd4ac4f2eab213.png 2015-09-11 12:26:05.813635) currently failed to wrlock, waiting
2015-09-11 12:26:43.691435 7fb11d423700 0 log_channel(cluster) log [INF] : closing stale session client.450156 (client_ip):0/24507 after 63.252966
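For anyone triaging similar output, the slow-request warnings above follow a regular format, so the request age and the client that is blocked can be pulled out mechanically. A minimal sketch (it assumes only the "slow request ... client_request(client.N:TID ..." pattern shown in the logs above):

```python
import re

# Matches the MDS slow-request warnings, e.g.
# "... slow request 34.898807 seconds old, received at ...:
#  client_request(client.448075:201 getattr ..."
SLOW_RE = re.compile(
    r"slow request (?P<age>[\d.]+) seconds old.*?"
    r"client_request\(client\.(?P<client>\d+):(?P<tid>\d+)"
)

def parse_slow_requests(lines):
    """Return (age_seconds, client_id, tid) for each slow-request line."""
    out = []
    for line in lines:
        m = SLOW_RE.search(line)
        if m:
            out.append((float(m.group("age")), m.group("client"), m.group("tid")))
    return out

log = [
    "2015-09-11 12:26:29.496135 mds.1 ... slow request 34.898807 seconds old, "
    "received at 2015-09-11 12:25:54.597245: client_request(client.448075:201 getattr ...",
    "2015-09-11 12:26:33.495364 mds.0 ... slow request 34.053063 seconds old, "
    "received at 2015-09-11 12:25:59.442271: client_request(client.448075:414 create ...",
]
for age, client, tid in parse_slow_requests(log):
    print(f"client.{client} tid {tid} blocked {age:.1f}s")
```

In the excerpt above, both slow requests come from the same client.448075, which matches the single fuse client that was disconnected.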

Output of ceph -s:
monmap e1: 3 mons at {mon00=(mon00_ip):6789/0,mon01=(mon01_ip):6789/0,mon02=(mon02_ip):6789/0}
election epoch 84, quorum 0,1,2 mon00,mon01,mon02
mdsmap e792: 2/2/2 up {0=mon00=up:active,1=mon01=up:active}, 1 up:standby
osdmap e1109: 12 osds: 12 up, 12 in
pgmap v1135961: 2048 pgs, 2 pools, 1197 GB data, 21693 kobjects
4126 GB used, 5776 GB / 10432 GB avail
2048 active+clean

Here is my ceph config:
[global]
fsid = 47d3c0df-4ba9-4c53-9e58-7f6a59d170d6
mon_initial_members = mon00, mon01, mon02
mon_host = *******************
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd_pool_default_size = 3 # Write an object 3 times.
osd_pool_min_size = 2 # Allow writing two copies in a degraded state.
osd_pool_default_pg_num = 800
osd_pool_default_pgp_num = 800
log_file = /var/log/ceph/$cluster-$name.log
osd_mkfs_type = ext4
osd_mount_options_ext4 = user_xattr,rw,noatime,nodiratime
max_mds = 2

[mon]
mon_clock_drift_allowed = 0.3
mon_osd_full_ratio = .95
mon_osd_nearfull_ratio = .80
mon_pg_warn_max_per_osd = 0

[osd]
osd_journal_size = 10240
public_network = ***/26
cluster_network = 10.10.7.0/24
osd_op_threads = 4
osd_disk_threads = 2
osd_recovery_max_active = 3

[osd.0]
host = osd00
public_addr = ***
cluster_addr = 10.10.7.10
devs = /dev/sda3

[osd.1]
host = osd00
public_addr = ***
cluster_addr = 10.10.7.10
devs = /dev/sdb3

[osd.2]
host = osd01
public_addr = ***
cluster_addr = 10.10.7.11
devs = /dev/sda3

[osd.3]
host = osd01
public_addr = ***
cluster_addr = 10.10.7.11
devs = /dev/sdb3

[osd.4]
host = osd02
public_addr = ***
cluster_addr = 10.10.7.12
devs = /dev/sda3

[osd.5]
host = osd02
public_addr = ***
cluster_addr = 10.10.7.12
devs = /dev/sdb3

[osd.6]
host = osd03
public_addr = ***
cluster_addr = 10.10.7.13
devs = /dev/sda3

[osd.7]
host = osd03
public_addr = ***
cluster_addr = 10.10.7.13
devs = /dev/sdb3

[osd.8]
host = osd04
public_addr = ***
cluster_addr = 10.10.7.14
devs = /dev/sda3

[osd.9]
host = osd04
public_addr = ***
cluster_addr = 10.10.7.14
devs = /dev/sdb3

[osd.10]
host = osd05
public_addr = ***
cluster_addr = 10.10.7.15
devs = /dev/sda3

[osd.11]
host = osd05
public_addr = ***
cluster_addr = 10.10.7.15
devs = /dev/sdb3

[mds]
mds_cache_size = 4000000
mds_session_autoclose = 60
mds_bal_frag = true
mds_bal_mode = 1

[mds.a]
host = mon00

[mds.b]
host = mon01

[mds.c]
host = mon02

[client]
keyring = /etc/ceph/ceph.client.admin.keyring
log_file = /var/log/ceph/$name.$pid.log

P.S. Another question, about running several MDS daemons at the same time:
should I use them in production? A single active MDS always holds 1.5x-2x more inodes than the maximum set in the config, and far more are expired, so the configuration with 2 of 3 MDS daemons is workable for now, but I don't know how to make it work and balance load better.
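On the cache-size question, one way to quantify how far an active MDS overshoots mds_cache_size is to read the inode counters from its admin socket (ceph daemon mds.&lt;name&gt; perf dump) and compare them to the limit. A minimal sketch, assuming the mds.inodes and mds.inode_max counter names present in Hammer-era perf dumps:

```python
import json

def cache_pressure(perf_dump_json):
    """Given the JSON text from `ceph daemon mds.<name> perf dump`,
    return (inodes, inode_max, ratio) for the MDS cache."""
    perf = json.loads(perf_dump_json)
    mds = perf["mds"]
    inodes = mds["inodes"]
    inode_max = mds["inode_max"]
    return inodes, inode_max, inodes / float(inode_max)

# Example with made-up numbers matching the ~1.5x-2x overshoot described above:
sample = json.dumps({"mds": {"inodes": 7500000, "inode_max": 4000000}})
inodes, cap, ratio = cache_pressure(sample)
print(f"{inodes} inodes cached vs limit {cap} ({ratio:.2f}x)")
```

A sustained ratio well above 1.0 means clients are pinning more inodes than the MDS is allowed to cache, which is consistent with the overshoot described above.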

Thanks.


Files

1.doc (5.89 KB), Sergey Mir, 09/18/2015 06:16 AM
0001-client-debug-ll_get_inode.patch (935 Bytes), Zheng Yan, 09/21/2015 03:05 AM
gdb (1.22 KB), Sergey Mir, 09/21/2015 01:25 PM
log (9.16 KB), Sergey Mir, 09/21/2015 01:25 PM
