Support #13055

Problem with fuse client being disconnected by MDS

Added by Sergey Mir over 8 years ago. Updated over 8 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%


Description

Hello. Nobody in the #ceph IRC channel (via HexChat) could answer this, so I'm asking for advice here.
I have ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b) (still 0.94.2 on all OSD hosts).
The Ceph nodes run Linux 3.13.0-61-generic #100-Ubuntu SMP Wed Jul 29 11:21:34 UTC 2015 x86_64 GNU/Linux.

The problem happens occasionally: one of the clients mounted with ceph-fuse gets disconnected. On one occasion I also saw one of the MDS daemons change status and then reconnect, going back to the active state.
The LAN is fine with 0 dropped packets the whole time, and ports 6800-7300 are open in iptables.

Part of the logs from around that time:
2015-09-11 12:26:29.496135 mds.1 (mon01/mds01_ip):6800/3114 154 : cluster [WRN] slow request 34.898807 seconds old, received at 2015-09-11 12:25:54.597245: client_request(client.448075:201 getattr pAsLsXsFs #20000042822 2015-09-11 12:25:54.596793) currently failed to rdlock, waiting
2015-09-11 12:26:33.495364 mds.0 (mon00/mds00_ip):6800/31743 126 : cluster [WRN] slow request 34.053063 seconds old, received at 2015-09-11 12:25:59.442271: client_request(client.448075:414 create #1000067c994/139ec6b20b48c01.jpg 2015-09-11 12:25:59.442081) currently failed to wrlock, waiting
...
2015-09-11 12:25:48.296273 7f0ee7c8b700 0 -- (mon01/mds01_ip):6800/3114 >> (client_ip):0/24507 pipe(0x2fb13a000 sd=32 :6800 s=2 pgs=2 cs=1 l=0 c=0x3042407e0).fault, server, going to standby
2015-09-11 12:26:29.496120 7f0eecaaf700 0 log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 34.898807 secs
2015-09-11 12:26:29.496131 7f0eecaaf700 0 log_channel(cluster) log [WRN] : slow request 34.898807 seconds old, received at 2015-09-11 12:25:54.597245: client_request(client.448075:201 getattr pAsLsXsFs #20000042822 2015-09-11 12:25:54.596793) currently failed to rdlock, waiting
...
2015-09-11 12:25:48.296009 7fb118c05700 0 -- (mon/mds0_ip):6800/31743 >> (client_ip):0/24507 pipe(0x17ce10000 sd=30 :6800 s=2 pgs=3 cs=1 l=0 c=0xc55162c0).fault with nothing to send, going to standby
...
2015-09-11 12:26:33.495358 7fb11d423700 0 log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 34.053063 secs
2015-09-11 12:26:33.495362 7fb11d423700 0 log_channel(cluster) log [WRN] : slow request 34.053063 seconds old, received at 2015-09-11 12:25:59.442271: client_request(client.448075:414 create #1000067c994/139ec6b20b48c01.jpg 2015-09-11 12:25:59.442081) currently failed to wrlock, waiting
...
2015-09-11 12:26:38.517722 7fb11d423700 0 log_channel(cluster) log [WRN] : slow request 32.712400 seconds old, received at 2015-09-11 12:26:05.805263: client_request(client.272050:5206204 create #10000a7cd09/4cd4ac4f2eab213.png 2015-09-11 12:26:05.813635) currently failed to wrlock, waiting
2015-09-11 12:26:43.691435 7fb11d423700 0 log_channel(cluster) log [INF] : closing stale session client.450156 (client_ip):0/24507 after 63.252966

ceph -s:
monmap e1: 3 mons at {mon00=(mon00_ip):6789/0,mon01=(mon01_ip):6789/0,mon02=(mon02_ip):6789/0}
election epoch 84, quorum 0,1,2 mon00,mon01,mon02
mdsmap e792: 2/2/2 up {0=mon00=up:active,1=mon01=up:active}, 1 up:standby
osdmap e1109: 12 osds: 12 up, 12 in
pgmap v1135961: 2048 pgs, 2 pools, 1197 GB data, 21693 kobjects
4126 GB used, 5776 GB / 10432 GB avail
2048 active+clean

Here is my ceph config:
[global]
fsid = 47d3c0df-4ba9-4c53-9e58-7f6a59d170d6
mon_initial_members = mon00, mon01, mon02
mon_host = *******************
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd_pool_default_size = 3 # Write an object 3 times.
osd_pool_min_size = 2 # Allow writing two copies in a degraded state.
osd_pool_default_pg_num = 800
osd_pool_default_pgp_num = 800
log_file = /var/log/ceph/$cluster-$name.log
osd_mkfs_type = ext4
osd_mount_options_ext4 = user_xattr,rw,noatime,nodiratime
max_mds = 2

[mon]
mon_clock_drift_allowed = 0.3
mon_osd_full_ratio = .95
mon_osd_nearfull_ratio = .80
mon_pg_warn_max_per_osd = 0

[osd]
osd_journal_size = 10240
public_network = ***/26
cluster_network = 10.10.7.0/24
osd_op_threads = 4
osd_disk_threads = 2
osd_recovery_max_active = 3

[osd.0]
host = osd00
public_addr = ***
cluster_addr = 10.10.7.10
devs = /dev/sda3

[osd.1]
host = osd00
public_addr = ***
cluster_addr = 10.10.7.10
devs = /dev/sdb3

[osd.2]
host = osd01
public_addr = ***
cluster_addr = 10.10.7.11
devs = /dev/sda3

[osd.3]
host = osd01
public_addr = ***
cluster_addr = 10.10.7.11
devs = /dev/sdb3

[osd.4]
host = osd02
public_addr = ***
cluster_addr = 10.10.7.12
devs = /dev/sda3

[osd.5]
host = osd02
public_addr = ***
cluster_addr = 10.10.7.12
devs = /dev/sdb3

[osd.6]
host = osd03
public_addr = ***
cluster_addr = 10.10.7.13
devs = /dev/sda3

[osd.7]
host = osd03
public_addr = ***
cluster_addr = 10.10.7.13
devs = /dev/sdb3

[osd.8]
host = osd04
public_addr = ***
cluster_addr = 10.10.7.14
devs = /dev/sda3

[osd.9]
host = osd04
public_addr = ***
cluster_addr = 10.10.7.14
devs = /dev/sdb3

[osd.10]
host = osd05
public_addr = ***
cluster_addr = 10.10.7.15
devs = /dev/sda3

[osd.11]
host = osd05
public_addr = ***
cluster_addr = 10.10.7.15
devs = /dev/sdb3

[mds]
mds_cache_size = 4000000
mds_session_autoclose = 60
mds_bal_frag = true
mds_bal_mode = 1

[mds.a]
host = mon00

[mds.b]
host = mon01

[mds.c]
host = mon02

[client]
keyring = /etc/ceph/ceph.client.admin.keyring
log_file = /var/log/ceph/$name.$pid.log

P.S. Another question, about running several MDS daemons at the same time: should I use that in production? With only one active MDS it always holds 1.5x-2x more inodes than the maximum set in the config, and the number of expired entries is much higher, so the configuration with 2 of 3 MDS daemons active is workable for now, but I don't know how to make it work and load-balance better...

Thx.


Files

1.doc (5.89 KB), Sergey Mir, 09/18/2015 06:16 AM
0001-client-debug-ll_get_inode.patch (935 Bytes), Zheng Yan, 09/21/2015 03:05 AM
gdb (1.22 KB), Sergey Mir, 09/21/2015 01:25 PM
log (9.16 KB), Sergey Mir, 09/21/2015 01:25 PM
Actions #1

Updated by Greg Farnum over 8 years ago

The log snippets you've pasted don't make much sense — you have some entries going back in time, and it looks like they're from several different files mixed together? All I can tell from them is that
1) there is a stuck request because it can't get locks on the inode in question
2) the TCP connection between the MDS and client went away. That can happen if it's idle, or if something in the network goes wrong. It's not clear if this connection is associated with the stuck requests or not, but given the stale client it looks like they're separate.

Do you have any actual issues you're seeing, or did the log output just look scary?

Actions #2

Updated by Sergey Mir over 8 years ago

ceph-fuse loses its connection a few times a day, with log output like I pasted in my previous message (taken from a few different log files).
I can't find anything else strange in the logs for that time...
I did find a few interesting lines in the logs from a previous disconnect yesterday:
2015-09-10 13:11:16.753957 mds.0 [WRN] client.343916 isn't responding to mclientcaps(revoke), ino 10001833980 pending pAsLsXsFc issued pAsLsXsFscr, sent 60.182784 seconds ago
2015-09-10 13:11:17.068839 mds.0 [INF] closing stale session client.343916 (client_ip):0/8623 after 64.626196

About 1), a stuck request? I'm using SSDs; how can that happen with a perfectly working LAN?

I don't see any actual info about the problem in the logs, so I don't know what it could be...

Actions #3

Updated by Sergey Mir over 8 years ago

I need some solution to this problem. It dropped again...

2015-09-11 15:42:57.450895 7fb118d06700 0 -- (mds00/mon00_ip):6800/31743 >> (client_ip):0/18991 pipe(0x325cea000 sd=29 :6800 s=2 pgs=3 cs=1 l=0 c=0xc5519860).fault, server, going to standby
--(some other irrelevant output)--
2015-09-11 15:43:42.247256 7fb11d423700 0 log_channel(cluster) log [WRN] : 2 slow requests, 2 included below; oldest blocked for > 33.250719 secs
2015-09-11 15:43:42.247260 7fb11d423700 0 log_channel(cluster) log [WRN] : slow request 33.250719 seconds old, received at 2015-09-11 15:43:08.996500: client_request(client.272050:5721632 create #1000024e29b/08087b200911bff.jpg 2015-09-11 15:43:09.009381) currently failed to wrlock, waiting
2015-09-11 15:43:42.247264 7fb11d423700 0 log_channel(cluster) log [WRN] : slow request 32.796975 seconds old, received at 2015-09-11 15:43:09.450245: client_request(client.459540:496 getattr pAsLsXsFs #1000024e29b 2015-09-11 15:43:09.450498) currently failed to rdlock, waiting

Actions #4

Updated by John Spray over 8 years ago

We need to know how the system is misbehaving -- not just to see log snippets.

What is the symptom of the problem from the point of view of an application using a client mount?

Actions #5

Updated by Sergey Mir over 8 years ago

ceph-fuse just disconnects from Ceph and says "Transport endpoint is not connected", and that's all; there are no logs or any other info...
So overall:
1. I see errors in the Ceph logs about slow requests and the client being disconnected.
2. I mount with fuse again.
3. I wait for the next disconnect with tail -f ceph-mds.mon00.log | grep pipe.

I want to repeat: there are no dropped packets at all. Any thoughts?
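
Roughly, each recovery cycle looks like this (a minimal sketch; the mount point /mnt/cephfs and the monitor address are placeholders, not my real paths):

# the application reports "Transport endpoint is not connected"
fusermount -u /mnt/cephfs          # or umount -l /mnt/cephfs if the unmount hangs
ceph-fuse -m mon00_ip:6789 /mnt/cephfs
# then watch the MDS log for the next drop
tail -f /var/log/ceph/ceph-mds.mon00.log | grep pipe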

Actions #6

Updated by Zheng Yan over 8 years ago

2015-09-10 13:11:17.068839 mds.0 [INF] closing stale session client.343916 (client_ip):0/8623 after 64.626196

The MDS is supposed to close the session after 300 seconds, but in your case it's just 60 seconds. Did you change the default values of mds_session_autoclose/mds_session_timeout/mds_reconnect_timeout?
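
If you are not sure what the running daemon actually uses, you can check the values at runtime through the MDS admin socket, for example (a sketch; run it on the MDS host, and the daemon name mds.mon00 is just an example):

ceph daemon mds.mon00 config show | grep -E 'mds_session_autoclose|mds_session_timeout|mds_reconnect_timeout'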

Actions #7

Updated by Zheng Yan over 8 years ago

Sergey Mir wrote:

ceph-fuse just disconnects from Ceph and says "Transport endpoint is not connected", and that's all; there are no logs or any other info...

"transported point is not connected" should be ceph-fuse crash. (If just session get closed, operation should hang instead of returning this error) Can you find any clue in ceph-fuse.log?

Actions #8

Updated by Sergey Mir over 8 years ago

Zheng Yan,
yes, as I pasted earlier in my config, I changed it to mds_session_autoclose = 60.

Actions #9

Updated by Greg Farnum over 8 years ago

"Transport endpoint is not connected" almost always means that the ceph-fuse client process has gone away. Check dmesg and see if it's getting killed by OOM-killer. Check the logs and see if it's got a backtrace in the file.

Actions #10

Updated by Sergey Mir over 8 years ago

dmesg and the other client logs are empty...
I have 5 clients with Ceph mounted.
3 of them are used more heavily (the main load, with high IOPS), and those connections often go down, as I said earlier.
The other 2 are used more rarely and fuse doesn't disconnect on them...

Actions #11

Updated by Sergey Mir over 8 years ago

Zheng Yan wrote:
Can you find any clue in ceph-fuse.log?

There is no ceph-fuse.log on the clients...
fuse has the same version as the current ceph, 0.94.3.

Actions #12

Updated by Sergey Mir over 8 years ago

And about high load... I've checked the MDS nodes for memory overload or any other physical overload and didn't find anything exceeding the normal state.

Actions #13

Updated by Zheng Yan over 8 years ago

when "transported point is not connected" happen, could you check if the ceph-fuse process still exists. If not, run 'ulimit -c unlimited' before mounting cephfs. It will kernel generate coredump when ceph-fuse crashes

Actions #14

Updated by Sergey Mir over 8 years ago

The ceph-fuse process is still running after the "disconnect", so I have to umount and then mount again to get it back to work...

Actions #15

Updated by Sergey Mir over 8 years ago

Hello, is there any way to fix the problem?

Actions #16

Updated by Zheng Yan over 8 years ago

Sergey Mir wrote:

Hello, is there any way to fix the problem?

try "ceph daemon client.xxx kick_stale_sessions"

Actions #17

Updated by Zheng Yan over 8 years ago

Sergey Mir wrote:

The ceph-fuse process is still running after the "disconnect", so I have to umount and then mount again to get it back to work...

Do you mean the ceph-fuse process is still running after "Transport endpoint is not connected"?

Actions #18

Updated by Zheng Yan over 8 years ago

Sergey Mir wrote:

Zheng Yan wrote:
Can you find any clue in ceph-fuse.log?

There is no ceph-fuse.log on the clients...
fuse has the same version as the current ceph, 0.94.3.

The log should be located in /var/log/ceph/ and named like client-xxx.xxx.log. Could you double check?

Actions #19

Updated by Sergey Mir over 8 years ago

There are no such log files on the client side where ceph-fuse is used...
As I understand it, logging is turned off, so how can I turn it on? I can't find any such info on the web...

Actions #20

Updated by Zheng Yan over 8 years ago

Make sure the /var/log/ceph directory exists and has the proper permissions. Add "debug client = 20" to the [client] section of ceph.conf. You can also pass the "--log_file=xxx --debug_client=20" options to ceph-fuse.
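
For example, extending the [client] section already shown in the ceph.conf above (debug level 20 is very verbose, so only keep it on while reproducing the problem):

[client]
keyring = /etc/ceph/ceph.client.admin.keyring
log_file = /var/log/ceph/$name.$pid.log
debug client = 20

or, as a one-off when mounting (the mount point and monitor address are placeholders):

ceph-fuse -m mon00_ip:6789 --log_file=/var/log/ceph/ceph-fuse.log --debug_client=20 /mnt/cephfs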

Actions #21

Updated by Sergey Mir over 8 years ago

Thx, for now I need to wait for the next fuse disconnect to look at the log messages...
Besides, I have changed filestore queue max ops from 50 to 25000 (I am using SSDs), and there have been no disconnects since... Could that setting have been the reason for my disconnects?

Actions #22

Updated by Zheng Yan over 8 years ago

Sergey Mir wrote:

Thx, for now I need to wait for the next fuse disconnect to look at the log messages...
Besides, I have changed filestore queue max ops from 50 to 25000 (I am using SSDs), and there have been no disconnects since... Could that setting have been the reason for my disconnects?

no, I don't think so

Actions #23

Updated by Sergey Mir over 8 years ago

I have found the SAME problem in the Ceph tracker: http://tracker.ceph.com/issues/12297
The logs are the same as in 12297.
Have you fixed it already?

Actions #24

Updated by Zheng Yan over 8 years ago

  • Status changed from New to 4

Yes, we have fixed it, but the fix has not been included in any release (we fixed it after releasing v0.94.3). You need to clone the ceph git repository, check out the hammer branch, and compile the source code.
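
A rough sketch of that build on Ubuntu (the hammer branch still uses autotools; the install-deps.sh step and the -j value are assumptions, adjust to your machine):

git clone https://github.com/ceph/ceph.git
cd ceph
git checkout hammer
git submodule update --init --recursive
./install-deps.sh          # if present on your checkout; otherwise install the build deps manually
./autogen.sh
./configure
make -j4
# the rebuilt client binary ends up in src/ceph-fuse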

Actions #25

Updated by Sergey Mir over 8 years ago

Thx. The topic can be closed.

Actions #26

Updated by Zheng Yan over 8 years ago

  • Status changed from 4 to Closed
Actions #27

Updated by Sergey Mir over 8 years ago

Greetings.
We have the same problem after compiling and merging from git; ceph-fuse still disconnects. After some time I will attach a core dump and some logs.

Actions #28

Updated by Sergey Mir over 8 years ago

Actions #29

Updated by Zheng Yan over 8 years ago

The coredump file is useless without the executable. Could you use gdb to open the coredump and send us the backtrace ("gdb ceph-fuse corefile", then type "bt")?
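
That is, something like this (the paths are examples; the ceph-fuse binary must be the exact one that produced the core):

gdb /usr/bin/ceph-fuse /path/to/corefile
(gdb) bt
# or non-interactively:
gdb -batch -ex bt /usr/bin/ceph-fuse /path/to/corefile > backtrace.txt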

Actions #30

Updated by Sergey Mir over 8 years ago

Here it is:

(gdb) bt
#0  0x00007f9afcad779b in raise (sig=11) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:37
#1  0x0000000000624e0d in reraise_fatal (signum=11) at global/signal_handler.cc:59
#2  handle_fatal_signal (signum=11) at global/signal_handler.cc:109
#3  <signal handler called>
#4  Client::_get_vino (this=this@entry=0x343d800, in=in@entry=0x0) at client/Client.cc:4403
#5  0x00000000005904de in Client::ll_getattr (this=0x343d800, in=in@entry=0x0, attr=attr@entry=0x7f9ac13f5c10, uid=33, gid=33) at client/Client.cc:8366
#6  0x00000000005489bd in fuse_ll_getattr (req=0x7f99c41c3f90, ino=<optimized out>, fi=<optimized out>) at client/fuse_ll.cc:137
#7  0x00007f9afcf0fb45 in ?? () from /lib/x86_64-linux-gnu/libfuse.so.2
#8  0x00007f9afcf1022b in ?? () from /lib/x86_64-linux-gnu/libfuse.so.2
#9  0x00007f9afcf0ce49 in ?? () from /lib/x86_64-linux-gnu/libfuse.so.2
#10 0x00007f9afcad00a4 in start_thread (arg=0x7f9ac13f6700) at pthread_create.c:309
#11 0x00007f9afb460c2d in setns () at ../sysdeps/unix/syscall-template.S:81
#12 0x0000000000000000 in ?? ()

Actions #31

Updated by Zheng Yan over 8 years ago

We have never seen this backtrace before. Which kernel version are you using? Please apply the attached debug patch to the source code and re-compile ceph. When ceph-fuse crashes again, please send us the full log.

Actions #32

Updated by Sergey Mir over 8 years ago

We use clients with kernel version 3.13.0 (the client host that got this error has 3.13.0-58-generic, others have 3.13.0-61, etc.)

(gdb) bt
#0  0x00007fed666ff20b in raise (sig=11) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:37
#1  0x0000000000624e0d in reraise_fatal (signum=11) at global/signal_handler.cc:59
#2  handle_fatal_signal (signum=11) at global/signal_handler.cc:109
#3  <signal handler called>
#4  Inode::get (this=this@entry=0x7fec41d602b0) at ./log/SubsystemMap.h:62
#5  0x000000000054f879 in Client::_ll_get (this=this@entry=0x317ad10, in=0x7fec41d602b0) at client/Client.cc:8272
#6  0x000000000058df68 in Client::ll_lookup (this=0x317ad10, parent=parent@entry=0x7fec53f65340, name=name@entry=0x7fec6039abe8 "57db77a3fa10db0.jpg", attr=attr@entry=0x7fed4150dbf0,
    out=out@entry=0x7fed4150dbd8, uid=<optimized out>, gid=33) at client/Client.cc:8232
#7  0x0000000000548ab0 in fuse_ll_lookup (req=0x7fec841ff5a0, parent=<optimized out>, name=0x7fec6039abe8 "57db77a3fa10db0.jpg") at client/fuse_ll.cc:105
#8  0x00007fed66b3822b in ?? () from /lib/x86_64-linux-gnu/libfuse.so.2
#9  0x00007fed66b34e49 in ?? () from /lib/x86_64-linux-gnu/libfuse.so.2
#10 0x00007fed666f7182 in start_thread (arg=0x7fed4150e700) at pthread_create.c:312
#11 0x00007fed6507c47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

-50> 2015-09-21 15:51:49.841804 7fed5b7fe700 10 client.478534 add_update_cap issued pAsLsXsFs -> pAsLsXsFs from mds.0 on 1000167aa3a.head(ref=3 ll_ref=29 cap_refs={} open={} mode=42777 size=0/0 mtime=2015-09-01 17:18:36.671240 caps=pAsLsXsFs(0=pAsLsXsFs) parents=0x7fec53a18030 0x7fec53f65340)
-49> 2015-09-21 15:51:49.841825 7fed5b7fe700 20 client.478534 got dirfrag map for 1000167aa3a frag * to mds -1
-48> 2015-09-21 15:51:49.841831 7fed5b7fe700 12 client.478534 insert_dentry_inode '57db77a3fa10db0.jpg' vino 100016e3c65.head in dir 1000167aa3a.head dn 0
-47> 2015-09-21 15:51:49.841836 7fed5b7fe700 15 client.478534 link dir 0x7fec53f65340 '57db77a3fa10db0.jpg' to inode 0x7fec41d602b0 dn 0x7fec41afa200 (new dn)
-46> 2015-09-21 15:51:49.841839 7fed5b7fe700 20 client.478534 link inode 0x7fec41d602b0 parents now 0x7fec41afa200
-45> 2015-09-21 15:51:49.841842 7fed5b7fe700 10 client.478534 put_inode on 100016e3c65.head(ref=2 ll_ref=0 cap_refs={} open={} mode=100777 size=23529/0 mtime=2015-09-01 17:17:03.680308 caps=pAsLsXsFscr(0=pAsLsXsFscr) objectset[100016e3c65 ts 0/0 objects 0 dirty_or_tx 0] parents=0x7fec41afa200 0x7fec41d602b0)
-44> 2015-09-21 15:51:49.841858 7fed5b7fe700 20 client.478534 handle_client_reply signalling caller 0x7fed4150d670
-43> 2015-09-21 15:51:49.841867 7fed5b7fe700 20 client.478534 handle_client_reply awaiting kickback on tid 424256 0x7fed5b7fd770
-42> 2015-09-21 15:51:49.841880 7fed0cff2700 20 client.478534 _ll_get 0x7fec725c36f0 10000295f60 -> 41
-41> 2015-09-21 15:51:49.841893 7fed0cff2700  3 client.478534 ll_open 10000295f60.head 32768
-40> 2015-09-21 15:51:49.841899 7fed0cff2700 10 client.478534 check_caps on 10000295f60.head(ref=2 ll_ref=41 cap_refs={1024=0,2048=0} open={1=1} mode=100644 size=17139/0 mtime=2015-02-09 14:19:01.455722 caps=pAsLsXsFscr(0=pAsLsXsFscr) objectset[10000295f60 ts 0/0 objects 0 dirty_or_tx 0] parents=0x7fec41ca2670 0x7fec725c36f0) wanted pFscr used - is_delayed=1
-39> 2015-09-21 15:51:49.841929 7fed0cff2700 10 client.478534 cap mds.0 issued pAsLsXsFscr implemented pAsLsXsFscr revoking -
-38> 2015-09-21 15:51:49.841942 7fed0cff2700 10 client.478534 _create_fh 10000295f60 mode 1
-37> 2015-09-21 15:51:49.841947 7fed0cff2700 20 client.478534 trim_cache size 16397 max 16384
-36> 2015-09-21 15:51:49.841950 7fed0cff2700 15 client.478534 trim_dentry unlinking dn e39d0c68b624830.jpg in dir 10000634b8a
-35> 2015-09-21 15:51:49.841953 7fed0cff2700 15 client.478534 unlink dir 0x7fed4ca10570 'e39d0c68b624830.jpg' dn 0x7fec41cb9cb0 inode 0x7fec41d5cd60
-34> 2015-09-21 15:51:49.841957 7fed0cff2700 20 client.478534 unlink inode 0x7fec41d5cd60 parents now
-33> 2015-09-21 15:51:49.841959 7fed0cff2700 10 client.478534 put_inode on 10000641c70.head(ref=2 ll_ref=2 cap_refs={} open={} mode=100777 size=4750/0 mtime=2013-03-26 08:42:14.000000 caps=pAsLsXsFscr(0=pAsLsXsFscr) objectset[10000641c70 ts 0/0 objects 0 dirty_or_tx 0] 0x7fec41d5cd60)
-32> 2015-09-21 15:51:49.842033 7fed0cff2700 15 client.478534 unlink removing 'e39d0c68b624830.jpg' dn 0x7fec41cb9cb0
-31> 2015-09-21 15:51:49.842038 7fed0cff2700 15 client.478534 close_dir dir 0x7fec41b734b0 on 0x7fed4ca10570
-30> 2015-09-21 15:51:49.842041 7fed0cff2700 10 client.478534 put_inode on 10000634b8a.head(ref=3 ll_ref=1103 cap_refs={} open={} mode=42777 size=0/0 mtime=2015-09-21 13:42:47.503566 caps=pAsLsXsFs(0=pAsLsXsFs) parents=0x7fed4c9fba10 0x7fed4ca10570)
-29> 2015-09-21 15:51:49.842057 7fed0cff2700 15 client.478534 trim_dentry unlinking dn watermark.png in dir 100002941e7
-28> 2015-09-21 15:51:49.842060 7fed0cff2700 15 client.478534 unlink dir 0x7fed4e3866b0 'watermark.png' dn 0x7fec41ca2670 inode 0x7fec725c36f0
-27> 2015-09-21 15:51:49.842063 7fed0cff2700 20 client.478534 unlink inode 0x7fec725c36f0 parents now
-26> 2015-09-21 15:51:49.842065 7fed0cff2700 10 client.478534 put_inode on 10000295f60.head(ref=3 ll_ref=41 cap_refs={1024=0,2048=0} open={1=1} mode=100644 size=17139/0 mtime=2015-02-09 14:19:01.455722 caps=pAsLsXsFscr(0=pAsLsXsFscr) objectset[10000295f60 ts 0/0 objects 0 dirty_or_tx 0] 0x7fec725c36f0)
-25> 2015-09-21 15:51:49.842080 7fed0cff2700 15 client.478534 unlink removing 'watermark.png' dn 0x7fec41ca2670
-24> 2015-09-21 15:51:49.842084 7fed0cff2700 15 client.478534 close_dir dir 0x7fec41ce9470 on 0x7fed4e3866b0
-23> 2015-09-21 15:51:49.842086 7fed0cff2700 10 client.478534 put_inode on 100002941e7.head(ref=3 ll_ref=218 cap_refs={} open={} mode=42777 size=0/0 mtime=2015-09-20 22:26:21.235466 caps=pAsLsXsFs(0=pAsLsXsFs) parents=0x7fed4cd24930 0x7fed4e3866b0)
-22> 2015-09-21 15:51:49.842113 7fed0cff2700 15 client.478534 trim_dentry unlinking dn 57db77a3fa10db0.jpg in dir 1000167aa3a
-21> 2015-09-21 15:51:49.842116 7fed0cff2700 15 client.478534 unlink dir 0x7fec53f65340 '57db77a3fa10db0.jpg' dn 0x7fec41afa200 inode 0x7fec41d602b0
-20> 2015-09-21 15:51:49.842119 7fed0cff2700 20 client.478534 unlink inode 0x7fec41d602b0 parents now
-19> 2015-09-21 15:51:49.842120 7fed0cff2700 10 client.478534 put_inode on 100016e3c65.head(ref=1 ll_ref=0 cap_refs={} open={} mode=100777 size=23529/0 mtime=2015-09-01 17:17:03.680308 caps=pAsLsXsFscr(0=pAsLsXsFscr) objectset[100016e3c65 ts 0/0 objects 0 dirty_or_tx 0] 0x7fec41d602b0)
-18> 2015-09-21 15:51:49.842134 7fed0cff2700 10 client.478534 remove_cap mds.0 on 100016e3c65.head(ref=0 ll_ref=0 cap_refs={} open={} mode=100777 size=23529/0 mtime=2015-09-01 17:17:03.680308 caps=pAsLsXsFscr(0=pAsLsXsFscr) objectset[100016e3c65 ts 0/0 objects 0 dirty_or_tx 0] 0x7fec41d602b0)
-17> 2015-09-21 15:51:49.842149 7fed0cff2700 15 client.478534 remove_cap last one, closing snaprealm 0x7fed4c011ed0
-16> 2015-09-21 15:51:49.842151 7fed0cff2700 20 client.478534 put_snap_realm 1 0x7fed4c011ed0 250462 -> 250461
-15> 2015-09-21 15:51:49.842154 7fed0cff2700 10 client.478534 put_inode deleting 100016e3c65.head(ref=0 ll_ref=0 cap_refs={} open={} mode=100777 size=23529/0 mtime=2015-09-01 17:17:03.680308 caps= objectset[100016e3c65 ts 0/0 objects 0 dirty_or_tx 0] 0x7fec41d602b0)
-14> 2015-09-21 15:51:49.842164 7fed0cff2700 15 client.478534 unlink removing '57db77a3fa10db0.jpg' dn 0x7fec41afa200
-13> 2015-09-21 15:51:49.842167 7fed0cff2700 15 client.478534 close_dir dir 0x7fec41cd8e90 on 0x7fec53f65340
-12> 2015-09-21 15:51:49.842170 7fed0cff2700 10 client.478534 put_inode on 1000167aa3a.head(ref=4 ll_ref=29 cap_refs={} open={} mode=42777 size=0/0 mtime=2015-09-01 17:18:36.671240 caps=pAsLsXsFs(0=pAsLsXsFs) parents=0x7fec53a18030 0x7fec53f65340)
-11> 2015-09-21 15:51:49.842183 7fed0cff2700  3 client.478534 ll_open 10000295f60.head 32768 = 0 (0x7fec6009f7c0)
-10> 2015-09-21 15:51:49.842202 7fed0cff2700  3 client.478534 ll_forget 10000295f60 1
-9> 2015-09-21 15:51:49.842206 7fed0cff2700 20 client.478534 _ll_put 0x7fec725c36f0 10000295f60 1 -> 40
-8> 2015-09-21 15:51:49.842220 7fed4150e700 20 client.478534 sendrecv kickback on tid 424256 0x7fed5b7fd770
-7> 2015-09-21 15:51:49.842226 7fed4150e700 20 client.478534 make_request target is 7fec41afa1f0(ref=0 ll_ref=0 cap_refs={} open={} mode=100777 size=23529/0 mtime=2015-09-01 17:17:03.680308 caps= objectset[100016e3c65 ts 0/0 objects 0 dirty_or_tx 0] 0x7fec41d602b0)
-6> 2015-09-21 15:51:49.842240 7fed4150e700 20 client.478534 lat 0.020914
-5> 2015-09-21 15:51:49.842243 7fed4150e700 10 client.478534 _do_lookup res is 0
-4> 2015-09-21 15:51:49.842246 7fed4150e700 10 client.478534 _lookup 1000167aa3a.head(ref=3 ll_ref=29 cap_refs={} open={} mode=42777 size=0/0 mtime=2015-09-01 17:18:36.671240 caps=pAsLsXsFs(0=pAsLsXsFs) parents=0x7fec53a18030 0x7fec53f65340) 57db77a3fa10db0.jpg = 7fec41afa1f0(ref=0 ll_ref=0 cap_refs={} open={} mode=100777 size=23529/0 mtime=2015-09-01 17:17:03.680308 caps= objectset[100016e3c65 ts 0/0 objects 0 dirty_or_tx 0] 0x7fec41d602b0)
-3> 2015-09-21 15:51:49.842264 7fed4150e700 10 client.478534 fill_stat on 7fec41afa1f0 snap/dev0 mode 0100777 mtime 2015-09-01 17:17:03.680308 ctime 0.933478
-2> 2015-09-21 15:51:49.842768 7fed584f5700  2 -- (client_ip):0/8205 >> (osd_0_ip):6800/5202 pipe(0x7fed20013960 sd=2 :57249 s=2 pgs=1636 cs=1 l=1 c=0x7fed20017c00).reader couldn't read tag, (11) Resource temporarily unavailable
-1> 2015-09-21 15:51:49.842810 7fed584f5700  2 -- (client_ip):0/8205 >> (osd_0_ip):6800/5202 pipe(0x7fed20013960 sd=2 :57249 s=2 pgs=1636 cs=1 l=1 c=0x7fed20017c00).fault (11) Resource temporarily unavailable
0> 2015-09-21 15:51:49.843684 7fed4150e700 -1 *** Caught signal (Segmentation fault) **
in thread 7fed4150e700

ceph version 0.94.3-230-g13eab61 (13eab6117657723e9b15c5a58748e922cb5131e4)
1: ceph-fuse() [0x624d4a]
2: (()+0x10340) [0x7fed666ff340]
3: (Inode::get()+0x22) [0x5b1332]
4: (Client::_ll_get(Inode*)+0x29) [0x54f879]
5: (Client::ll_lookup(Inode*, char const*, stat*, Inode**, int, int)+0xd8) [0x58df68]
6: ceph-fuse() [0x548ab0]
7: (()+0x1522b) [0x7fed66b3822b]
8: (()+0x11e49) [0x7fed66b34e49]
9: (()+0x8182) [0x7fed666f7182]
10: (clone()+0x6d) [0x7fed6507c47d]

P.S. Please re-open the topic.

Actions #33

Updated by Zheng Yan over 8 years ago

I can't re-open the issue either. But this seems to be a duplicate of #12674.

Actions #34

Updated by Sergey Mir over 8 years ago

I attached the latest logs.
The last error message is different from the previous one, so maybe I should attach others as they happen?

Actions #35

Updated by Zheng Yan over 8 years ago

It has likely been fixed by pull request https://github.com/ceph/ceph/pull/4753. Could you try compiling ceph-fuse from the newest code (https://github.com/ceph/ceph.git, master branch)?
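
Building from master is the same sequence as the hammer build sketch earlier in the thread, just with a different branch (a rough sketch):

cd ceph
git checkout master
git pull
./autogen.sh && ./configure && make -j4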

Actions #36

Updated by Sergey Mir over 8 years ago

Thx, it works.
