Bug #4489
ceph fs hangs on file stat
Description
Hi. I have CephFS (kernel client) mounted on two hosts at /var/www.
I'm trying to run
stat /var/www/configuration/etc/php5/apache2/php.ini
or
ls -la /var/www/configuration/etc/php5/apache2/php.ini
and it hangs.
The MDS log shows nothing except
2013-03-19 00:00:25.721238 7fa79ddfe700 10 mds.-1.0 handle_mds_beacon up:standby seq 69 rtt 0.002541
2013-03-19 00:00:28.713766 7fa79bcf9700 20 mds.-1.bal get_load no root, no load
2013-03-19 00:00:28.713831 7fa79bcf9700 15 mds.-1.bal get_load mdsload<[0,0 0]/[0,0 0], req 0, hr 0, qlen 0, cpu 0.12>
2013-03-19 00:00:29.718796 7fa79bcf9700 10 mds.-1.0 beacon_send up:standby seq 70 (currently up:standby)
2013-03-19 00:00:29.718830 7fa79bcf9700 1 -- 10.252.0.4:6802/4678 --> 10.252.0.3:6789/0 -- mdsbeacon(15037/1 up:standby seq 70 v463) v2 -- ?+0 0x1e4fdc0 con 0x1de9580
2013-03-19 00:00:29.720840 7fa79ddfe700 1 -- 10.252.0.4:6802/4678 <== mon.1 10.252.0.3:6789/0 86 ==== mdsbeacon(15037/1 up:standby seq 70 v463) v2 ==== 103+0+0 (3353078024 0 0) 0x1e49840 con 0x1de9580
2013-03-19 00:00:29.720863 7fa79ddfe700 10 mds.-1.0 handle_mds_beacon up:standby seq 70 rtt 0.002053
2013-03-19 00:00:33.713839 7fa79bcf9700 20 mds.-1.bal get_load no root, no load
2013-03-19 00:00:33.713903 7fa79bcf9700 15 mds.-1.bal get_load mdsload<[0,0 0]/[0,0 0], req 0, hr 0, qlen 0, cpu 0.19>
2013-03-19 00:00:33.718915 7fa79bcf9700 10 mds.-1.0 beacon_send up:standby seq 71 (currently up:standby)
2013-03-19 00:00:33.718944 7fa79bcf9700 1 -- 10.252.0.4:6802/4678 --> 10.252.0.3:6789/0 -- mdsbeacon(15037/1 up:standby seq 71 v463) v2 -- ?+0 0x1e4fb00 con 0x1de9580
2013-03-19 00:00:33.720976 7fa79ddfe700 1 -- 10.252.0.4:6802/4678 <== mon.1 10.252.0.3:6789/0 87 ==== mdsbeacon(15037/1 up:standby seq 71 v463) v2 ==== 103+0+0 (3203568751 0 0) 0x1e49b00 con 0x1de9580
2013-03-19 00:00:33.721003 7fa79ddfe700 10 mds.-1.0 handle_mds_beacon up:standby seq 71 rtt 0.002077
2013-03-19 00:00:37.719037 7fa79bcf9700 10 mds.-1.0 beacon_send up:standby seq 72 (currently up:standby)
2013-03-19 00:00:37.719077 7fa79bcf9700 1 -- 10.252.0.4:6802/4678 --> 10.252.0.3:6789/0 -- mdsbeacon(15037/1 up:standby seq 72 v463) v2 -- ?+0 0x1e4f840 con 0x1de9580
2013-03-19 00:00:37.721179 7fa79ddfe700 1 -- 10.252.0.4:6802/4678 <== mon.1 10.252.0.3:6789/0 88 ==== mdsbeacon(15037/1 up:standby seq 72 v463) v2 ==== 103+0+0 (331341952 0 0) 0x1e49dc0 con 0x1de9580
with
debug ms = 1
debug mds = 20
It stalls on the file
php.ini, which is an ordinary regular file.
It hangs on both clients.
Kernel version is 3.7.2.
root@hosting-cloud1-s1:/var/www/configuration/etc/php5/apache2# stat php.ini
  File: `php.ini'
  Size: 67654 Blocks: 133 IO Block: 4194304 regular file
Device: 0h/0d Inode: 1099514486171 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2013-03-13 16:39:01.000000000 +0700
Modify: 2013-01-18 16:12:01.000000000 +0700
Change: 2013-03-14 13:51:06.942637000 +0700
After the second client was rebooted, the hangs stopped.
Mounted as
10.252.0.3:6789,10.252.0.2:6789,10.252.0.4:6789:/hosting/cloud-1/ on /var/www type ceph (snapdirname=.cc633faa563cbe671221758ad9c01de3,dirstat,norbytes,nocrc,name=admin,readdir_max_entries=8192,readdir_max_bytes=4194304,key=client.admin)
History
#1 Updated by Ivan Kudryavtsev about 11 years ago
I can provide shell access to one of the servers, but I don't know whether it can be reproduced easily.
#2 Updated by Ivan Kudryavtsev about 11 years ago
The code that caused the hang, running on both hosts:
root@hosting-cloud1-s2:/usr/lib/php5# cat maxlifetime
#!/bin/sh -e

max=1440

if which php5 >/dev/null 2>&1 && [ -e /etc/php5/apache2/php.ini ]; then
    cur=$(php5 -c /etc/php5/apache2/php.ini -d "error_reporting='E_ALL & ~E_DEPRECATED'" -r 'print ini_get("session.gc_maxlifetime");')
    [ -z "$cur" ] && cur=0
    [ "$cur" -gt "$max" ] && max=$cur
else
    for ini in /etc/php5/*/php.ini; do
        cur=$(sed -n -e 's/^[[:space:]]*session.gc_maxlifetime[[:space:]]*=[[:space:]]*\([0-9]\+\).*$/\1/p' $ini 2>/dev/null || true);
        [ -z "$cur" ] && cur=0
        [ "$cur" -gt "$max" ] && max=$cur
    done
fi

echo $(($max/60))

exit 0
#3 Updated by Ivan Kudryavtsev about 11 years ago
The cron.d entry that wraps it:
root [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -delete
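To make the interaction between the two pieces concrete: maxlifetime prints the session lifetime in whole minutes, and the cron entry passes that number to find as the -cmin threshold. The sketch below only reproduces that arithmetic with the script's default of 1440 seconds; the helper name lifetime_minutes is made up for illustration.

```shell
#!/bin/sh
# Convert a session lifetime in seconds to whole minutes,
# the same integer division maxlifetime does with $(($max/60)).
lifetime_minutes() {
    echo $(($1 / 60))
}

# With the default max=1440 the cron entry expands to the command printed here.
# The command is only printed, not executed.
echo "find /var/lib/php5/ -type f -cmin +$(lifetime_minutes 1440) -delete"
```

Both halves touch file metadata: the `[ -e /etc/php5/apache2/php.ini ]` test in maxlifetime is a stat of php.ini, and find has to stat every file under /var/lib/php5/ to compare its change time.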
#4 Updated by Ivan Kudryavtsev about 11 years ago
I think it could be connected with #4486, because I found about 150 running cron tasks, and the task is launched twice per hour.
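One way to count piled-up tasks like these is to filter ps output by process state. This is a rough sketch that assumes the hung tasks sit in uninterruptible sleep (state D), which is common for blocked filesystem calls but is not confirmed anywhere in this report; the helper name blocked_tasks and the two sample lines are invented for the demonstration.

```shell
#!/bin/sh
# Read "PID STAT COMMAND..." lines on stdin and print those whose
# state field starts with D (uninterruptible sleep).
# Assumption: the hung stat/find calls show up in that state.
blocked_tasks() {
    awk '$2 ~ /^D/'
}

# Demonstration on two made-up ps lines; only the first one is printed.
printf '%s\n' \
    '1201 D find /var/lib/php5/ -type f' \
    '1305 Ss /usr/sbin/cron' | blocked_tasks
```

On a live Linux host the input would come from something like `ps -e -o pid= -o stat= -o args= | blocked_tasks`, and piping the result to `wc -l` gives the count.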
#5 Updated by Ivan Kudryavtsev about 11 years ago
No, it started earlier than #4486, so the cause is something else.
#6 Updated by Ivan Kudryavtsev about 11 years ago
And no other notable events (such as scrubbing or OSD/MDS/MON failures) were happening at that moment.
#7 Updated by Ian Colle about 11 years ago
- Subject changed from ceph fs hands on file stat to ceph fs hangs on file stat
- Assignee set to Greg Farnum
#8 Updated by Greg Farnum about 11 years ago
- Status changed from New to Need More Info
That log is from a standby MDS. You'll need to provide the log of the active MDS for us to do anything with it. :)
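A quick way to check which state a given MDS log comes from is to pull the state out of its beacon_send lines. This is a minimal sketch that assumes the log format quoted in the description; the function name mds_beacon_states is made up for illustration.

```shell
#!/bin/sh
# Print each distinct MDS state reported in "beacon_send" lines read from stdin.
# Assumes lines shaped like: "... beacon_send up:standby seq 70 (currently up:standby)".
mds_beacon_states() {
    sed -n 's/.* beacon_send \(up:[a-z_-]*\) seq .*/\1/p' | sort -u
}

# Demonstration on one line copied from the log in the description.
printf '%s\n' \
    '2013-03-19 00:00:29.718796 7fa79bcf9700 10 mds.-1.0 beacon_send up:standby seq 70 (currently up:standby)' \
    | mds_beacon_states
```

On a real daemon the input would be the log file itself, for example `mds_beacon_states < /path/to/mds.log` (the path is hypothetical). If the only state printed is up:standby, the log comes from a standby daemon, as it does here.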
#9 Updated by Ivan Kudryavtsev about 11 years ago
Oh, sorry about that. It seems I picked the wrong log. I will attach the correct log the next time the problem occurs. But the problem exists anyway: the process blocks on the file stat.
#10 Updated by Ivan Kudryavtsev about 11 years ago
And reloading the MDS didn't fix the problem; it only went away once I rebooted one of the FS clients.
#11 Updated by Greg Farnum almost 11 years ago
Why are you specifying the snapdirname to that weird value when mounting this?
#12 Updated by Ivan Kudryavtsev almost 11 years ago
Hm, the snapdirname is just something obfuscated (it serves no real purpose, actually).
I've hit the same error one more time, so I believe you can also reproduce it if you put PHP sessions and configs on CephFS and run the session cleanup several times.
As for me, unfortunately, I can't experiment with CephFS any further since it's too buggy. So I've removed the FS from my cluster and moved to shared RBD/GFS2. I hope this will work well.
#13 Updated by Greg Farnum almost 11 years ago
- Status changed from Need More Info to Duplicate
All right; that should be more stable for you. :)
Thanks for the steps to reproduce. I'm going to tentatively mark this as a duplicate as it sounds like it may be the same as #3637.
#14 Updated by Greg Farnum almost 11 years ago
- Status changed from Duplicate to New
Never mind, forgot the other one involved max size changes.
#15 Updated by Greg Farnum almost 11 years ago
- Assignee deleted (Greg Farnum)
#16 Updated by Sage Weil about 10 years ago
- Status changed from New to Can't reproduce
- Source changed from Development to Community (user)