Bug #4489

ceph fs hangs on file stat

Added by Ivan Kudryavtsev about 11 years ago. Updated about 10 years ago.

Status: Can't reproduce
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi. I have CephFS (kernel client) mounted on two hosts at /var/www.
When I run

stat /var/www/configuration/etc/php5/apache2/php.ini

or

ls -la /var/www/configuration/etc/php5/apache2/php.ini

and it hangs.

The MDS log doesn't show anything except

2013-03-19 00:00:25.721238 7fa79ddfe700 10 mds.-1.0 handle_mds_beacon up:standby seq 69 rtt 0.002541
2013-03-19 00:00:28.713766 7fa79bcf9700 20 mds.-1.bal get_load no root, no load
2013-03-19 00:00:28.713831 7fa79bcf9700 15 mds.-1.bal get_load mdsload<[0,0 0]/[0,0 0], req 0, hr 0, qlen 0, cpu 0.12>
2013-03-19 00:00:29.718796 7fa79bcf9700 10 mds.-1.0 beacon_send up:standby seq 70 (currently up:standby)
2013-03-19 00:00:29.718830 7fa79bcf9700  1 -- 10.252.0.4:6802/4678 --> 10.252.0.3:6789/0 -- mdsbeacon(15037/1 up:standby seq 70 v463) v2 -- ?+0 0x1e4fdc0 con 0x1de9580
2013-03-19 00:00:29.720840 7fa79ddfe700  1 -- 10.252.0.4:6802/4678 <== mon.1 10.252.0.3:6789/0 86 ==== mdsbeacon(15037/1 up:standby seq 70 v463) v2 ==== 103+0+0 (3353078024 0 0) 0x1e49840 con 0x1de9580
2013-03-19 00:00:29.720863 7fa79ddfe700 10 mds.-1.0 handle_mds_beacon up:standby seq 70 rtt 0.002053
2013-03-19 00:00:33.713839 7fa79bcf9700 20 mds.-1.bal get_load no root, no load
2013-03-19 00:00:33.713903 7fa79bcf9700 15 mds.-1.bal get_load mdsload<[0,0 0]/[0,0 0], req 0, hr 0, qlen 0, cpu 0.19>
2013-03-19 00:00:33.718915 7fa79bcf9700 10 mds.-1.0 beacon_send up:standby seq 71 (currently up:standby)
2013-03-19 00:00:33.718944 7fa79bcf9700  1 -- 10.252.0.4:6802/4678 --> 10.252.0.3:6789/0 -- mdsbeacon(15037/1 up:standby seq 71 v463) v2 -- ?+0 0x1e4fb00 con 0x1de9580
2013-03-19 00:00:33.720976 7fa79ddfe700  1 -- 10.252.0.4:6802/4678 <== mon.1 10.252.0.3:6789/0 87 ==== mdsbeacon(15037/1 up:standby seq 71 v463) v2 ==== 103+0+0 (3203568751 0 0) 0x1e49b00 con 0x1de9580
2013-03-19 00:00:33.721003 7fa79ddfe700 10 mds.-1.0 handle_mds_beacon up:standby seq 71 rtt 0.002077
2013-03-19 00:00:37.719037 7fa79bcf9700 10 mds.-1.0 beacon_send up:standby seq 72 (currently up:standby)
2013-03-19 00:00:37.719077 7fa79bcf9700  1 -- 10.252.0.4:6802/4678 --> 10.252.0.3:6789/0 -- mdsbeacon(15037/1 up:standby seq 72 v463) v2 -- ?+0 0x1e4f840 con 0x1de9580
2013-03-19 00:00:37.721179 7fa79ddfe700  1 -- 10.252.0.4:6802/4678 <== mon.1 10.252.0.3:6789/0 88 ==== mdsbeacon(15037/1 up:standby seq 72 v463) v2 ==== 103+0+0 (331341952 0 0) 0x1e49dc0 con 0x1de9580

with the following debug settings:

debug ms = 1
debug mds = 20

It stalls on the file php.ini, which is an ordinary regular file.

Hangs on both clients.

Kernel version is 3.7.2.

root@hosting-cloud1-s1:/var/www/configuration/etc/php5/apache2# stat php.ini
  File: `php.ini'
  Size: 67654           Blocks: 133        IO Block: 4194304 regular file
Device: 0h/0d   Inode: 1099514486171  Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-03-13 16:39:01.000000000 +0700
Modify: 2013-01-18 16:12:01.000000000 +0700
Change: 2013-03-14 13:51:06.942637000 +0700

After the second client was rebooted, the hangs stopped.

Mounted as

10.252.0.3:6789,10.252.0.2:6789,10.252.0.4:6789:/hosting/cloud-1/ on /var/www type ceph (snapdirname=.cc633faa563cbe671221758ad9c01de3,dirstat,norbytes,nocrc,name=admin,readdir_max_entries=8192,readdir_max_bytes=4194304,key=client.admin)

History

#1 Updated by Ivan Kudryavtsev about 11 years ago

I can provide shell access to one of the servers, but I don't know whether the problem can be reproduced easily.

#2 Updated by Ivan Kudryavtsev about 11 years ago

The code that caused the hang, running on two hosts:

root@hosting-cloud1-s2:/usr/lib/php5# cat maxlifetime
#!/bin/sh -e

max=1440

if which php5 >/dev/null 2>&1 && [ -e /etc/php5/apache2/php.ini ]; then
  cur=$(php5 -c /etc/php5/apache2/php.ini -d "error_reporting='E_ALL & ~E_DEPRECATED'" -r 'print ini_get("session.gc_maxlifetime");')
  [ -z "$cur" ] && cur=0
  [ "$cur" -gt "$max" ] && max=$cur
else
        for ini in /etc/php5/*/php.ini; do
          cur=$(sed -n -e 's/^[[:space:]]*session.gc_maxlifetime[[:space:]]*=[[:space:]]*\([0-9]\+\).*$/\1/p' $ini 2>/dev/null || true);
          [ -z "$cur" ] && cur=0
          [ "$cur" -gt "$max" ] && max=$cur
        done
fi

echo $(($max/60))

exit 0

#3 Updated by Ivan Kudryavtsev about 11 years ago

The cron.d entry that wraps it:

root   [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -delete

#4 Updated by Ivan Kudryavtsev about 11 years ago

I think this could be connected with #4486, because I found about 150 cron tasks running, and each task is launched twice per hour.

#5 Updated by Ivan Kudryavtsev about 11 years ago

No, it started earlier than #4486, so the cause is something else.

#6 Updated by Ivan Kudryavtsev about 11 years ago

Also, no other notable events occurred at that moment (such as scrubbing or OSD/MDS/mon failures).

#7 Updated by Ian Colle about 11 years ago

  • Subject changed from ceph fs hands on file stat to ceph fs hangs on file stat
  • Assignee set to Greg Farnum

#8 Updated by Greg Farnum about 11 years ago

  • Status changed from New to Need More Info

That log is from a standby MDS. You'll need to provide the log of the active MDS for us to do anything with it. :)
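
A quick way to check which kind of daemon a log came from is to look at the state in its beacon lines. The helper below is a minimal sketch: `mds_log_state` is a hypothetical name, not a Ceph tool, and it assumes the log contains `beacon_send` lines in the same format as the excerpt in the description.

```shell
#!/bin/sh
# Print the most recent MDS state found in the beacon_send lines of a log file.
# Assumption: lines look like
#   "... mds.-1.0 beacon_send up:standby seq 70 (currently up:standby)"
mds_log_state() {
  # $1: path to an MDS log file
  sed -n 's/.* beacon_send \(up:[a-z_-]*\) seq .*/\1/p' "$1" | tail -n 1
}
```

If this prints up:standby, the log belongs to a standby daemon, as with the excerpt above. Running `ceph mds stat` on the cluster should show which daemon is currently active.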

#9 Updated by Ivan Kudryavtsev about 11 years ago

Oh, sorry about that. It seems I grabbed the wrong log. I will attach the correct log the next time the problem occurs. The problem does exist, though: the process blocks on the file stat.

#10 Updated by Ivan Kudryavtsev about 11 years ago

Also, reloading the MDS didn't fix the problem; it went away only after I rebooted one of the FS clients.

#11 Updated by Greg Farnum almost 11 years ago

Why are you specifying the snapdirname to that weird value when mounting this?

#12 Updated by Ivan Kudryavtsev almost 11 years ago

Hm, the snapdirname is just an obfuscated value (it serves no real purpose).
I've hit the same error one more time, so I believe you can reproduce it too if you deploy PHP sessions and configs on CephFS and run the session cleanup several times.

As for me, unfortunately, I can't keep experimenting with Ceph FS since it's too buggy. So I removed the FS from my cluster and moved to shared RBD/GFS2. I hope this will work well.
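
The reproduction steps described in this comment could be approximated with a loop like the one below, run at the same time on both clients. This is only a sketch under stated assumptions: `repro_loop` and its arguments are hypothetical, the value 24 is what the maxlifetime script prints with its default max=1440, and the scan lists files instead of deleting them. Whether it actually triggers the hang is untested.

```shell
#!/bin/sh
# Hypothetical reproduction loop: repeat the session-cleanup scan and the
# stat call that was reported to hang.
repro_loop() {
  conf=$1      # file to stat, e.g. the php.ini on the CephFS mount
  sessions=$2  # directory holding PHP session files on the same mount
  rounds=$3    # number of iterations
  i=0
  while [ "$i" -lt "$rounds" ]; do
    # Same scan the cron job performs, but listing instead of deleting.
    find "$sessions" -type f -cmin +24 -print >/dev/null
    # The call reported to hang.
    stat "$conf" >/dev/null || return 1
    i=$((i + 1))
  done
  echo "completed $rounds rounds"
}
```

For example, `repro_loop /var/www/configuration/etc/php5/apache2/php.ini /var/lib/php5 1000` uses the paths mentioned in this report; adjust them to wherever the sessions and configs live on the CephFS mount.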

#13 Updated by Greg Farnum almost 11 years ago

  • Status changed from Need More Info to Duplicate

All right; that should be more stable for you. :)

Thanks for the steps to reproduce. I'm going to tentatively mark this as a duplicate as it sounds like it may be the same as #3637.

#14 Updated by Greg Farnum almost 11 years ago

  • Status changed from Duplicate to New

Never mind, forgot the other one involved max size changes.

#15 Updated by Greg Farnum almost 11 years ago

  • Assignee deleted (Greg Farnum)

#16 Updated by Sage Weil about 10 years ago

  • Status changed from New to Can't reproduce
  • Source changed from Development to Community (user)
