Bug #704 (closed)

it hangs both in client and osd

Added by longguang yue over 13 years ago. Updated about 13 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%


Description

1. In the client: run "sync"; it hangs and never returns.
2. On the osd0 node: run "ls /data/osd0"; it hangs there and never returns.
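A minimal sketch of the reproduction sequence described above (the mount point /mnt/ceph is taken from later comments; which host is the client is an assumption):

  # on the client, with the ceph filesystem mounted at /mnt/ceph
  sync                # reported to hang and never return

  # on the osd0 node
  ls /data/osd0       # reported to hang and never return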

Files

dmesg.yue (81.1 KB) dmesg.yue longguang yue, 01/13/2011 04:55 PM
dmesg.ceph (248 KB) dmesg.ceph longguang yue, 01/18/2011 12:43 AM
Actions #1

Updated by longguang yue over 13 years ago

#ls /data/osd0
cat /proc/31361/stack
__mutex_fastpath_lock_retval+0x18/0x1a
vfs_readdir+0x59/0xb2
sys_getdents+0x81/0xd1
system_call_fastpath+0x16/0x1b
0xfffffffffffff
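For reference, a hedged sketch of how a kernel stack like the one above can be captured for a hung ls; PID 31361 comes from this report, and the pgrep-based lookup is an assumption:

  pid=$(pgrep -x ls)              # or use the known PID, e.g. 31361
  cat /proc/$pid/stack            # kernel stack of the hung process
  grep State /proc/$pid/status    # typically shows D (uninterruptible sleep) when blocked in the kernel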

Actions #2

Updated by longguang yue over 13 years ago

At mds0:
# cat /proc/469/stack
futex_wait_queue_me+0xc5/0xe4
futex_wait+0x143/0x2f9
do_futex+0x9c/0x852
sys_futex+0x134/0x144
system_call_fastpath+0x16/0x1b
0xfffffffffffffff
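A single futex_wait stack from the main cmds thread does not say much on its own, since idle daemon threads normally sleep in futex_wait. A sketch for dumping every thread's kernel stack (PID 469 comes from this report) to see whether any thread is stuck somewhere more interesting:

  for t in /proc/469/task/*/stack; do
      echo "== $t =="
      cat "$t"
  done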

Actions #3

Updated by longguang yue over 13 years ago

Actions #4

Updated by Sage Weil over 13 years ago

can you include 'ceph -s' output?

Actions #5

Updated by longguang yue over 13 years ago

The first time, I ran mkcephfs -a --mkbtrfs and the client could mount.ceph, but if you ls /data/osd0 on the osd0 node, it hangs.
mkcephfs makes it work the first time; if it errors, you have to run mkcephfs again.

-----------------------------------------
2011-01-17 13:13:42.103691 7ffe9ccc6740 thread 140731529058048 start
2011-01-17 13:13:42.158882 7ffe9ccc6740 thread 140731520665344 start
2011-01-17 13:13:42.158936 7ffe9ccc6740 thread 140731512272640 start
2011-01-17 13:13:42.159025 7ffe9ccc6740 thread 140731503879936 start
2011-01-17 13:13:42.160186 7ffe9ccc5700 send_observe_requests
2011-01-17 13:13:42.160220 7ffe9ccc5700 mon <- observe pgmap
2011-01-17 13:13:42.160232 7ffe9ccc5700 mon <- observe mdsmap
2011-01-17 13:13:42.160239 7ffe9ccc5700 mon <- observe osdmap
2011-01-17 13:13:42.160245 7ffe9ccc5700 mon <- observe logm
2011-01-17 13:13:42.160253 7ffe9ccc5700 mon <- observe class
2011-01-17 13:13:42.160260 7ffe9ccc5700 mon <- observe monmap
2011-01-17 13:13:42.160266 7ffe9ccc5700 mon <- observe auth
2011-01-17 13:13:42.160273 7ffe9ccc5700 refresh after 150 with same mon
2011-01-17 13:13:42.160334 7ffe9b4c2700 thread 140731502827264 start
2011-01-17 13:13:42.160976 7ffe9b3c1700 reader got ack seq 1
2011-01-17 13:13:42.163976 7ffe9b3c1700 reader got ack seq 9
2011-01-17 13:13:42.164084 7ffe9ccc6740 send_observe_requests
2011-01-17 13:13:42.164098 7ffe9ccc6740 mon <- observe pgmap
2011-01-17 13:13:42.164165 7ffe9ccc6740 mon <- observe mdsmap
2011-01-17 13:13:42.164179 7ffe9ccc6740 mon <- observe osdmap
2011-01-17 13:13:42.164187 7ffe9ccc6740 mon <- observe logm
2011-01-17 13:13:42.164194 7ffe9ccc6740 mon <- observe class
2011-01-17 13:13:42.164202 7ffe9ccc6740 mon <- observe monmap
2011-01-17 13:13:42.164209 7ffe9ccc6740 mon <- observe auth
2011-01-17 13:13:42.164216 7ffe9ccc6740 refresh after 150 with same mon
2011-01-17 13:13:42.167500 7ffe9ccc5700 mon0 -> pgmap v100 (latest)
2011-01-17 13:13:42.167497 pg v100: 528 pgs: 528 active+clean; 28707 KB data, 154 MB used, 40803 MB / 40958 MB avail
2011-01-17 13:13:42.168589 7ffe9b3c1700 reader got ack seq 16
2011-01-17 13:13:42.168628 7ffe9ccc5700 mon0 -> mdsmap v6 (latest)
2011-01-17 13:13:42.168628 mds e6: 1/1/1 up {0=up:replay}
2011-01-17 13:13:42.168700 7ffe9ccc5700 mon0 -> osdmap v7 (latest)
2011-01-17 13:13:42.168700 osd e7: 2 osds: 2 up, 2 in
2011-01-17 13:13:42.168798 7ffe9ccc5700 mon0 -> logm v181 (latest)
2011-01-17 13:13:42.168798 log 2011-01-17 13:11:34.262474 mon0 10.28.5.10:6789/0 2 : [INF] mds? 10.28.5.13:6800/26828 up:boot
2011-01-17 13:13:42.168913 7ffe9ccc5700 mon0 -> class v1 (latest)
2011-01-17 13:13:42.168966 7ffe9ccc5700 mon0 -> monmap v1 (latest)
2011-01-17 13:13:42.168965 mon e1: 1 mons at {0=10.28.5.10:6789/0}
2011-01-17 13:13:42.169021 7ffe9ccc5700 mon0 -> auth v5 (latest)
2011-01-17 13:13:42.169107 7ffe9ccc6740 thread 140731503879936 stop
2011-01-17 13:13:42.169165 7ffe9ccc6740 thread 140731502827264 stop
2011-01-17 13:13:42.169208 7ffe9ccc6740 thread 140731529058048 stop
2011-01-17 13:13:42.169245 7ffe9ccc6740 thread 140731512272640 stop
2011-01-17 13:13:42.169269 7ffe9ccc6740 thread 140731520665344 stop
----------------------------------
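For context, the rebuild-from-scratch sequence being described is roughly the following sketch; the conf path and init script are typical defaults of that era and are assumptions, while the monitor address comes from the log above:

  mkcephfs -a -c /etc/ceph/ceph.conf --mkbtrfs   # recreate the cluster, reformatting the osd btrfs volumes
  /etc/init.d/ceph -a start                      # start mon/mds/osd daemons on all hosts
  mount -t ceph 10.28.5.10:6789:/ /mnt/ceph      # mount from the client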

Actions #6

Updated by longguang yue over 13 years ago

This time I ran mkcephfs again.
At first mount.ceph is OK; I cp a file to /mnt/ceph and that is OK, but sync hangs. You can still ls at that point, while ceph -w gives little output.
I go to osd1: ls /data/osd1 hangs, but ls on osd0 is fine.
[In the client] ls /mnt/ceph is OK, but after cp'ing a new file to /mnt/ceph, ls /mnt/ceph hangs!

I am trying to find the point of failure but have no idea. The log is at the highest level, but I cannot find any usable info.

The attached file contains ls's stack, ceph -s output, and dmesg from the client.
--ceph -w ----------------

mon0 -> pgmap registered

mon0 -> mdsmap registered

mon0 -> osdmap registered

logm,class,auth registered...........

reader got ack seq 346....
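The failing sequence in this comment, as a sketch (the file names are placeholders; the mount point and monitor address follow earlier comments):

  # client
  mount -t ceph 10.28.5.10:6789:/ /mnt/ceph   # mounts fine
  cp somefile /mnt/ceph/                      # copy succeeds
  sync                                        # hangs; ls /mnt/ceph still works at this point
  cp anotherfile /mnt/ceph/ && ls /mnt/ceph   # after another copy, ls hangs too

  # osd1 node
  ls /data/osd1                               # hangs, while ls /data/osd0 is fine this time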

Actions #7

Updated by longguang yue over 13 years ago

It is strange: previously osd0 hung, this time osd1 hangs...

Actions #8

Updated by Sage Weil about 13 years ago

  • Status changed from New to Closed

The ceph -s output shows the MDS in state 'up:replay'; that's why the client couldn't mount.
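A hedged sketch of how the MDS state can be checked to confirm this; the mdsmap line format matches the log in #5, and ceph mds stat is assumed to be available in this version:

  ceph -s            # look for a line like: mds e6: 1/1/1 up {0=up:replay}
  ceph mds stat      # shorter view of just the mdsmap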
