Bug #15379


ceph mds continuously crashes and goes into laggy state (stray purging problems)

Added by Kenneth Waegeman about 8 years ago. Updated almost 8 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
Performance/Resource Usage
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We are removing a very large directory from CephFS on infernalis 9.2.1, and after a while an MDS was in the laggy state without failing over to another MDS. I checked, and all MDSs had crashed, each leaving a stack trace:

   -26> 2016-04-04 23:46:35.743970 7faa127fe700  5 mds.0.log _submit_thread 5134992710184~531 : EUpdate purge_stray [metablob 600, 1 dirs]
   -25> 2016-04-04 23:46:35.743978 7faa143fe700  1 -- 10.141.16.3:6806/2790032 --> 10.141.16.16:6824/1773611 -- osd_op(mds.0.332:140386 100015cef56.00000000 [delete] 4.116a78fc snapc 1=[] ondisk+write+known_if_redirected+full_force e12853) v6 -- ?+0 0x7faa3157d700 con 0x7faa29d9c840
   -24> 2016-04-04 23:46:35.744014 7faa143fe700  1 -- 10.141.16.3:6806/2790032 --> 10.141.16.25:6801/2962763 -- osd_op(mds.0.332:140387 100015cef57.00000000 [delete] 4.aed913ee snapc 1=[] ondisk+write+known_if_redirected+full_force e12853) v6 -- ?+0 0x7fb273ab99c0 con 0x7faa2994dde0
   -23> 2016-04-04 23:46:35.744019 7faa127fe700  5 mds.0.log _submit_thread 5134992710735~531 : EUpdate purge_stray [metablob 600, 1 dirs]
   -22> 2016-04-04 23:46:35.744044 7faa127fe700  5 mds.0.log _submit_thread 5134992711286~531 : EUpdate purge_stray [metablob 600, 1 dirs]
   -21> 2016-04-04 23:46:35.744046 7faa143fe700  1 -- 10.141.16.3:6806/2790032 --> 10.141.16.25:6801/2962763 -- osd_op(mds.0.332:140388 100015cef58.00000000 [delete] 4.7a4ded53 snapc 1=[] ondisk+write+known_if_redirected+full_force e12853) v6 -- ?+0 0x7fb2718f6b00 con 0x7faa2994dde0
   -20> 2016-04-04 23:46:35.744183 7faa19cb3700  1 -- 10.141.16.3:6806/2790032 <== client.834666 10.141.16.11:0/2497359121 2362568 ==== client_request(client.834666:74676965 lookup #10002349485/constructramseygraphs_lower-3_9_32_e108-3n8-mod_500-haunter-pbs 2016-04-04 23:46:35.730516) v2 ==== 185+0+0 (3464717445 0 0) 0x7fb76e9c2ec0 con 0x7faa2420dc80
   -19> 2016-04-04 23:46:35.744198 7faa19cb3700  4 mds.0.server handle_client_request client_request(client.834666:74676965 lookup #10002349485/constructramseygraphs_lower-3_9_32_e108-3n8-mod_500-haunter-pbs 2016-04-04 23:46:35.730516) v2
   -18> 2016-04-04 23:46:35.744260 7faa19cb3700  5 -- op tracker -- seq: 1571327, time: 2016-04-04 23:46:35.744158, event: throttled, op: client_request(client.834666:74676965 lookup #10002349485/constructramseygraphs_lower-3_9_32_e108-3n8-mod_500-haunter-pbs 2016-04-04 23:46:35.730516)
   -17> 2016-04-04 23:46:35.744264 7faa19cb3700  5 -- op tracker -- seq: 1571327, time: 2016-04-04 23:46:35.744172, event: all_read, op: client_request(client.834666:74676965 lookup #10002349485/constructramseygraphs_lower-3_9_32_e108-3n8-mod_500-haunter-pbs 2016-04-04 23:46:35.730516)
   -16> 2016-04-04 23:46:35.744268 7faa19cb3700  5 -- op tracker -- seq: 1571327, time: 2016-04-04 23:46:35.744195, event: dispatched, op: client_request(client.834666:74676965 lookup #10002349485/constructramseygraphs_lower-3_9_32_e108-3n8-mod_500-haunter-pbs 2016-04-04 23:46:35.730516)
   -15> 2016-04-04 23:46:35.744289 7faa19cb3700  5 -- op tracker -- seq: 1571327, time: 2016-04-04 23:46:35.744289, event: acquired locks, op: client_request(client.834666:74676965 lookup #10002349485/constructramseygraphs_lower-3_9_32_e108-3n8-mod_500-haunter-pbs 2016-04-04 23:46:35.730516)
   -14> 2016-04-04 23:46:35.744297 7faa19cb3700  5 -- op tracker -- seq: 1571327, time: 2016-04-04 23:46:35.744297, event: replying, op: client_request(client.834666:74676965 lookup #10002349485/constructramseygraphs_lower-3_9_32_e108-3n8-mod_500-haunter-pbs 2016-04-04 23:46:35.730516)
   -13> 2016-04-04 23:46:35.744307 7faa19cb3700  1 -- 10.141.16.3:6806/2790032 --> 10.141.16.11:0/2497359121 -- client_reply(???:74676965 = 0 (0) Success) v1 -- ?+0 0x7faa37b68b00 con 0x7faa2420dc80
   -12> 2016-04-04 23:46:35.744317 7faa19cb3700  5 -- op tracker -- seq: 1571327, time: 2016-04-04 23:46:35.744317, event: finishing request, op: client_request(client.834666:74676965 lookup #10002349485/constructramseygraphs_lower-3_9_32_e108-3n8-mod_500-haunter-pbs 2016-04-04 23:46:35.730516)
   -11> 2016-04-04 23:46:35.744331 7faa19cb3700  5 -- op tracker -- seq: 1571327, time: 2016-04-04 23:46:35.744331, event: cleaned up request, op: client_request(client.834666:74676965 lookup #10002349485/constructramseygraphs_lower-3_9_32_e108-3n8-mod_500-haunter-pbs 2016-04-04 23:46:35.730516)
   -10> 2016-04-04 23:46:35.744336 7faa19cb3700  5 -- op tracker -- seq: 1571327, time: 2016-04-04 23:46:35.744336, event: done, op: client_request(client.834666:74676965 lookup #10002349485/constructramseygraphs_lower-3_9_32_e108-3n8-mod_500-haunter-pbs 2016-04-04 23:46:35.730516)
    -9> 2016-04-04 23:46:35.744454 7faa19cb3700  1 -- 10.141.16.3:6806/2790032 <== client.834666 10.141.16.11:0/2497359121 2362569 ==== client_request(client.834666:74676966 readdir #1000266eb9f 2016-04-04 23:46:35.730516) v2 ==== 122+0+0 (3349840858 0 0) 0x7faa37b68b00 con 0x7faa2420dc80
    -8> 2016-04-04 23:46:35.744465 7faa19cb3700  4 mds.0.server handle_client_request client_request(client.834666:74676966 readdir #1000266eb9f 2016-04-04 23:46:35.730516) v2
    -7> 2016-04-04 23:46:35.744469 7faa19cb3700  5 -- op tracker -- seq: 1571328, time: 2016-04-04 23:46:35.744437, event: throttled, op: client_request(client.834666:74676966 readdir #1000266eb9f 2016-04-04 23:46:35.730516)
    -6> 2016-04-04 23:46:35.744473 7faa19cb3700  5 -- op tracker -- seq: 1571328, time: 2016-04-04 23:46:35.744448, event: all_read, op: client_request(client.834666:74676966 readdir #1000266eb9f 2016-04-04 23:46:35.730516)
    -5> 2016-04-04 23:46:35.744476 7faa19cb3700  5 -- op tracker -- seq: 1571328, time: 2016-04-04 23:46:35.744463, event: dispatched, op: client_request(client.834666:74676966 readdir #1000266eb9f 2016-04-04 23:46:35.730516)
    -4> 2016-04-04 23:46:35.744492 7faa19cb3700  5 -- op tracker -- seq: 1571328, time: 2016-04-04 23:46:35.744492, event: acquired locks, op: client_request(client.834666:74676966 readdir #1000266eb9f 2016-04-04 23:46:35.730516)
    -3> 2016-04-04 23:46:35.744517 7faa19cb3700  1 -- 10.141.16.3:6806/2790032 --> 10.141.16.2:6805/1303901 -- osd_op(mds.0.332:140389 1000266eb9f.00000000 [omap-get-header 0~0,omap-get-vals 0~16,getxattr parent] 6.acf7ddd ack+read+known_if_redirected+full_force e12853) v6 -- ?+0 0x7fb76e9c2ec0 con 0x7faa2420a420
    -2> 2016-04-04 23:46:35.757602 7faa13bfd700  1 -- 10.141.16.3:6806/2790032 <== osd.3 10.141.16.2:6805/1303901 1945 ==== osd_op_reply(140389 1000266eb9f.00000000 [omap-get-header 0~0,omap-get-vals 0~16,getxattr (432)] v0'0 uv54613 ondisk = 0) v6 ==== 271+0+597535 (3018918272 0 3680795695) 0x7fb279097c80 con 0x7faa2420a420
    -1> 2016-04-04 23:46:35.758698 7fa9febff700  1 -- 10.141.16.3:6806/2790032 <== osd.96 10.141.16.20:6802/1775683 4771 ==== osd_op_reply(140373 100015cef4c.00000000 [delete] v0'0 uv2249860 ondisk = 0) v6 ==== 187+0+0 (3654499068 0 0) 0x7faa2a9ac000 con 0x7faa29d9c160
     0> 2016-04-04 23:46:35.767677 7faa143fe700 -1 *** Caught signal (Aborted) **
 in thread 7faa143fe700

 ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd)
 1: (()+0x4b6f42) [0x7faa20b6df42]
 2: (()+0xf100) [0x7faa1f87b100]
 3: (gsignal()+0x37) [0x7faa1e69d5f7]
 4: (abort()+0x148) [0x7faa1e69ece8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7faa1efa19d5]
 6: (()+0x5e946) [0x7faa1ef9f946]
 7: (()+0x5e973) [0x7faa1ef9f973]
 8: (()+0x5eb93) [0x7faa1ef9fb93]
 9: (CDir::_load_dentry(std::string const&, std::string const&, snapid_t, ceph::buffer::list&, int, std::set<snapid_t, std::less<snapid_t>, std::allocator<snapid_t> > const*, bool*, std::list<CInode*, std::allocator<CInode*> >*)+0x2823) [0x7faa20a6f423]
 10: (CDir::_omap_fetched(ceph::buffer::list&, std::map<std::string, ceph::buffer::list, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::list> > >&, std::string const&, int)+0x9f8) [0x7faa20a701e8]
 11: (MDSIOContextBase::complete(int)+0xa4) [0x7faa20ad0a54]
 12: (Finisher::finisher_thread_entry()+0x168) [0x7faa20b90f58]
 13: (()+0x7dc5) [0x7faa1f873dc5]
 14: (clone()+0x6d) [0x7faa1e75e28d]

I tried to restart the MDS, but it crashes again every time with a similar stack trace:

   -27> 2016-04-05 09:58:23.751054 7f1e51dfa700  1 -- 10.141.16.3:6800/2852702 <== osd.5 10.141.16.3:6801/2389244 105 ==== osd_op_reply(36267 200.0012af3e [write 3618294~1635 [fadvise_dontneed]] v12857'139171 uv139171 ondisk = 0) v6 ==== 179+0+0 (2335018870 0 0) 0x7f1f875c3180 con 0x7f1e638249a0
   -26> 2016-04-05 09:58:23.751097 7f1e51dfa700  1 -- 10.141.16.3:6800/2852702 <== osd.5 10.141.16.3:6801/2389244 106 ==== osd_op_reply(36268 200.0012af3e [write 3619929~990 [fadvise_dontneed]] v12857'139172 uv139172 ondisk = 0) v6 ==== 179+0+0 (1251463916 0 0) 0x7f1f875c3180 con 0x7f1e638249a0
   -25> 2016-04-05 09:58:23.751122 7f1e51dfa700  1 -- 10.141.16.3:6800/2852702 <== osd.5 10.141.16.3:6801/2389244 107 ==== osd_op_reply(36269 200.0012af3e [write 3620919~1231 [fadvise_dontneed]] v12857'139173 uv139173 ondisk = 0) v6 ==== 179+0+0 (155261733 0 0) 0x7f1f875c3180 con 0x7f1e638249a0
   -24> 2016-04-05 09:58:23.751150 7f1e51dfa700  1 -- 10.141.16.3:6800/2852702 <== osd.5 10.141.16.3:6801/2389244 108 ==== osd_op_reply(36270 200.0012af3e [write 3622150~1234 [fadvise_dontneed]] v12857'139174 uv139174 ondisk = 0) v6 ==== 179+0+0 (1138588898 0 0) 0x7f1f875c3180 con 0x7f1e638249a0
   -23> 2016-04-05 09:58:23.751170 7f1e51dfa700  1 -- 10.141.16.3:6800/2852702 <== osd.5 10.141.16.3:6801/2389244 109 ==== osd_op_reply(36271 200.0012af3e [write 3623384~1228 [fadvise_dontneed]] v12857'139175 uv139175 ondisk = 0) v6 ==== 179+0+0 (2523301192 0 0) 0x7f1f875c3180 con 0x7f1e638249a0
   -22> 2016-04-05 09:58:23.751194 7f1e51dfa700  1 -- 10.141.16.3:6800/2852702 <== osd.5 10.141.16.3:6801/2389244 110 ==== osd_op_reply(36272 200.0012af3e [write 3624612~1230 [fadvise_dontneed]] v12857'139176 uv139176 ondisk = 0) v6 ==== 179+0+0 (1337395193 0 0) 0x7f1f875c3180 con 0x7f1e638249a0
   -21> 2016-04-05 09:58:23.751215 7f1e51dfa700  1 -- 10.141.16.3:6800/2852702 <== osd.5 10.141.16.3:6801/2389244 111 ==== osd_op_reply(36273 200.0012af3e [write 3625842~1232 [fadvise_dontneed]] v12857'139177 uv139177 ondisk = 0) v6 ==== 179+0+0 (2160167011 0 0) 0x7f1f875c3180 con 0x7f1e638249a0
   -20> 2016-04-05 09:58:23.751241 7f1e51dfa700  1 -- 10.141.16.3:6800/2852702 <== osd.5 10.141.16.3:6801/2389244 112 ==== osd_op_reply(36274 200.0012af3e [write 3627074~1232 [fadvise_dontneed]] v12857'139178 uv139178 ondisk = 0) v6 ==== 179+0+0 (1458940318 0 0) 0x7f1f875c3180 con 0x7f1e638249a0
   -19> 2016-04-05 09:58:23.751292 7f1e51dfa700  1 -- 10.141.16.3:6800/2852702 <== osd.5 10.141.16.3:6801/2389244 113 ==== osd_op_reply(36275 200.0012af3e [write 3628306~1230 [fadvise_dontneed]] v12857'139179 uv139179 ondisk = 0) v6 ==== 179+0+0 (2161543871 0 0) 0x7f1f875c3180 con 0x7f1e638249a0
   -18> 2016-04-05 09:58:23.751315 7f1e51dfa700  1 -- 10.141.16.3:6800/2852702 <== osd.5 10.141.16.3:6801/2389244 114 ==== osd_op_reply(36276 200.0012af3e [write 3629536~1230 [fadvise_dontneed]] v12857'139180 uv139180 ondisk = 0) v6 ==== 179+0+0 (408786516 0 0) 0x7f1f875c3180 con 0x7f1e638249a0
   -17> 2016-04-05 09:58:23.751387 7f1e51dfa700  1 -- 10.141.16.3:6800/2852702 <== osd.5 10.141.16.3:6801/2389244 115 ==== osd_op_reply(36277 200.0012af3e [write 3630766~1240 [fadvise_dontneed]] v12857'139181 uv139181 ondisk = 0) v6 ==== 179+0+0 (155577562 0 0) 0x7f1f875c3180 con 0x7f1e638249a0
   -16> 2016-04-05 09:58:23.751411 7f1e51dfa700  1 -- 10.141.16.3:6800/2852702 <== osd.5 10.141.16.3:6801/2389244 116 ==== osd_op_reply(36278 200.0012af3e [write 3632006~4335 [fadvise_dontneed]] v12857'139182 uv139182 ondisk = 0) v6 ==== 179+0+0 (3957996991 0 0) 0x7f1f875c3180 con 0x7f1e638249a0
   -15> 2016-04-05 09:58:23.751437 7f1e51dfa700  1 -- 10.141.16.3:6800/2852702 <== osd.5 10.141.16.3:6801/2389244 117 ==== osd_op_reply(36265 600.00000000 [stat,tmap2omap 1~0,omap-set-header 0~226,omap-set-vals 0~508380,omap-rm-keys 0~116024] v12857'145527 uv145527 ondisk = 0) v6 ==== 347+0+0 (3496540317 0 0) 0x7f1f875c3180 con 0x7f1e638249a0
   -14> 2016-04-05 09:58:23.752939 7f1e51dfa700  1 -- 10.141.16.3:6800/2852702 <== osd.5 10.141.16.3:6801/2389244 118 ==== osd_op_reply(36279 200.0012af3e [write 3636341~1752 [fadvise_dontneed]] v12857'139183 uv139183 ondisk = 0) v6 ==== 179+0+0 (4072285340 0 0) 0x7f1f875c3180 con 0x7f1e638249a0
   -13> 2016-04-05 09:58:23.753008 7f1e51dfa700  1 -- 10.141.16.3:6800/2852702 <== osd.5 10.141.16.3:6801/2389244 119 ==== osd_op_reply(36361 200.0012af3e [write 3638093~33611 [fadvise_dontneed]] v12857'139184 uv139184 ondisk = 0) v6 ==== 179+0+0 (4108456706 0 0) 0x7f1f875c3180 con 0x7f1e638249a0
   -12> 2016-04-05 09:58:24.205130 7f1e56ec3700 10 monclient: tick
   -11> 2016-04-05 09:58:24.205149 7f1e56ec3700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2016-04-05 09:57:54.205148)
   -10> 2016-04-05 09:58:24.205159 7f1e56ec3700 10 monclient: renew subs? (now: 2016-04-05 09:58:24.205159; renew after: 2016-04-05 09:59:24.204140) -- no
    -9> 2016-04-05 09:58:25.517199 7f1e54bfe700 10 monclient: _send_mon_message to mon.mds02 at 10.141.16.2:6789/0
    -8> 2016-04-05 09:58:25.517221 7f1e54bfe700  1 -- 10.141.16.3:6800/2852702 --> 10.141.16.2:6789/0 -- mdsbeacon(891178/mds03 up:active seq 64 v11535) v4 -- ?+0 0x7f1e700a6a00 con 0x7f1e638242c0
    -7> 2016-04-05 09:58:29.517263 7f1e54bfe700 10 monclient: _send_mon_message to mon.mds02 at 10.141.16.2:6789/0
    -6> 2016-04-05 09:58:29.517286 7f1e54bfe700  1 -- 10.141.16.3:6800/2852702 --> 10.141.16.2:6789/0 -- mdsbeacon(891178/mds03 up:active seq 65 v11535) v4 -- ?+0 0x7f1e700a6d00 con 0x7f1e638242c0
    -5> 2016-04-05 09:58:33.517341 7f1e54bfe700 10 monclient: _send_mon_message to mon.mds02 at 10.141.16.2:6789/0
    -4> 2016-04-05 09:58:33.517381 7f1e54bfe700  1 -- 10.141.16.3:6800/2852702 --> 10.141.16.2:6789/0 -- mdsbeacon(891178/mds03 up:active seq 66 v11535) v4 -- ?+0 0x7f1e700a7000 con 0x7f1e638242c0
    -3> 2016-04-05 09:58:34.205253 7f1e56ec3700 10 monclient: tick
    -2> 2016-04-05 09:58:34.205288 7f1e56ec3700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2016-04-05 09:58:04.205287)
    -1> 2016-04-05 09:58:34.205308 7f1e56ec3700 10 monclient: renew subs? (now: 2016-04-05 09:58:34.205308; renew after: 2016-04-05 09:59:24.204140) -- no
     0> 2016-04-05 09:58:35.480254 7f1e527fd700 -1 *** Caught signal (Aborted) **
 in thread 7f1e527fd700

 ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd)
 1: (()+0x4b6f42) [0x7f1e5f61ff42]
 2: (()+0xf100) [0x7f1e5e32d100]
 3: (gsignal()+0x37) [0x7f1e5d14f5f7]
 4: (abort()+0x148) [0x7f1e5d150ce8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f1e5da539d5]
 6: (()+0x5e946) [0x7f1e5da51946]
 7: (()+0x5e973) [0x7f1e5da51973]
 8: (()+0x5eb93) [0x7f1e5da51b93]
 9: (CDir::_load_dentry(std::string const&, std::string const&, snapid_t, ceph::buffer::list&, int, std::set<snapid_t, std::less<snapid_t>, std::allocator<snapid_t> > const*, bool*, std::list<CInode*, std::allocator<CInode*> >*)+0x2823) [0x7f1e5f521423]
 10: (CDir::_omap_fetched(ceph::buffer::list&, std::map<std::string, ceph::buffer::list, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::list> > >&, std::string const&, int)+0x9f8) [0x7f1e5f5221e8]
 11: (MDSIOContextBase::complete(int)+0xa4) [0x7f1e5f582a54]
 12: (Finisher::finisher_thread_entry()+0x168) [0x7f1e5f642f58]
 13: (()+0x7dc5) [0x7f1e5e325dc5]
 14: (clone()+0x6d) [0x7f1e5d21028d]

How can we fix this?


Files

ceph-mds.mds03.log.1.gz (188 KB) Kenneth Waegeman, 04/05/2016 11:19 AM
ceph-mds02.tail.tail.gz (219 KB) Kenneth Waegeman, 04/05/2016 12:55 PM
ceph-mds02.tail.gz (432 KB) Kenneth Waegeman, 04/05/2016 01:26 PM
Actions #1

Updated by John Spray about 8 years ago

Further up the log (before the minus-prefixed line numbers) there should be info from the actual failed assertion.

Actions #2

Updated by Kenneth Waegeman about 8 years ago

Added the whole log file of the original crash

Actions #3

Updated by John Spray about 8 years ago

Hmm, you're getting heartbeatmap warnings which indicate that something in the MDS is blocking for much longer than it should (possibly it's trying to purge things more aggressively than this cluster can handle).

Something else weird is going on, to cause the "was purgeable but no longer is!" messages.

I can't actually see what's raising the SIGABRT, which is weird; I was expecting to see a nice log message saying which assert() it hit.

Please turn "debug mds" up to 10 and capture another log. If the sizes don't get too unmanageable, also try turning it up to 20.
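
For reference, raising the MDS debug level as requested can be done with a ceph.conf fragment like the one below (a minimal sketch; the [mds] section name and the "debug mds" option are standard, but adjust the level and restart or re-read the config as appropriate for your deployment):

```ini
[mds]
    ; verbose MDS logging for capturing the crash; raise to 20 if the
    ; resulting file size stays manageable
    debug mds = 10
```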

Actions #4

Updated by Kenneth Waegeman about 8 years ago

I ran it with debug 10; the total file after the crash was 10 GB. I kept the last 1 million lines; I hope I'm able to add it.

Actions #5

Updated by Kenneth Waegeman about 8 years ago

Did not succeed. Tried with last 100000 lines

Actions #6

Updated by Kenneth Waegeman about 8 years ago

I always get "Request Entity Too Large", even for one MB. Is there another way to include this?
It works with the last 10000 lines, now included.

I can do this too for debug 20 if it helps

Actions #7

Updated by Kenneth Waegeman about 8 years ago

with debug 20

Actions #8

Updated by Greg Farnum about 8 years ago

It's probably best to include the whole file and upload with ceph-post-file. :)

Actions #9

Updated by Kenneth Waegeman about 8 years ago

Hmm, I tried this, but I can't upload it (verbose output added):

sftp> mkdir post/780bd87b-b262-46a3-90a9-98f685e8a8a7_kennethwaegeman_ugentbe_2ebf3153-628c-4c20-a5f9-55745eafe77a
Couldn't create directory: Failure

[root@mds02 ~]# ceph-post-file -d 'issue 15379' -u kennethwaegeman_ugentbe /var/lib/ceph/osd/meta/sda/ceph-mds02.gz
args: -d issue 15379 -u kennethwaegeman_ugentbe - /var/lib/ceph/osd/meta/sda/ceph-mds02.gz
args: u kennethwaegeman_ugentbe - /var/lib/ceph/osd/meta/sda/ceph-mds02.gz
args: -- /var/lib/ceph/osd/meta/sda/ceph-mds02.gz
/usr/bin/ceph-post-file: upload tag 780bd87b-b262-46a3-90a9-98f685e8a8a7
/usr/bin/ceph-post-file: user: kennethwaegeman_ugentbe
/usr/bin/ceph-post-file: description: issue 15379
/usr/bin/ceph-post-file: will upload file /var/lib/ceph/osd/meta/sda/ceph-mds02.gz
OpenSSH_6.6.1, OpenSSL 1.0.1e-fips 11 Feb 2013
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 56: Applying options for
debug2: ssh_connect: needpriv 0
debug1: Connecting to drop.ceph.com [8.43.84.130] port 22.
debug1: Connection established.
debug1: permanently_set_uid: 0/0
debug1: identity file /tmp/tmp.jAbQk7CFYq type 2
debug1: identity file /tmp/tmp.jAbQk7CFYq-cert type 1
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_6.6.1
debug1: Remote protocol version 2.0, remote software version OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3
debug1: match: OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3 pat OpenSSH_6.6.1
compat 0x04000000
debug2: fd 3 setting O_NONBLOCK
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug2: kex_parse_kexinit: ,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1
debug2: kex_parse_kexinit: ,,ssh-rsa,,,,,,,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,ssh-ed25519,ssh-dss
debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,,,,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,
debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,,,,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,
debug2: kex_parse_kexinit: ,,,,,,,,,hmac-md5,hmac-sha1,,,hmac-sha2-256,hmac-sha2-512,hmac-ripemd160,,hmac-sha1-96,hmac-md5-96
debug2: kex_parse_kexinit: ,,,,,,,,,hmac-md5,hmac-sha1,,,hmac-sha2-256,hmac-sha2-512,hmac-ripemd160,,hmac-sha1-96,hmac-md5-96
debug2: kex_parse_kexinit: ,zlib,none
debug2: kex_parse_kexinit: ,zlib,none
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit: first_kex_follows 0
debug2: kex_parse_kexinit: reserved 0
debug2: kex_parse_kexinit: ,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1
debug2: kex_parse_kexinit: ssh-rsa,ssh-dss,ecdsa-sha2-nistp256
debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,,,,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,
debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,,,,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,
debug2: kex_parse_kexinit: ,,,,,,,,,hmac-md5,hmac-sha1,,,hmac-sha2-256,hmac-sha2-512,hmac-ripemd160,,hmac-sha1-96,hmac-md5-96
debug2: kex_parse_kexinit: ,,,,,,,,,hmac-md5,hmac-sha1,,,hmac-sha2-256,hmac-sha2-512,hmac-ripemd160,,hmac-sha1-96,hmac-md5-96
debug2: kex_parse_kexinit: none,
debug2: kex_parse_kexinit: none,
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit: first_kex_follows 0
debug2: kex_parse_kexinit: reserved 0
debug2: mac_setup: setup
debug1: kex: server>client aes128-ctr
debug2: mac_setup: setup
debug1: kex: client->server aes128-ctr
debug1: kex: need=16 dh_need=16
debug1: kex: need=16 dh_need=16
debug1: sending SSH2_MSG_KEX_ECDH_INIT
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug1: Server host key: RSA a7:72:cf:97:99:4b:21:25:64:37:2f:39:3a:82:8e:f7
debug1: Host 'drop.ceph.com' is known and matches the RSA host key.
debug1: Found key in /usr/share/ceph/known_hosts_drop.ceph.com:1
debug1: ssh_rsa_verify: signature correct
debug2: kex_derive_keys
debug2: set_newkeys: mode 1
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug2: set_newkeys: mode 0
debug1: SSH2_MSG_NEWKEYS received
debug1: SSH2_MSG_SERVICE_REQUEST sent
debug2: service_accept: ssh-userauth
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug2: key: /tmp/tmp.jAbQk7CFYq (0x7fed35e62410), explicit
debug1: Authentications that can continue: publickey
debug1: Next authentication method: publickey
debug1: Offering DSA public key: /tmp/tmp.jAbQk7CFYq
debug2: we sent a publickey packet, wait for reply
debug1: Server accepts key: pkalg ssh-dss blen 433
debug2: input_userauth_pk_ok: fp 56:02:0f:49:d7:7b:46:06:59:8a:86:8e:d4:9c:56:3d
debug1: key_parse_private2: missing begin marker
debug1: read PEM private key done: type DSA
debug1: Enabling compression at level 6.
debug1: Authentication succeeded (publickey).
Authenticated to drop.ceph.com ([8.43.84.130]:22).
debug2: fd 4 setting O_NONBLOCK
debug1: channel 0: new [client-session]
debug2: channel 0: send open
debug1: Requesting
debug1: Entering interactive session.
debug2: callback start
debug2: fd 3 setting TCP_NODELAY
debug2: client_session2_setup: id 0
debug1: Sending environment.
debug1: Sending env LC_PAPER = en_GB.UTF-8
debug2: channel 0: request env confirm 0
debug1: Sending env LC_MONETARY = en_GB.UTF-8
debug2: channel 0: request env confirm 0
debug1: Sending env LC_NUMERIC = en_GB.UTF-8
debug2: channel 0: request env confirm 0
debug1: Sending env XMODIFIERS = @im=ibus
debug2: channel 0: request env confirm 0
debug1: Sending env LANG = en_US.utf8
debug2: channel 0: request env confirm 0
debug1: Sending env LC_MEASUREMENT = en_GB.UTF-8
debug2: channel 0: request env confirm 0
debug1: Sending env LC_TIME = en_GB.UTF-8
debug2: channel 0: request env confirm 0
debug1: Sending subsystem: sftp
debug2: channel 0: request subsystem confirm 1
debug2: callback done
debug2: channel 0: open confirm rwindow 0 rmax 32768
debug2: channel 0: rcvd adjust 2097152
debug2: channel_input_status_confirm: type 99 id 0
debug2: subsystem request accepted on channel 0
debug2: Remote version: 3
debug2: Server supports extension "" revision 1
debug2: Server supports extension "" revision 2
debug2: Server supports extension "" revision 2
debug2: Server supports extension "" revision 1
debug2: Server supports extension "" revision 1
sftp> mkdir post/780bd87b-b262-46a3-90a9-98f685e8a8a7_kennethwaegeman_ugentbe_2ebf3153-628c-4c20-a5f9-55745eafe77a
Couldn't create directory: Failure
debug2: channel 0: read<=0 rfd 4 len 0
debug2: channel 0: read failed
debug2: channel 0: close_read
debug2: channel 0: input open -> drain
debug2: channel 0: ibuf empty
debug2: channel 0: send eof
debug2: channel 0: input drain -> closed
debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
debug2: channel 0: rcvd eof
debug2: channel 0: output open -> drain
debug2: channel 0: obuf empty
debug2: channel 0: close_write
debug2: channel 0: output drain -> closed
debug2: channel 0: rcvd close
debug2: channel 0: almost dead
debug2: channel 0: gc: notify user
debug2: channel 0: gc: user detached
debug2: channel 0: send close
debug2: channel 0: is dead
debug2: channel 0: garbage collecting
debug1: channel 0: free: client-session, nchannels 1
debug1: fd 0 clearing O_NONBLOCK
Transferred: sent 3872, received 3124 bytes, in 0.8 seconds
Bytes per second: sent 5080.5, received 4099.1
debug1: Exit status 0
debug1: compress outgoing: raw data 586, compressed 397, factor 0.68
debug1: compress incoming: raw data 302, compressed 172, factor 0.57

Actions #10

Updated by Greg Farnum about 8 years ago

We were having some temporary infrastructure issues, can you try again? :)

Actions #11

Updated by Zheng Yan about 8 years ago

Are there coredump files? If there are, run gdb ceph-mds core.xxx; it will give us the exact location of the crash.

Actions #12

Updated by Kenneth Waegeman about 8 years ago

Thanks, I uploaded the file:
ceph-post-file: 99c6e33a-7e54-472f-847b-68400954bbe4

I don't see any coredumps.

Actions #13

Updated by Zheng Yan about 8 years ago

You can enable coredumps with the following commands:

#ulimit -c unlimited
#ceph-mds -f -i a -c /etc/ceph/ceph.conf

After the MDS crashes, there should be a core.xxx file in the current directory. Then run gdb ceph-mds core.xxx; it will give us the exact location of the crash.

If your ceph-mds binary does not include debug symbols, please compile ceph-mds from source.

#cd ceph-source
#mkdir build
#cd build
#cmake -DCMAKE_C_FLAGS="-g -fno-omit-frame-pointer" ..
#make -j8

Actions #14

Updated by Kenneth Waegeman about 8 years ago

[root@mds02 ~]# gdb ceph-mds /var/lib/ceph/osd/meta/sdb/core.ceph-mds.2922264.mds02.gigalith.os.1459955389 
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-80.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying" 
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/ceph-mds...Reading symbols from /usr/lib/debug/usr/bin/ceph-mds.debug...done.
done.
[New LWP 2922331]
[New LWP 2922337]
[New LWP 2922351]
[New LWP 2922350]
[New LWP 2922335]
[New LWP 2922355]
[New LWP 2922346]
[New LWP 2922332]
[New LWP 2922336]
[New LWP 2922333]
[New LWP 2922374]
[New LWP 2922370]
[New LWP 2922371]
[New LWP 2922349]
[New LWP 2922368]
[New LWP 2922364]
[New LWP 2922503]
[New LWP 2922352]
[New LWP 2922520]
[New LWP 2922519]
[New LWP 2922500]
[New LWP 2922358]
[New LWP 2922330]
[New LWP 2922354]
[New LWP 2922539]
[New LWP 2922378]
[New LWP 2922540]
[New LWP 2922278]
[New LWP 2922270]
[New LWP 2922509]
[New LWP 2922353]
[New LWP 2922304]
[New LWP 2922361]
[New LWP 2922357]
[New LWP 2922524]
[New LWP 2922513]
[New LWP 2922522]
[New LWP 2922367]
[New LWP 2922502]
[New LWP 2922266]
[New LWP 2922356]
[New LWP 2922511]
[New LWP 2922376]
[New LWP 2922505]
[New LWP 2922345]
[New LWP 2922274]
[New LWP 2922369]
[New LWP 2922360]
[New LWP 2922507]
[New LWP 2922273]
[New LWP 2922512]
[New LWP 2922362]
[New LWP 2922342]
[New LWP 2922375]
[New LWP 2922527]
[New LWP 2922363]
[New LWP 2922372]
[New LWP 2922541]
[New LWP 2922365]
[New LWP 2922347]
[New LWP 2922534]
[New LWP 2922366]
[New LWP 2922267]
[New LWP 2922276]
[New LWP 2922344]
[New LWP 2922665]
[New LWP 2922277]
[New LWP 2922269]
[New LWP 2922504]
[New LWP 2922546]
[New LWP 2922275]
[New LWP 2922377]
[New LWP 2922373]
[New LWP 2922529]
[New LWP 2922536]
[New LWP 2922499]
[New LWP 2922666]
[New LWP 2922359]
[New LWP 2922501]
[New LWP 2922530]
[New LWP 2922506]
[New LWP 2922532]
[New LWP 2922545]
[New LWP 2922542]
[New LWP 2922510]
[New LWP 2922265]
[New LWP 2922508]
[New LWP 2922537]
[New LWP 2922544]
[New LWP 2922514]
[New LWP 2922517]
[New LWP 2922531]
[New LWP 2922525]
[New LWP 2922515]
[New LWP 2922521]
[New LWP 2922543]
[New LWP 2922533]
[New LWP 2922535]
[New LWP 2922518]
[New LWP 2922523]
[New LWP 2922516]
[New LWP 2922526]
[New LWP 2922340]
[New LWP 2922528]
[New LWP 2922538]
[New LWP 2922271]
[New LWP 2922272]
[New LWP 2922264]
[New LWP 2922268]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `ceph-mds -f -i mds02 -c /etc/ceph/ceph.conf'.
Program terminated with signal 6, Aborted.
#0  0x00007f772abe5fcb in raise () from /lib64/libpthread.so.0
Missing separate debuginfos, use: debuginfo-install boost-system-1.53.0-25.el7.x86_64 boost-thread-1.53.0-25.el7.x86_64 glibc-2.17-106.el7_2.4.x86_64 gperftools-libs-2.4-7.el7.x86_64 libgcc-4.8.5-4.el7.x86_64 libstdc++-4.8.5-4.el7.x86_64 libunwind-1.1-5.el7_2.2.x86_64 nspr-4.10.8-2.el7_1.x86_64 nss-3.19.1-19.el7_2.x86_64 nss-softokn-3.16.2.3-13.el7_1.x86_64 nss-softokn-freebl-3.16.2.3-13.el7_1.x86_64 nss-util-3.19.1-9.el7_2.x86_64 sqlite-3.7.17-8.el7.x86_64
(gdb) q

I tried running the debuginfo-install command, but nothing was found.

(gdb) backtrace
#0  0x00007f772abe5fcb in raise () from /lib64/libpthread.so.0
#1  0x00007f772bca500d in reraise_fatal (signum=6) at global/signal_handler.cc:59
#2  handle_fatal_signal (signum=6) at global/signal_handler.cc:109
#3  <signal handler called>
#4  0x00007f7729a085f7 in raise () from /lib64/libc.so.6
#5  0x00007f7729a09ce8 in abort () from /lib64/libc.so.6
#6  0x00007f772a30c9d5 in __gnu_cxx::__verbose_terminate_handler() () from /lib64/libstdc++.so.6
#7  0x00007f772a30a946 in ?? () from /lib64/libstdc++.so.6
#8  0x00007f772a30a973 in std::terminate() () from /lib64/libstdc++.so.6
#9  0x00007f772a30ab93 in __cxa_throw () from /lib64/libstdc++.so.6
#10 0x00007f772bba6423 in operator new (num_bytes=1432) at mds/CInode.h:142
#11 CDir::_load_dentry (this=this@entry=0x7f772f9c4e38, key=..., dname="10001f1dbd5", last=last@entry=..., bl=..., pos=pos@entry=468099, 
    snaps=snaps@entry=0x0, force_dirty=force_dirty@entry=0x7f77214e945c, undef_inodes=undef_inodes@entry=0x7f77214e9480) at mds/CDir.cc:1660
#12 0x00007f772bba71e8 in CDir::_omap_fetched (this=0x7f772f9c4e38, hdrbl=..., omap=..., want_dn=..., r=<optimized out>) at mds/CDir.cc:1818
#13 0x00007f772bc07a54 in complete (r=0, this=0x7f772e97ba00) at include/Context.h:64
#14 MDSIOContextBase::complete (this=0x7f772e97ba00, r=0) at mds/MDSContext.cc:65
#15 0x00007f772bcc7f58 in Finisher::finisher_thread_entry (this=0x7f772e998780) at common/Finisher.cc:65
#16 0x00007f772abdedc5 in start_thread () from /lib64/libpthread.so.0
#17 0x00007f7729ac928d in clone () from /lib64/libc.so.6

Also the output of running the MDS:

-sh-4.2$ ceph-mds -f -i mds02 -c /etc/ceph/ceph.conf 
starting mds.mds02 at :/0

tcmalloc: large alloc 1501569024 bytes == 0x7f77fa7c4000 @  0x7f772b36cc73 0x7f772b38f1d5 0x7f772ba60b53 0x7f772bba63e9 0x7f772bba71e8 0x7f772bc07a54 0x7f772bcc7f58 0x7f772abdedc5 0x7f7729ac928d (nil)
tcmalloc: large alloc 3003129856 bytes == 0x7f78d14a8000 @  0x7f772b36cc73 0x7f772b38f1d5 0x7f772ba60b53 0x7f772bba63e9 0x7f772bba71e8 0x7f772bc07a54 0x7f772bcc7f58 0x7f772abdedc5 0x7f7729ac928d (nil)
tcmalloc: large alloc 1677729792 bytes == 0x7f79e6c90000 @  0x7f772b36cc73 0x7f772b38f1d5 0x7f772ba60b53 0x7f772bb99d0c 0x7f772bba4e88 0x7f772bba71e8 0x7f772bc07a54 0x7f772bcc7f58 0x7f772abdedc5 0x7f7729ac928d (nil)
tcmalloc: large alloc 6006251520 bytes == 0x7f7a4ac92000 @  0x7f772b36cc73 0x7f772b38f1d5 0x7f772ba60b53 0x7f772bba63e9 0x7f772bba71e8 0x7f772bc07a54 0x7f772bcc7f58 0x7f772abdedc5 0x7f7729ac928d (nil)
tcmalloc: large alloc 3355451392 bytes == 0x7f7bc0578000 @  0x7f772b36cc73 0x7f772b38f1d5 0x7f772ba60b53 0x7f772bb99d0c 0x7f772bba4e88 0x7f772bba71e8 0x7f772bc07a54 0x7f772bcc7f58 0x7f772abdedc5 0x7f7729ac928d (nil)
tcmalloc: large alloc 12012494848 bytes == 0x7f7c88986000 @  0x7f772b36cc73 0x7f772b38f1d5 0x7f772ba60b53 0x7f772bba63e9 0x7f772bba71e8 0x7f772bc07a54 0x7f772bcc7f58 0x7f772abdedc5 0x7f7729ac928d (nil)
tcmalloc: large alloc 6710894592 bytes == 0x7f7ff0dea000 @  0x7f772b36cc73 0x7f772b38f1d5 0x7f772ba60b53 0x7f772bb99d0c 0x7f772bba4e88 0x7f772bba71e8 0x7f772bc07a54 0x7f772bcc7f58 0x7f772abdedc5 0x7f7729ac928d (nil)
tcmalloc: large alloc 24024981504 bytes == (nil) @  0x7f772b36cc73 0x7f772b38f1d5 0x7f772ba60b53 0x7f772bba63e9 0x7f772bba71e8 0x7f772bc07a54 0x7f772bcc7f58 0x7f772abdedc5 0x7f7729ac928d (nil)
tcmalloc: large alloc 12012494848 bytes == 0x7f8181ac0000 @  0x7f772b36cc73 0x7f772b38f1d5 0x7f772ba60d60 0x7f772bba63e9 0x7f772bba71e8 0x7f772bc07a54 0x7f772bcc7f58 0x7f772abdedc5 0x7f7729ac928d (nil)
tcmalloc: large alloc 24024981504 bytes == (nil) @  0x7f772b36cc73 0x7f772b38f1d5 0x7f772ba60b53 0x7f772bba63e9 0x7f772bba71e8 0x7f772bc07a54 0x7f772bcc7f58 0x7f772abdedc5 0x7f7729ac928d (nil)
tcmalloc: large alloc 12012494848 bytes == (nil) @  0x7f772b36cc73 0x7f772b38f1d5 0x7f772ba60d60 0x7f772bba63e9 0x7f772bba71e8 0x7f772bc07a54 0x7f772bcc7f58 0x7f772abdedc5 0x7f7729ac928d (nil)
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
*** Caught signal (Aborted) **
 in thread 7f77214ea700
 ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd)
 1: (()+0x4b6f42) [0x7f772bca4f42]
 2: (()+0xf100) [0x7f772abe6100]
 3: (gsignal()+0x37) [0x7f7729a085f7]
 4: (abort()+0x148) [0x7f7729a09ce8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f772a30c9d5]
 6: (()+0x5e946) [0x7f772a30a946]
 7: (()+0x5e973) [0x7f772a30a973]
 8: (()+0x5eb93) [0x7f772a30ab93]
 9: (CDir::_load_dentry(std::string const&, std::string const&, snapid_t, ceph::buffer::list&, int, std::set<snapid_t, std::less<snapid_t>, std::allocator<snapid_t> > const*, bool*, std::list<CInode*, std::allocator<CInode*> >*)+0x2823) [0x7f772bba6423]
 10: (CDir::_omap_fetched(ceph::buffer::list&, std::map<std::string, ceph::buffer::list, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::list> > >&, std::string const&, int)+0x9f8) [0x7f772bba71e8]
 11: (MDSIOContextBase::complete(int)+0xa4) [0x7f772bc07a54]
 12: (Finisher::finisher_thread_entry()+0x168) [0x7f772bcc7f58]
 13: (()+0x7dc5) [0x7f772abdedc5]
 14: (clone()+0x6d) [0x7f7729ac928d]
2016-04-06 17:09:49.229721 7f77214ea700 -1 *** Caught signal (Aborted) **
 in thread 7f77214ea700

The MDS also uses a LOT of memory, 54G of the 64GB available, but there are no new entries in dmesg at the time of the crash.
I also tried decreasing the mds cache size back to 100000 (we had 1000000), but nothing changed.

Actions #15

Updated by John Spray about 8 years ago

Ah, so this is basically http://tracker.ceph.com/issues/11950

The MDS doesn't limit how many stray files (waiting to be purged) there can be at one time, and on startup the MDS tries to load everything from the stray directory into memory at once. So if enough files were unlinked, but not yet purged, then you end up with this situation.

In an extreme situation, it is possible to manually remove the dirfrag objects for the strays, but in this instance you will probably be better off trying to make it past this allocation error (can you enable more swap space on this host and wait for the situation to pass, perhaps?)
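A rough way to gauge the stray backlog before restarting is to count the omap keys on the stray dirfrag objects; giving the host swap, as suggested above, buys the MDS headroom meanwhile. This is only a sketch: the pool name `metadata` and the 32 GiB swap size are assumptions, and the helper names are hypothetical.

```shell
# Sketch only: pool name "metadata" and the 32 GiB swap size are
# assumptions -- adjust them to your cluster.

# Stray directories are inodes 0x600..0x609, stored as objects
# "600.00000000" .. "609.00000000" in the metadata pool.
stray_object() { echo "$1.00000000"; }

count_strays() {
    for i in $(seq 600 609); do
        printf '%s: ' "$(stray_object "$i")"
        rados -p metadata listomapkeys "$(stray_object "$i")" | wc -l
    done
}

# Give the MDS headroom with a swapfile while it works through the backlog:
add_swap() {
    dd if=/dev/zero of=/var/swapfile bs=1M count=$((32 * 1024))
    chmod 600 /var/swapfile
    mkswap /var/swapfile
    swapon /var/swapfile
}
```

`count_strays` needs to run on a node with client access to the cluster; a large, slowly shrinking key count across the stray objects is consistent with the purge backlog described here.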

Actions #16

Updated by Kenneth Waegeman about 8 years ago

Adding a 32GB swapfile did make it work... for now.

Thanks!!

Our MDS daemon is running, but it is now using 73G of RAM. If I understand correctly, there is not yet a way to limit this? Will there be a fix for this in Jewel, or is there already some workaround?

PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND   
2986187 ceph 20 0 73.517g 0.056t 3656 S 86.4 91.0 291:36.22 ceph-mds
Actions #17

Updated by Kenneth Waegeman about 8 years ago

Still not able to resume the rm normally... Even with the added RAM (now 64+32 GB), it is not enough for the MDS.

   -25> 2016-04-09 18:06:16.308563 7f9a4cffc700  5 mds.0.log _submit_thread 5189743143461~531 : EUpdate purge_stray [metablob 605, 1 dirs]
   -24> 2016-04-09 18:06:16.310520 7f9a4cffc700  5 mds.0.log _submit_thread 5189743144012~531 : EUpdate purge_stray [metablob 605, 1 dirs]
   -23> 2016-04-09 18:06:16.310517 7f9a4dffe700  1 -- 10.141.16.2:6800/2986187 --> 10.141.16.14:6826/1779062 -- osd_op(mds.0.348:15930806 100026fb6be.000000f4 [delete] 4.f1d098e snapc 1=[] ondisk+write+known_if_
redirected+full_force e12888) v6 -- ?+0 0x7fad0c8298c0 con 0x7f9a6cb7adc0
   -22> 2016-04-09 18:06:16.310539 7f9a4cffc700  5 mds.0.log _submit_thread 5189743144563~531 : EUpdate purge_stray [metablob 605, 1 dirs]
   -21> 2016-04-09 18:06:16.310548 7f9a4dffe700  1 -- 10.141.16.2:6800/2986187 --> 10.141.16.20:6818/1775706 -- osd_op(mds.0.348:15930807 100026f63fb.00000000 [delete] 4.f16f4270 snapc 1=[] ondisk+write+known_if
_redirected+full_force e12888) v6 -- ?+0 0x7fad0c828840 con 0x7f9a6cb7a160
   -20> 2016-04-09 18:06:16.310556 7f9a4cffc700  5 mds.0.log _submit_thread 5189743145114~531 : EUpdate purge_stray [metablob 605, 1 dirs]
   -19> 2016-04-09 18:06:16.310572 7f9a4cffc700  5 mds.0.log _submit_thread 5189743145665~531 : EUpdate purge_stray [metablob 605, 1 dirs]
   -18> 2016-04-09 18:06:16.310571 7f9a4dffe700  1 -- 10.141.16.2:6800/2986187 --> 10.141.16.23:6818/1819849 -- osd_op(mds.0.348:15930808 100026f63fa.00000000 [delete] 4.864b7260 snapc 1=[] ondisk+write+known_if
_redirected+full_force e12888) v6 -- ?+0 0x7fad0c828dc0 con 0x7f9a6cb7bfa0
   -17> 2016-04-09 18:06:16.310586 7f9a4cffc700  5 mds.0.log _submit_thread 5189743146216~531 : EUpdate purge_stray [metablob 605, 1 dirs]
   -16> 2016-04-09 18:06:16.310592 7f9a4dffe700  1 -- 10.141.16.2:6800/2986187 --> 10.141.16.1:6803/1492 -- osd_op(mds.0.348:15930809 200.0012e140 [delete] 6.aa795f80 ondisk+write+known_if_redirected+full_force 
e12888) v6 -- ?+0 0x7fad0c82b440 con 0x7f9a5dbc09a0
   -15> 2016-04-09 18:06:16.310614 7f9a4cffc700  5 mds.0.log _submit_thread 5189743146767~531 : EUpdate purge_stray [metablob 605, 1 dirs]
   -14> 2016-04-09 18:06:16.310627 7f9a4cffc700  5 mds.0.log _submit_thread 5189743147318~531 : EUpdate purge_stray [metablob 605, 1 dirs]
   -13> 2016-04-09 18:06:16.310638 7f9a4cffc700  5 mds.0.log _submit_thread 5189743147869~531 : EUpdate purge_stray [metablob 605, 1 dirs]
   -12> 2016-04-09 18:06:16.312806 7f9a4aefe700  1 -- 10.141.16.2:6800/2986187 <== osd.1 10.141.16.1:6803/1492 335351 ==== osd_op_reply(15930809 200.0012e140 [delete] v12888'151160 uv151160 ondisk = 0) v6 ==== 1
79+0+0 (2514634537 0 0) 0x7fad0055d700 con 0x7f9a5dbc09a0
   -11> 2016-04-09 18:06:16.314902 7f9a4dffe700  5 -- op tracker -- seq: 32930835, time: 2016-04-09 18:06:16.314901, event: journal_committed: journal_and_reply, op: client_request(client.939726:10345461 rmdir #
100015052fa/temp CRU 2016-04-09 18:05:22.175859)
   -10> 2016-04-09 18:06:16.320951 7f9a4dffe700  5 -- op tracker -- seq: 32930835, time: 2016-04-09 18:06:16.320951, event: replying, op: client_request(client.939726:10345461 rmdir #100015052fa/temp CRU 2016-04
-09 18:05:22.175859)
    -9> 2016-04-09 18:06:16.321062 7f9a4cffc700  5 mds.0.log _submit_thread 5189743148420~970 : EUpdate scatter_writebehind [metablob 100, 1 dirs]
    -8> 2016-04-09 18:06:16.321210 7f9a4cffc700  1 -- 10.141.16.2:6800/2986187 --> 10.141.16.1:6802/1493 -- osd_op(mds.0.348:15930810 200.0012e153 [write 752185~34601 [fadvise_dontneed]] 6.889889e8 ondisk+write+
known_if_redirected+full_force e12888) v6 -- ?+0 0x7facf30f6b00 con 0x7f9a5dbc0b00
    -7> 2016-04-09 18:06:16.321252 7f9a4cffc700  5 mds.0.log _submit_thread 5189743149410~1212 : EUpdate scatter_writebehind [metablob 100015052f8, 2 dirs]
    -6> 2016-04-09 18:06:16.321276 7f9a4cffc700  1 -- 10.141.16.2:6800/2986187 --> 10.141.16.1:6802/1493 -- osd_op(mds.0.348:15930811 200.0012e153 [write 786786~1232 [fadvise_dontneed]] 6.889889e8 ondisk+write+k
nown_if_redirected+full_force e12888) v6 -- ?+0 0x7facf30f7080 con 0x7f9a5dbc0b00
    -5> 2016-04-09 18:06:16.321451 7f9a4cffc700  5 mds.0.log _submit_thread 5189743150642~1212 : EUpdate scatter_writebehind [metablob 100015052f9, 2 dirs]
    -4> 2016-04-09 18:06:16.321476 7f9a4cffc700  1 -- 10.141.16.2:6800/2986187 --> 10.141.16.1:6802/1493 -- osd_op(mds.0.348:15930812 200.0012e153 [write 788018~1232 [fadvise_dontneed]] 6.889889e8 ondisk+write+k
nown_if_redirected+full_force e12888) v6 -- ?+0 0x7facf30f6840 con 0x7f9a5dbc0b00
    -3> 2016-04-09 18:06:16.321109 7f9a4dffe700  1 -- 10.141.16.2:6800/2986187 --> 10.141.16.11:0/1387684715 -- client_caps(grant ino 1000150534d 370020008 seq 26213 caps=pAsLsXsFs dirty=- wanted=pLs follows 0 s
ize 0/0 ts 1 mtime 2016-04-09 18:05:22.165859 tws 4) v6 -- ?+0 0x7face851d000 con 0x7facb88963c0
    -2> 2016-04-09 18:06:16.321530 7f9a4dffe700  1 -- 10.141.16.2:6800/2986187 --> 10.141.16.11:0/1387684715 -- client_reply(???:10345461 = 0 (0) Success safe) v1 -- ?+0 0x7f9a6e41d700 con 0x7facb88963c0
    -1> 2016-04-09 18:06:16.321561 7f9a4dffe700  5 -- op tracker -- seq: 32930835, time: 2016-04-09 18:06:16.321561, event: finishing request, op: client_request(client.939726:10345461 rmdir #100015052fa/temp CR
U 2016-04-09 18:05:22.175859)
     0> 2016-04-09 18:06:16.324513 7f9a53888700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f9a53888700
 ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd)
 1: (()+0x4b6f42) [0x7f9a5a8bbf42]
 2: (()+0xf100) [0x7f9a595c9100]
 3: (PK11_FreeSlot()+0) [0x7f9a59a668a0]
 4: (PK11_FreeSlotListElement()+0x4e) [0x7f9a59a6690e]
 5: (()+0x5296e) [0x7f9a59a6696e]
 6: (PK11_FreeSlotList()+0x9) [0x7f9a59a669a9]
 7: (PK11_GetBestSlotMultipleWithAttributes()+0x220) [0x7f9a59a69470]
 8: (PK11_GetBestSlot()+0x1f) [0x7f9a59a694ff]
 9: (CryptoAES::get_key_handler(ceph::buffer::ptr const&, std::string&)+0x207) [0x7f9a5a8dc6f7]
 10: (CryptoKey::_set_secret(int, ceph::buffer::ptr const&)+0xec) [0x7f9a5a8db79c]
 11: (CryptoKey::decode(ceph::buffer::list::iterator&)+0xb2) [0x7f9a5a8dbaa2]
 12: (void decode_decrypt_enc_bl<CephXServiceTicket>(CephContext*, CephXServiceTicket&, CryptoKey, ceph::buffer::list&, std::string&)+0x4dd) [0x7f9a5a8c9a5d]
 13: (int decode_decrypt<CephXServiceTicket>(CephContext*, CephXServiceTicket&, CryptoKey const&, ceph::buffer::list::iterator&, std::string&)+0x1e2) [0x7f9a5a8c9e52]
 14: (CephXTicketHandler::verify_service_ticket_reply(CryptoKey&, ceph::buffer::list::iterator&)+0xee) [0x7f9a5a8c3ebe]
 15: (CephXTicketManager::verify_service_ticket_reply(CryptoKey&, ceph::buffer::list::iterator&)+0x13a) [0x7f9a5a8c5d8a]
 16: (CephxClientHandler::handle_response(int, ceph::buffer::list::iterator&)+0xf34) [0x7f9a5aa93724]
 17: (MonClient::handle_auth(MAuthReply*)+0xe1) [0x7f9a5a954231]
 18: (MonClient::ms_dispatch(Message*)+0x299) [0x7f9a5a9565c9]
 19: (DispatchQueue::entry()+0x649) [0x7f9a5aa9faf9]
 20: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f9a5a9921ed]
 21: (()+0x7dc5) [0x7f9a595c1dc5]
 22: (clone()+0x6d) [0x7f9a584ac28d]

Actions #18

Updated by Zheng Yan about 8 years ago

Can you start the MDS without resuming the rm? If the MDS can start, please wait a few hours to let it delete enough stray inodes.

Actions #19

Updated by Kenneth Waegeman about 8 years ago

I did this on Friday and waited until the evening to let the MDS delete. I restarted it Friday evening, and Saturday evening it crashed again.
I can of course repeat these steps, but it looks like only 25TB of the 160TB I want to remove is gone (I don't have immediate insight into the number of files, only that about 20M objects have been removed since the first run of rm). Can I tune the MDSs specifically to process more deletions? Nothing else is happening on the cluster right now, so it should only be doing deletions.

Actions #20

Updated by John Spray about 8 years ago

When doing a number of deletes greater than the MDS cache size, you need to make sure that the MDS is purging faster than you are deleting.

You can look at the "strays_purged" performance counter to find out the rate at which purges are happening.

You can configure the limits on purging:

mds_max_purge_files
mds_max_purge_ops
mds_max_purge_ops_per_pg

The effective operation limit is the minimum of max_purge_ops and max_purge_ops_per_pg*(data pg count).

You may find that when you lift the limits, the purges start outrunning the deletes, and you're OK. Otherwise, if you're bottlenecked on purges, then the only thing you can do is control the rate of the deletes: possibly issue them in batches of half your MDS cache size or so.
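The steps above could be sketched from the admin socket roughly as follows. The daemon name `mds02` and the concrete values are illustrative, not recommendations, so the cluster commands are shown as comments; only the limit arithmetic is executable.

```shell
# Watch the purge rate via the admin socket (daemon name is an example):
#   ceph daemon mds.mds02 perf dump | grep strays
# Lift the throttles at runtime (values are illustrative only):
#   ceph daemon mds.mds02 config set mds_max_purge_files 256
#   ceph daemon mds.mds02 config set mds_max_purge_ops 32768

# The effective operation limit is
#   min(mds_max_purge_ops, mds_max_purge_ops_per_pg * data_pg_count)
effective_purge_ops() {
    awk -v m="$1" -v p="$2" -v n="$3" 'BEGIN { v = p * n; print (v < m ? int(v) : m) }'
}

# e.g. with mds_max_purge_ops=8192, mds_max_purge_ops_per_pg=0.5
# and 1024 data-pool PGs:
effective_purge_ops 8192 0.5 1024   # -> 512: the per-pg term is the bottleneck
```

If `strays_purged` climbs slower than your unlink rate even after raising the limits, batching the deletes (as suggested above) is the remaining lever.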

Actions #21

Updated by Zheng Yan about 8 years ago

I sent a PR that makes the MDS not pin strays in memory.

Could you give it a try:
https://github.com/ukernel/ceph/tree/infernalis-trim-purging

Actions #22

Updated by Kenneth Waegeman about 8 years ago

Are there RPMs built for this PR? I looked at http://gitbuilder.ceph.com/ceph-rpm-centos7-x86_64-basic/ but I did not find it there.
Thanks!!

Actions #24

Updated by Greg Farnum almost 8 years ago

  • Category changed from 47 to Performance/Resource Usage
  • Component(FS) MDS added
Actions #25

Updated by Greg Farnum almost 8 years ago

  • Subject changed from ceph mds continiously crashes and going into laggy state to ceph mds continiously crashes and going into laggy state (stray purging problems)
  • Status changed from New to Closed

We have open tickets about improving purge, and the specific issue here seems to have been addressed.
