Bug #20390

A bug in async+dpdk module

Added by hongpeng lu over 1 year ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
Start date: 06/23/2017
Due date:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:

Description

I started the monitor daemon with async+dpdk and executed "ceph -s" on another host.
After a few tries, the monitor daemon crashed.

   -85> 2017-06-23 18:23:52.918071 7fffea3fe700 10 net dispatch_packet === rx === proto 806 48:ea:63:4:50:97 -> ff:ff:ff:ff:ff:ff length 60
   -84> 2017-06-23 18:23:52.924118 7fffea3fe700 10 net dispatch_packet === rx === proto 806 48:ea:63:2c:72:a3 -> ff:ff:ff:ff:ff:ff length 60
   -83> 2017-06-23 18:23:52.933680 7fffe53fa700  5 mon.iozone-103@0(leader).osd e1 can_mark_out no osds
   -82> 2017-06-23 18:23:52.933688 7fffe53fa700  5 mon.iozone-103@0(leader).paxos(paxos active c 1..108) is_readable = 1 - now=2017-06-23 18:23:52.933689 lease_expire=0.000000 has v0 lc 108
   -81> 2017-06-23 18:23:52.968324 7fffea3fe700 10 net dispatch_packet === rx === proto 806 48:ea:63:2c:72:a3 -> ff:ff:ff:ff:ff:ff length 60
   -80> 2017-06-23 18:23:52.969549 7fffea3fe700 10 net dispatch_packet === rx === proto 806  0:22:19:5d:1a:2d -> ff:ff:ff:ff:ff:ff length 60
   -79> 2017-06-23 18:23:52.974851 7fffea3fe700 10 net dispatch_packet === rx === proto 806 48:ea:63:2d:a5:95 -> ff:ff:ff:ff:ff:ff length 60
   -78> 2017-06-23 18:23:52.988551 7fffea3fe700 10 net dispatch_packet === rx === proto 800 f8:b1:56:be:be:66 -> ff:ff:ff:ff:ff:ff length 92 rss_hash 1945615894
   -77> 2017-06-23 18:23:52.988555 7fffea3fe700 10 dpdk handle_received_packet get 11 packet from 191.168.22.98 -> 191.168.255.255 id=3625 ip_len=78 ip_hdr_len=20 pkt_len=78 offset=0
   -76> 2017-06-23 18:23:52.995039 7fffea3fe700 10 net dispatch_packet === rx === proto 806 3c:e5:a6:27:16:86 -> ff:ff:ff:ff:ff:ff length 60
   -75> 2017-06-23 18:23:53.002564 7fffea3fe700 10 net dispatch_packet === rx === proto 806 48:ea:63:4:80:5c -> ff:ff:ff:ff:ff:ff length 60
   -74> 2017-06-23 18:23:53.011623 7fffea3fe700 10 net dispatch_packet === rx === proto 800 48:ea:63:5:72:31 -> 48:ea:63:5:72:5e length 191 rss_hash 1231707456
   -73> 2017-06-23 18:23:53.011626 7fffea3fe700 10 dpdk handle_received_packet get 6 packet from 60.60.1.102 -> 60.60.1.103 id=16 ip_len=177 ip_hdr_len=20 pkt_len=177 offset=0
   -72> 2017-06-23 18:23:53.011634 7fffea3fe700 20 dpdk handle_received_packet learn mac 48:ea:63:5:72:31 with 3c.3c.1.66
   -71> 2017-06-23 18:23:53.011636 7fffea3fe700 20 tcp 60.60.1.103:6789 -> 60.60.1.102:50627 tcb(0x55555efc36c0 fd=2 s=ESTABLISHED).input_handle_other_state tcp header seq 719263620 ack 3762778290 snd next 3762778290 unack 3762778290 rcv next 719263620 len 137 fin=0 syn=0
   -70> 2017-06-23 18:23:53.011640 7fffea3fe700 20 dpdk notify fd=2 mask=1
   -69> 2017-06-23 18:23:53.011640 7fffea3fe700 20 dpdk notify activing=2 listening=1 waiting_idx=0
   -68> 2017-06-23 18:23:53.011641 7fffea3fe700 20 dpdk notify activing=3 listening=1 waiting_idx=1 done 
   -67> 2017-06-23 18:23:53.011642 7fffea3fe700 20 tcp 60.60.1.103:6789 -> 60.60.1.102:50627 tcb(0x55555efc36c0 fd=2 s=ESTABLISHED).input_handle_other_state merged=0 do_output=0
   -66> 2017-06-23 18:23:53.011645 7fffea3fe700 10 net dispatch_packet === rx === proto 806  0:f:e2:11:54:83 -> ff:ff:ff:ff:ff:ff length 60
   -65> 2017-06-23 18:23:53.011647 7fffea3fe700 20 dpdk poll fd=2 mask=1
   -64> 2017-06-23 18:23:53.011648 7fffea3fe700 20 -- 60.60.1.103:6789/0 >> 60.60.1.102:0/1080485091 conn(0x55555f059800 :6789 s=STATE_OPEN pgs=2 cs=1 l=1).process prev state is STATE_OPEN
   -63> 2017-06-23 18:23:53.011654 7fffea3fe700 20 -- 60.60.1.103:6789/0 >> 60.60.1.102:0/1080485091 conn(0x55555f059800 :6789 s=STATE_OPEN_MESSAGE_HEADER pgs=2 cs=1 l=1).process prev state is STATE_OPEN
   -62> 2017-06-23 18:23:53.011657 7fffea3fe700 20 -- 60.60.1.103:6789/0 >> 60.60.1.102:0/1080485091 conn(0x55555f059800 :6789 s=STATE_OPEN_MESSAGE_HEADER pgs=2 cs=1 l=1).process begin MSG
   -61> 2017-06-23 18:23:53.011660 7fffea3fe700 20 -- 60.60.1.103:6789/0 >> 60.60.1.102:0/1080485091 conn(0x55555f059800 :6789 s=STATE_OPEN_MESSAGE_HEADER pgs=2 cs=1 l=1).process got MSG header
   -60> 2017-06-23 18:23:53.011663 7fffea3fe700 20 -- 60.60.1.103:6789/0 >> 60.60.1.102:0/1080485091 conn(0x55555f059800 :6789 s=STATE_OPEN_MESSAGE_HEADER pgs=2 cs=1 l=1).process got envelope type=50 src client.94098 front=62 data=0 off 0
   -59> 2017-06-23 18:23:53.011667 7fffea3fe700 20 -- 60.60.1.103:6789/0 >> 60.60.1.102:0/1080485091 conn(0x55555f059800 :6789 s=STATE_OPEN_MESSAGE_THROTTLE_MESSAGE pgs=2 cs=1 l=1).process prev state is STATE_OPEN_MESSAGE_HEADER
   -58> 2017-06-23 18:23:53.011670 7fffea3fe700 20 -- 60.60.1.103:6789/0 >> 60.60.1.102:0/1080485091 conn(0x55555f059800 :6789 s=STATE_OPEN_MESSAGE_THROTTLE_BYTES pgs=2 cs=1 l=1).process prev state is STATE_OPEN_MESSAGE_THROTTLE_MESSAGE
   -57> 2017-06-23 18:23:53.011673 7fffea3fe700 10 -- 60.60.1.103:6789/0 >> 60.60.1.102:0/1080485091 conn(0x55555f059800 :6789 s=STATE_OPEN_MESSAGE_THROTTLE_BYTES pgs=2 cs=1 l=1).process wants 62 bytes from policy throttler 0/104857600
   -56> 2017-06-23 18:23:53.011676 7fffea3fe700 20 -- 60.60.1.103:6789/0 >> 60.60.1.102:0/1080485091 conn(0x55555f059800 :6789 s=STATE_OPEN_MESSAGE_THROTTLE_DISPATCH_QUEUE pgs=2 cs=1 l=1).process prev state is STATE_OPEN_MESSAGE_THROTTLE_BYTES
   -55> 2017-06-23 18:23:53.011680 7fffea3fe700 20 -- 60.60.1.103:6789/0 >> 60.60.1.102:0/1080485091 conn(0x55555f059800 :6789 s=STATE_OPEN_MESSAGE_READ_FRONT pgs=2 cs=1 l=1).process prev state is STATE_OPEN_MESSAGE_THROTTLE_DISPATCH_QUEUE
   -54> 2017-06-23 18:23:53.011683 7fffea3fe700 20 -- 60.60.1.103:6789/0 >> 60.60.1.102:0/1080485091 conn(0x55555f059800 :6789 s=STATE_OPEN_MESSAGE_READ_FRONT pgs=2 cs=1 l=1).process got front 62
   -53> 2017-06-23 18:23:53.011686 7fffea3fe700 10 -- 60.60.1.103:6789/0 >> 60.60.1.102:0/1080485091 conn(0x55555f059800 :6789 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=2 cs=1 l=1).process aborted = 0
   -52> 2017-06-23 18:23:53.011689 7fffea3fe700 20 -- 60.60.1.103:6789/0 >> 60.60.1.102:0/1080485091 conn(0x55555f059800 :6789 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=2 cs=1 l=1).process got 62 + 0 + 0 byte message
   -51> 2017-06-23 18:23:53.011695 7fffea3fe700  5 -- 60.60.1.103:6789/0 >> 60.60.1.102:0/1080485091 conn(0x55555f059800 :6789 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=2 cs=1 l=1). rx client.94098 seq 8 0x55555f49e900 mon_command({"prefix": "status"} v 0) v1
   -50> 2017-06-23 18:23:53.011700 7fffea3fe700 20 -- 60.60.1.103:6789/0 queue 0x55555f49e900 prio 127
   -49> 2017-06-23 18:23:53.011704 7fffea3fe700 20 -- 60.60.1.103:6789/0 >> 60.60.1.102:0/1080485091 conn(0x55555f059800 :6789 s=STATE_OPEN pgs=2 cs=1 l=1).process prev state is STATE_OPEN_MESSAGE_READ_FRONT
   -48> 2017-06-23 18:23:53.011710 7fffdeff5700  1 -- 60.60.1.103:6789/0 <== client.94098 60.60.1.102:0/1080485091 8 ==== mon_command({"prefix": "status"} v 0) v1 ==== 62+0+0 (1411063408 0 0) 0x55555f49e900 con 0x55555f059800
   -47> 2017-06-23 18:23:53.011741 7fffdeff5700  0 mon.iozone-103@0(leader) e2 handle_command mon_command({"prefix": "status"} v 0) v1
   -46> 2017-06-23 18:23:53.011754 7fffdeff5700  0 log_channel(audit) log [DBG] : from='client.? 60.60.1.102:0/1080485091' entity='client.admin' cmd=[{"prefix": "status"}]: dispatch
   -45> 2017-06-23 18:23:53.011757 7fffdeff5700 10 log_client _send_to_monlog to self
   -44> 2017-06-23 18:23:53.011758 7fffdeff5700 10 log_client  log_queue is 1 last_log 8 sent 7 num 1 unsent 1 sending 1
   -43> 2017-06-23 18:23:53.011759 7fffdeff5700 10 log_client  will send 2017-06-23 18:23:53.011756 mon.0 60.60.1.103:6789/0 8 : audit [DBG] from='client.? 60.60.1.102:0/1080485091' entity='client.admin' cmd=[{"prefix": "status"}]: dispatch
   -42> 2017-06-23 18:23:53.011766 7fffdeff5700  1 -- 60.60.1.103:6789/0 --> 60.60.1.103:6789/0 -- log(1 entries from seq 8 at 2017-06-23 18:23:53.011756) v1 -- 0x55555f49e6c0 con 0
   -41> 2017-06-23 18:23:53.011771 7fffdeff5700 20 -- 60.60.1.103:6789/0 >> 60.60.1.103:6789/0 conn(0x55555f44f000 :-1 s=STATE_NONE pgs=0 cs=0 l=0).send_message log(1 entries from seq 8 at 2017-06-23 18:23:53.011756) v1 local
   -40> 2017-06-23 18:23:53.011811 7fffddbf4700 20 -- 60.60.1.103:6789/0 queue 0x55555f49e6c0 prio 196
   -39> 2017-06-23 18:23:53.011829 7fffdeff5700  2 mon.iozone-103@0(leader) e2 send_reply 0x55555f0789a0 0x55555f49eb40 mon_command_ack([{"prefix": "status"}]=0  v0) v1
   -38> 2017-06-23 18:23:53.011832 7fffdeff5700  1 -- 60.60.1.103:6789/0 --> 60.60.1.102:0/1080485091 -- mon_command_ack([{"prefix": "status"}]=0  v0) v1 -- 0x55555f49eb40 con 0
   -37> 2017-06-23 18:23:53.011837 7fffdeff5700 15 -- 60.60.1.103:6789/0 >> 60.60.1.102:0/1080485091 conn(0x55555f059800 :6789 s=STATE_OPEN pgs=2 cs=1 l=1).send_message inline write is denied, reschedule m=0x55555f49eb40
   -36> 2017-06-23 18:23:53.011840 7fffea3fe700 10 -- 60.60.1.103:6789/0 >> 60.60.1.102:0/1080485091 conn(0x55555f059800 :6789 s=STATE_OPEN pgs=2 cs=1 l=1).handle_write
   -35> 2017-06-23 18:23:53.011845 7fffdeff5700 10 -- 60.60.1.103:6789/0 dispatch_throttle_release 62 to dispatch throttler 62/104857600
   -34> 2017-06-23 18:23:53.011844 7fffea3fe700 20 -- 60.60.1.103:6789/0 >> 60.60.1.102:0/1080485091 conn(0x55555f059800 :6789 s=STATE_OPEN pgs=2 cs=1 l=1).prepare_send_message m mon_command_ack([{"prefix": "status"}]=0  v0) v1
   -33> 2017-06-23 18:23:53.011848 7fffdeff5700 20 -- 60.60.1.103:6789/0 done calling dispatch on 0x55555f49e900
   -32> 2017-06-23 18:23:53.011848 7fffea3fe700 20 -- 60.60.1.103:6789/0 >> 60.60.1.102:0/1080485091 conn(0x55555f059800 :6789 s=STATE_OPEN pgs=2 cs=1 l=1).prepare_send_message encoding features 1152323339925389307 0x55555f49eb40 mon_command_ack([{"prefix": "status"}]=0  v0) v1
   -31> 2017-06-23 18:23:53.011854 7fffea3fe700 20 -- 60.60.1.103:6789/0 >> 60.60.1.102:0/1080485091 conn(0x55555f059800 :6789 s=STATE_OPEN pgs=2 cs=1 l=1).write_message signed m=0x55555f49eb40): sig = 0
   -30> 2017-06-23 18:23:53.011852 7fffdeff5700  1 -- 60.60.1.103:6789/0 <== mon.0 60.60.1.103:6789/0 0 ==== log(1 entries from seq 8 at 2017-06-23 18:23:53.011756) v1 ==== 0+0+0 (0 0 0) 0x55555f49e6c0 con 0x55555f44f000
   -29> 2017-06-23 18:23:53.011858 7fffea3fe700 20 -- 60.60.1.103:6789/0 >> 60.60.1.102:0/1080485091 conn(0x55555f059800 :6789 s=STATE_OPEN pgs=2 cs=1 l=1).write_message sending message type=51 src mon.0 front=54 data=560 off 0
   -28> 2017-06-23 18:23:53.011862 7fffea3fe700 20 -- 60.60.1.103:6789/0 >> 60.60.1.102:0/1080485091 conn(0x55555f059800 :6789 s=STATE_OPEN pgs=2 cs=1 l=1).write_message sending 9 0x55555f49eb40
   -27> 2017-06-23 18:23:53.011861 7fffdeff5700  5 mon.iozone-103@0(leader).paxos(paxos active c 1..108) is_readable = 1 - now=2017-06-23 18:23:53.011861 lease_expire=0.000000 has v0 lc 108
   -26> 2017-06-23 18:23:53.011867 7fffea3fe700 10 -- 60.60.1.103:6789/0 >> 60.60.1.102:0/1080485091 conn(0x55555f059800 :6789 s=STATE_OPEN pgs=2 cs=1 l=1)._try_send sent bytes 689 remaining bytes 0
   -25> 2017-06-23 18:23:53.011871 7fffea3fe700 10 -- 60.60.1.103:6789/0 >> 60.60.1.102:0/1080485091 conn(0x55555f059800 :6789 s=STATE_OPEN pgs=2 cs=1 l=1).write_message sending 0x55555f49eb40 done.
   -24> 2017-06-23 18:23:53.011876 7fffea3fe700 20 dpdk  ipv4::get_packet len 709
   -23> 2017-06-23 18:23:53.011878 7fffea3fe700 20 dpdk  ipv4::send  id=36 60.60.1.103 -> 60.60.1.102 len 729
   -22> 2017-06-23 18:23:53.011878 7fffdeff5700 20 -- 60.60.1.103:6789/0 done calling dispatch on 0x55555f49e6c0
   -21> 2017-06-23 18:23:53.011879 7fffea3fe700 10 net === tx === proto 800 48:ea:63:5:72:5e -> 48:ea:63:5:72:31 length 743
   -20> 2017-06-23 18:23:53.015879 7fffea3fe700 10 net dispatch_packet === rx === proto 800 48:ea:63:5:72:31 -> 48:ea:63:5:72:5e length 60 rss_hash 1231707456
   -19> 2017-06-23 18:23:53.015884 7fffea3fe700 10 dpdk handle_received_packet get 6 packet from 60.60.1.102 -> 60.60.1.103 id=17 ip_len=40 ip_hdr_len=20 pkt_len=46 offset=0
   -18> 2017-06-23 18:23:53.015886 7fffea3fe700 20 dpdk handle_received_packet learn mac 48:ea:63:5:72:31 with 3c.3c.1.66
   -17> 2017-06-23 18:23:53.015888 7fffea3fe700 20 tcp 60.60.1.103:6789 -> 60.60.1.102:50627 tcb(0x55555efc36c0 fd=2 s=ESTABLISHED).input_handle_other_state tcp header seq 719263757 ack 3762778979 snd next 3762778979 unack 3762778290 rcv next 719263757 len 0 fin=1 syn=0
   -16> 2017-06-23 18:23:53.015891 7fffea3fe700 20 dpdk notify fd=2 mask=2
   -15> 2017-06-23 18:23:53.015892 7fffea3fe700 20 dpdk notify activing=2 listening=1 waiting_idx=0
   -14> 2017-06-23 18:23:53.015893 7fffea3fe700 20 dpdk notify activing=2 listening=1 waiting_idx=0 done 
   -13> 2017-06-23 18:23:53.015894 7fffea3fe700 20 tcp 60.60.1.103:6789 -> 60.60.1.102:50627 tcb(0x55555efc36c0 fd=2 s=ESTABLISHED).operator() window update seg_seq=719263757 seg_ack=3762778979 old window=29200 new window=7
   -12> 2017-06-23 18:23:53.015896 7fffea3fe700 20 dpdk notify fd=2 mask=1
   -11> 2017-06-23 18:23:53.015897 7fffea3fe700 20 dpdk notify activing=2 listening=1 waiting_idx=0
   -10> 2017-06-23 18:23:53.015898 7fffea3fe700 20 dpdk notify activing=3 listening=1 waiting_idx=1 done 
    -9> 2017-06-23 18:23:53.015898 7fffea3fe700 20 tcp 60.60.1.103:6789 -> 60.60.1.102:50627 tcb(0x55555efc36c0 fd=2 s=ESTABLISHED).input_handle_other_state fin: SYN_RECEIVED or ESTABLISHED -> CLOSE_WAIT
    -8> 2017-06-23 18:23:53.015901 7fffea3fe700 20 dpdk poll fd=2 mask=1
    -7> 2017-06-23 18:23:53.015902 7fffea3fe700 20 -- 60.60.1.103:6789/0 >> 60.60.1.102:0/1080485091 conn(0x55555f059800 :6789 s=STATE_OPEN pgs=2 cs=1 l=1).process prev state is STATE_OPEN
    -6> 2017-06-23 18:23:53.015906 7fffea3fe700  1 -- 60.60.1.103:6789/0 >> 60.60.1.102:0/1080485091 conn(0x55555f059800 :6789 s=STATE_OPEN pgs=2 cs=1 l=1).read_bulk peer close file descriptor 2
    -5> 2017-06-23 18:23:53.015909 7fffea3fe700  1 -- 60.60.1.103:6789/0 >> 60.60.1.102:0/1080485091 conn(0x55555f059800 :6789 s=STATE_OPEN pgs=2 cs=1 l=1).read_until read failed
    -4> 2017-06-23 18:23:53.015913 7fffea3fe700  1 -- 60.60.1.103:6789/0 >> 60.60.1.102:0/1080485091 conn(0x55555f059800 :6789 s=STATE_OPEN pgs=2 cs=1 l=1).process read tag failed
    -3> 2017-06-23 18:23:53.015916 7fffea3fe700  1 -- 60.60.1.103:6789/0 >> 60.60.1.102:0/1080485091 conn(0x55555f059800 :6789 s=STATE_OPEN pgs=2 cs=1 l=1).fault on lossy channel, failing
    -2> 2017-06-23 18:23:53.015918 7fffea3fe700  2 -- 60.60.1.103:6789/0 >> 60.60.1.102:0/1080485091 conn(0x55555f059800 :6789 s=STATE_OPEN pgs=2 cs=1 l=1)._stop
    -1> 2017-06-23 18:23:53.015922 7fffea3fe700 10 -- 60.60.1.103:6789/0 >> 60.60.1.102:0/1080485091 conn(0x55555f059800 :6789 s=STATE_OPEN pgs=2 cs=1 l=1).discard_out_queue started
     0> 2017-06-23 18:23:53.017861 7fffea3fe700 -1 /home/sda4/lhp/dpdk_ceph/ceph-12.0.3/src/msg/async/Stack.h: In function 'void Worker::release_worker()' thread 7fffea3fe700 time 2017-06-23 18:23:53.015931
/home/sda4/lhp/dpdk_ceph/ceph-12.0.3/src/msg/async/Stack.h: 253: FAILED assert(oldref > 0)

 ceph version 12.0.3 (f2337d1b42fa49dbb0a93e4048a42762e3dffbbf)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x128) [0x555555accea8]
 2: (AsyncConnection::_stop()+0x2be) [0x555555d4110e]
 3: (AsyncConnection::fault()+0x5fa) [0x555555d448ea]
 4: (AsyncConnection::process()+0x1388) [0x555555d507f8]
 5: (EventCenter::process_events(int)+0x352) [0x555555b7c782]
 6: (()+0x62c2da) [0x555555b802da]
 7: (()+0x63ad91) [0x555555b8ed91]
 8: (()+0xa1bb) [0x7ffff64841bb]
 9: (()+0x7df5) [0x7ffff738edf5]
 10: (clone()+0x6d) [0x7ffff34781ad]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   1/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
  20/20 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   4/ 5 memdb
   1/ 5 kinetic
   1/ 5 fuse
   1/ 5 mgr
   1/ 5 mgrc
  20/20 dpdk
   1/ 5 eventtrace
  -2/-2 (syslog threshold)
  99/99 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file 
--- end dump of recent events ---
2017-06-23 18:23:53.079601 7fffee6b0700  1 -- 60.60.1.103:6789/0 --> 60.60.1.103:6789/0 -- log(last 8) v1 -- 0x55555f48e000 con 0
2017-06-23 18:23:53.079609 7fffee6b0700 20 -- 60.60.1.103:6789/0 >> 60.60.1.103:6789/0 conn(0x55555f44f000 :-1 s=STATE_NONE pgs=0 cs=0 l=0).send_message log(last 8) v1 local
2017-06-23 18:23:53.079649 7fffddbf4700 20 -- 60.60.1.103:6789/0 queue 0x55555f48e000 prio 196
2017-06-23 18:23:53.079669 7fffdeff5700  1 -- 60.60.1.103:6789/0 <== mon.0 60.60.1.103:6789/0 0 ==== log(last 8) v1 ==== 0+0+0 (0 0 0) 0x55555f48e000 con 0x55555f44f000
2017-06-23 18:23:53.079695 7fffdeff5700 20 -- 60.60.1.103:6789/0 done calling dispatch on 0x55555f48e000
*** Caught signal (Aborted) **
 in thread 7fffea3fe700 thread_name:msgr-worker-0
 ceph version 12.0.3 (f2337d1b42fa49dbb0a93e4048a42762e3dffbbf)
 1: (()+0x8301f6) [0x555555d841f6]
 2: (()+0xf130) [0x7ffff7396130]
 3: (gsignal()+0x37) [0x7ffff33b75d7]
 4: (abort()+0x148) [0x7ffff33b8cc8]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x2a6) [0x555555acd026]
 6: (AsyncConnection::_stop()+0x2be) [0x555555d4110e]
 7: (AsyncConnection::fault()+0x5fa) [0x555555d448ea]
 8: (AsyncConnection::process()+0x1388) [0x555555d507f8]
 9: (EventCenter::process_events(int)+0x352) [0x555555b7c782]
 10: (()+0x62c2da) [0x555555b802da]
 11: (()+0x63ad91) [0x555555b8ed91]
 12: (()+0xa1bb) [0x7ffff64841bb]
 13: (()+0x7df5) [0x7ffff738edf5]
 14: (clone()+0x6d) [0x7ffff34781ad]
2017-06-23 18:23:53.542570 7fffea3fe700 -1 *** Caught signal (Aborted) **
 in thread 7fffea3fe700 thread_name:msgr-worker-0

 ceph version 12.0.3 (f2337d1b42fa49dbb0a93e4048a42762e3dffbbf)
 1: (()+0x8301f6) [0x555555d841f6]
 2: (()+0xf130) [0x7ffff7396130]
 3: (gsignal()+0x37) [0x7ffff33b75d7]
 4: (abort()+0x148) [0x7ffff33b8cc8]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x2a6) [0x555555acd026]
 6: (AsyncConnection::_stop()+0x2be) [0x555555d4110e]
 7: (AsyncConnection::fault()+0x5fa) [0x555555d448ea]
 8: (AsyncConnection::process()+0x1388) [0x555555d507f8]
 9: (EventCenter::process_events(int)+0x352) [0x555555b7c782]
 10: (()+0x62c2da) [0x555555b802da]
 11: (()+0x63ad91) [0x555555b8ed91]
 12: (()+0xa1bb) [0x7ffff64841bb]
 13: (()+0x7df5) [0x7ffff738edf5]
 14: (clone()+0x6d) [0x7ffff34781ad]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
   -12> 2017-06-23 18:23:53.062021 7fffe53fa700  5 mon.iozone-103@0(leader).paxos(paxos active c 1..108) queue_pending_finisher 0x55555ed32510
   -11> 2017-06-23 18:23:53.079581 7fffee6b0700  4 mgrc handle_mgr_map Got map version 1
   -10> 2017-06-23 18:23:53.079585 7fffee6b0700  4 mgrc handle_mgr_map Active mgr is now -
    -9> 2017-06-23 18:23:53.079587 7fffee6b0700  4 mgrc reconnect No active mgr available yet
    -8> 2017-06-23 18:23:53.079597 7fffee6b0700  2 mon.iozone-103@0(leader) e2 send_reply 0x55555f0789a0 0x55555f48e000 log(last 8) v1
    -7> 2017-06-23 18:23:53.079601 7fffee6b0700  1 -- 60.60.1.103:6789/0 --> 60.60.1.103:6789/0 -- log(last 8) v1 -- 0x55555f48e000 con 0
    -6> 2017-06-23 18:23:53.079609 7fffee6b0700 20 -- 60.60.1.103:6789/0 >> 60.60.1.103:6789/0 conn(0x55555f44f000 :-1 s=STATE_NONE pgs=0 cs=0 l=0).send_message log(last 8) v1 local
    -5> 2017-06-23 18:23:53.079649 7fffddbf4700 20 -- 60.60.1.103:6789/0 queue 0x55555f48e000 prio 196
    -4> 2017-06-23 18:23:53.079669 7fffdeff5700  1 -- 60.60.1.103:6789/0 <== mon.0 60.60.1.103:6789/0 0 ==== log(last 8) v1 ==== 0+0+0 (0 0 0) 0x55555f48e000 con 0x55555f44f000
    -3> 2017-06-23 18:23:53.079682 7fffdeff5700 10 log_client handle_log_ack log(last 8) v1
    -2> 2017-06-23 18:23:53.079685 7fffdeff5700 10 log_client  logged 2017-06-23 18:23:53.011756 mon.0 60.60.1.103:6789/0 8 : audit [DBG] from='client.? 60.60.1.102:0/1080485091' entity='client.admin' cmd=[{"prefix": "status"}]: dispatch
    -1> 2017-06-23 18:23:53.079695 7fffdeff5700 20 -- 60.60.1.103:6789/0 done calling dispatch on 0x55555f48e000
     0> 2017-06-23 18:23:53.542570 7fffea3fe700 -1 *** Caught signal (Aborted) **
 in thread 7fffea3fe700 thread_name:msgr-worker-0

 ceph version 12.0.3 (f2337d1b42fa49dbb0a93e4048a42762e3dffbbf)
 1: (()+0x8301f6) [0x555555d841f6]
 2: (()+0xf130) [0x7ffff7396130]
 3: (gsignal()+0x37) [0x7ffff33b75d7]
 4: (abort()+0x148) [0x7ffff33b8cc8]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x2a6) [0x555555acd026]
 6: (AsyncConnection::_stop()+0x2be) [0x555555d4110e]
 7: (AsyncConnection::fault()+0x5fa) [0x555555d448ea]
 8: (AsyncConnection::process()+0x1388) [0x555555d507f8]
 9: (EventCenter::process_events(int)+0x352) [0x555555b7c782]
 10: (()+0x62c2da) [0x555555b802da]
 11: (()+0x63ad91) [0x555555b8ed91]
 12: (()+0xa1bb) [0x7ffff64841bb]
 13: (()+0x7df5) [0x7ffff738edf5]
 14: (clone()+0x6d) [0x7ffff34781ad]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   1/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
  20/20 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   4/ 5 memdb
   1/ 5 kinetic
   1/ 5 fuse
   1/ 5 mgr
   1/ 5 mgrc
  20/20 dpdk
   1/ 5 eventtrace
  -2/-2 (syslog threshold)
  99/99 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file 
--- end dump of recent events ---
Aborted

Here is my DPDK configuration:

ms_async_op_threads=2
ms_dpdk_coremask = 0xF
ms_dpdk_host_ipv4_addr = 60.60.1.103
ms_dpdk_netmask_ipv4_addr = 255.255.0.0
ms_dpdk_gateway_ipv4_addr = 60.60.1.0

ms_cluster_type = async+dpdk
ms_public_type = async+dpdk
public_addr = 60.60.1.103
cluster_addr = 60.60.1.103

ms_dpdk_port_id = 0
ms_dpdk_hugepages=/mnt/huge
ms_dpdk_rx_buffer_count_per_core = 2048
ms_dpdk_memory_channel = 2

Ceph version info:
[root@iozone-103 ~]# ceph -v
ceph version 12.0.3 (f2337d1b42fa49dbb0a93e4048a42762e3dffbbf)

History

#1 Updated by Haomai Wang over 1 year ago

  • Assignee set to Haomai Wang

I will take a look at this later.

#3 Updated by hongpeng lu over 1 year ago

Haomai Wang wrote:

https://github.com/ceph/ceph/pull/15897

It seems OK, thanks.
Can I ask one more question?
Can two OSD daemons share the same network port when using async+dpdk as the network backend?

#4 Updated by Haomai Wang over 1 year ago

No, you must use SR-IOV to separate them.
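
For reference, SR-IOV virtual functions are typically created through the kernel's standard sysfs interface, so each OSD can bind its own VF to DPDK. A sketch under the assumption of an SR-IOV-capable NIC (the interface name and PCI address are placeholders):

```shell
# Create 2 virtual functions on the physical port (interface name is a placeholder).
echo 2 > /sys/class/net/enp3s0f0/device/sriov_numvfs

# Bind one VF to a DPDK-compatible driver so a second daemon can use it.
# dpdk-devbind.py ships with DPDK; the VF's PCI address is a placeholder.
modprobe vfio-pci
dpdk-devbind.py --bind=vfio-pci 0000:03:10.0
```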

#5 Updated by Kefu Chai over 1 year ago

  • Status changed from New to Need Review

#6 Updated by Kefu Chai over 1 year ago

  • Status changed from Need Review to Resolved
