Bug #19906

Random Segmentation fault thread_name:civetweb-worker

Added by Christopher Hubrich about 2 years ago. Updated about 2 years ago.

Status: New
Priority: Normal
Assignee: -
Target version: -
Start date: 05/11/2017
Due date:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

I'm using ceph version 11.2.0 (f223e27eeb35991352ebc1f67423d4ebc252adb7) on Ubuntu 16.04.

Sometimes (about once a week) I get a segmentation fault on three different hosts:

   -31> 2017-05-10 20:24:30.722179 7f5ba36f2700  1 ====== starting new request req=0x7f5ba36ec400 =====
   -30> 2017-05-10 20:24:30.722227 7f5ba36f2700  2 req 600989:0.000047::HEAD /amsfilefpr/2017/04/23/2017-04-23_18%3A45%3A36_inet_Radio%20Top_.48.fpr::initializing for trans_id = tx000000000000000092b9d-0059135ade-3583ad7-default
   -29> 2017-05-10 20:24:30.725900 7f5ba36f2700  2 req 600989:0.003701:s3:HEAD /amsfilefpr/2017/04/23/2017-04-23_18%3A45%3A36_inet_Radio%20Top_.48.fpr::getting op 3
   -28> 2017-05-10 20:24:30.725917 7f5ba36f2700  2 req 600989:0.003738:s3:HEAD /amsfilefpr/2017/04/23/2017-04-23_18%3A45%3A36_inet_Radio%20Top_.48.fpr:get_obj:authorizing
   -27> 2017-05-10 20:24:30.726439 7f5ba36f2700  2 req 600989:0.004260:s3:HEAD /amsfilefpr/2017/04/23/2017-04-23_18%3A45%3A36_inet_Radio%20Top_.48.fpr:get_obj:normalizing buckets and tenants
   -26> 2017-05-10 20:24:30.726454 7f5ba36f2700  2 req 600989:0.004275:s3:HEAD /amsfilefpr/2017/04/23/2017-04-23_18%3A45%3A36_inet_Radio%20Top_.48.fpr:get_obj:init permissions
   -25> 2017-05-10 20:24:30.726506 7f5ba36f2700  2 req 600989:0.004327:s3:HEAD /amsfilefpr/2017/04/23/2017-04-23_18%3A45%3A36_inet_Radio%20Top_.48.fpr:get_obj:recalculating target
   -24> 2017-05-10 20:24:30.726517 7f5ba36f2700  2 req 600989:0.004338:s3:HEAD /amsfilefpr/2017/04/23/2017-04-23_18%3A45%3A36_inet_Radio%20Top_.48.fpr:get_obj:reading permissions
   -23> 2017-05-10 20:24:30.727456 7f5ba36f2700  1 -- 192.168.100.218:0/3133317310 --> 10.2.2.21:6802/2504 -- osd_op(unknown.0.0:3688744 13.ba3a0099 ba8d33be-b3d5-4df6-97bc-cc5613f85f71.13054100.3_2017/04/23/2017-04-23_18:45:36_inet_Radio Top_.48.fpr [getxattrs,stat] snapc 0=[] ack+read+known_if_redirected e2229) v7 -- 0x562ed70afd40 con 0
   -22> 2017-05-10 20:24:30.727534 7f5ba36f2700  2 Event(0x562ed4e09080 nevent=5000 time_id=1381).wakeup
   -21> 2017-05-10 20:24:30.729108 7f5bc6738700  5 -- 192.168.100.218:0/3133317310 >> 10.2.2.21:6802/2504 conn(0x562ed537b800 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=363970 cs=1 l=1). rx osd.4 seq 165106 0x562ed70afd40 osd_op_reply(3688744 ba8d33be-b3d5-4df6-97bc-cc5613f85f71.13054100.3_2017/04/23/2017-04-23_18:45:36_inet_Radio Top_.48.fpr [getxattrs,stat] v0'0 uv1393020 ondisk = 0) v7
   -20> 2017-05-10 20:24:30.729148 7f5bc6738700  1 -- 192.168.100.218:0/3133317310 <== osd.4 10.2.2.21:6802/2504 165106 ==== osd_op_reply(3688744 ba8d33be-b3d5-4df6-97bc-cc5613f85f71.13054100.3_2017/04/23/2017-04-23_18:45:36_inet_Radio Top_.48.fpr [getxattrs,stat] v0'0 uv1393020 ondisk = 0) v7 ==== 263+0+1272 (3748569282 0 3588045552) 0x562ed70afd40 con 0x562ed537b800
   -19> 2017-05-10 20:24:30.729982 7f5ba36f2700  2 req 600989:0.007781:s3:HEAD /amsfilefpr/2017/04/23/2017-04-23_18%3A45%3A36_inet_Radio%20Top_.48.fpr:get_obj:init op
   -18> 2017-05-10 20:24:30.730002 7f5ba36f2700  2 req 600989:0.007823:s3:HEAD /amsfilefpr/2017/04/23/2017-04-23_18%3A45%3A36_inet_Radio%20Top_.48.fpr:get_obj:verifying op mask
   -17> 2017-05-10 20:24:30.730005 7f5ba36f2700  2 req 600989:0.007826:s3:HEAD /amsfilefpr/2017/04/23/2017-04-23_18%3A45%3A36_inet_Radio%20Top_.48.fpr:get_obj:verifying op permissions
   -16> 2017-05-10 20:24:30.730013 7f5ba36f2700  5 Searching permissions for identity=RGWDummyIdentityApplier(auth_id=sfscan, perm_mask=15, is_admin=0) mask=49
   -15> 2017-05-10 20:24:30.730057 7f5ba36f2700  5 Searching permissions for uid=sfscan
   -14> 2017-05-10 20:24:30.730069 7f5ba36f2700  5 Found permission: 15
   -13> 2017-05-10 20:24:30.730070 7f5ba36f2700  5 Searching permissions for group=1 mask=49
   -12> 2017-05-10 20:24:30.730073 7f5ba36f2700  5 Permissions for group not found
   -11> 2017-05-10 20:24:30.730075 7f5ba36f2700  5 Searching permissions for group=2 mask=49
   -10> 2017-05-10 20:24:30.730077 7f5ba36f2700  5 Permissions for group not found
    -9> 2017-05-10 20:24:30.730095 7f5ba36f2700  5 Getting permissions identity=RGWDummyIdentityApplier(auth_id=sfscan, perm_mask=15, is_admin=0) owner=sfscan perm=1
    -8> 2017-05-10 20:24:30.730101 7f5ba36f2700  2 req 600989:0.007922:s3:HEAD /amsfilefpr/2017/04/23/2017-04-23_18%3A45%3A36_inet_Radio%20Top_.48.fpr:get_obj:verifying op params
    -7> 2017-05-10 20:24:30.730103 7f5ba36f2700  2 req 600989:0.007924:s3:HEAD /amsfilefpr/2017/04/23/2017-04-23_18%3A45%3A36_inet_Radio%20Top_.48.fpr:get_obj:pre-executing
    -6> 2017-05-10 20:24:30.730105 7f5ba36f2700  2 req 600989:0.007926:s3:HEAD /amsfilefpr/2017/04/23/2017-04-23_18%3A45%3A36_inet_Radio%20Top_.48.fpr:get_obj:executing
    -5> 2017-05-10 20:24:30.730600 7f5ba36f2700  2 req 600989:0.008421:s3:HEAD /amsfilefpr/2017/04/23/2017-04-23_18%3A45%3A36_inet_Radio%20Top_.48.fpr:get_obj:completing
    -4> 2017-05-10 20:24:30.730614 7f5ba36f2700  2 req 600989:0.008434:s3:HEAD /amsfilefpr/2017/04/23/2017-04-23_18%3A45%3A36_inet_Radio%20Top_.48.fpr:get_obj:op status=0
    -3> 2017-05-10 20:24:30.730616 7f5ba36f2700  2 req 600989:0.008437:s3:HEAD /amsfilefpr/2017/04/23/2017-04-23_18%3A45%3A36_inet_Radio%20Top_.48.fpr:get_obj:http status=200
    -2> 2017-05-10 20:24:30.730621 7f5ba36f2700  1 ====== req done req=0x7f5ba36ec400 op status=0 http_status=200 ======
    -1> 2017-05-10 20:24:30.730670 7f5ba36f2700  1 civetweb: 0x562ed50e3000: 192.168.100.213 - - [10/May/2017:20:24:30 +0200] "HEAD /amsfilefpr/2017/04/23/2017-04-23_18%3A45%3A36_inet_Radio%20Top_.48.fpr HTTP/1.1" 1 0 - aws-sdk-php/3.27.0 GuzzleHttp/6.2.1 PHP/5.5.9-1ubuntu4.20
     0> 2017-05-10 20:24:31.796844 7f5b996de700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f5b996de700 thread_name:civetweb-worker

 ceph version 11.2.0 (f223e27eeb35991352ebc1f67423d4ebc252adb7)
 1: (()+0x59238e) [0x562eca37138e]
 2: (()+0x11390) [0x7f5bcf3c8390]
 3: (strlen()+0x26) [0x7f5bcdae2b96]
 4: (()+0x25d0c9) [0x562eca03c0c9]
 5: (()+0x25eeb9) [0x562eca03deb9]
 6: (()+0x76ba) [0x7f5bcf3be6ba]
 7: (clone()+0x6d) [0x7f5bcdb5e82d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 newstore
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   4/ 5 memdb
   1/ 5 kinetic
   1/ 5 fuse
   1/ 5 mgr
   1/ 5 mgrc
   1/ 5 dpdk
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-client.rgw.s3gwhh2.log
--- end dump of recent events ---

Are there any tips for me?

History

#1 Updated by Ben Hines about 2 years ago

I suspect this is the same crash as mine: http://tracker.ceph.com/issues/19704, though your offsets differ because you're on Ubuntu rather than CentOS. I see strlen() in your backtrace, which suggests it is most likely the same stack.
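
Since the raw offsets differ between builds, comparing two backtraces is easier once the frames are split into symbol, offset, and address. Below is a minimal Python sketch that does this for the frame lines posted in the description. The regular expression and the `parse_frames` helper are illustrative names, not part of Ceph, and the pattern assumes the `N: (symbol()+0xOFFSET) [0xADDRESS]` layout seen in this report. Unnamed frames (shown as `()`) still need the matching executable and debug symbols to be resolved, as the NOTE line in the dump says; the offsets printed here may serve as input to a symbolizer such as `addr2line -Cfe <executable> <offset>`, assuming they are relative to the load base of the right module.

```python
import re

# Matches frame lines such as:
#   " 3: (strlen()+0x26) [0x7f5bcdae2b96]"
#   " 4: (()+0x25d0c9) [0x562eca03c0c9]"
# Assumption: this layout is taken from the dump in this report only.
FRAME_RE = re.compile(
    r"^\s*(?P<index>\d+):\s+"
    r"\((?P<symbol>[^()]*)(?:\(\))?\+0x(?P<offset>[0-9a-fA-F]+)\)\s+"
    r"\[0x(?P<address>[0-9a-fA-F]+)\]"
)


def parse_frames(trace):
    """Return one dict per backtrace frame found in the given text."""
    frames = []
    for line in trace.splitlines():
        match = FRAME_RE.match(line)
        if match is None:
            continue  # skip version banner, NOTE line, blank lines
        frames.append({
            "index": int(match.group("index")),
            # Empty symbol means the frame was not resolved to a name.
            "symbol": match.group("symbol") or None,
            "offset": int(match.group("offset"), 16),
            "address": int(match.group("address"), 16),
        })
    return frames


# Frame lines copied from the description of this bug.
TRACE = """
 1: (()+0x59238e) [0x562eca37138e]
 2: (()+0x11390) [0x7f5bcf3c8390]
 3: (strlen()+0x26) [0x7f5bcdae2b96]
 4: (()+0x25d0c9) [0x562eca03c0c9]
 5: (()+0x25eeb9) [0x562eca03deb9]
 6: (()+0x76ba) [0x7f5bcf3be6ba]
 7: (clone()+0x6d) [0x7f5bcdb5e82d]
"""

if __name__ == "__main__":
    for frame in parse_frames(TRACE):
        name = frame["symbol"] or "<unresolved>"
        print(f'{frame["index"]}: {name} +{frame["offset"]:#x}')
```

Comparing the sequence of named frames (here `strlen` and `clone`) between two reports is only a rough check; whether the unnamed frames match can be confirmed only after symbolizing them against each build's own binary.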
