Bug #24667

osd: SIGSEGV in MMgrReport::encode_payload

Added by Patrick Donnelly 6 months ago. Updated 5 months ago.

Status: New
Priority: High
Assignee: -
Category: -
Target version:
Start date: 06/26/2018
Due date:
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: multimds
Component(RADOS):
Pull request ID:

Description

2018-06-23T12:40:12.014 INFO:tasks.ceph.osd.1.smithi023.stderr:tcmalloc: large alloc 4294967296 bytes == 0x55ae816bc000 @  0x7f78afbdb182 0x7f78afbfbcb2 0x7f78b0b72758 0x7f78b0c322ba 0x7f78b0c400e9 0x7f78b0ce12ee 0x7f78b0ce8537 0x7f78b0d01747 0x7f78b0d040ac 0x7f78b103a74f 0x7f78af2496ba 0x7f78ae85841d (nil)
2018-06-23T12:40:12.758 INFO:tasks.workunit.client.0.smithi160.stdout:   1    124757     1.36 MB/sec  execute 362 sec  latency 129.316 ms
2018-06-23T12:40:12.793 INFO:tasks.workunit.client.1.smithi160.stdout:   1    125338     1.37 MB/sec  execute 362 sec  latency 775.330 ms
2018-06-23T12:40:13.758 INFO:tasks.workunit.client.0.smithi160.stdout:   1    124757     1.35 MB/sec  execute 363 sec  latency 1129.401 ms
2018-06-23T12:40:13.792 INFO:tasks.workunit.client.1.smithi160.stdout:   1    125338     1.37 MB/sec  execute 363 sec  latency 1775.462 ms
2018-06-23T12:40:14.377 INFO:tasks.ceph.osd.1.smithi023.stderr:*** Caught signal (Segmentation fault) **
2018-06-23T12:40:14.377 INFO:tasks.ceph.osd.1.smithi023.stderr: in thread 7f78aac68700 thread_name:msgr-worker-1
2018-06-23T12:40:14.378 INFO:tasks.ceph.osd.1.smithi023.stderr: ceph version 14.0.0-787-g8f48616 (8f4861641855b60e687113ea0c79b428042ba302) nautilus (dev)
2018-06-23T12:40:14.378 INFO:tasks.ceph.osd.1.smithi023.stderr: 1: (()+0x11390) [0x7f78af253390]
2018-06-23T12:40:14.378 INFO:tasks.ceph.osd.1.smithi023.stderr: 2: (MMgrReport::encode_payload(unsigned long)+0x5a0) [0x7f78b0c32360]
2018-06-23T12:40:14.378 INFO:tasks.ceph.osd.1.smithi023.stderr: 3: (Message::encode(unsigned long, int)+0x29) [0x7f78b0c400e9]
2018-06-23T12:40:14.379 INFO:tasks.ceph.osd.1.smithi023.stderr: 4: (AsyncConnection::prepare_send_message(unsigned long, Message*, ceph::buffer::list&)+0x4e) [0x7f78b0ce12ee]
2018-06-23T12:40:14.379 INFO:tasks.ceph.osd.1.smithi023.stderr: 5: (AsyncConnection::handle_write()+0x1d7) [0x7f78b0ce8537]
2018-06-23T12:40:14.379 INFO:tasks.ceph.osd.1.smithi023.stderr: 6: (EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0xa17) [0x7f78b0d01747]
2018-06-23T12:40:14.379 INFO:tasks.ceph.osd.1.smithi023.stderr: 7: (()+0x4280ac) [0x7f78b0d040ac]
2018-06-23T12:40:14.379 INFO:tasks.ceph.osd.1.smithi023.stderr: 8: (()+0x75e74f) [0x7f78b103a74f]
2018-06-23T12:40:14.379 INFO:tasks.ceph.osd.1.smithi023.stderr: 9: (()+0x76ba) [0x7f78af2496ba]
2018-06-23T12:40:14.380 INFO:tasks.ceph.osd.1.smithi023.stderr: 10: (clone()+0x6d) [0x7f78ae85841d]

From: /ceph/teuthology-archive/pdonnell-2018-06-23_02:20:27-multimds-wip-pdonnell-testing-20180622.235254-testing-basic-smithi/2693253/teuthology.log
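For triage, the backtrace frames and the tcmalloc warning can be pulled out of the teuthology log with a short script. The sketch below is only an illustration: the names `parse_frames`, `large_alloc_sizes`, and `Frame` are hypothetical helpers and not part of Ceph or teuthology, and the regular expressions assume the line format shown in the excerpt above. One thing it makes easy to see is that the reported allocation, 4294967296 bytes, is exactly 2^32. That may point at a corrupted or wrapped 32-bit length, but that reading is an assumption and is not confirmed anywhere in this report.

```python
import re
from collections import namedtuple

# Hypothetical record type for one backtrace frame (not a Ceph/teuthology type).
Frame = namedtuple("Frame", "index symbol offset address")

# Assumed frame format: " 2: (SYMBOL+0xOFFSET) [0xADDRESS]" at the end of a line.
FRAME_RE = re.compile(r"\s(\d+): \((.*)\+0x([0-9a-f]+)\) \[0x([0-9a-f]+)\]\s*$")
# Assumed tcmalloc warning format: "tcmalloc: large alloc N bytes".
ALLOC_RE = re.compile(r"tcmalloc: large alloc (\d+) bytes")

# Lines copied from the log in this report.
SAMPLE = [
    "2018-06-23T12:40:12.014 INFO:tasks.ceph.osd.1.smithi023.stderr:tcmalloc: large alloc 4294967296 bytes == 0x55ae816bc000 @  0x7f78afbdb182 0x7f78afbfbcb2 0x7f78b0b72758 0x7f78b0c322ba 0x7f78b0c400e9 0x7f78b0ce12ee 0x7f78b0ce8537 0x7f78b0d01747 0x7f78b0d040ac 0x7f78b103a74f 0x7f78af2496ba 0x7f78ae85841d (nil)",
    "2018-06-23T12:40:14.378 INFO:tasks.ceph.osd.1.smithi023.stderr: 1: (()+0x11390) [0x7f78af253390]",
    "2018-06-23T12:40:14.378 INFO:tasks.ceph.osd.1.smithi023.stderr: 2: (MMgrReport::encode_payload(unsigned long)+0x5a0) [0x7f78b0c32360]",
    "2018-06-23T12:40:14.380 INFO:tasks.ceph.osd.1.smithi023.stderr: 10: (clone()+0x6d) [0x7f78ae85841d]",
]


def parse_frames(lines):
    """Return a Frame for every line that looks like a backtrace frame."""
    frames = []
    for line in lines:
        match = FRAME_RE.search(line)
        if not match:
            continue
        index, symbol, offset, address = match.groups()
        # A bare "()" means the symbol was not resolved in the log.
        resolved = None if symbol == "()" else symbol
        frames.append(Frame(int(index), resolved, int(offset, 16), int(address, 16)))
    return frames


def large_alloc_sizes(lines):
    """Return the byte counts from any tcmalloc 'large alloc' warnings."""
    sizes = []
    for line in lines:
        match = ALLOC_RE.search(line)
        if match:
            sizes.append(int(match.group(1)))
    return sizes


if __name__ == "__main__":
    for frame in parse_frames(SAMPLE):
        name = frame.symbol or "<unresolved>"
        print(f"#{frame.index} {name} +{frame.offset:#x} @ {frame.address:#x}")
    for size in large_alloc_sizes(SAMPLE):
        # A power of two has exactly one bit set.
        is_pow2 = size > 0 and size & (size - 1) == 0
        print(f"large alloc: {size} bytes, power of two: {is_pow2}")
```

The same functions can be run over the full teuthology.log by passing an open file object instead of `SAMPLE`, since both only iterate over lines.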


Related issues

Related to RADOS - Bug #23352: osd: segfaults under normal operation Resolved 03/14/2018

History

#1 Updated by Patrick Donnelly 6 months ago

  • Project changed from Ceph to RADOS

#2 Updated by Josh Durgin 6 months ago

  • Related to Bug #23352: osd: segfaults under normal operation added

#3 Updated by Josh Durgin 6 months ago

Possibly related to a memory corruption issue we've been seeing that involves mgr health reporting on the OSD.

#4 Updated by Josh Durgin 5 months ago

  • Priority changed from Urgent to High

Downgrading due to lack of recurrence.
