Bug #61585

open

OSD segfault in PG::put()

Added by Ilya Dryomov 12 months ago. Updated 11 months ago.

Status: Need More Info
Priority: Normal
Assignee: -
Category: Correctness/Safety
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

http://qa-proxy.ceph.com/teuthology/dis-2023-06-04_22:59:23-krbd-main-wip-exclusive-lock-snapc-default-smithi/7295992/teuthology.log
http://qa-proxy.ceph.com/teuthology/dis-2023-06-04_22:59:23-krbd-main-wip-exclusive-lock-snapc-default-smithi/7295992/remote/smithi017/coredump/1685924286.30159.core.gz

   -10> 2023-06-05T00:18:05.876+0000 7f6906255700 20 osd.5 op_wq(2) _process empty q, waiting
    -9> 2023-06-05T00:18:05.876+0000 7f6917a78700 20 maybe_unpin 0x55b06c827000 #2:40000000::::head# touched
    -8> 2023-06-05T00:18:05.876+0000 7f6917a78700 20 bluestore(/var/lib/ceph/osd/ceph-5) _kv_finalize_thread sleep
    -7> 2023-06-05T00:18:05.884+0000 7f690a25d700 20 osd.5 op_wq(2) _process empty q, waiting
    -6> 2023-06-05T00:18:05.920+0000 7f691b27f700 20 bluestore.MempoolThread(0x55b06bad1b60) _resize_shards cache_size: 2845415832 kv_alloc: 1207959552 kv_used: 1184 kv_onode_alloc: 234881024 kv_onode_used: 2809536 meta_alloc: 1140850688 meta_used: 1150534 data_alloc: 218103808 data_used: 0
    -5> 2023-06-05T00:18:05.972+0000 7f691b27f700 20 bluestore.MempoolThread(0x55b06bad1b60) _resize_shards cache_size: 2845415832 kv_alloc: 1207959552 kv_used: 1184 kv_onode_alloc: 234881024 kv_onode_used: 2809536 meta_alloc: 1140850688 meta_used: 1150534 data_alloc: 218103808 data_used: 0
    -4> 2023-06-05T00:18:06.020+0000 7f691b27f700 20 bluestore.MempoolThread(0x55b06bad1b60) _resize_shards cache_size: 2845415832 kv_alloc: 1207959552 kv_used: 1184 kv_onode_alloc: 234881024 kv_onode_used: 2809536 meta_alloc: 1140850688 meta_used: 1150534 data_alloc: 218103808 data_used: 0
    -3> 2023-06-05T00:18:06.036+0000 7f6921bf5700 10 osd.5 25 tick
    -2> 2023-06-05T00:18:06.036+0000 7f6921bf5700 10 osd.5 25 maybe_update_heartbeat_peers updating
    -1> 2023-06-05T00:18:06.048+0000 7f6921bf5700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f6921bf5700 thread_name:safe_timer

 ceph version 18.0.0-4275-g4d3e9642 (4d3e9642f733c6e311d35e619922bd92ca033fbd) reef (dev)
 1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420) [0x7f692da8f420]
 2: (PG::put(char const*)+0x65) [0x55b06843ea15]
 3: (std::vector<boost::intrusive_ptr<PG>, std::allocator<boost::intrusive_ptr<PG> > >::~vector()+0x39) [0x55b0683dd729]
 4: (OSD::maybe_update_heartbeat_peers()+0xe12) [0x55b06838d5a2]
 5: (OSD::tick()+0x616) [0x55b0683b3a86]
 6: (Context::complete(int)+0xd) [0x55b0683c763d]
 7: (CommonSafeTimer<std::mutex>::timer_thread()+0x12c) [0x55b068a2afbc]
 8: (CommonSafeTimerThread<std::mutex>::entry()+0x11) [0x55b068a2c051]
 9: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7f692da83609]
 10: clone()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

     0> 2023-06-05T00:18:06.072+0000 7f691b27f700 20 bluestore.MempoolThread(0x55b06bad1b60) _resize_shards cache_size: 2845415832 kv_alloc: 1207959552 kv_used: 1184 kv_onode_alloc: 234881024 kv_onode_used: 2809536 meta_alloc: 1140850688 meta_used: 1150534 data_alloc: 218103808 data_used: 0

This happened on vanilla main with a very recent kernel installed on the nodes, but I don't think the kernel is related.

The coredump binary collection sub-task is still broken, I'm going to file a separate ticket for that.

Actions #1

Updated by Ilya Dryomov 12 months ago

Ilya Dryomov wrote:

The coredump binary collection sub-task is still broken, I'm going to file a separate ticket for that.

https://tracker.ceph.com/issues/61586

Actions #2

Updated by Radoslaw Zarzynski 12 months ago

Does it happen on main or solely in the testing branch?

Actions #3

Updated by Ilya Dryomov 12 months ago

This was vanilla main as far as the Ceph bits are concerned (https://github.com/ceph/ceph/commit/4d3e9642f733c6e311d35e619922bd92ca033fbd); nothing else was involved apart from a 6.4-rc5-based kernel. It happened just once, so I don't have any further data to share.

Actions #4

Updated by Radoslaw Zarzynski 11 months ago

  • Status changed from New to Need More Info