Bug #45437

Segmentation fault from dangling CephContext pointers in ImageCtx and ObjectCacher::FlusherThread of a removed image

Added by Sven Anderson almost 4 years ago. Updated over 3 years ago.

Status:
Won't Fix
Priority:
Normal
Assignee:
-
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When an image is opened (writable), it can't be removed without closing it first (error: device is busy). However, if the image is opened read-only, it is possible to remove it without closing it. As a result, the image object still holds pointers (at least in ImageCtx and ObjectCacher::FlusherThread) to a CephContext object, which is destroyed together with the RadosClient object during the rados_shutdown() call. Since there is no bookkeeping of what references the CephContext object (for example via a signal/slot or other publish/subscribe mechanism), the stale objects can't be cleaned up automatically when the CephContext is destroyed. I would therefore expect that removing the open image fails.

I observed several segmentation faults from these dangling pointers, some in race conditions in the FlusherThread, but the easiest way to reproduce it is to close the image after the cluster handle has been shut down (see attached CPP file).
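
A minimal sketch of the reproduction with the C++ API (pool and image names are placeholders; the attached test.cpp is the authoritative reproducer):

#include <rados/librados.hpp>
#include <rbd/librbd.hpp>

int main() {
    librados::Rados cluster;
    cluster.init("admin");                      // default client, placeholder
    cluster.conf_read_file(nullptr);            // default ceph.conf
    cluster.connect();

    librados::IoCtx ioctx;
    cluster.ioctx_create("rbd", ioctx);         // "rbd" pool is a placeholder

    librbd::RBD rbd;
    int order = 0;
    rbd.create(ioctx, "img", 1 << 22, &order);  // small test image, placeholder name

    librbd::Image image;
    rbd.open_read_only(ioctx, image, "img", nullptr);

    rbd.remove(ioctx, "img");                   // succeeds despite the open read-only handle

    ioctx.close();
    cluster.shutdown();                         // destroys RadosClient and its CephContext

    image.close();                              // dereferences the dangling CephContext -> segfault
    return 0;
}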

The dereferencing of the pointer to the CephContext object usually seems to happen in the log module.

The backtrace looks like this:

(gdb) bt
#0  0x00007fde19938d00 in pthread_mutex_lock () from /lib64/libpthread.so.0
#1  0x00007fde1bdee111 in ceph::logging::Log::submit_entry(ceph::logging::Entry&&) ()
   from /usr/lib64/ceph/libceph-common.so.0
#2  0x00007fde25335995 in librbd::Watcher::block_notifies(Context*) () from /lib64/librbd.so.1
#3  0x00007fde252e1a3e in librbd::ImageWatcher<librbd::ImageCtx>::block_notifies(Context*) () from /lib64/librbd.so.1
#4  0x00007fde253c4779 in librbd::image::CloseRequest<librbd::ImageCtx>::send_block_image_watcher() ()
   from /lib64/librbd.so.1
#5  0x00007fde252d106a in librbd::ImageState<librbd::ImageCtx>::send_close_unlock() () from /lib64/librbd.so.1
#6  0x00007fde252d18c5 in librbd::ImageState<librbd::ImageCtx>::execute_next_action_unlock() ()
   from /lib64/librbd.so.1
#7  0x00007fde252d25d3 in librbd::ImageState<librbd::ImageCtx>::close(Context*) () from /lib64/librbd.so.1
#8  0x00007fde252d26ef in librbd::ImageState<librbd::ImageCtx>::close() () from /lib64/librbd.so.1
#9  0x00007fde2527e88c in librbd::Image::close() () from /lib64/librbd.so.1
#10 0x0000000000401850 in main (argc=<optimized out>, argv=0x7ffcb830d0a8) at test.cpp:15

If it happens in a race condition in the FlusherThread, it looks like this:

Thread 22 (Thread 0x7fffcdffb700 (LWP 2630)):
#0  __GI___pthread_mutex_lock (mutex=mutex@entry=0x38) at ../nptl/pthread_mutex_lock.c:67
#1  0x00007fffee4cdb7c in ceph::logging::Log::submit_entry (this=0x0, e=e@entry=0x7fffc80008c0)
    at /build/ceph-12.2.12/src/log/Log.cc:213
#2  0x00007ffff7abfff3 in ObjectCacher::flusher_entry (this=0x7fffbc05ce60)
    at /build/ceph-12.2.12/src/osdc/ObjectCacher.cc:1928
#3  0x00007ffff7acd98d in ObjectCacher::FlusherThread::entry (this=<optimized out>)
    at /build/ceph-12.2.12/src/osdc/ObjectCacher.h:433
#4  0x00007ffff73246ba in start_thread (arg=0x7fffcdffb700) at pthread_create.c:333
#5  0x00007ffff705a41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

test.cpp (5.29 KB) Sven Anderson, 05/07/2020 08:58 PM

History

#1 Updated by Jason Dillaman almost 4 years ago

  • Status changed from New to Need More Info

How does this crash manifest itself using the "rbd" CLI or other RBD tooling? Your problem description is pretty vague as to what steps you are actually performing.

#2 Updated by Jason Dillaman almost 4 years ago

  • Status changed from Need More Info to Won't Fix

This is a bug in go-ceph [1] -- an RBD image was left open and then the RADOS cluster connection was closed. This leaves all sorts of threads and stale references open, so it's a bigger issue than just CephContext. However, if the API is used correctly, there is no unexpected behavior.

[1] https://github.com/ceph/go-ceph/issues/122

#3 Updated by Sven Anderson almost 4 years ago

Jason Dillaman wrote:

an RBD image was left open and then the RADOS cluster connection was closed. This leaves all sorts of threads and stale references open, so it's a bigger issue than just CephContext. However, if the API is used correctly, there is no unexpected behavior.

Of course, as I wrote in the description "the image object still contains a pointer to a CephContext object, that is destroyed together with the RadosClient object during the rados_shutdown() call", so I assumed it's clear that it happens after the cluster connection has been closed. While it's true that the image has never been closed, it is a bit misleading, since the removal of the image succeeds, and one might expect that a successfully removed image can't be open anymore and has been implicitly closed. I would at least expect that the removal of an open image fails, as it does for an opened writable image.

#4 Updated by Jason Dillaman almost 4 years ago

Sven Anderson wrote:

Of course, as I wrote in the description "the image object still contains a pointer to a CephContext object, that is destroyed together with the RadosClient object during the rados_shutdown() call", so I assumed it's clear that it happens after the cluster connection has been closed. While it's true that the image has never been closed, it is a bit misleading, since the removal of the image succeeds, and one might expect that a successfully removed image can't be open anymore and has been implicitly closed. I would at least expect that the removal of an open image fails, as it does for an opened writable image.

#1) I disagree that your problem description was in any way clear -- hence my request for more information. #2) the C/C++/Python API methods do not have a "remove" method on the Image and it doesn't matter if you remove an image from underneath it when in R/O mode (it won't crash, it just will return all zeros for reads). This is 100% purely a bad assumption from using the go-ceph library.

#5 Updated by Michael Adam almost 4 years ago

Jason Dillaman wrote:

Sven Anderson wrote:

Of course, as I wrote in the description "the image object still contains a pointer to a CephContext object, that is destroyed together with the RadosClient object during the rados_shutdown() call", so I assumed it's clear that it happens after the cluster connection has been closed. While it's true that the image has never been closed, it is a bit misleading, since the removal of the image succeeds, and one might expect that a successfully removed image can't be open anymore and has been implicitly closed. I would at least expect that the removal of an open image fails, as it does for an opened writable image.

#1) I disagree that your problem description was in any way clear -- hence my request for more information.

Jason, I don't understand why you are so negative/dismissive... I think the description was fairly clear. Maybe the problem with the description was that it tried to already provide a certain amount of explanation/analysis along with the problem description.

#2) the C/C++/Python API methods do not have a "remove" method on the Image

Yeah, there is the rbd.remove() method which removes the image by the name you pass in.

and it doesn't matter if you remove an image from underneath it when in R/O mode (it won't crash, it just will return all zeros for reads).

The remove call itself does not crash. But other, later calls do crash. This is what the issue points out. The C/C++/Python API does not protect against this. And the question that this issue raises is whether this is a bug in these APIs or intended behavior. At the very least it should be acknowledged that the behaviour is inconsistent with that of the corresponding calls on a writeable image.

This is 100% purely a bad assumption from using the go-ceph library.

Well, if you read the description thoroughly, you will see that it does not mention the go-ceph library at all. It even provides a C++ test program to demonstrate the problem. (It is true that this was triggered by work on go-ceph, which mainly replicates behaviour of the C++ libs, and Sven eliminated go-ceph from the picture deliberately to be able to concentrate on the core libs. It was you who brought go-ceph into the picture in this issue.) So I am not sure why, under these circumstances, you are claiming it is a bug in the go-ceph lib, when this is not even in the original description... :-)

Instead of picking on go-ceph, you could comment on the C++ test program. It is a program that compiles and demonstrates a problem. Maybe it doesn't use the lib as it is intended to be used, but then, it could still be called bad library design if the lib allows for this bad use and lets the user run into segfaults instead of preventing the bad use and thereby the crash...

So can we agree on the following statements?

  • Removing an open read-only rbd image is successful, which is inconsistent with the behavior for a writeable image.
  • The remove call does not crash but later calls do.
  • This is not related to go-ceph, it happens with the core C++ libs.

If you call this desired behavior, this is a point of view that we can still talk about, but it can't be blamed on go-ceph as far as I understand.

#6 Updated by Jason Dillaman over 3 years ago

Michael Adam wrote:

Jason, I don't understand why you are so negative/dismissive... I think the description was fairly clear. Maybe the problem with the description was that it tried to already provide a certain amount of explanation/analysis along with the problem description.

Michael, if you want to debate this with me, perhaps we can have a face-to-face meeting. I am not trying to be intentionally negative but I truly see this as not a bug in the API. I still don't see a clear reproducer in the problem description. In the first comment I asked for more information and then Sven sent me a dismissive email and at least finally referenced the originating ticket from the go-ceph project. I honestly was concerned that there was an in-the-wild issue where just calling "rbd remove" or just using the "rbd_remove" API would lead to a scenario where the image was opened (internally within the API), deleted, and not properly cleaned up resulting in a crash.

The remove call itself does not crash. But other, later calls do crash. This is what the issue points out. The C/C++/Python API does not protect against this. And the question that this issue raises is whether this is a bug in these APIs or intended behavior. At the very least it should be acknowledged that the behaviour is inconsistent with that of the corresponding calls on a writeable image.

There is no "remove" call in the C/C++/Python API that takes an open image reference pointer like go-ceph API did. When using Sven's approach, I can crash librbd even without calling the "rbd_remove" API -- it's not the relevant portion of the bug. The relevant portion was that he (via go-ceph) didn't close the image and that's how he hit this case.

This is 100% purely a bad assumption from using the go-ceph library.

Well, if you read the description thoroughly, you will see that it does not mention the go-ceph library at all. It even provides a C++ test program to demonstrate the problem. (It is true that this was triggered by work on go-ceph, which mainly replicates behaviour of the C++ libs, and Sven eliminated go-ceph from the picture deliberately to be able to concentrate on the core libs. It was you who brought go-ceph into the picture in this issue.) So I am not sure why, under these circumstances, you are claiming it is a bug in the go-ceph lib, when this is not even in the original description... :-)

Agreed, but this was a bad API in go-ceph that led to bad assumptions for how the library should be used. As already mentioned, if you don't close the image, you will have lots more issues than just a CephContext being dereferenced; i.e. even if you prevented the crash, you just introduced a massive potential memory and thread leak into your program. Since go-ceph will be used by long-running processes like ceph-csi, I'd choose to crash to quickly see that the API was being used incorrectly.

So can we agree on the following statements?

  • Removing an open read-only rbd image is successful, which is inconsistent with the behavior for a writeable image.

Yes, it's different behavior, but not for the reasons listed. There is no harm in reading from an image that is deleted / being deleted. However, if you could write to an image while it's being deleted, it would lead to a leak of backing RADOS objects, since w/o the image there would be no process to clean those up.
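
A minimal sketch of this difference in behavior, again with placeholder pool/image names (the writable open presumably keeps a watch on the image header, so the removal is refused, while the read-only open does not):

#include <iostream>
#include <rados/librados.hpp>
#include <rbd/librbd.hpp>

int main() {
    librados::Rados cluster;
    cluster.init("admin");
    cluster.conf_read_file(nullptr);
    cluster.connect();

    librados::IoCtx ioctx;
    cluster.ioctx_create("rbd", ioctx);

    librbd::RBD rbd;
    int order = 0;
    rbd.create(ioctx, "img", 1 << 22, &order);

    librbd::Image rw;
    rbd.open(ioctx, rw, "img", nullptr);                 // writable open
    std::cout << rbd.remove(ioctx, "img") << std::endl;  // fails: "device is busy" (-EBUSY)
    rw.close();

    librbd::Image ro;
    rbd.open_read_only(ioctx, ro, "img", nullptr);       // read-only open
    std::cout << rbd.remove(ioctx, "img") << std::endl;  // 0: removal succeeds
    ro.close();   // closing before shutdown avoids the crash described above

    ioctx.close();
    cluster.shutdown();
    return 0;
}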

  • The remove call does not crash but later calls do.

Correct -- it was the bad assumption about leaving a dangling in-use image that was never closed. It has nothing to do w/ the "remove" -- it was a red herring in this case, and the background information from the go-ceph ticket provided the needed context.

  • This is not related to go-ceph, it happens with the core C++ libs.

I can crash lots of C/C++ APIs by incorrect usage.

If you call this desired behavior, this is a point of view that we can still talk about, but it can't be blamed on go-ceph as far as I understand.

It was caused by incorrect usage of the C API by go-ceph. Again, I am willing to talk in a better forum than writing back and forth on a ticket.

#7 Updated by Sven Anderson over 3 years ago

Jason Dillaman wrote:

Michael Adam wrote:

Jason, I don't understand why you are so negative/dismissive... I think the description was fairly clear. Maybe the problem with the description was that it tried to already provide a certain amount of explanation/analysis along with the problem description.

Thanks Michael for taking the time looking into this. I appreciate your point of view.

I am not trying to be intentionally negative but I truly see this as not a bug in the API. I still don't see a clear reproducer in the problem description. In the first comment I asked for more information and then Sven sent me a dismissive email and at least finally referenced the originating ticket from the go-ceph project. I honestly was concerned that there was an in-the-wild issue where just calling "rbd remove" or just using the "rbd_remove" API would lead to a scenario where the image was opened (internally within the API), deleted, and not properly cleaned up resulting in a crash.

I just checked the two emails that I wrote to you. I explicitly appreciated your reply, and in both of them I made clear that you might not consider this a bug, and I'm totally fine with that. So a simple "This crash is expected, we don't consider it a bug in the API." would have resolved the whole issue. If there was something dismissive that I'm missing, please let me know so I can avoid that in the future. (Also, as explained in the email, I wrote it before your comment and found it in my draft folder later, so at that time I wasn't even aware of your comment here.)

The test.cpp crashes in the destructor of the Image object, which is called after the cluster handle shutdown, as also shown in the provided backtrace. So you can totally say that's not a bug, but if you say "I still don't see a clear reproducer", I'm quite lost.

There is no "remove" call in the C/C /Python API that takes an open image reference pointer like go-ceph API did. When using Sven's approach, I can crash librbd even without calling the "rbd_remove" API -- it's not the relevant portion of the bug. The relevant portion was that he (via go-ceph) didn't close the image and that's how he hit this case.

My second sentence in this ticket points out that it happens when removing the image without closing it and shutting down the rados handle afterwards. That's not a finding, but a precondition of this crash.

Agreed, but this was a bad API in go-ceph that led to bad assumptions for how the library should be used. As already mentioned, if you don't close the image, you will have lots more issues than just a CephContext being dereferenced; i.e. even if you prevented the crash, you just introduced a massive potential memory and thread leak into your program. Since go-ceph will be used by long-running processes like ceph-csi, I'd choose to crash to quickly see that the API was being used incorrectly.

It would have been great to explain that calmly from the beginning, together with a friendly "not a bug", instead of indulging in an irrelevant go-ceph rant. It's neither important where a "bad" (wrong) assumption comes from, nor is the absence of assumptions a prerequisite for filing an issue. Straightening out wrong assumptions is an inherent part of ticket triage.

Again, I am willing to talk in a better forum than writing back and forth on a ticket.

I agree, since this turns into a discussion about communication styles now, we should continue somewhere else.

#8 Updated by Jason Dillaman over 3 years ago

Feel free to schedule a meeting on my calendar if you are really this bothered. I re-read all my comments and for the life of me, I really can't figure out why you appear to be so upset with me. This bug will remain as a "wont-fix" / "not a bug".
