Actions
Bug #51100
closedobject_recorder->is_closed() assert failure in JournalRecorder::open_object_set() in nautilus
Status:
Won't Fix - EOL
Priority:
Normal
Assignee:
-
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
2021-06-04T10:05:56.894 INFO:tasks.qemu.client.0.smithi187.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.21-317-g829d9986/rpm/el7/BUILD/ceph-14.2.21-317-g829d9986/src/journal/JournalRecorder.cc: In function 'void journal::JournalRecorder::open_object_set()' thread 7f6864d3d700 time 2021-06-04 10:05:56.900984 2021-06-04T10:05:56.895 INFO:tasks.qemu.client.0.smithi187.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.21-317-g829d9986/rpm/el7/BUILD/ceph-14.2.21-317-g829d9986/src/journal/JournalRecorder.cc: 266: FAILED ceph_assert(object_recorder->is_closed()) 2021-06-04T10:05:56.896 INFO:tasks.qemu.client.0.smithi187.stderr: ceph version 14.2.21-317-g829d9986445 (829d99864457b1c6cc2479d2b96f1a1ddf99f265) nautilus (stable) 2021-06-04T10:05:56.896 INFO:tasks.qemu.client.0.smithi187.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x7f687dd283e7] 2021-06-04T10:05:56.896 INFO:tasks.qemu.client.0.smithi187.stderr: 2: (()+0x25d5af) [0x7f687dd285af] 2021-06-04T10:05:56.896 INFO:tasks.qemu.client.0.smithi187.stderr: 3: (()+0x37437d) [0x7f688bfce37d] 2021-06-04T10:05:56.896 INFO:tasks.qemu.client.0.smithi187.stderr: 4: (()+0x374f84) [0x7f688bfcef84] 2021-06-04T10:05:56.897 INFO:tasks.qemu.client.0.smithi187.stderr: 5: (()+0xabbc9) [0x7f688bd05bc9] 2021-06-04T10:05:56.897 INFO:tasks.qemu.client.0.smithi187.stderr: 6: (()+0x387868) [0x7f688bfe1868] 2021-06-04T10:05:56.897 INFO:tasks.qemu.client.0.smithi187.stderr: 7: (()+0x918b6) [0x7f688b9a08b6] 2021-06-04T10:05:56.897 INFO:tasks.qemu.client.0.smithi187.stderr: 8: (()+0x593b9) [0x7f688b9683b9] 2021-06-04T10:05:56.898 INFO:tasks.qemu.client.0.smithi187.stderr: 9: (Finisher::finisher_thread_entry()+0x16f) [0x7f687ddb345f] 2021-06-04T10:05:56.898 INFO:tasks.qemu.client.0.smithi187.stderr: 10: (()+0x7ea5) [0x7f688c751ea5] 2021-06-04T10:05:56.898 INFO:tasks.qemu.client.0.smithi187.stderr: 11: (clone()+0x6d) [0x7f6888fce9fd]
253 void JournalRecorder::open_object_set() { 254 ceph_assert(m_lock.is_locked()); 255 256 ldout(m_cct, 10) << "opening object set " << m_current_set << dendl; 257 258 uint8_t splay_width = m_journal_metadata->get_splay_width(); 259 260 lock_object_recorders(); 261 for (ObjectRecorderPtrs::iterator it = m_object_ptrs.begin(); 262 it != m_object_ptrs.end(); ++it) { 263 ObjectRecorderPtr object_recorder = it->second; 264 uint64_t object_number = object_recorder->get_object_number(); 265 if (object_number / splay_width != m_current_set) { 266 ceph_assert(object_recorder->is_closed()); 267 268 // ready to close object and open object in active set 269 create_next_object_recorder(object_recorder); 270 } 271 } 272 unlock_object_recorders(); 273 }
Updated by Ilya Dryomov almost 3 years ago
(gdb) bt #0 0x00007f6888f063d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55 #1 0x00007f6888f07ac8 in __GI_abort () at abort.c:90 #2 0x00007f687dd28436 in ceph::__ceph_assert_fail (assertion=<optimized out>, file=<optimized out>, line=<optimized out>, func=<optimized out>) at /usr/src/debug/ceph-14.2.21-317-g829d9986/src/common/assert.cc:73 #3 0x00007f687dd285af in ceph::__ceph_assert_fail (ctx=...) at /usr/src/debug/ceph-14.2.21-317-g829d9986/src/common/assert.cc:78 #4 0x00007f688bfce37d in journal::JournalRecorder::open_object_set() () at /usr/src/debug/ceph-14.2.21-317-g829d9986/src/journal/ObjectRecorder.h:65 #5 0x00007f688bfcef84 in journal::JournalRecorder::handle_advance_object_set(int) () at /usr/src/debug/ceph-14.2.21-317-g829d9986/src/journal/JournalRecorder.cc:244 #6 0x00007f688bd05bc9 in Context::complete (this=0x5573cbaa0d30, r=<optimized out>) at /usr/src/debug/ceph-14.2.21-317-g829d9986/src/include/Context.h:77 #7 0x00007f688bfe1868 in complete (r=<optimized out>, this=0x5573c5f5e840) at /usr/src/debug/ceph-14.2.21-317-g829d9986/src/include/Context.h:77 #8 journal::utils::rados_ctx_callback (c=<optimized out>, arg=0x5573c5f5e840) at /usr/src/debug/ceph-14.2.21-317-g829d9986/src/journal/Utils.cc:21 #9 0x00007f688b9a08b6 in librados::C_AioComplete::finish (this=0x5573c5048d50, r=<optimized out>) at /usr/src/debug/ceph-14.2.21-317-g829d9986/src/librados/AioCompletionImpl.h:172 #10 0x00007f688b9683b9 in Context::complete (this=0x5573c5048d50, r=<optimized out>) at /usr/src/debug/ceph-14.2.21-317-g829d9986/src/include/Context.h:77 #11 0x00007f687ddb345f in Finisher::finisher_thread_entry() () at /usr/src/debug/ceph-14.2.21-317-g829d9986/src/common/Finisher.cc:67 #12 0x00007f688c751ea5 in start_thread (arg=0x7f6864d3d700) at pthread_create.c:307 #13 0x00007f6888fce9fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Updated by Ilya Dryomov almost 3 years ago
(gdb) p ((journal::ObjectRecorder *)0x5573c5cd63c0)->m_object_closed $22 = true (gdb) p ((journal::ObjectRecorder *)0x5573c5cd63c0)->m_in_flight_appends Python Exception <class 'gdb.error'> There is no member or method named _M_value_field.: $23 = std::map with 18 elements
Updated by Mykola Golub almost 3 years ago
- Status changed from New to Duplicate
The crash was observed in nautilus and I believe it is fixed in newer versions with this PR [1] as a non-related issue caught when testing that PR. See the comment [2] with the same backtrace. The PR [1] is still not backported to nautilus [3].
[1] https://github.com/ceph/ceph/pull/31392
[2] https://github.com/ceph/ceph/pull/31392#issuecomment-553297734
[3] https://tracker.ceph.com/issues/42953
Updated by Ilya Dryomov almost 3 years ago
- Status changed from Duplicate to New
This assert is still being hit with https://github.com/ceph/ceph/pull/41748 included:
src/journal/JournalRecorder.cc: In function 'void journal::JournalRecorder::open_object_set()' thread 7f278896e700 time 2021-06-08 23:28:33.884593 src/journal/JournalRecorder.cc: 267: FAILED ceph_assert(object_recorder->is_closed())
And also a new one:
src/journal/ObjectRecorder.cc: In function 'void journal::ObjectRecorder::notify_handler_unlock(std::shared_ptr<Mutex>&, bool)' thread 7f7efb1cb700 time 2021-06-08 23:31:58.202888 src/journal/ObjectRecorder.cc: 367: FAILED ceph_assert(m_in_flight_callbacks)
Updated by Ilya Dryomov almost 3 years ago
It doesn't reproduce on master though, so it is definitely a missing backport. Perhaps https://github.com/ceph/ceph/pull/31392 was only part of the fix?
Updated by Ilya Dryomov almost 3 years ago
- Related to Backport #42953: nautilus: unittest_journal failure during make check added
Updated by Ilya Dryomov almost 3 years ago
- Related to deleted (Backport #42953: nautilus: unittest_journal failure during make check)
Updated by Ilya Dryomov almost 3 years ago
- Related to Bug #42598: unittest_journal failure during make check added
Updated by Ilya Dryomov almost 3 years ago
- Related to Bug #47880: [journal] object recorder can race while lock is temporarily release for callbacks added
Updated by Ilya Dryomov almost 3 years ago
- Subject changed from object_recorder->is_closed() assert failure in JournalRecorder::open_object_set() to object_recorder->is_closed() assert failure in JournalRecorder::open_object_set() in nautilus
Updated by Ilya Dryomov about 2 years ago
- Status changed from New to Won't Fix - EOL
nautilus is EOL
Actions