Bug #64155
open
Coredumps not available in RGW Crashes
Description
This run hit a segfault, but its archive does not contain a coredump:
(this file is over 2 million lines long and would be best viewed on a sepia machine)
Looking through the logging statements in the teuthology log and the teuthology source code, I see this function is not executed.
https://github.com/ceph/teuthology/blob/main/teuthology/task/internal/__init__.py#L302
Upon a quick review, it's unclear whether this conditional is failing to evaluate to true:
https://github.com/ceph/teuthology/blob/main/teuthology/task/internal/__init__.py#L308
or if the function is just never getting called here:
https://github.com/ceph/teuthology/blob/main/teuthology/task/internal/__init__.py#L420
Updated by Zack Cerza 4 months ago
fetch_binaries_for_coredumps runs locally, after any coredumps have been fetched from the remotes. The job's archive doesn't contain any coredumps at all, indicating that they weren't present on the remote to begin with.
We tell the kernel to place coredumps in /home/ubuntu/cephtest/archive/coredump/. If you've got a reliable segfault reproducer, it might be worth investigating whether the coredumps are being written to disk on the remote at all.
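One way to check this on a remote is sketched below. It is only an illustration, not part of teuthology: the helper name coredump_report and its return keys are hypothetical. It reads the kernel's core_pattern, reports the core size limit of the calling process (the crashing daemon may run with a different limit), and lists whatever is in the coredump directory.

```python
import os
import resource
from pathlib import Path


def coredump_report(core_dir, pattern_file="/proc/sys/kernel/core_pattern"):
    """Summarize whether cores could be landing in core_dir.

    Hypothetical helper for debugging; not a teuthology API.
    """
    # The kernel's current core_pattern, or None if the file is unreadable here.
    pattern_path = Path(pattern_file)
    pattern = pattern_path.read_text().strip() if pattern_path.exists() else None

    # Soft RLIMIT_CORE of *this* process; 0 means core files are disabled.
    # The daemon that segfaults may have been started with a different limit.
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_CORE)

    directory = Path(core_dir)
    dir_exists = directory.is_dir()
    cores = sorted(p.name for p in directory.iterdir()) if dir_exists else []

    return {
        "core_pattern": pattern,
        "pattern_targets_dir": bool(pattern) and pattern.startswith(str(directory)),
        "core_size_limit": soft_limit,
        "dir_exists": dir_exists,
        "dir_writable": dir_exists and os.access(directory, os.W_OK),
        "cores": cores,
    }


if __name__ == "__main__":
    # Directory taken from the comment above.
    report = coredump_report("/home/ubuntu/cephtest/archive/coredump/")
    for key, value in report.items():
        print(f"{key}: {value}")
```

Running this on the remote before and after triggering the segfault should show whether the pattern points at the archive directory, whether that directory exists and is writable, and whether any file appears in it.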
Updated by Ali Maredia 4 months ago
I should have a reproducer, since that run with a segfault happened yesterday. How would I investigate whether the coredumps are being written to disk on the remote?
Should I lock machines, or should I add debugging to a teuthology task to inspect where the coredumps went?
- Ali