Bug #58219

Test failure: test_journal_migration (tasks.cephfs.test_journal_migration.TestJournalMigration) [Command crashed: 'ceph-dencoder type JournalPointer import - decode dump_json']

Added by Venky Shankar over 1 year ago. Updated 6 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Correctness/Safety
Target version:
% Done: 0%
Source:
Tags: backport_processed
Backport: reef,quincy,pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): tools
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

/a/vshankar-2022-12-08_04:33:46-fs-wip-vshankar-testing-20221130.043104-testing-default-smithi/7107719/

2022-12-08T05:32:42.974 INFO:tasks.cephfs_test_runner:======================================================================
2022-12-08T05:32:42.974 INFO:tasks.cephfs_test_runner:ERROR: test_journal_migration (tasks.cephfs.test_journal_migration.TestJournalMigration)
2022-12-08T05:32:42.975 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2022-12-08T05:32:42.975 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2022-12-08T05:32:42.975 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_ceph-c_de0d7fdb293c079dc341d1b2f75089d082eb01fc/qa/tasks/cephfs/test_journal_migration.py", line 63, in test_journal_migration
2022-12-08T05:32:42.976 INFO:tasks.cephfs_test_runner:    journal_version = self.fs.get_journal_version()
2022-12-08T05:32:42.976 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_ceph-c_de0d7fdb293c079dc341d1b2f75089d082eb01fc/qa/tasks/cephfs/filesystem.py", line 1198, in get_journal_version
2022-12-08T05:32:42.976 INFO:tasks.cephfs_test_runner:    journal_pointer_dump = self.get_metadata_object("JournalPointer", journal_pointer_object)
2022-12-08T05:32:42.977 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_ceph-c_de0d7fdb293c079dc341d1b2f75089d082eb01fc/qa/tasks/cephfs/filesystem.py", line 1185, in get_metadata_object
2022-12-08T05:32:42.977 INFO:tasks.cephfs_test_runner:    j = self.dencoder(object_type, o)
2022-12-08T05:32:42.977 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_ceph-c_de0d7fdb293c079dc341d1b2f75089d082eb01fc/qa/tasks/cephfs/filesystem.py", line 1154, in dencoder
2022-12-08T05:32:42.978 INFO:tasks.cephfs_test_runner:    p = self.mon_manager.controller.run(args=args, stdin=BytesIO(obj_blob), stdout=BytesIO())
2022-12-08T05:32:42.978 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_teuthology_4da97cf64e542f347ec47b7bdbe5eca99759f9b7/teuthology/orchestra/remote.py", line 525, in run
2022-12-08T05:32:42.979 INFO:tasks.cephfs_test_runner:    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
2022-12-08T05:32:42.979 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_teuthology_4da97cf64e542f347ec47b7bdbe5eca99759f9b7/teuthology/orchestra/run.py", line 455, in run
2022-12-08T05:32:42.979 INFO:tasks.cephfs_test_runner:    r.wait()
2022-12-08T05:32:42.980 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_teuthology_4da97cf64e542f347ec47b7bdbe5eca99759f9b7/teuthology/orchestra/run.py", line 161, in wait
2022-12-08T05:32:42.980 INFO:tasks.cephfs_test_runner:    self._raise_for_status()
2022-12-08T05:32:42.980 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_teuthology_4da97cf64e542f347ec47b7bdbe5eca99759f9b7/teuthology/orchestra/run.py", line 179, in _raise_for_status
2022-12-08T05:32:42.981 INFO:tasks.cephfs_test_runner:    raise CommandCrashedError(command=self.command)
2022-12-08T05:32:42.981 INFO:tasks.cephfs_test_runner:teuthology.exceptions.CommandCrashedError: Command crashed: 'ceph-dencoder type JournalPointer import - decode dump_json'

ceph-dencoder crashed with:

2022-12-08T05:32:31.624 DEBUG:teuthology.orchestra.run.smithi136:> ceph-dencoder type JournalPointer import - decode dump_json
2022-12-08T05:32:31.682 INFO:teuthology.orchestra.run.smithi136.stderr:src/tcmalloc.cc:332] Attempt to free invalid pointer 0x558f14e3a000
2022-12-08T05:32:31.689 DEBUG:teuthology.orchestra.run:got remote process result: None
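
To find other jobs hitting the same signature, the stderr line above can be matched mechanically. The following is a minimal sketch, not part of teuthology or the Ceph QA suite: the function name `find_invalid_frees` and the regular expressions are assumptions derived only from the log lines quoted in this report.

```python
import re

# Matches the tcmalloc abort message as it appears in the stderr lines of this
# report. The pattern is an assumption based on those lines only.
INVALID_FREE = re.compile(
    r"src/tcmalloc\.cc:(?P<line>\d+)\] "
    r"Attempt to free invalid pointer (?P<ptr>0x[0-9a-fA-F]+)"
)

# The remote host name follows "teuthology.orchestra.run." in the log prefix.
HOST = re.compile(r"teuthology\.orchestra\.run\.(\w+)\.stderr")


def find_invalid_frees(log_lines):
    """Return a (host, pointer) pair for every tcmalloc invalid-free line.

    `host` is None when the line does not carry a teuthology stderr prefix.
    """
    hits = []
    for line in log_lines:
        match = INVALID_FREE.search(line)
        if not match:
            continue
        host = HOST.search(line)
        hits.append((host.group(1) if host else None, match.group("ptr")))
    return hits
```

Passing the lines of a saved teuthology.log (for example, an open file object) would return one pair per crashing command, which makes it easier to see whether the failures cluster on particular hosts or distros.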

Related issues

Related to rgw - Bug #59269: test_librgw_file.sh crashes: src/tcmalloc.cc:332] Attempt to free invalid pointer 0x55e8173eebd0 (Pending Backport)
Copied to CephFS - Backport #62903: pacific: Test failure: test_journal_migration (tasks.cephfs.test_journal_migration.TestJournalMigration) [Command crashed: 'ceph-dencoder type JournalPointer import - decode dump_json'] (Rejected)
Copied to CephFS - Backport #62904: reef: Test failure: test_journal_migration (tasks.cephfs.test_journal_migration.TestJournalMigration) [Command crashed: 'ceph-dencoder type JournalPointer import - decode dump_json'] (Rejected)
Copied to CephFS - Backport #62905: quincy: Test failure: test_journal_migration (tasks.cephfs.test_journal_migration.TestJournalMigration) [Command crashed: 'ceph-dencoder type JournalPointer import - decode dump_json'] (Rejected)

History

#1 Updated by Venky Shankar over 1 year ago

Another instance but for a different test - https://pulpito.ceph.com/vshankar-2022-12-08_04:33:46-fs-wip-vshankar-testing-20221130.043104-testing-default-smithi/7107733/

2022-12-08T05:59:58.084 DEBUG:teuthology.orchestra.run.smithi062:> ceph-dencoder type string_wrapper import - decode dump_json
2022-12-08T05:59:58.145 INFO:teuthology.orchestra.run.smithi062.stderr:src/tcmalloc.cc:332] Attempt to free invalid pointer 0x55c46d53a000
2022-12-08T05:59:58.150 DEBUG:teuthology.orchestra.run:got remote process result: None
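
Both failing invocations have the form `ceph-dencoder type <T> import - decode dump_json` with the object blob supplied on stdin. A rough sketch for reproducing this outside teuthology follows. The helper `run_dencoder` and its `cmd_prefix` parameter are hypothetical and not part of the QA suite; the only thing taken from this report is the argument list. It relies on `subprocess` reporting a negative return code when the child is killed by a signal (POSIX only).

```python
import signal
import subprocess


def run_dencoder(type_name, blob, cmd_prefix=("ceph-dencoder",)):
    """Run `ceph-dencoder type <T> import - decode dump_json` with `blob` on stdin.

    Returns (stdout_bytes, signal_name), where signal_name is None unless the
    process was terminated by a signal (e.g. "SIGABRT").
    `cmd_prefix` is a hypothetical hook so that a different binary path or a
    wrapper can be substituted.
    """
    args = list(cmd_prefix) + ["type", type_name, "import", "-", "decode", "dump_json"]
    proc = subprocess.run(
        args,
        input=blob,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    if proc.returncode < 0:
        # On POSIX, a negative return code is the number of the killing signal.
        return proc.stdout, signal.Signals(-proc.returncode).name
    return proc.stdout, None
```

For example, `run_dencoder("JournalPointer", blob)` with a blob fetched from the metadata pool would distinguish a clean decode from a crash without relying on teuthology's `CommandCrashedError`.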

#3 Updated by Venky Shankar over 1 year ago

  • Assignee set to Venky Shankar

#4 Updated by Venky Shankar over 1 year ago

  • Status changed from New to Triaged

#5 Updated by Venky Shankar over 1 year ago

  • Subject changed from Test failure: test_journal_migration (tasks.cephfs.test_journal_migration.TestJournalMigration) to Test failure: test_journal_migration (tasks.cephfs.test_journal_migration.TestJournalMigration) [Command crashed: 'ceph-dencoder type JournalPointer import - decode dump_json']

#6 Updated by Venky Shankar about 1 year ago

Most likely, this is similar to https://bugzilla.redhat.com/show_bug.cgi?id=1569391 (memory is allocated via libc when using aligned_alloc, but freed via tcmalloc).

#7 Updated by Venky Shankar about 1 year ago

This is a build issue which has started to show up recently.

#8 Updated by Venky Shankar about 1 year ago

Tried with a recent build - https://pulpito.ceph.com/vshankar-2022-12-08_04:33:46-fs-wip-vshankar-testing-20221130.043104-testing-default-smithi/

Had to run only ubuntu distro jobs, since centos builds are currently failing (the ceph-dencoder issue was seen in ubuntu distro runs anyway). I'm not seeing this issue anymore, but I'm not willing to close this tracker atm. I'll keep this open until the centos builds are working fine and a full fs suite run is verified.

#10 Updated by Venky Shankar about 1 year ago

  • Status changed from Triaged to Fix Under Review
  • Pull request ID set to 49842

#11 Updated by Casey Bodley 12 months ago

very strange. i just saw this show up in an rgw job against ubuntu 20.04

i booted up an old focal vm to test under vstart, and i'm getting these crashes on startup from most ceph daemons (ceph-mon, ceph-osd, radosgw):

(gdb) bt                                                                                                                                    
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50                                                                   
#1  0x00007fffd0e1a859 in __GI_abort () at abort.c:79                                                                                       
#2  0x00007fffd21391d2 in ?? () from /lib/x86_64-linux-gnu/libtcmalloc.so.4                                                                 
#3  0x00007fffd213aca9 in ?? () from /lib/x86_64-linux-gnu/libtcmalloc.so.4                                                                 
#4  0x00007fffd214fe1d in MallocExtension::Initialize() () from /lib/x86_64-linux-gnu/libtcmalloc.so.4                                      
#5  0x00007fffd2139b1e in ?? () from /lib/x86_64-linux-gnu/libtcmalloc.so.4                                                                 
#6  0x00007ffff7fe0b9a in call_init (l=<optimized out>, argc=argc@entry=8, argv=argv@entry=0x7fffffffdff8, env=env@entry=0x7fffffffe040)    
    at dl-init.c:72                                                                                                                         
#7  0x00007ffff7fe0ca1 in call_init (env=0x7fffffffe040, argv=0x7fffffffdff8, argc=8, l=<optimized out>) at dl-init.c:30                    
#8  _dl_init (main_map=0x7ffff7ffe190, argc=8, argv=0x7fffffffdff8, env=0x7fffffffe040) at dl-init.c:119                                    
#9  0x00007ffff7fd013a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2                                                                
#10 0x0000000000000008 in ?? ()                                                                                                             
#11 0x00007fffffffe361 in ?? ()                                                                                                             
#12 0x00007fffffffe387 in ?? ()                                                                                                             
#13 0x00007fffffffe38e in ?? ()                                                                                                             
#14 0x00007fffffffe391 in ?? ()                                                                                                             
#15 0x00007fffffffe3b4 in ?? ()                                                                                                             
#16 0x00007fffffffe3b7 in ?? ()                                                                                                             
#17 0x00007fffffffe3b9 in ?? ()                                                                                                             
#18 0x00007fffffffe3d8 in ?? ()                                                                                                             
#19 0x0000000000000000 in ?? ()

#12 Updated by Casey Bodley 12 months ago

  • Related to Bug #59269: test_librgw_file.sh crashes: src/tcmalloc.cc:332] Attempt to free invalid pointer 0x55e8173eebd0 added

#14 Updated by Venky Shankar 6 months ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport changed from pacific,quincy to reef,quincy,pacific

Patrick Donnelly wrote:

https://github.com/ceph/ceph/pull/49842#issuecomment-1727875012

Definitely. Probably missed updating the tracker. We should also backport this I guess.

#15 Updated by Backport Bot 6 months ago

  • Copied to Backport #62903: pacific: Test failure: test_journal_migration (tasks.cephfs.test_journal_migration.TestJournalMigration) [Command crashed: 'ceph-dencoder type JournalPointer import - decode dump_json'] added

#16 Updated by Backport Bot 6 months ago

  • Copied to Backport #62904: reef: Test failure: test_journal_migration (tasks.cephfs.test_journal_migration.TestJournalMigration) [Command crashed: 'ceph-dencoder type JournalPointer import - decode dump_json'] added

#17 Updated by Backport Bot 6 months ago

  • Copied to Backport #62905: quincy: Test failure: test_journal_migration (tasks.cephfs.test_journal_migration.TestJournalMigration) [Command crashed: 'ceph-dencoder type JournalPointer import - decode dump_json'] added

#18 Updated by Backport Bot 6 months ago

  • Tags set to backport_processed

#19 Updated by Venky Shankar 6 months ago

  • Status changed from Pending Backport to Resolved
