
Bug #20504

FileJournal: fd leak leads to assert failure in FileJournal::~FileJournal()

Added by Honggang Yang over 1 year ago. Updated over 1 year ago.

Status: Resolved
Priority: High
Assignee:
Category: Correctness/Safety
Target version: -
Start date: 07/05/2017
Due date:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: rados
Component(RADOS): OSD
Pull request ID:

Description

1. Description

    [root@yhg-1 work]# file 1498638564.27426.core
    1498638564.27426.core: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from 'ceph-objectstore-tool --data-path /var/lib/ceph/osd/xtao-1 --journal-path /dev/'
    (gdb) bt
    #0 0x00007fd74b1f4fcb in raise () from /lib64/libpthread.so.0
    #1 0x00007fd74da482d5 in reraise_fatal (signum=6) at global/signal_handler.cc:71
    #2 handle_fatal_signal (signum=6) at global/signal_handler.cc:133
    #3 <signal handler called>
    #4 0x00007fd749a9e5f7 in raise () from /lib64/libc.so.6
    #5 0x00007fd749a9fce8 in abort () from /lib64/libc.so.6
    #6 0x00007fd74db350d7 in ceph::__ceph_assert_fail (assertion=assertion@entry=0x7fd74dcfa379 "fd == -1",
        file=file@entry=0x7fd74dcfa54f "os/filestore/FileJournal.h", line=line@entry=440,
        func=func@entry=0x7fd74dcfc5e0 <FileJournal::~FileJournal()::__PRETTY_FUNCTION__> "virtual FileJournal::~FileJournal()")
        at common/assert.cc:78
    #7 0x00007fd74d78c0a0 in FileJournal::~FileJournal (this=0x7fd7587a8600, __in_chrg=<optimized out>)
        at os/filestore/FileJournal.h:440
    #8 0x00007fd74d78c0c9 in FileJournal::~FileJournal (this=0x7fd7587a8600, __in_chrg=<optimized out>)
        at os/filestore/FileJournal.h:443
    #9 0x00007fd74d851e72 in JournalingObjectStore::journal_replay (this=this@entry=0x7fd758770000, fs_op_seq=<optimized out>)
        at os/filestore/JournalingObjectStore.cc:61
    #10 0x00007fd74d829db6 in FileStore::mount (this=0x7fd758770000) at os/filestore/FileStore.cc:1628
    #11 0x00007fd74d41abdc in main (argc=<optimized out>, argv=0x7fff0041f7f8) at tools/ceph_objectstore_tool.cc:2479
    (gdb) f 9
    #9 0x00007fd74d851e72 in JournalingObjectStore::journal_replay (this=this@entry=0x7fd758770000, fs_op_seq=<optimized out>)
        at os/filestore/JournalingObjectStore.cc:61
    61          delete journal;
    (gdb) l
    56
    57        int err = journal->open(op_seq);
    58        if (err < 0) {
    59          dout(3) << "journal_replay open failed with "
    60                  << cpp_strerror(err) << dendl;
    61          delete journal;
    62          journal = 0;
    63          return err;
    64        }
    65

2. Reason

I mistakenly specified another OSD's journal as the journal path, so FileJournal::open() found an on-disk fsid that did not match the expected one and returned -EINVAL:

int FileJournal::open(uint64_t fs_op_seq)
{
  dout(2) << "open " << fn << " fsid " << fsid << " fs_op_seq " << fs_op_seq << dendl;

  uint64_t next_seq = fs_op_seq + 1;

  int err = _open(false);
  if (err)
    return err;
  ...
  if (header.fsid != fsid) {
    derr << "FileJournal::open: ondisk fsid " << header.fsid << " doesn't match expected " << fsid
         << ", invalid (someone else's?) journal" << dendl;
    return -EINVAL;   // fd was opened by _open() above and is not closed here
  }
  ...
}
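The leak could be plugged by making every error return that follows a successful _open() close the descriptor first. Below is a minimal sketch of that goto-cleanup pattern under stated assumptions: `MiniJournal`, its `open(ondisk_fsid)` parameter (standing in for reading header.fsid from disk), `close()` and `get_fd()` are hypothetical simplifications of FileJournal, not Ceph code and not the patch that resolved this issue. Only POSIX open()/close() are real APIs here.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <fcntl.h>
#include <string>
#include <unistd.h>
#include <utility>

// Hypothetical, heavily simplified stand-in for FileJournal (not Ceph code).
class MiniJournal {
public:
  MiniJournal(std::string path, uint64_t expected_fsid)
    : fn(std::move(path)), fsid(expected_fsid) {}

  ~MiniJournal() {
    assert(fd == -1);  // same invariant that ~FileJournal() checks
  }

  // ondisk_fsid stands in for header.fsid, which the real code reads from disk.
  int open(uint64_t ondisk_fsid) {
    int err = _open();
    if (err)
      return err;        // _open() failed, so fd is already -1
    if (ondisk_fsid != fsid) {
      err = -EINVAL;     // someone else's journal
      goto out;          // fd is open here: never return directly
    }
    return 0;
  out:
    ::close(fd);         // single cleanup point for every post-_open() error
    fd = -1;             // restore the destructor's invariant
    return err;
  }

  // Explicit close on the success path, mirroring the open/close lifecycle.
  void close() {
    if (fd >= 0) {
      ::close(fd);
      fd = -1;
    }
  }

  int get_fd() const { return fd; }

private:
  int _open() {
    fd = ::open(fn.c_str(), O_RDONLY);
    return fd < 0 ? -errno : 0;
  }

  std::string fn;
  uint64_t fsid;
  int fd = -1;
};
```

For example, constructing a `MiniJournal` on /dev/null with fsid 1 and calling `open(2)` returns -EINVAL and leaves `get_fd()` at -1, so deleting the object no longer trips the assert. Funnelling all post-_open() failures through one `out:` label keeps the cleanup in one place as more error checks are added.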

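Because open() returns -EINVAL after _open() has already opened the journal, fd is left open (fd != -1). The caller, JournalingObjectStore::journal_replay(), then deletes the journal: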

    int JournalingObjectStore::journal_replay(uint64_t fs_op_seq)
    {
      ...
      int err = journal->open(op_seq);   // returns -EINVAL, with fd != -1
      if (err < 0) {
        dout(3) << "journal_replay open failed with "
                << cpp_strerror(err) << dendl;
        delete journal;   // assert fails in ~FileJournal()
        journal = 0;
        return err;
      }
      ...
    }

The assertion that fails in ~FileJournal():

  ~FileJournal() {
    assert(fd == -1);
    delete[] zero_buf;
    g_conf->remove_observer(this);
  }
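An alternative design, assumed here and not what FileJournal does, is to let an RAII owner close the descriptor on every early return, so a forgotten cleanup path cannot leak it. `ScopedFd` and `open_checked()` are hypothetical names, and the fsid values are passed in directly instead of being read from a journal header; only POSIX open()/close() and `std::exchange` are real APIs.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <fcntl.h>
#include <unistd.h>
#include <utility>

// Hypothetical move-only owner of a POSIX file descriptor.
class ScopedFd {
public:
  ScopedFd() = default;
  explicit ScopedFd(int fd) : fd_(fd) {}
  ~ScopedFd() { reset(); }

  ScopedFd(const ScopedFd&) = delete;
  ScopedFd& operator=(const ScopedFd&) = delete;

  ScopedFd(ScopedFd&& other) noexcept : fd_(other.release()) {}
  ScopedFd& operator=(ScopedFd&& other) noexcept {
    if (this != &other)
      reset(other.release());
    return *this;
  }

  int get() const { return fd_; }
  bool valid() const { return fd_ >= 0; }

  // Give up ownership without closing.
  int release() { return std::exchange(fd_, -1); }

  // Close the current descriptor (if any), then own new_fd.
  void reset(int new_fd = -1) {
    if (fd_ >= 0)
      ::close(fd_);
    fd_ = new_fd;
  }

private:
  int fd_ = -1;
};

// Hypothetical open routine: every early return closes the descriptor
// automatically; only the success path hands ownership to the caller.
int open_checked(const char* path, uint64_t expected_fsid,
                 uint64_t ondisk_fsid, ScopedFd& out) {
  ScopedFd tmp(::open(path, O_RDONLY));
  if (!tmp.valid())
    return -errno;
  if (ondisk_fsid != expected_fsid)
    return -EINVAL;            // tmp's destructor closes the fd here
  out = std::move(tmp);        // success: transfer ownership
  return 0;
}
```

The trade-off: the assert in ~FileJournal() turns a missing close into a crash that points straight at the bug, as in this report, whereas an RAII owner silently closes the descriptor, so a lifecycle mistake of that kind would no longer be reported.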

History

#1 Updated by Kefu Chai over 1 year ago

  • Category set to Correctness/Safety
  • Status changed from New to Need Review
  • Assignee changed from Samuel Just to Honggang Yang
  • Release deleted (jewel)
  • Release set to master
  • Component(RADOS) OSD added
  • Component(RADOS) deleted (objectstore-tool)

#2 Updated by Kefu Chai over 1 year ago

  • Status changed from Need Review to Resolved
