Project

General

Profile

Bug #57119

Heap command prints with "ceph tell", but not with "ceph daemon"

Added by Laura Flores about 1 year ago. Updated 12 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

100%

Source:
Tags:
low-hanging-fruit
Backport:
quincy,pacific
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

How to reproduce:
  1. Start a vstart cluster, or access any working cluster
  2. Run `ceph tell <daemon>.<id> heap <heapcmd>` and `ceph daemon <daemon>.<id> heap <heapcmd>` on any OSD or MDS daemon (Same will apply to monitors after https://github.com/ceph/ceph/pull/47184 is merged.)
  3. Compare the difference in output

Example with osd.0 on a vstart cluster:

$ ./bin/ceph tell osd.0 heap stats
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2022-08-10T21:28:49.975+0000 7f11d2205700 -1 WARNING: all dangerous and experimental features are enabled.
2022-08-10T21:28:49.984+0000 7f11d2205700 -1 WARNING: all dangerous and experimental features are enabled.
osd.0 tcmalloc heap stats:------------------------------------------------
MALLOC:      373675112 (  356.4 MiB) Bytes in use by application
MALLOC: +            0 (    0.0 MiB) Bytes in page heap freelist
MALLOC: +      3519280 (    3.4 MiB) Bytes in central cache freelist
MALLOC: +      9373952 (    8.9 MiB) Bytes in transfer cache freelist
MALLOC: +      8892264 (    8.5 MiB) Bytes in thread cache freelists
MALLOC: +      3014656 (    2.9 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =    398475264 (  380.0 MiB) Actual memory used (physical + swap)
MALLOC: +      1826816 (    1.7 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =    400302080 (  381.8 MiB) Virtual address space used
MALLOC:
MALLOC:           3557              Spans in use
MALLOC:             81              Thread heaps in use
MALLOC:           8192              Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.

$ ./bin/ceph daemon osd.0 heap stats
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***

Expected result:
The ideal result should be that the heap command prints in both the "tell" version and "daemon" version.

NOTE:
It is important to maintain the string output in the message command reply, so manager modules can also access this output.
For example, after running `ceph tell/daemon osd.0 heap stats` on a vstart cluster, you should see this result in out/osd.0.log:

2022-08-10T21:06:10.931+0000 7f64394d1700 20 osd.0 97 _dispatch 0x557db760f8c0 command(tid 2: {"prefix": "heap", "heapcmd": "stats"}) v1
2022-08-10T21:06:10.931+0000 7f644f7d4700  1 -- [v2:172.21.5.151:6802/1112500,v1:172.21.5.151:6803/1112500] --> 172.21.5.151:0/4089943627 -- command_reply(tid 2: 0 osd.0 tcmalloc heap stats:------------------------------------------------
MALLOC:      372563288 (  355.3 MiB) Bytes in use by application
MALLOC: +            0 (    0.0 MiB) Bytes in page heap freelist
MALLOC: +      3332552 (    3.2 MiB) Bytes in central cache freelist
MALLOC: +      9226752 (    8.8 MiB) Bytes in transfer cache freelist
MALLOC: +      8879840 (    8.5 MiB) Bytes in thread cache freelists
MALLOC: +      3014656 (    2.9 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =    397017088 (  378.6 MiB) Actual memory used (physical + swap)
MALLOC: +       139264 (    0.1 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =    397156352 (  378.8 MiB) Virtual address space used
MALLOC:
MALLOC:           3390              Spans in use
MALLOC:             73              Thread heaps in use
MALLOC:           8192              Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.
) v1 -- 0x557dce502b00 con 0x557dc732e800


Related issues

Copied to RADOS - Backport #57312: quincy: Heap command prints with "ceph tell", but not with "ceph daemon" Resolved
Copied to RADOS - Backport #57313: pacific: Heap command prints with "ceph tell", but not with "ceph daemon" Resolved

History

#1 Updated by Vikhyat Umrao about 1 year ago

root@f23-h17-000-6048r:/var/run/ceph
# ceph daemon osd.98 heap stats
Can't get admin socket path: unable to get conf option admin_socket for osd: b"error parsing 'osd': expected string of the form TYPE.ID, valid types are: auth, mon, osd, mds, mgr, client\n" 

root@f23-h17-000-6048r:/var/run/ceph
# ceph tell osd.98 heap stats
osd.98 tcmalloc heap stats:------------------------------------------------
MALLOC:     3725949760 ( 3553.3 MiB) Bytes in use by application
MALLOC: +      1056768 (    1.0 MiB) Bytes in page heap freelist
MALLOC: +    370342848 (  353.2 MiB) Bytes in central cache freelist
MALLOC: +     16070656 (   15.3 MiB) Bytes in transfer cache freelist
MALLOC: +     84439296 (   80.5 MiB) Bytes in thread cache freelists
MALLOC: +     33554432 (   32.0 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =   4231413760 ( 4035.4 MiB) Actual memory used (physical + swap)
MALLOC: +   4402339840 ( 4198.4 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =   8633753600 ( 8233.8 MiB) Virtual address space used
MALLOC:
MALLOC:         389490              Spans in use
MALLOC:             39              Thread heaps in use
MALLOC:           8192              Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.

root@f23-h17-000-6048r:/var/run/ceph
# ceph --admin-daemon /var/run/ceph/0b67fc9a-1cc2-11ed-8fb3-000af7995d6c/ceph-osd.98.asok heap stats

root@f23-h17-000-6048r:/var/run/ceph
# ceph --admin-daemon /var/run/ceph/0b67fc9a-1cc2-11ed-8fb3-000af7995d6c/ceph-osd.98.asok heap stats

#2 Updated by Vikhyat Umrao about 1 year ago

root@f28-h21-000-r630:~
# ceph tell mon.`hostname -s` heap stats
Error ENOENT: problem getting command descriptions from mon.f28-h21-000-r630

root@f28-h21-000-r630:~
# ceph tell mon.`hostname` heap stats
mon.f28-h21-000-r630.rdu2.scalelab.redhat.com tcmalloc heap stats:------------------------------------------------
MALLOC:      482852640 (  460.5 MiB) Bytes in use by application
MALLOC: +            0 (    0.0 MiB) Bytes in page heap freelist
MALLOC: +     23908208 (   22.8 MiB) Bytes in central cache freelist
MALLOC: +     14829824 (   14.1 MiB) Bytes in transfer cache freelist
MALLOC: +     39135344 (   37.3 MiB) Bytes in thread cache freelists
MALLOC: +      3407872 (    3.2 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =    564133888 (  538.0 MiB) Actual memory used (physical + swap)
MALLOC: +     89579520 (   85.4 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =    653713408 (  623.4 MiB) Virtual address space used
MALLOC:
MALLOC:          15825              Spans in use
MALLOC:             22              Thread heaps in use
MALLOC:           8192              Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.

root@f28-h21-000-r630:~
# ceph --admin-daemon /var/run/ceph/0b67fc9a-1cc2-11ed-8fb3-000af7995d6c/ceph-mon.f28-h21-000-r630.rdu2.scalelab.redhat.com.asok heap stats
mon.f28-h21-000-r630.rdu2.scalelab.redhat.com tcmalloc heap stats:------------------------------------------------
MALLOC:      482828048 (  460.5 MiB) Bytes in use by application
MALLOC: +            0 (    0.0 MiB) Bytes in page heap freelist
MALLOC: +     24090272 (   23.0 MiB) Bytes in central cache freelist
MALLOC: +     14968064 (   14.3 MiB) Bytes in transfer cache freelist
MALLOC: +     38815056 (   37.0 MiB) Bytes in thread cache freelists
MALLOC: +      3407872 (    3.2 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =    564109312 (  538.0 MiB) Actual memory used (physical + swap)
MALLOC: +     89604096 (   85.5 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =    653713408 (  623.4 MiB) Virtual address space used
MALLOC:
MALLOC:          15825              Spans in use
MALLOC:             22              Thread heaps in use
MALLOC:           8192              Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.

root@f28-h21-000-r630:~
# ceph daemon mon.`hostname` heap stats
admin_socket: exception getting command descriptions: [Errno 2] No such file or directory

#3 Updated by Laura Flores about 1 year ago

Appending the stringstream to the "outbl" bufferlist here makes the output print with the "daemon" version, but it is printed twice with the "tell" version. So, not the ideal solution, but clearly the outbl needs to have the stringstream appended to it.
https://github.com/ceph/ceph/blob/042ca579c2e4fece1853503d982b6afc0a0478dc/src/osd/OSD.cc#L3163-L3165

 out:
+  outbl.append(ss.str());
  on_finish(ret, ss.str(), outbl);
}

The same change "works" for mds: https://github.com/ceph/ceph/blob/042ca579c2e4fece1853503d982b6afc0a0478dc/src/mds/MDSDaemon.cc#L209-L211

#4 Updated by Radoslaw Zarzynski about 1 year ago

The interface between `outbl` and the formatter is described at the beginning of the function:

void OSD::asok_command(
  std::string_view prefix, const cmdmap_t& cmdmap,
  Formatter *f,
  const bufferlist& inbl,
  std::function<void(int,const std::string&,bufferlist&)> on_finish)
{
  int ret = 0;
  stringstream ss;   // stderr error message stream
  bufferlist outbl;  // if empty at end, we'll dump formatter as output

The implementation of the heap family of commands doesn't use outbl at all:

  else if (prefix == "heap") {
    ret = ceph::osd_cmds::heap(*cct, cmdmap, *f, ss);
  }

However, it takes the ss (intended to be a counterpart of stderr). Maybe cmds::heap() print there instead of formatted?

#5 Updated by Radoslaw Zarzynski about 1 year ago

Oops, this is the case:

int heap(CephContext& cct, const cmdmap_t& cmdmap, Formatter& f,
         std::ostream& os)
{
  // ...
  ceph_heap_profiler_handle_command(cmd_vec, os);

  return 0;
}

It looks we have a misunderstanding between the caller and callee.

#6 Updated by Radoslaw Zarzynski about 1 year ago

Similar problem may affect e.g. flush_store_cache:

  } else if (prefix == "flush_store_cache") {
    store->flush_cache(&ss);

#7 Updated by Radoslaw Zarzynski about 1 year ago

There is another thing we should unify: daemon and tell differently treat ss (the error stream) when the error code says there was no error.

#8 Updated by Radoslaw Zarzynski about 1 year ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 47650

#9 Updated by Radoslaw Zarzynski about 1 year ago

  hook->call_async(
    prefix, cmdmap, f, inbl,
    [f, on_finish](int r, const std::string& err, bufferlist& out) {
      // handle either existing output in bufferlist *or* via formatter
      if (r >= 0 && out.length() == 0) {
        f->flush(out);
      } 
      delete f;
      on_finish(r, err, out);
    });

For tell we always forward the error stuff to mon:

void AdminSocket::do_tell_queue()
{
  ldout(m_cct,10) << __func__ << dendl;
  std::list<cref_t<MCommand>> q; 
  std::list<cref_t<MMonCommand>> lq;
  { 
    std::lock_guard l(tell_lock);
    q.swap(tell_queue);
    lq.swap(tell_legacy_queue);
  }
  for (auto& m : q) {
    bufferlist outbl;
    execute_command(
      m->cmd,
      m->get_data(),
      [m](int r, const std::string& err, bufferlist& outbl) {
        auto reply = new MCommandReply(r, err);
        reply->set_tid(m->get_tid());
        reply->set_data(outbl);
#ifdef WITH_SEASTAR
        // TODO: crimson: handle asok commmand from alien thread
#else
        m->get_connection()->send_message(reply);
#endif
      });
  }
  for (auto& m : lq) {
    bufferlist outbl;
    execute_command(
      m->cmd,
      m->get_data(),
      [m](int r, const std::string& err, bufferlist& outbl) {
        auto reply = new MMonCommandAck(m->cmd, r, err, 0);
        reply->set_tid(m->get_tid());
        reply->set_data(outbl);
#ifdef WITH_SEASTAR
        // TODO: crimson: handle asok commmand from alien thread
#else
        m->get_connection()->send_message(reply);
#endif
      });
  }
}

For deamon we make use the error stream only if return code is < 0:

void AdminSocket::do_accept()
{
  // ...
  std::vector<std::string> cmdvec = { c };
  bufferlist empty, out;
  ostringstream err;
  int rval = execute_command(cmdvec, empty /* inbl */, err, &out);

  // Unfortunately, the asok wire protocol does not let us pass an error code,
  // and many asok command implementations return helpful error strings.  So,
  // let's prepend an error string to the output if there is an error code.
  if (rval < 0) {
    ostringstream ss;
    ss << "ERROR: " << cpp_strerror(rval) << "\n";
    ss << err.str() << "\n";
    bufferlist o;
    o.append(ss.str());
    o.claim_append(out);
    out.claim_append(o);
  }
  // ...

#10 Updated by Laura Flores about 1 year ago

  • Backport set to quincy,pacific

#11 Updated by Laura Flores about 1 year ago

  • Status changed from Fix Under Review to Pending Backport

Setting to "Pending backport" briefly so I can create backport trackers.

#12 Updated by Laura Flores about 1 year ago

  • Copied to Backport #57312: quincy: Heap command prints with "ceph tell", but not with "ceph daemon" added

#13 Updated by Laura Flores about 1 year ago

  • Copied to Backport #57313: pacific: Heap command prints with "ceph tell", but not with "ceph daemon" added

#14 Updated by Laura Flores about 1 year ago

  • Tags changed from low-hanging-fruit to low-hanging-fruit backport_processed

#15 Updated by Laura Flores about 1 year ago

  • Status changed from Pending Backport to Fix Under Review

#17 Updated by Yaarit Hatuka about 1 year ago

  • Status changed from Fix Under Review to Pending Backport

#18 Updated by Konstantin Shalygin 12 months ago

  • Status changed from Pending Backport to Resolved
  • % Done changed from 0 to 100
  • Tags changed from low-hanging-fruit backport_processed to low-hanging-fruit

Also available in: Atom PDF