Bug #57119
Heap command prints with "ceph tell", but not with "ceph daemon"
100%
Description
- Start a vstart cluster, or access any working cluster
- Run `ceph tell <daemon>.<id> heap <heapcmd>` and `ceph daemon <daemon>.<id> heap <heapcmd>` on any OSD or MDS daemon (Same will apply to monitors after https://github.com/ceph/ceph/pull/47184 is merged.)
- Compare the difference in output
Example with osd.0 on a vstart cluster:
$ ./bin/ceph tell osd.0 heap stats
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2022-08-10T21:28:49.975+0000 7f11d2205700 -1 WARNING: all dangerous and experimental features are enabled.
2022-08-10T21:28:49.984+0000 7f11d2205700 -1 WARNING: all dangerous and experimental features are enabled.
osd.0 tcmalloc heap stats:------------------------------------------------
MALLOC: 373675112 ( 356.4 MiB) Bytes in use by application
MALLOC: + 0 ( 0.0 MiB) Bytes in page heap freelist
MALLOC: + 3519280 ( 3.4 MiB) Bytes in central cache freelist
MALLOC: + 9373952 ( 8.9 MiB) Bytes in transfer cache freelist
MALLOC: + 8892264 ( 8.5 MiB) Bytes in thread cache freelists
MALLOC: + 3014656 ( 2.9 MiB) Bytes in malloc metadata
MALLOC: ------------
MALLOC: = 398475264 ( 380.0 MiB) Actual memory used (physical + swap)
MALLOC: + 1826816 ( 1.7 MiB) Bytes released to OS (aka unmapped)
MALLOC: ------------
MALLOC: = 400302080 ( 381.8 MiB) Virtual address space used
MALLOC:
MALLOC: 3557 Spans in use
MALLOC: 81 Thread heaps in use
MALLOC: 8192 Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.
$ ./bin/ceph daemon osd.0 heap stats
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
Expected result:
The ideal result should be that the heap command prints in both the "tell" version and "daemon" version.
NOTE:
It is important to maintain the string output in the message command reply, so manager modules can also access this output.
For example, after running `ceph tell/daemon osd.0 heap stats` on a vstart cluster, you should see this result in out/osd.0.log:
2022-08-10T21:06:10.931+0000 7f64394d1700 20 osd.0 97 _dispatch 0x557db760f8c0 command(tid 2: {"prefix": "heap", "heapcmd": "stats"}) v1
2022-08-10T21:06:10.931+0000 7f644f7d4700 1 -- [v2:172.21.5.151:6802/1112500,v1:172.21.5.151:6803/1112500] --> 172.21.5.151:0/4089943627 -- command_reply(tid 2: 0 osd.0 tcmalloc heap stats:------------------------------------------------
MALLOC: 372563288 ( 355.3 MiB) Bytes in use by application
MALLOC: + 0 ( 0.0 MiB) Bytes in page heap freelist
MALLOC: + 3332552 ( 3.2 MiB) Bytes in central cache freelist
MALLOC: + 9226752 ( 8.8 MiB) Bytes in transfer cache freelist
MALLOC: + 8879840 ( 8.5 MiB) Bytes in thread cache freelists
MALLOC: + 3014656 ( 2.9 MiB) Bytes in malloc metadata
MALLOC: ------------
MALLOC: = 397017088 ( 378.6 MiB) Actual memory used (physical + swap)
MALLOC: + 139264 ( 0.1 MiB) Bytes released to OS (aka unmapped)
MALLOC: ------------
MALLOC: = 397156352 ( 378.8 MiB) Virtual address space used
MALLOC:
MALLOC: 3390 Spans in use
MALLOC: 73 Thread heaps in use
MALLOC: 8192 Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.
) v1 -- 0x557dce502b00 con 0x557dc732e800
Related issues
History
#1 Updated by Vikhyat Umrao about 1 year ago
root@f23-h17-000-6048r:/var/run/ceph # ceph daemon osd.98 heap stats Can't get admin socket path: unable to get conf option admin_socket for osd: b"error parsing 'osd': expected string of the form TYPE.ID, valid types are: auth, mon, osd, mds, mgr, client\n" root@f23-h17-000-6048r:/var/run/ceph # ceph tell osd.98 heap stats osd.98 tcmalloc heap stats:------------------------------------------------ MALLOC: 3725949760 ( 3553.3 MiB) Bytes in use by application MALLOC: + 1056768 ( 1.0 MiB) Bytes in page heap freelist MALLOC: + 370342848 ( 353.2 MiB) Bytes in central cache freelist MALLOC: + 16070656 ( 15.3 MiB) Bytes in transfer cache freelist MALLOC: + 84439296 ( 80.5 MiB) Bytes in thread cache freelists MALLOC: + 33554432 ( 32.0 MiB) Bytes in malloc metadata MALLOC: ------------ MALLOC: = 4231413760 ( 4035.4 MiB) Actual memory used (physical + swap) MALLOC: + 4402339840 ( 4198.4 MiB) Bytes released to OS (aka unmapped) MALLOC: ------------ MALLOC: = 8633753600 ( 8233.8 MiB) Virtual address space used MALLOC: MALLOC: 389490 Spans in use MALLOC: 39 Thread heaps in use MALLOC: 8192 Tcmalloc page size ------------------------------------------------ Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()). Bytes released to the OS take up virtual address space but no physical memory. root@f23-h17-000-6048r:/var/run/ceph # ceph --admin-daemon /var/run/ceph/0b67fc9a-1cc2-11ed-8fb3-000af7995d6c/ceph-osd.98.asok heap stats root@f23-h17-000-6048r:/var/run/ceph # ceph --admin-daemon /var/run/ceph/0b67fc9a-1cc2-11ed-8fb3-000af7995d6c/ceph-osd.98.asok heap stats
#2 Updated by Vikhyat Umrao about 1 year ago
root@f28-h21-000-r630:~ # ceph tell mon.`hostname -s` heap stats Error ENOENT: problem getting command descriptions from mon.f28-h21-000-r630 root@f28-h21-000-r630:~ # ceph tell mon.`hostname` heap stats mon.f28-h21-000-r630.rdu2.scalelab.redhat.com tcmalloc heap stats:------------------------------------------------ MALLOC: 482852640 ( 460.5 MiB) Bytes in use by application MALLOC: + 0 ( 0.0 MiB) Bytes in page heap freelist MALLOC: + 23908208 ( 22.8 MiB) Bytes in central cache freelist MALLOC: + 14829824 ( 14.1 MiB) Bytes in transfer cache freelist MALLOC: + 39135344 ( 37.3 MiB) Bytes in thread cache freelists MALLOC: + 3407872 ( 3.2 MiB) Bytes in malloc metadata MALLOC: ------------ MALLOC: = 564133888 ( 538.0 MiB) Actual memory used (physical + swap) MALLOC: + 89579520 ( 85.4 MiB) Bytes released to OS (aka unmapped) MALLOC: ------------ MALLOC: = 653713408 ( 623.4 MiB) Virtual address space used MALLOC: MALLOC: 15825 Spans in use MALLOC: 22 Thread heaps in use MALLOC: 8192 Tcmalloc page size ------------------------------------------------ Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()). Bytes released to the OS take up virtual address space but no physical memory. root@f28-h21-000-r630:~ # ceph --admin-daemon /var/run/ceph/0b67fc9a-1cc2-11ed-8fb3-000af7995d6c/ceph-mon.f28-h21-000-r630.rdu2.scalelab.redhat.com.asok heap stats mon.f28-h21-000-r630.rdu2.scalelab.redhat.com tcmalloc heap stats:------------------------------------------------ MALLOC: 482828048 ( 460.5 MiB) Bytes in use by application MALLOC: + 0 ( 0.0 MiB) Bytes in page heap freelist MALLOC: + 24090272 ( 23.0 MiB) Bytes in central cache freelist MALLOC: + 14968064 ( 14.3 MiB) Bytes in transfer cache freelist MALLOC: + 38815056 ( 37.0 MiB) Bytes in thread cache freelists MALLOC: + 3407872 ( 3.2 MiB) Bytes in malloc metadata MALLOC: ------------ MALLOC: = 564109312 ( 538.0 MiB) Actual memory used (physical + swap) MALLOC: + 89604096 ( 85.5 MiB) Bytes released to OS (aka unmapped) MALLOC: ------------ MALLOC: = 653713408 ( 623.4 MiB) Virtual address space used MALLOC: MALLOC: 15825 Spans in use MALLOC: 22 Thread heaps in use MALLOC: 8192 Tcmalloc page size ------------------------------------------------ Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()). Bytes released to the OS take up virtual address space but no physical memory. root@f28-h21-000-r630:~ # ceph daemon mon.`hostname` heap stats admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
#3 Updated by Laura Flores about 1 year ago
Appending the stringstream to the "outbl" bufferlist here makes the output print with the "daemon" version, but it is printed twice with the "tell" version. So, not the ideal solution, but clearly the outbl needs to have the stringstream appended to it.
https://github.com/ceph/ceph/blob/042ca579c2e4fece1853503d982b6afc0a0478dc/src/osd/OSD.cc#L3163-L3165
out:
+ outbl.append(ss.str());
on_finish(ret, ss.str(), outbl);
}
The same change "works" for mds: https://github.com/ceph/ceph/blob/042ca579c2e4fece1853503d982b6afc0a0478dc/src/mds/MDSDaemon.cc#L209-L211
#4 Updated by Radoslaw Zarzynski about 1 year ago
The interface between `outbl` and the formatter is described at the beginning of the function:
void OSD::asok_command( std::string_view prefix, const cmdmap_t& cmdmap, Formatter *f, const bufferlist& inbl, std::function<void(int,const std::string&,bufferlist&)> on_finish) { int ret = 0; stringstream ss; // stderr error message stream bufferlist outbl; // if empty at end, we'll dump formatter as output
The implementation of the heap
family of commands doesn't use outbl
at all:
else if (prefix == "heap") { ret = ceph::osd_cmds::heap(*cct, cmdmap, *f, ss); }
However, it takes the ss
(intended to be a counterpart of stderr
). Maybe cmds::heap()
print there instead of formatted?
#5 Updated by Radoslaw Zarzynski about 1 year ago
Oops, this is the case:
int heap(CephContext& cct, const cmdmap_t& cmdmap, Formatter& f, std::ostream& os) { // ... ceph_heap_profiler_handle_command(cmd_vec, os); return 0; }
It looks we have a misunderstanding between the caller and callee.
#6 Updated by Radoslaw Zarzynski about 1 year ago
Similar problem may affect e.g. flush_store_cache
:
} else if (prefix == "flush_store_cache") { store->flush_cache(&ss);
#7 Updated by Radoslaw Zarzynski about 1 year ago
There is another thing we should unify: daemon
and tell
differently treat ss
(the error stream) when the error code says there was no error.
#8 Updated by Radoslaw Zarzynski about 1 year ago
- Status changed from New to Fix Under Review
- Pull request ID set to 47650
#9 Updated by Radoslaw Zarzynski about 1 year ago
hook->call_async( prefix, cmdmap, f, inbl, [f, on_finish](int r, const std::string& err, bufferlist& out) { // handle either existing output in bufferlist *or* via formatter if (r >= 0 && out.length() == 0) { f->flush(out); } delete f; on_finish(r, err, out); });
For tell
we always forward the error stuff to mon:
void AdminSocket::do_tell_queue() { ldout(m_cct,10) << __func__ << dendl; std::list<cref_t<MCommand>> q; std::list<cref_t<MMonCommand>> lq; { std::lock_guard l(tell_lock); q.swap(tell_queue); lq.swap(tell_legacy_queue); } for (auto& m : q) { bufferlist outbl; execute_command( m->cmd, m->get_data(), [m](int r, const std::string& err, bufferlist& outbl) { auto reply = new MCommandReply(r, err); reply->set_tid(m->get_tid()); reply->set_data(outbl); #ifdef WITH_SEASTAR // TODO: crimson: handle asok commmand from alien thread #else m->get_connection()->send_message(reply); #endif }); } for (auto& m : lq) { bufferlist outbl; execute_command( m->cmd, m->get_data(), [m](int r, const std::string& err, bufferlist& outbl) { auto reply = new MMonCommandAck(m->cmd, r, err, 0); reply->set_tid(m->get_tid()); reply->set_data(outbl); #ifdef WITH_SEASTAR // TODO: crimson: handle asok commmand from alien thread #else m->get_connection()->send_message(reply); #endif }); } }
For deamon
we make use the error stream only if return code is < 0
:
void AdminSocket::do_accept() { // ... std::vector<std::string> cmdvec = { c }; bufferlist empty, out; ostringstream err; int rval = execute_command(cmdvec, empty /* inbl */, err, &out); // Unfortunately, the asok wire protocol does not let us pass an error code, // and many asok command implementations return helpful error strings. So, // let's prepend an error string to the output if there is an error code. if (rval < 0) { ostringstream ss; ss << "ERROR: " << cpp_strerror(rval) << "\n"; ss << err.str() << "\n"; bufferlist o; o.append(ss.str()); o.claim_append(out); out.claim_append(o); } // ...
#10 Updated by Laura Flores about 1 year ago
- Backport set to quincy,pacific
#11 Updated by Laura Flores about 1 year ago
- Status changed from Fix Under Review to Pending Backport
Setting to "Pending backport" briefly so I can create backport trackers.
#12 Updated by Laura Flores about 1 year ago
- Copied to Backport #57312: quincy: Heap command prints with "ceph tell", but not with "ceph daemon" added
#13 Updated by Laura Flores about 1 year ago
- Copied to Backport #57313: pacific: Heap command prints with "ceph tell", but not with "ceph daemon" added
#14 Updated by Laura Flores about 1 year ago
- Tags changed from low-hanging-fruit to low-hanging-fruit backport_processed
#15 Updated by Laura Flores about 1 year ago
- Status changed from Pending Backport to Fix Under Review
#16 Updated by Yuri Weinstein about 1 year ago
#17 Updated by Yaarit Hatuka about 1 year ago
- Status changed from Fix Under Review to Pending Backport
#18 Updated by Konstantin Shalygin 12 months ago
- Status changed from Pending Backport to Resolved
- % Done changed from 0 to 100
- Tags changed from low-hanging-fruit backport_processed to low-hanging-fruit