Feature #37298

Tasks #36451: mgr/dashboard: Scalability testing

Bug #36453: mgr/dashboard: Some REST endpoints grow linearly with OSD count

mgr/dashboard: Support a more compact data format (MessagePack, BSON)

Added by Zack Cerza over 5 years ago. Updated almost 3 years ago.

Status: Rejected
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:

History

#1 Updated by Zack Cerza over 5 years ago

  • Status changed from New to Rejected

I actually went ahead and implemented MessagePack support in the backend. I'll go into detail about what I found, but in summary: it's not helpful.

Below are the results of querying /api/dashboard/health. Note that Content-Length reports the size in bytes of the response body as sent (compressed when Content-Encoding: gzip is present), and Size reports the size of the response after decompression, if applicable.

Current master:
Content-Type: application/json
Content-Encoding: gzip
Content-Length: 4986
Size: 27261

msgpack without gzip:
Content-Type: application/msgpack
Content-Length: 20487
Size: 20487

msgpack with gzip:
Content-Type: application/msgpack
Content-Encoding: gzip
Content-Length: 5015
Size: 20487

The requests were performed in quick succession, so the resulting objects should be close to identical. Uncompressed, the MessagePack object is clearly smaller (20487 vs. 27261 bytes, about 25% less), but the gzipped MessagePack object is actually slightly larger than the gzipped JSON object (5015 vs. 4986 bytes).
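
To make the comparison explicit, here is a small sketch that works out the ratios from the figures above. The dictionary keys and the `summarize` helper are names made up for illustration; only the byte counts come from the measurements.

```python
# (Content-Length, Size) in bytes, copied from the results above.
reported = {
    "json+gzip": (4986, 27261),
    "msgpack": (20487, 20487),
    "msgpack+gzip": (5015, 20487),
}

# Current master (gzipped JSON) is the baseline for bytes on the wire.
baseline_wire, _ = reported["json+gzip"]


def summarize(results):
    """Return per-format compression ratio and wire size relative to the baseline."""
    rows = {}
    for name, (wire, size) in results.items():
        rows[name] = {
            # How much smaller the body is on the wire than after decompression.
            "compression_ratio": size / wire,
            # Bytes on the wire relative to gzipped JSON (above 1.0 means larger).
            "wire_vs_baseline": wire / baseline_wire,
        }
    return rows


summary = summarize(reported)
for name, row in summary.items():
    print(
        f"{name:13s} compression={row['compression_ratio']:.2f}x "
        f"wire_vs_json_gzip={row['wire_vs_baseline']:.3f}"
    )
```

On these numbers, gzipped MessagePack comes out roughly 0.6% larger on the wire than gzipped JSON, because gzip shrinks the JSON body by a larger factor than it shrinks the MessagePack body.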

For the interested, I'll keep the branch around:
https://github.com/zmc/ceph/tree/wip-msgpack

#2 Updated by Zack Cerza over 5 years ago

I should add that, from everything I've read, BSON is less efficient than MessagePack.

#3 Updated by Ernesto Puerta over 5 years ago

Thanks a lot for the info, Zack!

A few comments:
- How large was the cluster? The key scaling factor in Ceph is the number of OSDs, which adds a lot of metadata to the Ceph maps. I tried vstart.sh with the default 3-OSD setup and added 'fake' extra OSDs (with the "ceph osd create" command), but those produced a smaller payload than 'real' ones. So I switched to running vstart.sh with CEPH_NUM_OSD set to 10, 20, and 30, and the results were quite different (for the worse).
- Apart from the network traffic, in terms of scalability, we should also care about performance (CPU/memory) on both sides (back-end serialization, browser deserialization). While I think pagination will mostly solve this, if MessagePack/BSON worked as drop-in replacements, they could, in the meantime, help speed up parsing of large payloads (>100 OSD deployments). I read that BSON/MessagePack serialization/deserialization both beat the standard Python and browser JSON implementations.
- In the mid-term, we should also look for formats allowing delta updates, as the natural (and planned) evolution of the dashboard is to switch from client-pull updates to server-push (websockets, SSE).
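
As a rough illustration of the delta-update idea in the last point, here is a minimal sketch that assumes payloads are flat dictionaries. `make_delta`, `apply_delta`, the "set"/"unset" keys, and the sample fields are all hypothetical and not part of any existing dashboard API.

```python
def make_delta(old, new):
    """Shallow delta between two dicts: keys to set and keys to remove."""
    changed = {k: v for k, v in new.items() if k not in old or old[k] != v}
    removed = [k for k in old if k not in new]
    return {"set": changed, "unset": removed}


def apply_delta(state, delta):
    """Return a copy of `state` with the delta applied."""
    updated = dict(state)
    updated.update(delta["set"])
    for key in delta["unset"]:
        updated.pop(key, None)
    return updated


# Hypothetical health snapshots; field names are illustrative only.
previous = {"status": "HEALTH_OK", "num_osds": 10, "num_up_osds": 10}
current = {"status": "HEALTH_WARN", "num_osds": 10, "num_up_osds": 9}

delta = make_delta(previous, current)
print(delta)
```

With server push, only `delta` would need to be sent once the client holds the previous state. A real implementation would also have to handle nested structures, which this shallow comparison resends in full whenever any part of them changes.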

With all the above I don't mean we have to go for this right now, but maybe let's not discard it so soon, as it might be still useful.

#4 Updated by Zack Cerza over 5 years ago

Thanks for the feedback!

I don't plan on tossing this work based on my initial findings; rather, I didn't want to go so far as adapting the entire frontend to work with it just yet since it didn't seem to improve things any.

The numbers I initially gave were from a default vstart cluster - but you raise a good point, and I'll do some investigation on a larger one today.

I'll also look into some performance profiling of JSON/BSON/MessagePack, because while I initially also heard that the binary formats were faster to (de)serialize, I have more recently read that they are not. Definitely a question we should answer for ourselves.

I'd wondered how we would implement delta updates with our existing pull method, but it makes much more sense to me given the plan to move towards a push method. Is there a ticket or other document for that yet? If not, I can create a subtask of this ticket's parent.

#5 Updated by Zack Cerza over 5 years ago

I did some profiling of the various formats' time to serialize and deserialize an /api/dashboard/health payload on vstart clusters (n=1000). The Python results:

                               serial  | deserial
vstart-10-osd.json   bson    : 32.131s | 13.802s
                     json    : 1.506s  | 1.304s
                     msgpack : 0.394s  | 0.469s

vstart-15-osd.json   bson    : 39.397s | 15.596s
                     json    : 1.344s  | 1.473s
                     msgpack : 0.369s  | 0.603s

vstart-3-osd.json    bson    : 23.587s | 9.499s
                     json    : 1.051s  | 0.954s
                     msgpack : 0.264s  | 0.309s

vstart failed when I tried to create a 30-OSD cluster, but I think this shows pretty clearly that, in Python, MessagePack is quite a bit faster than JSON, which is in turn far faster than BSON.

I'll try to do a similar analysis of memory usage, and also repeat this with JavaScript.

Edit: The times reported do not include the initial conversion from their on-disk JSON to native Python objects, of course. :)
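
For anyone wanting to repeat this kind of measurement, here is a minimal sketch of a timing loop. It is not the script used for the numbers above. It assumes the third-party `msgpack` package and skips it when it is not installed, it leaves BSON out, and it uses a synthetic payload instead of the saved /api/dashboard/health responses. `load_codecs` and `profile` are illustrative names.

```python
import json
import timeit


def load_codecs():
    """Return {name: (serialize, deserialize)} for the codecs that can be imported."""
    codecs = {"json": (json.dumps, json.loads)}
    try:
        import msgpack  # third-party and optional; assumed, not required

        codecs["msgpack"] = (
            lambda obj: msgpack.packb(obj, use_bin_type=True),
            lambda data: msgpack.unpackb(data, raw=False),
        )
    except ImportError:
        pass
    return codecs


def profile(payload, loops=1000):
    """Time `loops` serializations and deserializations of a native Python object."""
    results = {}
    for name, (dumps, loads) in load_codecs().items():
        # Encode once outside the timed region so decoding is measured on its own.
        encoded = dumps(payload)
        serial = timeit.timeit(lambda: dumps(payload), number=loops)
        deserial = timeit.timeit(lambda: loads(encoded), number=loops)
        results[name] = (serial, deserial)
    return results


# Synthetic stand-in for a health payload; the structure is illustrative only.
payload = {"osds": [{"id": i, "up": 1, "in": 1, "weight": 1.0} for i in range(10)]}

results = profile(payload, loops=100)
for name, (serial, deserial) in results.items():
    print(f"{name:8s}: {serial:.3f}s | {deserial:.3f}s")
```

As in the measurements above, the payload is already a native Python object before timing starts, so reading and parsing a file from disk is not part of the reported times.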

Edit 2: Here are the JS results with the same payloads and loop count. I used msgpack-lite for these, though I also tested the msgpack package, which was much slower.

                               serial  | deserial
vstart-10-osd.json   bson    : 2.30s   | 2.28s
                     json    : 0.470s  | 0.368s
                     msgpack : 1.59s   | 1.45s

vstart-15-osd.json   bson    : 2.17s   | 2.16s
                     json    : 0.504s  | 0.507s
                     msgpack : 1.79s   | 1.87s

vstart-3-osd.json    bson    : 1.25s   | 1.27s
                     json    : 0.334s  | 0.259s
                     msgpack : 1.10s   | 1.06s

#6 Updated by Zack Cerza over 5 years ago

  • Subject changed from mgs/dashboard: Support a more compact data format (MessagePack, BSON) to mgr/dashboard: Support a more compact data format (MessagePack, BSON)
