Bug #55842

open

Upgrading to 16.2.9 with 9M strays files causes MDS OOM

Added by Arthur Outhenin-Chalandre almost 2 years ago. Updated 7 months ago.

Status:
Triaged
Priority:
Normal
Assignee:
Category:
Correctness/Safety
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi,

Yesterday we upgraded one of our test CephFS clusters from 16.2.6 to 16.2.9. This cluster was mainly used to test and stress-test snapshots in CephFS.
This cluster had about 9M strays and a few connected clients, but not much activity. We have only one active MDS, and when we upgraded it, it went OOM. We are collocating MDS and OSD in this case, so after we stopped the OSDs to free up some RAM for the MDS, it started successfully, using 70G of memory at its peak before resident memory went back down to about 1G.

I don't have much evidence for this, but since the huge number of strays is essentially the only thing specific to this cluster, I suspect the following PR caused the issue: https://github.com/ceph/ceph/pull/44342. Also, the number of stray files dropped to 0 after the upgrade, so I believe we shouldn't hit this problem again...
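
For reference, the stray count can be checked on the MDS side through its perf counters; a minimal example (mds.<name> is a placeholder for the daemon name):

# Query the MDS admin socket on the host running the daemon; the stray
# counters live in the mds_cache section (num_strays, strays_enqueued, ...):
ceph daemon mds.<name> perf dump mds_cache | grep -i stray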

Actions #1

Updated by Venky Shankar almost 2 years ago

  • Category set to Correctness/Safety
  • Status changed from New to Triaged
  • Assignee set to Venky Shankar
  • Target version set to v18.0.0
Actions #2

Updated by Patrick Donnelly almost 2 years ago

Do you know what state the MDS was in (up:replay?) when its memory ballooned to 70G?

Actions #3

Updated by Arthur Outhenin-Chalandre almost 2 years ago

Patrick Donnelly wrote:

Do you know what state the MDS was in (up:replay?) when its memory ballooned to 70G?

Apparently it was up:active, judging from the logs (also, during that time I wasn't able to run `ceph fs status`, as it was hanging somehow).
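
As a side note, when `ceph fs status` hangs, the per-rank states can usually still be read from the monitors rather than from the MDS itself, for example (both commands are answered by the mons, so they work even while the MDS is unresponsive):

ceph mds stat   # one-line fsmap summary with per-rank states
ceph fs dump    # full fsmap, including each rank's state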

Here are some logs of an MDS failing to fully start during the upgrade:

2022-06-02T11:24:30.325+0200 7fbb4fd2c900  0 ceph version 16.2.9-1 (4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific (stable), process ceph-mds, pid 1754985
2022-06-02T11:24:30.325+0200 7fbb4fd2c900  1 main not setting numa affinity
2022-06-02T11:24:30.325+0200 7fbb4fd2c900  0 pidfile_write: ignore empty --pid-file
2022-06-02T11:24:30.342+0200 7fbb3cf18700  1 mds.cephdata20b-50d9d7f698 Updating MDS map to version 19408 from mon.2
2022-06-02T11:24:30.824+0200 7fbb3cf18700  1 mds.cephdata20b-50d9d7f698 Updating MDS map to version 19409 from mon.2
2022-06-02T11:24:30.824+0200 7fbb3cf18700  1 mds.cephdata20b-50d9d7f698 Monitors have assigned me to become a standby.
2022-06-02T11:24:30.830+0200 7fbb3cf18700  1 mds.cephdata20b-50d9d7f698 Updating MDS map to version 19410 from mon.2
2022-06-02T11:24:30.832+0200 7fbb3cf18700  1 mds.0.19410 handle_mds_map i am now mds.0.19410
2022-06-02T11:24:30.832+0200 7fbb3cf18700  1 mds.0.19410 handle_mds_map state change up:boot --> up:replay
2022-06-02T11:24:30.832+0200 7fbb3cf18700  1 mds.0.19410 replay_start
2022-06-02T11:24:30.832+0200 7fbb3cf18700  1 mds.0.19410  waiting for osdmap 77425 (which blocklists prior instance)
2022-06-02T11:24:30.832+0200 7fbb38f10700  2 mds.0.cache Memory usage:  total 446576, rss 29760, heap 315644, baseline 315644, 0 / 0 inodes have caps, 0 caps, 0 caps per inode
2022-06-02T11:24:30.833+0200 7fbb36f0c700  2 mds.0.19410 Booting: 0: opening inotable
2022-06-02T11:24:30.834+0200 7fbb36f0c700  2 mds.0.19410 Booting: 0: opening sessionmap
2022-06-02T11:24:30.834+0200 7fbb36f0c700  2 mds.0.19410 Booting: 0: opening mds log
2022-06-02T11:24:30.834+0200 7fbb36f0c700  2 mds.0.19410 Booting: 0: opening purge queue (async)
2022-06-02T11:24:30.834+0200 7fbb36f0c700  2 mds.0.19410 Booting: 0: loading open file table (async)
2022-06-02T11:24:30.834+0200 7fbb36f0c700  2 mds.0.19410 Booting: 0: opening snap table
2022-06-02T11:24:30.884+0200 7fbb3670b700  2 mds.0.19410 Booting: 1: loading/discovering base inodes
2022-06-02T11:24:30.884+0200 7fbb3670b700  0 mds.0.cache creating system inode with ino:0x100
2022-06-02T11:24:30.884+0200 7fbb3670b700  0 mds.0.cache creating system inode with ino:0x1
2022-06-02T11:24:30.914+0200 7fbb36f0c700  2 mds.0.19410 Booting: 2: replaying mds log
2022-06-02T11:24:30.914+0200 7fbb36f0c700  2 mds.0.19410 Booting: 2: waiting for purge queue recovered
2022-06-02T11:24:31.188+0200 7fbb36f0c700  1 mds.0.19410 Finished replaying journal
2022-06-02T11:24:31.188+0200 7fbb36f0c700  1 mds.0.19410 making mds journal writeable
2022-06-02T11:24:31.188+0200 7fbb36f0c700  2 mds.0.19410 i am alone, moving to state reconnect
2022-06-02T11:24:31.828+0200 7fbb3cf18700  1 mds.cephdata20b-50d9d7f698 Updating MDS map to version 19411 from mon.2
2022-06-02T11:24:31.828+0200 7fbb3cf18700  1 mds.0.19410 handle_mds_map i am now mds.0.19410
2022-06-02T11:24:31.828+0200 7fbb3cf18700  1 mds.0.19410 handle_mds_map state change up:replay --> up:reconnect
2022-06-02T11:24:31.828+0200 7fbb3cf18700  1 mds.0.19410 reconnect_start
2022-06-02T11:24:31.828+0200 7fbb3cf18700  1 mds.0.19410 reopen_log
2022-06-02T11:24:31.828+0200 7fbb3cf18700  1 mds.0.server reconnect_clients -- 5 sessions
2022-06-02T11:24:31.829+0200 7fbb3cf18700  0 log_channel(cluster) log [DBG] : reconnect by client.97604720 10.116.5.17:0/3422752518 after 0.00100001
2022-06-02T11:24:31.833+0200 7fbb38f10700  2 mds.0.cache Memory usage:  total 497308, rss 58416, heap 340220, baseline 315644, 1 / 3 inodes have caps, 1 caps, 0.333333 caps per inode
2022-06-02T11:24:31.834+0200 7fbb3cf18700  0 log_channel(cluster) log [DBG] : reconnect by client.73360997 137.138.52.43:0/2137110702 after 0.00600009
2022-06-02T11:24:31.841+0200 7fbb3cf18700  0 log_channel(cluster) log [DBG] : reconnect by client.93526775 v1:137.138.149.42:0/3186712137 after 0.0130002
2022-06-02T11:24:31.842+0200 7fbb3cf18700  0 log_channel(cluster) log [DBG] : reconnect by client.19010395 v1:137.138.157.69:0/158998062 after 0.0140002
2022-06-02T11:24:31.842+0200 7fbb3cf18700  0 log_channel(cluster) log [DBG] : reconnect by client.93579429 v1:137.138.149.42:0/3881963723 after 0.0140002
2022-06-02T11:24:31.842+0200 7fbb3cf18700  1 mds.0.19410 reconnect_done
2022-06-02T11:24:32.013+0200 7fbb3cf18700  0 ms_deliver_dispatch: unhandled message 0x55ab432a2160 client_metrics [client_metric_type: READ_LATENCY latency: 0.000000][client_metric_type: WRITE_LATENCY latency: 0.000000][client_metric_type: METADATA_LATENCY latency: 2611.415517][client_metric_type: CAP_INFO cap_hits: 1463 cap_misses: 473 num_caps: 0][client_metric_type: DENTRY_LEASE dlease_hits: 0 dlease_misses: 251 num_dentries: 0][client_metric_type: OPENED_FILES opened_files: 0 total_inodes: 2][client_metric_type: PINNED_ICAPS pinned_icaps: 81 total_inodes: 2][client_metric_type: OPENED_INODES opened_inodes: 0 total_inodes: 2][client_metric_type: READ_IO_SIZES total_ops: 0 total_size: 0][client_metric_type: WRITE_IO_SIZES total_ops: 0 total_size: 0] v1 from client.97604720 10.116.5.17:0/3422752518
2022-06-02T11:24:32.830+0200 7fbb3cf18700  1 mds.cephdata20b-50d9d7f698 Updating MDS map to version 19412 from mon.2
2022-06-02T11:24:32.830+0200 7fbb3cf18700  1 mds.0.19410 handle_mds_map i am now mds.0.19410
2022-06-02T11:24:32.830+0200 7fbb3cf18700  1 mds.0.19410 handle_mds_map state change up:reconnect --> up:rejoin
2022-06-02T11:24:32.830+0200 7fbb3cf18700  1 mds.0.19410 rejoin_start
2022-06-02T11:24:32.830+0200 7fbb3cf18700  1 mds.0.19410 rejoin_joint_start
2022-06-02T11:24:32.833+0200 7fbb38f10700  2 mds.0.cache Memory usage:  total 497308, rss 58416, heap 340220, baseline 315644, 1 / 40 inodes have caps, 2 caps, 0.05 caps per inode
2022-06-02T11:24:32.850+0200 7fbb36f0c700  1 mds.0.19410 rejoin_done
2022-06-02T11:24:32.850+0200 7fbb36f0c700  1 mds.0.19410 rejoin_done
2022-06-02T11:24:33.013+0200 7fbb3cf18700  0 ms_deliver_dispatch: unhandled message 0x55ab432a26e0 client_metrics [client_metric_type: READ_LATENCY latency: 0.000000][client_metric_type: WRITE_LATENCY latency: 0.000000][client_metric_type: METADATA_LATENCY latency: 2611.415517][client_metric_type: CAP_INFO cap_hits: 1463 cap_misses: 473 num_caps: 0][client_metric_type: DENTRY_LEASE dlease_hits: 0 dlease_misses: 251 num_dentries: 0][client_metric_type: OPENED_FILES opened_files: 0 total_inodes: 2][client_metric_type: PINNED_ICAPS pinned_icaps: 81 total_inodes: 2][client_metric_type: OPENED_INODES opened_inodes: 0 total_inodes: 2][client_metric_type: READ_IO_SIZES total_ops: 0 total_size: 0][client_metric_type: WRITE_IO_SIZES total_ops: 0 total_size: 0] v1 from client.97604720 10.116.5.17:0/3422752518
2022-06-02T11:24:33.150+0200 7fbb3cf18700  0 ms_deliver_dispatch: unhandled message 0x55ab432a2160 client_metrics [client_metric_type: CAP_INFO cap_hits: 28150 cap_misses: 6 num_caps: 1][client_metric_type: READ_LATENCY latency: 0.000000][client_metric_type: WRITE_LATENCY latency: 0.000000][client_metric_type: METADATA_LATENCY latency: 2036-08-03T03:39:52.999000+0200][client_metric_type: DENTRY_LEASE dlease_hits: 0 dlease_misses: 0 num_dentries: 1] v1 from client.93579429 v1:137.138.149.42:0/3881963723
2022-06-02T11:24:33.150+0200 7fbb3cf18700  0 ms_deliver_dispatch: unhandled message 0x55ab432a2840 client_metrics [client_metric_type: CAP_INFO cap_hits: 80200883 cap_misses: 73 num_caps: 1][client_metric_type: READ_LATENCY latency: 0.000000][client_metric_type: WRITE_LATENCY latency: 1996-10-13T11:51:43.594725+0200][client_metric_type: METADATA_LATENCY latency: 2035-10-05T21:59:31.045000+0200][client_metric_type: DENTRY_LEASE dlease_hits: 62579392 dlease_misses: 88886 num_dentries: 1] v1 from client.93526775 v1:137.138.149.42:0/3186712137
2022-06-02T11:24:33.835+0200 7fbb38f10700  2 mds.0.cache Memory usage:  total 497308, rss 58416, heap 340220, baseline 315644, 5 / 46 inodes have caps, 6 caps, 0.130435 caps per inode
2022-06-02T11:24:33.836+0200 7fbb3cf18700  1 mds.cephdata20b-50d9d7f698 Updating MDS map to version 19413 from mon.2
2022-06-02T11:24:33.836+0200 7fbb3cf18700  1 mds.0.19410 handle_mds_map i am now mds.0.19410
2022-06-02T11:24:33.836+0200 7fbb3cf18700  1 mds.0.19410 handle_mds_map state change up:rejoin --> up:active
2022-06-02T11:24:33.836+0200 7fbb3cf18700  1 mds.0.19410 recovery_done -- successful recovery!
2022-06-02T11:24:33.836+0200 7fbb3cf18700  1 mds.0.19410 active_start
2022-06-02T11:24:33.837+0200 7fbb3cf18700  1 mds.0.19410 cluster recovered.
2022-06-02T11:24:34.835+0200 7fbb38f10700  2 mds.0.cache Memory usage:  total 513700, rss 59480, heap 356604, baseline 315644, 5 / 56 inodes have caps, 6 caps, 0.107143 caps per inode
2022-06-02T11:24:35.835+0200 7fbb3a713700 -1 mds.pinger is_rank_lagging: rank=0 was never sent ping request.
2022-06-02T11:24:35.835+0200 7fbb38f10700  2 mds.0.cache Memory usage:  total 672484, rss 218460, heap 356604, baseline 315644, 5 / 30240 inodes have caps, 6 caps, 0.000198413 caps per inode
2022-06-02T11:24:36.846+0200 7fbb38f10700  2 mds.0.cache Memory usage:  total 989276, rss 536876, heap 356604, baseline 315644, 5 / 86910 inodes have caps, 6 caps, 6.90369e-05 caps per inode
2022-06-02T11:24:37.858+0200 7fbb38f10700  2 mds.0.cache Memory usage:  total 1302660, rss 852856, heap 356604, baseline 315644, 5 / 143035 inodes have caps, 6 caps, 4.19478e-05 caps per inode
2022-06-02T11:24:38.895+0200 7fbb38f10700  2 mds.0.cache Memory usage:  total 1625084, rss 1170260, heap 356604, baseline 315644, 5 / 199298 inodes have caps, 6 caps, 3.01057e-05 caps per inode

[...]

2022-06-02T11:26:27.023+0200 7fbb38f10700  2 mds.0.cache Memory usage:  total 33626612, rss 33168268, heap 356604, baseline 315644, 5 / 5891826 inodes have caps, 6 caps, 1.01836e-06 caps per inode
2022-06-02T11:26:44.855+0200 7fbb3af14700  1 heartbeat_map is_healthy 'MDSRank' had timed out after 15.000000954s
2022-06-02T11:26:44.855+0200 7fbb3af14700  0 mds.beacon.cephdata20b-50d9d7f698 Skipping beacon heartbeat to monitors (last acked 4.00006s ago); MDS internal heartbeat is not healthy!
2022-06-02T11:26:45.355+0200 7fbb3af14700  1 heartbeat_map is_healthy 'MDSRank' had timed out after 15.000000954s
2022-06-02T11:26:45.355+0200 7fbb3af14700  0 mds.beacon.cephdata20b-50d9d7f698 Skipping beacon heartbeat to monitors (last acked 4.50007s ago); MDS internal heartbeat is not healthy!

[...]
Actions #4

Updated by Patrick Donnelly almost 2 years ago

Odd that this happened only once the MDS was active. Not sure why.

Actions #5

Updated by Patrick Donnelly 7 months ago

  • Target version deleted (v18.0.0)