Bug #58082 (closed)

cephfs:filesystem became read only after Quincy upgrade

Added by Xiubo Li over 1 year ago. Updated about 1 year ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Correctness/Safety
Target version:
% Done: 100%
Source:
Tags: backport_processed
Backport: quincy,pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Copying the info from the ceph-users mailing list, reported by Adrien:

Hi,

We upgraded this morning a Pacific Ceph cluster to the last Quincy version.
The cluster was healthy before the upgrade, everything was done according to the upgrade procedure (non-cephadm) [1], and all services restarted correctly, but the filesystem switched to read-only mode when it became active.
HEALTH_WARN 1 MDSs are read only
[WRN] MDS_READ_ONLY: 1 MDSs are read only
    mds.cccephadm32(mds.0): MDS in read-only mode

This is the only warning we got on the cluster.
In the MDS log, this error "failed to commit dir 0x1 object, errno -22" seems to be the root cause:
2022-11-23T12:41:09.843+0100 7f930f56d700 -1 log_channel(cluster) log [ERR] : failed to commit dir 0x1 object, errno -22
2022-11-23T12:41:09.843+0100 7f930f56d700 -1 mds.0.11963 unhandled write error (22) Invalid argument, force readonly...
2022-11-23T12:41:09.843+0100 7f930f56d700  1 mds.0.cache force file system read-only
2022-11-23T12:41:09.843+0100 7f930f56d700  0 log_channel(cluster) log [WRN] : force file system read-only
2022-11-23T12:41:09.843+0100 7f930f56d700 10 mds.0.server force_clients_readonly

I couldn't get more info with ceph config set mds.x debug_mds 20.
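
For reference, this is roughly how the debug level can be raised on a running MDS and reverted afterwards (a minimal sketch; mds.x is a placeholder for the actual daemon name, e.g. mds.cccephadm32):

# raise MDS debug verbosity (mds.x is a placeholder for the real daemon name)
ceph config set mds.x debug_mds 20
ceph config set mds.x debug_ms 1

# confirm the option is applied
ceph config get mds.x debug_mds

# revert to the defaults once the logs have been captured
ceph config rm mds.x debug_mds
ceph config rm mds.x debug_ms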

ceph fs status
cephfs - 17 clients
======
RANK  STATE       MDS         ACTIVITY     DNS    INOS   DIRS   CAPS
 0    active  cccephadm32  Reqs:    0 /s  12.9k  12.8k    673   1538
      POOL         TYPE     USED  AVAIL
cephfs_metadata  metadata   513G  48.6T
  cephfs_data      data    2558M  48.6T
  cephfs_data2     data     471G  48.6T
  cephfs_data3     data     433G  48.6T
STANDBY MDS
cccephadm30
cccephadm31
MDS version: ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)

Any idea what could have gone wrong and how to solve it before starting a disaster recovery procedure?

Cheers,
Adrien
This bug looks very similar to this issue opened last year and closed without any solution: https://tracker.ceph.com/issues/52260
Hi Xiubo,

We did the upgrade in rolling mode as always, with only a few Kubernetes pods as clients accessing their PVCs on CephFS.

I can reproduce the problem every time I restart the MDS daemon.
You can find the MDS log with debug_mds 25 and debug_ms 1 here : https://filesender.renater.fr/?s=download&token=4b413a71-480c-4c1a-b80a-7c9984e4decd
(The last timestamp : 2022-11-24T09:18:12.965+0100 7fe02ffe2700 10 mds.0.server force_clients_readonly)
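
For completeness, a rough sketch of the reproduction loop described above, assuming a non-cephadm deployment where the MDS runs as a plain systemd unit (daemon and host names are taken from this report; the exact unit name may differ):

# raise logging to the levels used for the attached log
ceph config set mds.cccephadm32 debug_mds 25
ceph config set mds.cccephadm32 debug_ms 1

# restart the active MDS on its host to trigger the issue
systemctl restart ceph-mds@cccephadm32

# watch for the filesystem dropping back into read-only mode
ceph health detail | grep MDS_READ_ONLY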

I couldn't find any errors in the OSD logs; is there anything specific I should be looking for?

Best,
Adrien

Related issues 3 (0 open, 3 closed)

Has duplicate: CephFS - Bug #52260: 1 MDSs are read only | pacific 16.2.5 (Duplicate) - Milind Changire
Copied to: CephFS - Backport #58608: pacific: cephfs:filesystem became read only after Quincy upgrade (Resolved) - Xiubo Li
Copied to: CephFS - Backport #58609: quincy: cephfs:filesystem became read only after Quincy upgrade (Resolved) - Xiubo Li