Bug #63259

open

mds: failed to store backtrace and force file system read-only

Added by Xiubo Li 7 months ago. Updated 3 months ago.

Status:
Fix Under Review
Priority:
High
Assignee:
Category:
Correctness/Safety
Target version:
% Done:

0%

Source:
Tags:
Backport:
quincy,reef
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

https://pulpito.ceph.com/yuriw-2023-10-16_14:43:00-fs-wip-yuri4-testing-2023-10-11-0735-reef-distro-default-smithi/7429549/

From the teuthology log:

2023-10-17T00:51:05.801 INFO:tasks.workunit.client.0.smithi114.stderr:+ set -e
2023-10-17T00:51:05.802 INFO:tasks.workunit.client.0.smithi114.stderr:+ sudo rsync -av /tmp/multiple_rsync_payload.271406 payload.1
2023-10-17T00:51:05.855 INFO:tasks.workunit.client.0.smithi114.stdout:sending incremental file list
2023-10-17T00:51:05.865 INFO:tasks.workunit.client.0.smithi114.stderr:rsync: mkdir "/home/ubuntu/cephtest/mnt.0/client.0/tmp/payload.1" failed: Read-only file system (30)
2023-10-17T00:51:05.866 INFO:tasks.workunit.client.0.smithi114.stderr:rsync error: error in file IO (code 11) at main.c(664) [Receiver=3.1.3]
2023-10-17T00:51:05.869 DEBUG:teuthology.orchestra.run:got remote process result: 11
2023-10-17T00:51:05.966 INFO:tasks.workunit:Stopping ['fs/misc'] on client.0...

And from mds.b:

2023-10-17T00:50:12.583+0000 7fa035298700 10 mds.0.cache.ino(0x1000000c567) clear_dirty_parent
2023-10-17T00:50:12.583+0000 7fa035298700  1 mds.0.cache.ino(0x1000000c566) store backtrace error -2 v 36221
2023-10-17T00:50:12.583+0000 7fa035298700 -1 log_channel(cluster) log [ERR] : failed to store backtrace on ino 0x1000000c566 object, pool 2, errno -2
2023-10-17T00:50:12.583+0000 7fa035298700 -1 mds.0.14 unhandled write error (2) No such file or directory, force readonly...
2023-10-17T00:50:12.583+0000 7fa035298700  1 mds.0.cache force file system read-only
2023-10-17T00:50:12.583+0000 7fa035298700  0 log_channel(cluster) log [WRN] : force file system read-only
2023-10-17T00:50:12.583+0000 7fa035298700 10 mds.0.server force_clients_readonly
2023-10-17T00:50:12.583+0000 7fa035298700 10 mds.0.14 send_message_client client.15078 192.168.0.1:0/1906848847 client_session(force_ro) v5
2023-10-17T00:50:12.583+0000 7fa035298700  1 -- [v2:172.21.15.139:6838/1590276002,v1:172.21.15.139:6839/1590276002] --> 192.168.0.1:0/1906848847 -- client_session(force_ro) v5 -- 0x5630e2d58e00 con 0x5630ce089800
2023-10-17T00:50:12.583+0000 7fa035298700 10 mds.0.locker eval 3648 [inode 0x1000000c568 [...2,head] /volumes/_nogroup/sv_1/327a5477-e5c2-4ade-b24e-f477c29c079e/client.0/tmp/ auth v828 ap=1 DIRTYPARENT f() n(v0 1=0+1) (iauth excl) (inest lock) (ifile excl) (ixattr excl) (iversion lock) caps={15078=pAsxLsXsxFsx/-@1},l=15078 | request=0 dirfrag=0 caps=1 dirtyparent=1 dirty=0 authpin=1 0x5630dadb6580]
2023-10-17T00:50:12.583+0000 7fa035298700 10 mds.0.locker eval want loner: client.-1 but failed to set it
2023-10-17T00:50:12.583+0000 7fa035298700  7 mds.0.locker file_eval wanted= loner_wanted= other_wanted=  filelock=(ifile excl) on [inode 0x1000000c568 [...2,head] /volumes/_nogroup/sv_1/327a5477-e5c2-4ade-b24e-f477c29c079e/client.0/tmp/ auth v828 ap=1 DIRTYPARENT f() n(v0 1=0+1) (iauth excl) (inest lock) (ifile excl) (ixattr excl) (iversion lock) caps={15078=pAsxLsXsxFsx/-@1},l=15078(-1) | request=0 dirfrag=0 caps=1 dirtyparent=1 dirty=0 authpin=1 0x5630dadb6580]

The MDS failed to store the backtrace and forced the cephfs to be read-only.

Actions #1

Updated by Milind Changire 6 months ago

  • Assignee set to Kotresh Hiremath Ravishankar
Actions #2

Updated by Venky Shankar 6 months ago

  • Category set to Correctness/Safety
  • Status changed from New to Triaged
  • Target version set to v19.0.0
  • Backport set to quincy,reef
  • Component(FS) MDS added
Actions #3

Updated by Kotresh Hiremath Ravishankar 6 months ago

Hi Xiubo,

The logs for the job link in the description do not match the log snippet you provided.

I see the job has failed with the following traceback:

2023-10-17T00:24:07.651 INFO:journalctl@ceph.mgr.z.smithi177.stdout:Oct 17 00:24:07 smithi177 ceph-b278b73a-6c81-11ee-8db6-212e2dc638e7-mgr-z[105683]: 2023-10-17T00:24:07.359+0000 7f3f16afe700 -1 log_channel(cephadm) log [ERR] : Can't communicate with remote host `172.21.15.70`, possibly because the host is not reachable or python3 is not installed on the host. [Errno 113] Connect call failed ('172.21.15.70', 22)
2023-10-17T00:24:07.651 INFO:journalctl@ceph.mgr.z.smithi177.stdout:Oct 17 00:24:07 smithi177 ceph-b278b73a-6c81-11ee-8db6-212e2dc638e7-mgr-z[105683]: Traceback (most recent call last):
2023-10-17T00:24:07.651 INFO:journalctl@ceph.mgr.z.smithi177.stdout:Oct 17 00:24:07 smithi177 ceph-b278b73a-6c81-11ee-8db6-212e2dc638e7-mgr-z[105683]:   File "/usr/share/ceph/mgr/cephadm/ssh.py", line 122, in redirect_log
2023-10-17T00:24:07.651 INFO:journalctl@ceph.mgr.z.smithi177.stdout:Oct 17 00:24:07 smithi177 ceph-b278b73a-6c81-11ee-8db6-212e2dc638e7-mgr-z[105683]:     yield
2023-10-17T00:24:07.652 INFO:journalctl@ceph.mgr.z.smithi177.stdout:Oct 17 00:24:07 smithi177 ceph-b278b73a-6c81-11ee-8db6-212e2dc638e7-mgr-z[105683]:   File "/usr/share/ceph/mgr/cephadm/ssh.py", line 101, in _remote_connection
2023-10-17T00:24:07.652 INFO:journalctl@ceph.mgr.z.smithi177.stdout:Oct 17 00:24:07 smithi177 ceph-b278b73a-6c81-11ee-8db6-212e2dc638e7-mgr-z[105683]:     preferred_auth=['publickey'], options=ssh_options)
2023-10-17T00:24:07.652 INFO:journalctl@ceph.mgr.z.smithi177.stdout:Oct 17 00:24:07 smithi177 ceph-b278b73a-6c81-11ee-8db6-212e2dc638e7-mgr-z[105683]:   File "/lib/python3.6/site-packages/asyncssh/connection.py", line 6804, in connect
2023-10-17T00:24:07.652 INFO:journalctl@ceph.mgr.z.smithi177.stdout:Oct 17 00:24:07 smithi177 ceph-b278b73a-6c81-11ee-8db6-212e2dc638e7-mgr-z[105683]:     'Opening SSH connection to')
2023-10-17T00:24:07.652 INFO:journalctl@ceph.mgr.z.smithi177.stdout:Oct 17 00:24:07 smithi177 ceph-b278b73a-6c81-11ee-8db6-212e2dc638e7-mgr-z[105683]:   File "/lib/python3.6/site-packages/asyncssh/connection.py", line 299, in _connect
2023-10-17T00:24:07.653 INFO:journalctl@ceph.mgr.z.smithi177.stdout:Oct 17 00:24:07 smithi177 ceph-b278b73a-6c81-11ee-8db6-212e2dc638e7-mgr-z[105683]:     local_addr=local_addr)
2023-10-17T00:24:07.653 INFO:journalctl@ceph.mgr.z.smithi177.stdout:Oct 17 00:24:07 smithi177 ceph-b278b73a-6c81-11ee-8db6-212e2dc638e7-mgr-z[105683]:   File "/lib64/python3.6/asyncio/base_events.py", line 794, in create_connection
2023-10-17T00:24:07.653 INFO:journalctl@ceph.mgr.z.smithi177.stdout:Oct 17 00:24:07 smithi177 ceph-b278b73a-6c81-11ee-8db6-212e2dc638e7-mgr-z[105683]:     raise exceptions[0]
2023-10-17T00:24:07.653 INFO:journalctl@ceph.mgr.z.smithi177.stdout:Oct 17 00:24:07 smithi177 ceph-b278b73a-6c81-11ee-8db6-212e2dc638e7-mgr-z[105683]:   File "/lib64/python3.6/asyncio/base_events.py", line 781, in create_connection
2023-10-17T00:24:07.653 INFO:journalctl@ceph.mgr.z.smithi177.stdout:Oct 17 00:24:07 smithi177 ceph-b278b73a-6c81-11ee-8db6-212e2dc638e7-mgr-z[105683]:     yield from self.sock_connect(sock, address)
2023-10-17T00:24:07.653 INFO:journalctl@ceph.mgr.z.smithi177.stdout:Oct 17 00:24:07 smithi177 ceph-b278b73a-6c81-11ee-8db6-212e2dc638e7-mgr-z[105683]:   File "/lib64/python3.6/asyncio/selector_events.py", line 439, in sock_connect
2023-10-17T00:24:07.654 INFO:journalctl@ceph.mgr.z.smithi177.stdout:Oct 17 00:24:07 smithi177 ceph-b278b73a-6c81-11ee-8db6-212e2dc638e7-mgr-z[105683]:     return (yield from fut)
2023-10-17T00:24:07.654 INFO:journalctl@ceph.mgr.z.smithi177.stdout:Oct 17 00:24:07 smithi177 ceph-b278b73a-6c81-11ee-8db6-212e2dc638e7-mgr-z[105683]:   File "/lib64/python3.6/asyncio/selector_events.py", line 469, in _sock_connect_cb
2023-10-17T00:24:07.654 INFO:journalctl@ceph.mgr.z.smithi177.stdout:Oct 17 00:24:07 smithi177 ceph-b278b73a-6c81-11ee-8db6-212e2dc638e7-mgr-z[105683]:     raise OSError(err, 'Connect call failed %s' % (address,))
2023-10-17T00:24:07.654 INFO:journalctl@ceph.mgr.z.smithi177.stdout:Oct 17 00:24:07 smithi177 ceph-b278b73a-6c81-11ee-8db6-212e2dc638e7-mgr-z[105683]: OSError: [Errno 113] Connect call failed ('172.21.15.70', 22)

And 25% of the PGs are degraded:


2023-10-17T00:23:52.324 INFO:journalctl@ceph.mon.b.smithi079.stdout:Oct 17 00:23:52 smithi079 ceph-mon[103382]: pgmap v4: 129 pgs: 3 down, 32 active+clean, 40 active+undersized, 31 undersized+peered, 1 unknown, 16 active+undersized+degraded, 6 undersized+degraded+peered; 20 MiB data, 356 MiB used, 715 GiB / 715 GiB avail; 59/227 objects degraded (25.991%)
2023-10-17T00:23:52.324 INFO:journalctl@ceph.mon.b.smithi079.stdout:Oct 17 00:23:52 smithi079 ceph-mon[103382]: Health check failed: Reduced data availability: 12 pgs inactive, 3 pgs down (PG_AVAILABILITY)
2023-10-17T00:23:52.325 INFO:journalctl@ceph.mon.b.smithi079.stdout:Oct 17 00:23:52 smithi079 ceph-mon[103382]: Health check failed: Degraded data redundancy: 59/227 objects degraded (25.991%), 22 pgs degraded (PG_DEGRADED)
2023-10-17T00:23:52.325 INFO:journalctl@ceph.mon.b.smithi079.stdout:Oct 17 00:23:52 smithi079 ceph-mon[103382]: mgrmap e29: z(active, since 2s), standbys: y
2023-10-17T00:23:52.401 INFO:journalctl@ceph.mon.c.smithi177.stdout:Oct 17 00:23:52 smithi177 ceph-mon[103807]: pgmap v4: 129 pgs: 3 down, 32 active+clean, 40 active+undersized, 31 undersized+peered, 1 unknown, 16 active+undersized+degraded, 6 undersized+degraded+peered; 20 MiB data, 356 MiB used, 715 GiB / 715 GiB avail; 59/227 objects degraded (25.991%)

And I also see the following on `smithi070`

2023-10-17T00:19:19.766315+00:00 smithi070 kernel: ceph-brx: port 1(brx.0) entered blocking state
2023-10-17T00:19:19.766380+00:00 smithi070 kernel: ceph-brx: port 1(brx.0) entered disabled state
2023-10-17T00:19:19.766415+00:00 smithi070 kernel: device brx.0 entered promiscuous mode
2023-10-17T00:19:19.776687+00:00 smithi070 kernel: ceph-brx: port 1(brx.0) entered blocking state
2023-10-17T00:19:19.776728+00:00 smithi070 kernel: ceph-brx: port 1(brx.0) entered forwarding state
2023-10-17T00:20:43.463574+00:00 smithi070 kernel: ceph-brx: port 1(brx.0) entered disabled state
2023-10-17T00:20:43.474848+00:00 smithi070 kernel: device brx.0 left promiscuous mode
2023-10-17T00:20:43.474898+00:00 smithi070 kernel: ceph-brx: port 1(brx.0) entered disabled state
2023-10-17T00:20:46.599641+00:00 smithi070 kernel: IPv6: ADDRCONF(NETDEV_UP): veth0: link is not ready
2023-10-17T00:20:46.733074+00:00 smithi070 kernel: IPv6: ADDRCONF(NETDEV_UP): brx.0: link is not ready
2023-10-17T00:20:46.733124+00:00 smithi070 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): brx.0: link becomes ready
2023-10-17T00:20:46.733145+00:00 smithi070 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
2023-10-17T00:20:46.770505+00:00 smithi070 kernel: ceph-brx: port 1(brx.0) entered blocking state
2023-10-17T00:20:46.770583+00:00 smithi070 kernel: ceph-brx: port 1(brx.0) entered disabled state
2023-10-17T00:20:46.770609+00:00 smithi070 kernel: device brx.0 entered promiscuous mode
2023-10-17T00:20:46.782244+00:00 smithi070 kernel: ceph-brx: port 1(brx.0) entered blocking state
2023-10-17T00:20:46.782290+00:00 smithi070 kernel: ceph-brx: port 1(brx.0) entered forwarding state
2023-10-17T00:20:46.992612+00:00 smithi070 kernel: Key type dns_resolver registered
2023-10-17T00:20:47.020613+00:00 smithi070 kernel: Key type ceph registered
2023-10-17T00:20:47.027618+00:00 smithi070 kernel: libceph: loaded (mon/osd proto 15/24)
2023-10-17T00:20:47.069320+00:00 smithi070 kernel: ceph: loaded (mds proto 32)
2023-10-17T00:20:47.092579+00:00 smithi070 kernel: ceph: device name is missing path (no : separator in 0@b278b73a-6c81-11ee-8db6-212e2dc638e7.cephfs=/volumes/_nogroup/sv_1/01bcc01b-872b-44d5-ae90-55cda190fd63)
2023-10-17T00:20:47.101591+00:00 smithi070 kernel: libceph: mon1 (1)172.21.15.79:6789 session established
2023-10-17T00:20:47.109585+00:00 smithi070 kernel: libceph: client25127 fsid b278b73a-6c81-11ee-8db6-212e2dc638e7
2023-10-17T00:20:47.117605+00:00 smithi070 kernel: ceph: mds1 session blocklisted
2023-10-17T00:20:47.172627+00:00 smithi070 kernel: ceph: mds0 session blocklisted

Could you please double check?

Actions #4

Updated by Xiubo Li 6 months ago

Kotresh Hiremath Ravishankar wrote:

Hi Xiubo,

The logs for the job link in the description do not match the log snippet you provided.

I see the job has failed with the following traceback:

[...]

And 25% of the PGs are degraded:

[...]

And I also see the following on `smithi070`
[...]

Could you please double check?

Hi Kotresh,

I noted this before, but it happened around 30 minutes before the backtrace failure, and during that period I didn't see any failure in the test, so I thought it wasn't directly related to the rsync failure. Maybe the above logs were expected for this teuthology test.

We need to find out what caused the backtrace storing failure; maybe it's related to the degraded PGs issue.

Actions #6

Updated by Venky Shankar 4 months ago

  • Assignee changed from Kotresh Hiremath Ravishankar to Venky Shankar
  • Priority changed from Normal to High
  • Severity changed from 3 - minor to 2 - major
Actions #7

Updated by Venky Shankar 4 months ago

Venky Shankar wrote:

Reproduced again in main branch integration run: https://pulpito.ceph.com/vshankar-2024-01-10_15:00:23-fs-wip-vshankar-testing-20240103.072409-1-testing-default-smithi/7511468/

The backtrace update failure in this run is:

2024-01-11T17:50:32.510 INFO:journalctl@ceph.mds.b.smithi102.stdout:Jan 11 17:50:32 smithi102 ceph-aab47ca4-b0a7-11ee-95ab-87774f69a715-mds-b[72601]: 2024-01-11T17:50:32.190+0000 7effb2392700 -1 log_channel(cluster) log [ERR] : failed to store backtrace on ino 0x100000060f5 object, pool 2, errno -2
2024-01-11T17:50:32.510 INFO:journalctl@ceph.mds.b.smithi102.stdout:Jan 11 17:50:32 smithi102 ceph-aab47ca4-b0a7-11ee-95ab-87774f69a715-mds-b[72601]: 2024-01-11T17:50:32.190+0000 7effb2392700 -1 mds.0.14 unhandled write error (2) No such file or directory, force readonly...
2024-01-11T17:50:32.510 INFO:journalctl@ceph.mds.b.smithi102.stdout:Jan 11 17:50:32 smithi102 ceph-aab47ca4-b0a7-11ee-95ab-87774f69a715-mds-b[72601]: 2024-01-11T17:50:32.191+0000 7effb2392700 -1 log_channel(cluster) log [ERR] : failed to store backtrace on ino 0x100000060f7 object, pool 2, errno -2

Inode 0x100000060f5 is a directory, so the backtrace update goes to the metadata pool.

The OSD (osd.5) throws the following error:

2024-01-11T17:50:32.177+0000 7f9db5e1b700 15 bluestore(/var/lib/ceph/osd/ceph-5) getattr 2.d_head #2:b330f730:::100000060f5.00000000:head# _
2024-01-11T17:50:32.177+0000 7f9db5e1b700 20 bluestore(/var/lib/ceph/osd/ceph-5).collection(2.d_head 0x55dcc713c1e0) get_onode oid #2:b330f730:::100000060f5.00000000:head# key 0x7F8000000000000002B330F7'0!100000060f5.00000000!='0xFFFFFFFFFFFFFFFEFFFFFFFFFFFFFFFF6F
2024-01-11T17:50:32.177+0000 7f9db3616700 10 osd.5 pg_epoch: 93 pg[2.12( v 89'418 (0'0,89'418] local-lis/les=78/79 n=6 ec=78/78 lis/c=78/78 les/c/f=79/79/0 sis=78) [5,3,7] r=0 lpr=78 crt=89'418 lcod 89'417 mlcod 89'417 active+clean]  final snapset 0=[]:{} in 2:4bec3886:::10000000002.00000000:head
2024-01-11T17:50:32.177+0000 7f9db3616700 20 osd.5 pg_epoch: 93 pg[2.12( v 89'418 (0'0,89'418] local-lis/les=78/79 n=6 ec=78/78 lis/c=78/78 les/c/f=79/79/0 sis=78) [5,3,7] r=0 lpr=78 crt=89'418 lcod 89'417 mlcod 89'417 active+clean] finish_ctx object 2:4bec3886:::10000000002.00000000:head marks clean_regions clean_offsets: [(0, 18446744073709551615)], clean_omap: false, new_object: false
2024-01-11T17:50:32.177+0000 7f9db5e1b700 20 bluestore(/var/lib/ceph/osd/ceph-5).collection(2.d_head 0x55dcc713c1e0)  r -2 v.len 0
2024-01-11T17:50:32.177+0000 7f9db5e1b700 10 bluestore(/var/lib/ceph/osd/ceph-5) getattr 2.d_head #2:b330f730:::100000060f5.00000000:head# _ = -2

This means it failed to get the onode for the object, which happens in BlueStore.cc::get_onode():

  bufferlist v;
  int r = -ENOENT;
  Onode *on;
  if (!is_createop) {
    r = store->db->get(PREFIX_OBJ, key.c_str(), key.size(), &v);
    ldout(store->cct, 20) << " r " << r << " v.len " << v.length() << dendl;
  }
  if (v.length() == 0) {
    ceph_assert(r == -ENOENT);
    if (!create)
      return OnodeRef();
  } else {
    ceph_assert(r >= 0);
  }

I.e., an empty OnodeRef() is returned to the caller (getattr), which in turn returns -ENOENT:

    OnodeRef o = c->get_onode(oid, false);
    if (!o || !o->exists) {
      r = -ENOENT;
      goto out;
    }

RADOS is unable to find the object, and the backtrace update operation is done as follows:

void CInodeCommitOperation::update(ObjectOperation &op, inode_backtrace_t &bt) {
  using ceph::encode;

  op.priority = priority;
  op.create(false);

  bufferlist parent_bl;
  encode(bt, parent_bl);
  op.setxattr("parent", parent_bl);

I.e., with op.create(false), the object is expected to exist - which is correct. But for some reason, the directory object is missing o_O

Actions #8

Updated by Venky Shankar 4 months ago

Venky Shankar wrote:

Venky Shankar wrote:

Reproduced again in main branch integration run: https://pulpito.ceph.com/vshankar-2024-01-10_15:00:23-fs-wip-vshankar-testing-20240103.072409-1-testing-default-smithi/7511468/

The backtrace update failure in this run is:

[...]

Inode 0x100000060f5 is a directory, so the backtrace update goes to the metadata pool.

The OSD (osd.5) throws the following error:

[...]

This means it failed to get the onode for the object, which happens in BlueStore.cc::get_onode():

[...]

I.e., an empty OnodeRef() is returned to the caller (getattr), which in turn returns -ENOENT:

[...]

RADOS is unable to find the object, and the backtrace update operation is done as follows:

[...]

I.e., with op.create(false), the object is expected to exist - which is correct. But for some reason, the directory object is missing o_O

I might have been misreading this: exclusive=false implies that the operation continues even if the object exists.
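
For reference, a minimal sketch of that semantics with the librados C++ API (the helper name and the way it is called here are made up for illustration; create(bool exclusive) and setxattr() are the ObjectWriteOperation calls being discussed):

#include <rados/librados.hpp>
#include <string>

// Hedged sketch: with create(false) the create step is non-exclusive, so it
// succeeds whether or not the object already exists. The compound op only
// fails if a later step cannot be applied, e.g. -ENOENT when the target
// pool/object is gone, as seen in the MDS log above.
int store_backtrace_xattr(librados::IoCtx &ioctx, const std::string &oid,
                          librados::bufferlist &parent_bl)
{
  librados::ObjectWriteOperation op;
  op.create(false);                  // non-exclusive create: no -EEXIST
  op.setxattr("parent", parent_bl);  // backtrace lives in the "parent" xattr
  return ioctx.operate(oid, &op);    // 0 on success, negative errno on failure
}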

Actions #9

Updated by Venky Shankar 4 months ago

I'll continue debugging this tomorrow, given that the "-2" from RADOS is now likely not the actual problem.

Actions #10

Updated by Venky Shankar 3 months ago

Couldn't get to this today. Will continue tomorrow.

Actions #11

Updated by Venky Shankar 3 months ago

So, the issue is that one of the commit operations in the set of backtrace updates failed with ENOENT, since a previous test added a data pool, created a file, deleted the file, and then removed the data pool. The mdlog still had a reference to the (now gone) data pool, for which the backtrace update fails (pool nuked). Since the commit ops use C_Gather, the error from one failed commit op trickles down to every other commit op in the set.
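
To illustrate the trickle-down, here is a simplified stand-in (not Ceph's actual C_Gather, which additionally has an activate step): the first negative return value from any sub-op becomes the result handed to the single finisher, so one ENOENT from the nuked pool poisons the completion for the whole set of backtrace commits.

#include <functional>
#include <iostream>

class Gather {
  int pending = 0;
  int result = 0;                      // first error wins
  std::function<void(int)> finisher;
public:
  explicit Gather(std::function<void(int)> fin) : finisher(std::move(fin)) {}
  std::function<void(int)> new_sub() {
    ++pending;
    return [this](int r) { sub_finish(r); };
  }
private:
  void sub_finish(int r) {
    if (r < 0 && result == 0)
      result = r;                      // remember the first failure
    if (--pending == 0)
      finisher(result);                // the single completion sees that error
  }
};

int main() {
  Gather g([](int r) { std::cout << "all backtrace commits finished, r=" << r << "\n"; });
  auto sub1 = g.new_sub();             // e.g. backtrace update to the metadata pool
  auto sub2 = g.new_sub();             // e.g. backtrace update to a removed data pool
  sub1(0);                             // succeeds
  sub2(-2);                            // -ENOENT from the nuked pool
  // prints: all backtrace commits finished, r=-2
  return 0;
}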

Actions #12

Updated by Venky Shankar 3 months ago

One way to solve this would be to "split" the backtrace commit operation based on whether the inode is a file or a directory. File backtrace updates go to the data pool (which can be removed) and directory backtrace updates go to the metadata pool, which technically can be removed too, but if users choose to shoot themselves in the foot, then let them :)

But that does not totally avoid the problem, since files can have different layouts and the errno can still trickle to the commit ops whose data pool exists. So we could split the backtrace commit based on the pool id, but maybe that's too much to solve this.
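
A rough sketch of that pool-id split (the CommitOp struct and submit_group() helper here are hypothetical, purely to illustrate the grouping; the real change would live in the CInode commit path):

#include <cstdint>
#include <iostream>
#include <map>
#include <vector>

struct CommitOp {
  int64_t pool_id;   // pool the backtrace object lives in
  uint64_t ino;      // inode whose backtrace is being stored
};

// Hypothetical per-group submit: in the MDS this would issue the RADOS ops
// for one pool under its own gather/completion.
static void submit_group(int64_t pool, const std::vector<CommitOp>& ops) {
  std::cout << "pool " << pool << ": " << ops.size()
            << " backtrace update(s) with a dedicated completion\n";
}

static void commit_backtraces(const std::vector<CommitOp>& ops_vec) {
  std::map<int64_t, std::vector<CommitOp>> by_pool;
  for (const auto& op : ops_vec)
    by_pool[op.pool_id].push_back(op);   // split by target pool
  for (const auto& [pool, group] : by_pool)
    submit_group(pool, group);           // a failure in one pool stays in its group
}

int main() {
  commit_backtraces({{2, 0x100000060f5}, {3, 0x1000000c566}, {2, 0x1000000c567}});
  return 0;
}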

Actions #13

Updated by Venky Shankar 3 months ago

Venky Shankar wrote:

One way to solve this would be to "split" the backtrace commit operation based on whether the inode is a file or a directory. File backtrace updates go to the data pool (which can be removed) and directory backtrace updates go to the metadata pool, which technically can be removed too, but if users choose to shoot themselves in the foot, then let them :)

But that does not totally avoid the problem, since files can have different layouts and the errno can still trickle to the commit ops whose data pool exists. So we could split the backtrace commit based on the pool id, but maybe that's too much to solve this.

Slight correction: the ops_vec vector is per CInode, so the set of updates it tracks is the backtrace for the file plus the dirty parent, and that's where I believe the issue stems from.

Actions #14

Updated by Xiubo Li 3 months ago

Venky Shankar wrote:

So, the issue is that one of the commit operations in the set of backtrace updates failed with ENOENT, since a previous test added a data pool, created a file, deleted the file, and then removed the data pool. The mdlog still had a reference to the (now gone) data pool, for which the backtrace update fails (pool nuked). Since the commit ops use C_Gather, the error from one failed commit op trickles down to every other commit op in the set.

Venky, isn't it correct to mark the filesystem read-only in case the corresponding data pool was deleted?

Actions #15

Updated by Venky Shankar 3 months ago

Xiubo Li wrote:

Venky Shankar wrote:

So, the issue is that one of the commit operations in the set of backtrace updates failed with ENOENT, since a previous test added a data pool, created a file, deleted the file, and then removed the data pool. The mdlog still had a reference to the (now gone) data pool, for which the backtrace update fails (pool nuked). Since the commit ops use C_Gather, the error from one failed commit op trickles down to every other commit op in the set.

Venky, isn't it correct to mark the filesystem read-only in case the corresponding data pool was deleted?

There is special handling for ENOENT where the operation is treated as a success, and rightly so, since cephfs allows removing a data pool (but an event in the mdlog can still have a reference to it and fail when the event is flushed out at a later point in time).
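
Roughly, that handling amounts to something like the following (an illustration only, not the actual code from the referenced PR):

#include <cerrno>
#include <iostream>

// Sketch of the commit completion: a -ENOENT from a removed pool is treated
// as success instead of forcing the file system read-only; other write
// errors still do.
static void handle_backtrace_commit(int r) {
  if (r == -ENOENT) {
    // The pool referenced by the journaled event no longer exists; there is
    // nothing left to store the backtrace in, so treat it as done.
    r = 0;
  }
  if (r < 0) {
    std::cout << "unhandled write error " << r << ", force readonly...\n";
    return;
  }
  std::cout << "backtrace stored (or target pool gone), r=" << r << "\n";
}

int main() {
  handle_backtrace_commit(0);        // normal success
  handle_backtrace_commit(-ENOENT);  // pool removed: treated as success
  handle_backtrace_commit(-EIO);     // a real write error still forces read-only
  return 0;
}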

Actions #16

Updated by Venky Shankar 3 months ago

  • Status changed from Triaged to Fix Under Review
  • Pull request ID set to 55421