Ceph: Issues
https://tracker.ceph.com/ (2020-11-09T09:06:19Z)
CephFS - Bug #48148 (Triaged): mds: Server.cc:6764 FAILED assert(in->filelock.can_read(mdr->get_c...
https://tracker.ceph.com/issues/48148 (2020-11-09T09:06:19Z, wei qiaomiao <wei.qiaomiao@zte.com.cn>)
In my cluster with a single MDS (ceph version 12.2.13), this assert was hit while a large number of deletion operations were being performed. I cannot reproduce it now, so I did not capture more logs.

backtrace:
<pre>
0> 2020-11-03 15:32:35.316352 7f47dd5aa700 -1 /share/ceph/rpmbuild/BUILD/ceph-12.2.13/src/mds/Server.cc: In function 'bool Server::_dir_is_nonempty(MDRequestRef&, CInode*)' thread 7f47dd5aa700 time 2020-11-03 15:32:35.311722
/share/ceph/rpmbuild/BUILD/ceph-12.2.13/src/mds/Server.cc: 6783: FAILED assert(in->filelock.can_read(mdr->get_client()))
ceph version 12.2.13-1-560-g87ea0b6 (87ea0b6e94eaa3544572dd676db0e8932f56d7a8) luminous (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x55c2f7b60640]
2: (Server::_dir_is_nonempty(boost::intrusive_ptr<MDRequestImpl>&, CInode*)+0x1a8) [0x55c2f7813ab8]
3: (Server::handle_client_unlink(boost::intrusive_ptr<MDRequestImpl>&)+0x13df) [0x55c2f7843bef]
4: (Server::dispatch_client_request(boost::intrusive_ptr<MDRequestImpl>&)+0xdb9) [0x55c2f7869479]
5: (MDSInternalContextBase::complete(int)+0x1fb) [0x55c2f7a9c34b]
6: (void finish_contexts<MDSInternalContextBase>(CephContext*, std::list<MDSInternalContextBase*, std::allocator<MDSInternalContextBase*> >&, int)+0x16c) [0x55c2f77c10bc]
7: (MDSCacheObject::finish_waiting(unsigned long, int)+0x46) [0x55c2f7ab66e6]
8: (Locker::eval_gather(SimpleLock*, bool, bool*, std::list<MDSInternalContextBase*, std::allocator<MDSInternalContextBase*> >*)+0x124f) [0x55c2f798958f]
9: (Locker::wrlock_finish(SimpleLock*, MutationImpl*, bool*)+0x341) [0x55c2f798b261]
10: (Locker::_drop_non_rdlocks(MutationImpl*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x22c) [0x55c2f798f14c]
11: (Locker::drop_locks(MutationImpl*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x76) [0x55c2f798f586]
12: (Locker::scatter_writebehind_finish(ScatterLock*, boost::intrusive_ptr<MutationImpl>&)+0xd0) [0x55c2f798f6d0]
13: (MDSIOContextBase::complete(int)+0xa5) [0x55c2f7a9c4e5]
14: (MDSLogContextBase::complete(int)+0x3c) [0x55c2f7a9caec]
15: (Finisher::finisher_thread_entry()+0x198) [0x55c2f7b5f2d8]
16: (()+0x7e65) [0x7f47e9f64e65]
17: (clone()+0x6d) [0x7f47e92588ad]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
</pre>
I have a suspicion, though I am not sure whether this problem is related to it. In Server::handle_client_unlink, the filelock of the inode to be deleted is first rdlocked, and then Server::_dir_is_nonempty confirms readability via assert(in->filelock.can_read(mdr->get_client())). However, there are lock states in which the filelock allows can_rdlock but does not allow can_read, such as the following two:
<pre>
[LOCK_EXCL] = { 0, true, LOCK_LOCK, 0, 0, XCL, XCL, 0, 0, 0,
[LOCK_EXCL_XSYN] = { LOCK_XSYN, false, LOCK_LOCK, 0, 0, XCL, 0, 0, 0, 0,
</pre>
If the filelock is in one of these two states, could the above assert fire?
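As a purely illustrative model of that hypothesis (this is not Ceph code; the state names and fields below are simplified stand-ins for the sm_filelock table), the point is that a state can grant an rdlock while still reporting that reads are not allowed, so an rdlock-then-assert(can_read) sequence is not automatically safe:
<pre>
# Toy Python model of the hypothesis above; names and fields are illustrative only.
from collections import namedtuple

LockState = namedtuple("LockState", ["name", "can_read", "can_rdlock"])

# Hypothetical entries mirroring the idea that EXCL-like states permit an
# rdlock (e.g. for the loner client) while can_read is still false.
STATES = [
    LockState("SYNC", can_read=True, can_rdlock=True),
    LockState("EXCL", can_read=False, can_rdlock=True),
    LockState("EXCL_XSYN", can_read=False, can_rdlock=True),
]

for st in STATES:
    if st.can_rdlock and not st.can_read:
        print(f"{st.name}: rdlock grantable, but can_read is false -> "
              "the assert in _dir_is_nonempty could fire")
</pre>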
<p><a class="external" href="https://docs.ceph.com/docs/master/cephfs/createfs/#using-erasure-coded-pools-with-cephfs">https://docs.ceph.com/docs/master/cephfs/createfs/#using-erasure-coded-pools-with-cephfs</a></p>
The section should provide a complete example.

CephFS - Bug #46535 (In Progress): mds: Importer MDS failing right after EImportStart event is jo...
https://tracker.ceph.com/issues/46535 (2020-07-14T13:23:15Z, Sidharth Anupkrishnan)
An MDS hitting mds_kill_import_at = 7 (after EImportStart is journaled but before the ImportAck is sent to the exporter) during an import, followed by a standby MDS taking over, will cause the imported client session to be blacklisted. The reason is that before this killpoint (https://github.com/sidharthanup/ceph/blob/wip-multimdss-killpoint-test/src/mds/Migrator.cc#L3026) is hit, prepare_force_open_sessions() is called (https://github.com/sidharthanup/ceph/blob/wip-multimdss-killpoint-test/src/mds/Migrator.cc#L2699) in handle_export_dir(), and this call marks a dirty open session which later gets persisted as part of the journal event EImportStart. During journal replay this information is relayed to the new MDS (in EImportStart::replay()), so the new MDS thinks there is an open session with the client, whereas in reality that session was never actually opened with the client. Then, during up:reconnect, it tries to reconnect with the client, gets no response, and ends up blacklisting the client.
We probably want to try to force open the dirty session during EImportStart replay.

CephFS - Bug #45538 (Triaged): qa: Fix string/byte comparison mismatch in test_exports
https://tracker.ceph.com/issues/45538 (2020-05-13T19:59:49Z, Sidharth Anupkrishnan)
mount.getfattr() returns a string rather than bytes after https://github.com/ceph/ceph/pull/34941. This produces an assertion failure for:
<pre>
self.assertEqual(self.mount_a.getfattr("1", "ceph.dir.pin"), b'1')
</pre>
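A minimal sketch of the corresponding fix, assuming mount_a.getfattr() now returns a decoded str (per the PR above), is to compare against a string instead of a bytes literal:
<pre>
# qa-test sketch: compare str with str instead of with a bytes literal.
value = self.mount_a.getfattr("1", "ceph.dir.pin")  # now returns "1" as str
self.assertEqual(value, "1")                         # previously: b'1'
</pre>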
<p><a class="external" href="http://qa-proxy.ceph.com/teuthology/yuriw-2019-10-23_19:22:44-upgrade:jewel-x-wip-yuri-luminous_10.22.19-distro-basic-smithi/4438822/teuthology.log">http://qa-proxy.ceph.com/teuthology/yuriw-2019-10-23_19:22:44-upgrade:jewel-x-wip-yuri-luminous_10.22.19-distro-basic-smithi/4438822/teuthology.log</a></p>
<pre>
2019-10-23T21:17:21.038 INFO:teuthology.orchestra.run:Running command with timeout 900
2019-10-23T21:17:21.038 INFO:teuthology.orchestra.run.smithi027:Running:
2019-10-23T21:17:21.038 INFO:teuthology.orchestra.run.smithi027:> sudo mount -t fusectl /sys/fs/fuse/connections /sys/fs/fuse/connections
2019-10-23T21:17:21.137 DEBUG:teuthology.orchestra.run:got remote process result: 32
2019-10-23T21:17:21.165 INFO:teuthology.orchestra.run.smithi027.stderr:mount: /sys/fs/fuse/connections is already mounted or /sys/fs/fuse/connections busy
2019-10-23T21:17:21.166 INFO:teuthology.orchestra.run:Running command with timeout 900
2019-10-23T21:17:21.166 INFO:teuthology.orchestra.run.smithi027:Running:
2019-10-23T21:17:21.166 INFO:teuthology.orchestra.run.smithi027:> ls /sys/fs/fuse/connections
2019-10-23T21:17:21.228 INFO:tasks.cephfs.fuse_mount.ceph-fuse.3.smithi027.stderr:2019-10-23 21:17:21.296881 7fa1889920c0 -1 init, newargv = 0x563e74a6d380 newargc=9ceph-fuse[11945]: starting ceph client
2019-10-23T21:17:21.228 INFO:tasks.cephfs.fuse_mount.ceph-fuse.3.smithi027.stderr:
2019-10-23T21:17:21.232 INFO:tasks.cephfs.fuse_mount.ceph-fuse.3.smithi027.stderr:ceph-fuse[11945]: probably no MDS server is up?
2019-10-23T21:17:21.232 INFO:tasks.cephfs.fuse_mount.ceph-fuse.3.smithi027.stderr:ceph-fuse[11945]: ceph mount failed with (65536) Unknown error 65536
2019-10-23T21:17:21.364 INFO:tasks.cephfs.fuse_mount.ceph-fuse.3.smithi027.stderr:daemon-helper: command failed with exit status 1
2019-10-23T21:17:22.338 INFO:teuthology.orchestra.run:Running command with timeout 900
2019-10-23T21:17:22.338 INFO:teuthology.orchestra.run.smithi027:Running:
2019-10-23T21:17:22.338 INFO:teuthology.orchestra.run.smithi027:> sudo mount -t fusectl /sys/fs/fuse/connections /sys/fs/fuse/connections
2019-10-23T21:17:22.440 DEBUG:teuthology.orchestra.run:got remote process result: 32
</pre>

CephFS - Feature #36663 (In Progress): mds: adjust cache memory limit automatically via target th...
https://tracker.ceph.com/issues/36663 (2018-10-31T18:34:56Z, Patrick Donnelly <pdonnell@redhat.com>)
The basic idea is to have a new config like `mds_memory_target` that, if set, automatically adjusts `mds_cache_memory_limit` in response to RSS memory usage.
Recent experiments found that real RSS usage is about 1.25x the `mds_cache_memory_limit`, so another layer that tracks RSS and makes finer adjustments would be valuable.
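As a rough sketch of the kind of control loop this implies (illustrative only; the 1.25x overhead factor comes from the observation above, while the step size and function name are assumptions, not an existing MDS implementation):
<pre>
def adjust_cache_limit(rss_bytes, mds_memory_target, current_cache_limit,
                       overhead=1.25, step=0.05):
    """Nudge mds_cache_memory_limit so observed RSS converges on
    mds_memory_target. Purely illustrative, not MDS code."""
    # First-order guess: the cache limit that would land RSS on the target,
    # assuming RSS is roughly overhead * cache usage.
    ideal_limit = mds_memory_target / overhead
    if rss_bytes > mds_memory_target:
        # Over target: shrink, by at most `step`, or straight to the ideal
        # limit if that is even lower.
        new_limit = min(ideal_limit, current_cache_limit * (1 - step))
    else:
        # Under target: grow back toward the ideal limit, by at most `step`.
        new_limit = min(ideal_limit, current_cache_limit * (1 + step))
    return int(new_limit)

# Example tick: 8 GiB target, RSS at 9 GiB, current limit 6 GiB -> shrink by 5%.
print(adjust_cache_limit(9 << 30, 8 << 30, 6 << 30))
</pre>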

CephFS - Documentation #23897 (In Progress): doc: create snapshot user doc
https://tracker.ceph.com/issues/23897 (2018-04-27T03:55:51Z, Patrick Donnelly <pdonnell@redhat.com>)

Include the suggested upgrade procedure: https://github.com/ceph/ceph/pull/21374/commits/e05ebd08ea895626f4a2a52805f17e61f7c2edab
and refer to this new doc from the PendingReleaseNotes. Add a warning to http://docs.ceph.com/docs/luminous/cephfs/upgrading/#upgrading-the-mds-cluster for upgrades from pre-Mimic to Mimic clusters.

CephFS - Feature #17835 (Fix Under Review): mds: enable killpoint tests for MDS-MDS subtree export
https://tracker.ceph.com/issues/17835 (2016-11-09T13:43:27Z, John Spray <jcspray@gmail.com>)
<pre>
OPTION(mds_kill_export_at, OPT_INT, 0)
OPTION(mds_kill_import_at, OPT_INT, 0)
</pre>
I guess we should iterate through the valid values of these settings, triggering an export each time and verifying that the active MDS crashes and that a standby makes it into the active state.
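A rough shape for such a test might be the following (a sketch only; the callables passed in are placeholders for whatever the qa framework actually provides, and the killpoint range is assumed):
<pre>
def run_killpoint_matrix(set_mds_config, trigger_export,
                         wait_for_mds_crash, wait_for_standby_takeover,
                         max_killpoint=13):
    """Iterate the export/import killpoints, forcing a crash mid-migration
    each time and checking that a standby MDS becomes active. The four
    callables are placeholders for qa-framework helpers."""
    for opt in ("mds_kill_export_at", "mds_kill_import_at"):
        for value in range(1, max_killpoint + 1):
            set_mds_config(opt, value)           # arm the killpoint
            trigger_export("/migrating_dir")     # start a subtree export
            assert wait_for_mds_crash()          # the killpoint should fire
            assert wait_for_standby_takeover()   # a standby must go active
            set_mds_config(opt, 0)               # disarm before the next round
</pre>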

CephFS - Feature #17532 (New): qa: repeated "rsync --link-dest" workload
https://tracker.ceph.com/issues/17532 (2016-10-07T11:41:50Z, John Spray <jcspray@gmail.com>)

Related but distinct: http://tracker.ceph.com/issues/17434
We should have a test that uses the --link-dest feature of rsync to generate lots of hard links, reads them back, and also deletes some old backup folders so that the original files are gone and we exercise our "reintegration" of hard links the next time they are read from a more recent backup.
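A sketch of what that workload could look like (paths, round counts, and retention are made up for illustration; only the rsync --link-dest behaviour itself is the point):
<pre>
import os, shutil, subprocess

def backup_cycle(src, backup_root, rounds=5, keep=2):
    """Repeated rsync --link-dest backups: unchanged files become hard links
    to the previous backup; old backups are pruned so the original link
    targets disappear; the newest backup is re-read afterwards."""
    prev = None
    for i in range(rounds):
        dest = os.path.join(backup_root, "backup.%d" % i)
        cmd = ["rsync", "-a"]
        if prev:
            cmd += ["--link-dest", prev]          # hard-link unchanged files
        cmd += [src + "/", dest + "/"]
        subprocess.check_call(cmd)
        prev = dest
        old = i - keep
        if old >= 0:                              # prune an old backup
            shutil.rmtree(os.path.join(backup_root, "backup.%d" % old))
        for root, _, files in os.walk(dest):      # read the new backup back
            for name in files:
                with open(os.path.join(root, name), "rb") as fh:
                    fh.read(4096)
</pre>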

CephFS - Feature #17434 (Fix Under Review): qa: background rsync task for FS workunits
https://tracker.ceph.com/issues/17434 (2016-09-29T11:48:58Z, John Spray <jcspray@gmail.com>)

A client that just sits there trying to rsync the contents of the filesystem. Running this at the same time as any other workunit should enable us to get some basic coverage of multi-client access to the same files.
Of course, it is also a very realistic workload. This is a stress test rather than a full correctness test: when we rsync from a filesystem that is being modified, we have no straightforward way to verify that what got copied is right.
For bonus points, do a version of this that runs a "snapshot, rsync from snapshot, delete snapshot" cycle, which is also a very realistic workload and would exercise multi-client access and snapshots together.
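A sketch of that bonus variant (paths and timings are made up; the only CephFS-specific assumption is that snapshots are created and removed by mkdir/rmdir inside the source directory's .snap directory):
<pre>
import os, subprocess, time

def snap_rsync_cycle(src_dir, dest_dir, rounds=10, pause=30):
    """Repeatedly snapshot the source tree, rsync from the immutable
    snapshot while other clients keep writing, then drop the snapshot."""
    for i in range(rounds):
        snap = os.path.join(src_dir, ".snap", "rsync_%d" % i)
        os.mkdir(snap)                            # take a CephFS snapshot
        try:
            subprocess.check_call(["rsync", "-a", "--delete",
                                   snap + "/", dest_dir + "/"])
        finally:
            os.rmdir(snap)                        # delete the snapshot
        time.sleep(pause)
</pre>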