Activity

From 01/06/2017 to 02/04/2017

02/04/2017

07:12 AM Bug #18816: MDS crashes with log disabled
test Ahmed Akhuraidah
06:51 AM Bug #18816 (Resolved): MDS crashes with log disabled
Note the "mds_log = false" below. If you do that, this happens:
The MDS daemon crashed while executing different...
Ahmed Akhuraidah
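
A sketch of the configuration in question (only the "mds_log = false" option is quoted in the report; the section placement is an assumption):

    # Hypothetical ceph.conf fragment matching the report; disabling the
    # MDS journal like this is what triggers the crash described above.
    cat >> /etc/ceph/ceph.conf <<'EOF'
    [mds]
    mds_log = false
    EOF
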

02/03/2017

05:30 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Cache dump is here:
https://knowledgenetworkbc-my.sharepoint.com/personal/darrelle_knowledge_ca/_layouts/15/guesta...
Darrell Enns
07:24 AM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Does the mds cache dump need to be done while it is hung? It's a production system, so I wasn't able to leave it in a... Darrell Enns
06:57 AM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
probably fixed by https://github.com/ceph/ceph-client/commit/10a2699426a732cbf3fc9e835187e8b914f0c61c Zheng Yan
01:40 AM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
please run 'ceph daemon mds.kb-ceph03 dump cache /tmp/cachedump.0' and upload /tmp/cachedump.0. Besides, please check... Zheng Yan
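
The requested procedure, as a runnable sketch ("mds.kb-ceph03" is the daemon id from this thread; substitute your own):

    # Dump the MDS cache via the admin socket on the MDS host,
    # then check the resulting file before uploading it.
    ceph daemon mds.kb-ceph03 dump cache /tmp/cachedump.0
    ls -lh /tmp/cachedump.0
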
12:30 PM Bug #18757: Jewel ceph-fuse does not recover after lost connection to MDS
Comparing logs, I noticed that the MDS clock is ~30s behind the client's. ntpd was dead on one of the test servers... Will try to co... Henrik Korkuc
10:17 AM Bug #18802: Jewel fuse client not connecting to new MDS after failover (was: mds/Server.cc: 6003:...
Actually, one more thing: following this crash, the clients did not fail over to the standby MDS. Processes accessing... Dan van der Ster
09:57 AM Bug #18802: Jewel fuse client not connecting to new MDS after failover (was: mds/Server.cc: 6003:...
Excellent! Sorry that my search of the tracker didn't find that issue.
We'll apply that backport when it's ready.
Dan van der Ster
09:50 AM Bug #18802: Jewel fuse client not connecting to new MDS after failover (was: mds/Server.cc: 6003:...
duplicate of http://tracker.ceph.com/issues/18578. The fix is pending backport Zheng Yan
09:29 AM Bug #18802 (New): Jewel fuse client not connecting to new MDS after failover (was: mds/Server.cc:...
A user just did:... Dan van der Ster

02/02/2017

07:13 PM Bug #18798 (Resolved): FS activity hung, MDS reports client "failing to respond to capability rel...
I've had two occurrences in the past 3 weeks where filesystem activity hangs, with the MDS reporting a client "failing t... Darrell Enns
05:15 PM Bug #18797: valgrind jobs hanging in fs suite
They're going dead during teardown when teuthology tries to list cephtest/ but something went wrong much earlier with... John Spray
05:08 PM Bug #18797: valgrind jobs hanging in fs suite
Ignore the last passing link above, that was actually a hammer run (which shows up as fs-master for some reason).
...
John Spray
04:55 PM Bug #18797 (Duplicate): valgrind jobs hanging in fs suite

First one to show the issue was:
http://pulpito.ceph.com/teuthology-2017-01-28_17:15:01-fs-master---basic-smithi
...
John Spray
11:12 AM Bug #18754 (Fix Under Review): multimds: MDCache.cc: 8569: FAILED assert(!info.ancestors.empty())
https://github.com/ceph/ceph/pull/13227/commits/3b899fb0c6153c30ad6ef782499f79ba5c7b2a22 Zheng Yan
09:53 AM Bug #18757: Jewel ceph-fuse does not recover after lost connection to MDS
forgot to add a timeline:
08:08:29 mounted
08:09:54 iptables up
08:10:50 ls stuck
08:16:02 iptables down
08:17:5...
Henrik Korkuc
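
A sketch of the reproduction implied by this timeline, assuming the client reaches the cluster through a single address ($CLUSTER_ADDR is a placeholder):

    # "iptables up": cut the client off from the cluster for several
    # minutes, long enough to exceed the MDS session timeout.
    iptables -A OUTPUT -d "$CLUSTER_ADDR" -j DROP
    iptables -A INPUT  -s "$CLUSTER_ADDR" -j DROP
    sleep 360   # roughly the ~6-minute window in the timeline above
    # "iptables down": restore connectivity; per the report, the
    # ceph-fuse mountpoint stays hung even after this point.
    iptables -D OUTPUT -d "$CLUSTER_ADDR" -j DROP
    iptables -D INPUT  -s "$CLUSTER_ADDR" -j DROP
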
08:25 AM Bug #18757: Jewel ceph-fuse does not recover after lost connection to MDS
Attaching client and MDS logs. This time I was mounting from another server and firewalling both input and output to ... Henrik Korkuc
04:10 AM Bug #18755 (Fix Under Review): multimds: MDCache.cc: 4735: FAILED assert(in)
https://github.com/ceph/ceph/pull/13227/commits/097dc86b330392c4aa89f0033ce6e40528481857 Zheng Yan

02/01/2017

11:14 PM Bug #18757 (New): Jewel ceph-fuse does not recover after lost connection to MDS
Hmm, now that I actually read the log (like a reasonable person :-)) it is a little bit strange that the server is se... John Spray
10:30 PM Bug #18757: Jewel ceph-fuse does not recover after lost connection to MDS
Yes, I am reproducing it artificially after a short network blip caused permanent mount-point hangs on multiple servers... Henrik Korkuc
07:59 PM Bug #18757 (Rejected): Jewel ceph-fuse does not recover after lost connection to MDS
Clients which lose connectivity to the MDS are evicted after a timeout given by the "mds session timeout" setting. E... John Spray
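
The timeout in question can be inspected on a live daemon; a sketch, assuming admin-socket access on the MDS host (mds.<id> is a placeholder):

    # Show the session timeout (in seconds) that governs eviction
    # of unresponsive clients.
    ceph daemon mds.<id> config get mds_session_timeout
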
12:26 PM Bug #18757 (Resolved): Jewel ceph-fuse does not recover after lost connection to MDS
After ceph-fuse loses its connection to the MDS for a few minutes, it does not recover - accessing the mountpoint hangs processes.
...
Henrik Korkuc
06:53 PM Feature #18763 (New): Ganesha instances should restart themselves when blacklisted

If I evict the client session that a ganesha instance was using, then the libcephfs instance will start to see sess...
John Spray
02:25 PM Bug #18759 (Fix Under Review): multimds suite tries to run norstats tests against kclient
https://github.com/ceph/ceph/pull/13089 John Spray
01:11 PM Bug #18759 (Resolved): multimds suite tries to run norstats tests against kclient

You get failures on the workunit like:
jspray-2017-02-01_02:51:18-multimds-wip-jcsp-testing-20170201-testing-basic...
John Spray
03:33 AM Bug #18755: multimds: MDCache.cc: 4735: FAILED assert(in)
Zheng Yan
03:32 AM Bug #18755: multimds: MDCache.cc: 4735: FAILED assert(in)
... Zheng Yan
01:11 AM Bug #18755: multimds: MDCache.cc: 4735: FAILED assert(in)
There are actually 3 assertion failures (on 3 different MDS) in this run!... Patrick Donnelly
01:07 AM Bug #18755 (Resolved): multimds: MDCache.cc: 4735: FAILED assert(in)
... Patrick Donnelly
02:37 AM Bug #18717: multimds: FAILED assert(0 == "got export_cancel in weird state")
Zheng Yan
02:19 AM Bug #18717: multimds: FAILED assert(0 == "got export_cancel in weird state")
ceph-mds.a.log... Zheng Yan
12:55 AM Bug #18754 (Resolved): multimds: MDCache.cc: 8569: FAILED assert(!info.ancestors.empty())
... Patrick Donnelly

01/31/2017

01:34 PM Bug #17801 (Resolved): Cleanly reject "session evict" command when in replay
Nathan Cutler
01:34 PM Backport #18010 (Resolved): jewel: Cleanly reject "session evict" command when in replay
Nathan Cutler
01:25 PM Bug #17193: truncate can cause unflushed snapshot data lose
Re-enable test: https://github.com/ceph/ceph/pull/13200 Nathan Cutler
01:17 PM Bug #17193 (Resolved): truncate can cause unflushed snapshot data lose
Nathan Cutler
01:17 PM Backport #18103 (Resolved): jewel: truncate can cause unflushed snapshot data lose
Nathan Cutler
01:16 PM Bug #18408 (Resolved): lookup of /.. in jewel returns -ENOENT
Nathan Cutler
01:16 PM Backport #18413 (Resolved): jewel: lookup of /.. in jewel returns -ENOENT
Nathan Cutler
01:15 PM Bug #18519 (Resolved): speed up readdir by skipping unwanted dn
Nathan Cutler
01:15 PM Backport #18520 (Resolved): jewel: speed up readdir by skipping unwanted dn
Nathan Cutler
01:14 PM Backport #18565 (Resolved): jewel: MDS crashes on missing metadata object
Nathan Cutler
01:14 PM Backport #18551 (Resolved): jewel: ceph-fuse crash during snapshot tests
Nathan Cutler
01:12 PM Backport #18282 (Resolved): jewel: monitor cannot start because of "FAILED assert(info.state == M...
Nathan Cutler
01:12 PM Bug #18086 (Resolved): cephfs: fix missing ll_get for ll_walk
Nathan Cutler
01:11 PM Backport #18195 (Resolved): jewel: cephfs: fix missing ll_get for ll_walk
Nathan Cutler
01:11 PM Bug #17954 (Resolved): standby-replay daemons can sometimes miss events
Nathan Cutler
01:10 PM Backport #18192 (Resolved): jewel: standby-replay daemons can sometimes miss events
Nathan Cutler
01:01 PM Bug #18532: mds: forward scrub failing to repair dir stats (was: subdir with corrupted dirstat is...
Current status of lab cluster is:
* Fixed the "missing dirfrag object" damage with a script that removed the offe...
John Spray
11:59 AM Bug #18743 (Resolved): Scrub considers dirty backtraces to be damaged, puts in damage table even ...

Two things are wrong here:
* When running scrub_path /teuthology-archive on the lab filesystem, I get a flurry ...
John Spray
07:40 AM Bug #9935 (Resolved): client: segfault on ceph_rmdir path "/"
Nathan Cutler
07:39 AM Backport #18611 (Resolved): jewel: client: segfault on ceph_rmdir path "/"
Nathan Cutler

01/30/2017

07:28 PM Bug #18532: mds: forward scrub failing to repair dir stats (was: subdir with corrupted dirstat is...
Would that have caused ESTALE? Dan Mick
01:41 PM Bug #18730 (Closed): mds: backtrace issues getxattr for every file with cap on rejoin

In Server::handle_client_reconnect, inode numbers that had client caps but were not in cache are passed into MDCa...
John Spray

01/28/2017

11:17 AM Bug #18532: mds: forward scrub failing to repair dir stats (was: subdir with corrupted dirstat is...
Dan Mick wrote:
> [...]
teuthology-2016-12-18_02:01:14-rbd-master-distro-basic-smithi is not in the root directory,...
Zheng Yan
04:37 AM Bug #18532: mds: forward scrub failing to repair dir stats (was: subdir with corrupted dirstat is...
Tried them again tonight after repairing the broken stray object, and they worked this time. <shrug>
I guess the ...
Dan Mick

01/27/2017

10:55 PM Bug #17236 (Resolved): MDS goes damaged on blacklist (failed to read JournalPointer: -108 ((108) ...
Nathan Cutler
10:54 PM Bug #17832 (Resolved): "[ FAILED ] LibCephFS.InterProcessLocking" in jewel v10.2.4
Nathan Cutler
10:53 PM Bug #17270 (Resolved): [cephfs] fuse client crash when adding a new osd
Nathan Cutler
10:53 PM Bug #17275 (Resolved): MDS long-time blocked ops. ceph-fuse locks up with getattr of file
Nathan Cutler
10:52 PM Bug #17611 (Resolved): mds: false "failing to respond to cache pressure" warning
Nathan Cutler
10:51 PM Bug #17716 (Resolved): MDS: false "failing to respond to cache pressure" warning
Nathan Cutler
10:51 PM Bug #18131 (Resolved): ceph-fuse not clearing setuid/setgid bits on chown
Nathan Cutler
10:48 PM Bug #18361 (Resolved): Test failure: test_session_reject (tasks.cephfs.test_sessionmap.TestSessio...
Nathan Cutler
06:56 PM Bug #18717 (Resolved): multimds: FAILED assert(0 == "got export_cancel in weird state")
... Patrick Donnelly
06:31 PM Backport #18708 (Resolved): jewel: failed filelock.can_read(-1) assertion in Server::_dir_is_none...
https://github.com/ceph/ceph/pull/13459 Nathan Cutler
06:31 PM Backport #18707 (Resolved): kraken: failed filelock.can_read(-1) assertion in Server::_dir_is_non...
https://github.com/ceph/ceph/pull/13555 Nathan Cutler
06:31 PM Backport #18706 (Resolved): kraken: fragment space check can cause replayed request fail
https://github.com/ceph/ceph/pull/14568 Nathan Cutler
06:31 PM Backport #18705 (Resolved): jewel: fragment space check can cause replayed request fail
https://github.com/ceph/ceph/pull/14668 Nathan Cutler
04:44 PM Backport #18700: kraken: client: fix the cross-quota rename boundary check conditions
master one: https://github.com/ceph/ceph/pull/12489 Nathan Cutler
04:34 PM Backport #18700 (Resolved): kraken: client: fix the cross-quota rename boundary check conditions
https://github.com/ceph/ceph/pull/14567 John Spray
04:40 PM Bug #18660 (Pending Backport): fragment space check can cause replayed request fail
John Spray
04:38 PM Bug #11124 (Resolved): MDSMonitor: refuse to do "fs new" on metadata pools containing objects
John Spray
04:37 PM Bug #18578 (Pending Backport): failed filelock.can_read(-1) assertion in Server::_dir_is_nonempty
John Spray
04:37 PM Backport #18699: jewel: client: fix the cross-quota rename boundary check conditions
master one: https://github.com/ceph/ceph/pull/12489 Nathan Cutler
04:34 PM Backport #18699 (Resolved): jewel: client: fix the cross-quota rename boundary check conditions
https://github.com/ceph/ceph/pull/14667 John Spray
01:18 AM Bug #18532: mds: forward scrub failing to repair dir stats (was: subdir with corrupted dirstat is...
They all return ESTALE. Not sure what else I need to be doing.
Dan Mick
01:07 AM Bug #18532: mds: forward scrub failing to repair dir stats (was: subdir with corrupted dirstat is...
... Dan Mick
01:04 AM Bug #18532: mds: forward scrub failing to repair dir stats (was: subdir with corrupted dirstat is...
Duh, ls was fine:
ls -ld * | sort -n -k 5
drwxrwxr-x 1 1001 1001 18446744057908416832 Jan 23 02:23 teuthology-201...
Dan Mick
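
That size is a negative recursive-stat value wrapped to unsigned 64-bit, which is easy to verify:

    # 2^64 - 18446744057908416832 = 15801134784, i.e. the directory's
    # recursive size has underflowed to roughly -15.8 GB.
    python3 -c 'print(2**64 - 18446744057908416832)'
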
01:00 AM Bug #18532: mds: forward scrub failing to repair dir stats (was: subdir with corrupted dirstat is...
I would have sworn Greg directed me to try that, but perhaps we didn't include 'force'. Shrug. Thanks for the help.... Dan Mick

01/26/2017

11:28 PM Bug #18691 (New): multimds thrash: FAILED assert(_head.empty())
... Patrick Donnelly
05:30 PM Backport #18100 (In Progress): jewel: ceph-mon crashed after upgrade from hammer 0.94.7 to jewel ...
Here it is: https://github.com/ceph/ceph/pull/13139 John Spray
09:18 AM Backport #18100 (Need More Info): jewel: ceph-mon crashed after upgrade from hammer 0.94.7 to jew...
@John - In http://tracker.ceph.com/issues/17837#note-18 you mentioned that you backported this to jewel. If you still... Nathan Cutler
04:57 PM Bug #18661 (Fix Under Review): Test failure: test_open_inode
https://github.com/ceph/ceph/pull/13137 John Spray
04:16 PM Bug #18661: Test failure: test_open_inode
You're right, that piece of test code is racy :-/ Need to get the remote python code running inside open_background ... John Spray
09:14 AM Bug #18661: Test failure: test_open_inode
Zheng Yan
08:32 AM Bug #18661: Test failure: test_open_inode
... Zheng Yan
09:23 AM Bug #18532: mds: forward scrub failing to repair dir stats (was: subdir with corrupted dirstat is...
Zheng Yan
07:13 AM Bug #18532: mds: forward scrub failing to repair dir stats (was: subdir with corrupted dirstat is...
"ceph daemon mds.mira049 scrub_path / repair recursive force" will find and fix any other issue. But it will take ver... Zheng Yan
07:06 AM Bug #18532: mds: forward scrub failing to repair dir stats (was: subdir with corrupted dirstat is...
Fixed by:
ceph daemon mds.mira049 scrub_path /teuthology-archive/sage-2016-11-12_02:26:45-rados-wip-sage-testing--...
Zheng Yan
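
The two repair invocations from this thread, side by side (the targeted path is truncated in the report, so <subdir> is a placeholder and its flags are an assumption):

    # Targeted repair scrub of one damaged subtree:
    ceph daemon mds.mira049 scrub_path /teuthology-archive/<subdir> repair
    # Whole-tree variant quoted above; finds and fixes any other issues,
    # but per the comment it will take a very long time on a large tree.
    ceph daemon mds.mira049 scrub_path / repair recursive force
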
09:09 AM Backport #18192 (In Progress): jewel: standby-replay daemons can sometimes miss events
Nathan Cutler
09:08 AM Backport #18195 (In Progress): jewel: cephfs: fix missing ll_get for ll_walk
Nathan Cutler
09:03 AM Bug #18675 (Fix Under Review): client: during multimds thrashing FAILED assert(session->requests....
https://github.com/ceph/ceph/pull/13124 Zheng Yan
09:03 AM Backport #18282 (In Progress): jewel: monitor cannot start because of "FAILED assert(info.state =...
Nathan Cutler
06:28 AM Backport #18652 (Resolved): jewel: Test failure: test_session_reject (tasks.cephfs.test_sessionma...
Loïc Dachary
05:49 AM Bug #18680 (Resolved): multimds: cluster can assign active mds beyond max_mds during failures
From: http://pulpito.ceph.com/pdonnell-2017-01-25_22:42:21-multimds:thrash-wip-multimds-tests-testing-basic-mira/7483... Patrick Donnelly
04:18 AM Backport #18551 (In Progress): jewel: ceph-fuse crash during snapshot tests
Nathan Cutler
04:12 AM Backport #18565 (In Progress): jewel: MDS crashes on missing metadata object
Nathan Cutler

01/25/2017

11:52 PM Backport #18678: kraken: failed to reconnect caps during snapshot tests
http://qa-proxy.ceph.com/teuthology/teuthology-2017-01-07_17:15:02-fs-master---basic-smithi/698957/
Loïc Dachary
11:20 PM Backport #18678 (In Progress): kraken: failed to reconnect caps during snapshot tests
https://github.com/ceph/ceph/pull/13112 John Spray
11:18 PM Backport #18678 (Resolved): kraken: failed to reconnect caps during snapshot tests
https://github.com/ceph/ceph/pull/13112 John Spray
11:52 PM Backport #18679: jewel: failed to reconnect caps during snapshot tests
http://qa-proxy.ceph.com/teuthology/teuthology-2017-01-07_17:15:02-fs-master---basic-smithi/698957/ Loïc Dachary
11:23 PM Backport #18679 (In Progress): jewel: failed to reconnect caps during snapshot tests
https://github.com/ceph/ceph/pull/13113 John Spray
11:22 PM Backport #18679 (Resolved): jewel: failed to reconnect caps during snapshot tests
https://github.com/ceph/ceph/pull/13113 John Spray
11:35 PM Bug #18574 (Resolved): cephfs test failures (ceph.com/qa is broken, should be download.ceph.com/qa)
John Spray
11:34 PM Backport #18604 (Resolved): kraken: cephfs test failures (ceph.com/qa is broken, should be downlo...
John Spray
11:32 PM Bug #18309 (Resolved): TestVolumeClient.test_evict_client failure creating pidfile
John Spray
11:32 PM Backport #18439 (Resolved): kraken: TestVolumeClient.test_evict_client failure creating pidfile
John Spray
11:31 PM Backport #18540 (Resolved): kraken: Test failure: test_session_reject (tasks.cephfs.test_sessionm...
John Spray
11:29 PM Backport #18612 (Resolved): kraken: client: segfault on ceph_rmdir path "/"
John Spray
11:28 PM Backport #18531 (Resolved): kraken: speed up readdir by skipping unwanted dn
John Spray
11:27 PM Bug #18311 (Resolved): Decode errors on backtrace will crash MDS
John Spray
11:26 PM Backport #18463 (Resolved): kraken: Decode errors on backtrace will crash MDS
John Spray
10:11 PM Bug #18532: mds: forward scrub failing to repair dir stats (was: subdir with corrupted dirstat is...
I don't know how to repair this or even identify other instances. Dan Mick
06:57 PM Bug #16397: nfsd selinux denials causing knfs tests to fail
I'll plan to leave this open for the next week or two and we can see if any failures crop up between now and then. If... Jeff Layton
05:21 PM Bug #18675 (Resolved): client: during multimds thrashing FAILED assert(session->requests.empty())
... Patrick Donnelly
05:07 PM Backport #17705: jewel: ceph_volume_client: recovery of partial auth update is broken
The test commits from https://github.com/ceph/ceph-qa-suite/pull/1221 were moved into the main ceph/ceph.git PR after... Nathan Cutler
02:04 PM Backport #17705 (Resolved): jewel: ceph_volume_client: recovery of partial auth update is broken
John Spray
03:49 PM Bug #18670 (Duplicate): cephfs get strange permission deny when playing with git.
Aha, yes, removing the caps works.
Thanks for pointing it out so quickly :)
Xiaoxi Chen
03:24 PM Bug #18670: cephfs get strange permission deny when playing with git.
It is likely that you are hitting http://tracker.ceph.com/issues/17858
Try modifying your client's auth caps to re...
John Spray
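
A sketch of the suggested workaround, assuming the client is named client.foo and the problem is an mds "path=" restriction as in #17858 (the exact cap strings are illustrative):

    # Inspect the client's current caps, then rewrite them without the
    # mds "path=" restriction.
    ceph auth get client.foo
    ceph auth caps client.foo \
        mon 'allow r' \
        osd 'allow rw pool=cephfs_data' \
        mds 'allow rw'
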
03:12 PM Bug #18670 (Duplicate): cephfs get strange permission deny when playing with git.
Can reproduce it quite reliably via git pull <any_project>. The failing file changed between attempts, but always... Xiaoxi Chen
02:16 PM Bug #18646 (Fix Under Review): mds: rejoin_import_cap FAILED assert(session)
Zheng Yan
02:16 PM Bug #18646: mds: rejoin_import_cap FAILED assert(session)
https://github.com/ceph/ceph/pull/12974/commits/eef9e568dcccc0bb84322527c45028d3dc275c6b Zheng Yan
02:06 PM Backport #17974 (Resolved): jewel: ceph/Client segfaults in handle_mds_map when switching mds
John Spray
02:06 PM Bug #17858 (Resolved): Cannot create deep directories when caps contain "path=/somepath"
John Spray
02:05 PM Backport #18008 (Resolved): jewel: Cannot create deep directories when caps contain "path=/somepath"
John Spray
02:05 PM Bug #17216 (Resolved): ceph_volume_client: recovery of partial auth update is broken
John Spray
02:04 PM Backport #18615 (Resolved): jewel: segfault in handle_client_caps
John Spray
02:04 PM Bug #17800 (Resolved): ceph_volume_client.py : Error: Can't handle arrays of non-strings
John Spray
02:03 PM Backport #18026 (Resolved): jewel: ceph_volume_client.py : Error: Can't handle arrays of non-strings
John Spray
02:03 PM Bug #17798 (Resolved): Clients without pool-changing caps shouldn't be allowed to change pool_nam...
John Spray
02:03 PM Backport #17956 (Resolved): jewel: Clients without pool-changing caps shouldn't be allowed to cha...
John Spray
02:02 PM Backport #18603 (Resolved): jewel: cephfs test failures (ceph.com/qa is broken, should be downloa...
John Spray
01:59 PM Backport #18462 (Resolved): jewel: Decode errors on backtrace will crash MDS
John Spray
12:32 PM Bug #18663 (Fix Under Review): teuthology teardown hangs if kclient umount fails
https://github.com/ceph/ceph/pull/13099 John Spray
12:09 PM Bug #18663 (Resolved): teuthology teardown hangs if kclient umount fails

http://qa-proxy.ceph.com/teuthology/jspray-2017-01-25_02:52:36-multimds-wip-jcsp-testing-20170124-testing-basic-smi...
John Spray
10:53 AM Bug #18662: TestClientLimits hang on teuthology teardown
... John Spray
08:53 AM Bug #18662 (New): TestClientLimits hang on teuthology teardown
http://qa-proxy.ceph.com/teuthology/teuthology-2017-01-16_17:25:01-kcephfs-master-testing-basic-mira/722735/teutholog... Zheng Yan
08:43 AM Bug #18661 (Resolved): Test failure: test_open_inode
http://qa-proxy.ceph.com/teuthology/teuthology-2017-01-15_10:10:01-fs-jewel---basic-smithi/719557/ Zheng Yan
07:01 AM Bug #18660 (Fix Under Review): fragment space check can cause replayed request fail
https://github.com/ceph/ceph/pull/13095 Zheng Yan
06:57 AM Bug #18660 (Resolved): fragment space check can cause replayed request fail
Zheng Yan

01/24/2017

11:36 PM Bug #18600 (Fix Under Review): multimds suite tries to run quota tests against kclient, fails
https://github.com/ceph/ceph/pull/13089 John Spray
10:09 PM Bug #16397 (Fix Under Review): nfsd selinux denials causing knfs tests to fail
Patch to unpin from Ubuntu; waiting for that test to go through https://github.com/ceph/ceph/pull/13088 John Spray
10:04 PM Bug #16397: nfsd selinux denials causing knfs tests to fail
... John Spray
09:41 PM Feature #16219: test: smallfile benchmark tool
Just saw this, I don't know teuthology but I wrote smallfile. Ben Turner in QE has automated Gluster performance reg... Ben England
04:14 PM Backport #18652 (In Progress): jewel: Test failure: test_session_reject (tasks.cephfs.test_sessio...
Nathan Cutler
02:21 PM Backport #18652 (Fix Under Review): jewel: Test failure: test_session_reject (tasks.cephfs.test_s...
https://github.com/ceph/ceph/pull/13085 John Spray
02:18 PM Backport #18652 (Resolved): jewel: Test failure: test_session_reject (tasks.cephfs.test_sessionma...
https://github.com/ceph/ceph/pull/13085 John Spray
02:18 PM Bug #18361: Test failure: test_session_reject (tasks.cephfs.test_sessionmap.TestSessionMap)
Oops, this needed backporting to jewel as well. John Spray
01:28 PM Bug #18646: mds: rejoin_import_cap FAILED assert(session)
I've been thinking a bit about how we handle eviction in the multimds case, and whether we perhaps ought to centraliz... John Spray
06:40 AM Bug #18646: mds: rejoin_import_cap FAILED assert(session)
Yes, it's reasonable. If the client did not close the session voluntarily, it's likely the session was killed (due to tim... Zheng Yan
04:43 AM Bug #18646 (Resolved): mds: rejoin_import_cap FAILED assert(session)
... Patrick Donnelly

01/23/2017

09:25 PM Bug #18641 (Can't reproduce): mds: stalled clients apparently due to stale sessions
4/16 clients building the kernel with 2 active MDS blocked on IO. After digging into the ceph-fuse log, I found that ... Patrick Donnelly
05:35 PM Backport #18615: jewel: segfault in handle_client_caps
Thanks for the backport, Alexey! Nathan Cutler
05:34 PM Backport #18615 (In Progress): jewel: segfault in handle_client_caps
Nathan Cutler
08:44 AM Backport #18615 (Fix Under Review): jewel: segfault in handle_client_caps
Alexey Sheplyakov
08:43 AM Backport #18615: jewel: segfault in handle_client_caps
https://github.com/ceph/ceph/pull/13060 Alexey Sheplyakov
02:15 PM Bug #18600 (In Progress): multimds suite tries to run quota tests against kclient, fails
John Spray
02:13 PM Feature #18537: libcephfs cache invalidation upcalls
iirc, we still have the file handle cache, even if we disable caching in general, right?
Is it the case that we wo...
John Spray
11:54 AM Backport #18602 (Resolved): hammer: cephfs test failures (ceph.com/qa is broken, should be downlo...
Nathan Cutler
09:58 AM Bug #17370: knfs ffsb hang on master
new one http://qa-proxy.ceph.com/teuthology/teuthology-2017-01-21_17:35:01-knfs-master-testing-basic-mira/736357/
...
Zheng Yan
06:00 AM Bug #17594 (In Progress): cephfs: permission checking not working (MDS should enforce POSIX permi...
Greg Farnum

01/22/2017

06:20 AM Bug #16768 (Fix Under Review): multimds: check_rstat assertion failure
https://github.com/ceph/ceph/pull/13052 Zheng Yan

01/20/2017

06:26 PM Bug #17531 (Resolved): mds fails to respawn if executable has changed
Patrick Donnelly
06:25 PM Bug #17670 (Resolved): multimds: mds entering up:replay and processing down mds aborts
Patrick Donnelly
06:24 PM Bug #17518 (Resolved): monitor assertion failure when deactivating mds in (invalid) fscid 0
Patrick Donnelly
05:36 PM Backport #18612 (In Progress): kraken: client: segfault on ceph_rmdir path "/"
Nathan Cutler
03:43 PM Backport #18612 (Resolved): kraken: client: segfault on ceph_rmdir path "/"
https://github.com/ceph/ceph/pull/13030 Nathan Cutler
05:16 PM Backport #18611 (In Progress): jewel: client: segfault on ceph_rmdir path "/"
Nathan Cutler
03:43 PM Backport #18611 (Resolved): jewel: client: segfault on ceph_rmdir path "/"
https://github.com/ceph/ceph/pull/13029 Nathan Cutler
05:03 PM Backport #18531 (In Progress): kraken: speed up readdir by skipping unwanted dn
Nathan Cutler
05:02 PM Backport #18531: kraken: speed up readdir by skipping unwanted dn
deleted description Nathan Cutler
03:42 PM Backport #18531 (New): kraken: speed up readdir by skipping unwanted dn
Nathan Cutler
04:12 PM Backport #18604 (In Progress): kraken: cephfs test failures (ceph.com/qa is broken, should be dow...
Nathan Cutler
03:41 PM Backport #18604 (Resolved): kraken: cephfs test failures (ceph.com/qa is broken, should be downlo...
https://github.com/ceph/ceph/pull/13024 Nathan Cutler
04:03 PM Backport #18603 (In Progress): jewel: cephfs test failures (ceph.com/qa is broken, should be down...
Nathan Cutler
03:41 PM Backport #18603 (Resolved): jewel: cephfs test failures (ceph.com/qa is broken, should be downloa...
https://github.com/ceph/ceph/pull/13023 Nathan Cutler
03:52 PM Backport #18602 (In Progress): hammer: cephfs test failures (ceph.com/qa is broken, should be dow...
Nathan Cutler
03:41 PM Backport #18602 (Resolved): hammer: cephfs test failures (ceph.com/qa is broken, should be downlo...
https://github.com/ceph/ceph/pull/13022 Nathan Cutler
03:45 PM Backport #18616 (Resolved): kraken: segfault in handle_client_caps
https://github.com/ceph/ceph/pull/14566 Nathan Cutler
03:45 PM Backport #18615 (Resolved): jewel: segfault in handle_client_caps
https://github.com/ceph/ceph/pull/13060 Nathan Cutler
03:12 PM Bug #18600 (Resolved): multimds suite tries to run quota tests against kclient, fails
http://pulpito.ceph.com/jspray-2017-01-19_17:59:40-multimds-wip-jcsp-testing-20170119b-testing-basic-smithi/731274/
...
John Spray
02:29 PM Bug #18487 (Resolved): Crash in MDCache::split_dir -- FAILED assert(dir->is_auth())
John Spray
02:24 PM Bug #16768: multimds: check_rstat assertion failure
Hmm, well it didn't grab the logs for some reason but I did get the crashing MDS's log before the test tore down. It... John Spray
12:27 PM Bug #16768: multimds: check_rstat assertion failure
I noticed that failure while the test was still stuck trying to unmount the kernel client, so I went in and killed th... John Spray
12:19 PM Bug #16768: multimds: check_rstat assertion failure
Another instance:
http://qa-proxy.ceph.com/teuthology/jspray-2017-01-19_17:59:40-multimds-wip-jcsp-testing-20170119b...
John Spray
12:29 PM Bug #8090 (Duplicate): multimds: mds crash in check_rstats
This is very similar to #16768 (which is happening in more recent runs), so I'm going to close this. John Spray

01/19/2017

08:16 AM Bug #9935 (Pending Backport): client: segfault on ceph_rmdir path "/"
John Spray
08:10 AM Bug #8808 (Resolved): multimds: stale NFS file handle on delete
https://github.com/ceph/ceph/commit/530bd6e67a0842733edff7ac036f18ed990788f9 Zheng Yan
07:52 AM Bug #4829: client: handling part of MClientForward incorrectly?
I can't see why we shouldn't encode cap releases after receiving MClientForward. The old releases are for the old MDS,... Zheng Yan
07:31 AM Bug #18487 (Fix Under Review): Crash in MDCache::split_dir -- FAILED assert(dir->is_auth())
https://github.com/ceph/ceph/pull/12994 Zheng Yan
05:18 AM Bug #18589 (Duplicate): ceph_volume_client.py doesn't create enough mds caps
Assuming that you encountered this issue with the kernel client, the bug was http://tracker.ceph.com/issues/17191, which wa... John Spray
02:12 AM Bug #18578 (Fix Under Review): failed filelock.can_read(-1) assertion in Server::_dir_is_nonempty
https://github.com/ceph/ceph/pull/12973 Zheng Yan

01/18/2017

06:37 PM Bug #18589: ceph_volume_client.py doesn't create enough mds caps
This report applies to mounting with the kernel, not ceph-fuse, right? I think that makes this a kernel issue where i... Greg Farnum
06:02 PM Bug #18589: ceph_volume_client.py doesn't create enough mds caps
Pull request at https://github.com/ceph/ceph/pull/12985 Huamin Chen
05:48 PM Bug #18589 (Duplicate): ceph_volume_client.py doesn't create enough mds caps
In _authorize_ceph() at https://github.com/ceph/ceph/blob/master/src/pybind/ceph_volume_client.py#L1032, the caps are ... Huamin Chen
02:47 PM Bug #18579: Fuse client has "opening" session to nonexistent MDS rank after MDS cluster shrink
I'll take this one. Patrick Donnelly
09:23 AM Bug #18579: Fuse client has "opening" session to nonexistent MDS rank after MDS cluster shrink
Logs in /home/jspray/18579 (should be world readable) on teuthology John Spray
08:58 AM Bug #18579 (Resolved): Fuse client has "opening" session to nonexistent MDS rank after MDS cluste...
... John Spray
08:30 AM Bug #18578 (Resolved): failed filelock.can_read(-1) assertion in Server::_dir_is_nonempty
A user said he encountered this assertion while running vdbench Zheng Yan

01/17/2017

05:38 PM Bug #18574 (Pending Backport): cephfs test failures (ceph.com/qa is broken, should be download.ce...
Greg Farnum
04:16 PM Bug #18574 (Fix Under Review): cephfs test failures (ceph.com/qa is broken, should be download.ce...
https://github.com/ceph/ceph/pull/12964 John Spray
04:14 PM Bug #18574 (Resolved): cephfs test failures (ceph.com/qa is broken, should be download.ceph.com/qa)
Like http://tracker.ceph.com/issues/18542 but for the remaining references to ceph.com. John Spray
03:13 PM Bug #18532: mds: forward scrub failing to repair dir stats (was: subdir with corrupted dirstat is...
maybe there is a bad remote link in the directory Zheng Yan
11:13 AM Bug #16842: mds: replacement MDS crashes on InoTable release
Hmm, this one got lost somehow; targeting it to Luminous so that it gets at least looked at. John Spray
08:35 AM Backport #18566 (Resolved): kraken: MDS crashes on missing metadata object
https://github.com/ceph/ceph/pull/14565 Nathan Cutler
08:35 AM Backport #18565 (Resolved): jewel: MDS crashes on missing metadata object
https://github.com/ceph/ceph/pull/13119 Nathan Cutler
08:35 AM Backport #18562 (Resolved): kraken: Test Failure: kcephfs test_client_recovery.TestClientRecovery
https://github.com/ceph/ceph/pull/14564 Nathan Cutler
08:34 AM Backport #18552 (Resolved): kraken: ceph-fuse crash during snapshot tests
https://github.com/ceph/ceph/pull/14563 Nathan Cutler
08:34 AM Backport #18551 (Resolved): jewel: ceph-fuse crash during snapshot tests
https://github.com/ceph/ceph/pull/13120 Nathan Cutler

01/16/2017

09:45 PM Backport #18540 (Fix Under Review): kraken: Test failure: test_session_reject (tasks.cephfs.test_...
John Spray
09:45 PM Backport #18540 (Resolved): kraken: Test failure: test_session_reject (tasks.cephfs.test_sessionm...
https://github.com/ceph/ceph/pull/12951 John Spray
03:09 PM Bug #18363 (Can't reproduce): Test failure: test_ops_throttle (tasks.cephfs.test_strays.TestStrays)
This appears to have been something intermittent, and the deletion code is going to change anyway so I'm going to ski... John Spray
03:08 PM Bug #18362 (Duplicate): Test failure: test_evict_client (tasks.cephfs.test_volume_client.TestVolu...
John Spray
02:18 PM Feature #18537: libcephfs cache invalidation upcalls
My intent in this experiment was, yes, that upcalls be synchronous. To be brief I'd say that my understanding was, if... Matt Benjamin
12:42 PM Feature #18537: libcephfs cache invalidation upcalls
This really seems like the wrong approach to me. Is this callback going to be synchronous or async? Imagine you get a... Jeff Layton
12:24 PM Feature #18537 (Rejected): libcephfs cache invalidation upcalls
Matt Benjamin did some work in this area:
https://github.com/linuxbox2/nfs-ganesha/tree/ceph-invalidates
https://...
John Spray
12:24 PM Feature #18490: client: implement delegation support in userland cephfs
Ah, I think I had (incorrectly) assumed that the work Matt did on invalidations before had been merged, but if that's... John Spray
12:19 PM Bug #18530: ceph tell mds prints warning about ms_handle_reset
I think the OSD does do roughly the same sort of thing (hidden inside Objecter), so it might be instructive to look a... John Spray
12:15 PM Bug #18532: mds: forward scrub failing to repair dir stats (was: subdir with corrupted dirstat is...
Without having looked into this in detail yet, my presumption would be that the bug is that the repair code isn't fix... John Spray

01/15/2017

03:39 PM Feature #17855 (Fix Under Review): Don't evict a slow client if it's the only client
https://github.com/ceph/ceph/pull/12935 Michal Jarzabek

01/14/2017

01:33 AM Bug #18532 (New): mds: forward scrub failing to repair dir stats (was: subdir with corrupted dirs...
Somehow a path in the long-running cluster got a corrupted number of files/subdirs, and responds to "rm -rf" with "ca... Dan Mick
12:42 AM Backport #18531 (Resolved): kraken: speed up readdir by skipping unwanted dn
https://github.com/ceph/ceph/pull/13028 Xiaoxi Chen
12:28 AM Feature #18514: qa: don't use a node for each kclient
Probably, though I'm not really sure. I just found https://github.com/ceph/ceph-qa-suite/pull/1156/ in the depths of ... Greg Farnum
12:26 AM Bug #17193: truncate can cause unflushed snapshot data lose
So do we think this is fixed or not? Need to undo https://github.com/ceph/ceph-qa-suite/pull/1156/commits/5f1abf9c310... Greg Farnum
12:00 AM Bug #18530 (New): ceph tell mds prints warning about ms_handle_reset
Client::ms_handle_reset logs at level 0, and ceph tell mds seems to always print two of those log messages. I *think... Dan Mick

01/13/2017

04:00 PM Feature #18490: client: implement delegation support in userland cephfs
Matt B. also did some upcall/invalidate work that may be relevant here; he has it in these branches:
https://gith...
Jeff Layton
01:39 PM Backport #18520 (In Progress): jewel: speed up readdir by skipping unwanted dn
Xiaoxi Chen
01:30 PM Backport #18520 (Resolved): jewel: speed up readdir by skipping unwanted dn
https://github.com/ceph/ceph/pull/12921 Xiaoxi Chen
01:29 PM Bug #18519: speed up readdir by skipping unwanted dn
https://github.com/ceph/ceph/pull/12870 Xiaoxi Chen
01:29 PM Bug #18519 (Resolved): speed up readdir by skipping unwanted dn
We hit an MDS CPU bottleneck (100% on one core, as the MDS is single-threaded) in our CephFS production environment.
Troublesh...
Xiaoxi Chen
11:46 AM Feature #18514: qa: don't use a node for each kclient
What's the proposed solution here? To isolate the tests that require killing mounts in a directory with a different ... John Spray
07:07 AM Feature #18514 (Resolved): qa: don't use a node for each kclient
https://github.com/ceph/ceph-qa-suite/pull/1156/commits/c5f6dfc14f47cca251dcac5c53f6369fd36ace1a
Right now each ke...
Greg Farnum
11:24 AM Bug #18396 (Pending Backport): Test Failure: kcephfs test_client_recovery.TestClientRecovery
John Spray
11:23 AM Bug #18306 (Pending Backport): segfault in handle_client_caps
John Spray
11:23 AM Bug #18361 (Pending Backport): Test failure: test_session_reject (tasks.cephfs.test_sessionmap.Te...
John Spray
11:22 AM Bug #18179 (Pending Backport): MDS crashes on missing metadata object
John Spray
11:22 AM Bug #18460 (Pending Backport): ceph-fuse crash during snapshot tests
John Spray
02:59 AM Feature #18513 (Resolved): MDS: scrub: forward scrub reports missing backtraces on new files as d...
It appears that running a recursive repair scrub_path results in reports of MDS damage on files that are new enough t... Greg Farnum
01:09 AM Feature #18509: MDS: damage reporting by ino number is useless
Path string is certainly the one I was thinking of. Greg Farnum
12:58 AM Feature #18509: MDS: damage reporting by ino number is useless

The log message reporting the path is still there:...
John Spray

01/12/2017

11:30 PM Feature #18509 (Resolved): MDS: damage reporting by ino number is useless
We had two damaged directories on the long-running cluster, but examining the directories in question (other than thr... Greg Farnum
10:54 PM Feature #18490: client: implement delegation support in userland cephfs
This is basically what we've discussed previously in this area. My main concern is just designing an interface that c... Greg Farnum
10:51 PM Bug #18461: failed to reconnect caps during snapshot tests
Greg Farnum

01/11/2017

08:07 PM Feature #18490: client: implement delegation support in userland cephfs
I've created an nfs-ganesha category to match our Samba category. John Spray
05:52 PM Feature #18490 (Resolved): client: implement delegation support in userland cephfs
To properly implement NFSv4 delegations in ganesha, we need something that operates a little like Linux's fcntl(..., ... Jeff Layton
05:49 PM Feature #11950 (Fix Under Review): Strays enqueued for purge cause MDCache to exceed size limit
https://github.com/ceph/ceph/pull/12786 John Spray
05:00 PM Feature #18489 (New): mds: Multi-MDS-aware dirfrag split/join test
Similar to the existing dirfrag tests, but do some import/exporting of the resulting fragments so that they span mult... John Spray
01:04 PM Bug #18487 (Resolved): Crash in MDCache::split_dir -- FAILED assert(dir->is_auth())
... John Spray
12:40 PM Feature #18477: O_TMPFILE support in libcephfs
Yeah, with Linux's O_TMPFILE you can definitely do I/O to the inode before it's linked, and I think it'd be good to mi... Jeff Layton
11:49 AM Feature #18477: O_TMPFILE support in libcephfs
I was assuming that when doing it ephemerally we would not be allowing any data IO operations on the inode until it w... John Spray
11:50 AM Feature #18483: Forward scrub ops are not in Op Tracker
See also http://tracker.ceph.com/issues/17852 John Spray
01:22 AM Feature #18483 (New): Forward scrub ops are not in Op Tracker
We started a forward scrub on the LRC today to look for any busted rstats since we appear to have leaked data somewhe... Greg Farnum

01/10/2017

10:38 PM Feature #18477: O_TMPFILE support in libcephfs
I'm pretty skeptical that doing it ephemerally (without initially setting it up as a journaled stray) is a feasible s... Greg Farnum
08:50 PM Feature #18477: O_TMPFILE support in libcephfs
The stray would end up getting journaled, probably never written to backing store as long as the link operation came ... John Spray
07:19 PM Feature #18477: O_TMPFILE support in libcephfs
I think it makes sense to optimize for the success case here. In most cases, the link will be successful and it'll en... Jeff Layton
07:12 PM Feature #18477: O_TMPFILE support in libcephfs
Main decision here is probably whether it should be a stray or some new mechanism.
Strays feel like overkill here ...
John Spray
05:39 PM Feature #18477 (New): O_TMPFILE support in libcephfs
nfs-ganesha could make use of the ability to create a disconnected inode (pinned only by an open file descriptor) tha... Jeff Layton
02:40 PM Feature #18475 (Resolved): qa: run xfstests in the nightlies
We have manually run xfstests against ceph-fuse and kceph before, but apparently don't do so in the nightlies. Jeff r... Greg Farnum
02:16 PM Support #16526 (Resolved): cephfs client side quotas - nfs-ganesha
Yep, I think so. John Spray
01:53 PM Support #16526: cephfs client side quotas - nfs-ganesha
In a Ganesha (V2.5-dev-6) and Ceph (latest Jewel) setup, I set `client quota = true` in the client section of the cep... Ramana Raja
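
The setting referenced here, as it would appear in ceph.conf on the Ganesha host (a sketch; only the option itself is quoted in the comment):

    # Enable client-side quota enforcement for libcephfs consumers
    # such as nfs-ganesha.
    cat >> /etc/ceph/ceph.conf <<'EOF'
    [client]
    client quota = true
    EOF
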
09:45 AM Bug #18460 (Fix Under Review): ceph-fuse crash during snapshot tests
https://github.com/ceph/ceph/pull/12859 Zheng Yan
03:40 AM Bug #18461 (Fix Under Review): failed to reconnect caps during snapshot tests
Zheng Yan
03:40 AM Bug #18461: failed to reconnect caps during snapshot tests
https://github.com/ceph/ceph/pull/12852 Zheng Yan

01/09/2017

01:58 PM Backport #18462 (In Progress): jewel: Decode errors on backtrace will crash MDS
Nathan Cutler
11:21 AM Backport #18462 (Resolved): jewel: Decode errors on backtrace will crash MDS
https://github.com/ceph/ceph/pull/12836 Nathan Cutler
01:57 PM Backport #18463 (In Progress): kraken: Decode errors on backtrace will crash MDS
Nathan Cutler
11:21 AM Backport #18463 (Resolved): kraken: Decode errors on backtrace will crash MDS
https://github.com/ceph/ceph/pull/12835 Nathan Cutler
01:02 PM Bug #18396 (Fix Under Review): Test Failure: kcephfs test_client_recovery.TestClientRecovery
kernel_mount.py does implement force umount
https://github.com/ceph/ceph/pull/12833
Zheng Yan
08:36 AM Bug #18396: Test Failure: kcephfs test_client_recovery.TestClientRecovery
http://qa-proxy.ceph.com/teuthology/teuthology-2017-01-05_11:20:01-kcephfs-kraken-testing-basic-smithi/691532/
http:...
Zheng Yan
11:04 AM Bug #18311 (Pending Backport): Decode errors on backtrace will crash MDS
John Spray
08:31 AM Bug #18461 (Resolved): failed to reconnect caps during snapshot tests
http://qa-proxy.ceph.com/teuthology/teuthology-2017-01-07_17:15:02-fs-master---basic-smithi/698957/ Zheng Yan
07:47 AM Bug #18460 (Resolved): ceph-fuse crash during snapshot tests
http://qa-proxy.ceph.com/teuthology/teuthology-2017-01-05_11:10:02-fs-kraken---basic-smithi/691432/teuthology.log
...
Zheng Yan

01/08/2017

08:04 PM Bug #11124 (Fix Under Review): MDSMonitor: refuse to do "fs new" on metadata pools containing obj...
https://github.com/ceph/ceph/pull/12825 Michal Jarzabek

01/06/2017

03:42 PM Backport #18439 (In Progress): kraken: TestVolumeClient.test_evict_client failure creating pidfile
Nathan Cutler
03:40 PM Backport #18439 (Resolved): kraken: TestVolumeClient.test_evict_client failure creating pidfile
https://github.com/ceph/ceph/pull/12813 Nathan Cutler
03:29 PM Bug #18309 (Pending Backport): TestVolumeClient.test_evict_client failure creating pidfile
Kefu Chai
07:59 AM Bug #18306 (Fix Under Review): segfault in handle_client_caps
https://github.com/ceph/ceph/pull/12808 Zheng Yan
 
