Bug #12869
override debug settings are not being applied
Description
commit aa84941cf9fb1c52c8992da21be157e70fe99b98
Author: John Spray <john.spray@redhat.com>
Date:   Thu Aug 13 19:08:16 2015 +0100

    tasks/kcephfs: enable MDS debug

    To help us debug #11482

    Signed-off-by: John Spray <john.spray@redhat.com>

diff --git a/suites/kcephfs/cephfs/conf.yaml b/suites/kcephfs/cephfs/conf.yaml
index 30da870..b3ef404 100644
--- a/suites/kcephfs/cephfs/conf.yaml
+++ b/suites/kcephfs/cephfs/conf.yaml
@@ -3,3 +3,5 @@ overrides:
     conf:
       global:
         ms die on skipped message: false
+      mds:
+        debug mds: 20
This was applied in:
commit 641169f2542d8fa23c1452b53288fe732be74503
Merge: 48a8b23 aa84941
Author: Yan, Zheng <ukernel@gmail.com>
Date:   Tue Aug 18 17:38:34 2015 +0800

    Merge pull request #531 from ceph/wip-mds-debug

    tasks/kcephfs: enable MDS debug
The setting is still not showing up in the kcephfs suite runs. I believe these should be auto-updating, so I'm not sure how it could be failing. For instance, http://pulpito.ceph.com/teuthology-2015-08-24_23:08:02-kcephfs-master-testing-basic-multi/1030719/ does not have any mention of "debug mds". That's a master run. I've checked the master branch and the yaml fragment is good:
gregf@rex004:~/src/ceph-qa-suite [master]$ cat suites/kcephfs/cephfs/conf.yaml
overrides:
  ceph:
    conf:
      global:
        ms die on skipped message: false
      mds:
        debug mds: 20
The only thing I can think of is that there are override stanzas in other conf.yaml files (from different fragment directories) which are also being included in the run.
History
#1 Updated by Zack Cerza over 8 years ago
teuthology-suite --dry-run -v -s kcephfs:cephfs -l 1
[...]
2015-08-31 11:28:03,236.236 INFO:teuthology.suite:dry-run: /Users/zack/inkdev/teuthology/virtualenv/bin/teuthology-schedule --name zack-2015-08-31_11:27:58-kcephfs:cephfs-master---basic-magna --num 1 --worker magna --priority 1000 -v --description 'kcephfs:cephfs/{conf.yaml clusters/fixed-3-cephfs.yaml fs/btrfs.yaml inline/no.yaml tasks/kclient_workunit_direct_io.yaml}' -- /var/folders/lh/723f8c417xz2n8dfzjqnmk3c0000gn/T/schedule_suite_bco4er /Users/zack/src/ceph-qa-suite_master/suites/kcephfs/cephfs/clusters/fixed-3-cephfs.yaml /Users/zack/src/ceph-qa-suite_master/suites/kcephfs/cephfs/conf.yaml /Users/zack/src/ceph-qa-suite_master/suites/kcephfs/cephfs/fs/btrfs.yaml /Users/zack/src/ceph-qa-suite_master/suites/kcephfs/cephfs/inline/no.yaml /Users/zack/src/ceph-qa-suite_master/suites/kcephfs/cephfs/tasks/kclient_workunit_direct_io.yaml
conf.yaml is indeed being ignored because of fs/btrfs.yaml.
When multiple fragments are specified, they are concatenated and then parsed by PyYAML (as opposed to being parsed first, then deep-merged). Changing this would probably break lots of things :)
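To illustrate what concatenate-then-parse does with two fragments that both define a top-level overrides key: PyYAML keeps only the last occurrence of a duplicate mapping key. A minimal sketch (abbreviated fragment contents; the variable names are mine):

import yaml

# Abbreviated stand-ins for conf.yaml and fs/btrfs.yaml; both define
# a top-level 'overrides' mapping.
conf_yaml = """
overrides:
  ceph:
    conf:
      mds:
        debug mds: 20
"""
btrfs_yaml = """
overrides:
  ceph:
    fs: btrfs
"""

# Concatenate-then-parse: PyYAML silently keeps only the last
# 'overrides' key, so the 'debug mds' stanza is discarded.
print(yaml.safe_load(conf_yaml + btrfs_yaml))
# -> {'overrides': {'ceph': {'fs': 'btrfs'}}}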
#2 Updated by Zack Cerza over 8 years ago
- Status changed from New to Won't Fix
So, it would be best to merge your overrides by hand, I think.
#3 Updated by Greg Farnum over 8 years ago
- Status changed from Won't Fix to 12
- Priority changed from Normal to Urgent
Copying my argument from irc:
[18:49:59] <gregsfortytwo> if you can't have two overrides the whole fragment system breaks down
[18:50:40] <gregsfortytwo> we need to override the ceph.conf default values for all kinds of stuff
[18:50:59] <vasu> but it can be still done with one overrides, this would be less confusing?
[18:51:11] <gregsfortytwo> having *conflicting* overrides would be user error, and the system could barf on that and it would be fine/good
[18:51:24] <gregsfortytwo> but right now it's apparently silently discarding one of the override values
[18:51:32] <gregsfortytwo> vasu: no, it can't be done with one override
[18:51:41] <vasu> its a static file?
[18:52:04] <gregsfortytwo> I can't say "this suite needs a different value for foo than the ceph task normally does, and this one workload also needs a different value for bar"
[18:52:10] <gregsfortytwo> in only one file
[18:52:44] <gregsfortytwo> I could push the "foo" override into all the workload folder yamls and that would work
[18:53:07] <gregsfortytwo> but then if I also need to specify a different value for eg the btrfs-backed OSDs, it breaks down and I'd need to put them in different suites or something
And I've also discovered that most of the suites include a msgr-failures folder that sets override values, and then have overrides in the workloads folder. If these two override stanzas aren't merging we have a serious problem and it needs to be fixed in teuthology to do a proper merge. (I'm not sure if that's actually the case or if this is insufficiently diagnosed.)
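For clarity, what I mean by a proper merge: parse each fragment on its own, then recursively combine the resulting dicts so that sibling override stanzas accumulate instead of clobbering each other. A hypothetical sketch (illustrative only, not teuthology's actual code; the function names are mine):

import yaml

def deep_merge(base, update):
    # Recursively merge 'update' into 'base'; on scalar conflicts the
    # later fragment wins, which at least makes conflicting overrides
    # deterministic.
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            deep_merge(base[key], value)
        else:
            base[key] = value
    return base

def load_fragments(paths):
    # Parse each YAML fragment separately, then deep-merge in order.
    merged = {}
    for path in paths:
        with open(path) as f:
            deep_merge(merged, yaml.safe_load(f) or {})
    return merged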
#4 Updated by Zack Cerza over 8 years ago
Upon closer inspection with a hacked teuthology-suite:
2015-08-31 12:28:11,913.913 INFO:teuthology.suite:Scheduling kcephfs:cephfs/{conf.yaml clusters/fixed-3-cephfs.yaml fs/btrfs.yaml inline/no.yaml tasks/kclient_workunit_direct_io.yaml}
['/Users/zack/inkdev/teuthology/virtualenv/bin/teuthology-schedule',
 '--name', 'zack-2015-08-31_12:28:08-kcephfs:cephfs-master---basic-magna',
 '--num', '1',
 '--worker', 'magna',
 '--dry-run',
 '--priority', '1000',
 '-v',
 '--description', 'kcephfs:cephfs/{conf.yaml clusters/fixed-3-cephfs.yaml fs/btrfs.yaml inline/no.yaml tasks/kclient_workunit_direct_io.yaml}',
 '--',
 '/var/folders/lh/723f8c417xz2n8dfzjqnmk3c0000gn/T/schedule_suite_NerDkv',
 '/Users/zack/src/ceph-qa-suite_master/suites/kcephfs/cephfs/clusters/fixed-3-cephfs.yaml',
 '/Users/zack/src/ceph-qa-suite_master/suites/kcephfs/cephfs/conf.yaml',
 '/Users/zack/src/ceph-qa-suite_master/suites/kcephfs/cephfs/fs/btrfs.yaml',
 '/Users/zack/src/ceph-qa-suite_master/suites/kcephfs/cephfs/inline/no.yaml',
 '/Users/zack/src/ceph-qa-suite_master/suites/kcephfs/cephfs/tasks/kclient_workunit_direct_io.yaml']
{'branch': 'master',
 'description': 'kcephfs:cephfs/{conf.yaml clusters/fixed-3-cephfs.yaml fs/btrfs.yaml inline/no.yaml tasks/kclient_workunit_direct_io.yaml}',
 'email': 'zack@redhat.com',
 'last_in_suite': False,
 'log-rotate': {'ceph-mds': '10G', 'ceph-osd': '10G'},
 'machine_type': 'magna',
 'name': 'zack-2015-08-31_12:28:08-kcephfs:cephfs-master---basic-magna',
 'nuke-on-error': True,
 'overrides': {'admin_socket': {'branch': 'master'},
               'ceph': {'conf': {'global': {'ms die on skipped message': False},
                                 'mds': {'debug mds': 20},
                                 'mon': {'debug mon': 20, 'debug ms': 1, 'debug paxos': 20},
                                 'osd': {'debug filestore': 20,
                                         'debug journal': 20,
                                         'debug ms': 1,
                                         'debug osd': 20,
                                         'osd op thread timeout': 60,
                                         'osd sloppy crc': True}},
                        'fs': 'btrfs',
                        'log-whitelist': ['slow request'],
                        'sha1': '6dc9ed581441aade22750d1eb541cdbeddeb37d2'},
               'ceph-deploy': {'branch': {'dev-commit': '6dc9ed581441aade22750d1eb541cdbeddeb37d2'},
                               'conf': {'client': {'log file': '/var/log/ceph/ceph-$name.$pid.log'},
                                        'mon': {'debug mon': 1,
                                                'debug ms': 20,
                                                'debug paxos': 20,
                                                'osd default pool size': 2}}},
               'install': {'ceph': {'sha1': '6dc9ed581441aade22750d1eb541cdbeddeb37d2'}},
               'workunit': {'sha1': '6dc9ed581441aade22750d1eb541cdbeddeb37d2'}},
 'owner': 'scheduled_zack@zwork.local',
 'priority': 1000,
 'roles': [['mon.a', 'mds.a', 'osd.0', 'osd.1'],
           ['mon.b', 'mds.a-s', 'mon.c', 'osd.2', 'osd.3'],
           ['client.0']],
 'sha1': '6dc9ed581441aade22750d1eb541cdbeddeb37d2',
 'suite': 'kcephfs:cephfs',
 'suite_branch': 'master',
 'tasks': [{'ansible.cephlab': None},
           {'clock.check': None},
           {'install': None},
           {'ceph': None},
           {'kclient': None},
           {'workunit': {'clients': {'all': ['direct_io']}}}],
 'teuthology_branch': 'master',
 'tube': 'magna',
 'verbose': True}
2015-08-31 12:28:12,239.239 INFO:teuthology.suite:Suite kcephfs:cephfs in /Users/zack/src/ceph-qa-suite_master/suites/kcephfs/cephfs scheduled 1 jobs.
2015-08-31 12:28:12,239.239 INFO:teuthology.suite:Suite kcephfs:cephfs in /Users/zack/src/ceph-qa-suite_master/suites/kcephfs/cephfs -- 23 jobs were filtered out.
2015-08-31 12:28:12,239.239 INFO:teuthology.suite:Suite kcephfs:cephfs in /Users/zack/src/ceph-qa-suite_master/suites/kcephfs/cephfs scheduled 0 jobs with missing packages.
2015-08-31 12:28:12,502.502 INFO:teuthology.suite:Test results viewable at http://pulpito.ceph.com/zack-2015-08-31_12:28:08-kcephfs:cephfs-master---basic-magna/

(virtualenv)12:28:12 zack@zwork teuthology master ?
cat /Users/zack/src/ceph-qa-suite_master/suites/kcephfs/cephfs/conf.yaml
overrides:
  ceph:
    conf:
      global:
        ms die on skipped message: false
      mds:
        debug mds: 20

(virtualenv)12:29:05 zack@zwork teuthology master ?
cat /Users/zack/src/ceph-qa-suite_master/suites/kcephfs/cephfs/fs/btrfs.yaml
overrides:
  ceph:
    fs: btrfs
    conf:
      osd:
        osd sloppy crc: true
        osd op thread timeout: 60
This actually appears to be working as you want it to. I just scheduled a job with a stock teuthology-suite:

teuthology-suite -v -s kcephfs:cephfs -l 1
Here it is in pulpito:
http://pulpito.ceph.com/zack-2015-08-31_11:33:52-kcephfs:cephfs-master---basic-multi/1039781/
Here is its overrides stanza:
overrides:
  admin_socket:
    branch: master
  ceph:
    conf:
      global:
        ms die on skipped message: false
      mds:
        debug mds: 20
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
        osd op thread timeout: 60
        osd sloppy crc: true
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: 6dc9ed581441aade22750d1eb541cdbeddeb37d2
  ceph-deploy:
    branch:
      dev-commit: 6dc9ed581441aade22750d1eb541cdbeddeb37d2
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: 6dc9ed581441aade22750d1eb541cdbeddeb37d2
  workunit:
    sha1: 6dc9ed581441aade22750d1eb541cdbeddeb37d2
This of course does not explain why the correct bits didn't get included in the particular job which inspired this ticket. I would be curious to see the teuthology-suite command line which scheduled that job.
#5 Updated by Zack Cerza over 8 years ago
I just noticed that this was a scheduled rados run, and those are scheduled using teuthology-suite --subset. At this point this really seems like a bug in --subset.
#6 Updated by Zack Cerza over 8 years ago
I just filed a teuthology PR (https://github.com/ceph/teuthology/pull/607) to give teuthology-suite a -vv (double-verbose) mode, which causes teuthology-schedule --dry-run to be run for each generated job in the suite.

Here is its output for teuthology-suite --dry-run -vv -s kcephfs:cephfs:

http://fpaste.org/261703/14410566/
#7 Updated by Zack Cerza over 8 years ago
- Status changed from 12 to Need More Info
- Assignee set to Greg Farnum
Greg, the feature I mentioned in the previous comment is merged into master. Please see if you can reproduce the problem using that.
#8 Updated by Greg Farnum over 8 years ago
- Assignee changed from Greg Farnum to Zack Cerza
Well, I tried to on three different machines.
rex004 hangs when I execute it, I think because it doesn't have a VPN connection to sepia and so can't reach the lock server.
Sepia's teuthology box seems to have a version of teuthology from June, and when I attempted to update, everything barfed because it wanted libpython-dev (which doesn't seem to exist for its Ubuntu release, although python-dev is installed).
magna002 failed on some stupid thing, but started working once I updated, deleted the virtualenv, and recreated it. I get essentially the same output you do. I also tried checking out the version of teuthology that sepia's machine seems to be running and scheduled a single job, and things looked fine. (This was my only idea as to what might be broken.)
So at this point all I can say is yes, the things you are showing me here seem to indicate that it's working. But nonetheless, when running scheduled jobs we are not getting "debug mds: 20" included in the configs. We need it to do so. Perhaps the sepia teuthology box is not actually grabbing the newest ceph-qa-suite checkouts to schedule against? But if I look at /home/teuthology/src/ceph-qa-suite_master it's currently on
commit 8331556b9b8f947319433b6a0bb234088ba073c0
Merge: 5df0ceb e8d4cf1
Author: David Zafman <dzafman@redhat.com>
Date:   Tue Sep 1 12:27:12 2015 -0700
which looks to be the newest one. Despite that, looking at http://pulpito.ceph.com/teuthology-2015-08-31_23:08:01-kcephfs-master-testing-basic-multi/ (the newest one in pulpito), we still aren't getting the "debug mds" line included. I would suspect I had done something truly stupid, like putting it in the wrong branch or suite, given that all our other attempts to reproduce it elsewhere are behaving as expected...
#9 Updated by Greg Farnum over 8 years ago
- Status changed from Need More Info to Rejected
Ugh. Okay, the last run that didn't have the debug mds settings I was interested in... didn't have that yaml fragment included, by virtue of being in a different subsuite. I should have spotted this earlier, but I was primed for it to be broken because it also took a long time from our including the yaml fragment to its showing up in a suite result; I think that's just the unfortunately long queue times we've got going on right now.