Feature #9094
ceph: test different journal modes (file, aio file, block device) in rados suite
Status: Closed
% Done: 100%
Updated by Tamilarasi muthamizhan over 9 years ago
- Target version set to sprint13
Updated by Anonymous over 9 years ago
It appears that there is already a rados test for aio journals and a test for file journals.
suites/rados/objectstore/filestore-idempotent-aio-journal.yaml
suites/rados/objectstore/filejournal.yaml
It also appears that block journal support exists in tasks/ceph.py
So I think what needs to be added here is a block device journal test, unless that is the default, in which case all the code is probably already here.
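If a block device journal test does need to be added, a minimal sketch (modeled on the existing filestore-idempotent yamls) might point the osd journal at a device path via the conf section. The device path below is hypothetical, and whether tasks/ceph.py provisions a block journal this way would need checking:

```yaml
# Hypothetical sketch of a block-device journal test; /dev/vdb is an
# assumed device, not something guaranteed to exist on the test node.
roles:
- [mon.0, osd.0, osd.1, client.0]
tasks:
- install:
- ceph:
    conf:
      osd:
        osd journal: /dev/vdb   # point the journal at a raw block device
- filestore_idempotent:
```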
Updated by Anonymous over 9 years ago
- Target version changed from sprint13 to sprint14
Updated by Anonymous over 9 years ago
I am not entirely convinced that aio journalling is working. Running the following yaml
roles:
- [mon.0, osd.0, osd.1, client.0]
tasks:
- install:
- ceph:
    conf:
      global:
        journal aio: true
- filestore_idempotent:
generated the following messages in teuthology.log:
2014-09-17T18:11:19.223 INFO:teuthology.orchestra.run.vpm160.stderr:2014-09-18 01:11:19.223198 7ffa5ac61780 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
2014-09-17T18:11:19.268 INFO:teuthology.orchestra.run.vpm160.stderr:2014-09-18 01:11:19.268388 7ffa5ac61780 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
2014-09-17T18:11:19.269 INFO:teuthology.orchestra.run.vpm160.stderr:2014-09-18 01:11:19.269631 7ffa5ac61780 -1 filestore(/var/lib/ceph/osd/ceph-0) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
It also appears that validation of this journalling parameter is incomplete: the same messages were generated when the 'journal aio: true' line was replaced with 'journal aardvark: true'.
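One way to check whether the warning's suggestion actually changes behaviour would be to set the force flag it mentions. This is only a sketch: the option name is taken from the log message itself, and I have not verified that teuthology passes it through correctly:

```yaml
# Sketch: force aio on a file-backed journal, per the log's
# "Use journal_force_aio to force use of aio anyway" suggestion.
roles:
- [mon.0, osd.0, osd.1, client.0]
tasks:
- install:
- ceph:
    conf:
      global:
        journal aio: true
        journal force aio: true
- filestore_idempotent:
```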
Updated by Anonymous over 9 years ago
It appears that we currently have suites that actually check the different journal modes, but they are not organized as one journal suite. The dates of the checkins of some of the files seem to be just after this ticket was opened, so I suspect these are possible implementations of this testing:
For aio testing, ceph-qa-suite/suites/rados/objectstore/filestore-idempotent-aio-journal.yaml contains the following:
roles:
- [mon.0, osd.0, osd.1, client.0]
tasks:
- install:
- ceph:
    conf:
      global:
        journal aio: true
- filestore_idempotent:
Running this yaml on teuthology leads to the following output:
2014-09-17T18:11:19.223 INFO:teuthology.orchestra.run.vpm160.stderr:2014-09-18 01:11:19.223198 7ffa5ac61780 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
2014-09-17T18:11:19.268 INFO:teuthology.orchestra.run.vpm160.stderr:2014-09-18 01:11:19.268388 7ffa5ac61780 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
...
This terminated with a pass, but given the "disabling aio for non-block journal" warning above, I am not sure how much of the aio path was actually tested.
The raw disk I/O journaling was handled in ceph-qa-suite/suites/rados/objectstore/filejournal.yaml in this snippet:
roles:
- [mon.0, osd.0, osd.1, client.0]
tasks:
- install:
- ceph:
- exec:
    client.0:
    - ceph_test_filejournal
Using this yaml in a teuthology run produced a log that contained the following lines:
2014-09-19T13:06:27.590 INFO:teuthology.orchestra.run.vpm134:Running: 'sudo TESTDIR=/home/ubuntu/cephtest bash -c ceph_test_filejournal'
2014-09-19T13:06:27.868 INFO:teuthology.orchestra.run.vpm134.stdout:path /tmp/ceph_test_filejournal.tmp.2029965701
2014-09-19T13:06:27.868 INFO:teuthology.orchestra.run.vpm134.stdout:DIRECTIO OFF AIO OFF
2014-09-19T13:06:27.869 INFO:teuthology.orchestra.run.vpm134.stdout:[==========] Running 12 tests from 1 test case.
2014-09-19T13:06:27.869 INFO:teuthology.orchestra.run.vpm134.stdout:[----------] Global test environment set-up.
2014-09-19T13:06:27.869 INFO:teuthology.orchestra.run.vpm134.stdout:[----------] 12 tests from TestFileJournal
2014-09-19T13:06:27.870 INFO:teuthology.orchestra.run.vpm134.stdout:[ RUN      ] TestFileJournal.Create
2014-09-19T13:06:27.887 INFO:teuthology.orchestra.run.vpm134.stdout:[       OK ] TestFileJournal.Create (20 ms)
2014-09-19T13:06:27.888 INFO:teuthology.orchestra.run.vpm134.stdout:[ RUN      ] TestFileJournal.WriteSmall
I concluded that this test is probably exercising what was intended and is therefore complete.
For file-system I/O, ceph-qa-suite/suites/fs/basic/tasks/cephfs_journal_tool.yaml looks like:
tasks:
- ceph-fuse:
- workunit:
    clients:
      all: [fs/misc/trivial_sync.sh]
- ceph-fuse:
    client.0:
      mounted: false
- ceph.stop: [mds.*]
- workunit:
    clients:
      client.0: [suites/cephfs_journal_tool_smoke.sh]
- ceph.restart: [mds.*]
- ceph-fuse:
    client.0:
      mounted: true
- workunit:
    clients:
      all: [fs/misc/trivial_sync.sh]
and produces a run that takes hours (maybe forever), with repeated lines that look like:
2014-09-19 20:17:27,610.610 INFO:teuthology.orchestra.run.vpm025:Running: "stat --file-system '--printf=%T\n' -- /home/ubuntu/cephtest/mnt.0"
2014-09-19 20:17:32,622.622 INFO:teuthology.orchestra.run.vpm025:Running: "stat --file-system '--printf=%T\n' -- /home/ubuntu/cephtest/mnt.0"
2014-09-19 20:17:37,639.639 INFO:teuthology.orchestra.run.vpm025:Running: "stat --file-system '--printf=%T\n' -- /home/ubuntu/cephtest/mnt.0"
The %T looks suspicious to me, although with --file-system, stat's %T prints the filesystem type name rather than a time, so this is probably teuthology polling until the mount point reports a fuse filesystem type. It is quite possible that I need to tweak this yaml file a bit to get this to run.
Conclusion: all the code exists (both in ceph-qa-suite and teuthology). The ceph block device write test looks good. The aio test may or may not be good (I need to figure out whether the "use aio" comment is an instruction or a statement of fact). The cephfs test is failing, but the errors may be due to my setup.
Updated by Anonymous over 9 years ago
That last job ran all weekend and was still running when I killed it after running for at least 60 hours.
Updated by Anonymous over 9 years ago
The command that ran forever is:
sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-fuse -f --name client.0 /home/ubuntu/cephtest/mnt.0
I am guessing that the daemon-helper argument parsing is now extracting the -f and --name options that were meant for the ceph-fuse command.
Updated by Anonymous over 9 years ago
Scratch that. The version of daemon-helper that I am running does not have that change checked in.
Updated by Ian Colle over 9 years ago
- Target version changed from sprint14 to sprint15
Updated by Anonymous over 9 years ago
- Status changed from In Progress to Resolved
#9641 appears to be a problem with file journals. The rest of the tests work fine.
Once that problem is fixed, the tests that we need all appear to be in ceph-qa-suite.