Feature #9094

closed

ceph: test different journal modes (file, aio file, block device) in rados suite

Added by Sage Weil over 9 years ago. Updated over 9 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version:
% Done: 100%
Source: other
Tags:
Backport:
Reviewed:
Affected Versions:

Subtasks: 1 (0 open, 1 closed)

Subtask #9641: file system journal test takes forever to run (Resolved, 10/01/2014)

#1

Updated by Sage Weil over 9 years ago

  • Target version deleted (sprint11)
#2

Updated by Tamilarasi muthamizhan over 9 years ago

  • Target version set to sprint13
#3

Updated by Anonymous over 9 years ago

  • Assignee set to Anonymous
#4

Updated by Anonymous over 9 years ago

It appears that there is already a rados test for aio journals and a test for file journals.

suites/rados/objectstore/filestore-idempotent-aio-journal.yaml
suites/rados/objectstore/filejournal.yaml

It also appears that block journal support exists in tasks/ceph.py

So I think what needs to be added here is a block device journal test, unless block journals are already the default, in which case all the code is probably already in place.
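A block device journal job might look roughly like the following minimal sketch. It assumes that the ceph task in tasks/ceph.py accepts a `block_journal` flag that assigns a scratch device on each OSD host as that OSD's journal, and it uses the `radosbench` task with `clients`/`time` settings purely as a stand-in workload. Both are assumptions to check against the current teuthology and ceph-qa-suite code; this is not an existing suite file.

```yaml
# Hypothetical job (sketch, not taken from ceph-qa-suite): put each
# OSD journal on a raw block device and push some I/O through it.
roles:
- [mon.0, osd.0, osd.1, client.0]
tasks:
- install:
- ceph:
    # ASSUMPTION: flag read by tasks/ceph.py; requires spare scratch
    # disks on the OSD host, one per OSD, to use as journals.
    block_journal: true
- radosbench:
    # ASSUMPTION: stand-in workload; any task that writes through the
    # OSDs would exercise the journal.
    clients: [client.0]
    time: 60
```

Reusing the same roles as the existing objectstore jobs keeps the block journal variant directly comparable to the file and aio runs.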

#5

Updated by Anonymous over 9 years ago

  • Target version changed from sprint13 to sprint14
#6

Updated by Anonymous over 9 years ago

I am not entirely convinced that aio journalling is working. Running the following YAML:

roles:
- [mon.0, osd.0, osd.1, client.0]
tasks:
- install:
- ceph:
    conf:
      global:
        journal aio: true
- filestore_idempotent:

generated the following messages in teuthology.log:

2014-09-17T18:11:19.223 INFO:teuthology.orchestra.run.vpm160.stderr:2014-09-18 01:11:19.223198 7ffa5ac61780 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
2014-09-17T18:11:19.268 INFO:teuthology.orchestra.run.vpm160.stderr:2014-09-18 01:11:19.268388 7ffa5ac61780 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
2014-09-17T18:11:19.269 INFO:teuthology.orchestra.run.vpm160.stderr:2014-09-18 01:11:19.269631 7ffa5ac61780 -1 filestore(/var/lib/ceph/osd/ceph-0) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory

It also appears that this journalling parameter is not fully validated: the same messages were generated when the 'journal aio: true' line was replaced with 'journal aardvark: true'.
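Since the log says aio is disabled for non-block journals unless `journal_force_aio` is set, one way to make this job actually exercise aio on a file journal may be to set that option next to `journal aio`. This is an untested sketch; it assumes the option is honoured when set in the same `conf: global:` section, and that spaces and underscores are interchangeable in option names, as they are in ceph.conf.

```yaml
roles:
- [mon.0, osd.0, osd.1, client.0]
tasks:
- install:
- ceph:
    conf:
      global:
        journal aio: true
        # Option named in the FileJournal::_open message above; intended
        # to keep aio enabled even though the journal is a plain file.
        journal force aio: true
- filestore_idempotent:
```

If the option takes effect, the "disabling aio for non-block journal" lines would be expected to disappear from teuthology.log. Because unknown option names are accepted without complaint (as the 'journal aardvark' experiment shows), that log check is also the only confirmation that the option name was spelled correctly.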

#7

Updated by Anonymous over 9 years ago

#8

Updated by Anonymous over 9 years ago

It appears that we already have suites that check the different journal modes, but they are not organized as a single journal suite. Some of these files were checked in shortly after this ticket was opened, so I suspect they are possible implementations of this testing:

For aio testing, ceph-qa-suite/suites/rados/objectstore/filestore-idempotent-aio-journal.yaml
contains the following:

roles:
- [mon.0, osd.0, osd.1, client.0]
tasks:
- install:
- ceph:
    conf:
      global:
        journal aio: true
- filestore_idempotent:

Running this YAML on teuthology leads to the following output:

2014-09-17T18:11:19.223 INFO:teuthology.orchestra.run.vpm160.stderr:2014-09-18 01:11:19.223198 7ffa5ac61780 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
2014-09-17T18:11:19.268 INFO:teuthology.orchestra.run.vpm160.stderr:2014-09-18 01:11:19.268388 7ffa5ac61780 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
.
.
.

This terminated with a pass, but I am not sure how much was tested: the log shows that aio was disabled because the journal is not a block device.

Raw disk I/O journaling is covered by ceph-qa-suite/suites/rados/objectstore/filejournal.yaml, in this snippet:

roles:
- [mon.0, osd.0, osd.1, client.0]
tasks:
- install:
- ceph:
- exec:
    client.0:
      - ceph_test_filejournal

Using this yaml in a teuthology run produced a log that contained the following lines:

2014-09-19T13:06:27.590 INFO:teuthology.orchestra.run.vpm134:Running: 'sudo TESTDIR=/home/ubuntu/cephtest bash -c ceph_test_filejournal'
2014-09-19T13:06:27.868 INFO:teuthology.orchestra.run.vpm134.stdout:path /tmp/ceph_test_filejournal.tmp.2029965701
2014-09-19T13:06:27.868 INFO:teuthology.orchestra.run.vpm134.stdout:DIRECTIO OFF  AIO OFF
2014-09-19T13:06:27.869 INFO:teuthology.orchestra.run.vpm134.stdout:[==========] Running 12 tests from 1 test case.
2014-09-19T13:06:27.869 INFO:teuthology.orchestra.run.vpm134.stdout:[----------] Global test environment set-up.
2014-09-19T13:06:27.869 INFO:teuthology.orchestra.run.vpm134.stdout:[----------] 12 tests from TestFileJournal
2014-09-19T13:06:27.870 INFO:teuthology.orchestra.run.vpm134.stdout:[ RUN      ] TestFileJournal.Create
2014-09-19T13:06:27.887 INFO:teuthology.orchestra.run.vpm134.stdout:[       OK ] TestFileJournal.Create (20 ms)
2014-09-19T13:06:27.888 INFO:teuthology.orchestra.run.vpm134.stdout:[ RUN      ] TestFileJournal.WriteSmall

I concluded that this test is probably exercising what was intended and is therefore complete.

For file-system I/O, ceph-qa-suite/fs/basic/tasks/cephfs_journal_tool.yaml looks like:

tasks:
- ceph-fuse:
- workunit:
    clients:
      all: [fs/misc/trivial_sync.sh]
- ceph-fuse:
    client.0:
      mounted: false
- ceph.stop: [mds.*]
- workunit:
    clients:
      client.0: [suites/cephfs_journal_tool_smoke.sh]
- ceph.restart: [mds.*]
- ceph-fuse:
    client.0:
      mounted: true
- workunit:
    clients:
      all: [fs/misc/trivial_sync.sh]

and produces a run that takes hours (maybe forever), with repeated lines that look like:

2014-09-19 20:17:27,610.610 INFO:teuthology.orchestra.run.vpm025:Running: "stat --file-system '--printf=%T\n' -- /home/ubuntu/cephtest/mnt.0" 
2014-09-19 20:17:32,622.622 INFO:teuthology.orchestra.run.vpm025:Running: "stat --file-system '--printf=%T\n' -- /home/ubuntu/cephtest/mnt.0" 
2014-09-19 20:17:37,639.639 INFO:teuthology.orchestra.run.vpm025:Running: "stat --file-system '--printf=%T\n' -- /home/ubuntu/cephtest/mnt.0" 

The %T looks suspicious to me. It is quite possible that I need to tweak this YAML file
a bit to get it to run.

Conclusion: All the code exists (both in ceph-qa-suite and in teuthology). The ceph block device write test looks good. The aio test may or may not be good (I need to figure out whether the "Use journal_force_aio" message is an instruction or a statement of fact). The cephfs test is failing, but the
errors may be due to my setup.

#9

Updated by Anonymous over 9 years ago

That last job ran all weekend; it was still running when I killed it after at least 60 hours.

#10

Updated by Anonymous over 9 years ago

The command that ran forever is:

sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-fuse -f --name client.0 /home/ubuntu/cephtest/mnt.0

I am guessing that the daemon-helper argument parsing now consumes the -f and --name options, which were meant for the ceph-fuse command.

#11

Updated by Anonymous over 9 years ago

Scratch that. The version of daemon-helper that I am running does not have that change checked in.

#12

Updated by Ian Colle over 9 years ago

  • Status changed from New to In Progress
#13

Updated by Ian Colle over 9 years ago

  • Target version changed from sprint14 to sprint15
#14

Updated by Anonymous over 9 years ago

  • Status changed from In Progress to Resolved

#9641 appears to be a problem with file journals. The rest of the tests work fine.

Once that problem is fixed, the tests we need appear to be in ceph-qa-suite.
