Feature #626
qa: add IOR, rompio, or other parallel workloads suite
Description
We've had reports that rompio is extremely unstable and shows serious scaling issues.
IOR is a more common benchmark in this area.
History
#1 Updated by Sage Weil over 11 years ago
- Tracker changed from Tasks to Feature
- Subject changed from Test rompio on cfuse to qa: add IOR, rompio, or other parallel workloads suite
- Target version set to v0.37
#2 Updated by Sage Weil over 11 years ago
- Target version deleted (v0.37)
- Position set to 21
#3 Updated by Sage Weil over 11 years ago
IOR depends on MPI. mpich2 is pretty easy to set up (there's a package).
I think an ior task would need to:
- take a list of clients
- push a machine list to a 'master' node that starts the job (probably not where teuth itself is running)
- set up a temp ssh key so they can connect to each other (or we could make this part of the teuth worker config?)
- set up a symlink on each client so that a single path (/tmp/cephtest/sharedmnt.0 or something) gets you into the mount point on all machines
- download the IOR tarball and build it (it must be built with mpicc, which requires mpich2 to be installed)
- run ior with whatever parameters the task specifies.
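As a rough illustration of the per-client setup steps above (machine list, master node, shared symlink), here is a minimal Python sketch that only plans the work and does not touch any remote host. The `Client` type, its fields, the helper names, and the choice of the first client as the MPI master are all hypothetical; only the `/tmp/cephtest/sharedmnt.0` path comes from the comment.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Shared path suggested in the comment; any fixed path would work.
SHARED_PATH = "/tmp/cephtest/sharedmnt.0"

@dataclass
class Client:
    """A test client. Both fields are hypothetical names for this sketch."""
    host: str        # hostname the MPI master can reach over ssh
    mountpoint: str  # where this client's Ceph mount lives

def machine_file(clients: List[Client]) -> str:
    """Render an MPI machine file: one hostname per line."""
    return "".join(f"{c.host}\n" for c in clients)

def symlink_cmd(client: Client, shared_path: str = SHARED_PATH) -> List[str]:
    """argv that points shared_path at this client's mount point.

    -s makes a symbolic link, -f replaces an existing one, and -n stops
    ln from descending into an existing link that points at a directory.
    """
    return ["ln", "-sfn", client.mountpoint, shared_path]

def plan(clients: List[Client]) -> Tuple[str, str, Dict[str, List[str]]]:
    """Return (master host, machine file text, per-host symlink argv)."""
    if not clients:
        raise ValueError("need at least one client")
    # Assumption: the first client launches the job, so the master is
    # not the machine teuthology itself runs on.
    master = clients[0].host
    links = {c.host: symlink_cmd(c) for c in clients}
    return master, machine_file(clients), links
```

The machine file text would then be pushed to the master, and each symlink argv run on its own host, so every MPI rank sees the same path regardless of where that client mounted the file system.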
It might make sense to make a generic 'mpi' task that sets up the MPI environment, kept separate from the IOR bits.
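One way to picture that split is a generic launcher that knows nothing about IOR, plus an IOR-specific argument builder passed to it. This is a sketch under assumptions: it presumes MPICH2's Hydra `mpiexec` (`-f` for the host file, `-n` for the process count), uses standard IOR options (`-a`, `-w`, `-r`, `-b`, `-t`, `-o`, `-F`), and the function names, host file path, and size defaults are hypothetical.

```python
import shlex
from typing import List, Sequence

def mpi_command(hostfile: str, nprocs: int, program: Sequence[str]) -> List[str]:
    """Generic 'mpi' step: launch any program across the machine file.

    Assumes MPICH2's Hydra launcher, where -f names the host file and
    -n sets the total number of processes.
    """
    if nprocs < 1:
        raise ValueError("nprocs must be >= 1")
    return ["mpiexec", "-f", hostfile, "-n", str(nprocs), *program]

def ior_program(shared_path: str, block: str = "16m", xfer: str = "1m",
                file_per_proc: bool = False) -> List[str]:
    """IOR-specific arguments; the size defaults are illustrative only.

    -a POSIX selects the POSIX I/O backend, -w/-r run the write and read
    phases, -b/-t set block and transfer sizes, -o names the test file
    (on the shared path), and -F switches to one file per process.
    """
    args = ["ior", "-a", "POSIX", "-w", "-r", "-b", block, "-t", xfer,
            "-o", f"{shared_path}/ior.testfile"]
    if file_per_proc:
        args.append("-F")
    return args

if __name__ == "__main__":
    cmd = mpi_command("/tmp/cephtest/mpi-hosts", 4,
                      ior_program("/tmp/cephtest/sharedmnt.0"))
    print(shlex.join(cmd))  # shell-quoted command line for the master node
```

Other MPI workloads (mdtest, fsx-mpi) would reuse `mpi_command` unchanged and supply their own argument builder.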
#4 Updated by Sage Weil over 11 years ago
- Target version set to v0.36
- Position deleted (25)
- Position set to 31
#5 Updated by Sage Weil over 11 years ago
- Position deleted (53)
- Position set to 878
#6 Updated by Sage Weil over 11 years ago
- Position deleted (886)
- Position set to 1
- Position changed from 1 to 899
#7 Updated by Sage Weil over 11 years ago
- Target version changed from v0.36 to v0.37
#8 Updated by Sage Weil over 11 years ago
- Target version deleted (v0.37)
#9 Updated by Sage Weil about 10 years ago
- Project changed from Ceph to CephFS
- Category deleted (qa)
#10 Updated by Sage Weil about 10 years ago
- Position deleted (1339)
- Position set to 2
#11 Updated by Greg Farnum about 10 years ago
SamL has done some work on getting MPI going under teuthology, and on running some multi-client FS tests. I'm not sure what the status of that work is, but whoever picks up this bug will need to look into it.
#12 Updated by Sage Weil about 10 years ago
- Status changed from New to In Progress
- Assignee set to Sam Lang
Yeah, that's what slang's working on to enable this. Assigning this to him.
#13 Updated by Sage Weil about 10 years ago
- Position deleted (4)
- Position set to 1
#14 Updated by Sage Weil about 10 years ago
- Target version set to v0.57b
- Position deleted (1)
- Position set to 5
#15 Updated by Sam Lang about 10 years ago
- Status changed from In Progress to Closed
Added tests to the marginal qa suite that run IOR, mdtest, and fsx-mpi.