Bug #12861

closed

rados/test_alloc_hint.sh failure

Added by Sage Weil over 8 years ago. Updated over 8 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

/a/sage-2015-08-29_05:13:00-rados-wip-sage-testing---basic-multi/1037883

2015-08-29T23:42:24.594 INFO:tasks.workunit.client.0.plana75.stderr:+ sudo ceph daemon osd.0 flush_journal
2015-08-29T23:42:25.549 INFO:tasks.workunit.client.0.plana75.stdout:
2015-08-29T23:42:25.559 INFO:tasks.workunit.client.0.plana75.stderr:+ fns=(${OSD_DATA[i]}/current/${PGID}*_head/${OBJ}_*)
2015-08-29T23:42:25.559 INFO:tasks.workunit.client.0.plana75.stderr:+ local fns
2015-08-29T23:42:25.559 INFO:tasks.workunit.client.0.plana75.stderr:+ local count=0
2015-08-29T23:42:25.559 INFO:tasks.workunit.client.0.plana75.stderr:+ '[' 0 -ne 1 ']'
2015-08-29T23:42:25.560 INFO:tasks.workunit.client.0.plana75.stderr:+ echo 'bad fns count: 0'
2015-08-29T23:42:25.560 INFO:tasks.workunit.client.0.plana75.stderr:bad fns count: 0

It looks like flush_journal completed and wrote foo, though. Perhaps filestore sharded things? :/


Related issues 1 (0 open, 1 closed)

Related to Ceph - Bug #12878: ceph user breaks scrub_test (Resolved, Kefu Chai, 08/31/2015)

Actions #1

Updated by Sage Weil over 8 years ago

/a/sage-2015-08-30_05:52:47-rados-master---basic-multi/1039290

Actions #2

Updated by Kefu Chai over 8 years ago

I also see this failure in /a/kchai-2015-08-30_20:25:51-rados-master---basic-multi/1039521/. It failed on the first test:

rados -p "${POOL}" set-alloc-hint "${OBJ}" "${SMALL_HINT}" "${SMALL_HINT}" 
expect_alloc_hint_eq "${SMALL_HINT}" 

But the test passes in my local environment:

CEPH_DEV_DIR=/media/xfs/dev MON=1 OSD=3 ./vstart.sh -n -l -X
../qa/workunits/rados/test_alloc_hint.sh
Actions #3

Updated by Kefu Chai over 8 years ago

Sage suspects it could be caused by "permission denied": the "ubuntu" user fails to read the OSD's data dir, which is owned by "ceph:ceph".
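
One way to check this hypothesis on a test node is to ask whether the calling user can read and traverse the directory at all. The sketch below is only an illustration: `can_list`, `describe_access`, and `demo_dir` are hypothetical names, and the temporary directory stands in for the real OSD data dir. `stat -c` is the GNU coreutils form, and root bypasses these permission checks, so the script has to run as the same user as the test.

```shell
#!/usr/bin/env bash
# can_list DIR: succeed if the current user may read and traverse DIR.
can_list() {
    [ -d "$1" ] && [ -r "$1" ] && [ -x "$1" ]
}

# describe_access DIR: print a one-line verdict including the directory owner.
describe_access() {
    local dir=$1 owner
    # GNU stat syntax; falls back to "unknown" where it is not supported.
    owner=$(stat -c '%U:%G' "$dir" 2>/dev/null || echo unknown)
    if can_list "$dir"; then
        echo "listable by $(id -un) (owner ${owner})"
    else
        echo "NOT listable by $(id -un) (owner ${owner})"
    fi
}

# Hypothetical stand-in for the OSD's current/ directory.
demo_dir=$(mktemp -d)
describe_access "$demo_dir"
```

If the verdict is "NOT listable", any glob under that directory is expanded by a shell that cannot see the entries, whatever command the result is later passed to.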

Actions #4

Updated by Sage Weil over 8 years ago

  • Status changed from New to Fix Under Review
Actions #5

Updated by Sage Weil over 8 years ago

  • Assignee set to Kefu Chai
Actions #6

Updated by Sage Weil over 8 years ago

  • Status changed from Fix Under Review to Resolved
Actions #7

Updated by Kefu Chai over 8 years ago

http://pulpito.ceph.com/kchai-2015-09-07_23:23:54-rados-wip-kefu-testing---basic-multi/1046356/ failed.

2015-09-07T23:29:47.784 INFO:tasks.workunit.client.0.burnupi26.stderr:++ sudo find -type f
2015-09-07T23:29:47.785 INFO:tasks.workunit.client.0.burnupi26.stderr:++ grep head/foo_
2015-09-07T23:29:47.792 INFO:tasks.workunit.client.0.burnupi26.stderr:+ local fns=
2015-09-07T23:29:47.792 INFO:tasks.workunit.client.0.burnupi26.stderr:+ local count=1
2015-09-07T23:29:47.792 INFO:tasks.workunit.client.0.burnupi26.stderr:+ '[' 1 -ne 1 ']'
2015-09-07T23:29:47.792 INFO:tasks.workunit.client.0.burnupi26.stderr:+ local extsize
2015-09-07T23:29:47.793 INFO:tasks.workunit.client.0.burnupi26.stderr:++ sudo xfs_io -c extsize ''
2015-09-07T23:29:47.856 INFO:tasks.workunit.client.0.burnupi26.stderr:No such file or directory
2015-09-07T23:29:47.857 INFO:tasks.workunit.client.0.burnupi26.stderr:+ extsize=

sha1 of the wip-kefu-testing branch: 6363af54296a5eddfeaa62af852e99e35816f56a

$ git merge-base --is-ancestor 64962aafed362a2a798eefe54158f65af767f0bc 6363af54296a5eddfeaa62af852e99e35816f56a && echo 1
1
$ git show --oneline 64962aafed362a2a798eefe54158f65af767f0bc
64962aa qa/workunits/rados/test_alloc_hint.sh: sudo to list files
diff --git a/qa/workunits/rados/test_alloc_hint.sh b/qa/workunits/rados/test_alloc_hint.sh
index 86d3986..c43fc3c 100755
--- a/qa/workunits/rados/test_alloc_hint.sh
+++ b/qa/workunits/rados/test_alloc_hint.sh
@@ -61,7 +61,7 @@ function expect_alloc_hint_eq() {

         # e.g., .../25.6_head/foo__head_7FC1F406__19
         #       .../26.bs1_head/bar__head_EFE6384B__1a_ffffffffffffffff_1
-        local fns=(${OSD_DATA[i]}/current/${PGID}*_head/${OBJ}_*)
+        local fns=$(sudo find ${OSD_DATA[i]}/current/${PGID}*_head -type f | grep head/${OBJ}_)
         local count="${#fns[@]}" 
         if [ "${count}" -ne 1 ]; then
             echo "bad fns count: ${count}" >&2
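
The trace above shows `local fns=` followed by `local count=1`. That matches how bash treats a scalar: `fns=$(...)` assigns a single string, and `${#fns[@]}` on a scalar evaluates to 1 even when the string is empty. A minimal sketch of the difference follows; `count_as_scalar` and `count_as_array` are hypothetical helpers, and `mapfile` assumes bash 4 or newer.

```shell
#!/usr/bin/env bash
# Count "matches" the way the patched line does: a scalar assignment.
count_as_scalar() {
    local fns=$(printf '%s' "$1")   # one string, possibly empty
    echo "${#fns[@]}"               # a set scalar counts as one element
}

# Count matches with a real array, one element per line of output.
count_as_array() {
    local fns=()
    mapfile -t fns < <(printf '%s' "$1")
    echo "${#fns[@]}"
}

count_as_scalar ""        # prints 1, although nothing matched
count_as_array ""         # prints 0
count_as_array $'a\nb'    # prints 2
```

With the scalar form the `-ne 1` check can never catch an empty result, which is why the script went on to call `xfs_io -c extsize ''`.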
Actions #8

Updated by Kefu Chai over 8 years ago

  • Status changed from Resolved to 12

Sage, I am reopening this ticket; I spotted the failure again in my test branch.

Actions #9

Updated by Kefu Chai over 8 years ago

In ceph.git/wip-12861, I added a "sleep 2h" to the test script after flushing the object. I will try to log in to the test machine to see what happened.

Actions #10

Updated by Kefu Chai over 8 years ago

# find /var/lib/ceph/osd/ceph-0/current/1.6*_head -type f | grep head/foo_
/var/lib/ceph/osd/ceph-0/current/1.6_head/foo__head_7FC1F406__1
$ sudo find /var/lib/ceph/osd/ceph-0/current/1.6*_head -type f | grep head/foo_
find: `/var/lib/ceph/osd/ceph-0/current/1.6*_head': No such file or directory
$ fns=$(sudo find /var/lib/ceph/osd/ceph-0/current/1.6*_head -type f | grep head/foo_)
find: `/var/lib/ceph/osd/ceph-0/current/1.6*_head': No such file or directory
$ echo $fns

$ echo "${#fns[@]}" 
1

Probably we should go this (ugly) way:

$ fns=$(sudo sh -c 'ls /var/lib/ceph/osd/ceph-0/current/1.6*_head/foo_*')
$ echo $fns
/var/lib/ceph/osd/ceph-0/current/1.6_head/foo__head_7FC1F406__1
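
The session above fails because the glob is expanded by the calling shell before `sudo` runs. When that shell cannot match the pattern, `find` receives the literal pattern, or no path at all if `nullglob` is enabled; the latter is an inference from the earlier `sudo find -type f` trace. The sketch below illustrates both cases and then expands the glob inside a child shell, as the `sudo sh -c` form does. `RUN_AS`, `count_args`, `list_in_child_shell`, and the temporary directory are hypothetical; `RUN_AS` is left empty here so the example runs without root, and would be `sudo` on a real OSD host.

```shell
#!/usr/bin/env bash
# count_args: report how many arguments the command actually received.
count_args() { echo "$#"; }

# list_in_child_shell DIR: let a child shell expand the glob, which is where
# "sudo sh -c ..." would expand it with root's permissions.
list_in_child_shell() {
    ${RUN_AS:-} sh -c 'ls -d -- "$1"/1.6*_head/foo_* 2>/dev/null' sh "$1"
}

# Hypothetical stand-in for the OSD's current/ directory.
demo=$(mktemp -d)
mkdir -p "$demo/1.6_head"
touch "$demo/1.6_head/foo__head_7FC1F406__1"

# A pattern the calling shell cannot match (here: no such directory).
shopt -u nullglob
literal_count=$(count_args "$demo"/9.9*_head)    # literal pattern passed through
shopt -s nullglob
nullglob_count=$(count_args "$demo"/9.9*_head)   # argument dropped entirely
shopt -u nullglob

# Collect matches into a real array so the count is meaningful.
fns=()
mapfile -t fns < <(list_in_child_shell "$demo")
echo "literal=${literal_count} nullglob=${nullglob_count} matches=${#fns[@]}"
```

Reading the output into an array with `mapfile` keeps the `count -ne 1` check honest: zero matches yield a count of 0 rather than 1.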

Actions #11

Updated by Kefu Chai over 8 years ago

  • Status changed from 12 to 7
Actions #12

Updated by Kefu Chai over 8 years ago

  • Status changed from 7 to Fix Under Review
Actions #13

Updated by Sage Weil over 8 years ago

  • Status changed from Fix Under Review to Resolved