Project

General

Profile

Actions

Fix #5399

closed

timestamp changes on replayed mds request (pjd link 71)

Added by Sage Weil almost 11 years ago. Updated almost 10 years ago.

Status:
Resolved
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

teuthology-2013-06-19_10:46:59-fs-cuttlefish-master-basic 40138 40141

Actions #1

Updated by Sage Weil almost 11 years ago

  • Project changed from Ceph to CephFS
Actions #2

Updated by Greg Farnum almost 11 years ago

  • Status changed from New to Rejected

It's a time stamp check for things going backwards, and is failing due to out-of-sync clocks (over a network) being handled in ways we know about and have no intention of changing.

Actions #3

Updated by Greg Farnum almost 11 years ago

  • Status changed from Rejected to New

Hmm, Sage points out this might be something else; reopening.

Actions #4

Updated by Sage Weil almost 11 years ago

  • Subject changed from pjd link test failure on cuttlefish to timestamp changes on replayed mds request (pjd link 71)
  • Status changed from New to 12

- we send a create to mds
- get an ack, but it isn't journaled
- pjd stats the mtime/ctime/ec.
- mds restarts
- we replay the request
- new mds replies, this time with a different mtime/ctime/etc
- ... pjd complains

in reality, the directory ctime/mtime are also different, but the client doesn't notice because it only restats the file (and not the dir) on the traceless reply.

from pjd output

2013-06-20T18:34:12.177 INFO:teuthology.task.workunit.client.0.err:++ /home/ubuntu/cephtest/41286/mnt.0/client.0/tmp/tmp/../pjd-fstest-20090130-RC-open24/tests/link/../../fstest create fstest_28f94f92cf9776f020b4e141e70ad403 0644
2013-06-20T18:34:12.177 INFO:teuthology.task.workunit.client.0.err:++ tail -1
2013-06-20T18:34:12.186 INFO:teuthology.task.workunit.client.0.err:+ r=0
2013-06-20T18:34:12.186 INFO:teuthology.task.workunit.client.0.err:+ echo 0
2013-06-20T18:34:12.187 INFO:teuthology.task.workunit.client.0.err:+ egrep '^0$'
2013-06-20T18:34:12.188 INFO:teuthology.task.workunit.client.0.err:+ '[' 0 -eq 0 ']'
2013-06-20T18:34:12.188 INFO:teuthology.task.workunit.client.0.err:+ echo 'ok 68'
2013-06-20T18:34:12.189 INFO:teuthology.task.workunit.client.0.err:++ expr 68 + 1
2013-06-20T18:34:12.190 INFO:teuthology.task.workunit.client.0.err:+ ntest=69
2013-06-20T18:34:12.190 INFO:teuthology.task.workunit.client.0.err:+ expect 0 -- chown fstest_28f94f92cf9776f020b4e141e70ad403 65534 -1
2013-06-20T18:34:12.190 INFO:teuthology.task.workunit.client.0.err:+ e=0
2013-06-20T18:34:12.190 INFO:teuthology.task.workunit.client.0.err:+ shift
2013-06-20T18:34:12.191 INFO:teuthology.task.workunit.client.0.err:++ /home/ubuntu/cephtest/41286/mnt.0/client.0/tmp/tmp/../pjd-fstest-20090130-RC-open24/tests/link/../../fstest -- chown fstest_28f94f92cf9776f020b4e141e70ad403 65534 -1
2013-06-20T18:34:12.191 INFO:teuthology.task.workunit.client.0.err:++ tail -1
2013-06-20T18:34:12.195 INFO:teuthology.task.workunit.client.0.err:+ r=0
2013-06-20T18:34:12.195 INFO:teuthology.task.workunit.client.0.err:+ echo 0
2013-06-20T18:34:12.195 INFO:teuthology.task.workunit.client.0.err:+ egrep '^0$'
2013-06-20T18:34:12.197 INFO:teuthology.task.workunit.client.0.err:+ '[' 0 -eq 0 ']'
2013-06-20T18:34:12.197 INFO:teuthology.task.workunit.client.0.err:+ echo 'ok 69'
2013-06-20T18:34:12.197 INFO:teuthology.task.workunit.client.0.err:++ expr 69 + 1
2013-06-20T18:34:12.198 INFO:teuthology.task.workunit.client.0.err:+ ntest=70
2013-06-20T18:34:12.199 INFO:teuthology.task.workunit.client.0.err:++ /home/ubuntu/cephtest/41286/mnt.0/client.0/tmp/tmp/../pjd-fstest-20090130-RC-open24/tests/link/../../fstest stat fstest_28f94f92cf9776f020b4e141e70ad403 ctime
2013-06-20T18:34:12.202 INFO:teuthology.task.workunit.client.0.err:+ ctime1=1371778452
2013-06-20T18:34:12.202 INFO:teuthology.task.workunit.client.0.err:++ /home/ubuntu/cephtest/41286/mnt.0/client.0/tmp/tmp/../pjd-fstest-20090130-RC-open24/tests/link/../../fstest stat . ctime
2013-06-20T18:34:12.205 INFO:teuthology.task.workunit.client.0.err:+ dctime1=1371778452
2013-06-20T18:34:12.205 INFO:teuthology.task.workunit.client.0.err:++ /home/ubuntu/cephtest/41286/mnt.0/client.0/tmp/tmp/../pjd-fstest-20090130-RC-open24/tests/link/../../fstest stat . mtime
2013-06-20T18:34:12.208 INFO:teuthology.task.workunit.client.0.err:+ dmtime1=1371778452
2013-06-20T18:34:12.208 INFO:teuthology.task.workunit.client.0.err:+ sleep 1
2013-06-20T18:34:12.407 INFO:teuthology.orchestra.run.err:dumped mdsmap epoch 17
2013-06-20T18:34:12.416 DEBUG:teuthology.misc:with jobid basedir: 41286
2013-06-20T18:34:12.416 DEBUG:teuthology.orchestra.run:Running [10.214.132.11]: '/home/ubuntu/cephtest/41286/enable-coredump ceph-coverage /home/ubuntu/cephtest/41286/archive/coverage ceph mds dump --format=json'
2013-06-20T18:34:12.597 INFO:teuthology.orchestra.run.err:dumped mdsmap epoch 17
2013-06-20T18:34:12.608 INFO:teuthology.task.mds_thrash.mds_thrasher.failure_group.[a, b-s-a]:kill mds.a (rank=0)
2013-06-20T18:34:12.608 DEBUG:teuthology.misc:with jobid basedir: 41286
2013-06-20T18:34:12.608 DEBUG:teuthology.orchestra.run:Running [10.214.132.11]: '/home/ubuntu/cephtest/41286/enable-coredump ceph-coverage /home/ubuntu/cephtest/41286/archive/coverage ceph mds dump --format=json'
2013-06-20T18:34:12.789 INFO:teuthology.orchestra.run.err:dumped mdsmap epoch 17
2013-06-20T18:34:12.800 DEBUG:teuthology.task.ceph.mds.a:waiting for process to exit
2013-06-20T18:34:12.809 INFO:teuthology.task.ceph.mds.a:Stopped
2013-06-20T18:34:12.809 DEBUG:teuthology.misc:with jobid basedir: 41286
2013-06-20T18:34:12.809 DEBUG:teuthology.orchestra.run:Running [10.214.132.11]: '/home/ubuntu/cephtest/41286/enable-coredump ceph-coverage /home/ubuntu/cephtest/41286/archive/coverage ceph mds dump --format=json'
2013-06-20T18:34:12.994 INFO:teuthology.orchestra.run.err:dumped mdsmap epoch 17
2013-06-20T18:34:13.005 DEBUG:teuthology.misc:with jobid basedir: 41286
2013-06-20T18:34:13.005 DEBUG:teuthology.orchestra.run:Running [10.214.132.11]: '/home/ubuntu/cephtest/41286/enable-coredump ceph-coverage /home/ubuntu/cephtest/41286/archive/coverage ceph mds dump --format=json'
2013-06-20T18:34:13.188 INFO:teuthology.orchestra.run.err:dumped mdsmap epoch 17
2013-06-20T18:34:13.199 INFO:teuthology.task.mds_thrash.mds_thrasher.failure_group.[a, b-s-a]:waiting till mds map indicates mds.a is laggy/crashed, in failed state, or mds.a is removed from mdsmap
2013-06-20T18:34:13.210 INFO:teuthology.task.workunit.client.0.err:+ expect EACCES -u 65534 link fstest_28f94f92cf9776f020b4e141e70ad403 fstest_eb61070000c32b1cf879744fc2fdbb9d
2013-06-20T18:34:13.210 INFO:teuthology.task.workunit.client.0.err:+ e=EACCES
2013-06-20T18:34:13.210 INFO:teuthology.task.workunit.client.0.err:+ shift
2013-06-20T18:34:13.211 INFO:teuthology.task.workunit.client.0.err:++ /home/ubuntu/cephtest/41286/mnt.0/client.0/tmp/tmp/../pjd-fstest-20090130-RC-open24/tests/link/../../fstest -u 65534 link fstest_28f94f92cf9776f020b4e141e70ad403 fstest_eb61070000c32b1cf879744fc2fdbb9d
2013-06-20T18:34:13.211 INFO:teuthology.task.workunit.client.0.err:++ tail -1
2013-06-20T18:34:15.199 DEBUG:teuthology.misc:with jobid basedir: 41286
2013-06-20T18:34:15.199 DEBUG:teuthology.orchestra.run:Running [10.214.132.11]: '/home/ubuntu/cephtest/41286/enable-coredump ceph-coverage /home/ubuntu/cephtest/41286/archive/coverage ceph mds dump --format=json'
2013-06-20T18:34:15.380 INFO:teuthology.orchestra.run.err:dumped mdsmap epoch 17
2013-06-20T18:34:15.391 DEBUG:teuthology.misc:with jobid basedir: 41286
2013-06-20T18:34:15.392 DEBUG:teuthology.orchestra.run:Running [10.214.132.11]: '/home/ubuntu/cephtest/41286/enable-coredump ceph-coverage /home/ubuntu/cephtest/41286/archive/coverage ceph mds dump --format=json'
2013-06-20T18:34:15.738 INFO:teuthology.orchestra.run.err:dumped mdsmap epoch 17
2013-06-20T18:34:15.749 INFO:teuthology.task.mds_thrash.mds_thrasher.failure_group.[a, b-s-a]:waiting till mds map indicates mds.a is laggy/crashed, in failed state, or mds.a is removed from mdsmap
2013-06-20T18:34:17.748 DEBUG:teuthology.misc:with jobid basedir: 41286
2013-06-20T18:34:17.749 DEBUG:teuthology.orchestra.run:Running [10.214.132.11]: '/home/ubuntu/cephtest/41286/enable-coredump ceph-coverage /home/ubuntu/cephtest/41286/archive/coverage ceph mds dump --format=json'
2013-06-20T18:34:17.933 INFO:teuthology.orchestra.run.err:dumped mdsmap epoch 17
...
2013-06-20T18:34:31.306 INFO:teuthology.task.workunit.client.0.err:+ r=EACCES
2013-06-20T18:34:31.306 INFO:teuthology.task.workunit.client.0.err:+ echo EACCES
2013-06-20T18:34:31.306 INFO:teuthology.task.workunit.client.0.err:+ egrep '^EACCES$'
2013-06-20T18:34:31.308 INFO:teuthology.task.workunit.client.0.err:+ '[' 0 -eq 0 ']'
2013-06-20T18:34:31.308 INFO:teuthology.task.workunit.client.0.err:+ echo 'ok 70'
2013-06-20T18:34:31.308 INFO:teuthology.task.workunit.client.0.err:++ expr 70 + 1
2013-06-20T18:34:31.309 INFO:teuthology.task.workunit.client.0.err:+ ntest=71
2013-06-20T18:34:31.310 INFO:teuthology.task.workunit.client.0.err:++ /home/ubuntu/cephtest/41286/mnt.0/client.0/tmp/tmp/../pjd-fstest-20090130-RC-open24/tests/link/../../fstest stat fstest_28f94f92cf9776f020b4e141e70ad403 ctime
2013-06-20T18:34:31.396 INFO:teuthology.task.workunit.client.0.err:+ ctime2=1371778471
2013-06-20T18:34:31.396 INFO:teuthology.task.workunit.client.0.err:+ test_check 1371778452 -eq 1371778471
2013-06-20T18:34:31.396 INFO:teuthology.task.workunit.client.0.err:+ '[' 1371778452 -eq 1371778471 ']'
2013-06-20T18:34:31.396 INFO:teuthology.task.workunit.client.0.err:+ echo 'not ok 71'
2013-06-20T18:34:31.397 INFO:teuthology.task.workunit.client.0.err:++ expr 71 + 1
2013-06-20T18:34:31.397 INFO:teuthology.task.workunit.client.0.err:+ ntest=72
2013-06-20T18:34:31.398 INFO:teuthology.task.workunit.client.0.err:++ /home/ubuntu/cephtest/41286/mnt.0/client.0/tmp/tmp/../pjd-fstest-20090130-RC-open24/tests/link/../../fstest stat . ctime
2013-06-20T18:34:31.401 INFO:teuthology.task.workunit.client.0.err:+ dctime2=1371778452
2013-06-20T18:34:31.402 INFO:teuthology.task.workunit.client.0.err:+ test_check 1371778452 -eq 1371778452

from original mds

2013-06-20 18:34:12.385655 7f73adc35700  1 -- 10.214.132.10:6813/18513 <== client.4129 10.214.132.11:0/19549 368 ==== client_request(client.4129:709 create #1000000014b/fstest_28f94f92cf9776f020b4e141e70ad403) v1 ==== 197+0+0 (1269283097 0 0) 0x322a280 con 0x32da420
2013-06-20 18:34:12.388626 7f73adc35700  1 -- 10.214.132.10:6813/18513 --> 10.214.132.11:0/19549 -- client_reply(???:709 = 0 Success unsafe) v1 -- ?+0 0x2e85b00 con 0x32da420
(then it restarts)

ubuntu@teuthology:/a/sage-2013-06-20_18:13:54-fs-wip-mds-testing-basic/41286

Actions #5

Updated by Sage Weil almost 11 years ago

  • Priority changed from Urgent to High

probably need to extend the replayed request message to include the timestamps we got for the inode and dir so that the mds can reuse them.

Actions #6

Updated by Greg Farnum about 10 years ago

  • Tracker changed from Bug to Fix
Actions #7

Updated by Sage Weil almost 10 years ago

  • Status changed from 12 to Resolved

commit:f81c53a716f43c8a36d6b2aea88cbebe961502d8

Actions

Also available in: Atom PDF