Bug #2947
osd: out of order reply
Status: Closed
Description
triggered by thrashing in this job:
nuke-on-error: true
overrides:
  ceph:
    fs: ext4
    log-whitelist:
    - slow request
    conf:
      osd:
        debug ms: 20
        debug osd: 20
        debug filestore: 20
      client:
        debug ms: 20
        debug objecter: 20
roles:
- - mon.a
  - osd.0
  - osd.1
  - osd.2
- - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
tasks:
- chef: null
- clock: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    timeout: 1200
- rados:
    clients:
    - client.0
    objects: 500
    op_weights:
      delete: 10
      read: 45
      write: 45
    ops: 4000
we observe (after 10-20 successes) a failure like this:
2012-08-13T17:17:26.349 INFO:teuthology.task.rados.rados.0.out:finishing write tid 2 to plana689588-300
2012-08-13T17:17:26.349 INFO:teuthology.task.rados.rados.0.out:finishing write tid 1 to plana689588-300
2012-08-13T17:17:26.349 INFO:teuthology.task.rados.rados.0.err:Error: finished tid 1 when last_acked_tid was 2
2012-08-13T17:17:26.349 INFO:teuthology.task.rados.rados.0.err:./test/osd/RadosModel.h: In function 'virtual void WriteOp::_finish(TestOp::CallbackInfo*)' thread 7f9bc37fe700 time 2012-08-13 17:17:26.342537
2012-08-13T17:17:26.350 INFO:teuthology.task.rados.rados.0.err:./test/osd/RadosModel.h: 822: FAILED assert(0)
2012-08-13T17:17:26.350 INFO:teuthology.task.rados.rados.0.err: ceph version 0.50-157-g4e7d49e (commit:4e7d49e0c66ef39593e3bf5e6684779620c4eeda)
2012-08-13T17:17:26.350 INFO:teuthology.task.rados.rados.0.err: 1: (WriteOp::_finish(TestOp::CallbackInfo*)+0x29e) [0x41385e]
2012-08-13T17:17:26.350 INFO:teuthology.task.rados.rados.0.err: 2: (write_callback(void*, void*)+0x23) [0x421a43]
2012-08-13T17:17:26.350 INFO:teuthology.task.rados.rados.0.err: 3: (librados::C_AioComplete::finish(int)+0x1e) [0x7f9bdd3b4dae]
2012-08-13T17:17:26.350 INFO:teuthology.task.rados.rados.0.err: 4: (Finisher::finisher_thread_entry()+0x218) [0x7f9bdd4114e8]
2012-08-13T17:17:26.350 INFO:teuthology.task.rados.rados.0.err: 5: (()+0x7e9a) [0x7f9bdd077e9a]
2012-08-13T17:17:26.350 INFO:teuthology.task.rados.rados.0.err: 6: (clone()+0x6d) [0x7f9bdc88f4bd]
2012-08-13T17:17:26.351 INFO:teuthology.task.rados.rados.0.err: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
from the osd, we see
ubuntu@teuthology:/a/sage-msgr-rados/14$ zcat remote/ubuntu@plana66.front.sepia.ceph.com/log/osd.2.log.gz | grep client.4117.0:313 | egrep '( ==| -->)'
2012-08-13 17:17:24.040289 7f47062b3700 1 -- 10.214.132.12:6805/6031 <== client.4117 10.214.132.10:0/1009588 444 ==== osd_op(client.4117.0:3133 plana689588-300 [write 0~56] 0.1860ffb4) v4 ==== 154+0+56 (2650586133 0 4000434292) 0x33f5000 con 0x2e13780
2012-08-13 17:17:24.084791 7f47062b3700 1 -- 10.214.132.12:6805/6031 <== client.4117 10.214.132.10:0/1009588 445 ==== osd_op(client.4117.0:3134 plana689588-300 [write 561017~484826] 0.1860ffb4) v4 ==== 154+0+484826 (681737827 0 1415563408) 0x3249d80 con 0x2e13780
2012-08-13 17:17:24.237951 7f47062b3700 1 -- 10.214.132.12:6805/6031 <== client.4117 10.214.132.10:0/1009588 446 ==== osd_op(client.4117.0:3135 plana689588-300 [write 1813902~618476] 0.1860ffb4) v4 ==== 154+0+618476 (1857590084 0 20743113) 0x3249b40 con 0x2e13780
2012-08-13 17:17:24.239559 7f47062b3700 1 -- 10.214.132.12:6805/6031 <== client.4117 10.214.132.10:0/1009588 447 ==== osd_op(client.4117.0:3136 plana689588-300 [write 2845880~601117] 0.1860ffb4) v4 ==== 154+0+601117 (2960727005 0 399445279) 0x33d7480 con 0x2e13780
2012-08-13 17:17:25.389857 7f47031ac700 1 -- 10.214.132.12:6807/6031 --> osd.1 10.214.132.12:6804/6033 -- osd_sub_op(client.4117.0:3134 0.14 1860ffb4/plana689588-300/head//0 [] v 88'109 snapset=0=[]:[] snapc=0=[]) v7 -- ?+485369 0x307f400
2012-08-13 17:17:25.390836 7f47031ac700 1 -- 10.214.132.12:6807/6031 --> osd.1 10.214.132.12:6804/6033 -- osd_sub_op(client.4117.0:3133 0.14 1860ffb4/plana689588-300/head//0 [] v 88'110 snapset=0=[]:[] snapc=0=[]) v7 -- ?+599 0x372a400
2012-08-13 17:17:25.391718 7f47029ab700 1 -- 10.214.132.12:6807/6031 --> osd.1 10.214.132.12:6804/6033 -- osd_sub_op(client.4117.0:3135 0.14 1860ffb4/plana689588-300/head//0 [] v 88'111 snapset=0=[]:[] snapc=0=[]) v7 -- ?+619019 0x34a2400
2012-08-13 17:17:25.392604 7f47029ab700 1 -- 10.214.132.12:6807/6031 --> osd.1 10.214.132.12:6804/6033 -- osd_sub_op(client.4117.0:3136 0.14 1860ffb4/plana689588-300/head//0 [] v 88'112 snapset=0=[]:[] snapc=0=[]) v7 -- ?+601660 0x34a1a00
2012-08-13 17:17:25.489379 7f4705ab2700 1 -- 10.214.132.12:6807/6031 <== osd.1 10.214.132.12:6804/6033 482 ==== osd_sub_op_reply(client.4117.0:3134 0.14 1860ffb4/plana689588-300/head//0 [] ondisk, result = 0) v1 ==== 141+0+0 (3830699219 0 0) 0x344b500 con 0x2cd5140
2012-08-13 17:17:25.489574 7f4705ab2700 1 -- 10.214.132.12:6807/6031 <== osd.1 10.214.132.12:6804/6033 483 ==== osd_sub_op_reply(client.4117.0:3133 0.14 1860ffb4/plana689588-300/head//0 [] ondisk, result = 0) v1 ==== 141+0+0 (1714506867 0 0) 0x344b280 con 0x2cd5140
2012-08-13 17:17:25.490041 7f4705ab2700 1 -- 10.214.132.12:6807/6031 <== osd.1 10.214.132.12:6804/6033 484 ==== osd_sub_op_reply(client.4117.0:3135 0.14 1860ffb4/plana689588-300/head//0 [] ondisk, result = 0) v1 ==== 141+0+0 (1725329692 0 0) 0x344b000 con 0x2cd5140
2012-08-13 17:17:25.490392 7f4705ab2700 1 -- 10.214.132.12:6807/6031 <== osd.1 10.214.132.12:6804/6033 485 ==== osd_sub_op_reply(client.4117.0:3136 0.14 1860ffb4/plana689588-300/head//0 [] ondisk, result = 0) v1 ==== 141+0+0 (4081841846 0 0) 0x2fa2c80 con 0x2cd5140
note that :3134 and :3133 switch order.
the reason appears to be:
- 3133 put on waiting_for_missing queue, recovery ensues
- 3134 put on op_waiters, waiting for a map (the client had a newer map epoch for this request)
- pg does advance_map
- on_change requeues waiting_for_missing
- pg does activate_map
- take_waiters requeues op_waiters (waiting for map)
-> 3134 now precedes 3133 in the op_queue.
Updated by Tamilarasi muthamizhan over 11 years ago
- Status changed from 7 to In Progress
- Source changed from Development to Q/A
Updated by Sage Weil over 11 years ago
ubuntu@teuthology:/a/teuthology-2012-08-20_02:00:02-regression-testing-testing-basic/4930$ cat config.yaml
kernel: &id001
  kdb: true
  sha1: 1fe5e9932156f6122c3b1ff6ba7541c27c86718c
nuke-on-error: true
overrides:
  ceph:
    fs: ext4
    log-whitelist:
    - slow request
    sha1: 77745f9441c499871e7de0782fc7db89fcec0c53
  workunit:
    sha1: 77745f9441c499871e7de0782fc7db89fcec0c53
roles:
- - mon.a
  - osd.0
  - osd.1
  - osd.2
- - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
targets:
  ubuntu@plana78.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCl9XSQkT+TSrTTaMDs/IFmQ3nIyrLNiqLFEahRBSLwIMbVppF2EG9XVT1AkRbul9DbY6BTSVGd/AO205di6kFRv9qpHFPfMqUt3XUATEyBZ7drpwliNlM8taFxb43hPTVwLstqM+0wy1arlan6onug+S45ORvtWaKfo4w6e6DcQDKuH4N718mNzMk/ePkr+UbmeGLSZnxKOvKSfA4vFWzEpEJEQlpqSbCQzAl06qTxE9rMCAyQkSLPrr6UmEG1FY0ZX3M1vZYzV5MCVR1J2xofHvsatfbwZjgzJI2136aC8xOBLN4TT0URQ5pYWxAixEsoDRej2u3eYLWRFa9EyIvd
  ubuntu@plana80.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCjbUncT43IZIcpSlFXnWDaagYNdnaLTfgsN8TKi3J2QYOB1BlMpaSRBByXr5MW2ZGuWrB77USjJLdRW/feuwtAjDbwc7y5woxcxn1u9eqPh4RyQWqyCmKWvi3GpdM4/4vh+L+7X5rNzJPStTjiPXCxHK+stPGCk9pg+J+KKg/GlJ1Lwx1DeWetdForGqEVJrTSSWj8RM+Nyw/V+c+t2d6gW1SzjY2NvaDdVHfduM+8X9F9aTf3KLitnIvNQWXzfqoEDNPSwrrO2uvTDyqpFkW6HLpZDnWm+io8Y9P7XMrEhWCTlQX2SHeQv/BhaA1btYAw7qfLC9n8RbeXLlQnc5yn
  ubuntu@plana81.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCzwyoEUguCYhu0TNdXrk7aWVUvePF6F6coBynLZ73Y/7eqTKAzxnNCiwUQx3kGjK33kliZDk7g/x4FdsiwzknDuGGCXP1pZyIGVtU5wNJ6ZM29XyH2SZvyU0MNfmoMzygxHR53TGcsK3hzwSbaW1woEpJmoqgIFQJ6BJr3nc2foKl79wBQdCNumqJ7sbh26xYVI29vYsJHUTYAdmyE4QrLaLOkZKU5Q/OvUnbKQbURcs7ArxFooObu9h0ENRPK4MKuxBFfgpAYTr/rMeWfVQxvSWsuOMpOdzLaNLt5UBYVVU+wxIFdDwcHb+2Er0rEV49W9xUD6JGXnaFjrxDDocfj
tasks:
- internal.lock_machines: 3
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    timeout: 1200
- rados:
    clients:
    - client.0
    objects: 500
    op_weights:
      delete: 10
      read: 45
      write: 45
    ops: 4000
Updated by Tamilarasi muthamizhan over 11 years ago
More Logs: ubuntu@teuthology:/a/teuthology-2012-08-19_02:00:05-regression-testing-testing-basic/4288
Updated by Sage Weil over 11 years ago
- Status changed from In Progress to Resolved
Updated by Sage Weil over 11 years ago
- Status changed from Resolved to In Progress
ooof, the saga continues: ubuntu@teuthology:/a/sage-gfoo2/5974
Updated by Sage Weil over 11 years ago
- Status changed from In Progress to Resolved
Updated by Sage Weil over 11 years ago
- Target version changed from 83 to v0.52a
Updated by Sage Weil over 11 years ago
- Status changed from Resolved to In Progress
- Assignee set to Sage Weil
nooo!
kernel: &id001
  kdb: true
  sha1: 995fc068ddf675260098c60591989bf2ee184338
nuke-on-error: true
overrides:
  ceph:
    conf:
      global:
        ms inject socket failures: 5000
    coverage: true
    fs: xfs
    log-whitelist:
    - slow request
    sha1: aa91cf81af548463ae583872e319316871771e94
  workunit:
    sha1: aa91cf81af548463ae583872e319316871771e94
roles:
- - mon.a
  - osd.0
  - osd.1
  - osd.2
- - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
targets:
  ubuntu@plana76.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCoNlhzO4H0dC4/dcXgYUv2BA0azBZ8DQhoLeW3PWTI6bkmNKB5dR40BVtf7XcTwoLYsbKYKKf3hwQOo53TQNCOpiPjk2DyAIWUbth6JYraLWcDoIjsdGOjHi3hdjcCkIS3SzG/ZTgK8UwOu8fU2wdcuLNIV02jvU1t5n46Vw2Q1I8h4Am38x7+91UF41/E/ibHHxjFDbZMiqB75IWBaQumQzMmWkoH6z2xHQyWIxLUoHemxXGDCW2Mv4vaaHPsCfWuzbLLX+jhQRbOapWbaalM0/B2V0tieQGrunTlr9Pmve3+1Rga4RwB4cTYpmxYj+MvZqOSktky6Fpd2Dta+dr/
  ubuntu@plana78.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCl9XSQkT+TSrTTaMDs/IFmQ3nIyrLNiqLFEahRBSLwIMbVppF2EG9XVT1AkRbul9DbY6BTSVGd/AO205di6kFRv9qpHFPfMqUt3XUATEyBZ7drpwliNlM8taFxb43hPTVwLstqM+0wy1arlan6onug+S45ORvtWaKfo4w6e6DcQDKuH4N718mNzMk/ePkr+UbmeGLSZnxKOvKSfA4vFWzEpEJEQlpqSbCQzAl06qTxE9rMCAyQkSLPrr6UmEG1FY0ZX3M1vZYzV5MCVR1J2xofHvsatfbwZjgzJI2136aC8xOBLN4TT0URQ5pYWxAixEsoDRej2u3eYLWRFa9EyIvd
  ubuntu@plana79.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCx0nVMVy140vXGRPqjqx63mfytPlqmoN7YoJ3Si0ti1XtvJTftB9EdQGwqj/tsY95DeUNBtAQs5TBsiLr1E/JHlKt7EXwyWsJNB2ntvkPJOMxoounypjkVgfv91EWmERQGFsalDmIYjSuSCG28g5Vaz8il9D7fH/ykKZ38EQChhPXIpB2bieJOr2Xm6llde1q2rUEltV17EmiQvu9eUuxb9y9h057k6GSqpsTViPADlT7CG7W60bqWs8d7TvV4rvPhUy6oyUp1ar8116NMSFUiaTgVTidDiQ3xyZeguwJAbzh86MQdHVhSi89W/vjvoEP1opjZP3RArB4BoNwzz/Dh
tasks:
- internal.lock_machines: 3
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    timeout: 1200
- rados:
    clients:
    - client.0
    objects: 500
    op_weights:
      delete: 10
      read: 45
      write: 45
    ops: 4000
ubuntu@teuthology:/a/teuthology-2012-08-25_19:00:04-regression-master-testing-gcov/8842
Updated by Sage Weil over 11 years ago
- Status changed from 7 to Resolved
Updated by Tamilarasi muthamizhan over 11 years ago
- Status changed from Resolved to In Progress
- Target version deleted (v0.52a)
Recent logs: ubuntu@teuthology:/a/teuthology-2012-10-20_19:00:11-regression-master-testing-gcov/4810
2012-10-20T20:31:19.803 INFO:teuthology.task.rados.rados.0.err:Error: finished tid 1 when last_acked_tid was 3
2012-10-20T20:31:19.806 INFO:teuthology.task.rados.rados.0.out:Writing plana3710565-450 from 3100727 to 3596741 tid 4 ranges are [0~57,796741~740629,1992905~604660,3100727~496014]
2012-10-20T20:31:19.815 INFO:teuthology.task.rados.rados.0.err:./test/osd/RadosModel.h: In function 'virtual void WriteOp::_finish(TestOp::CallbackInfo*)' thread 7f0be97fa700 time 2012-10-20 20:31:19.809501
2012-10-20T20:31:19.815 INFO:teuthology.task.rados.rados.0.err:./test/osd/RadosModel.h: 823: FAILED assert(0)
2012-10-20T20:31:19.816 INFO:teuthology.task.rados.rados.0.err: ceph version 0.53-396-ge6cbcb2 (commit:e6cbcb240073ddb39c7434d6133c8c6722c71e1c)
2012-10-20T20:31:19.816 INFO:teuthology.task.rados.rados.0.err: 1: (WriteOp::_finish(TestOp::CallbackInfo*)+0x42d) [0x418b8d]
2012-10-20T20:31:19.816 INFO:teuthology.task.rados.rados.0.err: 2: (write_callback(void*, void*)+0x29) [0x42d1a9]
2012-10-20T20:31:19.816 INFO:teuthology.task.rados.rados.0.err: 3: (librados::C_AioComplete::finish(int)+0x20) [0x7f0bf3aee7a0]
2012-10-20T20:31:19.816 INFO:teuthology.task.rados.rados.0.err: 4: (Finisher::finisher_thread_entry()+0x340) [0x7f0bf3b71d50]
2012-10-20T20:31:19.816 INFO:teuthology.task.rados.rados.0.err: 5: (Finisher::FinisherThread::entry()+0x15) [0x7f0bf3ad7975]
2012-10-20T20:31:19.816 INFO:teuthology.task.rados.rados.0.err: 6: (Thread::_entry_func(void*)+0x12) [0x7f0bf3cf2fc2]
2012-10-20T20:31:19.816 INFO:teuthology.task.rados.rados.0.err: 7: (()+0x7e9a) [0x7f0bf378fe9a]
2012-10-20T20:31:19.817 INFO:teuthology.task.rados.rados.0.err: 8: (clone()+0x6d) [0x7f0bf2fa74bd]
2012-10-20T20:31:19.827 INFO:teuthology.task.rados.rados.0.err: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Updated by Sage Weil over 11 years ago
- Status changed from In Progress to Can't reproduce
Updated by Tamilarasi muthamizhan about 11 years ago
- Status changed from Can't reproduce to In Progress
recent logs: ubuntu@teuthology:/a/teuthology-2013-02-10_01:00:02-regression-master-testing-gcov/4134
2013-02-10T06:29:43.312 INFO:teuthology.task.rados.rados.0.err:Error: finished tid 1 when last_acked_tid was 2
2013-02-10T06:29:43.314 INFO:teuthology.task.rados.rados.0.err:./test/osd/RadosModel.h: In function 'virtual void WriteOp::_finish(TestOp::CallbackInfo*)' thread 7fa357fff700 time 2013-02-10 06:29:38.392361
2013-02-10T06:29:43.314 INFO:teuthology.task.rados.rados.0.err:./test/osd/RadosModel.h: 823: FAILED assert(0)
2013-02-10T06:29:43.314 INFO:teuthology.task.rados.rados.0.err: ceph version 0.56-707-gabc80ff (abc80ffc5b1aab3915c049701ab85c57fe93d550)
2013-02-10T06:29:43.314 INFO:teuthology.task.rados.rados.0.err: 1: (WriteOp::_finish(TestOp::CallbackInfo*)+0x42b) [0x41843b]
2013-02-10T06:29:43.315 INFO:teuthology.task.rados.rados.0.err: 2: (write_callback(void*, void*)+0x29) [0x42cd69]
2013-02-10T06:29:43.315 INFO:teuthology.task.rados.rados.0.err: 3: (librados::C_AioComplete::finish(int)+0x20) [0x7fa36a19f230]
2013-02-10T06:29:43.315 INFO:teuthology.task.rados.rados.0.err: 4: (Finisher::finisher_thread_entry()+0x298) [0x7fa36a22fd38]
2013-02-10T06:29:43.315 INFO:teuthology.task.rados.rados.0.err: 5: (Finisher::FinisherThread::entry()+0x15) [0x7fa36a188935]
2013-02-10T06:29:43.315 INFO:teuthology.task.rados.rados.0.err: 6: (Thread::_entry_func(void*)+0x12) [0x7fa36a3c0072]
2013-02-10T06:29:43.315 INFO:teuthology.task.rados.rados.0.err: 7: (()+0x7e9a) [0x7fa369e2ae9a]
2013-02-10T06:29:43.315 INFO:teuthology.task.rados.rados.0.err: 8: (clone()+0x6d) [0x7fa3696424bd]
2013-02-10T06:29:43.316 INFO:teuthology.task.rados.rados.0.err: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2013-02-10T06:29:43.316 INFO:teuthology.task.rados.rados.0.err:terminate called after throwing an instance of 'ceph::FailedAssertion'
2013-02-10T06:29:43.401 ERROR:teuthology.run_tasks:Manager failed: <contextlib.GeneratorContextManager object at 0x2780650>
Traceback (most recent call last):
  File "/var/lib/teuthworker/teuthology-master/teuthology/run_tasks.py", line 45, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/var/lib/teuthworker/teuthology-master/teuthology/task/rados.py", line 99, in task
    run.wait(tests.itervalues())
  File "/var/lib/teuthworker/teuthology-master/teuthology/orchestra/run.py", line 273, in wait
    proc.exitstatus.get()
  File "/var/lib/teuthworker/teuthology-master/virtualenv/local/lib/python2.7/site-packages/gevent/event.py", line 223, in get
    raise self._exception
CommandCrashedError: Command crashed: 'CEPH_CLIENT_ID=0 CEPH_CONF=/tmp/cephtest/ceph.conf LD_LIBRARY_PATH=/tmp/cephtest/binary/usr/local/lib /tmp/cephtest/enable-coredump /tmp/cephtest/binary/usr/local/bin/ceph-coverage /tmp/cephtest/archive/coverage /tmp/cephtest/binary/usr/local/bin/testrados --op read 100 --op write 100 --op delete 50 --op snap_create 50 --op snap_remove 50 --op rollback 50 --op setattr 0 --op rmattr 0 --op watch 0 --max-ops 4000 --objects 500 --max-in-flight 16 --size 4000000 --min-stride-size 400000 --max-stride-size 800000 --max-seconds 0'
2013-02-10T06:29:43.432 DEBUG:teuthology.run_tasks:Unwinding manager <contextlib.GeneratorContextManager object at 0x2773e10>
ubuntu@teuthology:/a/teuthology-2013-02-10_01:00:02-regression-master-testing-gcov/4134$ cat config.yaml
kernel: &id001
  kdb: true
  sha1: 012d5bda1c0f229494c67098d00edfa24c531ea5
nuke-on-error: true
overrides:
  ceph:
    conf:
      global:
        ms inject socket failures: 5000
      osd:
        osd op thread timeout: 60
    coverage: true
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: abc80ffc5b1aab3915c049701ab85c57fe93d550
  s3tests:
    branch: master
  workunit:
    sha1: abc80ffc5b1aab3915c049701ab85c57fe93d550
roles:
- - mon.a
  - mon.b
  - osd.0
  - osd.1
  - osd.2
- - mon.c
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
targets:
  ubuntu@plana63.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDDy2BKPe+fe5jK0ziU8aKM0DzODSTaKWecQRwLjnZLbjDvTyHm8x8xX/JCts3bFfrc2ozFz7ILBIWU96JRZiFF2TtFZjtf1H19kyvR8PWCxiZ/lld+C7B6U8iiPSNiSlgo7mwkpk1JoSpHe4rK/Z7WQRWBMsCC7XJETu6rRX3i0ZYaKh8BoWWhpsBs1quSNxRXNUqJ6OKnDbB5Vuan1TK9b49RXmibx+oapXm8V0sHEVLYa+NTUs+wAEHAnjFgRe75Cik/rmgeE0m2Cff1rp9tFhEEDwZ5PUdnscOTY78BxImMRdkbZ8lJXOGcOOsD3Dj1jOr4pVrgxZqUdtWfJGkj
  ubuntu@plana68.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCjq/dBds5eofgD5l4r9AcpVnq4tnZrPYUlDPcgRgzUSM36rUAOA9IPzjamtlb+kdmo5TkWve/QgqDt/6SI07L0EeULiRhPnYjsDsDvRINoL/7oUd6dqhsC+5OMrnuscOIQUVM40TR/J4F2pwUWSM5YKsc6VBYkmh+2JTvI9ZiwXsA0KY0a6qa5ScDDZtfGapo166cm/BkqmGPPudYqCrXbCqKLaXjSizU3e02RlFLwF4xw0wKCKQ5T/mh8mqUOEWnZZTVNx0Xrkit0q247KKc4sNZYX4sQVRx8MMedOr33VEhmyuOY8TgHLFp/aB0HzbatYYJb0HATxbjwJWFjGBIX
  ubuntu@plana74.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCbl3n0OItF2T7uE/9SNA9YV+pq3nFmmr3EBjf2aMo7Bkyy6e9MC3kaK64vzlZxppbpxEhbrgWo7GsVHxphZqwtMt5Ek3RDbLUpQ8dPckbhOZWBhoZe9S8uCynDIZaF3Yq/b1//OBufSdM0TRYK4Vf4rtQnpyWPEn48HaairNhQvKCxY6XUImZc7Lh4CIaMkT0yyZkBlksJeAkTs4UgxhkSoSc9IAlAxwwPK5JpRL7GOowT+hsqYCVJM5+NVqIOGS66Vam/m9xHqecQtfYIEFoVFZiF/xi+b2kstxJzD39UTNP/XB5J2CtJOujW9J1jsdPKSz1L9XoDIkAFHzQjdhyx
tasks:
- internal.lock_machines:
  - 3
  - plana
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    timeout: 1200
- rados:
    clients:
    - client.0
    objects: 500
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
ubuntu@teuthology:/a/teuthology-2013-02-10_01:00:02-regression-master-testing-gcov/4134$ cat summary.yaml
ceph-sha1: abc80ffc5b1aab3915c049701ab85c57fe93d550
client.0-kernel-sha1: 012d5bda1c0f229494c67098d00edfa24c531ea5
description: collection:rados-thrash clusters:6-osd-3-machine.yaml fs:btrfs.yaml msgr-failures:few.yaml thrashers:default.yaml workloads:snaps-many-objects.yaml
duration: 1533.4427070617676
failure_reason: 'Command crashed: ''CEPH_CLIENT_ID=0 CEPH_CONF=/tmp/cephtest/ceph.conf LD_LIBRARY_PATH=/tmp/cephtest/binary/usr/local/lib /tmp/cephtest/enable-coredump /tmp/cephtest/binary/usr/local/bin/ceph-coverage /tmp/cephtest/archive/coverage /tmp/cephtest/binary/usr/local/bin/testrados --op read 100 --op write 100 --op delete 50 --op snap_create 50 --op snap_remove 50 --op rollback 50 --op setattr 0 --op rmattr 0 --op watch 0 --max-ops 4000 --objects 500 --max-in-flight 16 --size 4000000 --min-stride-size 400000 --max-stride-size 800000 --max-seconds 0'''
flavor: gcov
mon.a-kernel-sha1: 012d5bda1c0f229494c67098d00edfa24c531ea5
mon.c-kernel-sha1: 012d5bda1c0f229494c67098d00edfa24c531ea5
owner: scheduled_teuthology@teuthology
success: false
Updated by Tamilarasi muthamizhan about 11 years ago
- Priority changed from High to Urgent
Updated by Sage Weil about 11 years ago
woot, reproduced with logs: ubuntu@teuthology:/a/sage-s1/1886
Updated by Sage Weil about 11 years ago
a sequence of messages arrives, all tagged with epochs greater than ours, and is put on waiting_for_map:
- op1 for pg 0.f
- notify for 0.f (which will trigger its creation)
- op2 for pg 0.f
- op3 for pg 0.f
- op4 for pg 0.f
we get our map, and then:
- op1 is put on waiting_for_pg
- notify creates 0.f, puts op1 at end of finished queue
- op2 is handled
- op3 is handled
- op4 is handled
- op1 is handled
Updated by Sage Weil about 11 years ago
- Status changed from 7 to Resolved
56c5a07708d52de1699585c9560cff8b4e993d0a
I've been hammering on this test for a while; it's turning up some other issues, but this one seems fixed.