Project

General

Profile

Bug #19518

log entry does not include per-op rvals?

Added by Sage Weil over 1 year ago. Updated over 1 year ago.

Status:
Verified
Priority:
High
Assignee:
-
Category:
Correctness/Safety
Target version:
-
Start date:
04/06/2017
Due date:
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
OSD
Pull request ID:

Description

2017-04-05T21:21:51.605 INFO:tasks.workunit.client.0.smithi187.stdout:    api_c_read_operations: [ RUN      ] CReadOpsTest.Omap
2017-04-05T21:21:51.605 INFO:tasks.workunit.client.0.smithi187.stdout:    api_c_read_operations: /build/ceph-12.0.0-2308-geaf042e/src/test/librados/c_read_operations.cc:541: Failure
2017-04-05T21:21:51.606 INFO:tasks.workunit.client.0.smithi187.stdout:    api_c_read_operations:       Expected: -125
2017-04-05T21:21:51.606 INFO:tasks.workunit.client.0.smithi187.stdout:    api_c_read_operations: To be equal to: r_vals
2017-04-05T21:21:51.606 INFO:tasks.workunit.client.0.smithi187.stdout:    api_c_read_operations:       Which is: 0
2017-04-05T21:21:51.606 INFO:tasks.workunit.client.0.smithi187.stdout:    api_c_read_operations: [  FAILED  ] CReadOpsTest.Omap (11 ms)

/a/sage-2017-04-05_18:31:27-rados-wip-sage-testing---basic-smithi/990710

if there is a retry, we don't fill in the per-op rvals.. just the final request rval, as indicated by the assert in the test.

History

#1 Updated by Sage Weil over 1 year ago

teuthology:990710  01:52 PM $ grep -a client.4150.0:263 remote/*/log/*osd.0*
2017-04-05 21:21:50.796704 7f4fc582a700  1 -- 172.21.15.187:6801/193008 <== client.4150 172.21.15.187:0/322241405 83 ==== osd_op(client.4150.0:263 13.3 13.4a5ae043 (undecoded) ondisk+write+known_if_redirected e21) v8 ==== 264+0+42 (3174965691 0 495651469) 0x560d843b9a80 con 0x560d844a8fc0
2017-04-05 21:21:50.796712 7f4fc582a700 15 osd.0 21 enqueue_op 0x560d84722d00 prio 63 cost 42 latency 0.000021 epoch 21 osd_op(client.4150.0:263 13.3 13.4a5ae043 (undecoded) ondisk+write+known_if_redirected e21) v8
2017-04-05 21:21:50.796730 7f4fd8b7d700 10 osd.0 21 dequeue_op 0x560d84722d00 prio 63 cost 42 latency 0.000039 osd_op(client.4150.0:263 13.3 13.4a5ae043 (undecoded) ondisk+write+known_if_redirected e21) v8 pg pg[13.3( v 21'6 (0'0,21'6] local-les=19 n=1 ec=18 les/c/f 19/19/0 18/18/18) [0,2] r=0 lpr=18 crt=21'6 lcod 21'5 mlcod 21'5 active+clean]
2017-04-05 21:21:50.796753 7f4fd8b7d700 20 osd.0 pg_epoch: 21 pg[13.3( v 21'6 (0'0,21'6] local-les=19 n=1 ec=18 les/c/f 19/19/0 18/18/18) [0,2] r=0 lpr=18 crt=21'6 lcod 21'5 mlcod 21'5 active+clean] do_op: op osd_op(client.4150.0:263 13.3 13:c2075a52:test-rados-api-smithi187-195332-13::testobj:head [omap-cmp,omap-rm-keys] snapc 0=[] ondisk+write+known_if_redirected e21) v8
2017-04-05 21:21:50.796769 7f4fd8b7d700 10 osd.0 pg_epoch: 21 pg[13.3( v 21'6 (0'0,21'6] local-les=19 n=1 ec=18 les/c/f 19/19/0 18/18/18) [0,2] r=0 lpr=18 crt=21'6 lcod 21'5 mlcod 21'5 active+clean] do_op osd_op(client.4150.0:263 13.3 13:c2075a52:test-rados-api-smithi187-195332-13::testobj:head [omap-cmp,omap-rm-keys] snapc 0=[] ondisk+write+known_if_redirected e21) v8 may_write may_read -> write-ordered flags ondisk+write+known_if_redirected
2017-04-05 21:21:50.796883 7f4fd8b7d700 10 osd.0 pg_epoch: 21 pg[13.3( v 21'6 (0'0,21'6] local-les=19 n=1 ec=18 les/c/f 19/19/0 18/18/18) [0,2] r=0 lpr=18 crt=21'6 lcod 21'5 mlcod 21'5 active+clean] submit_log_entries 21'7 (0'0) error    13:c2075a52:test-rados-api-smithi187-195332-13::testobj:head by client.4150.0:263 0.000000 -125
2017-04-05 21:21:50.796896 7f4fd8b7d700 10 osd.0 pg_epoch: 21 pg[13.3( v 21'6 (0'0,21'6] local-les=19 n=1 ec=18 les/c/f 19/19/0 18/18/18) [0,2] r=0 lpr=18 crt=21'6 lcod 21'5 mlcod 21'5 active+clean] merge_new_log_entries 21'7 (0'0) error    13:c2075a52:test-rados-api-smithi187-195332-13::testobj:head by client.4150.0:263 0.000000 -125
2017-04-05 21:21:50.796902 7f4fd8b7d700 20 update missing, append 21'7 (0'0) error    13:c2075a52:test-rados-api-smithi187-195332-13::testobj:head by client.4150.0:263 0.000000 -125
2017-04-05 21:21:50.796927 7f4fd8b7d700  1 -- 172.21.15.187:6802/193008 --> 172.21.15.187:6810/193010 -- pg_update_log_missing(13.3 epoch 21 rep_tid 90 entries 21'7 (0'0) error    13:c2075a52:test-rados-api-smithi187-195332-13::testobj:head by client.4150.0:263 0.000000 -125) v1 -- ?+0 0x560d848bf180 con 0x560d841c1200
2017-04-05 21:21:50.797318 7f4fd6378700 10 osd.0 pg_epoch: 21 pg[13.3( v 21'7 (0'0,21'7] local-les=19 n=1 ec=18 les/c/f 19/19/0 18/18/18) [0,2] r=0 lpr=18 crt=21'6 lcod 21'6 mlcod 21'5 active+clean]  sending commit on osd_op(client.4150.0:263 13.3 13:c2075a52:test-rados-api-smithi187-195332-13::testobj:head [omap-cmp,omap-rm-keys] snapc 0=[] ondisk+write+known_if_redirected e21) v8 0x560d84b723c0
2017-04-05 21:21:50.797791 7f4fc6a3c700  1 -- 172.21.15.187:6801/193008 <== client.4150 172.21.15.187:0/322241405 1 ==== osd_op(client.4150.0:263 13.3 13.4a5ae043 (undecoded) ondisk+retry+write+known_if_redirected e21) v8 ==== 264+0+42 (3311517138 0 495651469) 0x560d8484e340 con 0x560d8450bf80
2017-04-05 21:21:50.797801 7f4fc6a3c700 15 osd.0 21 enqueue_op 0x560d8403d780 prio 63 cost 42 latency 0.000025 epoch 21 osd_op(client.4150.0:263 13.3 13.4a5ae043 (undecoded) ondisk+retry+write+known_if_redirected e21) v8
2017-04-05 21:21:50.797822 7f4fd8b7d700 10 osd.0 21 dequeue_op 0x560d8403d780 prio 63 cost 42 latency 0.000046 osd_op(client.4150.0:263 13.3 13.4a5ae043 (undecoded) ondisk+retry+write+known_if_redirected e21) v8 pg pg[13.3( v 21'7 (0'0,21'7] local-les=19 n=1 ec=18 les/c/f 19/19/0 18/18/18) [0,2] r=0 lpr=18 crt=21'6 lcod 21'6 mlcod 21'5 active+clean]
2017-04-05 21:21:50.797844 7f4fd8b7d700 20 osd.0 pg_epoch: 21 pg[13.3( v 21'7 (0'0,21'7] local-les=19 n=1 ec=18 les/c/f 19/19/0 18/18/18) [0,2] r=0 lpr=18 crt=21'6 lcod 21'6 mlcod 21'5 active+clean] do_op: op osd_op(client.4150.0:263 13.3 13:c2075a52:test-rados-api-smithi187-195332-13::testobj:head [omap-cmp,omap-rm-keys] snapc 0=[] RETRY=1 ondisk+retry+write+known_if_redirected e21) v8
2017-04-05 21:21:50.797860 7f4fd8b7d700 10 osd.0 pg_epoch: 21 pg[13.3( v 21'7 (0'0,21'7] local-les=19 n=1 ec=18 les/c/f 19/19/0 18/18/18) [0,2] r=0 lpr=18 crt=21'6 lcod 21'6 mlcod 21'5 active+clean] do_op osd_op(client.4150.0:263 13.3 13:c2075a52:test-rados-api-smithi187-195332-13::testobj:head [omap-cmp,omap-rm-keys] snapc 0=[] RETRY=1 ondisk+retry+write+known_if_redirected e21) v8 may_write may_read -> write-ordered flags ondisk+retry+write+known_if_redirected
2017-04-05 21:21:50.797868 7f4fd8b7d700  3 osd.0 pg_epoch: 21 pg[13.3( v 21'7 (0'0,21'7] local-les=19 n=1 ec=18 les/c/f 19/19/0 18/18/18) [0,2] r=0 lpr=18 crt=21'6 lcod 21'6 mlcod 21'5 active+clean] do_op dup client.4150.0:263 version 21'7
2017-04-05 21:21:50.803060 7f4fd6378700 20 osd.0 pg_epoch: 21 pg[13.3( v 21'8 (0'0,21'8] local-les=19 n=0 ec=18 les/c/f 19/19/0 18/18/18) [0,2] r=0 lpr=18 luod=21'7 lua=21'7 crt=21'8 lcod 21'6 mlcod 21'5 active+clean] rollforward: entry=21'7 (0'0) error    13:c2075a52:test-rados-api-smithi187-195332-13::testobj:head by client.4150.0:263 0.000000 -125

it was a resent client request.. that's why the per-op rval wasn't filled in.

#2 Updated by Sage Weil over 1 year ago

/a/sage-2017-06-05_18:36:01-rados-wip-sage-testing2-distro-basic-smithi/1261843

2017-06-05T20:00:46.243 INFO:tasks.workunit.client.0.smithi038.stdout:    api_c_read_operations: [ RUN      ] CReadOpsTest.Omap
2017-06-05T20:00:46.243 INFO:tasks.workunit.client.0.smithi038.stdout:    api_c_read_operations: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.0.2-2249-gd2d44a7/rpm/el7/BUILD/ceph-12.0.2-2249-gd2d44a7/src/test/librados/c_read_operations.cc:609: Failure
2017-06-05T20:00:46.243 INFO:tasks.workunit.client.0.smithi038.stdout:    api_c_read_operations:       Expected: -125
2017-06-05T20:00:46.243 INFO:tasks.workunit.client.0.smithi038.stdout:    api_c_read_operations: To be equal to: r_vals
2017-06-05T20:00:46.243 INFO:tasks.workunit.client.0.smithi038.stdout:    api_c_read_operations:       Which is: 0
2017-06-05T20:00:46.243 INFO:tasks.workunit.client.0.smithi038.stdout:    api_c_read_operations: [  FAILED  ] CReadOpsTest.Omap (14232 ms)

#3 Updated by Greg Farnum over 1 year ago

  • Project changed from Ceph to RADOS
  • Category set to Correctness/Safety
  • Priority changed from Urgent to High
  • Component(RADOS) OSD added

Have we ever filled in the per-op rvalues on retry? That sounds distressingly like returning read data on a write op...which makes me wonder why we thought it was a good idea to begin with. :/

#4 Updated by Sage Weil over 1 year ago

https://github.com/ceph/ceph/pull/16196 disables the assertion until we fix this bug.

Also available in: Atom PDF