Bug #40820
standalone/scrub/osd-scrub-test.sh +3 day failed assert
Status: Closed
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
2019-07-18T04:53:41.527 INFO:tasks.workunit.client.0.smithi017.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-test.sh:119: check_dump_scrubs:CEPH_ARGS=
2019-07-18T04:53:41.527 INFO:tasks.workunit.client.0.smithi017.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-test.sh:119: check_dump_scrubs:ceph --admin-daemon /tmp/ceph-asok.298619/ceph-osd.1.asok dump_scrubs
2019-07-18T04:53:41.604 INFO:tasks.workunit.client.0.smithi017.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-test.sh:119: check_dump_scrubs: DS='[
2019-07-18T04:53:41.604 INFO:tasks.workunit.client.0.smithi017.stderr: {
2019-07-18T04:53:41.604 INFO:tasks.workunit.client.0.smithi017.stderr: "pgid": "1.0",
2019-07-18T04:53:41.605 INFO:tasks.workunit.client.0.smithi017.stderr: "sched_time": "2019-07-20T04:53:29.399279+0000",
2019-07-18T04:53:41.605 INFO:tasks.workunit.client.0.smithi017.stderr: "deadline": "2019-08-01T04:53:29.399279+0000",
2019-07-18T04:53:41.605 INFO:tasks.workunit.client.0.smithi017.stderr: "forced": false
2019-07-18T04:53:41.605 INFO:tasks.workunit.client.0.smithi017.stderr: }
2019-07-18T04:53:41.605 INFO:tasks.workunit.client.0.smithi017.stderr:]'
2019-07-18T04:53:41.609 INFO:tasks.workunit.client.0.smithi017.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-test.sh:121: check_dump_scrubs:echo '[' '{' '"pgid":' '"1.0",' '"sched_time":' '"2019-07-20T04:53:29.399279+0000",' '"deadline":' '"2019-08-01T04:53:29.399279+0000",' '"forced":' false '}' ']'
2019-07-18T04:53:41.610 INFO:tasks.workunit.client.0.smithi017.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-test.sh:121: check_dump_scrubs:jq '.[0].sched_time'
2019-07-18T04:53:41.610 INFO:tasks.workunit.client.0.smithi017.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-test.sh:121: check_dump_scrubs: eval 'SCHED_TIME="2019-07-20T04:53:29.399279+0000"'
2019-07-18T04:53:41.610 INFO:tasks.workunit.client.0.smithi017.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-test.sh:121: check_dump_scrubs:SCHED_TIME=2019-07-20T04:53:29.399279+0000
2019-07-18T04:53:41.610 INFO:tasks.workunit.client.0.smithi017.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-test.sh:122: check_dump_scrubs:echo 2019-07-20T04:53:29.399279+0000
2019-07-18T04:53:41.610 INFO:tasks.workunit.client.0.smithi017.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-test.sh:122: check_dump_scrubs:sed 's/\([0-9]*-[0-9]*-[0-9]*\).*/\1/'
2019-07-18T04:53:41.612 INFO:tasks.workunit.client.0.smithi017.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-test.sh:122: check_dump_scrubs:date +%Y-%m-%d -d 'now + 3 days'
2019-07-18T04:53:41.612 INFO:tasks.workunit.client.0.smithi017.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-test.sh:122: check_dump_scrubs: test 2019-07-20 = 2019-07-21
2019-07-18T04:53:41.612 INFO:tasks.workunit.client.0.smithi017.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-test.sh:122: check_dump_scrubs: return 1
/a/sage-2019-07-18_02:42:34-rados-wip-sage-testing-2019-07-17-1729-distro-basic-smithi/4125985
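For readers following the trace: check_dump_scrubs reads .[0].sched_time from dump_scrubs, strips it down to its YYYY-MM-DD prefix, and compares that against now + 3 days (the 259200-second scrub_min_interval just set on the pool). A minimal sketch of that comparison with this run's values hard-coded (variable names are illustrative, not necessarily the script's own):

```shell
# Value taken from the failing run above.
sched_time="2019-07-20T04:53:29.399279+0000"   # .[0].sched_time from dump_scrubs

# Keep only the YYYY-MM-DD prefix, exactly as the sed in the trace does.
sched_date=$(echo "$sched_time" | sed 's/\([0-9]*-[0-9]*-[0-9]*\).*/\1/')

# scrub_min_interval = 259200 s = 3 days, so the test expects the scrub to be
# scheduled 3 days out; "now" was 2019-07-18 in this run (GNU date syntax).
expected=$(date +%Y-%m-%d -d '2019-07-18 + 3 days')

if test "$sched_date" = "$expected"; then
    echo "sched_time OK"
else
    # This branch is what happened in the run: 2019-07-20 != 2019-07-21,
    # so check_dump_scrubs returns 1 and the workunit fails.
    echo "assert would fail: $sched_date != $expected"
fi
```

The scrub landing on 2019-07-20 instead of 2019-07-21 is what indicates it was scheduled before the new interval reached osd.1.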
Updated by David Zafman almost 5 years ago
- Related to Bug #40772: mon: pg size change delayed 1 minute because osdmap 35 delay added
Updated by David Zafman almost 5 years ago
This test gives the mon 2 seconds to propagate changes. The scrub_min_interval change to the pool probably didn't reach osd.1 in time. Because the test stopped so quickly and the mon was killed, we don't know how long propagation would actually have taken. However, looking at #40772 we see that a similar change took a full 60 seconds to propagate, so this may be a duplicate.
2019-07-18T04:53:39.692+0000 7fe04fa0b700 7 mon.a@0(leader).log v44 update_from_paxos applying incremental log 44 2019-07-18T04:53:39.468740+0000 mon.a (mon.0) 76 : audit [INF] from='client.? 127.0.0.1:0/1466130859' entity='client.admin' cmd='[{"var": "scrub_min_interval", "prefix": "osd pool set", "pool": "test", "val": "259200"}]': finished
2019-07-18T04:53:39.693+0000 7fe04fa0b700 10 mon.a@0(leader).paxosservice(osdmap 1..15) post_refresh
2019-07-18T04:53:39.959+0000 7fe04fa0b700 10 mon.a@0(leader).paxosservice(osdmap 1..15) post_refresh
2019-07-18T04:53:40.035+0000 7fe04fa0b700 10 mon.a@0(leader).paxosservice(osdmap 1..15) post_refresh
Updated by David Zafman almost 5 years ago
The prior osdmap was issuing these messages for 7 seconds.
2019-07-18T04:53:30.688+0000 7fe04fa0b700 10 mon.a@0(leader).paxosservice(osdmap 1..14) post_refresh
2019-07-18T04:53:30.813+0000 7fe04fa0b700 10 mon.a@0(leader).paxosservice(osdmap 1..14) post_refresh
2019-07-18T04:53:30.963+0000 7fe04fa0b700 10 mon.a@0(leader).paxosservice(osdmap 1..14) post_refresh
2019-07-18T04:53:31.339+0000 7fe04fa0b700 10 mon.a@0(leader).paxosservice(osdmap 1..14) post_refresh
2019-07-18T04:53:31.489+0000 7fe04fa0b700 10 mon.a@0(leader).paxosservice(osdmap 1..14) post_refresh
2019-07-18T04:53:31.955+0000 7fe04fa0b700 10 mon.a@0(leader).paxosservice(osdmap 1..14) post_refresh
2019-07-18T04:53:32.031+0000 7fe04fa0b700 10 mon.a@0(leader).paxosservice(osdmap 1..14) post_refresh
2019-07-18T04:53:32.589+0000 7fe04fa0b700 10 mon.a@0(leader).paxosservice(osdmap 1..14) post_refresh
2019-07-18T04:53:32.839+0000 7fe04fa0b700 10 mon.a@0(leader).paxosservice(osdmap 1..14) post_refresh
2019-07-18T04:53:33.139+0000 7fe04fa0b700 10 mon.a@0(leader).paxosservice(osdmap 1..14) post_refresh
2019-07-18T04:53:33.673+0000 7fe04fa0b700 10 mon.a@0(leader).paxosservice(osdmap 1..14) post_refresh
2019-07-18T04:53:33.948+0000 7fe04fa0b700 10 mon.a@0(leader).paxosservice(osdmap 1..14) post_refresh
2019-07-18T04:53:34.023+0000 7fe04fa0b700 10 mon.a@0(leader).paxosservice(osdmap 1..14) post_refresh
2019-07-18T04:53:34.767+0000 7fe04fa0b700 10 mon.a@0(leader).paxosservice(osdmap 1..14) post_refresh
2019-07-18T04:53:35.967+0000 7fe04fa0b700 10 mon.a@0(leader).paxosservice(osdmap 1..14) post_refresh
2019-07-18T04:53:36.675+0000 7fe04fa0b700 10 mon.a@0(leader).paxosservice(osdmap 1..14) post_refresh
2019-07-18T04:53:37.392+0000 7fe04fa0b700 10 mon.a@0(leader).paxosservice(osdmap 1..14) post_refresh
2019-07-18T04:53:37.959+0000 7fe04fa0b700 10 mon.a@0(leader).paxosservice(osdmap 1..14) post_refresh
Updated by Josh Durgin almost 5 years ago
Can the test be improved so it doesn't assume maps propagate in a certain short time period, but waits for the relevant change to show up? e.g. querying the admin socket to check on the osd?
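One way to implement that, sketched against the dump_scrubs output shown in the description. The helper name, timeout, and sed-based parsing are illustrative (the actual test parses with jq); only `ceph --admin-daemon <asok> dump_scrubs` is taken from the trace:

```shell
# Hypothetical helper: poll the OSD's admin socket until the first scrub
# entry is scheduled for the expected date, instead of sleeping a fixed 2 s.
wait_for_sched_date() {
    local asok=$1 expected=$2 tries=${3:-60}
    local i=0 cur
    while [ "$i" -lt "$tries" ]; do
        # Pull the YYYY-MM-DD prefix of the first sched_time (sed here to
        # stay self-contained; jq '.[0].sched_time' would do the same job).
        cur=$(ceph --admin-daemon "$asok" dump_scrubs |
              sed -n 's/.*"sched_time": "\([0-9-]*\)T.*/\1/p' | head -1)
        [ "$cur" = "$expected" ] && return 0
        sleep 1
        i=$((i + 1))
    done
    return 1
}

# Usage sketch: fail only once the map change has had a real chance to land.
# wait_for_sched_date /tmp/ceph-asok.298619/ceph-osd.1.asok \
#     "$(date +%Y-%m-%d -d 'now + 3 days')" || return 1
```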
Updated by Neha Ojha over 3 years ago
- Status changed from New to Closed
Haven't seen this in a while.