Bug #40820 (closed): standalone/scrub/osd-scrub-test.sh +3 day failed assert

Added by Sage Weil almost 5 years ago. Updated over 3 years ago.

Status: Closed
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2019-07-18T04:53:41.527 INFO:tasks.workunit.client.0.smithi017.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-test.sh:119: check_dump_scrubs:CEPH_ARGS=
2019-07-18T04:53:41.527 INFO:tasks.workunit.client.0.smithi017.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-test.sh:119: check_dump_scrubs:ceph --admin-daemon /tmp/ceph-asok.298619/ceph-osd.1.asok dump_scrubs
2019-07-18T04:53:41.604 INFO:tasks.workunit.client.0.smithi017.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-test.sh:119: check_dump_scrubs: DS='[
2019-07-18T04:53:41.604 INFO:tasks.workunit.client.0.smithi017.stderr:    {
2019-07-18T04:53:41.604 INFO:tasks.workunit.client.0.smithi017.stderr:        "pgid": "1.0",
2019-07-18T04:53:41.605 INFO:tasks.workunit.client.0.smithi017.stderr:        "sched_time": "2019-07-20T04:53:29.399279+0000",
2019-07-18T04:53:41.605 INFO:tasks.workunit.client.0.smithi017.stderr:        "deadline": "2019-08-01T04:53:29.399279+0000",
2019-07-18T04:53:41.605 INFO:tasks.workunit.client.0.smithi017.stderr:        "forced": false
2019-07-18T04:53:41.605 INFO:tasks.workunit.client.0.smithi017.stderr:    }
2019-07-18T04:53:41.605 INFO:tasks.workunit.client.0.smithi017.stderr:]'
2019-07-18T04:53:41.609 INFO:tasks.workunit.client.0.smithi017.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-test.sh:121: check_dump_scrubs:echo '[' '{' '"pgid":' '"1.0",' '"sched_time":' '"2019-07-20T04:53:29.399279+0000",' '"deadline":' '"2019-08-01T04:53:29.399279+0000",' '"forced":' false '}' ']'
2019-07-18T04:53:41.610 INFO:tasks.workunit.client.0.smithi017.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-test.sh:121: check_dump_scrubs:jq '.[0].sched_time'
2019-07-18T04:53:41.610 INFO:tasks.workunit.client.0.smithi017.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-test.sh:121: check_dump_scrubs: eval 'SCHED_TIME="2019-07-20T04:53:29.399279+0000"'
2019-07-18T04:53:41.610 INFO:tasks.workunit.client.0.smithi017.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-test.sh:121: check_dump_scrubs:SCHED_TIME=2019-07-20T04:53:29.399279+0000
2019-07-18T04:53:41.610 INFO:tasks.workunit.client.0.smithi017.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-test.sh:122: check_dump_scrubs:echo 2019-07-20T04:53:29.399279+0000
2019-07-18T04:53:41.610 INFO:tasks.workunit.client.0.smithi017.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-test.sh:122: check_dump_scrubs:sed 's/\([0-9]*-[0-9]*-[0-9]*\).*/\1/'
2019-07-18T04:53:41.612 INFO:tasks.workunit.client.0.smithi017.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-test.sh:122: check_dump_scrubs:date +%Y-%m-%d -d 'now + 3 days'
2019-07-18T04:53:41.612 INFO:tasks.workunit.client.0.smithi017.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-test.sh:122: check_dump_scrubs: test 2019-07-20 = 2019-07-21
2019-07-18T04:53:41.612 INFO:tasks.workunit.client.0.smithi017.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-test.sh:122: check_dump_scrubs: return 1

/a/sage-2019-07-18_02:42:34-rados-wip-sage-testing-2019-07-17-1729-distro-basic-smithi/4125985
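
For context, the failing check in the trace above reduces to roughly the following (a sketch reconstructed from the xtrace output, not the exact source of check_dump_scrubs): the scrub schedule is dumped via the OSD admin socket, the date portion of sched_time is extracted, and it is compared against "now + 3 days".

# Sketch of the failing check, reconstructed from the trace above; runs inside
# the check_dump_scrubs helper, variable names are illustrative.
DS=$(ceph --admin-daemon /tmp/ceph-asok.298619/ceph-osd.1.asok dump_scrubs)
SCHED_TIME=$(echo "$DS" | jq -r '.[0].sched_time')
# Keep only the YYYY-MM-DD prefix and compare to three days from now.
SCHED_DATE=$(echo "$SCHED_TIME" | sed 's/\([0-9]*-[0-9]*-[0-9]*\).*/\1/')
EXPECTED=$(date +%Y-%m-%d -d 'now + 3 days')
test "$SCHED_DATE" = "$EXPECTED" || return 1   # here: 2019-07-20 != 2019-07-21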

Related issues (1 open, 0 closed)

Related to RADOS - Bug #40772: mon: pg size change delayed 1 minute because osdmap 35 delay (New, Kefu Chai, 07/12/2019)

Actions #1

Updated by David Zafman almost 5 years ago

  • Related to Bug #40772: mon: pg size change delayed 1 minute because osdmap 35 delay added
Actions #2

Updated by David Zafman almost 5 years ago

This test gives the mon 2 seconds to propagate changes. A scrub_min_interval change to the pool probably didn't reach osd.1 in time. Because the test stopped so quickly and the mon was killed, we don't know how long propagation would have taken. However, looking at #40772 we see that something took a full 60 seconds to propagate. This may be a dup.

2019-07-18T04:53:39.692+0000 7fe04fa0b700  7 mon.a@0(leader).log v44 update_from_paxos applying incremental log 44 2019-07-18T04:53:39.468740+0000 mon.a (mon.0) 76 : audit [INF] from='client.? 127.0.0.1:0/1466130859' entity='client.admin' cmd='[{"var": "scrub_min_interval", "prefix": "osd pool set", "pool": "test", "val": "259200"}]': finished
2019-07-18T04:53:39.693+0000 7fe04fa0b700 10 mon.a@0(leader).paxosservice(osdmap 1..15) post_refresh
2019-07-18T04:53:39.959+0000 7fe04fa0b700 10 mon.a@0(leader).paxosservice(osdmap 1..15) post_refresh
2019-07-18T04:53:40.035+0000 7fe04fa0b700 10 mon.a@0(leader).paxosservice(osdmap 1..15) post_refresh
Actions #3

Updated by David Zafman almost 5 years ago

The prior osdmap was issuing these messages for 7 seconds.

2019-07-18T04:53:30.688+0000 7fe04fa0b700 10 mon.a@0(leader).paxosservice(osdmap 1..14) post_refresh
2019-07-18T04:53:30.813+0000 7fe04fa0b700 10 mon.a@0(leader).paxosservice(osdmap 1..14) post_refresh
2019-07-18T04:53:30.963+0000 7fe04fa0b700 10 mon.a@0(leader).paxosservice(osdmap 1..14) post_refresh
2019-07-18T04:53:31.339+0000 7fe04fa0b700 10 mon.a@0(leader).paxosservice(osdmap 1..14) post_refresh
2019-07-18T04:53:31.489+0000 7fe04fa0b700 10 mon.a@0(leader).paxosservice(osdmap 1..14) post_refresh
2019-07-18T04:53:31.955+0000 7fe04fa0b700 10 mon.a@0(leader).paxosservice(osdmap 1..14) post_refresh
2019-07-18T04:53:32.031+0000 7fe04fa0b700 10 mon.a@0(leader).paxosservice(osdmap 1..14) post_refresh
2019-07-18T04:53:32.589+0000 7fe04fa0b700 10 mon.a@0(leader).paxosservice(osdmap 1..14) post_refresh
2019-07-18T04:53:32.839+0000 7fe04fa0b700 10 mon.a@0(leader).paxosservice(osdmap 1..14) post_refresh
2019-07-18T04:53:33.139+0000 7fe04fa0b700 10 mon.a@0(leader).paxosservice(osdmap 1..14) post_refresh
2019-07-18T04:53:33.673+0000 7fe04fa0b700 10 mon.a@0(leader).paxosservice(osdmap 1..14) post_refresh
2019-07-18T04:53:33.948+0000 7fe04fa0b700 10 mon.a@0(leader).paxosservice(osdmap 1..14) post_refresh
2019-07-18T04:53:34.023+0000 7fe04fa0b700 10 mon.a@0(leader).paxosservice(osdmap 1..14) post_refresh
2019-07-18T04:53:34.767+0000 7fe04fa0b700 10 mon.a@0(leader).paxosservice(osdmap 1..14) post_refresh
2019-07-18T04:53:35.967+0000 7fe04fa0b700 10 mon.a@0(leader).paxosservice(osdmap 1..14) post_refresh
2019-07-18T04:53:36.675+0000 7fe04fa0b700 10 mon.a@0(leader).paxosservice(osdmap 1..14) post_refresh
2019-07-18T04:53:37.392+0000 7fe04fa0b700 10 mon.a@0(leader).paxosservice(osdmap 1..14) post_refresh
2019-07-18T04:53:37.959+0000 7fe04fa0b700 10 mon.a@0(leader).paxosservice(osdmap 1..14) post_refresh
Actions #4

Updated by Josh Durgin over 4 years ago

Can the test be improved so it doesn't assume maps propagate within some short time period, but instead waits for the relevant change to show up, e.g. by querying the admin socket on the osd?
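
One possible shape for that, sketched below and untested: poll dump_scrubs on the OSD's admin socket until sched_time reflects the new interval, or a timeout expires, before asserting on it. The helper name, asok path, and 300-second bound are illustrative, not from the actual test.

# Hypothetical wait loop (illustrative sketch, not the actual fix): retry until
# the scheduled scrub date reaches the expected value or we time out.
wait_for_sched_time() {
    local asok=$1 expected=$2 timeout=300
    for ((i = 0; i < timeout; i++)); do
        local sched
        sched=$(ceph --admin-daemon "$asok" dump_scrubs | jq -r '.[0].sched_time' | cut -dT -f1)
        [ "$sched" = "$expected" ] && return 0
        sleep 1
    done
    return 1
}
# e.g. wait_for_sched_time /tmp/ceph-asok.298619/ceph-osd.1.asok "$(date +%Y-%m-%d -d 'now + 3 days')"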

Actions #5

Updated by Patrick Donnelly over 4 years ago

  • Status changed from 12 to New
Actions #6

Updated by Neha Ojha over 3 years ago

  • Status changed from New to Closed

Haven't seen this in a while.
