Project

General

Profile

Actions

Bug #21474

closed

bluestore fsck took 224.778802 seconds to complete which caused "timed out waiting for admin_socket to appear after osd.1 restart"

Added by Kefu Chai over 6 years ago. Updated over 6 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Tests
Target version:
-
% Done:

0%

Source:
Development
Tags:
Backport:
luminous
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

/a/kchai-2017-09-19_14:50:44-rados-wip-kefu-testing-2017-09-19-1954-distro-basic-mira/1648644

2017-09-19 21:32:12.622065 7f2c502010c0  1 bluestore(/var/lib/ceph/osd/ceph-1) _fsck finish with 0 errors, 0 repaired, 0 remaining in 224.778802 seconds

osd registers the commands after mounting the store. so:

2017-09-19T21:31:01.475 INFO:teuthology.orchestra.run.mira077:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 0 ceph --cluster ceph --adm
in-daemon /var/run/ceph/ceph-osd.1.asok dump_ops_in_flight'
2017-09-19T21:31:01.590 INFO:teuthology.orchestra.run.mira077.stderr:no valid command found; 10 closest matches:
2017-09-19T21:31:01.591 INFO:teuthology.orchestra.run.mira077.stderr:perf histogram schema
2017-09-19T21:31:01.604 INFO:teuthology.orchestra.run.mira077.stderr:perf histogram dump {<logger>} {<counter>}
2017-09-19T21:31:01.604 INFO:teuthology.orchestra.run.mira077.stderr:log reopen
2017-09-19T21:31:01.604 INFO:teuthology.orchestra.run.mira077.stderr:log flush
2017-09-19T21:31:01.605 INFO:teuthology.orchestra.run.mira077.stderr:perf dump {<logger>} {<counter>}
2017-09-19T21:31:01.605 INFO:teuthology.orchestra.run.mira077.stderr:objecter_requests
2017-09-19T21:31:01.605 INFO:teuthology.orchestra.run.mira077.stderr:git_version
2017-09-19T21:31:01.605 INFO:teuthology.orchestra.run.mira077.stderr:get_command_descriptions
2017-09-19T21:31:01.605 INFO:teuthology.orchestra.run.mira077.stderr:log dump
2017-09-19T21:31:01.605 INFO:teuthology.orchestra.run.mira077.stderr:help
2017-09-19T21:31:01.605 INFO:teuthology.orchestra.run.mira077.stderr:admin_socket: invalid command

hence:
Exception: timed out waiting for admin_socket to appear after osd.1 restart

Related issues 2 (0 open2 closed)

Copied to RADOS - Backport #21543: luminous: bluestore fsck took 224.778802 seconds to complete which caused "timed out waiting for admin_socket to appear after osd.1 restart"ResolvedActions
Copied to RADOS - Backport #22166: luminous: bluestore fsck took 224.778802 seconds to complete which caused "timed out waiting for admin_socket to appear after osd.1 restart"ResolvedKefu ChaiActions
Actions #1

Updated by Kefu Chai over 6 years ago

Sage, if you believe that it's normal for bluestore to take around 4 minutes to complete a deep fsck. i will prolong the timeout for waiting for the asok commands in our test.

Actions #2

Updated by Kefu Chai over 6 years ago

  • Subject changed from fsck takes 224.778802 seconds to complete which caused "timed out waiting for admin_socket to appear after osd.1 restart" to fsck took 224.778802 seconds to complete which caused "timed out waiting for admin_socket to appear after osd.1 restart"
Actions #3

Updated by Kefu Chai over 6 years ago

  • Subject changed from fsck took 224.778802 seconds to complete which caused "timed out waiting for admin_socket to appear after osd.1 restart" to bluestore fsck took 224.778802 seconds to complete which caused "timed out waiting for admin_socket to appear after osd.1 restart"
Actions #4

Updated by Sage Weil over 6 years ago

Sigh.. yeah. I can't decide if we should stop doing these fsck's entirely, or reduce the debug level just for fsck, or... The other part of the problem is it's a 'deep' fsck, which means it reads all data too (to verify checksums).

Actions #5

Updated by Kefu Chai over 6 years ago

  • Status changed from New to 12
  • Assignee changed from Sage Weil to Kefu Chai

cool. will update the test.

Actions #6

Updated by Kefu Chai over 6 years ago

  • Category set to Tests
  • Status changed from 12 to Fix Under Review
Actions #7

Updated by Kefu Chai over 6 years ago

2017-09-21 21:48:57.547815 7f90a0a3b1c0  1 bluestore(/var/lib/ceph/osd/ceph-4) _fsck finish with 0 errors, 0 repaired, 0 remaining in 156.998835 seconds

/a/kchai-2017-09-21_06:22:45-rados-wip-kefu-testing-2017-09-21-1013-distro-basic-mira/1654961$ zless remote/mira033/log/ceph-osd.4.log.gz

Actions #8

Updated by Sage Weil over 6 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to luminous
Actions #9

Updated by Nathan Cutler over 6 years ago

  • Copied to Backport #21543: luminous: bluestore fsck took 224.778802 seconds to complete which caused "timed out waiting for admin_socket to appear after osd.1 restart" added
Actions #11

Updated by Kefu Chai over 6 years ago

  • Copied to Backport #22166: luminous: bluestore fsck took 224.778802 seconds to complete which caused "timed out waiting for admin_socket to appear after osd.1 restart" added
Actions #12

Updated by Kefu Chai over 6 years ago

  • Status changed from Pending Backport to Resolved
Actions

Also available in: Atom PDF