
Bug #19113

slow osd boot with valgrind (reached maximum tries (50) after waiting for 300 seconds)

Added by Kefu Chai about 7 years ago. Updated about 7 years ago.

Status: Duplicate
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Development
Regression: No
Severity: 3 - minor

Description

It takes more than 5 minutes to boot an OSD under valgrind.

2017-02-28T23:51:34.533 INFO:teuthology.orchestra.run.smithi020:Running: 'cd /home/ubuntu/cephtest && sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper term valgrind --trace-children=no --child-silent-after-fork=yes --num-callers=50 --suppressions=/home/ubuntu/cephtest/valgrind.supp --xml=yes --xml-file=/var/log/ceph/valgrind/osd.0.log --time-stamp=yes --tool=memcheck ceph-osd -f --cluster ceph -i 0'
...
2017-02-28T23:51:35.506 INFO:tasks.ceph:Waiting until ceph cluster ceph is healthy...
...
2017-02-28T23:51:35.857 DEBUG:teuthology.misc:0 of 6 OSDs are up
...
2017-02-28T23:56:50.397 DEBUG:teuthology.misc:0 of 6 OSDs are up
...
2017-02-28T23:57:59.268 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/contextutil.py", line 32, in nested
    yield vars
  File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-sage-testing/qa/tasks/ceph.py", line 1600, in task
    healthy(ctx=ctx, config=dict(cluster=config['cluster']))
  File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-sage-testing/qa/tasks/ceph.py", line 1194, in healthy
    ceph_cluster=cluster_name,
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/misc.py", line 913, in wait_until_osds_up
    while proceed():
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/contextutil.py", line 132, in __call__
    raise MaxWhileTries(error_msg)
MaxWhileTries: reached maximum tries (50) after waiting for 300 seconds
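
For context, the timeout comes from teuthology's bounded polling loop (wait_until_osds_up via contextutil), which gives up after tries * sleep seconds. A minimal sketch of that pattern, assuming the 6-second sleep implied by 50 tries over 300 seconds; apart from MaxWhileTries, the names below are illustrative, not copied from teuthology:

import time

class MaxWhileTries(Exception):
    pass

def wait_until_osds_up(get_num_up, want, tries=50, sleep=6):
    # Poll until enough OSDs report up, or give up after tries * sleep seconds.
    for _ in range(tries):
        if get_num_up() >= want:
            return
        time.sleep(sleep)
    raise MaxWhileTries(
        'reached maximum tries (%d) after waiting for %d seconds'
        % (tries, tries * sleep))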

in ceph-osd.0.log.gz

2017-02-28 23:51:54.430336 9552ac0  0 ceph version 12.0.0-932-g8d6fb1d (8d6fb1d065c5d096a6ef0d39d2ab3bcbeeefc394), process memcheck-amd64-, pid 24836
...
2017-02-28 23:51:55.839684 9552ac0 10 bluestore(/var/lib/ceph/osd/ceph-0) _set_csum csum_type crc32c
2017-02-28 23:52:12.092176 9552ac0 10 bluestore(/var/lib/ceph/osd/ceph-0) _set_compression mode none alg snappy
2017-02-28 23:53:00.882970 9552ac0  0 load: jerasure load: lrc load: isa
...
2017-02-28 23:55:47.773249 9552ac0 10 _load_class hello from /usr/lib64/rados-classes/libcls_hello.so
2017-02-28 23:56:04.360652 9552ac0  0 <cls> /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.0.0-932-g8d6fb1d/rpm/el7/BUILD/ceph-12.0.0-932-g8d6fb1d/src/cls/hello/cls_hello.cc:296: loading cls_hello
...
2017-02-28 23:56:04.362224 9552ac0 10 _load_class rgw from /usr/lib64/rados-classes/libcls_rgw.so
2017-02-28 23:56:20.631751 9552ac0 10 register_class rgw status 3
...
2017-02-28 23:56:20.636646 9552ac0 10 _load_class numops from /usr/lib64/rados-classes/libcls_numops.so
2017-02-28 23:56:36.950789 9552ac0 10 register_class numops status 3
...
2017-02-28 23:56:36.951440 9552ac0 10 _load_class replica_log from /usr/lib64/rados-classes/libcls_replica_log.so
2017-02-28 23:56:53.340437 9552ac0 10 register_class replica_log status 3
...
2017-02-28 23:56:56.872349 1f9e6700  1 -- 172.21.15.20:6804/24836 --> 172.21.15.20:6789/0 -- osd_boot(osd.0 booted 0 features 1152323339925389307 v5) v6 -- ?+0 0x18130cc0 con 0x17ff5290
...
2017-02-28 23:56:57.708835 1f9e6700  1 osd.0 6 state: booting -> active

see /a/sage-2017-02-28_23:37:48-rados-wip-sage-testing---basic-smithi/869430
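
The class-registration timestamps above hint at where the time goes: each _load_class/register_class pair takes roughly 16 seconds under memcheck, so loading the rados classes alone consumes a large share of the 300-second budget. Illustrative arithmetic on two of the lines quoted above (plain Python, not part of the test suite):

from datetime import datetime

fmt = '%Y-%m-%d %H:%M:%S.%f'
# _load_class replica_log vs. register_class replica_log, from the log above
load = datetime.strptime('2017-02-28 23:56:36.951440', fmt)
done = datetime.strptime('2017-02-28 23:56:53.340437', fmt)
print((done - load).total_seconds())  # ~16.4 seconds for a single class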


Related issues

Duplicates Ceph - Bug #19099: valgrind runs time out in wait_for_up_osds after 300s; osd logs show *incredibly* slow startup (Resolved)

History

#1 Updated by Kefu Chai about 7 years ago

2017-02-28 23:51:54.360691 9552ac0  0 ceph version 12.0.0-932-g8d6fb1d (8d6fb1d065c5d096a6ef0d39d2ab3bcbeeefc394), process memcheck-amd64-, pid 24834
...
2017-02-28 23:53:02.705395 9552ac0  2 osd.1 0 boot
...
2017-02-28 23:56:55.621743 1f9e6700  1 -- 172.21.15.20:6808/24834 --> 172.21.15.20:6789/0 -- osd_boot(osd.1 booted 0 features 1152323339925389307 v4) v6 -- ?+0 0x181126a0 con 0x17dd0160
...
2017-02-28 23:56:56.191008 1f9e6700  1 osd.1 5 state: booting -> active

2017-02-28 23:51:54.376424 9552ac0  0 ceph version 12.0.0-932-g8d6fb1d (8d6fb1d065c5d096a6ef0d39d2ab3bcbeeefc394), process memcheck-amd64-, pid 24837
...
2017-02-28 23:56:57.601434 1f9e6700  1 osd.2 6 state: booting -> active

OSDs took more than 5 minutes to boot and become active.

#2 Updated by Kefu Chai about 7 years ago

  • Description updated (diff)

#3 Updated by Nathan Cutler about 7 years ago

  • Duplicates Bug #19099: valgrind runs time out in wait_for_up_osds after 300s; osd logs show *incredibly* slow startup added

#4 Updated by Nathan Cutler about 7 years ago

@Kefu: see #19099

#5 Updated by Kefu Chai about 7 years ago

  • Status changed from New to Duplicate
