Bug #43914

nautilus: ceph tell command times out

Added by Sage Weil about 4 years ago. Updated about 4 years ago.

Status:
Resolved
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
33560
Crash signature (v1):
Crash signature (v2):

Description

2020-01-30T04:08:15.082 INFO:teuthology.orchestra.run.smithi067:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph tell osd.6 flush_pg_stats
2020-01-30T04:10:15.119 DEBUG:teuthology.orchestra.run:got remote process result: 124
2020-01-30T04:10:15.119 ERROR:teuthology.run_tasks:Manager failed: thrashosds
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 159, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-sage-testing-2020-01-29-1034/qa/tasks/thrashosds.py", line 216, in task
    cluster_manager.flush_all_pg_stats()
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-sage-testing-2020-01-29-1034/qa/tasks/ceph_manager.py", line 1491, in flush_all_pg_stats
    self.flush_pg_stats(range(len(self.get_osd_dump())))
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-sage-testing-2020-01-29-1034/qa/tasks/ceph_manager.py", line 1467, in flush_pg_stats
    for osd in osds}
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-sage-testing-2020-01-29-1034/qa/tasks/ceph_manager.py", line 1467, in <dictcomp>
    for osd in osds}
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-sage-testing-2020-01-29-1034/qa/tasks/ceph_manager.py", line 1353, in raw_cluster_cmd
    stdout=StringIO(),
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/remote.py", line 198, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 433, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 158, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 180, in _raise_for_status
    node=self.hostname, label=self.label
CommandFailedError: Command failed on smithi067 with status 124: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph tell osd.6 flush_pg_stats'

/a/sage-2020-01-29_20:14:58-rados-wip-sage-testing-2020-01-29-1034-distro-basic-smithi/4718262

Notably, these commands normally log profusely to stderr, but this one did not, nor did it generate a log file.
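
For context, the failing call comes through the qa helpers shown in the traceback (flush_all_pg_stats -> flush_pg_stats -> raw_cluster_cmd). Below is a minimal sketch of what that path boils down to, assuming a plain shell invocation; the helper is hypothetical, not the actual qa/tasks/ceph_manager.py code, and it omits the sudo/adjust-ulimits/ceph-coverage wrappers and the follow-up wait on the monitor.

    import subprocess

    def flush_pg_stats(osds, timeout=120):
        """Ask each OSD to flush its PG stats via `ceph tell`."""
        seqs = {}
        for osd in osds:
            # `timeout 120` wraps the command, so a hung tell surfaces as
            # exit status 124 rather than blocking the run forever
            out = subprocess.check_output(
                ['timeout', str(timeout),
                 'ceph', '--cluster', 'ceph',
                 'tell', 'osd.%d' % osd, 'flush_pg_stats'])
            # the OSD replies with a flush sequence number on stdout
            seqs[osd] = int(out)
        return seqs

When `timeout` kills the command, check_output raises with returncode 124, which teuthology reports as the CommandFailedError shown above.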

History

#1 Updated by Sage Weil about 4 years ago

  • Status changed from New to Need More Info

#2 Updated by Neha Ojha about 4 years ago

This is on nautilus: /a/nojha-2020-02-21_20:34:10-upgrade:mimic-x:stress-split-nautilus-distro-basic-smithi/4788575/

CommandFailedError: Command failed on smithi110 with status 124: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph tell osd.4 flush_pg_stats'

#3 Updated by Neha Ojha about 4 years ago

First observation from teuthology.log for /a/nojha-2020-02-21_20:34:10-upgrade:mimic-x:stress-split-nautilus-distro-basic-smithi/4788575: "ceph tell osd.4 flush_pg_stats" is called three times at different stages, and the third call fails.

2020-02-21T21:10:06.676 INFO:teuthology.orchestra.run.smithi110:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph tell osd.4 flush_pg_stats 
2020-02-21T21:13:02.778 INFO:teuthology.orchestra.run.smithi110:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph tell osd.4 flush_pg_stats 
2020-02-21T23:17:49.733 INFO:teuthology.orchestra.run.smithi110:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph tell osd.4 flush_pg_stats

From ceph-osd.4's log, only the first two have been received:

2020-02-21 21:10:06.752 7f77078f7700  1 -- 172.21.15.58:6804/12942 <== client.4234 172.21.15.110:0/4111042505 2 ==== command(tid 2: {"prefix": "flush_pg_stats"}) v1 ==== 52+0+0 (878328554 0 0) 0x3b1ac00 con 0x3da4300
2020-02-21 21:10:06.752 7f77078f7700 20 osd.4 20 OSD::ms_dispatch: command(tid 2: {"prefix": "flush_pg_stats"}) v1

2020-02-21 21:13:02.857 7fc329a43700  1 -- [v2:172.21.15.58:6800/15244,v1:172.21.15.58:6801/15244] <== client.24271 v1:172.21.15.110:0/2263514793 2 ==== command(tid 2: {"prefix": "flush_pg_stats"}) v1 ==== 52+0+0 (unknown 878328554 0 0) 0xd6f72a0 con 0xdafc880
2020-02-21 21:13:02.857 7fc329a43700 20 osd.4 28 OSD::ms_dispatch: command(tid 2: {"prefix": "flush_pg_stats"}) v1

#4 Updated by Neha Ojha about 4 years ago

This run has more relevant information: /a/nojha-2020-02-26_03:20:34-upgrade:mimic-x:stress-split-nautilus-distro-basic-smithi/4802539

Here too, the third flush_pg_stats timed out.

From teuthology.log:

2020-02-26T03:56:16.776 INFO:teuthology.orchestra.run.smithi151:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph tell osd.6 flush_pg_stats
2020-02-26T03:58:33.442 INFO:teuthology.orchestra.run.smithi151:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph tell osd.6 flush_pg_stats
2020-02-26T05:33:06.811 INFO:teuthology.orchestra.run.smithi151:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph tell osd.6 flush_pg_stats

From osd.6's log:

2020-02-26 03:56:16.903 7f880b8c8700  1 -- 172.21.15.83:6812/10841 <== client.4261 172.21.15.151:0/3358669268 2 ==== command(tid 2: {"prefix": "flush_pg_stats"}) v1 ==== 52+0+0 (3076564282 0 0) 0x55a5bee6bc00 con 0x55a5beecf100
2020-02-26 03:58:33.566 7f3d48187700  1 -- [v2:172.21.15.83:6816/13319,v1:172.21.15.83:6817/13319] <== client.34221 v1:172.21.15.151:0/810849350 2 ==== command(tid 2: {"prefix": "flush_pg_stats"}) v1 ==== 52+0+0 (unknown 3076564282 0 0) 0x55add43b8760 con 0x55adf1979680
2020-02-26 05:34:24.472 7f9ae52d7700  1 -- [v2:172.21.15.83:6816/59862,v1:172.21.15.83:6817/59862] <== client.58053 v1:172.21.15.151:0/3801625840 17 ==== command(tid 2: {"prefix": "flush_pg_stats"}) v1 ==== 52+0+0 (unknown 3076564282 0 0) 0x5612f814c780 con 0x5612e91cdf80

Note the timestamps of when the command was sent versus when the OSD processed it. The third command took much longer to reach the OSD, though the OSD did respond once it saw it.

2020-02-26 05:35:10.691 7f9acfc53700 20 osd.6 3705 do_command tid 2 [{"prefix": "flush_pg_stats"}]
2020-02-26 05:35:10.691 7f9acfc53700  1 -- [v2:172.21.15.83:6816/59862,v1:172.21.15.83:6817/59862] --> [v2:172.21.15.151:6832/14617,v1:172.21.15.151:6833/14617] -- pg_stats(62 pgs tid 0 v 0) v2 -- 0x5612f0236300 con 0x5612ea043b00

Since the timeout is 120 seconds, the test failed.
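
A quick check of that gap, using the timestamps above (a throwaway snippet, not part of the test suite):

    from datetime import datetime

    sent = datetime(2020, 2, 26, 5, 33, 6)      # teuthology issues the tell (05:33:06)
    handled = datetime(2020, 2, 26, 5, 35, 10)  # osd.6 finally runs do_command (05:35:10)
    print((handled - sent).total_seconds())     # ~124 seconds, just past the 120 s limit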

#5 Updated by Sage Weil about 4 years ago

The thread (or lock?) is busy with:

2020-02-26 05:34:24.432 7f9ae0474700 20 osd.6 3677 OSD::ms_dispatch: osd_map(3200..3213 src has 1..3213) v4
2020-02-26 05:34:24.436 7f9ae0474700 10 osd.6 3705 do_waiters -- start
2020-02-26 05:34:24.436 7f9ae0474700 10 osd.6 3705 handle_pg_create osd_pg_create(e3678 36.2:2247 36.7:2247 36.b:2247 36.d:2247 36.13:2247 36.1d:2247 36.1e:2247 36.1f:2247 36.20:2247 36.28:2247 36.31:2247 36.3e:2247 37.6:2249 37.c:2249 37.1b:2249 37.1c:2249 37.35:2249 37.37:2249 37.3c:2249 38.2:2251 38.c:2251 38.f:
2020-02-26 05:34:24.436 7f9ae0474700 15 osd.6 3705 require_same_or_newer_map 3678 (i am 3705) 0x5612f45a7600
2020-02-26 05:34:24.436 7f9ae0474700 20 osd.6 3705 mkpg 36.2 e2247@2020-02-26 04:58:03.765875
2020-02-26 05:34:24.436 7f9ae0474700 10 osd.6 3705 build_initial_pg_history 36.2 created 2247
2020-02-26 05:34:24.436 7f9ae0474700 20 osd.6 3705 get_map 2247 - loading and decoding 0x5612ea0a8800
2020-02-26 05:34:24.436 7f9ae0474700 10 osd.6 3705 add_map_bl 2247 5240 bytes
2020-02-26 05:34:24.436 7f9ae0474700 20 osd.6 3705 get_map 2248 - loading and decoding 0x5612ec07c300
2020-02-26 05:34:24.440 7f9ae0474700 10 osd.6 3705 add_map_bl 2248 5251 bytes
...
2020-02-26 05:35:10.691 7f9ae0474700 10 osd.6 3705 add_map_bl 3702 7769 bytes
2020-02-26 05:35:10.691 7f9ae0474700 20 osd.6 3705 get_map 3703 - loading and decoding 0x5612ea531700
2020-02-26 05:35:10.691 7f9ae0474700 10 osd.6 3705 add_map_bl 3703 7769 bytes
2020-02-26 05:35:10.691 7f9ae0474700 20 osd.6 3705 get_map 3704 - loading and decoding 0x5612ea15d900
2020-02-26 05:35:10.691 7f9ae0474700 10 osd.6 3705 add_map_bl 3704 7769 bytes
2020-02-26 05:35:10.691 7f9ae0474700 20 osd.6 3705 build_initial_pg_history check_new_interval interval(2281-2659 up [6,4](6) acting [6,4](6)) up_thru 2655 up_from 614 last_epoch_clean 0 interval(2281-2659 up [6,4](6) acting [6,4](6) maybe_went_rw) : primary up 614-2655 includes interval
check_new_interval interval(2660-2680 up [4](4) acting [4](4)) up_thru 2671 up_from 1900 last_epoch_clean 0 interval(2660-2680 up [4](4) acting [4](4) maybe_went_rw) : primary up 1900-2671 includes interval
check_new_interval interval(2681-3100 up [6,4](6) acting [6,4](6)) up_thru 3025 up_from 2681 last_epoch_clean 0 interval(2681-3100 up [6,4](6) acting [6,4](6) maybe_went_rw) : primary up 2681-3025 includes interval
check_new_interval interval(3101-3118 up [8,4](8) acting [8,4](8)) up_thru 3101 up_from 1908 last_epoch_clean 0 interval(3101-3118 up [8,4](8) acting [8,4](8) maybe_went_rw) : primary up 1908-3101 includes interval
check_new_interval interval(3119-3129 up [8,9](8) acting [8,9](8)) up_thru 3122 up_from 1908 last_epoch_clean 0 interval(3119-3129 up [8,9](8) acting [8,9](8) maybe_went_rw) : primary up 1908-3122 includes interval
check_new_interval interval(3130-3134 up [6,8](6) acting [6,8](6)) up_thru 3025 up_from 2681 last_epoch_clean 0 interval(3130-3134 up [6,8](6) acting [6,8](6)) : primary up 2681-3025 does not include interval

Right after that the do_command thread picks it up and does the work. Probably contended on osd_lock or something, but the mkpg handling over that long period is the real problem.
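
To make the failure mode concrete, here is a toy illustration of the pattern in plain Python threading (not the OSD's actual C++ code): one thread holds the shared lock while it grinds through a long batch of work, so a short command that needs the same lock cannot run before its deadline. The 5 s of "work" stands in for the ~46 s of map loading above (05:34:24 to 05:35:10), and the 2 s deadline stands in for the client-side `timeout 120`; the real command workqueue waits on the lock indefinitely.

    import threading
    import time

    osd_lock = threading.Lock()       # stands in for OSD::osd_lock

    def dispatch_pg_create():
        # ms_dispatch holding osd_lock while it loads and decodes ~1400 old osdmaps
        with osd_lock:
            time.sleep(5)

    def do_command(deadline=2.0):
        # the command workqueue also needs osd_lock before it can run flush_pg_stats
        start = time.monotonic()
        if not osd_lock.acquire(timeout=deadline):
            print("command gave up after %.1fs waiting for osd_lock" % deadline)
            return
        try:
            print("command ran after %.1fs" % (time.monotonic() - start))
        finally:
            osd_lock.release()

    dispatcher = threading.Thread(target=dispatch_pg_create)
    command = threading.Thread(target=do_command)
    dispatcher.start()
    time.sleep(0.1)                   # let the dispatch thread take the lock first
    command.start()
    dispatcher.join()
    command.join()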

#6 Updated by Sage Weil about 4 years ago

So, this was fixed in nautilus, in the sense that https://github.com/ceph/ceph/pull/27696 went into nautilus.

#7 Updated by Sage Weil about 4 years ago

Okay, yeah, it's because the command wq uses osd_lock:

    void _process(Command *c, ThreadPool::TPHandle &) override {
      osd->osd_lock.lock();
      if (osd->is_stopping()) {
        osd->osd_lock.unlock();
        delete c;
        return;
      }
      osd->do_command(c->con.get(), c->tid, c->cmd, c->indata);
      osd->osd_lock.unlock();
      delete c;
    }

Meanwhile, on the pg_create handling side, it's an MOSDPGCreate (not MOSDPGCreate2), because require_osd_release < NAUTILUS. So, AFAICS, this problem goes away:

(1) once the upgrade to nautilus completes, because osd_lock is not needed for MOSDPGCreate2 handling,
(2) even more completely with octopus, because the mon is passing along the past_intervals, and
(3) if the cluster is not being thrashed for extended periods while a pool is being created (it's the pool creation epoch being 1000+ epochs in the past that led to this).

#8 Updated by Sage Weil about 4 years ago

  • Status changed from Need More Info to Fix Under Review
  • Pull request ID set to 33560

#9 Updated by Neha Ojha about 4 years ago

  • Subject changed from ceph tell command times out to nautilus: ceph tell command times out

#10 Updated by Nathan Cutler about 4 years ago

  • Status changed from Fix Under Review to Resolved
