Bug #14616
When logging to a file fails, ceph logs excessively to stderr
Status:
closed
% Done:
0%
Source:
other
Backport:
hammer
Regression:
No
Severity:
3 - minor
Description
The exception raised by ceph_manager should have caused the job to terminate (filed as #14615). Instead, the job continued and logged another 72GB after this:
2016-01-30T16:24:55.708 INFO:teuthology.orchestra.run.smithi020:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok config set filestore_inject_stall 3'
2016-01-30T16:24:55.795 INFO:tasks.rados.rados.0.smithi020.stdout:1298: done (15 left)
2016-01-30T16:24:55.795 INFO:tasks.rados.rados.0.smithi020.stdout:1299: copy_from oid 402 from oid 281 current snap is 91
2016-01-30T16:24:55.795 INFO:tasks.rados.rados.0.smithi020.stdout:1297: finishing rollback tid 0 to smithi02021960-329
2016-01-30T16:24:55.795 INFO:tasks.rados.rados.0.smithi020.stdout:1297: finishing rollback tid 1 to smithi02021960-329
2016-01-30T16:24:55.796 INFO:tasks.rados.rados.0.smithi020.stdout:1297: finishing rollback tid 2 to smithi02021960-329
2016-01-30T16:24:55.796 INFO:tasks.rados.rados.0.smithi020.stdout:update_object_version oid 329 v 1210 (ObjNum 651 snap 44 seq_num 651) dirty exists
2016-01-30T16:24:55.797 INFO:tasks.rados.rados.0.smithi020.stdout:1297: done (15 left)
2016-01-30T16:24:55.797 INFO:tasks.rados.rados.0.smithi020.stdout:1300: delete oid 81 current snap is 91
2016-01-30T16:24:55.816 INFO:tasks.rados.rados.0.smithi020.stdout:1299: finishing copy_from to smithi02021960-402
2016-01-30T16:24:55.816 INFO:tasks.rados.rados.0.smithi020.stdout:update_object_version oid 402 v 1084 (ObjNum 420 snap 0 seq_num 420) dirty exists
2016-01-30T16:24:55.845 INFO:tasks.rados.rados.0.smithi020.stdout:1299: finishing copy_from racing read to smithi02021960-402
2016-01-30T16:24:55.860 INFO:teuthology.orchestra.run.smithi020.stderr:admin_socket: exception getting command descriptions: [Errno 111] Connection refused
2016-01-30T16:24:55.867 INFO:tasks.ceph.ceph_manager:waiting on admin_socket for osd-1, ['config', 'set', 'filestore_inject_stall', '3']
2016-01-30T16:25:00.868 INFO:teuthology.orchestra.run.smithi020:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok config set filestore_inject_stall 3'
2016-01-30T16:25:01.033 INFO:teuthology.orchestra.run.smithi020.stderr:admin_socket: exception getting command descriptions: [Errno 111] Connection refused
2016-01-30T16:25:01.043 INFO:tasks.ceph.ceph_manager:waiting on admin_socket for osd-1, ['config', 'set', 'filestore_inject_stall', '3']
2016-01-30T16:25:06.045 INFO:teuthology.orchestra.run.smithi020:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok config set filestore_inject_stall 3'
2016-01-30T16:25:06.204 INFO:teuthology.orchestra.run.smithi020.stderr:admin_socket: exception getting command descriptions: [Errno 111] Connection refused
2016-01-30T16:25:06.212 INFO:tasks.ceph.ceph_manager:waiting on admin_socket for osd-1, ['config', 'set', 'filestore_inject_stall', '3']
2016-01-30T16:25:11.213 INFO:teuthology.orchestra.run.smithi020:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok config set filestore_inject_stall 3'
2016-01-30T16:25:11.370 INFO:teuthology.orchestra.run.smithi020.stderr:admin_socket: exception getting command descriptions: [Errno 111] Connection refused
2016-01-30T16:25:11.378 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
  File "/var/lib/teuthworker/src/ceph-qa-suite_wip-async/tasks/ceph_manager.py", line 650, in wrapper
    return func(self)
  File "/var/lib/teuthworker/src/ceph-qa-suite_wip-async/tasks/ceph_manager.py", line 704, in do_thrash
    self.choose_action()()
  File "/var/lib/teuthworker/src/ceph-qa-suite_wip-async/tasks/ceph_manager.py", line 628, in <lambda>
    False),
  File "/var/lib/teuthworker/src/ceph-qa-suite_wip-async/tasks/ceph_manager.py", line 501, in inject_pause
    self.ceph_manager.set_config(the_one, **{conf_key: duration})
  File "/var/lib/teuthworker/src/ceph-qa-suite_wip-async/tasks/ceph_manager.py", line 1078, in set_config
    ['config', 'set', str(k), str(v)])
  File "/var/lib/teuthworker/src/ceph-qa-suite_wip-async/tasks/ceph_manager.py", line 1053, in wait_run_admin_socket
    id=service_id))
Exception: timed out waiting for admin_socket to appear after osd.1 restart
2016-01-30T16:25:11.379 CRITICAL:root:
  File "gevent/core.pyx", line 348, in gevent.core.loop.handle_error (gevent/gevent.core.c:6053)
  File "/home/teuthworker/src/teuthology_master/virtualenv/local/lib/python2.7/site-packages/gevent/hub.py", line 300, in handle_error
    self.print_exception(context, type, value, tb)
  File "/home/teuthworker/src/teuthology_master/virtualenv/local/lib/python2.7/site-packages/gevent/hub.py", line 323, in print_exception
    traceback.print_exception(type, value, tb)
  File "/usr/lib/python2.7/traceback.py", line 124, in print_exception
    _print(file, 'Traceback (most recent call last):')
  File "/usr/lib/python2.7/traceback.py", line 13, in _print
    file.write(str+terminator)
2016-01-30T16:25:11.379 CRITICAL:root:IOError
2016-01-30T18:24:33.394 INFO:tasks.ceph.mon.c.smithi020.stderr:2016-01-31 02:24:33.380087 7f69cff44700 -1 mon.c@2(peon).paxos(paxos updating c 5774..6486) lease_expire from mon.0 172.21.15.20:6789/0 is 0.891053 seconds in the past; mons are probably laggy (or possibly clocks are too skewed)
2016-01-30T18:48:07.511 INFO:tasks.ceph.mon.c.smithi020.stderr:2016-01-31 02:48:07.490680 7f69cff44700 -1 mon.c@2(peon).paxos(paxos updating c 6778..7459) lease_expire from mon.0 172.21.15.20:6789/0 is 5.119297 seconds in the past; mons are probably laggy (or possibly clocks are too skewed)
2016-01-30T18:49:14.766 INFO:tasks.ceph.mon.c.smithi020.stderr:2016-01-31 02:49:14.763190 7f69cff44700 -1 mon.c@2(peon).paxos(paxos updating c 6778..7503) lease_expire from mon.0 172.21.15.20:6789/0 is 0.812451 seconds in the past; mons are probably laggy (or possibly clocks are too skewed)
2016-01-30T19:09:01.670 INFO:tasks.ceph.mon.c.smithi020.stderr:2016-01-31 03:09:01.610067 7f69cff44700 -1 mon.c@2(peon).paxos(paxos updating c 7782..8354) lease_expire from mon.0 172.21.15.20:6789/0 is 3.175789 seconds in the past; mons are probably laggy (or possibly clocks are too skewed)
2016-01-30T19:13:36.036 INFO:tasks.ceph.mon.c.smithi020.stderr:2016-01-31 03:13:36.033045 7f69cff44700 -1 mon.c@2(peon).paxos(paxos updating c 8033..8554) lease_expire from mon.0 172.21.15.20:6789/0 is 2.549258 seconds in the past; mons are probably laggy (or possibly clocks are too skewed)
2016-01-30T19:37:38.148 INFO:tasks.ceph.mon.c.smithi020.stderr:2016-01-31 03:37:38.134380 7f69cff44700 -1 mon.c@2(peon).paxos(paxos updating c 8786..9493) lease_expire from mon.0 172.21.15.20:6789/0 is 0.822134 seconds in the past; mons are probably laggy (or possibly clocks are too skewed)
2016-01-30T19:38:11.403 INFO:tasks.ceph.mon.c.smithi020.stderr:2016-01-31 03:38:11.400640 7f69cff44700 -1 mon.c@2(peon).paxos(paxos updating c 8786..9518) lease_expire from mon.0 172.21.15.20:6789/0 is 2.541366 seconds in the past; mons are probably laggy (or possibly clocks are too skewed)
2016-01-30T19:38:45.790 INFO:tasks.ceph.mon.c.smithi020.stderr:2016-01-31 03:38:45.786853 7f69cff44700 -1 mon.c@2(peon).paxos(paxos updating c 9037..9543) lease_expire from mon.0 172.21.15.20:6789/0 is 4.030940 seconds in the past; mons are probably laggy (or possibly clocks are too skewed)
2016-01-30T19:47:55.001 INFO:tasks.ceph.mon.c.smithi020.stderr:2016-01-31 03:47:54.989453 7f69cff44700 -1 mon.c@2(peon).paxos(paxos updating c 9288..9914) lease_expire from mon.0 172.21.15.20:6789/0 is 2.506908 seconds in the past; mons are probably laggy (or possibly clocks are too skewed)
2016-01-30T19:54:07.022 INFO:tasks.ceph.mon.c.smithi020.stderr:2016-01-31 03:54:07.020086 7f69cff44700 -1 mon.c@2(peon).paxos(paxos updating c 9539..10158) lease_expire from mon.0 172.21.15.20:6789/0 is 2.266840 seconds in the past; mons are probably laggy (or possibly clocks are too skewed)
2016-01-30T19:57:22.848 INFO:tasks.ceph.mon.c.smithi020.stderr:2016-01-31 03:57:22.845987 7f69cff44700 -1 mon.c@2(peon).paxos(paxos updating c 9539..10276) lease_expire from mon.0 172.21.15.20:6789/0 is 2.696128 seconds in the past; mons are probably laggy (or possibly clocks are too skewed)
2016-01-30T20:37:49.765 INFO:tasks.ceph.mon.c.smithi020.stderr:2016-01-31 04:37:49.750290 7f69cff44700 -1 mon.c@2(peon).paxos(paxos updating c 11296..11840) lease_expire from mon.0 172.21.15.20:6789/0 is 0.806196 seconds in the past; mons are probably laggy (or possibly clocks are too skewed)
2016-01-30T20:46:16.034 INFO:tasks.ceph.mon.c.smithi020.stderr:2016-01-31 04:46:16.032108 7f69cef42700 -1 mon.c@2(peon).data_health(104) reached critical levels of available space on local monitor storage -- shutdown!
2016-01-30T20:46:16.059 INFO:tasks.ceph.mon.c.smithi020.stderr:2016-01-31 04:46:16.057651 7f69cff44700 -1 mon.c@2(peon).paxos(paxos updating c 11547..12172) lease_expire from mon.0 172.21.15.20:6789/0 is 0.175679 seconds in the past; mons are probably laggy (or possibly clocks are too skewed)
2016-01-30T20:46:16.072 INFO:tasks.ceph.mon.a.smithi020.stderr:2016-01-31 04:46:16.070062 7f20811a9700 -1 mon.a@0(leader).data_health(104) reached critical levels of available space on local monitor storage -- shutdown!
2016-01-30T20:46:16.084 INFO:tasks.ceph.mon.a.smithi020.stderr:2016-01-31 04:46:16.082600 7f20809a8700 -1 mon.a@0(leader) e1 *** Got Signal Interrupt ***
2016-01-30T20:46:16.085 INFO:tasks.ceph.mon.c.smithi020.stderr:2016-01-31 04:46:16.082632 7f69ce741700 -1 mon.c@2(peon) e1 *** Got Signal Interrupt ***
2016-01-30T20:58:44.493 INFO:tasks.rados.rados.0.smithi020.stderr:problem writing to /var/log/ceph/ceph-client.0.21960.log: (28) No space left on device
2016-01-30T20:58:44.493 INFO:tasks.rados.rados.0.smithi020.stderr:problem writing to /var/log/ceph/ceph-client.0.21960.log: (28) No space left on device
2016-01-30T20:58:44.493 INFO:tasks.rados.rados.0.smithi020.stderr:problem writing to /var/log/ceph/ceph-client.0.21960.log: (28) No space left on device
2016-01-30T20:58:44.493 INFO:tasks.rados.rados.0.smithi020.stderr:problem writing to /var/log/ceph/ceph-client.0.21960.log: (28) No space left on device
2016-01-30T20:58:44.494 INFO:tasks.rados.rados.0.smithi020.stderr:problem writing to /var/log/ceph/ceph-client.0.21960.log: (28) No space left on device
2016-01-30T20:58:44.494 INFO:tasks.rados.rados.0.smithi020.stderr:problem writing to /var/log/ceph/ceph-client.0.21960.log: (28) No space left on device
2016-01-30T20:58:44.494 INFO:tasks.rados.rados.0.smithi020.stderr:problem writing to /var/log/ceph/ceph-client.0.21960.log: (28) No space left on device
2016-01-30T20:58:44.494 INFO:tasks.rados.rados.0.smithi020.stderr:problem writing to /var/log/ceph/ceph-client.0.21960.log: (28) No space left on device
2016-01-30T20:58:44.494 INFO:tasks.rados.rados.0.smithi020.stderr:problem writing to /var/log/ceph/ceph-client.0.21960.log: (28) No space left on device
2016-01-30T20:58:44.494 INFO:tasks.rados.rados.0.smithi020.stderr:problem writing to /var/log/ceph/ceph-client.0.21960.log: (28) No space left on device
2016-01-30T20:58:44.495 INFO:tasks.rados.rados.0.smithi020.stderr:problem writing to /var/log/ceph/ceph-client.0.21960.log: (28) No space left on device
2016-01-30T20:58:44.495 INFO:tasks.rados.rados.0.smithi020.stderr:problem writing to /var/log/ceph/ceph-client.0.21960.log: (28) No space left on device
2016-01-30T20:58:44.495 INFO:tasks.rados.rados.0.smithi020.stderr:problem writing to /var/log/ceph/ceph-client.0.21960.log: (28) No space left on device
2016-01-30T20:58:44.495 INFO:tasks.rados.rados.0.smithi020.stderr:problem writing to /var/log/ceph/ceph-client.0.21960.log: (28) No space left on device
2016-01-30T20:58:44.495 INFO:tasks.rados.rados.0.smithi020.stderr:problem writing to /var/log/ceph/ceph-client.0.21960.log: (28) No space left on device
2016-01-30T20:58:44.495 INFO:tasks.rados.rados.0.smithi020.stderr:problem writing to /var/log/ceph/ceph-client.0.21960.log: (28) No space left on device
2016-01-30T20:58:44.495 INFO:tasks.rados.rados.0.smithi020.stderr:problem writing to /var/log/ceph/ceph-client.0.21960.log: (28) No space left on device
I'm filing this against ceph because this seems like an excessive amount of logging.
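For illustration only, the general mitigation (report the log-write failure once instead of once per message) can be sketched in Python. Everything here is hypothetical; Ceph's logging is C++ and this is not its actual code:

```python
import sys

class QuietFallbackLog:
    """Write log messages to a stream; if writing fails, report the
    failure to stderr at most once instead of once per message.
    Hypothetical sketch of the fix direction, not Ceph code."""

    def __init__(self, stream):
        self.stream = stream   # destination we normally log to
        self.emitted = 0       # fallback errors actually sent to stderr

    def write(self, msg):
        try:
            self.stream.write(msg)
        except OSError as e:
            if self.emitted == 0:   # suppress every repeat after the first
                sys.stderr.write("problem writing to log: %s\n" % e)
                self.emitted += 1

class FullDisk:
    """Stand-in for a log file on a full disk: every write raises ENOSPC."""
    def write(self, msg):
        raise OSError(28, "No space left on device")

log = QuietFallbackLog(FullDisk())
for i in range(1000):
    log.write("entry %d\n" % i)   # only the first failure reaches stderr
```

With this shape, a full disk produces one stderr complaint rather than one per suppressed message, which is the behavior the description argues for.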
Updated by Sage Weil about 8 years ago
- Assignee set to Sage Weil
- Priority changed from Normal to Urgent
Updated by Sage Weil about 8 years ago
- Status changed from New to Fix Under Review
Updated by Sage Weil about 8 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Nathan Cutler about 8 years ago
- Copied to Backport #15081: hammer: When logging to a file fails, ceph logs excessively to stderr added
Updated by Nathan Cutler almost 8 years ago
- Status changed from Pending Backport to Resolved