Bug #6469

closed

teuthology locker is causing apache to use ~30% CPU

Added by Zack Cerza over 10 years ago. Updated over 10 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version:
% Done: 0%
Source: other
Tags:
Backport:
Regression:
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

While testing the new teuthology-report by importing all previous test runs on the teuthology machine, I noticed that an apache process was using a large amount of CPU. It turns out it's the teuthology lock server. I peeked into /var/log/teuthology-locker/error.log, which is currently 285MB and dates back to 10/23/12.

[Thu Oct 03 08:23:33 2013] [error] Exception AttributeError: AttributeError("'_DummyThread' object has no attribute '_Thread__block'",) in <module 'threading' from '/usr/lib/python2.7/threading.pyc'> ignored

There are 408789 of these messages and counting. I found the first occurrence on 6/27/13. Conveniently, the last meaningful commit to teuthology/locker/api.py was on that date:

https://github.com/ceph/teuthology/commit/c22b941ed982c23c7a9d3b2f263302ebec945553
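For anyone wanting to reproduce the count and the first-occurrence date, here is a minimal sketch. It assumes the error log uses the timestamp layout shown in the message above and that timestamps are in an English/C locale; the function names are hypothetical and not part of teuthology.

```python
import re
from datetime import datetime

# Marker copied from the error message quoted in this report.
MARKER = "'_DummyThread' object has no attribute '_Thread__block'"
# Assumed timestamp layout, e.g. [Thu Oct 03 08:23:33 2013]
TIMESTAMP = re.compile(r"^\[(\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2} \d{4})\]")


def summarize_errors(lines):
    """Return (count, first_seen) for lines that contain MARKER."""
    count = 0
    first_seen = None
    for line in lines:
        if MARKER not in line:
            continue
        count += 1
        if first_seen is None:
            match = TIMESTAMP.match(line)
            if match:
                first_seen = datetime.strptime(
                    match.group(1), "%a %b %d %H:%M:%S %Y"
                )
    return count, first_seen


def summarize_error_log(path):
    """Stream a (possibly very large) log file without loading it into memory."""
    with open(path, errors="replace") as handle:
        return summarize_errors(handle)
```

Something like `summarize_error_log("/var/log/teuthology-locker/error.log")` would then return the number of matching lines and the timestamp of the first one, reading the file line by line.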

Then again, /locker/lock is being hit ten times per second by a python/httplib2 process on localhost. Maybe that's the cause of the high load.

Oh... access.log is 16GB and also dates back to 10/23/12.
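To check the request rate against /locker/lock, a rough sketch like the following could tally hits per second and per client from the access log. It assumes Apache's "combined" (or "common") log format, which may not match the LogFormat actually configured on this machine; the function name is hypothetical.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format; the trailing referer/agent pair is
# optional so that "common" format lines also match (agent is then None).
LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" \d{3} \S+'
    r'(?: "[^"]*" "(?P<agent>[^"]*)")?'
)


def requests_per_second(lines, path_prefix="/locker/lock"):
    """Count hits per (timestamp, host, agent) for paths under path_prefix.

    Access log timestamps have one-second resolution, so grouping by the raw
    timestamp string yields a per-second count.
    """
    hits = Counter()
    for line in lines:
        match = LINE.match(line)
        if not match or not match.group("path").startswith(path_prefix):
            continue
        key = (match.group("time"), match.group("host"), match.group("agent"))
        hits[key] += 1
    return hits
```

Passing an open file handle for access.log keeps memory use proportional to the number of distinct seconds and clients rather than to the 16GB file size; `hits.most_common(5)` would show the busiest seconds and which client was responsible.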

Actions #1

Updated by Zack Cerza over 10 years ago

  • Severity changed from 3 - minor to 2 - major

Actions #2

Updated by Zack Cerza over 10 years ago

I modified the apache configuration to place logfiles back in the standard directory. That way logrotate will rotate them automatically. I did not delete the existing logs in /var/log/teuthology-locker.

Actions #4

Updated by Zack Cerza over 10 years ago

  • Status changed from New to In Progress
  • Assignee set to Zack Cerza
  • Target version set to v0.71

Actions #5

Updated by Zack Cerza over 10 years ago

  • Status changed from In Progress to Resolved

Since the bug in Python was supposed to be fixed, I upgraded packages on the machine (there were hundreds of updates available). The apache process is now using 5-10% CPU, which seems pretty okay considering the amount of data it's being asked for so many times per second. The error messages have stopped.
