Project

General

Profile

Bug #52101

paddles: race condition in Job.set_or_update()

Added by Ronen Friedman over 1 year ago. Updated over 1 year ago.

Status:
New
Priority:
Urgent
% Done:

0%

Source:
Tags:
Backport:
Regression:
Yes
Severity:
1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

Frequent error messages for some of the jobs submitted in a run. For example:

Job scheduled with name rfriedma-2021-08-09_05:54:02-rados:thrash-wip-ronenf-last-update-distro-basic-smithi and ID 6327602
2021-08-09 06:01:23,207.207 ERROR:teuthology.report:POST to http://paddles.front.sepia.ceph.com:80/runs/rfriedma-2021-08-09_05:54:02-rados:thrash-wip-ronenf-last-update-distro-basic-smithi/jobs/ failed with status 500: <html>
  <head>
    <title>Internal Server Error</title>
  </head>
  <body>
    <h1><p>Internal Server Error</p></h1>

  </body>
</html>

2021-08-09 06:01:23,207.207 ERROR:teuthology.report:Could not report results to http://paddles.front.sepia.ceph.com:80
Traceback (most recent call last):
  File "/home/rfriedma/src/teuthology/teuthology/report.py", line 498, in try_push_job_info
    push_job_info(run_name, job_id, job_info)
  File "/home/rfriedma/src/teuthology/teuthology/report.py", line 463, in push_job_info
    reporter.report_job(run_name, job_id, job_info)
  File "/home/rfriedma/src/teuthology/teuthology/report.py", line 320, in report_job
    response.raise_for_status()
  File "/home/rfriedma/src/teuthology/virtualenv/lib/python3.6/site-packages/requests/models.py", line 940, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http://paddles.front.sepia.ceph.com:80/runs/rfriedma-2021-08-09_05:54:02-rados:thrash-wip-ronenf-last-update-distro-basic-smithi/jobs/


Related issues

Related to paddles - Bug #49893: "500 Server Error" during runs scheduling Duplicate
Related to Infrastructure - Bug #49864: Passed jobs classified as dead due to "Could not report results to http://paddles.front.sepia.ceph.com/" Resolved

History

#1 Updated by Josh Durgin over 1 year ago

  • Project changed from Ceph to paddles

#2 Updated by Zack Cerza over 1 year ago

  • Related to Bug #49893: "500 Server Error" during runs scheduling added

#3 Updated by Zack Cerza over 1 year ago

  • Subject changed from teuthology: POST to paddles fails with status 500 to paddles: race condition in Job.set_or_update()
  • Priority changed from Normal to Urgent
  • Regression changed from No to Yes
  • Severity changed from 3 - minor to 1 - critical

I found the traceback in the paddles logfile:

2021-08-09 06:01:23,193 INFO  [paddles.controllers.jobs] Creating job: rfriedma-2021-08-09_05:54:02-rados:thrash-wip-ronenf-last-update-distro-basic-smithi/6327602
[2021-08-09 06:01:23 +0000] [29621] [ERROR] Error handling request
Traceback (most recent call last):
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/gunicorn/workers/sync.py", line 130, in handle
    self.handle_request(listener, req, client, addr)
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/gunicorn/workers/sync.py", line 171, in handle_request
    respiter = self.wsgi(environ, resp.start_response)
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/pecan/middleware/recursive.py", line 56, in __call__
    return self.application(environ, start_response)
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/pecan/core.py", line 810, in __call__
    return super(Pecan, self).__call__(environ, start_response)
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/pecan/core.py", line 659, in __call__
    self.invoke_controller(controller, args, kwargs, state)
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/pecan/core.py", line 559, in invoke_controller
    result = controller(*args, **kwargs)
  File "/home/ubuntu/paddles/paddles/controllers/jobs.py", line 130, in index_post
    self.job = Job(data, self.run)
  File "<string>", line 4, in __init__
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/orm/state.py", line 306, in _initialize_instance
    manager.dispatch.init_failure(self, args, kwargs)
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/orm/state.py", line 303, in _initialize_instance
    return manager.original_init(*mixed[1:], **kwargs)
  File "/home/ubuntu/paddles/paddles/models/jobs.py", line 114, in __init__
    self.set_or_update(json_data)
  File "/home/ubuntu/paddles/paddles/models/jobs.py", line 150, in set_or_update
    self.run.set_status()
  File "/home/ubuntu/paddles/paddles/models/runs.py", line 243, in set_status
    results = results or self.get_results()
  File "/home/ubuntu/paddles/paddles/models/runs.py", line 215, in get_results
    jobs_status = [value[0] for value in self.jobs.values(Job.status)]
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 1024, in values
    return iter(q)
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2515, in __iter__
    self.session._autoflush()
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 1292, in _autoflush
    util.raise_from_cause(e)
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 199, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb)
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 1282, in _autoflush
    self.flush()
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2004, in flush
    self._flush(objects)
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2122, in _flush
    transaction.rollback(_capture_exception=True)
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2086, in _flush
    flush_context.execute()
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 373, in execute
    rec.execute(self)
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 532, in execute
    uow
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 170, in save_obj
    mapper, table, update)
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 692, in _emit_update_statements
    execute(statement, multiparams)
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 914, in execute
    return meth(self, multiparams, params)
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 323, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1010, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1146, in _execute_context
    context)
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1341, in _handle_dbapi_exception
    exc_info
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 199, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb)
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
    context)
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute
    cursor.execute(statement, parameters)
OperationalError: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely) (psycopg2.extensions.TransactionRollbackError) could not serialize access due to concurrent update
 [SQL: 'UPDATE runs SET updated=%(updated)s WHERE runs.id = %(runs_id)s'] [parameters: {'runs_id': 105562, 'updated': datetime.datetime(2021, 8, 9, 6, 1, 23, 197141)}]

So, creating the job in paddles while scheduling failed. It was actually created later, when it was popped off the beanstalkd queue:

paddles.out.log.4.gz:2021-08-09 06:07:38,611 INFO  [paddles.controllers.jobs] Creating job: rfriedma-2021-08-09_05:54:02-rados:thrash-wip-ronenf-last-update-distro-basic-smithi/6327602
paddles.out.log.4.gz:2021-08-09 06:07:38,673 INFO  [paddles.controllers.jobs] Job rfriedma-2021-08-09_05:54:02-rados:thrash-wip-ronenf-last-update-distro-basic-smithi/6327602 status changed from running to waiting

#4 Updated by Zack Cerza over 1 year ago

  • Related to Bug #49864: Passed jobs classified as dead due to "Could not report results to http://paddles.front.sepia.ceph.com/" added

#5 Updated by David Galloway over 1 year ago

  • Assignee set to Aishwarya Mathuria

Also available in: Atom PDF