Actions
Bug #52101
openpaddles: race condition in Job.set_or_update()
% Done:
0%
Source:
Tags:
Backport:
Regression:
Yes
Severity:
1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):
Description
Frequent error messages for some of the jobs submitted in a run. For example:
Job scheduled with name rfriedma-2021-08-09_05:54:02-rados:thrash-wip-ronenf-last-update-distro-basic-smithi and ID 6327602 2021-08-09 06:01:23,207.207 ERROR:teuthology.report:POST to http://paddles.front.sepia.ceph.com:80/runs/rfriedma-2021-08-09_05:54:02-rados:thrash-wip-ronenf-last-update-distro-basic-smithi/jobs/ failed with status 500: <html> <head> <title>Internal Server Error</title> </head> <body> <h1><p>Internal Server Error</p></h1> </body> </html> 2021-08-09 06:01:23,207.207 ERROR:teuthology.report:Could not report results to http://paddles.front.sepia.ceph.com:80 Traceback (most recent call last): File "/home/rfriedma/src/teuthology/teuthology/report.py", line 498, in try_push_job_info push_job_info(run_name, job_id, job_info) File "/home/rfriedma/src/teuthology/teuthology/report.py", line 463, in push_job_info reporter.report_job(run_name, job_id, job_info) File "/home/rfriedma/src/teuthology/teuthology/report.py", line 320, in report_job response.raise_for_status() File "/home/rfriedma/src/teuthology/virtualenv/lib/python3.6/site-packages/requests/models.py", line 940, in raise_for_status raise HTTPError(http_error_msg, response=self) requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http://paddles.front.sepia.ceph.com:80/runs/rfriedma-2021-08-09_05:54:02-rados:thrash-wip-ronenf-last-update-distro-basic-smithi/jobs/
Updated by Zack Cerza over 2 years ago
- Related to Bug #49893: "500 Server Error" during runs scheduling added
Updated by Zack Cerza over 2 years ago
- Subject changed from teuthology: POST to paddles fails with status 500 to paddles: race condition in Job.set_or_update()
- Priority changed from Normal to Urgent
- Regression changed from No to Yes
- Severity changed from 3 - minor to 1 - critical
I found the traceback in the paddles logfile:
2021-08-09 06:01:23,193 INFO [paddles.controllers.jobs] Creating job: rfriedma-2021-08-09_05:54:02-rados:thrash-wip-ronenf-last-update-distro-basic-smithi/6327602 [2021-08-09 06:01:23 +0000] [29621] [ERROR] Error handling request Traceback (most recent call last): File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/gunicorn/workers/sync.py", line 130, in handle self.handle_request(listener, req, client, addr) File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/gunicorn/workers/sync.py", line 171, in handle_request respiter = self.wsgi(environ, resp.start_response) File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/pecan/middleware/recursive.py", line 56, in __call__ return self.application(environ, start_response) File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/pecan/core.py", line 810, in __call__ return super(Pecan, self).__call__(environ, start_response) File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/pecan/core.py", line 659, in __call__ self.invoke_controller(controller, args, kwargs, state) File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/pecan/core.py", line 559, in invoke_controller result = controller(*args, **kwargs) File "/home/ubuntu/paddles/paddles/controllers/jobs.py", line 130, in index_post self.job = Job(data, self.run) File "<string>", line 4, in __init__ File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/orm/state.py", line 306, in _initialize_instance manager.dispatch.init_failure(self, args, kwargs) File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__ compat.reraise(exc_type, exc_value, exc_tb) File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/orm/state.py", line 303, in _initialize_instance return manager.original_init(*mixed[1:], **kwargs) File "/home/ubuntu/paddles/paddles/models/jobs.py", line 114, in __init__ self.set_or_update(json_data) File "/home/ubuntu/paddles/paddles/models/jobs.py", line 150, in set_or_update self.run.set_status() File "/home/ubuntu/paddles/paddles/models/runs.py", line 243, in set_status results = results or self.get_results() File "/home/ubuntu/paddles/paddles/models/runs.py", line 215, in get_results jobs_status = [value[0] for value in self.jobs.values(Job.status)] File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 1024, in values return iter(q) File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2515, in __iter__ self.session._autoflush() File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 1292, in _autoflush util.raise_from_cause(e) File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 199, in raise_from_cause reraise(type(exception), exception, tb=exc_tb) File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 1282, in _autoflush self.flush() File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2004, in flush self._flush(objects) File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2122, in _flush transaction.rollback(_capture_exception=True) File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__ compat.reraise(exc_type, exc_value, exc_tb) File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2086, in _flush flush_context.execute() File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 373, in execute rec.execute(self) File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 532, in execute uow File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 170, in save_obj mapper, table, update) File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 692, in _emit_update_statements execute(statement, multiparams) File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 914, in execute return meth(self, multiparams, params) File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 323, in _execute_on_connection return connection._execute_clauseelement(self, multiparams, params) File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1010, in _execute_clauseelement compiled_sql, distilled_params File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1146, in _execute_context context) File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1341, in _handle_dbapi_exception exc_info File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 199, in raise_from_cause reraise(type(exception), exception, tb=exc_tb) File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context context) File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute cursor.execute(statement, parameters) OperationalError: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely) (psycopg2.extensions.TransactionRollbackError) could not serialize access due to concurrent update [SQL: 'UPDATE runs SET updated=%(updated)s WHERE runs.id = %(runs_id)s'] [parameters: {'runs_id': 105562, 'updated': datetime.datetime(2021, 8, 9, 6, 1, 23, 197141)}]
So, creating the job in paddles while scheduling failed. It was actually created later, when it was popped off the beanstalkd queue:
paddles.out.log.4.gz:2021-08-09 06:07:38,611 INFO [paddles.controllers.jobs] Creating job: rfriedma-2021-08-09_05:54:02-rados:thrash-wip-ronenf-last-update-distro-basic-smithi/6327602 paddles.out.log.4.gz:2021-08-09 06:07:38,673 INFO [paddles.controllers.jobs] Job rfriedma-2021-08-09_05:54:02-rados:thrash-wip-ronenf-last-update-distro-basic-smithi/6327602 status changed from running to waiting
Updated by Zack Cerza over 2 years ago
- Related to Bug #49864: Passed jobs classified as dead due to "Could not report results to http://paddles.front.sepia.ceph.com/" added
Actions