Project

General

Profile

Actions

Bug #8166

closed

paddles: deadlock updating run.updated?

Added by Zack Cerza about 10 years ago. Updated almost 10 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

A teuthology job and worker died because of this completely unexpected error:

2014-04-18T04:31:03.475 INFO:teuthology.worker:Creating archive dir /var/lib/teuthworker/archive/gregf-2014-04-17_23:48:51-rados-wip-fast-dispatch-testing-basic-plana/199983
2014-04-18T04:31:03.475 INFO:teuthology.worker:Running job 199983
2014-04-18T04:31:03.488 INFO:teuthology.worker:Job archive: /var/lib/teuthworker/archive/gregf-2014-04-17_23:48:51-rados-wip-fast-dispatch-testing-basic-plana/199983
2014-04-18T04:31:03.489 INFO:teuthology.worker:Job PID: 3767
2014-04-18T04:31:03.489 INFO:teuthology.worker:Running with watchdog
2014-04-18T07:01:40.343 ERROR:teuthology.worker:run_with_watchdog had an unhandled exception
Traceback (most recent call last):
  File "/home/teuthworker/teuthology-master/teuthology/worker.py", line 303, in run_job
    run_with_watchdog(p, job_config)
  File "/home/teuthworker/teuthology-master/teuthology/worker.py", line 230, in run_with_watchdog
    report.try_push_job_info(job_info, dict(status='running'))
  File "/home/teuthworker/teuthology-master/teuthology/report.py", line 410, in try_push_job_info
    push_job_info(run_name, job_id, job_info)
  File "/home/teuthworker/teuthology-master/teuthology/report.py", line 373, in push_job_info
    reporter.report_job(run_name, job_id, job_info)
  File "/home/teuthworker/teuthology-master/teuthology/report.py", line 277, in report_job
    resp_json = response.json()
  File "/home/teuthworker/teuthology-master/virtualenv/local/lib/python2.7/site-packages/requests/models.py", line 741, in json                                 
  File "/usr/lib/python2.7/dist-packages/simplejson/__init__.py", line 413, in loads 
    return _default_decoder.decode(s) 
  File "/usr/lib/python2.7/dist-packages/simplejson/decoder.py", line 402, in decode 
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/dist-packages/simplejson/decoder.py", line 420, in raw_decode
    raise JSONDecodeError("No JSON object could be decoded", s, idx)
JSONDecodeError: No JSON object could be decoded: line 1 column 0 (char 0)

I looked in paddles' logs to see what was up:

[huge traceback]
DBAPIError: (TransactionRollbackError) deadlock detected
DETAIL:  Process 3874 waits for ShareLock on transaction 500780; blocked by process 3896.
Process 3896 waits for ExclusiveLock on tuple (184,19) of relation 16397 of database 16385; blocked by process 3874.
HINT:  See server log for query details.
 'UPDATE runs SET updated=%(updated)s WHERE runs.id = %(runs_id)s' {'runs_id': 7031, 'updated': datetime.datetime(2014, 4, 18, 14, 1, 36, 556867)}

And now in the PostgreSQL log:

2014-04-18 07:01:37 PDT ERROR:  deadlock detected
2014-04-18 07:01:37 PDT DETAIL:  Process 3874 waits for ShareLock on transaction 500780; blocked by process 3896.
        Process 3896 waits for ExclusiveLock on tuple (184,19) of relation 16397 of database 16385; blocked by process 3874.
        Process 3874: UPDATE runs SET updated='2014-04-18T14:01:36.556867'::timestamp WHERE runs.id = 7031
        Process 3896: UPDATE runs SET updated='2014-04-18T14:01:36.558376'::timestamp WHERE runs.id = 7031
2014-04-18 07:01:37 PDT HINT:  See server log for query details.
2014-04-18 07:01:37 PDT STATEMENT:  UPDATE runs SET updated='2014-04-18T14:01:36.556867'::timestamp WHERE runs.id = 7031
2014-04-18 07:01:37 PDT LOG:  duration: 999.330 ms  statement: UPDATE runs SET updated='2014-04-18T14:01:36.558376'::timestamp WHERE runs.id = 7031
2014-04-18 07:01:37 PDT LOG:  duration: 895.216 ms  statement: UPDATE runs SET updated='2014-04-18T14:01:36.666787'::timestamp WHERE runs.id = 7031

Actions #2

Updated by Zack Cerza about 10 years ago

  • Status changed from New to In Progress
  • Assignee set to Zack Cerza
Actions #3

Updated by Zack Cerza about 10 years ago

I thought the fix for #8190 would fix this, but it looks like we're not totally out of the woods yet:

Apr 24 01:27:20 pulpito postgres[6112]: [1159-1] ERROR:  deadlock detected
Apr 24 01:27:20 pulpito postgres[6112]: [1159-2] DETAIL:  Process 6112 waits for ShareLock on transaction 828930; blocked by process 6103.
Apr 24 01:27:20 pulpito postgres[6112]: [1159-3] #011Process 6103 waits for ExclusiveLock on tuple (46,46) of relation 16397 of database 16385; blocked by process 6112.
Apr 24 01:27:20 pulpito postgres[6112]: [1159-4] #011Process 6112: UPDATE runs SET updated='2014-04-24T08:27:19.219395'::timestamp WHERE runs.id = 7146
Apr 24 01:27:20 pulpito postgres[6112]: [1159-5] #011Process 6103: UPDATE runs SET updated='2014-04-24T08:27:19.223676'::timestamp WHERE runs.id = 7146
Apr 24 01:27:20 pulpito postgres[6112]: [1159-6] HINT:  See server log for query details.
Apr 24 01:27:20 pulpito postgres[6112]: [1159-7] STATEMENT:  UPDATE runs SET updated='2014-04-24T08:27:19.219395'::timestamp WHERE runs.id = 7146

Actions #4

Updated by Zack Cerza almost 10 years ago

  • Status changed from In Progress to 12

This is happening less now, but still happening. I am not actively working on it, so changing status.

Actions #5

Updated by Zack Cerza almost 10 years ago

Saw it happen a few times this morning. :-/

Actions #6

Updated by Zack Cerza almost 10 years ago

  • Priority changed from High to Normal

Dropping priority since this happens very infrequently now.

Actions #7

Updated by Zack Cerza almost 10 years ago

  • Status changed from 12 to 7
Actions #8

Updated by Zack Cerza almost 10 years ago

Hasn't happened since 5/21; I'll leave this open for another one or two days

Actions #9

Updated by Zack Cerza almost 10 years ago

  • Status changed from 7 to Resolved

Still hasn't happened. Marking resolved.

Actions

Also available in: Atom PDF