Bug #7575

closed

osd/ReplicatedPG.cc: 10600: FAILED assert(r >= 0): hit_set_persist() races with agent_load_hit_sets()

Added by David Zafman about 10 years ago. Updated about 10 years ago.

Status: Resolved
Priority: Urgent
Assignee: David Zafman
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2014-03-01 00:13:39.895495 7efefcde2700 10 osd.4 pg_epoch: 76 pg[4.1( v 74'1454 lc 17'513 (0'0,74'1454] local-les=76 n=49 ec=7 les/c 76/70 75/75/75) [4,2] r=0 lpr=75 pi=68-74/2 rops=5 crt=74'1454 lcod 0'0 mlcod 0'0 active+recovering m=49] hit_set_persist
2014-03-01 00:13:39.895612 7efefcde2700 20 osd.4 pg_epoch: 76 pg[4.1( v 74'1454 lc 17'513 (0'0,74'1454] local-les=76 n=49 ec=7 les/c 76/70 75/75/75) [4,2] r=0 lpr=75 pi=68-74/2 rops=5 crt=74'1454 lcod 0'0 mlcod 0'0 active+recovering m=49] get_hit_set_archive_object 1/hit_set_4.1_archive_2014-03-01 00:13:38.002740_2014-03-01 00:13:39.895570/head/.ceph-internal/4
2014-03-01 00:13:39.895522 7efefc5e1700 10 osd.4 76 start_recovery_op pg[3.3( v 74'258 lc 0'0 (0'0,74'258] local-les=76 n=136 ec=6 les/c 76/70 75/75/75) [4,0] r=0 lpr=75 pi=6-74/6 rops=2 crt=74'258 mlcod 0'0 active+recovering m=136] 92ad546f/plana622773-60/head//3 (14/15 rops)
2014-03-01 00:13:39.895692 7efefcde2700 20 osd.4 pg_epoch: 76 pg[4.1( v 74'1454 lc 17'513 (0'0,74'1454] local-les=76 n=49 ec=7 les/c 76/70 75/75/75) [4,2] r=0 lpr=75 pi=68-74/2 rops=5 crt=74'1454 lcod 0'0 mlcod 0'0 active+recovering m=49] hit_set_persist archive 1/hit_set_4.1_archive_2014-03-01 00:13:38.002740_2014-03-01 00:13:39.895570/head/.ceph-internal/4

2014-03-01 00:13:39.941560 7efef85d9700 10 osd.4 pg_epoch: 76 pg[4.1( v 76'1456 lc 17'513 (0'0,76'1456] local-les=76 n=51 ec=7 les/c 76/70 75/75/75) [4,2] r=0 lpr=75 pi=68-74/2 luod=74'1454 rops=6 crt=74'1454 lcod 0'0 mlcod 0'0 active+recovering m=49] agent_load_hit_sets
2014-03-01 00:13:39.941658 7efef85d9700 10 osd.4 pg_epoch: 76 pg[4.1( v 76'1456 lc 17'513 (0'0,76'1456] local-les=76 n=51 ec=7 les/c 76/70 75/75/75) [4,2] r=0 lpr=75 pi=68-74/2 luod=74'1454 rops=6 crt=74'1454 lcod 0'0 mlcod 0'0 active+recovering m=49] agent_load_hit_sets loading 2014-03-01 00:13:38.002740-2014-03-01 00:13:39.895570
2014-03-01 00:13:39.941752 7efef85d9700 20 osd.4 pg_epoch: 76 pg[4.1( v 76'1456 lc 17'513 (0'0,76'1456] local-les=76 n=51 ec=7 les/c 76/70 75/75/75) [4,2] r=0 lpr=75 pi=68-74/2 luod=74'1454 rops=6 crt=74'1454 lcod 0'0 mlcod 0'0 active+recovering m=49] get_hit_set_archive_object 1/hit_set_4.1_archive_2014-03-01 00:13:38.002740_2014-03-01 00:13:39.895570/head/.ceph-internal/4
2014-03-01 00:13:39.941832 7efef85d9700 15 filestore(/var/lib/ceph/osd/ceph-4) read 4.1_head/1/hit_set_4.1_archive_2014-03-01 00:13:38.002740_2014-03-01 00:13:39.895570/head/.ceph-internal/4 0~0
2014-03-01 00:13:39.941663 7efefcde2700 20 osd.4 pg_epoch: 76 pg[3.3( v 74'258 lc 0'0 (0'0,74'258] local-les=76 n=136 ec=6 les/c 76/70 75/75/75) [4,0] r=0 lpr=75 pi=6-74/6 rops=10 crt=74'258 mlcod 0'0 active+recovering m=136] op_has_sufficient_caps pool=3 (base ) owner=0 need_read_cap=1 need_write_cap=0 need_class_read_cap=0 need_class_write_cap=0 -> yes
2014-03-01 00:13:39.941873 7efef85d9700 10 filestore(/var/lib/ceph/osd/ceph-4) error opening file /var/lib/ceph/osd/ceph-4/current/4.1_head/hit\uset\u4.1\uarchive\u2014-03-01 00:13:38.002740\u2014-03-01 00:13:39.895570__head_00000001_.ceph-internal_4 with flags=2: (2) No such file or directory
2014-03-01 00:13:39.941887 7efef85d9700 10 filestore(/var/lib/ceph/osd/ceph-4) FileStore::read(4.1_head/1/hit_set_4.1_archive_2014-03-01 00:13:38.002740_2014-03-01 00:13:39.895570/head/.ceph-internal/4) open error: (2) No such file or directory

  • The archive is created too late: the write below lands about 7 ms after agent_load_hit_sets() failed to read the same object.
    2014-03-01 00:13:39.948664 7eff07725700 15 filestore(/var/lib/ceph/osd/ceph-4) write 4.1_head/1/hit_set_4.1_archive_2014-03-01 00:13:38.002740_2014-03-01 00:13:39.895570/head/.ceph-internal/4 0~393 *********

2014-03-01 00:13:39.955756 7efef85d9700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::agent_load_hit_sets()' thread 7efef85d9700 time 2014-03-01 00:13:39.941895
osd/ReplicatedPG.cc: 10600: FAILED assert(r >= 0)


Related issues: 1 (0 open, 1 closed)

Has duplicate Ceph - Bug #7496: agent hit set crash: osd/ReplicatedPG.cc: 10579: FAILED assert(r >= 0) (Duplicate, 02/20/2014)

#1

Updated by David Zafman about 10 years ago

The log came from dzafman-2014-02-28_12:09:58-rados:thrash-wip-7458-testing-basic-plana/111634.

#2

Updated by Sage Weil about 10 years ago

  • Status changed from New to 12
  • Assignee set to Sage Weil
  • Priority changed from Normal to Urgent
#3

Updated by David Zafman about 10 years ago

  • Status changed from 12 to 7
  • Assignee changed from Sage Weil to David Zafman
#4

Updated by David Zafman about 10 years ago

  • Status changed from 7 to Resolved

Fixed by commit 135c27ec74be352416d06a9d0ad78e63cf477433.
