Bug #24227

closed

jewel->luminous: osd/PrimaryLogPG.cc: 358: FAILED assert(p != recovery_info.ss.clone_snaps.end())

Added by Siegfried Hoellrigl almost 6 years ago. Updated almost 3 years ago.

Status:
Won't Fix
Priority:
High
Assignee:
-
Category:
OSD
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi !

We have done an (almost) successful upgrade to Ceph Luminous 12.2.5.

The cluster becomes almost healthy, but shortly before it gets there, one OSD (osd.130) crashes.
We have already identified the faulty placement group.

Surprisingly, this PG is up and running on 2 other servers without any problem.

With "ceph pg dump" we can see that the last column (SNAPTRIMQ_LEN) of PG 5.9b is "27826", and not zero like on the other PGs...

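As a rough aid for spotting PGs like 5.9b, here is a minimal Python sketch that filters PG stats for a non-empty snap trim queue. It assumes the stats are already loaded as a list of dicts with "pgid" and "snaptrimq_len" keys ("snaptrimq_len" is the field name that "ceph pg query" prints). Where that list sits inside the JSON of "ceph pg dump --format json" differs between releases, so the extraction step is left out. The function name and the sample data are hypothetical, apart from the 5.9b value quoted above.

```python
import json


def pgs_with_pending_snaptrim(pg_stats):
    """Return (pgid, snaptrimq_len) pairs for PGs whose snap trim queue is not empty."""
    pending = []
    for stat in pg_stats:
        # Treat a missing field as an empty queue.
        qlen = int(stat.get("snaptrimq_len", 0))
        if qlen > 0:
            pending.append((stat["pgid"], qlen))
    # Longest queue first, so the most suspicious PG is at the top.
    return sorted(pending, key=lambda item: item[1], reverse=True)


# Hypothetical sample; only the 5.9b value comes from this report.
sample = json.loads(
    '[{"pgid": "5.9a", "snaptrimq_len": 0},'
    ' {"pgid": "5.9b", "snaptrimq_len": 27826}]'
)
print(pgs_with_pending_snaptrim(sample))
```

A non-zero queue only shows that trimming has not finished for that PG; it does not by itself explain the crash.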
To solve the problem, we already purged all snapshots of the rbd pool (ID=5).
Then we increased verbosity, cache size and throttle_bytes in ceph.conf like this:
[osd.130]
debug bluestore = 20
debug osd = 20
bluestore_throttle_bytes = 0
bluestore_throttle_deferred_bytes = 0
debug throttle = 10
bluestore_cache_size_hdd = 10737418240
bluestore_cache_size_ssd = 10737418240

Then we deleted the PG from the crashing OSD :
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-130/ --pgid 5.9b --op remove --force

But the OSD crashes again.

/build/ceph-12.2.5/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != recovery_info.ss.clone_snaps.end())

What can we do to bring the PG up on all 3 OSDs ?


Files

OSD130LOG.zip (190 KB) Siegfried Hoellrigl, 05/22/2018 01:25 PM
pgdump.zip (628 KB) Siegfried Hoellrigl, 05/22/2018 01:25 PM
Actions #1

Updated by Sage Weil almost 6 years ago

  • Status changed from New to 12
  • Priority changed from Normal to High

The second crash appears to be in the recovery path. Are all of the other OSDs upgraded at this point?

Actions #2

Updated by Siegfried Hoellrigl almost 6 years ago

Yes, all on 12.2.5, but most of them are still filestore (osd.130 is already bluestore).

root@ceph-m-03:~# ceph versions
{
"mon": {
"ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)": 3
},
"mgr": {
"ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)": 3
},
"osd": {
"ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)": 191
},
"mds": {
"ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)": 3
},
"rgw": {
"ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)": 3
},
"rgw-nfs": {
"ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)": 1
},
"overall": {
"ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)": 204
}
}
############################
root@ceph-m-03:~# ceph features
{
"mon": {
"group": {
"features": "0x1ffddff8eea4fffb",
"release": "luminous",
"num": 3
}
},
"mds": {
"group": {
"features": "0x1ffddff8eea4fffb",
"release": "luminous",
"num": 3
}
},
"osd": {
"group": {
"features": "0x1ffddff8eea4fffb",
"release": "luminous",
"num": 191
}
},
"client": {
"group": {
"features": "0x106b84a842a42",
"release": "hammer",
"num": 51
},
"group": {
"features": "0x107b84a842aca",
"release": "hammer",
"num": 77
},
"group": {
"features": "0xc1dff8eea4fffb",
"release": "hammer",
"num": 18
},
"group": {
"features": "0x7fddff8ee84bffb",
"release": "jewel",
"num": 16
},
"group": {
"features": "0x1ffddff8eea4fffb",
"release": "luminous",
"num": 196
}
}
}

Actions #3

Updated by Sage Weil almost 6 years ago

  • Subject changed from Upgrade from Jewel 10.2.10 to Luminous 12.2.5 has broken snaptrim at a placement group. OSD Crashes shortly before Cluster becomes healthy to jewel->luminous: osd/PrimaryLogPG.cc: 358: FAILED assert(p != recovery_info.ss.clone_snaps.end())
Actions #4

Updated by Siegfried Hoellrigl almost 6 years ago

Hi !

Is there any workaround to fix the problem ?

Actions #5

Updated by Siegfried Hoellrigl almost 6 years ago

Hi !
We have now recompiled ceph-12.2.5 with the patch from https://tracker.ceph.com/issues/24396
and started osd.130 with the new binary.

-> Cluster became healthy.

Is there a roadmap for when this patch will be in an official stable release of Ceph?

Actions #6

Updated by Siegfried Hoellrigl almost 6 years ago

Hi !

After a while (the cluster was HEALTH_OK), scrubbing (or deep-scrubbing?) started.
Now we have a scrub error.

The strange thing is that "data_digest" on the primary and on one other OSD is the same
("0xbac4384a"), while on the new OSD with the patch applied it is different.

How can we further troubleshoot/fix this ?

Is this related somehow to https://tracker.ceph.com/issues/21388 ?

###################################################

health: HEALTH_ERR
noout flag(s) set
2 scrub errors
Possible data damage: 1 pg inconsistent

###################################################

ceph health detail
HEALTH_ERR noout flag(s) set; 2 scrub errors; Possible data damage: 1 pg inconsistent
OSDMAP_FLAGS noout flag(s) set
OSD_SCRUB_ERRORS 2 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
pg 5.9b is active+clean+inconsistent, acting [19,166,130]

###################################################

rados list-inconsistent-obj 5.9b --format=json-pretty
{
"epoch": 508740,
"inconsistents": [ {
"object": {
"name": "rbd_data.112913b238e1f29.0000000000000ba3",
"nspace": "",
"locator": "",
"snap": 355530,
"version": 18839265
},
"errors": [
"data_digest_mismatch"
],
"union_shard_errors": [],
"selected_object_info": {
"oid": {
"oid": "rbd_data.112913b238e1f29.0000000000000ba3",
"key": "",
"snapid": 355530,
"hash": 3609312923,
"max": 0,
"pool": 5,
"namespace": ""
},
"version": "472938'18842085",
"prior_version": "472861'18839265",
"last_reqid": "client.18280458.1:684915057",
"user_version": 18839265,
"size": 4194304,
"mtime": "2018-04-24 08:10:39.767505",
"local_mtime": "2018-04-24 08:10:39.773385",
"lost": 0,
"flags": [
"dirty",
"omap_digest"
],
"legacy_snaps": [
355530
],
"truncate_seq": 0,
"truncate_size": 0,
"data_digest": "0xffffffff",
"omap_digest": "0xffffffff",
"expected_object_size": 0,
"expected_write_size": 0,
"alloc_hint_flags": 0,
"manifest": {
"type": 0,
"redirect_target": {
"oid": "",
"key": "",
"snapid": 0,
"hash": 0,
"max": 0,
"pool": -9223372036854775808,
"namespace": ""
}
},
"watchers": {}
},
"shards": [ {
"osd": 19,
"primary": true,
"errors": [],
"size": 4194304,
"omap_digest": "0xffffffff",
"data_digest": "0xbac4384a"
}, {
"osd": 130,
"primary": false,
"errors": [],
"size": 4194304,
"omap_digest": "0xffffffff",
"data_digest": "0x43d61c5d"
}, {
"osd": 166,
"primary": false,
"errors": [],
"size": 4194304,
"omap_digest": "0xffffffff",
"data_digest": "0xbac4384a"
}
]
}
]
}
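
The digest comparison described above can be sketched in Python. This is a minimal example, assuming the JSON layout shown in this output ("inconsistents" -> "shards" -> "osd" / "data_digest"); the function name is hypothetical and the sample is trimmed to the fields it needs. It reports the shards whose data_digest differs from the most common value for the object.

```python
from collections import Counter


def digest_outliers(report):
    """Return (object name, osd, data_digest) for shards that disagree with the majority."""
    outliers = []
    for item in report.get("inconsistents", []):
        shards = item.get("shards", [])
        counts = Counter(shard["data_digest"] for shard in shards)
        if len(counts) < 2:
            continue  # all shards agree on the data digest
        # On a tie, most_common() returns the digest that was seen first.
        majority_digest, _ = counts.most_common(1)[0]
        for shard in shards:
            if shard["data_digest"] != majority_digest:
                outliers.append(
                    (item["object"]["name"], shard["osd"], shard["data_digest"])
                )
    return outliers


# Trimmed copy of the output above (only the fields used here).
report = {
    "inconsistents": [
        {
            "object": {"name": "rbd_data.112913b238e1f29.0000000000000ba3"},
            "shards": [
                {"osd": 19, "data_digest": "0xbac4384a"},
                {"osd": 130, "data_digest": "0x43d61c5d"},
                {"osd": 166, "data_digest": "0xbac4384a"},
            ],
        }
    ]
}
print(digest_outliers(report))
```

Agreement between two shards only marks osd.130 as the odd one out; it does not prove which copy holds the correct data.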

#################################

ceph pg 5.9b query
{
"state": "active+clean+inconsistent",
"snap_trimq": "[]",
"snap_trimq_len": 0,
"epoch": 508820,
"up": [
19,
166,
130
],
"acting": [
19,
166,
130
],
"actingbackfill": [
"19",
"130",
"166"
],
"info": {
"pgid": "5.9b",
"last_update": "508820'22438684",
"last_complete": "508820'22438684",
"log_tail": "508820'22437175",
"last_user_version": 22438684,
"last_backfill": "MAX",
"last_backfill_bitwise": 1,
"purged_snaps": [ {
"start": "1",
"length": "5d75f"
}
],
"history": {
"epoch_created": 874,
"epoch_pool_created": 874,
"last_epoch_started": 508741,
"last_interval_started": 508740,
"last_epoch_clean": 508746,
"last_interval_clean": 508740,
"last_epoch_split": 0,
"last_epoch_marked_full": 0,
"same_up_since": 508728,
"same_interval_since": 508740,
"same_primary_since": 508013,
"last_scrub": "508820'22437015",
"last_scrub_stamp": "2018-06-11 12:06:16.997511",
"last_deep_scrub": "508820'22437015",
"last_deep_scrub_stamp": "2018-06-11 12:06:16.997511",
"last_clean_scrub_stamp": "2018-04-24 08:33:08.497247"
},
"stats": {
"version": "508820'22438684",
"reported_seq": "23478889",
"reported_epoch": "508820",
"state": "active+clean+inconsistent",
"last_fresh": "2018-06-11 12:40:45.659235",
"last_change": "2018-06-11 12:06:16.997555",
"last_active": "2018-06-11 12:40:45.659235",
"last_peered": "2018-06-11 12:40:45.659235",
"last_clean": "2018-06-11 12:40:45.659235",
"last_became_active": "2018-06-07 16:25:29.165827",
"last_became_peered": "2018-06-07 16:25:29.165827",
"last_unstale": "2018-06-11 12:40:45.659235",
"last_undegraded": "2018-06-11 12:40:45.659235",
"last_fullsized": "2018-06-11 12:40:45.659235",
"mapping_epoch": 508740,
"log_start": "508820'22437175",
"ondisk_log_start": "508820'22437175",
"created": 874,
"last_epoch_clean": 508746,
"parent": "0.0",
"parent_split_bits": 0,
"last_scrub": "508820'22437015",
"last_scrub_stamp": "2018-06-11 12:06:16.997511",
"last_deep_scrub": "508820'22437015",
"last_deep_scrub_stamp": "2018-06-11 12:06:16.997511",
"last_clean_scrub_stamp": "2018-04-24 08:33:08.497247",
"log_size": 1509,
"ondisk_log_size": 1509,
"stats_invalid": false,
"dirty_stats_invalid": false,
"omap_stats_invalid": false,
"hitset_stats_invalid": false,
"hitset_bytes_stats_invalid": false,
"pin_stats_invalid": true,
"snaptrimq_len": 0,
"stat_sum": {
"num_bytes": 25177387063,
"num_objects": 10224,
"num_object_clones": 4757,
"num_object_copies": 30672,
"num_objects_missing_on_primary": 0,
"num_objects_missing": 0,
"num_objects_degraded": 0,
"num_objects_misplaced": 0,
"num_objects_unfound": 0,
"num_objects_dirty": 10224,
"num_whiteouts": 77,
"num_read": 264603,
"num_read_kb": 9223879,
"num_write": 624940,
"num_write_kb": 10051548,
"num_scrub_errors": 2,
"num_shallow_scrub_errors": 1,
"num_deep_scrub_errors": 1,
"num_objects_recovered": 10658,
"num_bytes_recovered": 31770787895,
"num_keys_recovered": 6,
"num_objects_omap": 1,
"num_objects_hit_set_archive": 0,
"num_bytes_hit_set_archive": 0,
"num_flush": 0,
"num_flush_kb": 0,
"num_evict": 0,
"num_evict_kb": 0,
"num_promote": 0,
"num_flush_mode_high": 0,
"num_flush_mode_low": 0,
"num_evict_mode_some": 0,
"num_evict_mode_full": 0,
"num_objects_pinned": 0,
"num_legacy_snapsets": 0
},
"up": [
19,
166,
130
],
"acting": [
19,
166,
130
],
"blocked_by": [],
"up_primary": 19,
"acting_primary": 19
},
"empty": 0,
"dne": 0,
"incomplete": 0,
"last_epoch_started": 508741,
"hit_set_history": {
"current_last_update": "0'0",
"history": []
}
},
"peer_info": [ {
"peer": "130",
"pgid": "5.9b",
"last_update": "508820'22438684",
"last_complete": "508741'22197582",
"log_tail": "508708'22196074",
"last_user_version": 22197574,
"last_backfill": "MAX",
"last_backfill_bitwise": 1,
"purged_snaps": [ {
"start": "1",
"length": "5d75f"
}
],
"history": {
"epoch_created": 874,
"epoch_pool_created": 874,
"last_epoch_started": 508741,
"last_interval_started": 508740,
"last_epoch_clean": 508746,
"last_interval_clean": 508740,
"last_epoch_split": 0,
"last_epoch_marked_full": 0,
"same_up_since": 508728,
"same_interval_since": 508740,
"same_primary_since": 508013,
"last_scrub": "508820'22437015",
"last_scrub_stamp": "2018-06-11 12:06:16.997511",
"last_deep_scrub": "508820'22437015",
"last_deep_scrub_stamp": "2018-06-11 12:06:16.997511",
"last_clean_scrub_stamp": "2018-04-24 08:33:08.497247"
},
"stats": {
"version": "508739'22197574",
"reported_seq": "23002322",
"reported_epoch": "508739",
"state": "active+undersized+remapped+inconsistent+backfilling",
"last_fresh": "2018-06-07 16:25:27.092367",
"last_change": "2018-06-07 16:25:27.079031",
"last_active": "2018-06-07 16:25:27.092367",
"last_peered": "2018-06-07 16:25:27.092367",
"last_clean": "2018-06-07 11:50:01.063918",
"last_became_active": "2018-06-07 16:17:46.744200",
"last_became_peered": "2018-06-07 16:17:46.744200",
"last_unstale": "2018-06-07 16:25:27.092367",
"last_undegraded": "2018-06-07 16:25:27.092367",
"last_fullsized": "2018-06-07 16:17:45.375370",
"mapping_epoch": 508740,
"log_start": "508708'22187564",
"ondisk_log_start": "508708'22187564",
"created": 874,
"last_epoch_clean": 508693,
"parent": "0.0",
"parent_split_bits": 0,
"last_scrub": "508693'22134503",
"last_scrub_stamp": "2018-06-07 10:59:16.801695",
"last_deep_scrub": "508693'22134503",
"last_deep_scrub_stamp": "2018-06-07 10:59:16.801695",
"last_clean_scrub_stamp": "2018-04-24 08:33:08.497247",
"log_size": 10010,
"ondisk_log_size": 10010,
"stats_invalid": false,
"dirty_stats_invalid": false,
"omap_stats_invalid": false,
"hitset_stats_invalid": false,
"hitset_bytes_stats_invalid": false,
"pin_stats_invalid": true,
"snaptrimq_len": 0,
"stat_sum": {
"num_bytes": 26509574199,
"num_objects": 10525,
"num_object_clones": 4757,
"num_object_copies": 31575,
"num_objects_missing_on_primary": 0,
"num_objects_missing": 0,
"num_objects_degraded": 0,
"num_objects_misplaced": 0,
"num_objects_unfound": 0,
"num_objects_dirty": 10525,
"num_whiteouts": 67,
"num_read": 29234,
"num_read_kb": 833737,
"num_write": 165810,
"num_write_kb": 1879366,
"num_scrub_errors": 0,
"num_shallow_scrub_errors": 0,
"num_deep_scrub_errors": 0,
"num_objects_recovered": 10657,
"num_bytes_recovered": 31770787895,
"num_keys_recovered": 6,
"num_objects_omap": 1,
"num_objects_hit_set_archive": 0,
"num_bytes_hit_set_archive": 0,
"num_flush": 0,
"num_flush_kb": 0,
"num_evict": 0,
"num_evict_kb": 0,
"num_promote": 0,
"num_flush_mode_high": 0,
"num_flush_mode_low": 0,
"num_evict_mode_some": 0,
"num_evict_mode_full": 0,
"num_objects_pinned": 0,
"num_legacy_snapsets": 0
},
"up": [
19,
166,
130
],
"acting": [
19,
166,
130
],
"blocked_by": [],
"up_primary": 19,
"acting_primary": 19
},
"empty": 0,
"dne": 0,
"incomplete": 0,
"last_epoch_started": 508741,
"hit_set_history": {
"current_last_update": "0'0",
"history": []
}
}, {
"peer": "166",
"pgid": "5.9b",
"last_update": "508820'22438684",
"last_complete": "508741'22197582",
"log_tail": "508708'22196074",
"last_user_version": 22197574,
"last_backfill": "MAX",
"last_backfill_bitwise": 1,
"purged_snaps": [ {
"start": "1",
"length": "5d75f"
}
],
"history": {
"epoch_created": 874,
"epoch_pool_created": 874,
"last_epoch_started": 508741,
"last_interval_started": 508740,
"last_epoch_clean": 508746,
"last_interval_clean": 508740,
"last_epoch_split": 0,
"last_epoch_marked_full": 0,
"same_up_since": 508728,
"same_interval_since": 508740,
"same_primary_since": 508013,
"last_scrub": "508820'22437015",
"last_scrub_stamp": "2018-06-11 12:06:16.997511",
"last_deep_scrub": "508820'22437015",
"last_deep_scrub_stamp": "2018-06-11 12:06:16.997511",
"last_clean_scrub_stamp": "2018-04-24 08:33:08.497247"
},
"stats": {
"version": "508739'22197573",
"reported_seq": "23002321",
"reported_epoch": "508739",
"state": "active+undersized+remapped+inconsistent+backfilling",
"last_fresh": "2018-06-07 16:25:27.082391",
"last_change": "2018-06-07 16:25:27.079031",
"last_active": "2018-06-07 16:25:27.082391",
"last_peered": "2018-06-07 16:25:27.082391",
"last_clean": "2018-06-07 11:50:01.063918",
"last_became_active": "2018-06-07 16:17:46.744200",
"last_became_peered": "2018-06-07 16:17:46.744200",
"last_unstale": "2018-06-07 16:25:27.082391",
"last_undegraded": "2018-06-07 16:25:27.082391",
"last_fullsized": "2018-06-07 16:17:45.375370",
"mapping_epoch": 508740,
"log_start": "508708'22187564",
"ondisk_log_start": "508708'22187564",
"created": 874,
"last_epoch_clean": 508693,
"parent": "0.0",
"parent_split_bits": 0,
"last_scrub": "508693'22134503",
"last_scrub_stamp": "2018-06-07 10:59:16.801695",
"last_deep_scrub": "508693'22134503",
"last_deep_scrub_stamp": "2018-06-07 10:59:16.801695",
"last_clean_scrub_stamp": "2018-04-24 08:33:08.497247",
"log_size": 10009,
"ondisk_log_size": 10009,
"stats_invalid": false,
"dirty_stats_invalid": false,
"omap_stats_invalid": false,
"hitset_stats_invalid": false,
"hitset_bytes_stats_invalid": false,
"pin_stats_invalid": true,
"snaptrimq_len": 0,
"stat_sum": {
"num_bytes": 26509574199,
"num_objects": 10525,
"num_object_clones": 4757,
"num_object_copies": 31575,
"num_objects_missing_on_primary": 0,
"num_objects_missing": 0,
"num_objects_degraded": 0,
"num_objects_misplaced": 0,
"num_objects_unfound": 0,
"num_objects_dirty": 10525,
"num_whiteouts": 67,
"num_read": 29234,
"num_read_kb": 833737,
"num_write": 165810,
"num_write_kb": 1879366,
"num_scrub_errors": 0,
"num_shallow_scrub_errors": 0,
"num_deep_scrub_errors": 0,
"num_objects_recovered": 10657,
"num_bytes_recovered": 31770787895,
"num_keys_recovered": 6,
"num_objects_omap": 1,
"num_objects_hit_set_archive": 0,
"num_bytes_hit_set_archive": 0,
"num_flush": 0,
"num_flush_kb": 0,
"num_evict": 0,
"num_evict_kb": 0,
"num_promote": 0,
"num_flush_mode_high": 0,
"num_flush_mode_low": 0,
"num_evict_mode_some": 0,
"num_evict_mode_full": 0,
"num_objects_pinned": 0,
"num_legacy_snapsets": 0
},
"up": [
19,
166,
130
],
"acting": [
19,
166,
130
],
"blocked_by": [],
"up_primary": 19,
"acting_primary": 19
},
"empty": 0,
"dne": 0,
"incomplete": 0,
"last_epoch_started": 508741,
"hit_set_history": {
"current_last_update": "0'0",
"history": []
}
}
],
"recovery_state": [ {
"name": "Started/Primary/Active",
"enter_time": "2018-06-07 16:25:28.547460",
"might_have_unfound": [],
"recovery_progress": {
"backfill_targets": [],
"waiting_on_backfill": [],
"last_backfill_started": "MIN",
"backfill_info": {
"begin": "MIN",
"end": "MIN",
"objects": []
},
"peer_backfill_info": [],
"backfills_in_flight": [],
"recovering": [],
"pg_backend": {
"pull_from_peer": [],
"pushing": []
}
},
"scrub": {
"scrubber.epoch_start": "508740",
"scrubber.active": false,
"scrubber.state": "INACTIVE",
"scrubber.start": "MIN",
"scrubber.end": "MIN",
"scrubber.subset_last_update": "0'0",
"scrubber.deep": false,
"scrubber.seed": 0,
"scrubber.waiting_on": 0,
"scrubber.waiting_on_whom": []
}
}, {
"name": "Started",
"enter_time": "2018-06-07 16:25:27.492226"
}
],
"agent_state": {}
}

Actions #7

Updated by Siegfried Hoellrigl over 5 years ago

Hi !

In the meantime we have upgraded to ceph 12.2.8.

In ceph.conf "osd distrust data digest = true" is still set.

The same PG still has problems.

Running "ceph pg repair" on it seems to cause an OSD crash.

In the logs we can see:

10> 2018-09-26 18:05:54.637027 7fc7a5166700  5 - 10.0.0.25:6835/1114005 >> 10.0.0.29:0/1348232 conn(0x55cf9f89f800 :6835 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=5545 cs=1 l=1). rx osd.153 seq 6698 0x55cfe1f35000 osd_ping(ping e571875 stamp 2018-09-26 18:05:54.627141) v4
9> 2018-09-26 18:05:54.637228 7fc7a5166700 1 - 10.0.0.25:6835/1114005 <== osd.153 10.0.0.29:0/1348232 6698 ==== osd_ping(ping e571875 stamp 2018-09-26 18:05:54.627141) v4 ==== 2004+0+0 (366789399 0 0) 0x55cfe1f35000 con 0x55cf9f89f800
8> 2018-09-26 18:05:54.637274 7fc7a5166700 1 - 10.0.0.25:6835/1114005 --> 10.0.0.29:0/1348232 -- osd_ping(ping_reply e571875 stamp 2018-09-26 18:05:54.627141) v4 -- 0x55cfb306b400 con 0
7> 2018-09-26 18:05:54.638038 7fc7a4164700 5 - 10.7.2.143:6835/1114005 >> 10.7.2.142:0/1348232 conn(0x55cfa35c8800 :6835 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=5552 cs=1 l=1). rx osd.153 seq 6698 0x55cf9f150000 osd_ping(ping e571875 stamp 2018-09-26 18:05:54.627141) v4
6> 2018-09-26 18:05:54.638076 7fc7a4164700 1 - 10.7.2.143:6835/1114005 <== osd.153 10.7.2.142:0/1348232 6698 ==== osd_ping(ping e571875 stamp 2018-09-26 18:05:54.627141) v4 ==== 2004+0+0 (366789399 0 0) 0x55cf9f150000 con 0x55cfa35c8800
5> 2018-09-26 18:05:54.638151 7fc7a4164700 1 - 10.7.2.143:6835/1114005 --> 10.7.2.142:0/1348232 -- osd_ping(ping_reply e571875 stamp 2018-09-26 18:05:54.627141) v4 -- 0x55cfa73b6800 con 0
4> 2018-09-26 18:05:54.638805 7fc7a4164700 5 - 10.7.2.143:6834/1114005 >> 10.7.2.142:6845/5349190 conn(0x55cfbf84b000 :6834 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=39483 cs=1 l=0). rx osd.136 seq 31312 0x55cfe9848000 osd_repop(client.40219021.1:133086982 5.10b e571875/571823) v2
3> 2018-09-26 18:05:54.638822 7fc7a4164700 1 - 10.7.2.143:6834/1114005 <== osd.136 10.7.2.142:6845/5349190 31312 ==== osd_repop(client.40219021.1:133086982 5.10b e571875/571823) v2 ==== 1050+0+4851 (2098453948 0 2227888094) 0x55cfe9848000 con 0x55cfbf84b000
2> 2018-09-26 18:05:54.639290 7fc78b148700 5 write_log_and_missing with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, writeout_from: 571875'49509681, trimmed: , trimmed_dups: , clear_divergent_priors: 0
-1> 2018-09-26 18:05:54.643913 7fc797961700 1 - 10.7.2.143:6834/1114005 --> 10.7.2.142:6845/5349190 -- osd_repop_reply(client.40219021.1:133086982 5.10b e571875/571823 ondisk, result = 0) v2 -- 0x55cfa83db200 con 0
0> 2018-09-26 18:05:54.649415 7fc78c14a700 -1 /build/ceph-12.2.8/src/osd/PG.cc: In function 'void PG::update_snap_map(const std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread 7fc78c14a700 time 2018-09-26 18:05:54.587951
/build/ceph-12.2.8/src/osd/PG.cc: 3493: FAILED assert(r == 0)
ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x55cf75198412]
2: (PG::update_snap_map(std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, ObjectStore::Transaction&)+0x6f8) [0x55cf74c420d8]
3: (PG::append_log(std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, ObjectStore::Transaction&, bool)+0x509) [0x55cf74c70219]
4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, boost::optional<pg_hit_set_history_t> const&, eversion_t const&, eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55cf74d73934]
5: (ReplicatedBackend::submit_transaction(hobject_t const&, object_stat_sum_t const&, eversion_t const&, std::unique_ptr<PGTransaction, std::default_delete<PGTransaction> >&&, eversion_t const&, eversion_t const&, std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, boost::optional<pg_hit_set_history_t>&, Context*, Context*, Context*, unsigned long, osd_reqid_t, boost::intrusive_ptr<OpRequest>)+0xb49) [0x55cf74ea7109]
6: (PrimaryLogPG::issue_repop(PrimaryLogPG::RepGather*, PrimaryLogPG::OpContext*)+0x9fa) [0x55cf74d0cc2a]
7: (PrimaryLogPG::simple_opc_submit(std::unique_ptr<PrimaryLogPG::OpContext, std::default_delete<PrimaryLogPG::OpContext> >)+0x15c) [0x55cf74d133ec]
8: (PrimaryLogPG::scrub_snapshot_metadata(ScrubMap&, std::map<hobject_t, std::pair<boost::optional<unsigned int>, boost::optional<unsigned int> >, std::less<hobject_t>, std::allocator<std::pair<hobject_t const, std::pair<boost::optional<unsigned int>, boost::optional<unsigned int> > > > > const&)+0x1a39) [0x55cf74d30569]
9: (PG::scrub_compare_maps()+0x1f11) [0x55cf74c467f1]
10: (PG::chunky_scrub(ThreadPool::TPHandle&)+0x1f7) [0x55cf74c70c47]
11: (PG::scrub(unsigned int, ThreadPool::TPHandle&)+0x2cb) [0x55cf74c7357b]
12: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x106d) [0x55cf74bb4a3d]
13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x884) [0x55cf7519d204]
14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55cf751a0240]
15: (()+0x76ba) [0x7fc7a82d36ba]
16: (clone()+0x6d) [0x7fc7a734a41d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Actions #8

Updated by Patrick Donnelly over 4 years ago

  • Status changed from 12 to New
Actions #9

Updated by Sage Weil almost 3 years ago

  • Status changed from New to Won't Fix