Project

General

Profile

Bug #39529

bluestore transaction apparently lost on osd restart

Added by Samuel Just almost 5 years ago. Updated over 4 years ago.

Status:
Need More Info
Priority:
Normal
Assignee:
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2019-04-27 13:19:09.735 7f42935e8700 20 osd.1 op_wq(2) _process OpQueueItem(3.11s2 PGPeeringEvent(epoch_sent: 154 epoch_requested: 154 MInfoRec from 2(1) info: 3.11s1( v 151'473 (0'0,151'473] local-lis/les=153/154 n=278 ec=127/113 lis/c 13
4/134 les/c/f 135/135/0 152/153/153)) prio 255 cost 10 e154) pg 0x1f766000
2019-04-27 13:19:09.735 7f42935e8700 10 osd.1 pg_epoch: 154 pg[3.11s2( v 151'474 (0'0,151'474] local-lis/les=134/135 n=279 ec=127/113 lis/c 134/134 les/c/f 135/135/0 152/153/153) [6,2,4]/[2147483647,2,1]p2(1) r=2 lpr=153 pi=[134,153)/1 crt
=151'473 remapped NOTIFY mbc={}] do_peering_event: epoch_sent: 154 epoch_requested: 154 MInfoRec from 2(1) info: 3.11s1( v 151'473 (0'0,151'473] local-lis/les=153/154 n=278 ec=127/113 lis/c 134/134 les/c/f 135/135/0 152/153/153)
2019-04-27 13:19:09.735 7f42935e8700 10 osd.1 pg_epoch: 154 pg[3.11s2( v 151'474 (0'0,151'474] local-lis/les=134/135 n=279 ec=127/113 lis/c 134/134 les/c/f 135/135/0 152/153/153) [6,2,4]/[2147483647,2,1]p2(1) r=2 lpr=153 pi=[134,153)/1 crt
=151'473 remapped NOTIFY mbc={}] state<Started/Stray>: got info from osd.2(1) 3.11s1( v 151'473 (0'0,151'473] local-lis/les=153/154 n=278 ec=127/113 lis/c 134/134 les/c/f 135/135/0 152/153/153)
2019-04-27 13:19:09.735 7f42935e8700 10 rewind_divergent_log truncate divergent future 151'473
2019-04-27 13:19:09.735 7f42935e8700 20 rewind_divergent_log original_crt = 151'473
2019-04-27 13:19:09.735 7f42935e8700 10 rewind_divergent_log future divergent 151'474 (0'0) modify   3:8d5d7e28:::benchmark_data_mira041_25052_object8766:head by client.5195.0:8767 2019-04-27 13:19:07.946483 0
2019-04-27 13:19:09.735 7f42935e8700 20 _merge_object_divergent_entries: merging hoid 3:8d5d7e28:::benchmark_data_mira041_25052_object8766:head entries: 151'474 (0'0) modify   3:8d5d7e28:::benchmark_data_mira041_25052_object8766:head by cl
ient.5195.0:8767 2019-04-27 13:19:07.946483 0
2019-04-27 13:19:09.735 7f42935e8700 20 _merge_object_divergent_entries: keeping 151'474 (0'0) modify   3:8d5d7e28:::benchmark_data_mira041_25052_object8766:head by client.5195.0:8767 2019-04-27 13:19:07.946483 0
2019-04-27 13:19:09.735 7f42935e8700 10 _merge_object_divergent_entries: hoid  object_not_in_store: 0
2019-04-27 13:19:09.735 7f42935e8700 10 _merge_object_divergent_entries: hoid 3:8d5d7e28:::benchmark_data_mira041_25052_object8766:head prior_version: 0'0 first_divergent_update: 151'474 last_divergent_update: 151'474
2019-04-27 13:19:09.735 7f42935e8700 10 _merge_object_divergent_entries: hoid 3:8d5d7e28:::benchmark_data_mira041_25052_object8766:head has no more recent entries in log
2019-04-27 13:19:09.735 7f42935e8700 10 _merge_object_divergent_entries: hoid 3:8d5d7e28:::benchmark_data_mira041_25052_object8766:head prior_version or op type indicates creation, deleting
2019-04-27 13:19:09.735 7f42935e8700  5 osd.1 pg_epoch: 154 pg[3.11s2( v 151'473 (0'0,151'473] local-lis/les=134/135 n=278 ec=127/113 lis/c 134/134 les/c/f 135/135/0 152/153/153) [6,2,4]/[2147483647,2,1]p2(1) r=2 lpr=153 pi=[134,153)/1 crt
=151'473 lcod 151'474 remapped NOTIFY mbc={}] exit Started/Stray 2019-04-27 13:19:09.740858 7 0.000756
2019-04-27 13:19:09.735 7f42935e8700  5 osd.1 pg_epoch: 154 pg[3.11s2( v 151'473 (0'0,151'473] local-lis/les=134/135 n=278 ec=127/113 lis/c 134/134 les/c/f 135/135/0 152/153/153) [6,2,4]/[2147483647,2,1]p2(1) r=2 lpr=153 pi=[134,153)/1 crt
=151'473 lcod 151'474 remapped NOTIFY mbc={}] enter Started/ReplicaActive
2019-04-27 13:19:09.735 7f42935e8700  5 osd.1 pg_epoch: 154 pg[3.11s2( v 151'473 (0'0,151'473] local-lis/les=134/135 n=278 ec=127/113 lis/c 134/134 les/c/f 135/135/0 152/153/153) [6,2,4]/[2147483647,2,1]p2(1) r=2 lpr=153 pi=[134,153)/1 crt
=151'473 lcod 151'474 remapped NOTIFY mbc={}] enter Started/ReplicaActive/RepNotRecovering
2019-04-27 13:19:09.735 7f42935e8700 10 osd.1 pg_epoch: 154 pg[3.11s2( v 151'473 (0'0,151'473] local-lis/les=134/135 n=278 ec=127/113 lis/c 134/134 les/c/f 135/135/0 152/153/153) [6,2,4]/[2147483647,2,1]p2(1) r=2 lpr=153 pi=[134,153)/1 crt
=151'473 lcod 151'474 remapped NOTIFY mbc={}] state<Started/ReplicaActive>: In ReplicaActive, about to call activate
2019-04-27 13:19:09.735 7f42935e8700 10 osd.1 pg_epoch: 154 pg[3.11s2( v 151'473 (0'0,151'473] local-lis/les=153/154 n=278 ec=127/113 lis/c 134/134 les/c/f 135/135/0 152/153/153) [6,2,4]/[2147483647,2,1]p2(1) r=2 lpr=153 pi=[134,153)/1 crt
=151'473 lcod 151'474 remapped mbc={}] activate - no missing, moving last_complete 151'473 -> 151'473
2019-04-27 13:19:09.735 7f42935e8700 10 osd.1 pg_epoch: 154 pg[3.11s2( v 151'473 (0'0,151'473] local-lis/les=153/154 n=278 ec=127/113 lis/c 134/134 les/c/f 135/135/0 152/153/153) [6,2,4]/[2147483647,2,1]p2(1) r=2 lpr=153 pi=[134,153)/1 crt
=151'473 lcod 151'474 remapped mbc={}] state<Started/ReplicaActive>: Activate Finished
2019-04-27 13:19:09.735 7f42935e8700  6 write_log_and_missing with: dirty_to: 0'0, dirty_from: 151'474, writeout_from: 4294967295'18446744073709551615, trimmed: , trimmed_dups: , clear_divergent_priors: 0
2019-04-27 13:19:09.735 7f42935e8700 10 bluestore(/var/lib/ceph/osd/ceph-1) queue_transactions ch 0x25d86800 3.11s2_head
2019-04-27 13:19:09.735 7f42935e8700 20 bluestore(/var/lib/ceph/osd/ceph-1) _txc_create osr 0xd23e520 = 0xcda3500 seq 188
2019-04-27 13:19:09.735 7f42935e8700 15 bluestore(/var/lib/ceph/osd/ceph-1) _remove 3.11s2_head 2#3:8d5d7e28:::benchmark_data_mira041_25052_object8766:head# onode 0x1ccc5700 txc 0xcda3500
2019-04-27 13:19:09.735 7f42935e8700 15 bluestore(/var/lib/ceph/osd/ceph-1) _do_truncate 3.11s2_head 2#3:8d5d7e28:::benchmark_data_mira041_25052_object8766:head# 0x0
2019-04-27 13:19:09.735 7f42935e8700 20 bluestore.blob(0x2584ca00) put_ref 0x0~8000 Blob(0x2584ca00 blob([0xf890000~10000] csum+has_unused crc32c/0x1000 unused=0xff00) use_tracker(0x10000 0x8000) SharedBlob(0x15a6a680 sbid 0x0))
2019-04-27 13:19:09.735 7f42935e8700 20 bluestore.extentmap(0x1ccc5850) dirty_range mark inline shard dirty
2019-04-27 13:19:09.735 7f42935e8700 20 bluestore(/var/lib/ceph/osd/ceph-1) _wctx_finish lex_old 0x0~8000: 0x0~8000 Blob(0x2584ca00 blob([!~10000] csum+has_unused crc32c/0x1000 unused=0xff00) use_tracker(0x10000 0x0) SharedBlob(0x15a6a680
sbid 0x0))
2019-04-27 13:19:09.735 7f42935e8700 20 bluestore(/var/lib/ceph/osd/ceph-1) _wctx_finish  blob release [0xf890000~10000]
2019-04-27 13:19:09.735 7f42935e8700 20 bluestore.blob(0x2584ca00) discard_unallocated 0x0~10000
2019-04-27 13:19:09.735 7f42935e8700 20 bluestore.BufferSpace(0x15a6a698 in 0xbaae1c0) _discard 0x0~10000
2019-04-27 13:19:09.735 7f42935e8700 20 bluestore(/var/lib/ceph/osd/ceph-1) _wctx_finish  release 0xf890000~10000
2019-04-27 13:19:09.735 7f42935e8700 20 bluestore.sharedblob(0x15a6a680) put 0x15a6a680 removing self from set 0x25d86900
2019-04-27 13:19:09.735 7f42935e8700 20 bluestore.BufferSpace(0x15a6a698 in 0xbaae1c0) _clear
2019-04-27 13:19:09.735 7f42935e8700 10 bluestore(/var/lib/ceph/osd/ceph-1) _remove 3.11s2_head 2#3:8d5d7e28:::benchmark_data_mira041_25052_object8766:head# = 0
2019-04-27 13:19:09.735 7f42935e8700 15 bluestore(/var/lib/ceph/osd/ceph-1) _omap_rmkey_range 3.11s2_head 2#3:88000000::::head#
2019-04-27 13:19:09.735 7f42935e8700 20 bluestore.onode(0x19a64840).flush flush done
2019-04-27 13:19:09.735 7f42935e8700 20 bluestore(/var/lib/ceph/osd/ceph-1) _omap_rmkey_range remove range start: 0x000000000000352b'.0000000151.00000000000000000474' end: 0x000000000000352b'.4294967295.18446744073709551615'
2019-04-27 13:19:09.735 7f42935e8700 10 bluestore(/var/lib/ceph/osd/ceph-1) _omap_rmkey_range 3.11s2_head 2#3:88000000::::head# = 0
2019-04-27 13:19:09.735 7f42935e8700 15 bluestore(/var/lib/ceph/osd/ceph-1) _omap_setkeys 3.11s2_head 2#3:88000000::::head#
2019-04-27 13:19:09.735 7f42935e8700 20 bluestore(/var/lib/ceph/osd/ceph-1) _omap_setkeys  0x000000000000352b'._biginfo' <- _biginfo
2019-04-27 13:19:09.735 7f42935e8700 20 bluestore(/var/lib/ceph/osd/ceph-1) _omap_setkeys  0x000000000000352b'._epoch' <- _epoch
2019-04-27 13:19:09.735 7f42935e8700 20 bluestore(/var/lib/ceph/osd/ceph-1) _omap_setkeys  0x000000000000352b'._info' <- _info
2019-04-27 13:19:09.735 7f42935e8700 20 bluestore(/var/lib/ceph/osd/ceph-1) _omap_setkeys  0x000000000000352b'.can_rollback_to' <- can_rollback_to
2019-04-27 13:19:09.735 7f42935e8700 20 bluestore(/var/lib/ceph/osd/ceph-1) _omap_setkeys  0x000000000000352b'.rollback_info_trimmed_to' <- rollback_info_trimmed_to
2019-04-27 13:19:09.735 7f42935e8700 10 bluestore(/var/lib/ceph/osd/ceph-1) _omap_setkeys 3.11s2_head 2#3:88000000::::head# = 0
2019-04-27 13:19:09.735 7f42935e8700 10 bluestore(/var/lib/ceph/osd/ceph-1) _txc_calc_cost 0xcda3500 cost 671710 (1 ios * 670000 + 1710 bytes)
2019-04-27 13:19:09.735 7f42935e8700 20 bluestore(/var/lib/ceph/osd/ceph-1) _txc_write_nodes txc 0xcda3500 onodes  shared_blobs
2019-04-27 13:19:09.735 7f42935e8700 20 bluestore(/var/lib/ceph/osd/ceph-1) _txc_finalize_kv txc 0xcda3500 allocated 0x[] released 0x[f890000~10000]
2019-04-27 13:19:09.735 7f42935e8700 20 bluestore(/var/lib/ceph/osd/ceph-1) _txc_finalize_kv release 0xf890000~10000
2019-04-27 13:19:09.735 7f42935e8700 10 freelist release 0xf890000~10000
2019-04-27 13:19:09.735 7f42935e8700 20 freelist _xor first_key 0xf800000 last_key 0xf800000
2019-04-27 13:19:09.735 7f42935e8700 10 bluestore(/var/lib/ceph/osd/ceph-1) _txc_state_proc txc 0xcda3500 prepare
2019-04-27 13:19:09.735 7f42935e8700 20 bluestore(/var/lib/ceph/osd/ceph-1) _txc_finish_io 0xcda3500
2019-04-27 13:19:09.735 7f42935e8700 10 bluestore(/var/lib/ceph/osd/ceph-1) _txc_state_proc txc 0xcda3500 io_done
2019-04-27 13:19:09.735 7f429fe01700 20 bluestore(/var/lib/ceph/osd/ceph-1) _kv_sync_thread wake
...
2019-04-27 13:19:09.739 7f42a0602700 20 bluestore(/var/lib/ceph/osd/ceph-1) _kv_finalize_thread wake
2019-04-27 13:19:09.739 7f42a0602700 20 bluestore(/var/lib/ceph/osd/ceph-1) _kv_finalize_thread kv_committed <0xcda3500>
2019-04-27 13:19:09.739 7f42a0602700 20 bluestore(/var/lib/ceph/osd/ceph-1) _kv_finalize_thread deferred_stable <>
2019-04-27 13:19:09.739 7f429fe01700 20 bluestore(/var/lib/ceph/osd/ceph-1) _kv_sync_thread committed 0 cleaned 0 in 0.00165673s (1.63e-07s flush + 0.00165657s kv commit)
2019-04-27 13:19:09.739 7f42a0602700 10 bluestore(/var/lib/ceph/osd/ceph-1) _txc_state_proc txc 0xcda3500 kv_submitted
2019-04-27 13:19:09.739 7f42a0602700 20 bluestore(/var/lib/ceph/osd/ceph-1) _txc_committed_kv txc 0xcda3500
2019-04-27 13:19:09.739 7f429fe01700 20 bluestore(/var/lib/ceph/osd/ceph-1) _kv_sync_thread committing 1 submitting 1 deferred done 0 stable 0
2019-04-27 13:19:09.739 7f42a0602700 10 bluestore(/var/lib/ceph/osd/ceph-1) _txc_state_proc txc 0xcda3500 finishing
2019-04-27 13:19:09.739 7f42a0602700 20 bluestore(/var/lib/ceph/osd/ceph-1) _txc_finish 0xcda3500 onodes
2019-04-27 13:19:09.739 7f429fe01700 10 bluefs _flush 0xbab4580 ignoring, length 23 < min_flush_size 524288
2019-04-27 13:19:09.739 7f42a0602700 20 bluestore(/var/lib/ceph/osd/ceph-1) _txc_finish  txc 0xcda3500 done
2019-04-27 13:19:09.739 7f42a0602700 20 bluestore(/var/lib/ceph/osd/ceph-1) _txc_finish osr 0xd23e520 q now empty
2019-04-27 13:19:09.739 7f42a0602700 10 bluestore(/var/lib/ceph/osd/ceph-1) _txc_release_alloc(sync) 0xcda3500 [f890000~10000]
2019-04-27 13:19:09.739 7f42935e8700 15 osd.1 154 enqueue_peering_evt 3.11s2 epoch_sent: 154 epoch_requested: 154 ActivateCommitted from 154 processed at 0
...
2019-04-27 13:22:04.394 7f587a8fff00  0 ceph version 15.0.0-789-gfd94178 (fd941781596ea87bdb123e8a003385917bf4d619) octopus (dev), process ceph-osd, pid 35119
2019-04-27 13:22:04.394 7f587a8fff00  5 object store type is bluestore
2019-04-27 13:22:04.394 7f587a8fff00 10 bluestore(/var/lib/ceph/osd/ceph-1) set_cache_shards 1
2019-04-27 13:22:04.394 7f587a8fff00 10 bluestore(/var/lib/ceph/osd/ceph-1/block) _read_bdev_label
2019-04-27 13:22:04.398 7f587a8fff00 10 bluestore(/var/lib/ceph/osd/ceph-1/block) _read_bdev_label got bdev(osd_uuid 61d4d1c8-77f0-4df8-8c9c-a857e34013f1, size 0x1680000000, btime 2019-04-27 13:12:31.768949, desc main, 8 meta)
2019-04-27 13:22:04.398 7f587a8fff00 10 bluestore(/var/lib/ceph/osd/ceph-1/block) _read_bdev_label
2019-04-27 13:22:04.398 7f587a8fff00 10 bluestore(/var/lib/ceph/osd/ceph-1/block) _read_bdev_label got bdev(osd_uuid 61d4d1c8-77f0-4df8-8c9c-a857e34013f1, size 0x1680000000, btime 2019-04-27 13:12:31.768949, desc main, 8 meta)
2019-04-27 13:22:04.398 7f587a8fff00 10 bluestore(/var/lib/ceph/osd/ceph-1/block) _read_bdev_label
2019-04-27 13:22:04.398 7f587a8fff00 10 bluestore(/var/lib/ceph/osd/ceph-1/block) _read_bdev_label got bdev(osd_uuid 61d4d1c8-77f0-4df8-8c9c-a857e34013f1, size 0x1680000000, btime 2019-04-27 13:12:31.768949, desc main, 8 meta)
2019-04-27 13:22:04.398 7f587a8fff00 10 bluestore(/var/lib/ceph/osd/ceph-1/block) _read_bdev_label

History

#1 Updated by Samuel Just almost 5 years ago

  • Description updated (diff)

#2 Updated by Samuel Just almost 5 years ago

The symptom was that the omap entries for the pg info after restart didn't reflect the pg log and last_update changes resulting from log rollback, though the object removal did persist.

2019-04-27 13:19:09.735 7f42935e8700 10 osd.1 pg_epoch: 154 pg[3.11s2( v 151'474
...
2019-04-27 13:19:09.735 7f42935e8700 20 rewind_divergent_log original_crt = 151'473
...
2019-04-27 13:19:09.735 7f42935e8700 5 osd.1 pg_epoch: 154 pg[3.11s2( v 151'473
...
2019-04-27 13:22:04.394 7f587a8fff00 0 ceph version 15.0.0-789-gfd94178
...
2019-04-27 13:22:27.606 7f587a8fff00 10 osd.1 pg_epoch: 154 pg[3.11s2( v 151'474 (0'0,151'474]

The bluestore logs seem to include both the omap updates for clearing the relevant pg log entries:

_omap_rmkey_range remove range start: 0x000000000000352b'.0000000151.00000000000000000474' end: 0x000000000000352b'.4294967295.18446744073709551615'

as well as updating the info key:

_omap_setkeys 0x000000000000352b'._info' <- _info

#3 Updated by Samuel Just almost 5 years ago

  • Assignee set to Sage Weil

Sage: here's the instance I was talking about.

#4 Updated by Samuel Just almost 5 years ago

The full run logs can be found at rex004:/slow/sjust/savedlogs/sjust-2019-04-26_14:00:33-rados-wip-sjust-peering-refactor-distro-basic-mira/3897091 (osd logs in ./mvdlogs).

#5 Updated by Greg Farnum almost 5 years ago

  • Project changed from Ceph to bluestore

#6 Updated by Adam Kupczyk over 4 years ago

  • Status changed from New to Need More Info

Also available in: Atom PDF