Bug #2799

osd: pg log trimming zeroing broken

Added by Sage Weil over 11 years ago. Updated over 11 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: OSD
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description


  "recovery_state": [
        { "name": "Started\/Primary\/Peering\/WaitActingChange",
          "enter_time": "2012-07-18 19:50:46.773043",
          "comment": "waiting for pg acting set to change"},
        { "name": "Started",
          "enter_time": "2012-07-18 19:50:46.768881"}]}

ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2012-07-18_19:00:06-regression-master-testing-gcov/13880$ cat config.yaml 
kernel: &id001
  kdb: true
  sha1: 14240f8208136dbbe7e825caedc0104806027aae
nuke-on-error: true
overrides:
  ceph:
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: c8ee30160d2253a8398df926e851bfa201ab2a39
  workunit:
    coverage: true
    sha1: c8ee30160d2253a8398df926e851bfa201ab2a39
roles:
- - mon.a
  - mon.b
  - mon.c
  - mds.a
  - osd.0
  - osd.1
  - osd.2
targets:
  ubuntu@plana52.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC9kswBp2g5ZV1Qrvlee8MvUOCNdubQFqUBr5WSsmFBODqEuiitWbhuBu2Ucz0lBMf41DpMKLeYDN0lIC94GZmGaiCN+Ak9Ia05d/uRvesT2nDgHB3Z9J/zEFlY8RVxL3xhD+hq4u8dbASlqqoMDiBP+7efZMxt4Ndnzr/yOxge3KenxyQImBUS+OV+BqnfCOHf6BqM33U1leXz2kng7ocxoE91DAMslKD/2DPRSYEhfucUJZk6IYevr/g0JVhbfvjSlZzwUEfTyVmPeqNyls/U+azhKlvQbqpb+ttc02RNydQ1YgOgHFCaqd9Vm8XjUU6vYGlkFHZ+BMJuEwA9AH/D
tasks:
- internal.lock_machines: 1
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock: null
- ceph:
    conf:
      osd:
        osd min pg log entries: 5
    log-whitelist:
    - wrongly marked me down
- osd_recovery: null

Associated revisions

Revision f565ace6 (diff)
Added by Sage Weil over 11 years ago

osd: fix pg log zeroing

Zero the right number of bytes. Fixes a bug where we clobber legit log
data. Fortunately this is only triggered with osd preserve pg log = false,
which was not the default until recently in master.

Fixes: #2799
Signed-off-by: Sage Weil <>
Reviewed-by: Mike Ryan <>
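
The commit message above says the fix is to zero the right number of bytes when trimming the on-disk pg log. As a rough illustration only, here is a minimal Python sketch of that idea on an in-memory buffer. The function names, the buffer model, and in particular the "buggy" variant are assumptions made for this example; the sketch does not reproduce the actual Ceph code or the exact defect fixed in f565ace6.

```python
def trim_and_zero(buf: bytearray, old_tail: int, new_tail: int) -> None:
    """Zero only the trimmed prefix [old_tail, new_tail) of a log image."""
    if not (0 <= old_tail <= new_tail <= len(buf)):
        raise ValueError("tail must move forward within the buffer")
    length = new_tail - old_tail  # a byte count, not an end offset
    buf[old_tail:old_tail + length] = bytes(length)


def buggy_trim_and_zero(buf: bytearray, old_tail: int, new_tail: int) -> None:
    """Hypothetical mistake: passes the new tail offset where a length belongs."""
    end = min(len(buf), old_tail + new_tail)  # zeroes past the new tail
    buf[old_tail:end] = bytes(end - old_tail)


# Toy log image: 32 bytes of live data, tail moving from offset 8 to 16.
good = bytearray(b"\x01" * 32)
trim_and_zero(good, 8, 16)

bad = bytearray(b"\x01" * 32)
buggy_trim_and_zero(bad, 8, 16)

print("live data intact after correct trim:", good[16:] == b"\x01" * 16)
print("live data intact after buggy trim:  ", bad[16:] == b"\x01" * 16)
```

In the correct variant only the bytes between the old and new tail are cleared, so entries at or after the new tail survive. The hypothetical buggy variant clears bytes beyond the new tail, which is one way legitimate log data could be clobbered.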

History

#1 Updated by Sage Weil over 11 years ago

again today: ubuntu@teuthology:/a/teuthology-2012-07-19_19:00:08-regression-master-testing-gcov/14585

#2 Updated by Sage Weil over 11 years ago

  • Priority changed from High to Urgent

#3 Updated by Sage Weil over 11 years ago

  • Status changed from New to 12
  • Assignee set to Sage Weil

#4 Updated by Sage Weil over 11 years ago

2012-07-23 17:31:42.961718 7f3fcaf11700 10 osd.1 pg_epoch: 13 pg[2.7( v 13'166 (13'159,13'166] n=166 ec=1 les/c 5/5 4/4/4) [2,1] r=1 lpr=4 pi=2-3/1 luod=0'0 lcod 13'162 active] add_log_entry 13'166 (0'0) modify   a69646ff/plana48_9965_object1318/head//2 by client.4128.0:1319 2012-07-23 17:31:42.959726
2012-07-23 17:31:42.961734 7f3fcaf11700 10 osd.1 pg_epoch: 13 pg[2.7( v 13'166 (13'159,13'166] n=166 ec=1 les/c 5/5 4/4/4) [2,1] r=1 lpr=4 pi=2-3/1 luod=0'0 lcod 13'162 active] append_log  20575~2224 adding 139
2012-07-23 17:31:42.961749 7f3fcaf11700 10 osd.1 pg_epoch: 13 pg[2.7( v 13'166 (13'159,13'166] n=166 ec=1 les/c 5/5 4/4/4) [2,1] r=1 lpr=4 pi=2-3/1 luod=0'0 lcod 13'162 active] append_log  now 20575~2363
2012-07-23 17:31:42.961760 7f3fcaf11700 10 osd.1 pg_epoch: 13 pg[2.7( v 13'166 (13'159,13'166] n=166 ec=1 les/c 5/5 4/4/4) [2,1] r=1 lpr=4 pi=2-3/1 luod=0'0 lcod 13'162 active] trim log(13'159,13'166] to 13'160
2012-07-23 17:31:42.961773 7f3fcaf11700 15 osd.1 pg_epoch: 13 pg[2.7( v 13'166 (13'160,13'166] n=166 ec=1 les/c 5/5 4/4/4) [2,1] r=1 lpr=4 pi=2-3/1 luod=0'0 lcod 13'162 active] trim_ondisklog tail 20575 -> 22104, now 22104~834 (same block)
2012-07-23 17:31:42.961794 7f3fcaf11700 20 osd.1 pg_epoch: 13 pg[2.7( v 13'166 (13'160,13'166] n=166 ec=1 les/c 5/5 4/4/4) [2,1] r=1 lpr=4 pi=2-3/1 luod=0'0 lcod 13'162 active] write_info bigbl 494
2012-07-23 17:31:48.207859 7fb3dfe13780 20 osd.1 pg_epoch: 13 pg[2.7( DNE empty n=0 ec=0 les/c 0/0 0/0/0) [] r=0 lpr=0 mlcod 0'0 inactive] enter Initial
2012-07-23 17:31:48.207872 7fb3dfe13780 20 osd.1 pg_epoch: 13 pg[2.7( DNE empty n=0 ec=0 les/c 0/0 0/0/0) [] r=0 lpr=0 mlcod 0'0 inactive] enter NotTrimming
2012-07-23 17:31:48.208076 7fb3dfe13780 10 osd.1 pg_epoch: 13 pg[2.7( v 13'163 (13'157,13'163] n=163 ec=1 les/c 5/5 4/4/4) [] r=0 lpr=0 pi=2-3/1 (info mismatch, log(0'0,0'0]) lcod 0'0 mlcod 0'0 inactive] read_log 20575~1946
2012-07-23 17:31:48.208476 7fb3dfe13780  0 osd.1 pg_epoch: 13 pg[2.7( v 13'163 (13'157,13'163] n=163 ec=1 les/c 5/5 4/4/4) [] r=0 lpr=0 pi=2-3/1 (info mismatch, log(13'157,0'0]) (log bound mismatch, empty) lcod 0'0 mlcod 0'0 inactive] Got exception 'buffer::end_of_buffer' while reading log. Moving corrupted log file to 'corrupt_log_2012-07-23_17:31_2.7' for later analysis.
2012-07-23 17:31:48.208522 7fb3dfe13780 20 osd.1 pg_epoch: 13 pg[2.7( v 13'163 (13'163,13'163] n=163 ec=1 les/c 5/5 4/4/4) [] r=0 lpr=0 pi=2-3/1 lcod 0'0 mlcod 0'0 inactive] write_info bigbl 494
2012-07-23 17:31:48.209888 7fb3dfe13780 10 osd.1 pg_epoch: 13 pg[2.7( v 13'163 (13'163,13'163] lb 0//0//-1 n=0 ec=1 les/c 5/5 4/4/4) [2,1] r=1 lpr=0 pi=2-3/1 lcod 0'0 inactive] handle_loaded
2012-07-23 17:31:48.209904 7fb3dfe13780 20 osd.1 pg_epoch: 13 pg[2.7( v 13'163 (13'163,13'163] lb 0//0//-1 n=0 ec=1 les/c 5/5 4/4/4) [2,1] r=1 lpr=0 pi=2-3/1 lcod 0'0 inactive] exit Initial 0.002045 0 0.000000
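
The `offset~length` pairs in the trace above can be cross-checked with a little arithmetic. The following Python sketch is only a reading aid: the `Extent` type and `parse_extent` helper are made up for this example, and the interpretation in the comments is an assumption drawn from the log lines, not a statement about Ceph internals.

```python
from typing import NamedTuple


class Extent(NamedTuple):
    """An on-disk byte range written as offset~length in the log."""
    offset: int
    length: int

    @property
    def end(self) -> int:
        return self.offset + self.length


def parse_extent(text: str) -> Extent:
    offset, length = text.split("~")
    return Extent(int(offset), int(length))


before = parse_extent("20575~2363")  # append_log "now", before the trim
after = parse_extent("22104~834")    # trim_ondisklog "now", after the trim
reread = parse_extent("20575~1946")  # read_log after the restart

# Appending 139 bytes to 20575~2224 gives the pre-trim length.
print(2224 + 139 == before.length)
# Trimming moves the tail forward but leaves the end of the log unchanged.
print(before.end == after.end)
# The restarted OSD starts reading this many bytes below the trimmed tail.
print(after.offset - reread.offset)
```

The restarted OSD reads from offset 20575, which lies below the tail of 22104 recorded by the trim. That is consistent with the reader running into zeroed or clobbered bytes and raising `buffer::end_of_buffer`, although the trace alone does not prove it.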

#5 Updated by Sage Weil over 11 years ago

  • Subject changed from osd: pg hung waiting for pg acting set to change to osd: pg log trimming zeroing broken

This was a bug in pg log trimming/zeroing. Thankfully it was only enabled in master! Will backport the fix all over the place, just in case someone crazy turns it on.

#6 Updated by Sage Weil over 11 years ago

  • Status changed from 12 to Resolved
