Bug #36667

open

OSD object_map sync returned error

Added by yp dai over 5 years ago. Updated over 5 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
FileStore
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I deployed CephFS and used the vdbench tool to write data to the CephFS mount point; after a while an OSD went down.
I manually restarted the OSD, and the following log appeared:

2018-11-01 11:01:50.482203 b6867000  5 osd.10 pg_epoch: 975 pg[2.25( v 954'2580 (0'0,954'2580] local-les=959 n=3 ec=66 les/c/f 959/959/0 975/975/714) [9,5,10]/[9,5] r=-1 lpr=0 pi=708-974/40 crt=954'2580 lcod 0'0 inactive NOTIFY NIBBLEWISE] enter Reset
    -8> 2018-11-01 11:01:50.483487 b6867000  5 osd.10 pg_epoch: 975 pg[0.10b(unlocked)] enter Initial
    -7> 2018-11-01 11:01:50.520894 b6867000  5 osd.10 pg_epoch: 975 pg[0.10b( v 937'6036 (150'3000,937'6036] local-les=959 n=798 ec=77 les/c/f 959/959/0 975/975/736) [6,10]/[6] r=-1 lpr=0 pi=721-974/19 crt=937'6036 lcod 0'0 inactive NOTIFY NIBBLEWISE] exit Initial 0.037408 0 0.000000
    -6> 2018-11-01 11:01:50.520972 b6867000  5 osd.10 pg_epoch: 975 pg[0.10b( v 937'6036 (150'3000,937'6036] local-les=959 n=798 ec=77 les/c/f 959/959/0 975/975/736) [6,10]/[6] r=-1 lpr=0 pi=721-974/19 crt=937'6036 lcod 0'0 inactive NOTIFY NIBBLEWISE] enter Reset
    -5> 2018-11-01 11:01:50.522519 b6867000  5 osd.10 pg_epoch: 975 pg[1.15b(unlocked)] enter Initial
    -4> 2018-11-01 11:01:50.560698 b6867000  5 osd.10 pg_epoch: 975 pg[1.15b( v 959'27450 (873'24391,959'27450] local-les=959 n=4307 ec=63 les/c/f 959/959/0 975/975/973) [10,1]/[1] r=-1 lpr=0 pi=958-974/8 crt=959'27450 lcod 0'0 inactive NOTIFY NIBBLEWISE] exit Initial 0.038179 0 0.000000
    -3> 2018-11-01 11:01:50.560773 b6867000  5 osd.10 pg_epoch: 975 pg[1.15b( v 959'27450 (873'24391,959'27450] local-les=959 n=4307 ec=63 les/c/f 959/959/0 975/975/973) [10,1]/[1] r=-1 lpr=0 pi=958-974/8 crt=959'27450 lcod 0'0 inactive NOTIFY NIBBLEWISE] enter Reset
    -2> 2018-11-01 11:01:50.562743 b6867000  5 osd.10 pg_epoch: 975 pg[2.5d(unlocked)] enter Initial
    -1> 2018-11-01 11:01:50.581308 af60eb10 -1 filestore(/var/lib/ceph/osd/ceph-10) object_map sync got (1) Operation not permitted
     0> 2018-11-01 11:01:50.583579 af60eb10 -1 os/filestore/FileStore.cc: In function 'void FileStore::sync_entry()' thread af60eb10 time 2018-11-01 11:01:50.581359
os/filestore/FileStore.cc: 3796: FAILED assert(0 == "object_map sync returned error")

Below is my Ceph cluster node environment:

OS:
Distributor ID: Ubuntu
Description: Ubuntu 14.04.4 LTS
Release: 14.04
Codename: trusty

Kernel:
Linux arm242 4.4.8-armada-17.02.2

Does anyone have any ideas? Thanks.

Actions #1

Updated by Patrick Donnelly over 5 years ago

  • Project changed from Ceph to RADOS
  • Description updated (diff)
  • Category deleted (OSD)
  • Component(RADOS) FileStore added
Actions #2

Updated by Josh Durgin over 5 years ago

Check dmesg for hardware errors; this is leveldb/rocksdb returning an error while writing to disk. You may want to ask the ceph-users mailing list whether anyone else has seen this. It's a better forum for debugging problems like this.
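
To make the crash line concrete, here is a small self-contained illustration (not the actual FileStore source, just a sketch of the same behavior): errno 1 is EPERM ("Operation not permitted"), so the omap backend's attempt to write or sync its files was rejected by the kernel, and FileStore treats any error from the omap sync as fatal, aborting with the assert seen at FileStore.cc:3796.

    // Minimal illustration (not Ceph code) of what the crash in the
    // description means: the omap store reported errno 1 (EPERM) from a
    // disk write/sync, and FileStore aborts rather than continue with a
    // possibly inconsistent object_map.
    #include <cassert>
    #include <cerrno>
    #include <cstring>
    #include <iostream>

    int main()
    {
      // The "(1)" in "object_map sync got (1) Operation not permitted".
      int err = -EPERM;

      if (err < 0) {
        std::cerr << "object_map sync got (" << -err << ") "
                  << std::strerror(-err) << std::endl;
        // FileStore::sync_entry() does the equivalent on any omap sync
        // error, which is the FAILED assert shown in the log above.
        assert(0 == "object_map sync returned error");
      }
      return 0;
    }

Since EPERM is a permission error rather than a typical disk I/O error, it is also consistent with the ownership problem discussed in note #4 below.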

Actions #3

Updated by Josh Durgin over 5 years ago

  • Tracker changed from Bug to Support
Actions #4

Updated by Sage Weil over 5 years ago

  • Tracker changed from Support to Bug
  • Regression set to No
  • Severity set to 3 - minor

This might also indicate that something screwed up the file permissions or ownership in /var/lib/ceph/osd/ceph-10. Maybe the daemon got accidentally started as root and some files are owned by root instead of ceph:ceph? That's usually why we see this.
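
As a quick way to test that hypothesis, here is an illustrative, self-contained check (not a Ceph tool; the path and the ceph user name are taken from this ticket, and it needs C++17): it walks the OSD data directory and prints anything not owned by the ceph user, such as leveldb/rocksdb omap files left root-owned after an accidental run as root.

    // Illustrative ownership scan for the hypothesis above: list files under
    // the OSD data dir that are not owned by the "ceph" user. Adjust the
    // path for the affected OSD; this is a sketch, not an official tool.
    #include <filesystem>
    #include <iostream>
    #include <pwd.h>
    #include <sys/stat.h>

    int main()
    {
      const char *osd_dir = "/var/lib/ceph/osd/ceph-10";
      const struct passwd *ceph_pw = getpwnam("ceph");
      if (!ceph_pw) {
        std::cerr << "no 'ceph' user on this system\n";
        return 1;
      }

      namespace fs = std::filesystem;
      for (const auto &entry : fs::recursive_directory_iterator(
               osd_dir, fs::directory_options::skip_permission_denied)) {
        struct stat st{};
        if (lstat(entry.path().c_str(), &st) != 0)
          continue;  // unreadable entry; skip rather than stop the scan
        if (st.st_uid != ceph_pw->pw_uid)
          std::cout << entry.path() << " owned by uid " << st.st_uid
                    << " (expected uid " << ceph_pw->pw_uid << ")\n";
      }
      return 0;
    }

If it flags anything, the usual remedy is to fix ownership back to ceph:ceph (e.g. chown -R ceph:ceph /var/lib/ceph/osd/ceph-10 with the OSD stopped) and then restart the OSD.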
