Bug #648

closed

monclient: PGMap::apply_incremental

Added by Wido den Hollander over 13 years ago. Updated almost 13 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: Monitor
Target version: -
% Done: 0%


Description

I left my laptop on last night running 'ceph -w' against one of my test machines. This morning I saw:

2010-12-13 22:38:51.497761 7f3907e94710 monclient: hunting for new mon
2010-12-13 22:39:02.919849    pg v42560: 832 pgs: 832 active+clean; 5803 MB data, 16891 MB used, 277 GB / 300 GB avail
2010-12-13 22:39:12.979827    pg v42561: 832 pgs: 832 active+clean; 5803 MB data, 16892 MB used, 277 GB / 300 GB avail
2010-12-13 22:39:24.353785    pg v42562: 832 pgs: 832 active+clean; 5803 MB data, 16892 MB used, 277 GB / 300 GB avail
2010-12-13 22:39:37.422499    pg v42563: 832 pgs: 832 active+clean; 5803 MB data, 16892 MB used, 277 GB / 300 GB avail
2010-12-13 22:39:49.759698    pg v42564: 832 pgs: 832 active+clean; 5803 MB data, 16893 MB used, 277 GB / 300 GB avail
2010-12-13 22:40:00.477234    pg v42565: 832 pgs: 832 active+clean; 5803 MB data, 16892 MB used, 277 GB / 300 GB avail
2010-12-13 22:40:12.024961    pg v42566: 832 pgs: 832 active+clean; 5803 MB data, 16892 MB used, 277 GB / 300 GB avail
2010-12-13 22:40:25.177157    pg v42567: 832 pgs: 832 active+clean; 5803 MB data, 16892 MB used, 277 GB / 300 GB avail
2010-12-13 22:40:35.827663    pg v42568: 832 pgs: 832 active+clean; 5803 MB data, 16892 MB used, 277 GB / 300 GB avail
2010-12-13 22:40:43.069503   mds e886: 1/1/1 up {0=up:active(laggy or crashed)}
2010-12-13 22:40:55.565163    pg v42569: 832 pgs: 832 active+clean; 5803 MB data, 16891 MB used, 277 GB / 300 GB avail
2010-12-13 22:41:01.910177   mds e887: 1/1/1 up {0=up:active}
2010-12-13 22:41:16.599476   log 2010-12-13 22:41:01.908892 mon0 [2a00:f10:113:1:230:48ff:fe8d:a21f]:6789/0 44 : [INF] mds0 [2a00:f10:113:1:230:48ff:fe8d:a21f]:6800/1987 up:active
2010-12-13 22:41:27.167460    pg v42570: 832 pgs: 832 active+clean; 5803 MB data, 16891 MB used, 277 GB / 300 GB avail
2010-12-13 22:41:38.616370    pg v42571: 832 pgs: 832 active+clean; 5803 MB data, 16891 MB used, 277 GB / 300 GB avail
2010-12-13 22:41:50.188709    pg v42572: 832 pgs: 832 active+clean; 5803 MB data, 16891 MB used, 277 GB / 300 GB avail
2010-12-13 22:42:01.686584    pg v42573: 832 pgs: 832 active+clean; 5803 MB data, 16892 MB used, 277 GB / 300 GB avail
2010-12-13 22:42:12.054956    pg v42574: 832 pgs: 832 active+clean; 5803 MB data, 16892 MB used, 277 GB / 300 GB avail
2010-12-13 22:42:24.908336    pg v42575: 832 pgs: 832 active+clean; 5803 MB data, 16891 MB used, 277 GB / 300 GB avail
2010-12-13 22:42:33.920864    pg v42576: 832 pgs: 832 active+clean; 5803 MB data, 16891 MB used, 277 GB / 300 GB avail
2010-12-13 22:42:45.710302 7f3907e94710 monclient: hunting for new mon
./mon/PGMap.h: In function 'void PGMap::apply_incremental(PGMap::Incremental&)':
./mon/PGMap.h:77: FAILED assert(inc.version == version+1)
 ceph version 0.24~rc (commit:9add26be7698b55e31d9dff73537f1a726f9ee86)
 1: ceph() [0x455a6e]
 2: ceph() [0x458763]
 3: (Admin::ms_dispatch(Message*)+0xe0) [0x46dfa0]
 4: (SimpleMessenger::dispatch_entry()+0x759) [0x473b19]
 5: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x45dd8c]
 6: (Thread::_entry_func(void*)+0xa) [0x4800da]
 7: (()+0x69ca) [0x7f3910b699ca]
 8: (clone()+0x6d) [0x7f390ea7070d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
./mon/PGMap.h: In function 'void PGMap::apply_incremental(PGMap::Incremental&)':
./mon/PGMap.h:77: FAILED assert(inc.version == version+1)
 ceph version 0.24~rc (commit:9add26be7698b55e31d9dff73537f1a726f9ee86)
 1: ceph() [0x455a6e]
 2: ceph() [0x458763]
 3: (Admin::ms_dispatch(Message*)+0xe0) [0x46dfa0]
 4: (SimpleMessenger::dispatch_entry()+0x759) [0x473b19]
 5: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x45dd8c]
 6: (Thread::_entry_func(void*)+0xa) [0x4800da]
 7: (()+0x69ca) [0x7f3910b699ca]
 8: (clone()+0x6d) [0x7f390ea7070d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'
Aborted

The message 'monclient: hunting for new mon' suggests that the monitor crashed and the client switched to another monitor, but there is only one monitor in this cluster.

The cluster layout:

  • 1 monitor
  • 1 MDS
  • 3 OSDs

All on the same machine.

There is nothing special in the monitor logs at those times (the debug level was low).

I'm not sure if I can reproduce it, but the 'hunting for new mon' message seems rather weird with only one monitor.


Related issues 1 (0 open, 1 closed)

Has duplicate Ceph - Bug #656: ceph (Closed, 12/16/2010)

Actions #1

Updated by Sage Weil over 13 years ago

  • Target version set to 19

This is a known issue, caused by the pg state trimming. It'll go away eventually with #647. In the meantime, I'll make the trimming less aggressive so it won't come up so often.
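For readers unfamiliar with the assert in the trace, here is a minimal sketch of the version check it enforces. The only parts taken from this report are the condition `inc.version == version+1` and the function name `apply_incremental`. `MiniPGMap`, `Incremental`, `ApplyResult`, and `run_example` are hypothetical stand-ins, not the real Ceph classes, and the sketch does not model trimming or the change planned in #647. It only shows that an incremental which skips ahead fails the check, and how a client might report that instead of aborting.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-ins for illustration only.
struct Incremental {
    uint64_t version;  // version the map will have after applying this update
};

enum class ApplyResult {
    Applied,  // inc.version == version + 1, map advanced
    Stale,    // inc.version <= version, already seen
    Gap       // inc.version >  version + 1, intermediate updates missing
};

struct MiniPGMap {
    uint64_t version = 0;

    // Same condition as the failed assert, but reported as a return value
    // so the caller can decide what to do (this is an assumption, not the
    // behavior of the real client).
    ApplyResult apply_incremental(const Incremental& inc) {
        if (inc.version <= version) {
            return ApplyResult::Stale;
        }
        if (inc.version != version + 1) {
            return ApplyResult::Gap;  // the case that aborted 'ceph -w'
        }
        version = inc.version;
        return ApplyResult::Applied;
    }
};

// Walks through the three cases, starting from the last version in the log.
inline void run_example() {
    MiniPGMap map;
    map.version = 42576;

    // The next contiguous update applies cleanly.
    assert(map.apply_incremental(Incremental{42577}) == ApplyResult::Applied);
    assert(map.version == 42577);

    // An update that skips versions is rejected and the map is unchanged.
    assert(map.apply_incremental(Incremental{42580}) == ApplyResult::Gap);
    assert(map.version == 42577);

    // A repeated update is ignored.
    assert(map.apply_incremental(Incremental{42577}) == ApplyResult::Stale);
}
```

In a design like this, a caller that sees `Gap` could ask the monitor for a full map rather than another incremental. Whether the real client does that is not stated in this ticket.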

Actions #2

Updated by Sage Weil over 13 years ago

  • Status changed from New to Resolved
Actions #3

Updated by Sage Weil almost 13 years ago

  • Target version deleted (19)