Bug #631

OSD: FileJournal::committed_thru

Added by Wido den Hollander over 8 years ago. Updated over 8 years ago.

Status: Won't Fix
Priority: Urgent
Category: OSD
Start date: 12/05/2010
% Done: 0%
Regression: No
Severity: 3 - minor

Description

On a small cluster (1 MDS, 1 MON, 3 OSD's) I just saw 2 OSD's crashing with the same backtrace:

2010-12-04 22:01:01.823801 7f7bfd2f1710 osd0 96 update_osd_stat osd_stat(3258 MB used, 99141 MB avail, 102400 MB total, peers []/[])
2010-12-04 22:01:01.823864 7f7bfd2f1710 osd0 96 heartbeat: stat(2010-12-04 22:01:01.823671 oprate=0 qlen=0 recent_qlen=0 rdlat=0 / 0 fshedin=0)
2010-12-04 22:01:01.823898 7f7bfd2f1710 osd0 96 heartbeat: osd_stat(3258 MB used, 99141 MB avail, 102400 MB total, peers []/[])
os/FileJournal.cc: In function 'virtual void FileJournal::committed_thru(uint64_t)':
os/FileJournal.cc:854: FAILED assert(seq >= last_committed_seq)
 ceph version 0.24~rc (commit:23f370436eeb20f02c0d20699f53071a1424cebf)
 1: (FileJournal::committed_thru(unsigned long)+0xb1) [0x58a931]
 2: (JournalingObjectStore::commit_finish()+0x8c) [0x58110c]
 3: (FileStore::sync_entry()+0xd03) [0x573853]
 4: (FileStore::SyncThread::entry()+0xd) [0x507bed]
 5: (Thread::_entry_func(void*)+0xa) [0x4797fa]
 6: (()+0x69ca) [0x7f7c0aa7d9ca]
 7: (clone()+0x6d) [0x7f7c0970970d]

This is on the RC branch.

I've run cdebugpack to collect data; it's stored at logger.ceph.widodh.nl:/srv/ceph/issues/osd_crash_committed_thru

Note that the logs were already rotated before I found out the OSD's crashed, so you should check the *.1.gz files.

osd.1.log (3.92 MB) Wido den Hollander, 12/06/2010 03:13 AM

Associated revisions

Revision b4264fbb (diff)
Added by Sage Weil over 8 years ago

filejournal: reset last_commited_seq if we find journal to be invalid

If we read an event that's later than our expected entry, we set read_pos
to -1 and discard the journal. If that happens we also need to reset
last_committed_seq to avoid a crash like

2010-12-08 17:04:39.246950 7f269d138910 journal commit_finish thru 16904
2010-12-08 17:04:39.246961 7f269d138910 journal committed_thru 16904 < last_committed_seq 37778589
os/FileJournal.cc: In function 'virtual void FileJournal::committed_thru(uint64_t)':
os/FileJournal.cc:854: FAILED assert(seq >= last_committed_seq)
ceph version 0.24~rc (fe10300317383ec29948d7dbe3cb31b3aa277e3c)
1: (FileJournal::committed_thru(unsigned long)+0xad) [0x588e7d]
2: (JournalingObjectStore::commit_finish()+0x8c) [0x57f2ec]
3: (FileStore::sync_entry()+0xcff) [0x5764cf]
4: (FileStore::SyncThread::entry()+0xd) [0x506d9d]
5: (Thread::_entry_func(void*)+0xa) [0x4790ba]
6: /lib/libpthread.so.0 [0x7f26a2f8373a]
7: (clone()+0x6d) [0x7f26a1c2569d]

Fixes #631

Signed-off-by: Sage Weil <>

History

#1 Updated by Sage Weil over 8 years ago

  • Assignee set to Sage Weil
  • Priority changed from Normal to Urgent
  • Target version set to v0.24

#2 Updated by Sage Weil over 8 years ago

Wido, any chance you can reproduce this with 'debug filestore = 20' and 'debug journal = 20'?
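For reference, the requested debug levels would look like this in ceph.conf (placing them under `[osd]` is an assumption; they could equally go in a specific `[osd.N]` section):

```ini
[osd]
        debug filestore = 20
        debug journal = 20
```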

#3 Updated by Wido den Hollander over 8 years ago

Yes, simply starting the OSD's again gave me the same crash on one OSD.

I've attached the log, but here are the last lines:

2010-12-06 11:58:33.041946 7f6b7adde710 filestore(/var/lib/ceph/osd.1) apply queued
2010-12-06 11:58:33.041959 7f6b7adde710 filestore(/var/lib/ceph/osd.1) queue_transactions (writeahead) 50828 0x7f6b7addca90
2010-12-06 11:58:33.356071 7f6b76cd5710 osd1 101 update_osd_stat osd_stat(3848 MB used, 98551 MB avail, 102400 MB total, peers []/[])
2010-12-06 11:58:33.356124 7f6b76cd5710 osd1 101 heartbeat: stat(2010-12-06 11:58:33.355967 oprate=0 qlen=0 recent_qlen=0 rdlat=0 / 0 fshedin=0)
2010-12-06 11:58:33.356153 7f6b76cd5710 osd1 101 heartbeat: osd_stat(3848 MB used, 98551 MB avail, 102400 MB total, peers []/[])
2010-12-06 11:58:34.356340 7f6b76cd5710 osd1 101 update_osd_stat osd_stat(3848 MB used, 98551 MB avail, 102400 MB total, peers []/[])
2010-12-06 11:58:34.356389 7f6b76cd5710 osd1 101 heartbeat: stat(2010-12-06 11:58:34.356229 oprate=0 qlen=0 recent_qlen=0 rdlat=0 / 0 fshedin=0)
2010-12-06 11:58:34.356436 7f6b76cd5710 osd1 101 heartbeat: osd_stat(3848 MB used, 98551 MB avail, 102400 MB total, peers []/[])
2010-12-06 11:58:34.751342 7f6b7e5e5710 filestore(/var/lib/ceph/osd.1) sync_entry woke after 5.000064
2010-12-06 11:58:34.751418 7f6b7e5e5710 filestore(/var/lib/ceph/osd.1) sync_entry committing 50827 sync_epoch 1
2010-12-06 11:58:34.751502 7f6b7e5e5710 filestore(/var/lib/ceph/osd.1) taking snap 'snap_50827'
2010-12-06 11:58:34.849010 7f6b7ede6710 filestore(/var/lib/ceph/osd.1) _journaled_ahead 50828 0x7f6b7addca90
2010-12-06 11:58:35.756652 7f6b76cd5710 osd1 101 update_osd_stat osd_stat(3849 MB used, 98550 MB avail, 102400 MB total, peers []/[])
2010-12-06 11:58:35.756720 7f6b76cd5710 osd1 101 heartbeat: stat(2010-12-06 11:58:35.756514 oprate=0 qlen=0 recent_qlen=0 rdlat=0 / 0 fshedin=0)
2010-12-06 11:58:35.756753 7f6b76cd5710 osd1 101 heartbeat: osd_stat(3849 MB used, 98550 MB avail, 102400 MB total, peers []/[])
2010-12-06 11:58:36.356963 7f6b76cd5710 osd1 101 update_osd_stat osd_stat(3849 MB used, 98550 MB avail, 102400 MB total, peers []/[])
2010-12-06 11:58:36.357022 7f6b76cd5710 osd1 101 heartbeat: stat(2010-12-06 11:58:36.356832 oprate=0 qlen=0 recent_qlen=0 rdlat=0 / 0 fshedin=0)
2010-12-06 11:58:36.357056 7f6b76cd5710 osd1 101 heartbeat: osd_stat(3849 MB used, 98550 MB avail, 102400 MB total, peers []/[])
2010-12-06 11:58:37.357194 7f6b76cd5710 osd1 101 update_osd_stat osd_stat(3849 MB used, 98550 MB avail, 102400 MB total, peers []/[])
2010-12-06 11:58:37.357225 7f6b76cd5710 osd1 101 heartbeat: stat(2010-12-06 11:58:37.357132 oprate=0 qlen=0 recent_qlen=0 rdlat=0 / 0 fshedin=0)
2010-12-06 11:58:37.357246 7f6b76cd5710 osd1 101 heartbeat: osd_stat(3849 MB used, 98550 MB avail, 102400 MB total, peers []/[])
2010-12-06 11:58:38.457381 7f6b76cd5710 osd1 101 update_osd_stat osd_stat(3849 MB used, 98550 MB avail, 102400 MB total, peers []/[])
2010-12-06 11:58:38.457413 7f6b76cd5710 osd1 101 heartbeat: stat(2010-12-06 11:58:38.457320 oprate=0 qlen=0 recent_qlen=0 rdlat=0 / 0 fshedin=0)
2010-12-06 11:58:38.457435 7f6b76cd5710 osd1 101 heartbeat: osd_stat(3849 MB used, 98550 MB avail, 102400 MB total, peers []/[])
2010-12-06 11:58:38.490337 7f6b7e5e5710 filestore(/var/lib/ceph/osd.1) snap create 'snap_50827' got 0 Success
2010-12-06 11:58:38.490372 7f6b7e5e5710 filestore(/var/lib/ceph/osd.1) sync_entry commit took 3.738954
2010-12-06 11:58:38.490389 7f6b7ede6710 filestore(/var/lib/ceph/osd.1) queue_op new osr 0x1a00240/0x1976538
2010-12-06 11:58:38.490409 7f6b7ede6710 filestore(/var/lib/ceph/osd.1) queue_op 0x1966780 seq 50828 2605 bytes   (queue has 1 ops and 2605 bytes)
2010-12-06 11:58:38.490436 7f6b7dde4710 filestore(/var/lib/ceph/osd.1) _do_op 0x1966780 50828 osr 0x1a00240/0x1976538 start
2010-12-06 11:58:38.490453 7f6b7dde4710 filestore(/var/lib/ceph/osd.1) _do_transaction on 0x7f6b7addca90
2010-12-06 11:58:38.490477 7f6b7dde4710 filestore(/var/lib/ceph/osd.1) write /var/lib/ceph/osd.1/current/meta/osdmap.101_0 0~2517
2010-12-06 11:58:38.492198 7f6b7dde4710 filestore(/var/lib/ceph/osd.1) queue_flusher ep 1 fd 14 0~2517 qlen 1
2010-12-06 11:58:38.492219 7f6b7dde4710 filestore(/var/lib/ceph/osd.1) write /var/lib/ceph/osd.1/current/meta/osdmap.101_0 0~2517 = 2517
os/FileJournal.cc: In function 'virtual void FileJournal::committed_thru(uint64_t)':
os/FileJournal.cc:854: FAILED assert(seq >= last_committed_seq)
 ceph version 0.24~rc (commit:23f370436eeb20f02c0d20699f53071a1424cebf)
 1: (FileJournal::committed_thru(unsigned long)+0xb1) [0x58a931]
 2: (JournalingObjectStore::commit_finish()+0x8c) [0x58110c]
 3: (FileStore::sync_entry()+0xd03) [0x573853]
 4: (FileStore::SyncThread::entry()+0xd) [0x507bed]
 5: (Thread::_entry_func(void*)+0xa) [0x4797fa]
 6: (()+0x69ca) [0x7f6b844619ca]
 7: (clone()+0x6d) [0x7f6b830ed70d]

#4 Updated by Sage Weil over 8 years ago

I updated the above comment but I suspect you only looked at the email notification? In any case, can you reproduce with 'debug journal = 20' too? I can't seem to ssh into noisy (timeout).

Thanks!

#5 Updated by Wido den Hollander over 8 years ago

Oh yes, there seems to be an issue with the IPv6 connectivity where 'noisy' is at.

I added debug journal = 20 (I indeed only read the notifications), but right now the OSD's won't crash.

#6 Updated by Sage Weil over 8 years ago

  • Target version changed from v0.24 to v0.24.1

#7 Updated by Sage Weil over 8 years ago

Okay, confirmed it was the same bug you were seeing:

2010-12-06 11:58:28.687261 7f6b84a74720 filestore(/var/lib/ceph/osd.1) mount found snaps <17539,50821,50828,50814>
2010-12-06 11:58:28.687353 7f6b84a74720 filestore(/var/lib/ceph/osd.1) *** curr_seq=50814 cp=50814
2010-12-06 11:58:29.558965 7f6b84a74720 filestore(/var/lib/ceph/osd.1) mount rolled back to consistent snap 50814
2010-12-06 11:58:29.559050 7f6b84a74720 filestore(/var/lib/ceph/osd.1) mount op_seq is 50813
2010-12-06 11:58:29.559069 7f6b84a74720 filestore(/var/lib/ceph/osd.1) open_journal at /dev/osd-data/osd1-journal
2010-12-06 11:58:29.559130 7f6b84a74720 filestore(/var/lib/ceph/osd.1) mount: enabling WRITEAHEAD journal mode: btrfs SNAP_CREATE_ASYNC ioctl not detected (v2.6.37+)
2010-12-06 11:58:29.740815 7f6b84a74720 journal read_entry 237654016 : seq 50829 5355 bytes

The question though is HOW your journal ended up starting in the future relative to your most recent snap...

#8 Updated by Sage Weil over 8 years ago

Okay, pretty sure this was caused by a bug in 2.6.37-rc that was doing an async commit even for the sync snap/subvol create ioctls. That would allow cosd to trim the journal before btrfs had fully committed. It would make sense if at some point the machine crashed while on the 2.6.37-rc kernel and thereafter failed to start (on any kernel) with the above assertion.

Wido, is that possible? If not, I'm at a loss how it got into this state... :/

#9 Updated by Wido den Hollander over 8 years ago

To answer your question: I never saw a crash with 2.6.37; I just rebooted back into 2.6.32.

Just rebooting into 2.6.37 again gave me a crash on one OSD:

2010-12-10 09:15:05.913940 7f366dfa1710 filestore(/var/lib/ceph/osd.0) snap create 'snap_119961' got -1 File exists
2010-12-10 09:15:05.913960 7f366c79e710 filestore(/var/lib/ceph/osd.0) flusher_entry JUST closing 39 (stop=0, ep=3, sync_epoch=4)
2010-12-10 09:15:05.913976 7f366c79e710 filestore(/var/lib/ceph/osd.0) flusher_entry JUST closing 40 (stop=0, ep=3, sync_epoch=4)
2010-12-10 09:15:05.913988 7f366c79e710 filestore(/var/lib/ceph/osd.0) flusher_entry JUST closing 41 (stop=0, ep=3, sync_epoch=4)
2010-12-10 09:15:05.913999 7f366c79e710 filestore(/var/lib/ceph/osd.0) flusher_entry JUST closing 42 (stop=0, ep=3, sync_epoch=4)
2010-12-10 09:15:05.914020 7f366c79e710 filestore(/var/lib/ceph/osd.0) flusher_entry JUST closing 43 (stop=0, ep=3, sync_epoch=4)
2010-12-10 09:15:05.914033 7f366c79e710 filestore(/var/lib/ceph/osd.0) flusher_entry JUST closing 44 (stop=0, ep=3, sync_epoch=4)
2010-12-10 09:15:05.914045 7f366c79e710 filestore(/var/lib/ceph/osd.0) flusher_entry JUST closing 45 (stop=0, ep=3, sync_epoch=4)
2010-12-10 09:15:05.914056 7f366c79e710 filestore(/var/lib/ceph/osd.0) flusher_entry JUST closing 46 (stop=0, ep=3, sync_epoch=4)
2010-12-10 09:15:05.914068 7f366c79e710 filestore(/var/lib/ceph/osd.0) flusher_entry JUST closing 47 (stop=0, ep=3, sync_epoch=4)
2010-12-10 09:15:05.914079 7f366c79e710 filestore(/var/lib/ceph/osd.0) flusher_entry JUST closing 48 (stop=0, ep=3, sync_epoch=4)
2010-12-10 09:15:05.914090 7f366c79e710 filestore(/var/lib/ceph/osd.0) flusher_entry JUST closing 49 (stop=0, ep=3, sync_epoch=4)
2010-12-10 09:15:05.914101 7f366c79e710 filestore(/var/lib/ceph/osd.0) flusher_entry JUST closing 50 (stop=0, ep=3, sync_epoch=4)
2010-12-10 09:15:05.914112 7f366c79e710 filestore(/var/lib/ceph/osd.0) flusher_entry JUST closing 51 (stop=0, ep=3, sync_epoch=4)
2010-12-10 09:15:05.914123 7f366c79e710 filestore(/var/lib/ceph/osd.0) flusher_entry JUST closing 52 (stop=0, ep=3, sync_epoch=4)
2010-12-10 09:15:05.914133 7f366c79e710 filestore(/var/lib/ceph/osd.0) flusher_entry JUST closing 53 (stop=0, ep=3, sync_epoch=4)
2010-12-10 09:15:05.914144 7f366c79e710 filestore(/var/lib/ceph/osd.0) flusher_entry JUST closing 54 (stop=0, ep=3, sync_epoch=4)
2010-12-10 09:15:05.914155 7f366c79e710 filestore(/var/lib/ceph/osd.0) flusher_entry JUST closing 55 (stop=0, ep=3, sync_epoch=4)
2010-12-10 09:15:05.914166 7f366c79e710 filestore(/var/lib/ceph/osd.0) flusher_entry JUST closing 56 (stop=0, ep=3, sync_epoch=4)
2010-12-10 09:15:05.914176 7f366c79e710 filestore(/var/lib/ceph/osd.0) flusher_entry JUST closing 57 (stop=0, ep=3, sync_epoch=4)
2010-12-10 09:15:05.914188 7f366c79e710 filestore(/var/lib/ceph/osd.0) flusher_entry JUST closing 19 (stop=0, ep=3, sync_epoch=4)
2010-12-10 09:15:05.914199 7f366c79e710 filestore(/var/lib/ceph/osd.0) flusher_entry JUST closing 20 (stop=0, ep=3, sync_epoch=4)
2010-12-10 09:15:05.914210 7f366c79e710 filestore(/var/lib/ceph/osd.0) flusher_entry sleeping
os/FileStore.cc: In function 'void FileStore::sync_entry()':
os/FileStore.cc:2337: FAILED assert(r == 0)
 ceph version 0.24~rc (commit:f68e6e7d38f0ed3a5580bfc72f6d16abc1427081)
 1: (FileStore::sync_entry()+0x201f) [0x574cff]
 2: (FileStore::SyncThread::entry()+0xd) [0x507dcd]
 3: (Thread::_entry_func(void*)+0xa) [0x47985a]
 4: (()+0x69ca) [0x7f3673e1d9ca]
 5: (clone()+0x6d) [0x7f3672aa970d]

Just to be sure, I've run cdebugpack and uploaded the contents to logger.ceph.widodh.nl:/srv/ceph/issues/osd_crash_committed_thru, filename: filestore_sync_entry.tar.gz

Rebooting back into 2.6.32 prevented osd0 (the one that crashed) from starting again:

ceph version 0.24~rc (commit:f68e6e7d38f0ed3a5580bfc72f6d16abc1427081)
2010-12-10 11:50:06.123579 7fb80923e720 filestore(/var/lib/ceph/osd.0) test_mount basedir /var/lib/ceph/osd.0 journal /dev/osd-data/osd0-journal
2010-12-10 11:50:06.124484 7fb80923e720 osd0 0 mounting /var/lib/ceph/osd.0 /dev/osd-data/osd0-journal
2010-12-10 11:50:06.124567 7fb80923e720 filestore(/var/lib/ceph/osd.0) basedir /var/lib/ceph/osd.0 journal /dev/osd-data/osd0-journal
2010-12-10 11:50:06.139564 7fb80923e720 filestore(/var/lib/ceph/osd.0) mount FIEMAP ioctl is NOT supported
2010-12-10 11:50:06.139647 7fb80923e720 filestore(/var/lib/ceph/osd.0) mount detected btrfs
2010-12-10 11:50:06.139663 7fb80923e720 filestore(/var/lib/ceph/osd.0) _do_clone_range 0~1
2010-12-10 11:50:06.139676 7fb80923e720 filestore(/var/lib/ceph/osd.0) mount btrfs CLONE_RANGE ioctl is supported
2010-12-10 11:50:06.391953 7fb80923e720 filestore(/var/lib/ceph/osd.0) mount btrfs SNAP_CREATE is supported
2010-12-10 11:50:06.911638 7fb80923e720 filestore(/var/lib/ceph/osd.0) mount btrfs SNAP_DESTROY is supported
2010-12-10 11:50:06.911735 7fb80923e720 filestore(/var/lib/ceph/osd.0) mount btrfs START_SYNC got -1 Operation not permitted
2010-12-10 11:50:06.911767 7fb80923e720 filestore(/var/lib/ceph/osd.0) mount btrfs START_SYNC is NOT supported: Operation not permitted
2010-12-10 11:50:06.911781 7fb80923e720 filestore(/var/lib/ceph/osd.0) mount WARNING: btrfs snaps enabled, but no SNAP_CREATE_ASYNC ioctl (from kernel 2.6.37+)
2010-12-10 11:50:06.911898 7fb80923e720 filestore(/var/lib/ceph/osd.0) mount fsid is 629477477
2010-12-10 11:50:06.912029 7fb80923e720 filestore(/var/lib/ceph/osd.0) mount found snaps <119961,119955,119958>
2010-12-10 11:50:07.079718 7fb80923e720 filestore(/var/lib/ceph/osd.0) *** curr_seq=119961 cp=119958
2010-12-10 11:50:07.079740 7fb80923e720 filestore(/var/lib/ceph/osd.0) 
 ** ERROR: current volume data version is not equal to snapshotted version
           which can lead to data inconsistency. 
           Current version=119961 snapshot version=119958
           Startup with snapshotted version can be forced using the
            'osd use stale snap = true' config option.

Is this due to your recent fix? Checked your diff and it doesn't seem so.

I then rebooted back into 2.6.37 again, but osd0 wouldn't start there either:

2010-12-10 11:54:53.937120 7f473c8c8720 filestore(/var/lib/ceph/osd.0) *** curr_seq=119961 cp=119958
2010-12-10 11:54:53.937181 7f473c8c8720 filestore(/var/lib/ceph/osd.0) 
 ** ERROR: current volume data version is not equal to snapshotted version
           which can lead to data inconsistency. 
           Current version=119961 snapshot version=119958
           Startup with snapshotted version can be forced using the
            'osd use stale snap = true' config option.

Adding 'osd use stale snap = true' to the ceph.conf made the OSD start again, and the cluster recovered as well.
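For anyone hitting the same startup error, the override as it would appear in ceph.conf (placement under `[osd]` is an assumption based on the option name; it could also go in the crashed daemon's `[osd.N]` section):

```ini
[osd]
        osd use stale snap = true
```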

I then (while on 2.6.37) tried to write something to the cluster, just to test if I could crash it afterwards when going back to 2.6.32, but that failed: 'rados -p tmp put obj001 xxxxxxxxx.tar.gz' blocked forever, resulting in:

2010-12-10 12:04:49.597070    pg v40738: 832 pgs: 1 active, 831 active+clean; 5736 MB data, 14384 MB used, 279 GB / 300 GB avail; 1/3080 degraded (0.032%)
2010-12-10 12:04:49.599457   mds e795: 1/1/1 up {0=up:active}
2010-12-10 12:04:49.599497   osd e194: 3 osds: 3 up, 3 in
2010-12-10 12:04:49.599571   log 2010-12-10 12:00:15.878770 mon0 [2a00:f10:113:1:230:48ff:fe8d:a21f]:6789/0 10 : [INF] mds0 [2a00:f10:113:1:230:48ff:fe8d:a21f]:6800/1936 up:active
2010-12-10 12:04:49.599681   class rbd (v1.3 [x86-64])
2010-12-10 12:04:49.599703   mon e1: 1 mons at {noisy=[2a00:f10:113:1:230:48ff:fe8d:a21f]:6789/0}

The OSD logs don't show anything strange to me, so I'm not sure why the cluster wouldn't recover. Restarting the OSD's fixed it.

So I was not able to reproduce this issue. Was this any help?

#10 Updated by Sage Weil over 8 years ago

Okay, the first crash you saw is due to #645. I think it's a kernel bug causing that ioctl to fail.

This is actually the one weakness with the sanity check you're running into: a crash after the commit_op_seq file is updated but before the snapshot create commits will make this come up. See #55... I think we need a better/different fix there.

#11 Updated by Sage Weil over 8 years ago

  • Status changed from New to Won't Fix

Sage Weil wrote:

Okay, the first crash you saw is due to #645. I think it's a kernel bug causing that ioctl to fail.

This is actually the one weakness with the sanity check you're running into: a crash after the commit_op_seq file is updated but before the snapshot create commits will make this come up. See #55... I think we need a better/different fix there.

After talking with Yehuda, the risk is a pretty narrow one. Basically, the modification has to commit without the snapshot committing. That is, an independent commit has to start right between us updating the file and trying to create the snapshot, and then we would also have to crash between that transaction committing and ours committing.

Not going to worry about it for now.
