Project

General

Profile

Bug #9144

filestore: commit triggered during journal replay

Added by Sage Weil over 9 years ago. Updated over 9 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
-
Category:
OSD
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
firefly, dumpling
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2014-08-15 05:39:17.703868 7f39a2e6e8c0  2 osd.2 0 mounting /var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2/journal
2014-08-15 05:39:17.703876 7f39a2e6e8c0  5 filestore(/var/lib/ceph/osd/ceph-2) basedir /var/lib/ceph/osd/ceph-2 journal /var/lib/ceph/osd/ceph-2/journal
2014-08-15 05:39:17.703894 7f39a2e6e8c0 10 filestore(/var/lib/ceph/osd/ceph-2) mount fsid is 738c52e8-bfbf-452a-bf91-39ea1afbcdce
2014-08-15 05:39:17.703931 7f39a2e6e8c0  0 filestore(/var/lib/ceph/osd/ceph-2) backend btrfs (magic 0x9123683e)
2014-08-15 05:39:17.706773 7f39a2e6e8c0  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features: FIEMAP ioctl is supported and appears to work
2014-08-15 05:39:17.706792 7f39a2e6e8c0  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2014-08-15 05:39:17.710399 7f39a2e6e8c0  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2014-08-15 05:39:17.710517 7f39a2e6e8c0  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature: CLONE_RANGE ioctl is supported
2014-08-15 05:39:17.714707 7f39a2e6e8c0  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature: SNAP_CREATE is supported
2014-08-15 05:39:17.714894 7f39a2e6e8c0  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature: SNAP_DESTROY is supported
2014-08-15 05:39:17.714991 7f39a2e6e8c0  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature: START_SYNC is supported (transid 26)
2014-08-15 05:39:17.716735 7f39a2e6e8c0  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature: WAIT_SYNC is supported
2014-08-15 05:39:17.718052 7f39a2e6e8c0  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature: SNAP_CREATE_V2 is supported
2014-08-15 05:39:17.718545 7f39a2e6e8c0 10 filestore(/var/lib/ceph/osd/ceph-2)  current/ seq was 2
2014-08-15 05:39:17.718554 7f39a2e6e8c0 10 filestore(/var/lib/ceph/osd/ceph-2)  most recent snap from <1,2,1,2> is 2
2014-08-15 05:39:17.718560 7f39a2e6e8c0 10 filestore(/var/lib/ceph/osd/ceph-2) mount rolling back to consistent snap 2
2014-08-15 05:39:17.718563 7f39a2e6e8c0 10 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) rollback_to: to 'snap_2'
2014-08-15 05:39:17.720583 7f39a2e6e8c0  5 filestore(/var/lib/ceph/osd/ceph-2) mount op_seq is 2
2014-08-15 05:39:17.723597 7f39a2e6e8c0 20 filestore (init)dbobjectmap: seq is 1
2014-08-15 05:39:17.723614 7f39a2e6e8c0 10 filestore(/var/lib/ceph/osd/ceph-2) open_journal at /var/lib/ceph/osd/ceph-2/journal
2014-08-15 05:39:17.723636 7f39a2e6e8c0  0 filestore(/var/lib/ceph/osd/ceph-2) mount: PARALLEL journal mode explicitly enabled in conf
2014-08-15 05:39:17.723638 7f39a2e6e8c0 10 filestore(/var/lib/ceph/osd/ceph-2) list_collections
2014-08-15 05:39:17.725914 7f39a2e6e8c0 10 journal journal_replay fs op_seq 2
2014-08-15 05:39:17.725933 7f399a7e6700 20 filestore(/var/lib/ceph/osd/ceph-2) sync_entry not waiting, force_sync set
2014-08-15 05:39:17.725939 7f399a7e6700 20 filestore(/var/lib/ceph/osd/ceph-2) sync_entry force_sync set
2014-08-15 05:39:17.725943 7f399a7e6700 10 journal commit_start max_applied_seq 0, open_ops 0
2014-08-15 05:39:17.725945 7f399a7e6700 10 journal commit_start blocked, all open_ops have completed
2014-08-15 05:39:17.729188 7f399a7e6700 10 journal commit_start committing 0, still blocked
2014-08-15 05:39:17.729214 7f399a7e6700 10 journal commit_start
2014-08-15 05:39:17.729218 7f399a7e6700 15 filestore(/var/lib/ceph/osd/ceph-2) sync_entry committing 0 sync_epoch 1
2014-08-15 05:39:17.729226 7f39a2e6e8c0  2 journal open /var/lib/ceph/osd/ceph-2/journal fsid 738c52e8-bfbf-452a-bf91-39ea1afbcdce fs_op_seq 2
2014-08-15 05:39:17.729247 7f399a7e6700 10 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) create_checkpoint: 'snap_0'
2014-08-15 05:39:17.729254 7f39a2e6e8c0 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
2014-08-15 05:39:17.729256 7f39a2e6e8c0 10 journal _open journal is not a block device, NOT checking disk write cache on '/var/lib/ceph/osd/ceph-2/journal'
2014-08-15 05:39:17.729286 7f39a2e6e8c0  1 journal _open /var/lib/ceph/osd/ceph-2/journal fd 20: 104857600 bytes, block size 4096 bytes, directio = 1, aio = 0
2014-08-15 05:39:17.729289 7f39a2e6e8c0 10 journal read_header
2014-08-15 05:39:17.729376 7f39a2e6e8c0 10 journal header: block_size 4096 alignment 4096 max_size 104857600
2014-08-15 05:39:17.729385 7f39a2e6e8c0 10 journal header: start 8192
2014-08-15 05:39:17.729386 7f39a2e6e8c0 10 journal  write_pos 4096
2014-08-15 05:39:17.729388 7f39a2e6e8c0 10 journal open header.fsid = 738c52e8-bfbf-452a-bf91-39ea1afbcdce
2014-08-15 05:39:17.729401 7f39a2e6e8c0  2 journal No further valid entries found, journal is most likely valid
2014-08-15 05:39:17.729402 7f39a2e6e8c0 10 journal open reached end of journal.
2014-08-15 05:39:17.729406 7f39a2e6e8c0  2 journal No further valid entries found, journal is most likely valid
2014-08-15 05:39:17.729407 7f39a2e6e8c0  3 journal journal_replay: end of journal, done.
2014-08-15 05:39:17.730587 7f39a2e6e8c0 10 journal make_writeable
2014-08-15 05:39:17.730607 7f39a2e6e8c0 10 journal _open journal is not a block device, NOT checking disk write cache on '/var/lib/ceph/osd/ceph-2/journal'
2014-08-15 05:39:17.730641 7f39a2e6e8c0  1 journal _open /var/lib/ceph/osd/ceph-2/journal fd 20: 104857600 bytes, block size 4096 bytes, directio = 1, aio = 0
2014-08-15 05:39:17.730683 7f39987e2700 10 journal write_thread_entry start
2014-08-15 05:39:17.730699 7f39a2e6e8c0 10 journal journal_start
2014-08-15 05:39:17.730699 7f39987e2700 20 journal prepare_multi_write queue_pos now 8192
2014-08-15 05:39:17.730739 7f3997fe1700 10 journal write_finish_thread_entry enter
2014-08-15 05:39:17.730831 7f3997fe1700 20 journal write_finish_thread_entry sleeping
2014-08-15 05:39:17.730808 7f39987e2700 15 journal do_write writing 8192~0 + header
2014-08-15 05:39:17.732153 7f39a2e6e8c0  2 osd.2 0 boot
2014-08-15 05:39:17.732171 7f39a2e6e8c0 15 filestore(/var/lib/ceph/osd/ceph-2) read meta/23c2fcde/osd_superblock/0//-1 0~0
2014-08-15 05:39:17.735057 7f399a7e6700 20 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) create_checkpoint: async snap create 'snap_0' transid 29
2014-08-15 05:39:17.735170 7f399a7e6700 10 journal commit_started committing 0, unblocking
2014-08-15 05:39:17.735182 7f399a7e6700 20 filestore(/var/lib/ceph/osd/ceph-2)  waiting for checkpoint 29 to complete
2014-08-15 05:39:17.735185 7f399a7e6700 10 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) sync_checkpoint: transid 29 to complete
2014-08-15 05:39:17.735198 7f39a2e6e8c0 10 filestore(/var/lib/ceph/osd/ceph-2) FileStore::read meta/23c2fcde/osd_superblock/0//-1 0~358/358
2014-08-15 05:39:17.735220 7f39a2e6e8c0 10 osd.2 0 read_superblock sb(fd0d793c-a0b1-4588-ae87-192ab57c8d8f osd.2 738c52e8-bfbf-452a-bf91-39ea1afbcdce e0 [0,0] lci=[0,0])
2014-08-15 05:39:17.735264 7f39a2e6e8c0 10 filestore(/var/lib/ceph/osd/ceph-2) stat meta/16ef7597/infos/head//-1 = -2
2014-08-15 05:39:17.735282 7f39a2e6e8c0 10 osd.2 0 init creating/touching infos object
2014-08-15 05:39:17.735296 7f39a2e6e8c0  5 filestore(/var/lib/ceph/osd/ceph-2) queue_transactions new osr(default 0x3b549f0)/0x3b549f0
2014-08-15 05:39:17.735405 7f399a7e6700 20 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) sync_checkpoint: done waiting for transid 29
2014-08-15 05:39:17.735414 7f399a7e6700 20 filestore(/var/lib/ceph/osd/ceph-2)  done waiting for checkpoint29 to complete
2014-08-15 05:39:17.735417 7f399a7e6700 10 filestore(/var/lib/ceph/osd/ceph-2) sync_entry commit took 0.006201, interval was 0.009483
2014-08-15 05:39:17.735743 7f39987e2700 20 journal do_write latency 0.004850
2014-08-15 05:39:17.736851 7f39a2e6e8c0 10 journal op_submit_start 3
2014-08-15 05:39:17.736862 7f39a2e6e8c0  5 filestore(/var/lib/ceph/osd/ceph-2) queue_transactions (parallel) 3 0x7fff8b916630
2014-08-15 05:39:17.736865 7f39a2e6e8c0 10 journal op_journal_transactions 3 0x7fff8b916630
2014-08-15 05:39:17.736873 7f39a2e6e8c0  5 journal submit_entry seq 3 len 105 (0)
2014-08-15 05:39:17.736872 7f399a7e6700 10 journal commit_finish thru 0
2014-08-15 05:39:17.746756 7f399a7e6700  5 journal committed_thru 0 < last_committed_seq 2
2014-08-15 05:39:17.746776 7f39a2e6e8c0  5 filestore(/var/lib/ceph/osd/ceph-2) queue_op 0x3ae7d10 seq 3 osr(default 0x3b549f0) 99 bytes   (queue has 1 ops and 99 bytes)
2014-08-15 05:39:17.747817 7f399a7e6700 -1 os/FileJournal.cc: In function 'virtual void FileJournal::committed_thru(uint64_t)' thread 7f399a7e6700 time 2014-08-15 05:39:17.746773
os/FileJournal.cc: 1510: FAILED assert(seq >= last_committed_seq)

 ceph version 0.83-713-gadecf57 (adecf57fce2b5888b6056a7792b3848875841fce)
 1: (FileJournal::committed_thru(unsigned long)+0x507) [0xa055e7]
 2: (JournalingObjectStore::ApplyManager::commit_finish()+0x82) [0x8e9942]
 3: (FileStore::sync_entry()+0x973) [0x89f183]
 4: (FileStore::SyncThread::entry()+0xd) [0x8d6ffd]
 5: (()+0x8182) [0x7f39a1d69182]
 6: (clone()+0x6d) [0x7f39a02d538d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

ubuntu@teuthology:/var/lib/teuthworker/archive/samuelj-2014-08-14_18:41:07-rados-wip-sam-testing-testing-basic-multi/425503 (but no logs there)

Associated revisions

Revision dd11042f (diff)
Added by Sage Weil over 9 years ago

os/FileStore: fix mount/remount force_sync race

Consider:

- mount
- sync_entry is doing some work
- umount
- set force_sync = true
- set done = true
- sync_entry exits (due to done)
- ..but does not set force_sync = false
- mount
- journal replay starts
- sync_entry sees force_sync and does a commit while op_seq == 0
...crash...

Fixes: #9144
Backport: firefly, dumpling
Signed-off-by: Sage Weil <>

Revision 2f11631f (diff)
Added by Sage Weil over 9 years ago

os/FileStore: fix mount/remount force_sync race

Consider:

- mount
- sync_entry is doing some work
- umount
- set force_sync = true
- set done = true
- sync_entry exits (due to done)
- ..but does not set force_sync = false
- mount
- journal replay starts
- sync_entry sees force_sync and does a commit while op_seq == 0
...crash...

Fixes: #9144
Backport: firefly, dumpling
Signed-off-by: Sage Weil <>
(cherry picked from commit dd11042f969b94f7a461d02e1475794031c79f61)

Conflicts:
src/os/FileStore.cc

Revision 53363986 (diff)
Added by Sage Weil over 9 years ago

os/FileStore: fix mount/remount force_sync race

Consider:

- mount
- sync_entry is doing some work
- umount
- set force_sync = true
- set done = true
- sync_entry exits (due to done)
- ..but does not set force_sync = false
- mount
- journal replay starts
- sync_entry sees force_sync and does a commit while op_seq == 0
...crash...

Fixes: #9144
Backport: firefly, dumpling
Signed-off-by: Sage Weil <>
(cherry picked from commit dd11042f969b94f7a461d02e1475794031c79f61)

Conflicts:
src/os/FileStore.cc

History

#1 Updated by Sage Weil over 9 years ago

  • Status changed from New to Fix Under Review
  • Backport set to firefly, dumpling

#2 Updated by Sage Weil over 9 years ago

  • Status changed from Fix Under Review to Pending Backport

#3 Updated by Sage Weil over 9 years ago

  • Status changed from Pending Backport to Resolved

Also available in: Atom PDF