Bug #6756

closed

journal full hang on startup

Added by Samuel Just over 10 years ago. Updated about 9 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: OSD
Target version: -
% Done: 0%
Source: other
Tags:
Backport: dumpling
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2013-11-10 22:51:06.073780 7fd833945780 2 journal open /var/lib/ceph/osd/ceph-4/journal fsid ed7f3df7-52a6-4fc7-a6ca-96a53465bc10 fs_op_seq 130604
...
2013-11-10 22:51:06.170911 7fd833945780 2 journal read_entry 27324416 : seq 130604 920 bytes
2013-11-10 22:51:06.170918 7fd833945780 2 journal No further valid entries found, journal is most likely valid
2013-11-10 22:51:06.170920 7fd833945780 10 journal open reached end of journal.
2013-11-10 22:51:06.170924 7fd833945780 2 journal No further valid entries found, journal is most likely valid
2013-11-10 22:51:06.170925 7fd833945780 3 journal journal_replay: end of journal, done.
...
2013-11-10 22:51:06.174836 7fd833945780 10 osd.4 1198 load_pgs
2013-11-10 22:51:06.174838 7fd833945780 10 filestore(/var/lib/ceph/osd/ceph-4) list_collections
2013-11-10 22:51:06.175038 7fd833945780 10 osd.4 1198 load_pgs 3.35_TEMP clearing temp
2013-11-10 22:51:06.175072 7fd833945780 20 prefixes
2013-11-10 22:51:06.175085 7fd833945780 5 filestore(/var/lib/ceph/osd/ceph-4) queue_transactions new osr(default 0x2aa6b50)/0x2aa6b50
2013-11-10 22:51:06.175096 7fd833945780 10 journal op_submit_start 130605
2013-11-10 22:51:06.175097 7fd833945780 5 filestore(/var/lib/ceph/osd/ceph-4) queue_transactions (writeahead) 130605 0x7fff6bcbe3a0
2013-11-10 22:51:06.175099 7fd833945780 10 journal op_journal_transactions 130605 0x7fff6bcbe3a0
2013-11-10 22:51:06.175103 7fd833945780 5 journal submit_entry seq 130605 len 57 (0x2fa7f90)
2013-11-10 22:51:06.175116 7fd833945780 10 journal op_submit_finish 130605
2013-11-10 22:51:06.175125 7fd829c57700 20 journal write_thread_entry woke up
2013-11-10 22:51:06.175135 7fd829c57700 10 journal room 4095 max_size 104857600 pos 27324416 header.start 27328512 top 4096
2013-11-10 22:51:06.175138 7fd829c57700 1 journal check_for_full at 27324416 : JOURNAL FULL 27324416 >= 4095 (max_size 104857600 start 27328512)
2013-11-10 22:51:06.175143 7fd829c57700 20 journal prepare_multi_write full on first entry, need to wait
2013-11-10 22:51:06.175145 7fd829c57700 20 journal write_thread_entry full, going to sleep (waiting for commit)
2013-11-10 22:51:11.073901 7fd82a458700 20 filestore(/var/lib/ceph/osd/ceph-4) sync_entry woke after 5.000119
2013-11-10 22:51:11.073924 7fd82a458700 10 journal commit_start max_applied_seq 130604, open_ops 0
2013-11-10 22:51:11.073926 7fd82a458700 10 journal commit_start blocked, all open_ops have completed
2013-11-10 22:51:11.073927 7fd82a458700 10 journal commit_start nothing to do
2013-11-10 22:51:11.073928 7fd82a458700 10 journal commit_start
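
The `room 4095` figure in the write-thread lines is consistent with ring-buffer arithmetic in which the free space is the gap between the write position and `header.start`, minus one byte. The sketch below reproduces the logged numbers. The function name `journal_room` and the formula are a reconstruction from the log values, not a copy of the journal source, and the 4096-byte padded entry size is an assumption based on the logged `top 4096`.

```python
def journal_room(pos: int, start: int, max_size: int, top: int) -> int:
    """Free bytes in a ring-buffer journal (hypothetical reconstruction).

    One byte is held back so that pos == start can only mean "empty".
    """
    if pos >= start:
        # Free space runs from pos to the end, then wraps from top to start.
        return (max_size - pos) + (start - top) - 1
    # Write position is behind start: only the gap between them is free.
    return start - pos - 1


# Values taken from the "journal room" line in the log above.
room = journal_room(pos=27324416, start=27328512, max_size=104857600, top=4096)
print(room)  # 4095

# Assumption: the 57-byte entry is padded to one 4096-byte block on disk.
padded_entry = 4096
print(padded_entry > room)  # True: the entry does not fit
```

With the sync thread reporting `commit_start nothing to do`, nothing in this log advances `header.start`, which would explain why the write thread never wakes.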

Testing fix on wip-queueing2.


Related issues 2 (0 open, 2 closed)

Related to Ceph - Bug #8204: "timed out waiting for admin_socket to appear after osd.5 restart" in upgrade:dumpling-x:stress-split-firefly-distro-basic-vps (Duplicate, 04/24/2014)

Has duplicate Ceph - Bug #9416: ods crash in upgrade:dumpling-dumpling-distro-basic-vps run (Duplicate, Samuel Just, 09/10/2014)

Actions #1

Updated by Samuel Just over 10 years ago

  • Backport set to emperor, dumpling, cuttlefish, bobtail
Actions #2

Updated by Samuel Just over 10 years ago

  • Status changed from 7 to Pending Backport
Actions #3

Updated by Samuel Just over 10 years ago

  • Status changed from Pending Backport to 12
Actions #4

Updated by Ian Colle over 10 years ago

  • Assignee changed from Samuel Just to Dan Mick
Actions #5

Updated by Dan Mick over 10 years ago

  • Status changed from 12 to Need More Info

What's the status on this? The last comment was "Testing fix on wip-queueing2". Does that mean that is when this bug manifested, or that a fix for this bug is in testing?

Actions #6

Updated by Samuel Just over 10 years ago

d8d27f13e11dcaefd3aa1c049b97c980283da575 was my first attempt.
It was reverted in 703f9a09e2449712a99f0865db982cb0c66d820d because it was buggy.
15139462378fae466796a3760fcf0cda80346b81 in wip-queueing2 claims to fix the issues with d8d27f13e11dcaefd3aa1c049b97c980283da575. You probably need to cherry-pick both d8d27f13e11dcaefd3aa1c049b97c980283da575 and 15139462378fae466796a3760fcf0cda80346b81, then verify that the result fixes the original issue and otherwise works.

Actions #7

Updated by Sage Weil about 10 years ago

  • Assignee deleted (Dan Mick)
Actions #8

Updated by Sage Weil about 10 years ago

  • Priority changed from Urgent to High
Actions #9

Updated by Samuel Just about 10 years ago

ubuntu@teuthology:/a/teuthology-2014-04-02_02:30:02-rados-master-testing-basic-plana/161291/remote

Actions #10

Updated by Samuel Just about 10 years ago

  • Priority changed from High to Urgent
Actions #11

Updated by Samuel Just about 10 years ago

  • Status changed from Need More Info to 12
Actions #12

Updated by Samuel Just almost 10 years ago

ubuntu@teuthology:/a/samuelj-2014-05-09_14:11:19-rados-wip-sam-testing-testing-basic-plana/245820

Actions #13

Updated by Sage Weil almost 10 years ago

  • Priority changed from Urgent to High
Actions #14

Updated by Sage Weil over 9 years ago

2014-10-06 17:20:28.761031 7fe98fb98780  5 filestore(/var/lib/ceph/osd/ceph-4) queue_transactions (writeahead) 24210 0x7fff60309a10
2014-10-06 17:20:28.761036 7fe98fb98780 10 journal op_journal_transactions 24210 0x7fff60309a10
2014-10-06 17:20:28.761041 7fe98fb98780  5 journal submit_entry seq 24210 len 57 (0x246fbd0)
2014-10-06 17:20:28.761054 7fe98fb98780 10 journal op_submit_finish 24210
2014-10-06 17:20:28.761066 7fe984b6b700 20 journal write_thread_entry woke up
2014-10-06 17:20:28.761080 7fe984b6b700 10 journal room 4095 max_size 104857600 pos 89251840 header.start 89255936 top 4096
2014-10-06 17:20:28.761092 7fe984b6b700  1 journal check_for_full at 89251840 : JOURNAL FULL 89251840 >= 4095 (max_size 104857600 start 89255936)
2014-10-06 17:20:28.761101 7fe984b6b700 20 journal prepare_multi_write full on first entry, need to wait
2014-10-06 17:20:28.761103 7fe984b6b700 20 journal write_thread_entry full, going to sleep (waiting for commit)

ubuntu@teuthology:/a/teuthology-2014-10-05_10:00:04-upgrade:firefly-firefly-distro-basic-multi/528752
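
For triaging recurrences like this one, the `journal room` line can be parsed mechanically. The sketch below is a hypothetical helper (the names `parse_room_line` and `looks_like_startup_full` are not part of Ceph) that extracts the numeric fields and flags the pattern seen in the logs in this ticket: `header.start` exactly one `top`-sized block ahead of `pos`. The regular expression assumes the field order shown in this log.

```python
import re

# Matches the numeric fields of a "journal room ..." debug line.
ROOM_RE = re.compile(
    r"journal room (?P<room>\d+) max_size (?P<max_size>\d+) "
    r"pos (?P<pos>\d+) header\.start (?P<start>\d+) top (?P<top>\d+)"
)


def parse_room_line(line: str):
    """Return the fields of a 'journal room' line as ints, or None."""
    m = ROOM_RE.search(line)
    if m is None:
        return None
    return {name: int(value) for name, value in m.groupdict().items()}


def looks_like_startup_full(fields) -> bool:
    """Heuristic (an assumption): start sits exactly one block ahead of pos."""
    return fields["start"] - fields["pos"] == fields["top"]


line = ("2014-10-06 17:20:28.761080 7fe984b6b700 10 journal room 4095 "
        "max_size 104857600 pos 89251840 header.start 89255936 top 4096")
fields = parse_room_line(line)
print(fields["room"], looks_like_startup_full(fields))  # 4095 True
```
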
Actions #15

Updated by Sage Weil over 9 years ago

  • Priority changed from High to Urgent
Actions #16

Updated by Sage Weil over 9 years ago

  • Assignee set to Sage Weil
Actions #17

Updated by Sage Weil over 9 years ago

  • Status changed from 12 to Fix Under Review

https://github.com/ceph/ceph/pull/2745

(rebased and retested old patch)

Actions #18

Updated by Samuel Just over 9 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #19

Updated by Loïc Dachary over 9 years ago

  • Backport changed from emperor, dumpling, cuttlefish, bobtail to emperor, dumpling
Actions #20

Updated by Loïc Dachary about 9 years ago

3916c83 JournalingObjectStore: journal->committed_thru after replay (in dumpling)

Actions #21

Updated by Loïc Dachary about 9 years ago

  • Status changed from Pending Backport to Resolved
Actions #22

Updated by Loïc Dachary about 9 years ago

  • Backport changed from emperor, dumpling to dumpling

emperor is end of life
