Ceph - Bug #872: osd: crash due to missing pginfohttps://tracker.ceph.com/issues/872?journal_id=26682011-03-10T10:26:42ZSage Weilsage@newdream.net
<ul><li><strong>Assignee</strong> set to <i>Sage Weil</i></li></ul> Ceph - Bug #872: osd: crash due to missing pginfohttps://tracker.ceph.com/issues/872?journal_id=26702011-03-10T11:43:10ZSage Weilsage@newdream.net
<ul></ul><p>Ah, this is my fault. I made a copy of the files in 3.7c9 in a subdir called 't' (they were missing xattrs... :/) while debugging the old issue. And then when cosd went and removed all objects, the rmdir on 3.7c9_head failed (not empty). And now when it starts up it sees the dir but no info, and crashes. Just remove the dir from the most recent snap_* dir and it should start right up.</p>
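<p>The failure mode described above can be reproduced outside of Ceph. The following is only a minimal sketch, not cosd's actual code: <code>remove_pg_dir</code> and <code>demo</code> are hypothetical helpers, and the object file names are made up. It unlinks the regular files in a directory named like the PG dir, then calls rmdir, which fails because the stray subdir 't' is still there.</p>

```python
import errno
import os
import tempfile


def remove_pg_dir(pg_dir):
    """Unlink every regular file directly inside pg_dir, then rmdir it.

    Returns 0 on success, or the errno of the failed rmdir.
    (Hypothetical stand-in for the object removal described above.)
    """
    for name in os.listdir(pg_dir):
        path = os.path.join(pg_dir, name)
        if os.path.isfile(path):
            os.unlink(path)
    try:
        os.rmdir(pg_dir)
    except OSError as e:
        return e.errno
    return 0


def demo():
    """Build a PG-like dir containing a stray subdir 't' and try to remove it."""
    with tempfile.TemporaryDirectory() as root:
        pg_dir = os.path.join(root, "3.7c9_head")
        os.makedirs(os.path.join(pg_dir, "t"))
        # One "object" in the PG dir, plus a debugging copy inside 't'.
        open(os.path.join(pg_dir, "obj1"), "w").close()
        open(os.path.join(pg_dir, "t", "obj1_copy"), "w").close()

        err = remove_pg_dir(pg_dir)
        # Translate the numeric errno into its symbolic name for readability.
        return errno.errorcode.get(err, str(err)), os.path.isdir(pg_dir)


if __name__ == "__main__":
    print(demo())
```

<p>On Linux the rmdir is expected to fail with ENOTEMPTY (POSIX also permits EEXIST), and the directory is left behind, which matches the "sees the dir but no info" state on the next startup.</p>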
<p>I'm not sure what the proper behavior here should be. We can throw the error when the rmdir fails and crash then? Or log something and continue?</p> Ceph - Bug #872: osd: crash due to missing pginfohttps://tracker.ceph.com/issues/872?journal_id=26712011-03-10T11:56:12ZWido den Hollanderwido@42on.com
<ul></ul><p>I do not think that crashing due to one faulty dir is what I'd do, but on the other hand, it will force an admin to keep the OSD's datadir 'sane', which might prevent further issues in the future.</p>
<p>I could live with either option. Could you make it a config option which defaults to crashing?</p> Ceph - Bug #872: osd: crash due to missing pginfohttps://tracker.ceph.com/issues/872?journal_id=26722011-03-10T12:22:54ZWido den Hollanderwido@42on.com
<ul></ul><p>Just thought about this: is this something an admin would run into? I ran into this due to the recovery issue. But in a real production env, wouldn't you just wipe the OSD? It's not likely to come back, unless someone messes around with the OSD datadir.</p> Ceph - Bug #872: osd: crash due to missing pginfohttps://tracker.ceph.com/issues/872?journal_id=26772011-03-10T13:11:12ZSage Weilsage@newdream.net
<ul></ul><p>Wido den Hollander wrote:</p>
<blockquote>
<p>Just thought about this: is this something an admin would run into? I ran into this due to the recovery issue. But in a real production env, wouldn't you just wipe the OSD? It's not likely to come back, unless someone messes around with the OSD datadir.</p>
</blockquote>
<p>Right. This only happened because I polluted things with files that shouldn't be there.</p>
<p>For now I'll just assert on ENOTEMPTY; that makes the most sense given the existing error handling.</p> Ceph - Bug #872: osd: crash due to missing pginfohttps://tracker.ceph.com/issues/872?journal_id=26782011-03-10T13:13:03ZSage Weilsage@newdream.net
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Resolved</i></li></ul>
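<p>The two behaviors discussed in this thread (assert when the rmdir hits ENOTEMPTY, or log and continue) can be sketched as follows. This is an illustration under stated assumptions, not the actual fix: <code>remove_collection_dir</code> and its <code>strict</code> flag are hypothetical names, with <code>strict=True</code> standing in for the "defaults to crashing" config option that was suggested.</p>

```python
import errno
import logging
import os
import tempfile

log = logging.getLogger("osd_sketch")


def remove_collection_dir(path, strict=True):
    """rmdir `path`; return True if it was removed.

    strict=True  -> treat a non-empty directory as a fatal assertion.
    strict=False -> log a warning, leave the directory, return False.
    Both the function and the flag are hypothetical illustrations.
    """
    try:
        os.rmdir(path)
    except OSError as e:
        # Anything other than "directory not empty" is re-raised unchanged.
        if e.errno not in (errno.ENOTEMPTY, errno.EEXIST):
            raise
        if strict:
            raise AssertionError(
                f"rmdir({path!r}) failed: directory not empty"
            ) from e
        log.warning("leaving non-empty directory %s behind", path)
        return False
    return True


def demo(strict):
    """Try to remove a directory that still holds a stray subdir 't'."""
    with tempfile.TemporaryDirectory() as root:
        pg_dir = os.path.join(root, "3.7c9_head")
        os.makedirs(os.path.join(pg_dir, "t"))
        return remove_collection_dir(pg_dir, strict=strict)
```

<p>With <code>strict=True</code> the problem surfaces at removal time, rather than as a crash on the next startup when the directory is found without its info.</p>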