https://tracker.ceph.com/
2020-06-21T03:08:47Z
Ceph
bluestore - Bug #46124: Potential race condition regression around new OSD flock()s
https://tracker.ceph.com/issues/46124?journal_id=168666
2020-06-21T03:08:47Z
Niklas Hambuechen
<ul></ul><p>I did not experience that in Mimic.</p>
bluestore - Bug #46124: Potential race condition regression around new OSD flock()s
https://tracker.ceph.com/issues/46124?journal_id=168667
2020-06-21T03:18:57Z
Niklas Hambuechen
<ul></ul><p>In case it helps, here are several <code>strace</code> invocations, each showing slightly different behaviour and error messages. They include the <code>close()</code> syscalls, so you can see which pattern leads to the suspected double-lock:</p>
<pre>
# strace -fye flock,close ceph-osd -i 0 --mkfs --osd-uuid f1889a99-b4dc-4044-a244-eabca4539ac9 --setuser ceph --setgroup ceph --osd-objectstore bluestore 2>&1 | grep -E 'flock|Resource|close.*dev/xvd|^\+\+\+ exited'
[pid 16133] close(10</dev/xvdg> <unfinished ...>
[pid 16133] flock(12</dev/xvdg>, LOCK_EX|LOCK_NB) = 0
[pid 16133] close(24</dev/xvdg>) = 0
[pid 16133] close(24</dev/xvdg>) = 0
[pid 16133] close(24</dev/xvdg>) = 0
[pid 16133] close(24</dev/xvdg>) = 0
[pid 16133] flock(24</dev/xvdh>, LOCK_EX|LOCK_NB) = 0
[pid 16133] close(36</dev/xvdh>) = 0
[pid 16133] close(48</dev/xvdg>) = 0
[pid 16133] close(24</dev/xvdh>) = 0
[pid 16133] close(25</dev/xvdh>) = 0
[pid 16133] close(26</dev/xvdh>) = 0
[pid 16133] close(27</dev/xvdh>) = 0
[pid 16133] close(28</dev/xvdh>) = 0
[pid 16133] close(29</dev/xvdh>) = 0
[pid 16133] close(30</dev/xvdh>) = 0
[pid 16133] close(31</dev/xvdh>) = 0
[pid 16133] close(32</dev/xvdh>) = 0
[pid 16133] close(33</dev/xvdh>) = 0
[pid 16133] close(34</dev/xvdh>) = 0
[pid 16133] close(35</dev/xvdh>) = 0
[pid 16133] close(36</dev/xvdg>) = 0
[pid 16133] close(37</dev/xvdg>) = 0
[pid 16133] close(38</dev/xvdg>) = 0
[pid 16133] close(39</dev/xvdg>) = 0
[pid 16133] close(40</dev/xvdg>) = 0
[pid 16133] close(41</dev/xvdg>) = 0
[pid 16133] close(42</dev/xvdg>) = 0
[pid 16133] close(43</dev/xvdg>) = 0
[pid 16133] close(44</dev/xvdg>) = 0
[pid 16133] close(45</dev/xvdg>) = 0
[pid 16133] close(46</dev/xvdg>) = 0
[pid 16133] close(47</dev/xvdg>) = 0
[pid 16133] close(12</dev/xvdg>) = 0
[pid 16133] close(13</dev/xvdg>) = 0
[pid 16133] close(14</dev/xvdg>) = 0
[pid 16133] close(15</dev/xvdg>) = 0
[pid 16133] close(16</dev/xvdg>) = 0
[pid 16133] close(17</dev/xvdg>) = 0
[pid 16133] close(18</dev/xvdg>) = 0
[pid 16133] close(19</dev/xvdg>) = 0
[pid 16133] close(20</dev/xvdg>) = 0
[pid 16133] close(21</dev/xvdg>) = 0
[pid 16133] close(22</dev/xvdg>) = 0
[pid 16133] close(23</dev/xvdg>) = 0
[pid 16133] close(10</dev/xvdg>) = 0
[pid 16133] flock(12</dev/xvdg>, LOCK_EX|LOCK_NB) = -1 EAGAIN (Resource temporarily unavailable)
2020-06-21T03:11:10.417+0000 7fa45796edc0 -1 bdev(0x55babde3c000 /var/lib/ceph/osd/ceph-0/block) _lock flock failed on /var/lib/ceph/osd/ceph-0/block[pid 16133] close(12</dev/xvdg>) = 0
[pid 16133] close(13</dev/xvdg>) = 0
2020-06-21T03:11:10.417+0000 7fa45796edc0 -1 bdev(0x55babde3c000 /var/lib/ceph/osd/ceph-0/block) open failed to lock /var/lib/ceph/osd/ceph-0/block: (11) Resource temporarily unavailable[pid 16133] close(14</dev/xvdg>
[pid 16133] close(15</dev/xvdg>) = 0
[pid 16133] close(16</dev/xvdg>) = 0
[pid 16133] close(17</dev/xvdg>) = 0
[pid 16133] close(18</dev/xvdg>) = 0
[pid 16133] close(19</dev/xvdg>) = 0
[pid 16133] close(20</dev/xvdg>) = 0
[pid 16133] close(21</dev/xvdg>) = 0
[pid 16133] close(22</dev/xvdg>) = 0
[pid 16133] close(23</dev/xvdg>) = 0
2020-06-21T03:11:10.418+0000 7fa45796edc0 -1 OSD::mkfs: couldn't mount ObjectStore: error (11) Resource temporarily unavailable
2020-06-21T03:11:10.419+0000 7fa45796edc0 -1 ** ERROR: error creating empty object store in /var/lib/ceph/osd/ceph-0: (11) Resource temporarily unavailable
+++ exited with 250 +++
# strace -fye flock,close ceph-osd -i 0 --mkfs --osd-uuid f1889a99-b4dc-4044-a244-eabca4539ac9 --setuser ceph --setgroup ceph --osd-objectstore bluestore 2>&1 | grep -E 'flock|Resource|close.*dev/xvd|^\+\+\+ exited'
[pid 19261] close(10</dev/xvdg>) = 0
[pid 19261] flock(12</dev/xvdg>, LOCK_EX|LOCK_NB) = 0
[pid 19261] close(24</dev/xvdg>) = 0
[pid 19261] close(24</dev/xvdg>) = 0
[pid 19261] close(24</dev/xvdg>) = 0
[pid 19261] close(24</dev/xvdg>) = 0
[pid 19261] flock(24</dev/xvdh>, LOCK_EX|LOCK_NB) = 0
[pid 19261] close(36</dev/xvdh>) = 0
[pid 19261] close(48</dev/xvdg>) = 0
[pid 19261] close(24</dev/xvdh>) = 0
[pid 19261] close(25</dev/xvdh>) = 0
[pid 19261] close(26</dev/xvdh>) = 0
[pid 19261] close(27</dev/xvdh>) = 0
[pid 19261] close(28</dev/xvdh>) = 0
[pid 19261] close(29</dev/xvdh>) = 0
[pid 19261] close(30</dev/xvdh>) = 0
[pid 19261] close(31</dev/xvdh>) = 0
[pid 19261] close(32</dev/xvdh>) = 0
[pid 19261] close(33</dev/xvdh>) = 0
[pid 19261] close(34</dev/xvdh>) = 0
[pid 19261] close(35</dev/xvdh>) = 0
[pid 19261] close(36</dev/xvdg>) = 0
[pid 19261] close(37</dev/xvdg>) = 0
[pid 19261] close(38</dev/xvdg>) = 0
[pid 19261] close(39</dev/xvdg>) = 0
[pid 19261] close(40</dev/xvdg>) = 0
[pid 19261] close(41</dev/xvdg>) = 0
[pid 19261] close(42</dev/xvdg>) = 0
[pid 19261] close(43</dev/xvdg>) = 0
[pid 19261] close(44</dev/xvdg>) = 0
[pid 19261] close(45</dev/xvdg>) = 0
[pid 19261] close(46</dev/xvdg>) = 0
[pid 19261] close(47</dev/xvdg>) = 0
[pid 19261] close(12</dev/xvdg>) = 0
[pid 19261] close(13</dev/xvdg>) = 0
[pid 19261] close(14</dev/xvdg>) = 0
[pid 19261] close(15</dev/xvdg>) = 0
[pid 19261] close(16</dev/xvdg>) = 0
[pid 19261] close(17</dev/xvdg>) = 0
[pid 19261] close(18</dev/xvdg>) = 0
[pid 19261] close(19</dev/xvdg>) = 0
[pid 19261] close(20</dev/xvdg>) = 0
[pid 19261] close(21</dev/xvdg>) = 0
[pid 19261] close(22</dev/xvdg>) = 0
[pid 19261] close(23</dev/xvdg>) = 0
[pid 19261] close(10</dev/xvdg>) = 0
[pid 19261] flock(12</dev/xvdg>, LOCK_EX|LOCK_NB) = -1 EAGAIN (Resource temporarily unavailable)
[pid 19261] close(12</dev/xvdg>2020-06-21T03:12:01.110+0000 7fa9791b7dc0 -1 bdev(0x5586e29f4000 /var/lib/ceph/osd/ceph-0/block) _lock flock failed on /var/lib/ceph/osd/ceph-0/block
2020-06-21T03:12:01.110+0000 7fa9791b7dc0 -1 bdev(0x5586e29f4000 /var/lib/ceph/osd/ceph-0/block) open failed to lock /var/lib/ceph/osd/ceph-0/block: (11) Resource temporarily unavailable
[pid 19261] close(13</dev/xvdg>) = 0
[pid 19261] close(14</dev/xvdg>) = 0
[pid 19261] close(15</dev/xvdg>) = 0
[pid 19261] close(16</dev/xvdg>) = 0
[pid 19261] close(17</dev/xvdg>) = 0
[pid 19261] close(18</dev/xvdg>) = 0
[pid 19261] close(19</dev/xvdg>) = 0
[pid 19261] close(20</dev/xvdg>) = 0
[pid 19261] close(21</dev/xvdg>) = 0
[pid 19261] close(22</dev/xvdg>) = 0
[pid 19261] close(23</dev/xvdg>) = 0
+++ exited with 250 +++
# strace -fye flock,close ceph-osd -i 0 --mkfs --osd-uuid f1889a99-b4dc-4044-a244-eabca4539ac9 --setuser ceph --setgroup ceph --osd-objectstore bluestore 2>&1 | grep -E 'flock|Resource|close.*dev/xvd|^\+\+\+ exited'
[pid 24470] close(10</dev/xvdg>) = 0
[pid 24470] flock(12</dev/xvdg>, LOCK_EX|LOCK_NB) = 0
[pid 24470] close(24</dev/xvdg>) = 0
[pid 24470] close(24</dev/xvdg>) = 0
[pid 24470] close(24</dev/xvdg>) = 0
[pid 24470] close(24</dev/xvdg>) = 0
[pid 24470] flock(24</dev/xvdh>, LOCK_EX|LOCK_NB) = 0
[pid 24470] close(36</dev/xvdh>) = 0
[pid 24470] close(48</dev/xvdg>) = 0
[pid 24470] close(24</dev/xvdh>) = 0
[pid 24470] close(25</dev/xvdh>) = 0
[pid 24470] close(26</dev/xvdh>) = 0
[pid 24470] close(27</dev/xvdh>) = 0
[pid 24470] close(28</dev/xvdh>) = 0
[pid 24470] close(29</dev/xvdh>) = 0
[pid 24470] close(30</dev/xvdh>) = 0
[pid 24470] close(31</dev/xvdh>) = 0
[pid 24470] close(32</dev/xvdh>) = 0
[pid 24470] close(33</dev/xvdh>) = 0
[pid 24470] close(34</dev/xvdh>) = 0
[pid 24470] close(35</dev/xvdh>) = 0
[pid 24470] close(36</dev/xvdg>) = 0
[pid 24470] close(37</dev/xvdg>) = 0
[pid 24470] close(38</dev/xvdg>) = 0
[pid 24470] close(39</dev/xvdg>) = 0
[pid 24470] close(40</dev/xvdg>) = 0
[pid 24470] close(41</dev/xvdg>) = 0
[pid 24470] close(42</dev/xvdg>) = 0
[pid 24470] close(43</dev/xvdg>) = 0
[pid 24470] close(44</dev/xvdg>) = 0
[pid 24470] close(45</dev/xvdg>) = 0
[pid 24470] close(46</dev/xvdg>) = 0
[pid 24470] close(47</dev/xvdg>) = 0
[pid 24470] close(12</dev/xvdg>) = 0
[pid 24470] close(13</dev/xvdg>) = 0
[pid 24470] close(14</dev/xvdg>) = 0
[pid 24470] close(15</dev/xvdg>) = 0
[pid 24470] close(16</dev/xvdg>) = 0
[pid 24470] close(17</dev/xvdg>) = 0
[pid 24470] close(18</dev/xvdg>) = 0
[pid 24470] close(19</dev/xvdg>) = 0
[pid 24470] close(20</dev/xvdg>) = 0
[pid 24470] close(21</dev/xvdg>) = 0
[pid 24470] close(22</dev/xvdg>) = 0
[pid 24470] close(23</dev/xvdg>) = 0
[pid 24470] close(10</dev/xvdg>) = 0
[pid 24470] flock(12</dev/xvdg>, LOCK_EX|LOCK_NB) = 0
[pid 24470] close(24</dev/xvdg>) = 0
[pid 24470] close(24</dev/xvdg>) = 0
[pid 24470] close(24</dev/xvdg>) = 0
[pid 24470] close(24</dev/xvdg>) = 0
[pid 24470] flock(24</dev/xvdh>, LOCK_EX|LOCK_NB) = 0
[pid 24470] close(36</dev/xvdh>) = 0
[pid 24470] close(48</dev/xvdg>) = 0
[pid 24470] close(24</dev/xvdh>) = 0
[pid 24470] close(25</dev/xvdh>) = 0
[pid 24470] close(26</dev/xvdh>) = 0
[pid 24470] close(27</dev/xvdh>) = 0
[pid 24470] close(28</dev/xvdh>) = 0
[pid 24470] close(29</dev/xvdh>) = 0
[pid 24470] close(30</dev/xvdh>) = 0
[pid 24470] close(31</dev/xvdh>) = 0
[pid 24470] close(32</dev/xvdh>) = 0
[pid 24470] close(33</dev/xvdh>) = 0
[pid 24470] close(34</dev/xvdh>) = 0
[pid 24470] close(35</dev/xvdh>) = 0
[pid 24470] close(36</dev/xvdg>) = 0
[pid 24470] close(37</dev/xvdg>) = 0
[pid 24470] close(38</dev/xvdg>) = 0
[pid 24470] close(39</dev/xvdg>) = 0
[pid 24470] close(40</dev/xvdg>) = 0
[pid 24470] close(41</dev/xvdg>) = 0
[pid 24470] close(42</dev/xvdg>) = 0
[pid 24470] close(43</dev/xvdg>) = 0
[pid 24470] close(44</dev/xvdg>) = 0
[pid 24470] close(45</dev/xvdg>) = 0
[pid 24470] close(46</dev/xvdg>) = 0
[pid 24470] close(47</dev/xvdg>) = 0
[pid 24470] close(24</dev/xvdg>) = 0
[pid 24470] close(24</dev/xvdg>) = 0
[pid 24470] flock(24</dev/xvdh>, LOCK_EX|LOCK_NB) = -1 EAGAIN (Resource temporarily unavailable)
2020-06-21T03:13:24.640+0000 7f555224bdc0 -1 bdev(0x55873b676700 /var/lib/ceph/osd/ceph-0/block.db) _lock flock failed on /var/lib/ceph/osd/ceph-0/block.db
2020-06-21T03:13:24.640+0000 7f555224bdc0 -1 bdev(0x55873b676700 /var/lib/ceph/osd/ceph-0/block.db) open failed to lock /var/lib/ceph/osd/ceph-0/block.db: (11) Resource temporarily unavailable
[pid 24470] close(24</dev/xvdh>) = 0
[pid 24470] close(25</dev/xvdh>) = 0
[pid 24470] close(26</dev/xvdh>) = 0
[pid 24470] close(27</dev/xvdh>) = 0
[pid 24470] close(28</dev/xvdh>) = 0
[pid 24470] close(29</dev/xvdh>) = 0
[pid 24470] close(30</dev/xvdh>) = 0
[pid 24470] close(31</dev/xvdh>) = 0
[pid 24470] close(32</dev/xvdh>) = 0
[pid 24470] close(33</dev/xvdh>) = 0
[pid 24470] close(34</dev/xvdh>) = 0
[pid 24470] close(35</dev/xvdh>) = 0
2020-06-21T03:13:24.640+0000 7f555224bdc0 -1 bluestore(/var/lib/ceph/osd/ceph-0) _minimal_open_bluefs add block device(/var/lib/ceph/osd/ceph-0/block.db) returned: (11) Resource temporarily unavailable
[pid 24470] close(12</dev/xvdg>) = 0
[pid 24470] close(13</dev/xvdg>) = 0
[pid 24470] close(14</dev/xvdg>) = 0
[pid 24470] close(15</dev/xvdg>) = 0
[pid 24470] close(16</dev/xvdg>) = 0
[pid 24470] close(17</dev/xvdg>) = 0
[pid 24470] close(18</dev/xvdg>) = 0
[pid 24470] close(19</dev/xvdg>) = 0
[pid 24470] close(20</dev/xvdg>) = 0
[pid 24470] close(21</dev/xvdg>) = 0
[pid 24470] close(22</dev/xvdg>) = 0
[pid 24470] close(23</dev/xvdg>) = 0
2020-06-21T03:13:25.006+0000 7f555224bdc0 -1 OSD::mkfs: couldn't mount ObjectStore: error (11) Resource temporarily unavailable
2020-06-21T03:13:25.108+0000 7f555224bdc0 -1 ** ERROR: error creating empty object store in /var/lib/ceph/osd/ceph-0: (11) Resource temporarily unavailable
+++ exited with 250 +++
# strace -fye flock,close ceph-osd -i 0 --mkfs --osd-uuid f1889a99-b4dc-4044-a244-eabca4539ac9 --setuser ceph --setgroup ceph --osd-objectstore bluestore 2>&1 | grep -E 'flock|Resource|close.*dev/xvd|^\+\+\+ exited'
[pid 16441] close(10</dev/xvdg>) = 0
[pid 16441] flock(12</dev/xvdg>, LOCK_EX|LOCK_NB) = 0
[pid 16441] close(24</dev/xvdg>) = 0
[pid 16441] close(24</dev/xvdg>) = 0
[pid 16441] close(24</dev/xvdg>) = 0
[pid 16441] close(24</dev/xvdg>) = 0
[pid 16441] flock(24</dev/xvdh>, LOCK_EX|LOCK_NB) = 0
[pid 16441] close(36</dev/xvdh>) = 0
[pid 16441] close(48</dev/xvdg>) = 0
[pid 16441] close(24</dev/xvdh>) = 0
[pid 16441] close(25</dev/xvdh>) = 0
[pid 16441] close(26</dev/xvdh>) = 0
[pid 16441] close(27</dev/xvdh>) = 0
[pid 16441] close(28</dev/xvdh>) = 0
[pid 16441] close(29</dev/xvdh>) = 0
[pid 16441] close(30</dev/xvdh>) = 0
[pid 16441] close(31</dev/xvdh>) = 0
[pid 16441] close(32</dev/xvdh>) = 0
[pid 16441] close(33</dev/xvdh>) = 0
[pid 16441] close(34</dev/xvdh>) = 0
[pid 16441] close(35</dev/xvdh>) = 0
[pid 16441] close(36</dev/xvdg>) = 0
[pid 16441] close(37</dev/xvdg>) = 0
[pid 16441] close(38</dev/xvdg>) = 0
[pid 16441] close(39</dev/xvdg>) = 0
[pid 16441] close(40</dev/xvdg>) = 0
[pid 16441] close(41</dev/xvdg>) = 0
[pid 16441] close(42</dev/xvdg>) = 0
[pid 16441] close(43</dev/xvdg>) = 0
[pid 16441] close(44</dev/xvdg>) = 0
[pid 16441] close(45</dev/xvdg>) = 0
[pid 16441] close(46</dev/xvdg>) = 0
[pid 16441] close(47</dev/xvdg>) = 0
[pid 16441] close(12</dev/xvdg>) = 0
[pid 16441] close(13</dev/xvdg>) = 0
[pid 16441] close(14</dev/xvdg>) = 0
[pid 16441] close(15</dev/xvdg>) = 0
[pid 16441] close(16</dev/xvdg>) = 0
[pid 16441] close(17</dev/xvdg>) = 0
[pid 16441] close(18</dev/xvdg>) = 0
[pid 16441] close(19</dev/xvdg>) = 0
[pid 16441] close(20</dev/xvdg>) = 0
[pid 16441] close(21</dev/xvdg>) = 0
[pid 16441] close(22</dev/xvdg>) = 0
[pid 16441] close(23</dev/xvdg>) = 0
[pid 16441] close(10</dev/xvdg>) = 0
[pid 16441] flock(12</dev/xvdg>, LOCK_EX|LOCK_NB) = 0
[pid 16441] close(24</dev/xvdg>) = 0
[pid 16441] close(24</dev/xvdg>) = 0
[pid 16441] close(24</dev/xvdg>) = 0
[pid 16441] close(24</dev/xvdg>) = 0
[pid 16441] flock(24</dev/xvdh>, LOCK_EX|LOCK_NB) = 0
[pid 16441] close(36</dev/xvdh>) = 0
[pid 16441] close(48</dev/xvdg>) = 0
[pid 16441] close(24</dev/xvdh>) = 0
[pid 16441] close(25</dev/xvdh>) = 0
[pid 16441] close(26</dev/xvdh>) = 0
[pid 16441] close(27</dev/xvdh>) = 0
[pid 16441] close(28</dev/xvdh>) = 0
[pid 16441] close(29</dev/xvdh>) = 0
[pid 16441] close(30</dev/xvdh>) = 0
[pid 16441] close(31</dev/xvdh>) = 0
[pid 16441] close(32</dev/xvdh>) = 0
[pid 16441] close(33</dev/xvdh>) = 0
[pid 16441] close(34</dev/xvdh>) = 0
[pid 16441] close(35</dev/xvdh>) = 0
[pid 16441] close(36</dev/xvdg>) = 0
[pid 16441] close(37</dev/xvdg>) = 0
[pid 16441] close(38</dev/xvdg>) = 0
[pid 16441] close(39</dev/xvdg>) = 0
[pid 16441] close(40</dev/xvdg>) = 0
[pid 16441] close(41</dev/xvdg>) = 0
[pid 16441] close(42</dev/xvdg>) = 0
[pid 16441] close(43</dev/xvdg>) = 0
[pid 16441] close(44</dev/xvdg>) = 0
[pid 16441] close(45</dev/xvdg>) = 0
[pid 16441] close(46</dev/xvdg>) = 0
[pid 16441] close(47</dev/xvdg>) = 0
[pid 16441] close(24</dev/xvdg>) = 0
[pid 16441] close(24</dev/xvdg>) = 0
[pid 16441] flock(24</dev/xvdh>, LOCK_EX|LOCK_NB) = 0
[pid 16441] close(36</dev/xvdh>) = 0
[pid 16441] close(48</dev/xvdg>) = 0
[pid 16441] close(48</dev/xvdg>) = 0
[pid 16441] close(48</dev/xvdg>) = 0
[pid 16441] close(48</dev/xvdg>) = 0
[pid 16441] close(48</dev/xvdg>) = 0
[pid 16441] close(48</dev/xvdg>) = 0
[pid 16441] close(48</dev/xvdg>) = 0
[pid 16441] close(48</dev/xvdg>) = 0
[pid 16441] close(24</dev/xvdh>) = 0
[pid 16441] close(25</dev/xvdh>) = 0
[pid 16441] close(26</dev/xvdh>) = 0
[pid 16441] close(27</dev/xvdh>) = 0
[pid 16441] close(28</dev/xvdh>) = 0
[pid 16441] close(29</dev/xvdh>) = 0
[pid 16441] close(30</dev/xvdh>) = 0
[pid 16441] close(31</dev/xvdh>) = 0
[pid 16441] close(32</dev/xvdh>) = 0
[pid 16441] close(33</dev/xvdh>) = 0
[pid 16441] close(34</dev/xvdh>) = 0
[pid 16441] close(35</dev/xvdh>) = 0
[pid 16441] close(36</dev/xvdg>) = 0
[pid 16441] close(37</dev/xvdg>) = 0
[pid 16441] close(38</dev/xvdg>) = 0
[pid 16441] close(39</dev/xvdg>) = 0
[pid 16441] close(40</dev/xvdg>) = 0
[pid 16441] close(41</dev/xvdg>) = 0
[pid 16441] close(42</dev/xvdg>) = 0
[pid 16441] close(43</dev/xvdg>) = 0
[pid 16441] close(44</dev/xvdg>) = 0
[pid 16441] close(45</dev/xvdg>) = 0
[pid 16441] close(46</dev/xvdg>) = 0
[pid 16441] close(47</dev/xvdg>) = 0
[pid 16441] close(12</dev/xvdg>) = 0
[pid 16441] close(13</dev/xvdg>) = 0
[pid 16441] close(14</dev/xvdg>) = 0
[pid 16441] close(15</dev/xvdg>) = 0
[pid 16441] close(16</dev/xvdg>) = 0
[pid 16441] close(17</dev/xvdg>) = 0
[pid 16441] close(18</dev/xvdg>) = 0
[pid 16441] close(19</dev/xvdg>) = 0
[pid 16441] close(20</dev/xvdg>) = 0
[pid 16441] close(21</dev/xvdg>) = 0
[pid 16441] close(22</dev/xvdg>) = 0
[pid 16441] close(23</dev/xvdg>) = 0
+++ exited with 0 +++
</pre>
bluestore - Bug #46124: Potential race condition regression around new OSD flock()s
https://tracker.ceph.com/issues/46124?journal_id=168673
2020-06-21T13:53:37Z
Niklas Hambuechen
<ul></ul><p>Another question:</p>
<p>Would it not be better to use OFD locks (Open File Description locks), that is via <code>F_OFD_SETLK</code>? See</p>
<ul>
<li><a class="external" href="https://www.gnu.org/software/libc/manual/html_node/Open-File-Description-Locks.html">https://www.gnu.org/software/libc/manual/html_node/Open-File-Description-Locks.html</a></li>
</ul>
<p>They are the newest of the 3 types of file locking (classical <code>fcntl(F_SETLK)</code>, <code>flock()</code>, and <code>fcntl(F_OFD_SETLK)</code>), and also solve the problem of</p>
<blockquote>
<p>classic posix lock gotcha: if you close any fd to the same inode from the process, the lock(s) go away.</p>
</blockquote>
<p>that the PR aimed to address.</p>
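<p>For illustration, a minimal standalone sketch (not Ceph code; the device path, flags, and error handling are only examples) of taking an exclusive, non-blocking OFD lock. It needs Linux 3.15 or newer and <code>_GNU_SOURCE</code> for <code>F_OFD_SETLK</code>:</p>
<pre><code class="c syntaxhl">/* Hypothetical demo, not Ceph code: take an exclusive, non-blocking
 * OFD lock on a device.  Path and error handling are illustrative. */
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
  int fd = open("/dev/xvdg", O_RDWR | O_CLOEXEC);   /* example device */
  if (fd < 0) { perror("open"); return 1; }

  struct flock fl;
  memset(&fl, 0, sizeof(fl));
  fl.l_type   = F_WRLCK;    /* exclusive */
  fl.l_whence = SEEK_SET;
  fl.l_start  = 0;
  fl.l_len    = 0;          /* 0 = whole file */
  fl.l_pid    = 0;          /* must be 0 for OFD locks */

  /* Non-blocking, like flock(LOCK_EX|LOCK_NB); the lock is owned by
   * this open file description, not by the process. */
  if (fcntl(fd, F_OFD_SETLK, &fl) == -1) {
    if (errno == EAGAIN || errno == EACCES)
      fprintf(stderr, "device already locked elsewhere\n");
    else
      perror("fcntl(F_OFD_SETLK)");
    close(fd);
    return 1;
  }

  /* ... use the device; the lock goes away only when the last fd
   * referring to this open file description is closed ... */
  close(fd);
  return 0;
}
</code></pre>
<p>Because the lock belongs to the open file description rather than the process, it is only dropped when the last descriptor referring to that description is closed, which avoids the "any close() drops the lock" gotcha quoted above.</p>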
bluestore - Bug #46124: Potential race condition regression around new OSD flock()s
https://tracker.ceph.com/issues/46124?journal_id=168678
2020-06-21T17:27:30Z
Niklas Hambuechen
<ul><li><strong>File</strong> <a href="/attachments/download/4924/ceph-issue-46124-strace-flock-close-clone.txt">ceph-issue-46124-strace-flock-close-clone.txt</a> added</li></ul><p>From the strace above, we can see that there is always a matching <code>close()</code> after each <code>flock()</code> within the same PID, so the single thread seems to behave correctly.</p>
<p>I suspect that Ceph starts other threads (using <code>clone()</code> on Linux) while the lock is held, which leads to the lock still being held even though the thread that originally took the lock has already <code>close()</code>d its file descriptor.</p>
<p>The attached strace output (2 runs) shows a failing and a working case, with <code>clone()</code> syscalls and process births/deaths included.</p>
<p>From <code>man 2 flock</code>:</p>
<blockquote>
<p>Locks created by flock() are associated with an open file description (see open(2)). This means that duplicate file descriptors (created by, for example, fork(2) or dup(2)) refer to the same lock,</p>
</blockquote>
<p>This would also explain why it worked before: <code>fcntl(F_SETLK)</code> locks are process-local, so they would not raise this error.</p>
<p>The main thing I do not understand is how this could make it into two Ceph releases (v14 and v15) without being noticed, as for me it quite reliably breaks both OSD creation and OSD process startup.</p>
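<p>A minimal standalone sketch (hypothetical demo, not Ceph code; the path is just an example) of the man-page behaviour quoted above: after a <code>fork()</code>, the child holds a duplicate of the descriptor, so the <code>flock()</code> lock survives the parent's <code>close()</code>, and a fresh <code>open()</code> + <code>flock()</code> in the parent fails with EAGAIN, the same pattern as in the strace:</p>
<pre><code class="c syntaxhl">/* Hypothetical demo: a fork()ed child inherits a duplicate of fd, so
 * the flock() lock stays in place after the parent close()s its copy. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
  const char *path = "/tmp/flock-demo";            /* example path */
  int fd = open(path, O_RDWR | O_CREAT, 0600);
  flock(fd, LOCK_EX | LOCK_NB);                    /* parent takes the lock */

  pid_t pid = fork();                              /* child inherits a dup of fd */
  if (pid < 0) { perror("fork"); return 1; }
  if (pid == 0) {                                  /* child: just keep fd open */
    sleep(2);
    _exit(0);
  }

  close(fd);                                       /* parent closes its copy ... */

  int fd2 = open(path, O_RDWR);                    /* ... and reopens the file */
  if (flock(fd2, LOCK_EX | LOCK_NB) == -1 && errno == EAGAIN)
    puts("EAGAIN: lock still held via the child's duplicated fd");

  waitpid(pid, NULL, 0);
  return 0;
}
</code></pre>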
bluestore - Bug #46124: Potential race condition regression around new OSD flock()s
https://tracker.ceph.com/issues/46124?journal_id=168680
2020-06-21T21:09:51Z
Niklas Hambuechen
<ul></ul><blockquote>
<p>I suspect that Ceph starts other threads (using clone() on Linux) while the lock is held</p>
</blockquote>
<p>Sorry, this should say threads/processes. Threads should not be relevant, because they share file descriptor state.</p>
<p>From the attached strace, the only <code>clone()</code> that creates a real process rather than a thread is at the very beginning, even before the first <code>flock()</code>, so my suspicion above is probably not well-founded.</p>
bluestore - Bug #46124: Potential race condition regression around new OSD flock()s
https://tracker.ceph.com/issues/46124?journal_id=169603
2020-07-02T14:09:33Z
Neha Ojha
nojha@redhat.com
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Closed</i></li></ul><p>Please feel free to reopen if you find a real bug somewhere.</p>
bluestore - Bug #46124: Potential race condition regression around new OSD flock()s
https://tracker.ceph.com/issues/46124?journal_id=169788
2020-07-06T22:30:31Z
Niklas Hambuechen
<ul></ul><p>@Neha Ojha:</p>
<p>I think the bug remains as real as it gets; I did not retract that this is a bug. With my last comment I only stated that my suspicion about the underlying cause of the bug is probably wrong.</p>
<p>How can I reopen the ticket?</p>
<p>I'm not sure I have sufficient Redmine permissions for that.</p>
bluestore - Bug #46124: Potential race condition regression around new OSD flock()s
https://tracker.ceph.com/issues/46124?journal_id=169789
2020-07-06T22:33:27Z
Neha Ojha
nojha@redhat.com
<ul><li><strong>Status</strong> changed from <i>Closed</i> to <i>New</i></li></ul>
bluestore - Bug #46124: Potential race condition regression around new OSD flock()s
https://tracker.ceph.com/issues/46124?journal_id=169951
2020-07-08T09:33:45Z
Igor Fedotov
igor.fedotov@croit.io
<ul></ul><p>I'm able to reproduce the issue with src/objectstore/store_test on pretty powerful HW:</p>
<pre>
 -10> 2020-07-08T09:28:28.175+0000 7f1ac982e0c0 -1 bdev(0x5596ad2eea80 bluestore.test_temp_dir/block.db) _lock flock failed on bluestore.test_temp_dir/block.db 11
  -9> 2020-07-08T09:28:28.175+0000 7f1ac982e0c0 -1 bdev(0x5596ad2eea80 bluestore.test_temp_dir/block.db) open failed to lock bluestore.test_temp_dir/block.db: (11) Resource temporarily unavailable
</pre>
bluestore - Bug #46124: Potential race condition regression around new OSD flock()s
https://tracker.ceph.com/issues/46124?journal_id=169952
2020-07-08T09:34:37Z
Igor Fedotov
igor.fedotov@croit.io
<ul><li><strong>Pull request ID</strong> set to <i>34566</i></li></ul>
bluestore - Bug #46124: Potential race condition regression around new OSD flock()s
https://tracker.ceph.com/issues/46124?journal_id=171993
2020-07-30T14:27:18Z
Neha Ojha
nojha@redhat.com
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Fix Under Review</i></li></ul>
bluestore - Bug #46124: Potential race condition regression around new OSD flock()s
https://tracker.ceph.com/issues/46124?journal_id=173981
2020-08-28T10:36:09Z
Igor Fedotov
igor.fedotov@croit.io
<ul><li><strong>Status</strong> changed from <i>Fix Under Review</i> to <i>Resolved</i></li></ul>
bluestore - Bug #46124: Potential race condition regression around new OSD flock()s
https://tracker.ceph.com/issues/46124?journal_id=174986
2020-09-14T14:02:48Z
Igor Fedotov
igor.fedotov@croit.io
<ul><li><strong>Backport</strong> set to <i>octopus, nautilus</i></li></ul>
bluestore - Bug #46124: Potential race condition regression around new OSD flock()s
https://tracker.ceph.com/issues/46124?journal_id=174987
2020-09-14T14:04:52Z
Igor Fedotov
igor.fedotov@croit.io
<ul><li><strong>Status</strong> changed from <i>Resolved</i> to <i>Pending Backport</i></li></ul>
bluestore - Bug #46124: Potential race condition regression around new OSD flock()s
https://tracker.ceph.com/issues/46124?journal_id=174998
2020-09-14T15:31:21Z
Niklas Hambuechen
<ul></ul><p>I believe this bug was not fixed properly; the added commit "just retries 3 times".</p>
<p>This makes success more likely, but does not address the underlying race condition, so I don't think this bug should be auto-closed by that commit. (Retrying indefinitely seems like the more correct fix to me.)</p>
<p>See:</p>
<ul>
<li><a class="external" href="https://github.com/ceph/ceph/pull/34566#pullrequestreview-446799112">https://github.com/ceph/ceph/pull/34566#pullrequestreview-446799112</a></li>
<li><a class="external" href="https://github.com/ceph/ceph/pull/34566#issuecomment-692133157">https://github.com/ceph/ceph/pull/34566#issuecomment-692133157</a></li>
</ul>
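<p>For comparison, a sketch of what an unbounded retry could look like (illustrative only, not the code from the PR; the function name and interval are made up):</p>
<pre><code class="c syntaxhl">#include <errno.h>
#include <sys/file.h>
#include <unistd.h>

/* Keep retrying the non-blocking exclusive lock until the competing
 * open file description disappears, instead of giving up after 3 tries. */
static int lock_bdev_retry_forever(int fd)
{
  for (;;) {
    if (flock(fd, LOCK_EX | LOCK_NB) == 0)
      return 0;                    /* got the lock */
    if (errno != EAGAIN)
      return -errno;               /* a real error: give up */
    usleep(100 * 1000);            /* stale lock from a dup'd fd: wait and retry */
  }
}
</code></pre>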
bluestore - Bug #46124: Potential race condition regression around new OSD flock()s
https://tracker.ceph.com/issues/46124?journal_id=175000
2020-09-14T15:54:30Z
Kefu Chai
tchaikov@gmail.com
<ul><li><strong>Status</strong> changed from <i>Pending Backport</i> to <i>New</i></li></ul><p>per the discussion in <a class="external" href="https://github.com/ceph/ceph/pull/34566">https://github.com/ceph/ceph/pull/34566</a>, i am reopening this ticket.</p>
bluestore - Bug #46124: Potential race condition regression around new OSD flock()s
https://tracker.ceph.com/issues/46124?journal_id=175001
2020-09-14T15:59:52Z
Kefu Chai
tchaikov@gmail.com
<ul><li><strong>Assignee</strong> set to <i>Kefu Chai</i></li></ul>
bluestore - Bug #46124: Potential race condition regression around new OSD flock()s
https://tracker.ceph.com/issues/46124?journal_id=175026
2020-09-15T06:38:25Z
Kefu Chai
tchaikov@gmail.com
<ul></ul><p>per <a class="external" href="https://repo.or.cz/glibc.git/blob/HEAD:/sysdeps/posix/flock.c">https://repo.or.cz/glibc.git/blob/HEAD:/sysdeps/posix/flock.c</a>,</p>
<pre><code class="c syntaxhl"><span class="CodeRay"><span class="predefined-type">int</span>
__flock (<span class="predefined-type">int</span> fd, <span class="predefined-type">int</span> operation)
{
<span class="keyword">struct</span> flock lbuf;
<span class="keyword">switch</span> (operation & ~LOCK_NB)
{
<span class="keyword">case</span> LOCK_SH:
lbuf.l_type = F_RDLCK;
<span class="keyword">break</span>;
<span class="keyword">case</span> LOCK_EX:
lbuf.l_type = F_WRLCK;
<span class="keyword">break</span>;
<span class="keyword">case</span> LOCK_UN:
lbuf.l_type = F_UNLCK;
<span class="keyword">break</span>;
<span class="keyword">default</span>:
__set_errno (EINVAL);
<span class="keyword">return</span> -<span class="integer">1</span>;
}
lbuf.l_whence = SEEK_SET;
lbuf.l_start = lbuf.l_len = <span class="integer">0L</span>; <span class="comment">/* Lock the whole file. */</span>
<span class="keyword">return</span> __fcntl (fd, (operation & LOCK_NB) ? F_SETLK : F_SETLKW, &lbuf);
}
</span></code></pre>
<p>the change of <a class="external" href="https://github.com/ceph/ceph/pull/26245/files#diff-af76abecbe3293ccd298b6deded4ca49L58-L63">https://github.com/ceph/ceph/pull/26245/files#diff-af76abecbe3293ccd298b6deded4ca49L58-L63</a> in <a class="external" href="https://github.com/ceph/ceph/pull/26245">https://github.com/ceph/ceph/pull/26245</a> does not change anything.</p>
bluestore - Bug #46124: Potential race condition regression around new OSD flock()s
https://tracker.ceph.com/issues/46124?journal_id=175033
2020-09-15T08:06:59Z
Kefu Chai
tchaikov@gmail.com
<ul></ul><p>i think we also need to include <a class="external" href="https://github.com/ceph/ceph/pull/37153">https://github.com/ceph/ceph/pull/37153</a> for properly returning the right errno.</p>
bluestore - Bug #46124: Potential race condition regression around new OSD flock()s
https://tracker.ceph.com/issues/46124?journal_id=175035
2020-09-15T08:09:06Z
Kefu Chai
tchaikov@gmail.com
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Fix Under Review</i></li></ul>
bluestore - Bug #46124: Potential race condition regression around new OSD flock()s
https://tracker.ceph.com/issues/46124?journal_id=176342
2020-09-30T13:56:11Z
Igor Fedotov
igor.fedotov@croit.io
<ul><li><strong>Status</strong> changed from <i>Fix Under Review</i> to <i>Pending Backport</i></li></ul>
bluestore - Bug #46124: Potential race condition regression around new OSD flock()s
https://tracker.ceph.com/issues/46124?journal_id=176383
2020-09-30T15:42:16Z
Nathan Cutler
ncutler@suse.cz
<ul><li><strong>Copied to</strong> <i><a class="issue tracker-9 status-3 priority-4 priority-default closed" href="/issues/47707">Backport #47707</a>: nautilus: Potential race condition regression around new OSD flock()s</i> added</li></ul>
bluestore - Bug #46124: Potential race condition regression around new OSD flock()s
https://tracker.ceph.com/issues/46124?journal_id=176385
2020-09-30T15:42:23Z
Nathan Cutler
ncutler@suse.cz
<ul><li><strong>Copied to</strong> <i><a class="issue tracker-9 status-3 priority-4 priority-default closed" href="/issues/47708">Backport #47708</a>: octopus: Potential race condition regression around new OSD flock()s</i> added</li></ul>
bluestore - Bug #46124: Potential race condition regression around new OSD flock()s
https://tracker.ceph.com/issues/46124?journal_id=182012
2021-01-05T15:02:40Z
Nathan Cutler
ncutler@suse.cz
<ul><li><strong>Status</strong> changed from <i>Pending Backport</i> to <i>Resolved</i></li></ul><p>While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".</p>