mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
synced 2025-08-26 21:52:20 +00:00

Several flags are updated and checked only under namespace_sem; we are already making use of that when we are checking them without mount_lock, but we have to hold mount_lock for all updates, which makes things clumsier than they have to be. Take MNT_SHARED, MNT_UNBINDABLE, MNT_MARKED and MNT_UMOUNT_CANDIDATE into a separate field (->mnt_t_flags), renaming them to T_SHARED, etc. to avoid confusion. All accesses must be under namespace_sem. That changes locking requirements for mnt_change_propagation() and set_mnt_shared() - only namespace_sem is needed now. The same goes for SET_MNT_MARKED et.al. There might be more flags moved from ->mnt_flags to that field; this is just the initial set. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
485 lines
21 KiB
Plaintext
485 lines
21 KiB
Plaintext
Notes on propagate_umount()
|
|
|
|
Umount propagation starts with a set of mounts we are already going to
|
|
take out. Ideally, we would like to add all downstream cognates to
|
|
that set - anything with the same mountpoint as one of the removed
|
|
mounts and with parent that would receive events from the parent of that
|
|
mount. However, there are some constraints the resulting set must
|
|
satisfy.
|
|
|
|
It is convenient to define several properties of sets of mounts:
|
|
|
|
1) A set S of mounts is non-shifting if for any mount X belonging
|
|
to S all subtrees mounted strictly inside of X (i.e. not overmounting
|
|
the root of X) contain only elements of S.
|
|
|
|
2) A set S is non-revealing if all locked mounts that belong to S have
|
|
parents that also belong to S.
|
|
|
|
3) A set S is closed if it contains all children of its elements.
|
|
|
|
The set of mounts taken out by umount(2) must be non-shifting and
|
|
non-revealing; the first constraint is what allows to reparent
|
|
any remaining mounts and the second is what prevents the exposure
|
|
of any concealed mountpoints.
|
|
|
|
propagate_umount() takes the original set as an argument and tries to
|
|
extend that set. The original set is a full subtree and its root is
|
|
unlocked; what matters is that it's closed and non-revealing.
|
|
Resulting set may not be closed; there might still be mounts outside
|
|
of that set, but only on top of stacks of root-overmounting elements
|
|
of set. They can be reparented to the place where the bottom of
|
|
stack is attached to a mount that will survive. NOTE: doing that
|
|
will violate a constraint on having no more than one mount with
|
|
the same parent/mountpoint pair; however, the caller (umount_tree())
|
|
will immediately remedy that - it may keep unmounted element attached
|
|
to parent, but only if the parent itself is unmounted. Since all
|
|
conflicts created by reparenting have common parent *not* in the
|
|
set and one side of the conflict (bottom of the stack of overmounts)
|
|
is in the set, it will be resolved. However, we rely upon umount_tree()
|
|
doing that pretty much immediately after the call of propagate_umount().
|
|
|
|
Algorithm is based on two statements:
|
|
1) for any set S, there is a maximal non-shifting subset of S
|
|
and it can be calculated in O(#S) time.
|
|
2) for any non-shifting set S, there is a maximal non-revealing
|
|
subset of S. That subset is also non-shifting and it can be calculated
|
|
in O(#S) time.
|
|
|
|
Finding candidates.
|
|
|
|
We are given a closed set U and we want to find all mounts that have
|
|
the same mountpoint as some mount m in U *and* whose parent receives
|
|
propagation from the parent of the same mount m. Naive implementation
|
|
would be
|
|
S = {}
|
|
for each m in U
|
|
add m to S
|
|
p = parent(m)
|
|
for each q in Propagation(p) - {p}
|
|
child = look_up(q, mountpoint(m))
|
|
if child
|
|
add child to S
|
|
but that can lead to excessive work - there might be propagation among the
|
|
subtrees of U, in which case we'd end up examining the same candidates
|
|
many times. Since propagation is transitive, the same will happen to
|
|
everything downstream of that candidate and it's not hard to construct
|
|
cases where the approach above leads to the time quadratic by the actual
|
|
number of candidates.
|
|
|
|
Note that if we run into a candidate we'd already seen, it must've been
|
|
added on an earlier iteration of the outer loop - all additions made
|
|
during one iteration of the outer loop have different parents. So
|
|
if we find a child already added to the set, we know that everything
|
|
in Propagation(parent(child)) with the same mountpoint has been already
|
|
added.
|
|
S = {}
|
|
for each m in U
|
|
if m in S
|
|
continue
|
|
add m to S
|
|
p = parent(m)
|
|
q = propagation_next(p, p)
|
|
while q
|
|
child = look_up(q, mountpoint(m))
|
|
if child
|
|
if child in S
|
|
q = skip_them(q, p)
|
|
continue;
|
|
add child to S
|
|
q = propagation_next(q, p)
|
|
where
|
|
skip_them(q, p)
|
|
keep walking Propagation(p) from q until we find something
|
|
not in Propagation(q)
|
|
|
|
would get rid of that problem, but we need a sane implementation of
|
|
skip_them(). That's not hard to do - split propagation_next() into
|
|
"down into mnt_slave_list" and "forward-and-up" parts, with the
|
|
skip_them() being "repeat the forward-and-up part until we get NULL
|
|
or something that isn't a peer of the one we are skipping".
|
|
|
|
Note that there can be no absolute roots among the extra candidates -
|
|
they all come from mount lookups. Absolute root among the original
|
|
set is _currently_ impossible, but it might be worth protecting
|
|
against.
|
|
|
|
Maximal non-shifting subsets.
|
|
|
|
Let's call a mount m in a set S forbidden in that set if there is a
|
|
subtree mounted strictly inside m and containing mounts that do not
|
|
belong to S.
|
|
|
|
The set is non-shifting when none of its elements are forbidden in it.
|
|
|
|
If mount m is forbidden in a set S, it is forbidden in any subset S' it
|
|
belongs to. In other words, it can't belong to any of the non-shifting
|
|
subsets of S. If we had a way to find a forbidden mount or show that
|
|
there's none, we could use it to find the maximal non-shifting subset
|
|
simply by finding and removing them until none remain.
|
|
|
|
Suppose mount m is forbidden in S; then any mounts forbidden in S - {m}
|
|
must have been forbidden in S itself. Indeed, since m has descendents
|
|
that do not belong to S, any subtree that fits into S will fit into
|
|
S - {m} as well.
|
|
|
|
So in principle we could go through elements of S, checking if they
|
|
are forbidden in S and removing the ones that are. Removals will
|
|
not invalidate the checks done for earlier mounts - if they were not
|
|
forbidden at the time we checked, they won't become forbidden later.
|
|
It's too costly to be practical, but there is a similar approach that
|
|
is linear by size of S.
|
|
|
|
Let's say that mount x in a set S is forbidden by mount y, if
|
|
* both x and y belong to S.
|
|
* there is a chain of mounts starting at x and leaving S
|
|
immediately after passing through y, with the first
|
|
mountpoint strictly inside x.
|
|
Note 1: x may be equal to y - that's the case when something not
|
|
belonging to S is mounted strictly inside x.
|
|
Note 2: if y does not belong to S, it can't forbid anything in S.
|
|
Note 3: if y has no children outside of S, it can't forbid anything in S.
|
|
|
|
It's easy to show that mount x is forbidden in S if and only if x is
|
|
forbidden in S by some mount y. And it's easy to find all mounts in S
|
|
forbidden by a given mount.
|
|
|
|
Consider the following operation:
|
|
Trim(S, m) = S - {x : x is forbidden by m in S}
|
|
|
|
Note that if m does not belong to S or has no children outside of S we
|
|
are guaranteed that Trim(S, m) is equal to S.
|
|
|
|
The following is true: if x is forbidden by y in Trim(S, m), it was
|
|
already forbidden by y in S.
|
|
|
|
Proof: Suppose x is forbidden by y in Trim(S, m). Then there is a
|
|
chain of mounts (x_0 = x, ..., x_k = y, x_{k+1} = r), such that x_{k+1}
|
|
is the first element that doesn't belong to Trim(S, m) and the
|
|
mountpoint of x_1 is strictly inside x. If mount r belongs to S, it must
|
|
have been removed by Trim(S, m), i.e. it was forbidden in S by m.
|
|
Then there was a mount chain from r to some child of m that stayed in
|
|
S all the way until m, but that's impossible since x belongs to Trim(S, m)
|
|
and prepending (x_0, ..., x_k) to that chain demonstrates that x is also
|
|
forbidden in S by m, and thus can't belong to Trim(S, m).
|
|
Therefore r can not belong to S and our chain demonstrates that
|
|
x is forbidden by y in S. QED.
|
|
|
|
Corollary: no mount is forbidden by m in Trim(S, m). Indeed, any
|
|
such mount would have been forbidden by m in S and thus would have been
|
|
in the part of S removed in Trim(S, m).
|
|
|
|
Corollary: no mount is forbidden by m in Trim(Trim(S, m), n). Indeed,
|
|
any such would have to have been forbidden by m in Trim(S, m), which
|
|
is impossible.
|
|
|
|
Corollary: after
|
|
S = Trim(S, x_1)
|
|
S = Trim(S, x_2)
|
|
...
|
|
S = Trim(S, x_k)
|
|
no mount remaining in S will be forbidden by either of x_1,...,x_k.
|
|
|
|
The following will reduce S to its maximal non-shifting subset:
|
|
visited = {}
|
|
while S contains elements not belonging to visited
|
|
let m be an arbitrary such element of S
|
|
S = Trim(S, m)
|
|
add m to visited
|
|
|
|
S never grows, so the number of elements of S not belonging to visited
|
|
decreases at least by one on each iteration. When the loop terminates,
|
|
all mounts remaining in S belong to visited. It's easy to see that at
|
|
the beginning of each iteration no mount remaining in S will be forbidden
|
|
by any element of visited. In other words, no mount remaining in S will
|
|
be forbidden, i.e. final value of S will be non-shifting. It will be
|
|
the maximal non-shifting subset, since we were removing only forbidden
|
|
elements.
|
|
|
|
There are two difficulties in implementing the above in linear
|
|
time, both due to the fact that Trim() might need to remove more than one
|
|
element. Naive implementation of Trim() is vulnerable to running into a
|
|
long chain of mounts, each mounted on top of parent's root. Nothing in
|
|
that chain is forbidden, so nothing gets removed from it. We need to
|
|
recognize such chains and avoid walking them again on subsequent calls of
|
|
Trim(), otherwise we will end up with worst-case time being quadratic by
|
|
the number of elements in S. Another difficulty is in implementing the
|
|
outer loop - we need to iterate through all elements of a shrinking set.
|
|
That would be trivial if we never removed more than one element at a time
|
|
(linked list, with list_for_each_entry_safe for iterator), but we may
|
|
need to remove more than one entry, possibly including the ones we have
|
|
already visited.
|
|
|
|
Let's start with naive algorithm for Trim():
|
|
|
|
Trim_one(m)
|
|
found = false
|
|
for each n in children(m)
|
|
if n not in S
|
|
found = true
|
|
if (mountpoint(n) != root(m))
|
|
remove m from S
|
|
break
|
|
if found
|
|
Trim_ancestors(m)
|
|
|
|
Trim_ancestors(m)
|
|
for (; parent(m) in S; m = parent(m)) {
|
|
if (mountpoint(m) != root(parent(m)))
|
|
remove parent(m) from S
|
|
}
|
|
|
|
If m belongs to S, Trim_one(m) will replace S with Trim(S, m).
|
|
Proof:
|
|
Consider the chains excluding elements from Trim(S, m). The last
|
|
two elements in such chain are m and some child of m that does not belong
|
|
to S. If m has no such children, Trim(S, m) is equal to S.
|
|
m itself is removed if and only if the chain has exactly two
|
|
elements, i.e. when the last element does not overmount the root of m.
|
|
In other words, that happens when m has a child not in S that does not
|
|
overmount the root of m.
|
|
All other elements to remove will be ancestors of m, such that
|
|
the entire descent chain from them to m is contained in S. Let
|
|
(x_0, x_1, ..., x_k = m) be the longest such chain. x_i needs to be
|
|
removed if and only if x_{i+1} does not overmount its root. It's easy
|
|
to see that Trim_ancestors(m) will iterate through that chain from
|
|
x_k to x_1 and that it will remove exactly the elements that need to be
|
|
removed.
|
|
|
|
Note that if the loop in Trim_ancestors() walks into an already
|
|
visited element, we are guaranteed that remaining iterations will see
|
|
only elements that had already been visited and remove none of them.
|
|
That's the weakness that makes it vulnerable to long chains of full
|
|
overmounts.
|
|
|
|
It's easy to deal with, if we can afford setting marks on
|
|
elements of S; we would mark all elements already visited by
|
|
Trim_ancestors() and have it bail out as soon as it sees an already
|
|
marked element.
|
|
|
|
The problems with iterating through the set can be dealt with in
|
|
several ways, depending upon the representation we choose for our set.
|
|
One useful observation is that we are given a closed subset in S - the
|
|
original set passed to propagate_umount(). Its elements can neither
|
|
forbid anything nor be forbidden by anything - all their descendents
|
|
belong to S, so they can not occur anywhere in any excluding chain.
|
|
In other words, the elements of that subset will remain in S until
|
|
the end and Trim_one(S, m) is a no-op for all m from that subset.
|
|
|
|
That suggests keeping S as a disjoint union of a closed set U
|
|
('will be unmounted, no matter what') and the set of all elements of
|
|
S that do not belong to U. That set ('candidates') is all we need
|
|
to iterate through. Let's represent it as a subset in a cyclic list,
|
|
consisting of all list elements that are marked as candidates (initially -
|
|
all of them). Then we could have Trim_ancestors() only remove the mark,
|
|
leaving the elements on the list. Then Trim_one() would never remove
|
|
anything other than its argument from the containing list, allowing to
|
|
use list_for_each_entry_safe() as iterator.
|
|
|
|
Assuming that representation we get the following:
|
|
|
|
list_for_each_entry_safe(m, ..., Candidates, ...)
|
|
Trim_one(m)
|
|
where
|
|
Trim_one(m)
|
|
if (m is not marked as a candidate)
|
|
strip the "seen by Trim_ancestors" mark from m
|
|
remove m from the Candidates list
|
|
return
|
|
|
|
remove_this = false
|
|
found = false
|
|
for each n in children(m)
|
|
if n not in S
|
|
found = true
|
|
if (mountpoint(n) != root(m))
|
|
remove_this = true
|
|
break
|
|
if found
|
|
Trim_ancestors(m)
|
|
if remove_this
|
|
strip the "seen by Trim_ancestors" mark from m
|
|
strip the "candidate" mark from m
|
|
remove m from the Candidate list
|
|
|
|
Trim_ancestors(m)
|
|
for (p = parent(m); p is marked as candidate ; m = p, p = parent(p)) {
|
|
if m is marked as seen by Trim_ancestors
|
|
return
|
|
mark m as seen by Trim_ancestors
|
|
if (mountpoint(m) != root(p))
|
|
strip the "candidate" mark from p
|
|
}
|
|
|
|
Terminating condition in the loop in Trim_ancestors() is correct,
|
|
since that that loop will never run into p belonging to U - p is always
|
|
an ancestor of argument of Trim_one() and since U is closed, the argument
|
|
of Trim_one() would also have to belong to U. But Trim_one() is never
|
|
called for elements of U. In other words, p belongs to S if and only
|
|
if it belongs to candidates.
|
|
|
|
Time complexity:
|
|
* we get no more than O(#S) calls of Trim_one()
|
|
* the loop over children in Trim_one() never looks at the same child
|
|
twice through all the calls.
|
|
* iterations of that loop for children in S are no more than O(#S)
|
|
in the worst case
|
|
* at most two children that are not elements of S are considered per
|
|
call of Trim_one().
|
|
* the loop in Trim_ancestors() sets its mark once per iteration and
|
|
no element of S has is set more than once.
|
|
|
|
In the end we may have some elements excluded from S by
|
|
Trim_ancestors() still stuck on the list. We could do a separate
|
|
loop removing them from the list (also no worse than O(#S) time),
|
|
but it's easier to leave that until the next phase - there we will
|
|
iterate through the candidates anyway.
|
|
|
|
The caller has already removed all elements of U from their parents'
|
|
lists of children, which means that checking if child belongs to S is
|
|
equivalent to checking if it's marked as a candidate; we'll never see
|
|
the elements of U in the loop over children in Trim_one().
|
|
|
|
What's more, if we see that children(m) is empty and m is not
|
|
locked, we can immediately move m into the committed subset (remove
|
|
from the parent's list of children, etc.). That's one fewer mount we'll
|
|
have to look into when we check the list of children of its parent *and*
|
|
when we get to building the non-revealing subset.
|
|
|
|
Maximal non-revealing subsets
|
|
|
|
If S is not a non-revealing subset, there is a locked element x in S
|
|
such that parent of x is not in S.
|
|
|
|
Obviously, no non-revealing subset of S may contain x. Removing such
|
|
elements one by one will obviously end with the maximal non-revealing
|
|
subset (possibly empty one). Note that removal of an element will
|
|
require removal of all its locked children, etc.
|
|
|
|
If the set had been non-shifting, it will remain non-shifting after
|
|
such removals.
|
|
Proof: suppose S was non-shifting, x is a locked element of S, parent of x
|
|
is not in S and S - {x} is not non-shifting. Then there is an element m
|
|
in S - {x} and a subtree mounted strictly inside m, such that m contains
|
|
an element not in in S - {x}. Since S is non-shifting, everything in
|
|
that subtree must belong to S. But that means that this subtree must
|
|
contain x somewhere *and* that parent of x either belongs that subtree
|
|
or is equal to m. Either way it must belong to S. Contradiction.
|
|
|
|
// same representation as for finding maximal non-shifting subsets:
|
|
// S is a disjoint union of a non-revealing set U (the ones we are committed
|
|
// to unmount) and a set of candidates, represented as a subset of list
|
|
// elements that have "is a candidate" mark on them.
|
|
// Elements of U are removed from their parents' lists of children.
|
|
// In the end candidates becomes empty and maximal non-revealing non-shifting
|
|
// subset of S is now in U
|
|
while (Candidates list is non-empty)
|
|
handle_locked(first(Candidates))
|
|
|
|
handle_locked(m)
|
|
if m is not marked as a candidate
|
|
strip the "seen by Trim_ancestors" mark from m
|
|
remove m from the list
|
|
return
|
|
cutoff = m
|
|
for (p = m; p in candidates; p = parent(p)) {
|
|
strip the "seen by Trim_ancestors" mark from p
|
|
strip the "candidate" mark from p
|
|
remove p from the Candidates list
|
|
if (!locked(p))
|
|
cutoff = parent(p)
|
|
}
|
|
if p in U
|
|
cutoff = p
|
|
while m != cutoff
|
|
remove m from children(parent(m))
|
|
add m to U
|
|
m = parent(m)
|
|
|
|
Let (x_0, ..., x_n = m) be the maximal chain of descent of m within S.
|
|
* If it contains some elements of U, let x_k be the last one of those.
|
|
Then union of U with {x_{k+1}, ..., x_n} is obviously non-revealing.
|
|
* otherwise if all its elements are locked, then none of {x_0, ..., x_n}
|
|
may be elements of a non-revealing subset of S.
|
|
* otherwise let x_k be the first unlocked element of the chain. Then none
|
|
of {x_0, ..., x_{k-1}} may be an element of a non-revealing subset of
|
|
S and union of U and {x_k, ..., x_n} is non-revealing.
|
|
|
|
handle_locked(m) finds which of these cases applies and adjusts Candidates
|
|
and U accordingly. U remains non-revealing, union of Candidates and
|
|
U still contains any non-revealing subset of S and after the call of
|
|
handle_locked(m) m is guaranteed to be not in Candidates list. So having
|
|
it called for each element of S would suffice to empty Candidates,
|
|
leaving U the maximal non-revealing subset of S.
|
|
|
|
However, handle_locked(m) is a no-op when m belongs to U, so it's enough
|
|
to have it called for elements of Candidates list until none remain.
|
|
|
|
Time complexity: number of calls of handle_locked() is limited by
|
|
#Candidates, each iteration of the first loop in handle_locked() removes
|
|
an element from the list, so their total number of executions is also
|
|
limited by #Candidates; number of iterations in the second loop is no
|
|
greater than the number of iterations of the first loop.
|
|
|
|
|
|
Reparenting
|
|
|
|
After we'd calculated the final set, we still need to deal with
|
|
reparenting - if an element of the final set has a child not in it,
|
|
we need to reparent such child.
|
|
|
|
Such children can only be root-overmounting (otherwise the set wouldn't
|
|
be non-shifting) and their parents can not belong to the original set,
|
|
since the original is guaranteed to be closed.
|
|
|
|
|
|
Putting all of that together
|
|
|
|
The plan is to
|
|
* find all candidates
|
|
* trim down to maximal non-shifting subset
|
|
* trim down to maximal non-revealing subset
|
|
* reparent anything that needs to be reparented
|
|
* return the resulting set to the caller
|
|
|
|
For the 2nd and 3rd steps we want to separate the set into growing
|
|
non-revealing subset, initially containing the original set ("U" in
|
|
terms of the pseudocode above) and everything we are still not sure about
|
|
("candidates"). It means that for the output of the 1st step we'd like
|
|
the extra candidates separated from the stuff already in the original set.
|
|
For the 4th step we would like the additions to U separate from the
|
|
original set.
|
|
|
|
So let's go for
|
|
* original set ("set"). Linkage via mnt_list
|
|
* undecided candidates ("candidates"). Subset of a list,
|
|
consisting of all its elements marked with a new flag (T_UMOUNT_CANDIDATE).
|
|
Initially all elements of the list will be marked that way; in the
|
|
end the list will become empty and no mounts will remain marked with
|
|
that flag.
|
|
* Reuse T_MARKED for "has been already seen by trim_ancestors()".
|
|
* anything in U that hadn't been in the original set - elements of
|
|
candidates will gradually be either discarded or moved there. In other
|
|
words, it's the candidates we have already decided to unmount. Its role
|
|
is reasonably close to the old "to_umount", so let's use that name.
|
|
Linkage via mnt_list.
|
|
|
|
For gather_candidates() we'll need to maintain both candidates (S -
|
|
set) and intersection of S with set. Use T_UMOUNT_CANDIDATE for
|
|
all elements we encounter, putting the ones not already in the original
|
|
set into the list of candidates. When we are done, strip that flag from
|
|
all elements of the original set. That gives a cheap way to check
|
|
if element belongs to S (in gather_candidates) and to candidates
|
|
itself (at later stages). Call that predicate is_candidate(); it would
|
|
be m->mnt_t_flags & T_UMOUNT_CANDIDATE.
|
|
|
|
All elements of the original set are marked with MNT_UMOUNT and we'll
|
|
need the same for elements added when joining the contents of to_umount
|
|
to set in the end. Let's set MNT_UMOUNT at the time we add an element
|
|
to to_umount; that's close to what the old 'umount_one' is doing, so
|
|
let's keep that name. It also gives us another predicate we need -
|
|
"belongs to union of set and to_umount"; will_be_unmounted() for now.
|
|
|
|
Removals from the candidates list should strip both T_MARKED and
|
|
T_UMOUNT_CANDIDATE; call it remove_from_candidates_list().
|