revision: add --maximal option #2032
When inspecting a range of commits from some set of starting references, it is sometimes useful to learn which commits are not reachable from any other commits in the selected range.

One such application is in the creation of a sequence of bundles for the bundle URI feature. Creating a stack of bundles representing different slices of time includes defining which references to include. If all references are used, then this may be overwhelming or redundant. Instead, selecting commits that are maximal to the range could help define a smaller reference set to use in the bundle header.

Add a new '--maximal' option to restrict the output of a revision range to be only the commits that are not reachable from any other commit in the range, based on the reachability definition of the walk.

This is accomplished by adding a new 28th bit flag, CHILD_VISITED, that is set as we walk. This does extend the bit range in object.h, but using an earlier bit may collide with another feature.

The tests demonstrate the behavior of the feature with a positive-only range, ranges with negative references, and walk-modifying flags like --first-parent and --exclude-first-parent-only.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
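To make the idea concrete, here is a minimal Python sketch of the "mark parents as we walk" approach for a positive-only range. It is an illustration only, not the patch's C implementation: the `parents` dictionary and the `maximal_commits` name are hypothetical stand-ins, and the `child_visited` set merely plays the role that the CHILD_VISITED flag bit is described as playing.

```python
from collections import deque


def maximal_commits(parents, tips):
    """Return the commits reachable from `tips` that are not a parent of
    any other commit visited by the walk.

    `parents` is a hypothetical in-memory commit graph: it maps a commit
    id to the list of its parent ids.
    """
    seen = set()           # every commit visited by the walk
    child_visited = set()  # commits whose child was visited (the "flag")
    queue = deque(tips)
    while queue:
        commit = queue.popleft()
        if commit in seen:
            continue
        seen.add(commit)
        for parent in parents.get(commit, []):
            child_visited.add(parent)  # a child of `parent` is in the walk
            queue.append(parent)
    # Filter only after the walk has finished, so that every parent has
    # had the chance to be marked.
    return sorted(seen - child_visited)


# History: A <- B <- C and B <- D (C and D are both children of B).
history = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"]}
print(maximal_commits(history, ["C", "D", "B"]))  # ['C', 'D']
```

Note that `B` is given as a starting point but is not maximal, because it is reachable from `C` and `D`.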
/submit
Submitted as pull.2032.git.1768703645125.gitgitgadget@gmail.com
Johannes Sixt wrote on the Git mailing list:
On 18.01.26 at 03:34, Derrick Stolee via GitGitGadget wrote:
> diff --git a/Documentation/rev-list-options.adoc b/Documentation/rev-list-options.adoc
> index 453ec59057..f0d2ab32a9 100644
> --- a/Documentation/rev-list-options.adoc
> +++ b/Documentation/rev-list-options.adoc
> @@ -444,6 +444,10 @@ The following options affect the way the simplification is performed:
> times; if so, a commit is included if it is any of the commits
> given or if it is an ancestor or descendant of one of them.
>
> +`--maximal`::
> + Restrict the output commits to be those that are not reachable
> + from any other commits in the revision range.
I had to read this sentence three times to understand what it wants to
say, and that even though I had a rough idea what it was supposed to
mean. I tried to come up with a better wording, but found it to be
really hard.
Restrict output to the commits at the tips of the
revision range.
is all I could do, but this isn't a lot better, I am afraid.
The option name is too generic IMHO. How about "--starting-point",
"--topmost-only"? Its function is somewhat parallel to --boundary, but
at the positive end of the revision range. Perhaps we can use that as
inspiration.
The option is listed among options that affect the way the
simplification is performed. But is this true? Isn't it just an option
that changes what output is produced?
-- Hannes
Derrick Stolee wrote on the Git mailing list:
On 1/18/26 4:05 AM, Johannes Sixt wrote:
> On 18.01.26 at 03:34, Derrick Stolee via GitGitGadget wrote:
>> diff --git a/Documentation/rev-list-options.adoc b/Documentation/rev-list-options.adoc
>> index 453ec59057..f0d2ab32a9 100644
>> --- a/Documentation/rev-list-options.adoc
>> +++ b/Documentation/rev-list-options.adoc
>> @@ -444,6 +444,10 @@ The following options affect the way the simplification is performed:
>> times; if so, a commit is included if it is any of the commits
>> given or if it is an ancestor or descendant of one of them.
>>
>> +`--maximal`::
>> + Restrict the output commits to be those that are not reachable
>> + from any other commits in the revision range.
>
> I had to read this sentence three times to understand what it wants to
> say, and that even though I had a rough idea what it was supposed to
> mean. I tried to come up with a better wording, but found it to be
> really hard.
>
> Restrict output to the commits at the tips of the
> revision range.
>
> is all I could do, but this isn't a lot better, I am afraid.
>
> The option name is too generic IMHO. How about "--starting-point",
> "--topmost-only"? Its function is somewhat parallel to --boundary, but
> at the positive end of the revision range. Perhaps we can use that as
> inspiration.
My perspective is skewed, because "maximal" is a concrete term in the
world of partially-ordered sets (such as commit history ordered by
reachability across child-to-parent relationships). It's important to
distinguish from "starting points" because the inputs to the command
are a list of starting points, not all of which are maximal within the
set. In fact, if some positive starting points are reachable from the
negative starting points, then they are already excluded.
My familiarity with this term is skewed by my experience working with
such terms, so I'm very open to new names for this option.
Your comparison to --boundary is interesting, because --boundary _adds_
commits to the range by selecting the commits from the negative range
that are reachable from the output commits. --maximal as defined here
_restricts_ to the output of commits in the range. Its interaction with
--boundary is trivial because no boundary commits would be included as
they are necessarily reachable from a maximal commit.
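A small Python model can illustrate the points in this reply. It is only a sketch under simplifying assumptions: the `parents` dictionary is a hypothetical in-memory graph, and "boundary" is modeled as the excluded commits that are parents of in-range commits, which is a simplification of what `--boundary` reports.

```python
def reachable(parents, starts):
    """Return every commit reachable from `starts` by following parents."""
    seen = set()
    stack = list(starts)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return seen


def range_boundary_maximal(parents, positive, negative):
    """Return (in_range, boundary, maximal) for a simplified range model."""
    excluded = reachable(parents, negative)
    in_range = reachable(parents, positive) - excluded
    # Every parent of an in-range commit, whether in range or not.
    parents_of_range = {p for c in in_range for p in parents.get(c, [])}
    boundary = parents_of_range & excluded   # excluded parents of the range
    maximal = in_range - parents_of_range    # in range, with no child in range
    return in_range, boundary, maximal


# History: A <- B <- C <- D and B <- E.
history = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["B"]}
in_range, boundary, maximal = range_boundary_maximal(
    history, positive=["D", "E", "C", "A"], negative=["B"]
)
print(sorted(in_range))  # ['C', 'D', 'E']
print(sorted(boundary))  # ['B']
print(sorted(maximal))   # ['D', 'E']
```

In this model `A` is a positive starting point but is excluded because it is reachable from the negative `B`, `C` is a starting point that is not maximal, and the boundary commit `B` is reachable from a maximal commit, so the boundary and maximal sets are disjoint.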
> The option is listed among options that affect the way the
> simplification is performed. But is this true? Isn't it just an option
> that changes what output is produced?
You're right that this is poorly placed. I'll put it in a better location
in v2.
Thanks,
-Stolee
Johannes Sixt wrote on the Git mailing list:
On 18.01.26 at 19:27, Derrick Stolee wrote:
> On 1/18/26 4:05 AM, Johannes Sixt wrote:
>> On 18.01.26 at 03:34, Derrick Stolee via GitGitGadget wrote:
>>
>> The option name is too generic IMHO. How about "--starting-point",
>> "--topmost-only"? Its function is somewhat parallel to --boundary, but
>> at the positive end of the revision range. Perhaps we can use that as
>> inspiration.
>
> My perspective is skewed, because "maximal" is a concrete term in the
> world of partially-ordered sets (such as commit history ordered by
> reachability across child-to-parent relationships). It's important to
> distinguish from "starting points" because the inputs to the command
> are a list of starting points, not all of which are maximal within the
> set. In fact, if some positive starting points are reachable from the
> negative starting points, then they are already excluded.
AFAICS, we don't have options named after graph- or set-theoretical
terms, but tend to stick to terms established in the Git ecosystem. I
assume that "maximal" isn't a meaning that an average Git user would
associate with the operation that is performed here.
But even if we decide to use "maximal", the option must be named
something other than *just* "--maximal"; this is simply too generic.
Perhaps "--only-maximal" or "--maximal-only".
Other ideas:
- --hide-reachable
- --range-head
- --range-head-only
- --most-recent
- --most-recent-only
> [--maximal]'s interaction with
> --boundary is trivial because no boundary commits would be included as
> they are necessarily reachable from a maximal commit.
So, --boundary --maximal shows only the maximal commits? That sounds
unexpected. Boundary commits are shown with additional mark-up; they
don't need to be suppressed. But in a first iteration it's probably
better to just make the two options incompatible.
-- Hannes
Derrick Stolee wrote on the Git mailing list:
On 1/19/2026 6:15 AM, Johannes Sixt wrote:
> On 18.01.26 at 19:27, Derrick Stolee wrote:
>> On 1/18/26 4:05 AM, Johannes Sixt wrote:
>>> On 18.01.26 at 03:34, Derrick Stolee via GitGitGadget wrote:
>>>
>>> The option name is too generic IMHO. How about "--starting-point",
>>> "--topmost-only"? Its function is somewhat parallel to --boundary, but
>>> at the positive end of the revision range. Perhaps we can use that as
>>> inspiration.
>>
>> My perspective is skewed, because "maximal" is a concrete term in the
>> world of partially-ordered sets (such as commit history ordered by
>> reachability across child-to-parent relationships). It's important to
>> distinguish from "starting points" because the inputs to the command
>> are a list of starting points, not all of which are maximal within the
>> set. In fact, if some positive starting points are reachable from the
>> negative starting points, then they are already excluded.
>
> AFAICS, we don't have options named after graph- or set-theoretical
> terms, but tend to stick to terms established in the Git ecosystem. I
> assume that "maximal" isn't a meaning that an average Git user would
> associate with the operation that is performed here.
My mindset is usually "all words are made up by somebody" and since
there isn't an established term for this in the existing Git ecosystem,
it is up to us to create a term. Borrowing one that exists elsewhere is
a valuable way to build upon any context that term brings with it.
It is also helpful that the term has an explicit technical definition
that means exactly what we're using it for here. It explicitly
differentiates from any "maximum" or confusion with a collapse to a
total order (such as Git's --date-order or --topo-order apply).
> But even if we decide to use "maximal", the option must be named
> something other than *just* "--maximal"; this is simply too generic.
> Perhaps "--only-maximal" or "--maximal-only".
When the argument is moved in the documentation into the set of
filters, then the fact that --maximal restricts the set of commits
makes any modifier such as "only" redundant.
> Other ideas:
> - --hide-reachable
> - --range-head
> - --range-head-only
> - --most-recent
> - --most-recent-only
These all have issues, such as being technically wrong (maximal commits
are reachable) or implying total orders, date orders, or generally only a
single result.
>> [--maximal]'s interaction with
>> --boundary is trivial because no boundary commits would be included as
>> they are necessarily reachable from a maximal commit.
>
> So, --boundary --maximal shows only the maximal commits? That sounds
> unexpected. Boundary commits are shown with additional mark-up; they
> don't need to be suppressed. But in a first iteration it's probably
> better to just make the two options incompatible.
Sure. But I'd like to counter that filters like --author also restrict
the set, including not showing boundary commits that don't fit the
--author pattern. It just happens that no boundary commits are also
maximal by definition.
I do sense that a lot of this is a matter of taste, and that you and I
differ greatly in our tastes on this topic. I look forward to more
opinions that can lead us towards one side or another (or in a new
direction).
Thanks,
-Stolee
Johannes Sixt wrote on the Git mailing list:
On 19.01.26 at 17:44, Derrick Stolee wrote:
> When the argument is moved in the documentation into the set of
> filters, then the fact that --maximal restricts the set of commits
> makes any modifier such as "only" redundant.
Keep in mind that rev-list arguments are also understood by git-log.
Then one could expect --maximal to be somehow the opposite of --minimal.
(Which is badly named for the same reason, but that ship has sailed.)
-- Hannes
Junio C Hamano wrote on the Git mailing list:
Johannes Sixt <j6t@kdbg.org> writes:
> But even if we decide to use "maximal", the option must be named
> something other than *just* "--maximal"; this is simply too generic.
> Perhaps "--only-maximal" or "--maximal-only".
>
> Other ideas:
> - --hide-reachable
> - --range-head
> - --range-head-only
> - --most-recent
> - --most-recent-only
>
>> [--maximal]'s interaction with
>> --boundary is trivial because no boundary commits would be included as
>> they are necessarily reachable from a maximal commit.
>
> So, --boundary --maximal shows only the maximal commits? That sounds
> unexpected. Boundary commits are shown with additional mark-up; they
> don't need to be suppressed. But in a first iteration it's probably
> better to just make the two options incompatible.
If I am reading the answer to "what is minimal/maximal elements in
partially ordered set?" correctly, our "--boundary" essentially is
to show direct parents of those commits that would be shown with the
(nonexistent) "--minimal-only" option. So I agree with you that it
makes perfect sense to make "--boundary" and "--maximal-only"
incompatible (it is like asking for both "--minimal-only" and
"--maximal-only" at the same time).
My motivation for this feature is very similar to the bundle URI application. I can get around it by creating a tool that uses `git rev-list --parents` and then uses a hashset to collect the parent list and filter out any commits that ever appear as parents. It would be more efficient to use Git's native revision-walking feature.

This does bring the object struct up to a 32-bit boundary with 28 flag bits, 3 type bits, and a parsed bit. That's the biggest concern I have about this update adding a new flag bit. I would understand if this feature is not worth running out of room for extensions there.
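For reference, the `git rev-list --parents` workaround mentioned above could look roughly like the following Python sketch. The function names are hypothetical, and it assumes the usual `--parents` output shape of one commit per line followed by its parent ids, separated by spaces.

```python
import subprocess


def maximal_from_parents_listing(listing):
    """Filter `git rev-list --parents` style output down to the commits
    that never appear as a parent of another listed commit."""
    commits = []
    seen_as_parent = set()  # the "hashset" of every parent id encountered
    for line in listing.splitlines():
        fields = line.split()
        if not fields:
            continue
        commits.append(fields[0])          # first field: the commit itself
        seen_as_parent.update(fields[1:])  # remaining fields: its parents
    return [c for c in commits if c not in seen_as_parent]


def maximal_in_range(rev_args, repo="."):
    """Run `git rev-list --parents <rev_args>` in `repo` and filter it."""
    result = subprocess.run(
        ["git", "-C", repo, "rev-list", "--parents", *rev_args],
        check=True,
        capture_output=True,
        text=True,
    )
    return maximal_from_parents_listing(result.stdout)


# Shortened, made-up ids standing in for full object ids.
sample = "c3 c2\nc4 c2\nc2 c1\nc1\n"
print(maximal_from_parents_listing(sample))  # ['c3', 'c4']
```

This has to read the whole listing before it can report anything and spawns a separate process, which is part of why a native option in the revision walk would be more efficient.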
I considered looking through the earlier bit positions to see the impact of an overlap, but they certainly looked potentially risky to reuse.
I wonder if anyone else has thought about this as a useful technique. For instance, it could be part of a strategy for choosing commits for reachability bitmaps.
Thanks,
-Stolee
cc: gitster@pobox.com
cc: Johannes Sixt <j6t@kdbg.org>