Comparing changes

base repository: git/git
base: ce74208c2fa13943fffa58f168ac27a76d0eb789
...
head repository: git/git
compare: d54da84bd9de09fc339accff553f1fc8a5539154
  • 17 commits
  • 13 files changed
  • 1 contributor

Commits on Feb 24, 2026

  1. midx: mark get_midx_checksum() arguments as const

    To make clear that the function `get_midx_checksum()` does not do
    anything to modify its argument, mark the MIDX pointer as const.
    
    The following commit will rename this function altogether to make clear
    that it returns the raw bytes of the checksum, not a hex-encoded copy of
    it.
    
    Signed-off-by: Taylor Blau <me@ttaylorr.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    ttaylorr authored and gitster committed Feb 24, 2026
    00c0d84
  2. midx: rename get_midx_checksum() to midx_get_checksum_hash()

    Since 541204a (Documentation: document naming schema for structs and
    their functions, 2024-07-30), we have adopted a naming convention for
    functions that would prefer a name like, say, `midx_get_checksum()` over
    `get_midx_checksum()`.
    
    Adopt this convention throughout the midx.h API. Since this function
    returns a raw (that is, non-hex encoded) hash, let's suffix the function
    with "_hash()" to make this clear. As a side effect, this prepares us
    for the subsequent change which will introduce a "_hex()" variant that
    encodes the checksum itself.
    
    Suggested-by: Patrick Steinhardt <ps@pks.im>
    Signed-off-by: Taylor Blau <me@ttaylorr.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    ttaylorr authored and gitster committed Feb 24, 2026
    de811c2
  3. midx: introduce midx_get_checksum_hex()

    When trying to print out, say, the hexadecimal representation of a
    MIDX's hash, our code will do something like:
    
        hash_to_hex_algop(midx_get_checksum_hash(m),
                          m->source->odb->repo->hash_algo);
    
    , which is both cumbersome and repetitive. In fact, all but a handful of
    callers of `midx_get_checksum_hash()` do exactly the above. Reduce this
    repetition by introducing `midx_get_checksum_hex()`, which returns a
    pointer into a static buffer containing the above result.
    
    For the handful of callers that do need to compare the raw bytes and
    don't want to deal with an encoded copy (e.g., because they are passing
    it to hasheq() or similar), they may still rely on
    `midx_get_checksum_hash()` which returns the raw bytes.
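
The static-buffer pattern described above can be sketched in a self-contained way. This is only an illustration under stated assumptions, not Git's code: `toy_checksum_hex()`, `TOY_MAX_RAWSZ`, and `TOY_MAX_HEXSZ` are hypothetical names, and a single shared buffer is assumed for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_RAWSZ 32                  /* hypothetical: room for a SHA-256 */
#define TOY_MAX_HEXSZ (2 * TOY_MAX_RAWSZ)

/*
 * Hex-encode `len` raw bytes into a static buffer and return a pointer
 * to it. The buffer is overwritten by the next call, so callers must
 * copy the result if they need it to survive.
 */
static const char *toy_checksum_hex(const unsigned char *hash, size_t len)
{
	static const char digits[] = "0123456789abcdef";
	static char buf[TOY_MAX_HEXSZ + 1];
	size_t i;

	assert(len <= TOY_MAX_RAWSZ);
	for (i = 0; i < len; i++) {
		buf[2 * i] = digits[hash[i] >> 4];
		buf[2 * i + 1] = digits[hash[i] & 0xf];
	}
	buf[2 * len] = '\0';
	return buf;
}
```

Because the buffer is shared, a second call overwrites the first result. A caller that needs two hex strings at once would have to copy the first one before making the second call.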
    
    Signed-off-by: Taylor Blau <me@ttaylorr.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    ttaylorr authored and gitster committed Feb 24, 2026
    6e86f67
  4. builtin/multi-pack-index.c: make '--progress' a common option

    All multi-pack-index sub-commands (write, verify, repack, and expire)
    support a '--progress' command-line option, despite not listing it as
    one of the common options in `common_opts`.
    
    As a result each sub-command declares its own `OPT_BIT()` for a
    "--progress" command-line option. Centralize this within the
    `common_opts` to avoid re-declaring it in each sub-command.
    
    Signed-off-by: Taylor Blau <me@ttaylorr.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    ttaylorr authored and gitster committed Feb 24, 2026
    6b8fb17
  5. git-multi-pack-index(1): remove non-existent incompatibility

    Since fcb2205 (midx: implement support for writing incremental MIDX
    chains, 2024-08-06), the command-line options '--incremental' and
    '--bitmap' were declared to be incompatible with one another when
    running 'git multi-pack-index write'.
    
    However, since 27afc27 (midx: implement writing incremental MIDX
    bitmaps, 2025-03-20), that incompatibility no longer exists, despite the
    documentation saying so. Correct this by removing the stale reference to
    their incompatibility.
    
    Signed-off-by: Taylor Blau <me@ttaylorr.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    ttaylorr authored and gitster committed Feb 24, 2026
    f775d5b
  6. git-multi-pack-index(1): align SYNOPSIS with 'git multi-pack-index -h'

    Since c39fffc (tests: start asserting that *.txt SYNOPSIS matches -h
    output, 2022-10-13), the manual page for 'git multi-pack-index' has a
    SYNOPSIS section which differs from 'git multi-pack-index -h'.
    
    Correct this while also documenting additional options accepted by the
    'write' sub-command.
    
    Signed-off-by: Taylor Blau <me@ttaylorr.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    ttaylorr authored and gitster committed Feb 24, 2026
    d0e91c1
  7. t/t5319-multi-pack-index.sh: fix copy-and-paste error in t5319.39

    Commit d4bf1d8 (multi-pack-index: verify missing pack, 2018-09-13)
    adds a new test to the MIDX test script to test how we handle missing
    packs.
    
    While the commit itself describes the test as "verify missing pack[s]",
    the test itself is actually called "verify packnames out of order",
    despite that not being what it tests.
    
    Likely this was a copy-and-paste of the test immediately above it of the
    same name. Correct this by renaming the test to match the commit
    message.
    
    Signed-off-by: Taylor Blau <me@ttaylorr.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    ttaylorr authored and gitster committed Feb 24, 2026
    8043782
  8. midx-write.c: don't use pack_perm when assigning bitmap_pos

    In midx_pack_order(), we compute for each bitmapped pack the first bit
    to correspond to an object in that pack, along with how many bits were
    assigned to object(s) in that pack.
    
    Initially, each bitmap_nr value is set to zero, and each bitmap_pos
    value is set to the sentinel BITMAP_POS_UNKNOWN. This is done to ensure
    that there are no packs that have an unknown bit position but a somehow
    non-zero number of objects (cf. `write_midx_bitmapped_packs()` in
    midx-write.c).
    
    Once the pack order is fully determined, midx_pack_order() sets the
    bitmap_pos field for any bitmapped packs to zero if they are still
    listed as BITMAP_POS_UNKNOWN.
    
    However, we enumerate the bitmapped packs in order of `ctx->pack_perm`.
    This is fine for existing cases, since the only time the
    `ctx->pack_perm` array holds a value outside of the addressable range of
    `ctx->info` is when there are expired packs, which only occurs via 'git
    multi-pack-index expire', which does not support writing MIDX bitmaps.
    As a result, the range of ctx->pack_perm covers all values in [0,
    `ctx->nr`), so enumerating in this order isn't an issue.
    
    A future change necessary for compaction will complicate this further by
    introducing a wrapper around the `ctx->pack_perm` array, which turns the
    given `pack_int_id` into one that is relative to the lower end of the
    compaction range. As a result, indexing into `ctx->pack_perm` through
    this helper, say, with "0" will produce a crash when the lower end of
    the compaction range has >0 pack(s) in its base layer, since the
    subtraction will wrap around the 32-bit unsigned range, resulting in an
    uninitialized read.
    
    But the process is completely unnecessary in the first place: we are
    enumerating all values of `ctx->info`, and there is no reason to process
    them in a different order than they appear in memory. Index `ctx->info`
    directly to reflect that.
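
A minimal sketch of the direct enumeration described above, assuming a simplified stand-in for the per-pack bookkeeping. The `toy_` names are hypothetical, the sentinel is assumed to be the all-ones 32-bit value, and the sanity check is part of this sketch rather than a claim about the real function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_BITMAP_POS_UNKNOWN (~((uint32_t)0)) /* assumed sentinel value */

struct toy_pack_info {
	uint32_t bitmap_pos; /* first bit assigned to an object in this pack */
	uint32_t bitmap_nr;  /* number of bits assigned to this pack */
};

/*
 * Walk the array in memory order, with no permutation lookup, and reset
 * any position still marked unknown to zero.
 */
static void toy_finalize_bitmap_pos(struct toy_pack_info *info, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (info[i].bitmap_pos != TOY_BITMAP_POS_UNKNOWN)
			continue;
		/* Sketch-only check: an unknown position must carry no objects. */
		assert(info[i].bitmap_nr == 0);
		info[i].bitmap_pos = 0;
	}
}
```

Since every element is visited exactly once, the visiting order has no effect on the result, which is why the permutation is not needed here.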
    
    Signed-off-by: Taylor Blau <me@ttaylorr.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    ttaylorr authored and gitster committed Feb 24, 2026
    ac10f6a
  9. midx-write.c: introduce struct write_midx_opts

    In the MIDX writing code, there are four functions which perform some
    sort of MIDX write operation. They are:
    
     - write_midx_file()
     - write_midx_file_only()
     - expire_midx_packs()
     - midx_repack()
    
    All of these functions are thin wrappers over `write_midx_internal()`,
    which implements the bulk of these routines. As a result, the
    `write_midx_internal()` function takes six arguments.
    
    Future commits in this series will want to add additional arguments, and
    in general this function's signature will be the union of parameters
    among *all* possible ways to write a MIDX.
    
    Instead of adding yet more arguments to this function to support MIDX
    compaction, introduce a `struct write_midx_opts`, which has the same
    struct members as `write_midx_internal()`'s arguments.
    
    Adding additional fields to the `write_midx_opts` struct is preferable
    to adding additional arguments to `write_midx_internal()`. This is
    because the callers below all zero-initialize the struct, so each time
    we add a new piece of information, we do not have to pass the zero value
    for it in all other call-sites that do not care about it.
    
    For now, no functional changes are included in this patch.
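
The benefit of zero-initialized option structs can be sketched as follows. The struct layout and every `toy_` name here are hypothetical and are not taken from the patch; the point is only that a field added later (`compact` in this sketch) does not force changes at call sites that do not care about it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical options struct; field names are illustrative only. */
struct toy_write_opts {
	const char *preferred_pack_name;
	const char *refs_snapshot;
	unsigned flags;
	int compact; /* imagine this field being added by a later patch */
};

/* Stand-in for the real work: report how many options were set. */
static int toy_write_internal(const struct toy_write_opts *opts)
{
	return (opts->preferred_pack_name != NULL) +
	       (opts->refs_snapshot != NULL) +
	       (opts->flags != 0) +
	       (opts->compact != 0);
}

/* Thin wrapper: unnamed fields are implicitly zero-initialized. */
static int toy_write_file(const char *preferred_pack_name, unsigned flags)
{
	struct toy_write_opts opts = {
		.preferred_pack_name = preferred_pack_name,
		.flags = flags,
	};

	return toy_write_internal(&opts);
}

/* Another wrapper that never needs to mention `compact` at all. */
static int toy_expire_packs(unsigned flags)
{
	struct toy_write_opts opts = { .flags = flags };

	return toy_write_internal(&opts);
}
```

With positional arguments, adding `compact` would have meant passing an explicit zero from both wrappers. With designated initializers, neither wrapper changes.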
    
    Signed-off-by: Taylor Blau <me@ttaylorr.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    ttaylorr authored and gitster committed Feb 24, 2026
    82c905e
  10. midx: do not require packs to be sorted in lexicographic order

    The MIDX file format currently requires that pack files be identified by
    the lexicographic ordering of their names (that is, a pack having a
    checksum beginning with "abc" would have a numeric pack_int_id which is
    smaller than the same value for a pack beginning with "bcd").
    
    As a result, it is impossible to combine adjacent MIDX layers together
    without permuting bits from bitmaps that are in more recent layer(s).
    
    To see why, consider the following example:
    
              | packs       | preferred pack
      --------+-------------+---------------
      MIDX #0 | { X, Y, Z } | Y
      MIDX #1 | { A, B, C } | B
      MIDX #2 | { D, E, F } | D
    
    , where MIDX #2's base MIDX is MIDX #1, and so on. Suppose that we want
    to combine MIDX layers #0 and #1, to create a new layer #0' containing
    the packs from both layers. With the original three MIDX layers, objects
    are laid out in the bitmap in the order they appear in their source
    pack, and the packs themselves are arranged according to the pseudo-pack
    order. In this case, that ordering is Y, X, Z, B, A, C.
    
    But recall that the pseudo-pack ordering is defined by the order that
    packs appear in the MIDX, with the exception of the preferred pack,
    which sorts ahead of all other packs regardless of its position within
    the MIDX. In the above example, that means that pack 'Y' could be placed
    anywhere (so long as it is designated as preferred); however, all other
    packs must be placed in the location listed above.
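
The pseudo-pack ordering rule stated above (preferred pack first, then all other packs in MIDX order) can be illustrated with a small sketch. `toy_pseudo_pack_order()` is a hypothetical helper that works on pack positions only, not on Git's data structures.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Write into `out` the positions of `nr` packs in pseudo-pack order:
 * the preferred pack first, then every other pack in the order in
 * which it appears in the MIDX. `out` must have room for `nr` entries.
 */
static void toy_pseudo_pack_order(size_t nr, size_t preferred, size_t *out)
{
	size_t i, n = 0;

	assert(preferred < nr);
	out[n++] = preferred;
	for (i = 0; i < nr; i++)
		if (i != preferred)
			out[n++] = i;
}
```

For MIDX #0 in the table, with packs X, Y, Z at positions 0, 1, 2 and Y preferred, this yields positions 1, 0, 2, that is Y, X, Z. Applying the same rule to MIDX #1 gives B, A, C.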
    
    Because that ordering isn't sorted lexicographically, it is impossible
    to compact MIDX layers in the above configuration without permuting the
    object-to-bit-position mapping. Changing this mapping would affect all
    bitmaps belonging to newer layers, rendering the bitmaps associated with
    MIDX #2 unreadable.
    
    One of the goals of MIDX compaction is that we are able to shrink the
    length of the MIDX chain *without* invalidating bitmaps that belong to
    newer layers, and the lexicographic ordering constraint is at odds with
    this goal.
    
    However, packs do not *need* to be lexicographically ordered within the
    MIDX. As far as I can gather, the only reason they are sorted lexically
    is to make it possible to perform a binary search over the pack names in
    a MIDX, necessary to make `midx_contains_pack()`'s performance
    logarithmic in the number of packs rather than linear.
    
    Relax this constraint by allowing MIDX writes to proceed with packs that
    are not arranged in lexicographic order. `midx_contains_pack()` will
    lazily instantiate a `pack_names_sorted` array on the MIDX, which will
    be used to implement the binary search over pack names.
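
A rough sketch of the lazy approach described above, assuming a toy structure that stores pack names as plain C strings. The `toy_` names are hypothetical, and the real `midx_contains_pack()` may differ in how it compares names and manages memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct toy_midx {
	const char **pack_names;        /* stored order, not necessarily sorted */
	const char **pack_names_sorted; /* built on first lookup, NULL until then */
	size_t num_packs;
};

/* Compare two array elements, each of which is a pointer to a string. */
static int toy_cmp_names(const void *a, const void *b)
{
	const char *const *x = a;
	const char *const *y = b;

	return strcmp(*x, *y);
}

/* Returns 1 if `name` is present, 0 if not, -1 on allocation failure. */
static int toy_midx_contains_pack(struct toy_midx *m, const char *name)
{
	size_t sz = sizeof(*m->pack_names_sorted);

	if (!m->num_packs)
		return 0;
	if (!m->pack_names_sorted) {
		/* Sort a copy so that the stored order is left untouched. */
		m->pack_names_sorted = malloc(m->num_packs * sz);
		if (!m->pack_names_sorted)
			return -1;
		memcpy(m->pack_names_sorted, m->pack_names, m->num_packs * sz);
		qsort(m->pack_names_sorted, m->num_packs, sz, toy_cmp_names);
	}
	return bsearch(&name, m->pack_names_sorted, m->num_packs, sz,
		       toy_cmp_names) != NULL;
}
```

The sort cost is paid once, on the first lookup. Later lookups remain logarithmic in the number of packs, while the stored order stays free to follow whatever the writer chose.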
    
    This change produces MIDXs which may not be correctly read with external
    tools or older versions of Git. Though older versions of Git know how to
    gracefully degrade and ignore any MIDX(s) they consider corrupt,
    external tools may not be as robust. To avoid unintentionally breaking
    any such tools, guard this change behind a version bump in the MIDX's
    on-disk format.
    
    Signed-off-by: Taylor Blau <me@ttaylorr.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    ttaylorr authored and gitster committed Feb 24, 2026
    b2ec8e9
  11. midx-write.c: introduce midx_pack_perm() helper

    The `ctx->pack_perm` array can be considered as a permutation between
    the original `pack_int_id` of some given pack to its position in the
    `ctx->info` array containing all packs.
    
    Today we can always index into this array with any known `pack_int_id`,
    since there is never a `pack_int_id` which is greater than or equal to
    the value `ctx->nr`.
    
    That is not necessarily the case with MIDX compaction. For example,
    suppose we have a MIDX chain with three layers, each containing three
    packs. The base of the MIDX chain will have packs with IDs 0, 1, and 2,
    the next layer 3, 4, and 5, and so on. If we are compacting the topmost
    two layers, we'll have input `pack_int_id` values in the range [3, 8], but
    `ctx->nr` will only be 6.
    
    In that example, if we want to know where the pack whose original
    `pack_int_id` value was, say, 7, we would compute `ctx->pack_perm[7]`,
    leading to an uninitialized read, since there are only 6 entries
    allocated in that array.
    
    To address this, there are a couple of options:
    
     - We could allocate enough entries in `ctx->pack_perm` to accommodate
       the largest `orig_pack_int_id` value.
    
     - Or, we could internally shift the input values by the number of packs
       in the base layer of the lower end of the MIDX compaction range.
    
    This patch prepares us to take the latter approach, since it does not
    allocate more memory than strictly necessary. (In our above example, the
    base of the lower end of the compaction range is the first MIDX layer
    (having three packs), so we would end up indexing `ctx->pack_perm[7-3]`,
    which is a valid read.)
    
    Note that this patch does not actually implement that approach yet, but
    merely performs a behavior-preserving refactoring which will make the
    change easier to carry out in the future.
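
The shifting approach (the second option above) might look roughly like the following. This is a sketch of the idea only, not the helper this patch adds: the `toy_` names, the struct layout, and the explicit bounds checks are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_write_ctx {
	const uint32_t *pack_perm;  /* `nr` entries */
	uint32_t nr;                /* number of packs being written */
	uint32_t num_packs_in_base; /* packs below the compaction range */
};

/*
 * Look up the permuted position for an original pack_int_id. Returns 0
 * and stores the result in *out, or returns -1 if the ID falls outside
 * of the range being written.
 */
static int toy_pack_perm(const struct toy_write_ctx *ctx,
			 uint32_t orig_pack_int_id, uint32_t *out)
{
	uint32_t rel;

	assert(ctx != NULL && out != NULL);
	/* Without this check, the unsigned subtraction below would wrap. */
	if (orig_pack_int_id < ctx->num_packs_in_base)
		return -1;
	rel = orig_pack_int_id - ctx->num_packs_in_base;
	if (rel >= ctx->nr)
		return -1;
	*out = ctx->pack_perm[rel];
	return 0;
}
```

With three packs in the base and six packs being written, an original ID of 7 reads entry `7 - 3 = 4`, which is in bounds. An ID of 0 is rejected instead of wrapping around to a huge index.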
    
    Signed-off-by: Taylor Blau <me@ttaylorr.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    ttaylorr authored and gitster committed Feb 24, 2026
    4f85432
  12. midx-write.c: extract fill_pack_from_midx()

    When filling packs from an existing MIDX, `fill_packs_from_midx()`
    handles preparing a MIDX'd pack, and reading out its pack name from the
    existing MIDX.
    
    MIDX compaction will want to perform an identical operation, though the
    caller will look quite different than `fill_packs_from_midx()`. To
    reduce any future code duplication, extract `fill_pack_from_midx()`
    from `fill_packs_from_midx()` to prepare to call our new helper function
    in a future change.
    
    Signed-off-by: Taylor Blau <me@ttaylorr.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    ttaylorr authored and gitster committed Feb 24, 2026
    5f3e7f7
  13. midx-write.c: enumerate pack_int_id values directly

    Our `midx-write.c::fill_packs_from_midx()` function currently enumerates
    the range [0, m->num_packs), and then shifts its index variable up by
    `m->num_packs_in_base` to produce a valid `pack_int_id`.
    
    Instead, directly enumerate the range:
    
        [m->num_packs_in_base, m->num_packs_in_base + m->num_packs)
    
    , which are the original pack_int_ids themselves as opposed to the
    indexes of those packs relative to the MIDX layer they are contained
    within.
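
As a small illustration of enumerating the IDs directly, here is a sketch using a hypothetical `toy_layer` struct whose two fields mirror the names used above. It is not the code from the patch.

```c
#include <assert.h>
#include <stdint.h>

struct toy_layer {
	uint32_t num_packs;         /* packs in this layer only */
	uint32_t num_packs_in_base; /* packs in all earlier layers */
};

/*
 * Store each pack_int_id belonging to layer `m` into `out` and return
 * how many were stored. The loop variable is the ID itself, so no
 * per-iteration shift is needed.
 */
static uint32_t toy_layer_pack_ids(const struct toy_layer *m, uint32_t *out)
{
	uint32_t id, n = 0;

	for (id = m->num_packs_in_base;
	     id < m->num_packs_in_base + m->num_packs; id++)
		out[n++] = id;
	return n;
}
```

For a layer with three packs sitting on top of a base with three packs, this visits IDs 3, 4, and 5.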
    
    Signed-off-by: Taylor Blau <me@ttaylorr.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    ttaylorr authored and gitster committed Feb 24, 2026
    93c67df
  14. midx-write.c: factor fanout layering from compute_sorted_entries()

    When computing the set of objects to appear in a MIDX, we use
    compute_sorted_entries(), which handles objects from various existing
    sources one fanout layer at a time.
    
    The process for computing this set is slightly different during MIDX
    compaction, so factor out the existing functionality into its own
    routine to prevent `compute_sorted_entries()` from becoming too
    difficult to read.
    
    Signed-off-by: Taylor Blau <me@ttaylorr.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    ttaylorr authored and gitster committed Feb 24, 2026
    9aea84c
  15. t/helper/test-read-midx.c: plug memory leak when selecting layer

    Though our 'read-midx' test tool is capable of printing information
    about a single MIDX layer identified by its checksum, no caller in our
    test suite exercises this path.
    
    Unfortunately, there is a memory leak lurking in this (currently) unused
    path that would otherwise be exposed by the following commit.
    
    This occurs when providing a MIDX layer checksum other than the tip. As
    we walk over the MIDX chain trying to find the matching layer, we drop
    our reference to the top-most MIDX layer. Thus, our call to
    'close_midx()' later on leaks memory between the top-most MIDX layer and
    the MIDX layer immediately following the specified one.
    
    Plug this leak by holding a reference to the tip of the MIDX chain, and
    ensure that we call `close_midx()` before terminating the test tool.
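
The shape of the fix can be sketched with a toy linked chain. Everything here is hypothetical (`toy_chain_layer`, `toy_find_layer()`, `toy_close_chain()`, `toy_push_layer()`) and only models the ownership pattern: search with a separate cursor, then release from the tip.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct toy_chain_layer {
	struct toy_chain_layer *base; /* next-older layer, or NULL */
	const char *checksum;
};

/* Find a layer by checksum; the caller's tip pointer is not modified. */
static struct toy_chain_layer *toy_find_layer(struct toy_chain_layer *tip,
					      const char *checksum)
{
	struct toy_chain_layer *m;

	for (m = tip; m; m = m->base)
		if (!strcmp(m->checksum, checksum))
			return m;
	return NULL;
}

/* Free every layer reachable from `tip`; return how many were freed. */
static int toy_close_chain(struct toy_chain_layer *tip)
{
	int freed = 0;

	while (tip) {
		struct toy_chain_layer *base = tip->base;

		free(tip);
		tip = base;
		freed++;
	}
	return freed;
}

/* Allocate a new layer on top of `base`; returns NULL on failure. */
static struct toy_chain_layer *toy_push_layer(struct toy_chain_layer *base,
					      const char *checksum)
{
	struct toy_chain_layer *m = malloc(sizeof(*m));

	if (!m)
		return NULL;
	m->base = base;
	m->checksum = checksum;
	return m;
}
```

Closing from the matched layer instead of from the tip would free only that layer and the layers beneath it, leaving everything above it allocated. That is the kind of leak described above.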
    
    Signed-off-by: Taylor Blau <me@ttaylorr.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    ttaylorr authored and gitster committed Feb 24, 2026
    dedf71f
  16. midx: implement MIDX compaction

    When managing a MIDX chain with many layers, it is convenient to combine
    a sequence of adjacent layers into a single layer to prevent the chain
    from growing too long.
    
    While it is conceptually possible to "compact" a sequence of MIDX layers
    together by running "git multi-pack-index write --stdin-packs", there
    are a few drawbacks that make this less than desirable:
    
     - Preserving the MIDX chain is impossible, since there is no way to
       write a MIDX layer that contains objects or packs found in an earlier
       MIDX layer already part of the chain. So callers would have to write
       an entirely new (non-incremental) MIDX containing only the compacted
       layers, discarding all other objects/packs from the MIDX.
    
     - There is (currently) no way to write a MIDX layer outside of the MIDX
       chain to work around the above, such that the MIDX chain could be
       reassembled substituting the compacted layers with the MIDX that was
       written.
    
     - The `--stdin-packs` command-line option does not allow us to specify
       the order of packs as they appear in the MIDX. Therefore, even if
       there were workarounds for the previous two challenges, any bitmaps
       belonging to layers which come after the compacted layer(s) would no
       longer be valid.
    
    This commit introduces a way to compact a sequence of adjacent MIDX
    layers into a single layer while preserving the MIDX chain, as well as
    any bitmap(s) in layers which are newer than the compacted ones.
    
    Implementing MIDX compaction does not require a significant number of
    changes to how MIDX layers are written. The main changes are as follows:
    
     - Instead of calling `fill_packs_from_midx()`, we call a new function
       `fill_packs_from_midx_range()`, which walks backwards along the
       portion of the MIDX chain which we are compacting, and adds packs one
       layer at a time.
    
       In order to preserve the pseudo-pack order, the concatenated pack
       order is preserved, with the exception of preferred packs which are
       always added first.
    
     - After adding entries from the set of packs in the compaction range,
       `compute_sorted_entries()` must adjust the `pack_int_id`'s for all
       objects added in each fanout layer to match their original
       `pack_int_id`'s (as opposed to the index at which each pack appears
       in `ctx.info`).
    
       Note that we cannot reuse `midx_fanout_add_midx_fanout()` directly
       here, as it unconditionally recurses through the `->base_midx`. Factor
       out a `_1()` variant that operates on a single layer, reimplement
       the existing function in terms of it, and use the new variant from
       `midx_fanout_add_compact()`.
    
       Since we are sorting the list of objects ourselves, the order we add
       them in does not matter.
    
     - When writing out the new 'multi-pack-index-chain' file, discard any
       layers in the compaction range, replacing them with the newly written
       layer, instead of keeping them and placing the new layer at the end
       of the chain.
    
    This ends up being sufficient to implement MIDX compaction in such a way
    that preserves bitmaps corresponding to more recent layers in the MIDX
    chain.
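
The chain-rewriting step in the last bullet above can be sketched as a plain array operation. `toy_rewrite_chain()` is a hypothetical helper, layer names are arbitrary strings, and the chain is assumed to be listed oldest layer first; none of this is taken from the patch itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy `chain` (assumed oldest layer first) into `out`, replacing the
 * layers at positions [from, to] with the single entry `compacted`.
 * `out` must have room for `nr` entries. Returns the new chain length.
 */
static size_t toy_rewrite_chain(const char **chain, size_t nr,
				size_t from, size_t to,
				const char *compacted, const char **out)
{
	size_t i, n = 0;

	assert(from <= to && to < nr);
	for (i = 0; i < from; i++)
		out[n++] = chain[i];   /* layers older than the range */
	out[n++] = compacted;          /* takes the place of the whole range */
	for (i = to + 1; i < nr; i++)
		out[n++] = chain[i];   /* newer layers keep their relative order */
	return n;
}
```

Compacting the middle two layers of a four-layer chain leaves three entries. The newer layers stay above the compacted one, in line with the goal of keeping their bitmaps valid.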
    
    The tests for MIDX compaction are so far fairly spartan, since the main
    interesting behavior here is ensuring that the right packs/objects are
    selected from each layer, and that the pack order is preserved
    regardless of whether the packs are sorted in lexicographic order in the
    original MIDX chain.
    
    Signed-off-by: Taylor Blau <me@ttaylorr.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    ttaylorr authored and gitster committed Feb 24, 2026
    9df44a9
  17. midx: enable reachability bitmaps during MIDX compaction

    Enable callers to generate reachability bitmaps when performing MIDX
    layer compaction by combining all existing bitmaps from the compacted
    layers.
    
    Note that because of the object/pack ordering described by the previous
    commit, the pseudo-pack order for the compacted MIDX is the same as
    concatenating the individual pseudo-pack orderings for each layer in the
    compaction range.
    
    As a result, the only non-test or documentation change necessary is to
    treat all objects as non-preferred during compaction so as not to
    disturb the object ordering.
    
    In the future, we may want to adjust which commit(s) receive
    reachability bitmaps when compacting multiple .bitmap files into one, or
    even generate new bitmaps (e.g., if the references have moved
    significantly since the .bitmap was generated). This commit only
    implements combining all existing bitmaps in range together in order to
    demonstrate and lay the groundwork for more exotic strategies.
    
    Signed-off-by: Taylor Blau <me@ttaylorr.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    ttaylorr authored and gitster committed Feb 24, 2026
    d54da84