soc progress 5
Last week was more active again. I have mostly finished my 2.3 agenda, I have been thinking about the repository format, and I have started on some post-2.3 work as well.
In the pre-2.3 department, the work has been mostly API cleanups and finishing touches. There are also a few pieces still missing from the puzzle. I wanted to add a magic word to the start of the index, so we could quickly identify an old (or future) version and discard it immediately (triggering a full index rebuild). This is not particularly destructive, since we can easily use a magic word that won’t match any realistic index start (for the current format). The other thing that missed the 2.3 beta 1 is the endianness conversion for the on-disk format. This can be done together with the magic word.
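To make the magic word and endianness idea concrete, here is a minimal sketch in Python (chosen for brevity, it is not hashed-storage code). The magic value `HSI1`, the item-count field and the function names are all made up for illustration; the real index layout is different and the actual magic word has not been decided here.

```python
import struct

# Hypothetical values: neither the magic word nor this layout comes from hashed-storage.
MAGIC = b"HSI1"
# "<" forces little-endian encoding regardless of the host CPU,
# so the same file reads back identically on every platform.
HEADER = struct.Struct("<4sI")  # 4-byte magic word, then a 32-bit item count


def write_index(count, payload=b""):
    """Serialise a toy index: fixed header followed by an opaque payload."""
    return HEADER.pack(MAGIC, count) + payload


def read_index(data):
    """Return (count, payload), or None when the index must be rebuilt."""
    if len(data) < HEADER.size:
        return None  # truncated file: treat as unusable
    magic, count = HEADER.unpack_from(data)
    if magic != MAGIC:
        return None  # old or future format: discard and rebuild
    return count, data[HEADER.size:]
```

A caller would treat a `None` result as the signal to throw the file away and do a full index rebuild, which is the cheap early exit the magic word is meant to provide.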
For the repository format, so far I have been thinking about how to efficiently pack mostly static data (think git object database) on the fly (as opposed to a manual, git-gc style approach) while avoiding excessive re-downloading (and re-packing). The best I could come up with so far is to keep, say, at most 16 objects in completely unpacked form. When this threshold is reached, I take 8 of these files and pack them into a single indexed object (with these 8 items as sub-objects). The packed object can then have a fixed-size header holding the hashes, offsets and sizes of those 8 sub-objects. This would work recursively: composed objects could in turn be composed into bigger composed objects. This way, we would have around 16 files on average representing the whole repository.
It would also be easy to download only the relevant parts of newly appeared files from remote repositories: we can grab the header and then fetch the unknown sub-objects with HTTP range requests. We would also need to map from primitive object hashes to their current locations (basically all their parent patches). I’ll have to think more about a suitable data structure for this purpose.
Finally, we would still need a gc-style command, since an object filesystem used by darcs would accumulate unreferenced garbage, especially if we also used the system for the pristine cache. Moreover, the purely academic N-ary tree approach would suffer from performance problems, so some real-world hacks will be needed to make things work out in practice. (But the tree structure should be useful to show some bounds on the complexity of particular operations.)
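The packing step and the range-request idea can be sketched roughly as follows, again in Python for brevity. Only the numbers 16 and 8 and the "hashes, offsets and sizes" header come from the description above. The SHA-256 hash, the 48-byte entry layout, the "pack the 8 oldest files" policy and every function name are assumptions made purely for illustration.

```python
import hashlib
import struct

FANOUT = 8       # sub-objects per packed object (from the text)
THRESHOLD = 16   # maximum number of files before packing kicks in (from the text)
# Hypothetical header entry: 32-byte SHA-256, 64-bit offset, 64-bit size, big-endian.
ENTRY = struct.Struct(">32sQQ")
HEADER_SIZE = FANOUT * ENTRY.size


def pack(blobs):
    """Pack exactly FANOUT blobs into one object: fixed-size header, then payloads."""
    assert len(blobs) == FANOUT
    header, offset = b"", HEADER_SIZE
    for blob in blobs:
        header += ENTRY.pack(hashlib.sha256(blob).digest(), offset, len(blob))
        offset += len(blob)
    return header + b"".join(blobs)


def read_header(header):
    """Decode the fixed-size header into (hash, offset, size) triples."""
    return [ENTRY.unpack_from(header, i * ENTRY.size) for i in range(FANOUT)]


def missing_ranges(header, known_hashes):
    """Inclusive byte ranges (HTTP Range style) of sub-objects we do not have yet."""
    return [(off, off + size - 1)
            for h, off, size in read_header(header)
            if h not in known_hashes and size > 0]


def add_object(files, blob):
    """Add a file; once THRESHOLD is reached, pack the FANOUT oldest into one.

    Packed objects go back into the same list, so they can later become
    sub-objects of a bigger packed object (the recursive case).
    """
    files = files + [blob]
    if len(files) >= THRESHOLD:
        files = files[FANOUT:] + [pack(files[:FANOUT])]
    return files
```

A client that sees a new packed file on a remote would first fetch bytes `0` to `HEADER_SIZE - 1`, call `missing_ranges` with the set of hashes it already holds, and then request only the returned ranges. Which 8 files to pick for packing is left open in the description above; taking the oldest is just the simplest choice for a sketch.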
As for the post-2.3 bits: in darcs-hs, I have bitten the bullet and flipped all the unrecorded-state (basically pristine -> working copy diffing) machinery over to Gorsvet’s unrecordedState (implemented using Index and hashed-storage). Sadly, this might have introduced some performance regressions. However, the thing now completely passes the testsuite (after I fixed a bug in the mmap package… I have to submit a patch to the upstream author). It also means the obliteration of a chunk of old code, and a complete fix for the timestamp de-synchronisation issue of current darcs. There’s still a bunch of work to do, which would allow complete removal of unsafeDiff and a bunch of related functionality.
Finally, changes for this week… hashed-storage:
- Move darcs-specific utilities to separate module (Storage.Hashed.Darcs).
- Export the TreeIO alias from Monad.
- Also parametrise the Tree hashing function in readIndex.
- Replace all unfold terminology with expand (breaks API).
- Remove unused bit in Index.
- Fix a silly bug in AnchoredPath parents.
- Fix compilation of tests.
- Further simplify AnchoredPath parents.
- Do not forget to include Storage.Hashed.Test in distribution.
- Fix AnchoredPath parents again.
- Bump version to 0.3.3.1.
- Fix build with GHC 6.8.2 (needs extension field in cabal). Bump version.
… and darcs-hs:
- Basic “show index” implementation.
- Also curse haskell_policy in Czech.
- Clean up unused bits in Darcs.Gorsvet.
- Use TreeIO alias in instance declarations (do not spell out the type).
- Import darcsFormatHash from Storage.Hashed.Darcs.
- Update to reflect Index API change, provide darcs-specific readIndex in Gorsvet.
- Unfold has been renamed to ‘expand’ in Storage.Hashed.Tree.
- Also provide “darcs show pristine” to go with darcs show index.
- Put blank lines between command groups in “darcs help”.
- Cut down descriptions, so that darcs help does not wrap on an 80-column TTY.
- Make “darcs clone” a hidden alias for “darcs get”.
- Flip “darcs changes” to index-based diffing.
- Flip “darcs mark-conflicts” over to index-based diffing.
- Use index-based diffing in Remove.
- Flip AmendRecord to index-based diffing, too.
- Use index-based diffing in unrevert.
- Make revert use index-based diffing.
- Also use index-based diffing in unrecord/obliterate.
- Provide readRecorded in Gorsvet as well.
- Factor out applyToTree in Gorsvet.
- Use index-based diffing in “darcs wh -l”.
- Unexport get_unrecorded* from Repository, remove unused functions from Internal.
- Move tentativelyMergePatches and friends to a new module, Repository.Merge.
- Move add_to_pending to Repository, use unrecordedChanges.
- Clean up unused bits from Repository.Internal.
- Invalidate the index in add_to_pending, as it was getting rebuilt too soon.
- Remove unused import from Gorsvet.
And I need to sleep now. I’m in Berlin at the moment, so I’ll probably be fairly unproductive until about Saturday. I’ll sort out the 2.3 beta 1 tomorrow, since I really, really need to sleep now. Goodnight!