============
Using bfiles
============

``bfiles`` is a Mercurial extension for handling large binary files.
Such files tend to be:

* not very compressible
* not very "diffable" (small modifications can result in
  unexpectedly large deltas)
* not at all mergeable

Mercurial was not designed to handle large binary files. This shows
in a number of ways:

* Internally, Mercurial generally reads file contents entirely into
  memory; for doing diffs and merges, it reads two whole revisions
  into memory.
* Mercurial's revlog format is based on compressed deltas. This
  doesn't work very well with non-diffable, non-compressible data.
  Large binary files with lots of history can take up quite a lot of
  space in the repository.
* Mercurial's distributed nature means that the overhead of all that
  history is repeated for every clone, making a bad situation worse.

The goal of ``bfiles`` is to avoid these problems by taking large
binary files entirely out of Mercurial's repository
(``.hg/store/data``). Instead, their history lives in a *central
store* somewhere, and ``bfiles`` downloads only the revisions you need
when you need them. Thus, ``bfiles`` introduces old-fashioned
client/server centralized version control for large files.

See design.txt for information on the design of ``bfiles``.

Getting started
===============

In the best case, you have an existing Mercurial repository with no
ugly history of large files added by mistake, and you want to start
adding new large files to it. Obviously, you have to enable
``bfiles`` for that repository by editing ``.hg/hgrc``::

  [extensions]
  bfiles = /path/to/hg-bfiles/bfiles.py

Then you can add big files with the ``bfadd`` command::

  hg bfadd file ...

All ``bfiles`` commands start with the ``bf`` prefix. (The **b**
stands for *big* or *binary*, as you wish.)

Like the core ``add`` command, ``bfadd`` does not commit anything.
Instead, it creates a *standin file* for each big file and adds it.
Mercurial then tracks the standin file like any other file. For
example, the command ::

  hg bfadd lib/biglib1.jar lib/biglib2.jar

adds ``.hgbfiles/lib/biglib1.jar`` and ``.hgbfiles/lib/biglib2.jar``.
Each standin file contains just the SHA-1 hash of the corresponding
big file.

You can see the result of ``bfadd`` in two ways. First, the regular
``status`` command shows the addition of two standins::

  $ hg status
  A .hgbfiles/lib/biglib1.jar
  A .hgbfiles/lib/biglib2.jar

and the ``bfstatus`` command gives information about the big files::

  $ hg bfstatus
  BPA lib/biglib1.jar
  BPA lib/biglib2.jar

The first column of ``bfstatus`` is always ``B``, to remind you that
these are big files. The second column lets you know if you have
uploaded this revision to the central store yet: ``P`` means
*pending*, i.e. you still have to run ``bfput`` to upload this
revision. The third column corresponds to ``hg status`` output: ``A``
for added, ``M`` for modified, ``C`` for clean, and ``!`` for missing.
(It's impossible for big files to be unknown or ignored.)

Committing and uploading
------------------------

At this point you can commit your change::

  $ hg commit -m"Add libraries."

Note that this creates a changeset that references big files not yet
uploaded to the central store. If that changeset escapes into the
wild (push, pull, clone), then someone will have a repository with the
``bfiles`` equivalent of dangling links. So you really should upload
your files before pushing your changeset.

Before you can do that, you need to edit ``.hg/hgrc`` and configure
the central store location::

  [bfiles]
  store = /home/hg/bfiles-store

(This can be a local filesystem path, an SSH URL, or an HTTP URL. The
"local" path could of course be a remote NFS or SMB filesystem.
Support for uploading files via HTTP is not yet implemented.)

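As noted above, each standin file contains just the SHA-1 hash of its
big file. The following Python sketch shows one way such a standin
could be produced; it is an illustration, not the extension's actual
code. The helper names ``sha1_of_file`` and ``write_standin`` are
hypothetical, and the trailing newline in the standin is an assumption.

```python
import hashlib
import os

STANDIN_DIR = ".hgbfiles"  # directory name taken from the text above


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 of a file, reading it in chunks so that a
    large file never has to fit in memory at once."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_standin(repo_root, big_file):
    """Write a standin for big_file (a path relative to repo_root).

    Hypothetical helper: the exact on-disk format of a real standin
    (for example the trailing newline) is an assumption.
    """
    standin = os.path.join(repo_root, STANDIN_DIR, big_file)
    os.makedirs(os.path.dirname(standin), exist_ok=True)
    file_hash = sha1_of_file(os.path.join(repo_root, big_file))
    with open(standin, "w") as f:
        f.write(file_hash + "\n")
    return standin
```

Reading in chunks matters here because the files in question are, by
definition, large.
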
Now you can upload your two pending revisions to your central store::

  $ hg bfput -v
  lib/biglib1.jar (b93589dc8c7e14d5afe51291abf3f6cc1c0b2b91)
  lib/biglib2.jar (a6022036ed0d504ac6bdf81886aa51532d9f74f0)

(``bfput -v`` reports the revision ID of each uploaded revision, since
it's possible to have multiple pending revisions of a single big
file.)

You can also pass files or directories to ``bfput`` to upload only
certain big files. (If you have multiple pending revisions for one
file, though, there's no way to select which revisions to upload.)

Modifying big files
-------------------

When you modify a big file, the ``bfstatus`` command notices::

  [...update lib/biglib1.jar...]
  $ hg bfstatus
  B-M lib/biglib1.jar

Here, the second column is not ``P``: despite your modification,
``bfiles`` is not aware of a revision pending upload. You have to use
``bfrefresh`` to update the corresponding standin file::

  $ hg bfrefresh -v lib/biglib1.jar
  lib/biglib1.jar
  1 files refreshed, 0 files unchanged
  $ hg bfstatus
  BPM lib/biglib1.jar

Note the change in status: until you ``bfrefresh``, ``bfiles`` assumes
that you are still actively editing the file. Once you ``bfrefresh``,
that implies that you are done editing it and ready to commit. So
``bfstatus`` now reports ``P`` (pending revision(s)) plus ``M``
(modified).

After this, you can ``bfput`` and ``commit`` as above. (The downside
of ``commit`` first, as discussed above, is that you will create a
changeset that references big file revisions not yet in the central
store. The downside of ``bfput`` first is that you might upload a
useless revision: if you change your mind and edit the file again
before committing, the revision you upload takes up space in the
central store without actually being referenced by any changesets.)

If you run ``bfrefresh`` without arguments (no explicit filenames), it
checks all big files and refreshes only those that have been modified.

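The "refresh only what changed" behaviour described above can be
pictured as comparing the hash recorded in each standin with the hash
of the current big file. The sketch below is a rough model under that
assumption, not the extension's implementation; ``refresh_all`` and
``file_sha1`` are hypothetical names, and the sketch deliberately omits
the copy into ``.hg/bfiles/pending`` that a real refresh also makes.

```python
import hashlib
import os


def file_sha1(path):
    """Hex SHA-1 of a file, read in 1 MiB chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_all(repo_root, standin_dir=".hgbfiles"):
    """Rewrite every standin whose big file no longer matches it.

    Returns (refreshed, unchanged): two lists of big file paths,
    relative to repo_root.
    """
    refreshed, unchanged = [], []
    standin_root = os.path.join(repo_root, standin_dir)
    for dirpath, _dirs, files in os.walk(standin_root):
        for name in files:
            standin = os.path.join(dirpath, name)
            rel = os.path.relpath(standin, standin_root)
            big_file = os.path.join(repo_root, rel)
            if not os.path.exists(big_file):
                continue  # missing big file: nothing to refresh
            with open(standin) as f:
                recorded = f.read().strip()
            current = file_sha1(big_file)
            if current == recorded:
                unchanged.append(rel)
            else:
                # Assumed standin format: hex digest plus newline.
                with open(standin, "w") as f:
                    f.write(current + "\n")
                refreshed.append(rel)
    return refreshed, unchanged
```

The two returned lists correspond to the "files refreshed" and "files
unchanged" counts in the ``bfrefresh -v`` output shown earlier.
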

Implementation note: pending revisions
--------------------------------------

Whenever you ``bfadd`` or ``bfrefresh``, ``bfiles`` copies the big
file(s) referenced into ``.hg/bfiles/pending``. There are a couple of
reasons for doing this:

* If you ``bfadd`` or ``bfrefresh`` a file, modify it, and then
  commit, then you've created a changeset that references a big file
  revision that no longer exists. It's impossible to upload that
  revision to the central store because it's been edited out of
  existence. Keeping a copy in ``.hg/bfiles/pending`` prevents this.
* It's how ``bfiles`` knows that you have pending revisions to upload.

The obvious downside to this is the time and disk space required to
copy big files whenever you ``bfadd`` or ``bfrefresh``.

Using an existing repository
============================

If you need to work with an existing repository that uses ``bfiles``,
your main task is to download the big files you need. After enabling
the extension (see above), you can use ``bfstatus`` to see what big
files you don't have::

  $ hg bfsta
  B-! lib/biglib1.jar
  B-! lib/biglib2.jar

As with ``status``, ``!`` means "missing". But in this case, you need
to use ``bfupdate`` to download the missing file(s)::

  $ hg bfupdate -v

Without any filenames, ``bfupdate`` downloads all big files required
by the current changeset (as specified by the standin files in
``.hgbfiles/``) into your working copy. If you only need a subset of
the big files, you can pass file or directory names::

  $ hg bfupdate -v app1/stuff/lib

Naturally, you can add or modify big files using ``bfadd`` and
``bfrefresh`` as described above.

If you switch to another changeset with ``hg update``, then
``bfupdate`` may replace or remove existing big files, depending on
the difference between the two changesets.

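Conceptually, ``bfupdate`` walks the standin files and fetches the
revision each one names. The sketch below models that for a store on
the local filesystem only. It rests on a loud assumption: that the
store keeps each revision as a file named by its SHA-1 hash directly
under the store directory. The real store layout is not described here
and may well differ; ``update_big_files`` is a hypothetical name.

```python
import os
import shutil


def update_big_files(repo_root, store_root, standin_dir=".hgbfiles"):
    """Copy every big file named by a standin into the working copy.

    ASSUMPTION (for illustration only): store_root/<sha1> holds the
    contents of the revision with that hash.
    """
    fetched = []
    standin_root = os.path.join(repo_root, standin_dir)
    for dirpath, _dirs, files in os.walk(standin_root):
        for name in files:
            standin = os.path.join(dirpath, name)
            rel = os.path.relpath(standin, standin_root)
            with open(standin) as f:
                wanted = f.read().strip()
            source = os.path.join(store_root, wanted)
            target = os.path.join(repo_root, rel)
            # Create lib/ etc. in the working copy if needed.
            os.makedirs(os.path.dirname(target), exist_ok=True)
            shutil.copyfile(source, target)
            fetched.append(rel)
    return sorted(fetched)
```

A version restricted to certain directories, like ``hg bfupdate -v
app1/stuff/lib``, would simply start the walk at the matching
subdirectory of the standin tree.
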

Tighter integration
===================

**Not implemented yet!**

You've probably noticed that ``bfiles`` adds an extra layer on top of
core Mercurial, requiring the user to be aware of which files are big
files and which are normal files. It's nice to have this layer to
interact directly with ``bfiles`` for troubleshooting, or if you only
want to download big files in certain directories. But most users,
most of the time, want big file handling to be smoothly integrated
into the core Mercurial command set.

Thus, ``bfiles`` provides three configuration options for various
degrees of integration.

``autoupdate`` (boolean, default false)
  If true: modifies the ``update`` command to run ``bfupdate`` after
  updating normal files. Also modifies commands that do an implicit
  update, like ``pull -u`` or ``clone`` (without ``-U``).

``autorefresh`` (boolean, default false)
  If true: modifies the ``commit`` command to run ``bfrefresh`` before
  committing. This lets you modify big files and then immediately
  commit the changes.

``autoput`` (string, default ``no``)
  If ``push``: modifies the ``push`` command to run ``bfput`` before
  pushing to certain repositories. If ``commit``: modifies the
  ``commit`` command to run ``bfput`` after committing changesets that
  include ``.hgbfiles/`` standin files.

For maximum integration, then, you would edit ``.hg/hgrc`` and add::

  [bfiles]
  autostatus = true
  autoupdate = true
  autorefresh = true
  autoput = push       # or "commit" if you prefer to bfput early

Keep in mind that the above options are **not implemented yet!**

Other commands
==============

You can check that your current ``.hgbfiles/`` tree is consistent with
the central store::

  hg bfverify

This merely checks for the existence of every file/revision listed in
``.hgbfiles/``.
To check that the file contents match the SHA-1 hashes::

  hg bfverify --contents

``bfverify --contents`` transfers the current revision of every file
from the central store, so you might only want to run it on the server
where the central store lives.

If the central store is a filesystem path (e.g. you are on the
server), you can run ::

  hg bfverify --unused

to scan for big file revisions in the central store that are not
referenced by any changeset.

To verify every changeset in your repository, use ``bfverify --all``.
[**Not implemented yet!**]

You can fetch specific revisions of a particular large file::

  hg bfcat -rREV FILE

That looks up the standin for ``FILE`` from ``REV`` to determine which
revision of ``FILE`` to fetch. Then it downloads that revision and
writes it to stdout. It interacts with the central store in exactly
the same way as ``bfupdate``.