How do medium to large development teams constantly push changes to a DVCS?

The company I work for is attempting to get off of StarTeam, a centralized VCS, and hopefully into something better. I don't want to just pick SVN because it's the "easy option" (most similar to what we do today). Can someone help me understand how a medium to large team of developers changing roughly 150 files per day can operate on a DVCS like Git or Mercurial?

We have:

  • 50 developers working on a single repository that has
  • 15 years of history
  • 15,000 files (mostly code/text)
  • Approximately 150 single file check-ins a day
  • Need to pull changes from central every day due to our unique language's requirements and data structure

    After days of research and experimenting, I understand the processes and workflows of both Git and Hg. What I don't understand is how large teams that need to make changes very quickly, often to single files, can operate within the constraints of these systems.

    For example: I might be working on 3 different things and a developer comes up to me presenting a small issue. I do a little research, make the necessary 1 line change, and check in the file. From what I can tell, with git I would need to commit the change, stash all my other changes, pull, push, apply stash, deal with any merges applying my stash over recently pulled code might present. Is there a better way?
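    For reference, the sequence described above corresponds roughly to the git commands below. This is a minimal sketch that keeps them as Python argument lists so they can be printed or run with subprocess. The file name, commit message, remote name `origin`, and branch name `main` are hypothetical placeholders, not part of the setup described here.

```python
import subprocess


def stash_dance_plan(path: str, message: str,
                     remote: str = "origin", branch: str = "main") -> list[list[str]]:
    """Commands for: commit the fix, stash the rest, pull, push, unstash.

    `remote` and `branch` default to the hypothetical names "origin" and "main".
    """
    return [
        ["git", "add", path],                         # stage only the one-line fix
        ["git", "commit", "-m", message],             # commit it locally
        ["git", "stash"],                             # set the remaining tracked edits aside
        ["git", "pull", "--rebase", remote, branch],  # replay the fix on top of others' work
        ["git", "push", remote, f"HEAD:{branch}"],    # publish the fix
        ["git", "stash", "pop"],                      # restore the edits; conflicts show up here
    ]


def run(plan: list[list[str]], cwd: str = ".") -> None:
    """Run each command in order, stopping at the first failure."""
    for argv in plan:
        subprocess.run(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them; the file name is hypothetical.
    for argv in stash_dance_plan("src/report.txt", "Fix report header"):
        print(" ".join(argv))
```

    Git can fold the stash and pop around the pull into one step with `git pull --rebase --autostash`, but the risk of conflicts when the stashed edits are reapplied stays the same.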

    I've read the article about Facebook switching from Git to Mercurial. How do they manage thousands of check-ins a day on a 40,000 file repository? I can't fathom how that would work with the pull/push model of a DVCS. There must be something I'm missing about the workflow or functionality of these systems. Maybe they don't need to pull/re-base every day?

    Any help is appreciated.


    I've read the article about Facebook switching from Git to Mercurial. How do they manage thousands of check-ins a day on a 40,000 file repository? I can't fathom how that would work with the pull/push model of a DVCS. There must be something I'm missing about the workflow or functionality of these systems. Maybe they don't need to pull/re-base every day?

    First of all, Facebook doesn't use Mercurial in a purely DVCS fashion. They use the remotefilelog extension (which they wrote themselves) to only pull changesets on demand (this works because of how Mercurial stores history so that history accesses can mostly be localized). They also use MySQL and memcached for their backend servers to improve scalability.

    For the rebase/merge bottleneck (when dozens of developers need to integrate their work into the main branch simultaneously), they have a work-in-progress feature to do this mostly server-side; I note that this bottleneck is partly the result of (1) using a monorepo and (2) using trunk-based development, a problem that not every large organization will have.

    For example: I might be working on 3 different things and a developer comes up to me presenting a small issue. I do a little research, make the necessary 1 line change, and check in the file. From what I can tell, with git I would need to commit the change, stash all my other changes, pull, push, apply stash, deal with any merges applying my stash over recently pulled code might present. Is there a better way?

    You can use the new Git worktree feature (or, for Mercurial, hg share) to have multiple checkouts of the same repository and work on them independently. Bazaar and Fossil have always been able to do that natively; Fossil also has the autosync feature to operate in an SVN-like fashion (with caveats), and Bazaar has always been able to work in a quasi-centralized fashion with commits going directly to the server. In particular, having "only a single checkout per repository" is largely a historical Git artifact and not representative of DVCSes in general (and, as I noted, it is no longer the case in recent Git versions).
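    On the Mercurial side, a second working copy via hg share might be set up as in the minimal sketch below. It assumes the bundled `share` extension (which provides `hg share`) is not yet enabled in your hgrc, so it is switched on for a single invocation with `--config`; the paths and the revision name are hypothetical.

```python
import subprocess


def hg_share_plan(source: str, dest: str, rev: str = "default") -> list[list[str]]:
    """Commands that create a second working copy backed by SOURCE's store."""
    return [
        # Enable the bundled share extension just for this one command.
        ["hg", "--config", "extensions.share=", "share", source, dest],
        # Check out the revision to work from in the new working copy.
        ["hg", "-R", dest, "update", rev],
    ]


def run(plan: list[list[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for argv in plan:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical paths; print rather than execute.
    for argv in hg_share_plan("product", "product-hotfix"):
        print(" ".join(argv))
```

    Because both working copies use the same underlying store, a changeset committed in one is visible from the other without a pull.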

    That said, you generally have to commit -> pull -> merge or rebase -> push for changes to the main branch; this is a tradeoff. The distributed model gives you easy local versioning, so that work in progress doesn't go straight to the main repository before it is ready. It also generally assumes that most of your work will happen on a personal branch, so that 90% of the time it's really just commits (with the occasional push).
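    The commit -> pull -> merge or rebase -> push cycle can be sketched as follows, with a flag choosing between the merge and rebase variants of the pull. This is an illustration only; the remote `origin` and branch `main` are assumed names.

```python
import subprocess


def integrate_plan(message: str, rebase: bool = True,
                   remote: str = "origin", branch: str = "main") -> list[list[str]]:
    """commit -> pull (merge or rebase) -> push for one finished change."""
    # --rebase replays local commits on top of the fetched branch (linear history);
    # --no-rebase merges instead, creating a merge commit if the histories diverged.
    pull_mode = "--rebase" if rebase else "--no-rebase"
    return [
        ["git", "commit", "-m", message],
        ["git", "pull", pull_mode, remote, branch],
        # Rejected as non-fast-forward if someone pushed after our pull;
        # in that case pull again and retry.
        ["git", "push", remote, f"HEAD:{branch}"],
    ]


def run(plan: list[list[str]], cwd: str = ".") -> None:
    """Run each command in order, stopping at the first failure."""
    for argv in plan:
        subprocess.run(argv, cwd=cwd, check=True)
```

    Either variant integrates the same changes; the choice only affects whether the shared history stays linear or records the merge explicitly.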

    Note that you won't have more or fewer conflicts than in a centralized VCS; they may just appear at different points.


    All that said, if you're happy with your current VCS and you can't identify unambiguous benefits from switching, it's probably not worth changing your VCS. Switching VCSs incurs a measurable cost (rebuilding your repo, retraining your developers, adjusting workflows, adjusting the surrounding tooling, unexpected transition issues) and these costs need to be offset by equally measurable benefits to make them worthwhile.


    I hope this will be closed quickly as a source of flame wars, but...

    I don't want to just pick SVN because it's the "easy option"

    Excuse me, do you want to work productively or to suffer? Given your last (not very clear) requirement, a real CVCS with reasonably good branching and merging may be a better choice (see, e.g., "What does SVN do better than Git?") than any DVCS used in a pseudo-CVCS mode.

    After days of research and experimenting, I understand the processes and workflows of both Git and Hg.

    That's a rather rough and shallow understanding, BTW. Even in Git you have:

  • "Branches" (although Git branches aren't branches per se from the point of view of other DVCSes)
  • Mutable history
  • DAG

    For example: I might be working on 3 different things

    A broadly VCS-agnostic approach (written with Mercurial in mind; Git users may want to adapt it to Git):

    You create 3 branches (task-based development, the "branch per task" strategy), with any number of commits in each.

    and a developer comes up to me presenting a small issue

  • commit your current working directory as is
  • return to the point where your local history diverged (the last shared changeset?)
  • "do a little research, make the necessary 1 line change, and check in the file"
  • the developer pulls the part of your repository that has your latest changeset on top
  • you return to where you left off and continue working
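    The steps above could look roughly like this in Mercurial. This is a sketch under assumptions not in the answer: the parked work is remembered with a hypothetical bookmark named `wip`; the "last shared changeset" is taken to be the newest public ancestor of the working copy (revset `max(public() and ::.)`); and the fix is published with `hg push -r .` rather than pulled by the colleague. Either way only the fix and its ancestors travel, not the parked work.

```python
import subprocess

# Hypothetical bookmark name used to remember where the unfinished work is.
WIP_BOOKMARK = "wip"


def interrupt_plan(fix_message: str,
                   base_rev: str = "max(public() and ::.)") -> list[list[str]]:
    """Park unfinished work, make a small fix on the shared base, then go back."""
    return [
        ["hg", "commit", "--addremove", "-m", "WIP: parked"],  # 1. commit the working dir as is
        ["hg", "bookmark", WIP_BOOKMARK],                      #    label that changeset
        ["hg", "update", base_rev],                            # 2. back to the last shared changeset
        # 3. ...edit the file here, then record the fix (a new draft head):
        ["hg", "commit", "-m", fix_message],
        ["hg", "push", "-r", "."],                             # 4. publish the fix and its ancestors only
        ["hg", "update", WIP_BOOKMARK],                        # 5. resume the parked work
    ]


def run(plan: list[list[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for argv in plan:
        subprocess.run(argv, check=True)
```

    If others have pushed in the meantime, `hg push` refuses to create a new remote head; pull and then rebase or merge the fix before pushing again.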

    I might be working on 3 different things and a developer comes up to me presenting a small issue. I do a little research, make the necessary 1 line change, and check in the file. From what I can tell, with git I would need to commit the change, stash all my other changes, pull, push, apply stash, deal with any merges applying my stash over recently pulled code might present. Is there a better way?

    Yes: since Git 2.5, you can clone your repo once but check it out multiple times.
    If you create a new branch, you can create it in a separate working tree folder, leaving your current environment undisturbed.
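    A minimal sketch of that flow, assuming Git 2.17 or later (for `git worktree remove`), a remote called `origin` whose shared branch is `main`, and a hypothetical hotfix branch and folder name. The commands are held as Python argument lists so they can be printed or run with subprocess.

```python
import subprocess


def worktree_hotfix_plan(path: str, branch: str, message: str,
                         remote: str = "origin", main: str = "main") -> list[list[str]]:
    """Quick fix in a second working tree; the current checkout stays untouched."""
    return [
        ["git", "fetch", remote],                                            # refresh origin/main
        ["git", "worktree", "add", "-b", branch, path, f"{remote}/{main}"],  # new folder on a new branch
        # ...edit the file inside `path` and `git -C <path> add` it, then:
        ["git", "-C", path, "commit", "-m", message],
        ["git", "-C", path, "push", remote, f"{branch}:{main}"],             # publish to the shared branch
        ["git", "worktree", "remove", path],                                 # delete the extra folder
    ]


def run(plan: list[list[str]], cwd: str = ".") -> None:
    """Run each command in order, stopping at the first failure."""
    for argv in plan:
        subprocess.run(argv, cwd=cwd, check=True)
```

    The uncommitted work in the original checkout is never stashed or touched; only the extra folder sees the fetched code.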

    See "Multiple working directories with Git?".
