Commit graph

27 commits

Author SHA1 Message Date
epriestley db2425b300 Do initial repository imports at a lower priority and finish importing commits before starting new ones
Summary:
Fixes T11677. This makes two minor adjustments to the repository import daemons:

  - The first step ("Message") now queues at a slightly-lower-than-default (for already-imported repositories) or very-low (for newly importing repositories) priority level.
  - The other steps now queue at "default" priority level. This is actually what they already did, but without this change their behavior would be to inherit the priority level of their parents.

This has two effects:

  - When adding new repositories to an existing install, they shouldn't block other things from happening anymore.
  - The daemons will tend to start one commit and run through all of its steps before starting another commit. This makes progress through the queue more even and predictable.
    - Before, they did ALL the message tasks, then ALL the change tasks, etc. This works fine but is confusing/uneven/less-predictable because each type of task takes a different amount of time.

Test Plan:
  - Added a new repository.
  - Saw all of its "message" steps queue at priority 4000.
  - Saw followups queue at priority 2000.
  - Saw progress generally "finish what you started" -- go through the queue one commit at a time, instead of one type of task at a time.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11677

Differential Revision: https://secure.phabricator.com/D16585
2016-09-21 16:41:01 -07:00
epriestley d952dd5912 When importing Git repositories, treat out-of-range timestamps as the current time
Summary:
Fixes T11537. See that task for discussion.

Although we could accommodate these faithfully, it requires a huge migration and affects one repository on one install which was written with buggy tools.

At least for now, just replace out-of-32-bit-range epoch values with the current time, which is often somewhat close to the real value.

Test Plan:
  - Following the instructions in T11537, created commits in 40,000 AD.
  - Tried to import them, reproducing the "epoch" database issue.
  - Applied the patch.
  - Successfully imported future-commits, with some liberties around commit dates. Note that author date (not stored in an `epoch` column) is still shown faithfully:

{F1789302}

Reviewers: chad, avivey

Reviewed By: avivey

Maniphest Tasks: T11537

Differential Revision: https://secure.phabricator.com/D16456
2016-08-26 07:38:53 -07:00
Aviv Eyal c56a4fce66 Only load refs that are actual commits
Summary: Fix T11301. Git is git.

Test Plan: tagged a file! run discover. no crash.

Reviewers: #blessed_reviewers, epriestley

Reviewed By: #blessed_reviewers, epriestley

Subscribers: Korvin

Maniphest Tasks: T11301

Differential Revision: https://secure.phabricator.com/D16261
2016-07-08 22:34:25 +00:00
epriestley 5ffdb73273 Don't try to prune unreachable commits from repositories with no outdated refs
Summary:
Fixes T11269. The basic issue is that `git log` in an empty repository exits with an error message.

Prior to recent Git (2.6?), this message reads:

> fatal: bad default revision 'HEAD'

This message was somewhat recently changed by <ce11360467>. After that, it reads:

> fatal: your current branch 'master' does not have any commits yet

This change isn't //technically// a //complete// fix because you could still hit this issue like this:

  - Create an empty repository.
  - Push some stuff to `master`.
  - Delete `master`.

However, this is very rare and even in this case the repository will fix itself once you push something again. We can try to fix that if any users ever actually hit it.

Test Plan:
  - Created a new empty Git repository.
  - Ran `bin/repository update Rxx`.
  - Before patch: "git log" error because of the empty repository.
  - After patch: clean update.
  - Also ran `repository update` on a non-empty repository.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11269

Differential Revision: https://secure.phabricator.com/D16234
2016-07-05 09:09:46 -07:00
epriestley 9a2c2505a0 Handle tag tags properly in discovery
Summary:
Fixes T11180. In Git, it's possible to tag a tag (????). When you do, we try to log the tag-object, which automatically resolves to the commit and fails.

Just skip these. If "A" points at "B" which points at "C", it's fine to ignore "A" and "B" since we'll get the same stuff when we process "C".

Test Plan:
  - Tagged a tag.
  - Pushed it.
  - Discovered it.
  - Before patch: got exception similar to the one in T11180.
  - After patch: got tag-tag skipped. Also got slightly better error messages.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11180

Differential Revision: https://secure.phabricator.com/D16149
2016-06-20 11:10:02 -07:00
epriestley 1c63ac6a3a When a ref is moved or deleted, put it on a list; later, check for reachability
Summary:
Ref T9028. This allows us to detect when commits are unreachable:

  - When a ref (tag, branch, etc) is moved or deleted, store the old thing it pointed at in a list.
  - After discovery, go through the list and check if all the stuff on it is still reachable.
  - If something isn't, try to follow its ancestors back until we find something that is reachable.
  - Then, mark everything we found as unreachable.
  - Finally, rebuild the repository summary table to correct the commit count.

Test Plan:
  - Deleted a ref, ran `pull` + `refs`, saw oldref in database.
  - Ran `discover`, saw it process the oldref, mark the unreachable commit, and update the summary table.
  - Visited commit page, saw it properly marked.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T9028

Differential Revision: https://secure.phabricator.com/D16133
2016-06-16 11:21:38 -07:00
epriestley 77ee518d88 Make daemons ignore "Unreachable" commits and avoid duplicate work
Summary:
Ref T9028. This improves the daemon behavior for unreachable commits. There is still no way for commits to become marked unreachable on their own.

  - When a daemon encounters an unreachable commit, fail permanently.
  - When we revive a commit, queue new daemons to process it (since some of the daemons might have failed permanently the first time around).
  - Before doing a step on a commit, check if the step has already been done and skip it if it has. This can't happen normally, but will soon be possible if a commit is repeatedly deleted and revived very quickly.
  - Steps queued with `bin/repository reparse ...` still execute normally.

Test Plan:
  - Used `bin/repository reparse` to run every step, verified they all mark the commit with the proper flag.
  - Faked the `reparse` exception in the "skip step" code, used `repository reparse` to skip every step.
  - Marked a commit as unreachable, ran `discover`, saw daemons queue for it.
  - Ran daemons with `bin/worker execute --id ...`, saw them all skip + queue the next step.
  - Marked a commit as unreachable, ran `bin/repository reparse` on it, got permanent failures immediately for each step.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T9028

Differential Revision: https://secure.phabricator.com/D16131
2016-06-16 11:20:56 -07:00
epriestley ec89c7d63e Add an "Unreachable" flag for commits and revive them during discovery
Summary:
Ref T9028. This is the easy part of dealing with deleted commits:

  - Add a flag for unreachable commits (nothing sets this flag yet).
  - Ignore unreachable commits when querying for known commits during discovery, so we pretend they do not exist.
  - When recording a commit, try just reviving an existing unreachable commit first. If that works, bail out.

Test Plan:
  - Artificially marked a commit as unreachable with raw SQL.
  - Verified it said "deleted: unreachable" in the UI.
  - Ran `repository discover --trace --verbose`.
  - Saw the discovery process ignore the commit when filling the cache.
  - Saw the discovery process revive the commit instead of trying to record it again.
  - Web UI now shows the commit as normal.
  - Running `repository discover` again doesn't make any further changes.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T9028

Differential Revision: https://secure.phabricator.com/D16130
2016-06-16 11:20:37 -07:00
epriestley 2949905c04 Fetch and discover all Git ref types, not just branches
Summary:
Ref T9028. Fixes T6878. Currently, we only fetch and discover branches. This is fine 99% of the time but sometimes commits are pushed to just a tag, e.g.:

```
git checkout <some hash>
nano file.c
git commit -am '...'
git tag wild-wild-west
git push origin wild-wild-west
```

Through a similar process, commits can also be pushed to some arbitrary named ref (we do this for staging areas).

With the current rules, we don't fetch tag refs and won't discover these commits.

Change the rules so:

  - we fetch all refs; and
  - we discover ancestors of all refs.

Autoclose rules for tags and arbitrary refs are just hard-coded for now. We might make these more flexible in the future, or we might do forks instead, or maybe we'll have to do both.

Test Plan:
Pushed a commit to a tag ONLY (`vegetable1`).

<cf508b8de6>

On `master`, prior to the change:

  - Used `update` + `refs` + `discover`.
  - Verified tag was not fetched with `git for-each-ref` in local working copy and the web UI.
  - Verified commit was not discovered using the web UI.

With this patch applied:

  - Used `update`, saw a `refs/*` fetch instead of a `refs/heads/*` fetch.
  - Used `git for-each-ref` to verify that tag fetched.
  - Used `repository refs`.
  - Saw new tag appear in the tags list in the web UI.
  - Saw new refcursor appear in refcursor table.
  - Used `repository discover --verbose` and examine refs for sanity.
  - Saw commit row appear in database.
  - Saw commit skeleton appear in web UI.
  - Ran `bin/phd debug task`.
  - Saw commit fully parse.

{F1689319}

Reviewers: chad

Reviewed By: chad

Subscribers: avivey

Maniphest Tasks: T6878, T9028

Differential Revision: https://secure.phabricator.com/D16129
2016-06-16 11:20:05 -07:00
epriestley f5f784f4c1 Version clustered, observed repositories in a reasonable way (by largest discovered HEAD)
Summary:
Ref T4292. For hosted, clustered repositories we have a good way to increment the internal version of the repository: every time a user pushes something, we increment the version by 1.

We don't have a great way to do this for observed/remote repositories because when we `git fetch` we might get nothing, or we might get some changes, and we can't easily tell //what// changes we got.

For example, if we see that another node is at "version 97", and we do a fetch and see some changes, we don't know if we're in sync with them (i.e., also at "version 97") or ahead of them (at "version 98").

This implements a simple way to version an observed repository:

  - Take the head of every branch/tag.
  - Look them up.
  - Pick the biggest internal ID number.

This will work //except// when branches are deleted, which could cause the version to go backward if the "biggest commit" is the one that was deleted. This should be OK, since it's rare and the effects are minor and the repository will "self-heal" on the next actual push.

Test Plan:
  - Created an observed repository.
  - Ran `bin/repository update` and observed a sensible version number appear in the version table.
  - Pushed to the remote, did another update, saw a sensible update.
  - Did an update with no push, saw no effect on version number.
  - Toggled repository to hosted, saw the version reset.
  - Simulated read traffic to out-of-sync node, saw it do a remote fetch.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T4292

Differential Revision: https://secure.phabricator.com/D15986
2016-05-30 09:53:01 -07:00
epriestley 1c73ad6a1b Make repository daemon locks more granular and forgiving
Summary:
Ref T4292. Currently, we hold one big lock around the whole `bin/repository update` workflow.

When running multiple daemons on different hosts, this lock can end up being contentious. In particular, we'll hold it during `git fetch` on every host globally, even though it's only useful to hold it locally per-device (that is, it's fine/good/expected if `repo001` and `repo002` happen to be fetching from a repository they are observing at the same time).

Instead, split it into two locks:

  - One lock is scoped to the current device, and held during pull (usually `git fetch`). This just keeps multiple daemons accidentally running on the same host from making a mess when trying to initialize or update a working copy.
  - One lock is scoped globally, and held during discovery. This makes sure daemons on different hosts don't step on each other when updating the database.

If we fail to acquire either lock, assume some other process is legitimately doing the work and bail more quietly instead of fataling. In approximately 100% of cases where users have hit this lock contention, that was the case: some other daemon was running somewhere doing the work and the error didn't actually represent an issue.

If there's an actual problem, we still raise a diagnostically useful message if you run `bin/repository update` manually, so there are still tools to figure out that something is hung or whatever.

Test Plan:
  - Ran `bin/repository update`, `pull`, `discover`.
  - Added `sleep(5)`, forced processes to contend, got lock exceptions and graceful exit with diagnostic message.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T4292

Differential Revision: https://secure.phabricator.com/D15903
2016-05-13 05:17:27 -07:00
epriestley 71a97d8af5 When observing a repository, switch to "importing" mode on a large discovery in an empty repository
Summary:
Ref T10923. Fixes T9554.

When hosting a repository, we currently have a heuristic that tries to detect when you're doing an initial import: if you push more than 7 commits to an empty repository, it counts as an import and we disable mail/feed/etc.

Do something similar for observed repositories: if the repository is empty and we discover more than 7 commits, switch to import mode until we catch up.

This should align behavior with user expectation more often when juggling hosted vs imported repositories.

Test Plan:
  - Created a new hosted repository.
  - Activated it and allowed it to fully import.
  - Added an "Observe URI".
  - Saw it automatically drop into "Importing" mode until the import completed.
  - Swapped it back to hosted mode.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T9554, T10923

Differential Revision: https://secure.phabricator.com/D15877
2016-05-11 06:36:38 -07:00
epriestley d9e034f02c Remove calls to getCallsign() from repository daemons
Summary: Ref T4245. These are all descriptive or UI-facing.

Test Plan:
- Ran `bin/repository pull ...` with various identifiers.
- Ran `bin/repository mirror ...` with various identifiers.
- Ran `bin/repository discover ...` with various identifiers.
- Ran `bin/phd debug pull X Y --not Z` with various identifiers.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T4245

Differential Revision: https://secure.phabricator.com/D14926
2016-01-02 11:03:51 -08:00
Joshua Spence 36e2d02d6e phtize all the things
Summary: `pht`ize a whole bunch of strings in rP.

Test Plan: Intense eyeballing.

Reviewers: #blessed_reviewers, epriestley

Reviewed By: #blessed_reviewers, epriestley

Subscribers: hach-que, Korvin, epriestley

Differential Revision: https://secure.phabricator.com/D12797
2015-05-22 21:16:39 +10:00
Bob Trahan d2ea0bc5f0 Diffusion - add verifyGitOrigin check to git fetch operation
Summary: Fixes T4946. Theoretically.

Test Plan:
iiam

also unit tests.

also

```
cd /var/repo/X
git remote remove origin # simulates origin-missing clone under 1.7.1
cd /path/to/phabricator
./bin/repository pull X
```

and observed no errors

Reviewers: epriestley

Reviewed By: epriestley

Subscribers: Korvin, epriestley

Maniphest Tasks: T4946, T5938

Differential Revision: https://secure.phabricator.com/D10855
2014-11-14 18:02:27 -08:00
Joshua Spence 8fd098329b Rename AphrontQueryException subclasses
Summary: Ref T5655. Depends on D10149.

Test Plan: Ran `arc unit`

Reviewers: epriestley, #blessed_reviewers

Reviewed By: epriestley, #blessed_reviewers

Subscribers: epriestley, Korvin, hach-que

Maniphest Tasks: T5655

Differential Revision: https://secure.phabricator.com/D10150
2014-08-06 07:51:21 +10:00
Joshua Spence 0a62f13464 Change double quotes to single quotes.
Summary: Ran `arc lint --apply-patches --everything` over rP, mainly to change double quotes to single quotes where appropriate. These changes also validate that the `ArcanistXHPASTLinter::LINT_DOUBLE_QUOTE` rule is working as expected.

Test Plan: Eyeballed it.

Reviewers: #blessed_reviewers, epriestley

Reviewed By: #blessed_reviewers, epriestley

Subscribers: epriestley, Korvin, hach-que

Differential Revision: https://secure.phabricator.com/D9431
2014-06-09 11:36:50 -07:00
epriestley e4ea092f60 Implement a chunked, APC-backed graph cache
Summary:
Ref T2683. This is a refinement and simplification of D5257. In particular:

  - D5257 only cached the commit chain, not path changes. This meant that we had to go issue an awkward query (which was slow on Facebook's install) periodically while reading the cache. This was reasonable locally but killed performance at FB scale. Instead, we can include path information in the cache. It is very rare that this is large except in Subversion, and we do not need to use this cache in Subversion. In other VCSes, the scale of this data is quite small (a handful of bytes per commit on average).
  - D5257 required a large, slow offline computation step. This relies on D9044 to populate parent data so we can build the cache online at will, and let it expire with normal LRU/LFU/whatever semantics. We need this parent data for other reasons anyway.
  - D5257 separated graph chunks per-repository. This change assumes we'll be able to pull stuff from APC most of the time and that the cost of switching chunks is not very large, so we can just build one chunk cache across all repositories. This allows the cache to be simpler.
  - D5257 needed an offline cache, and used a unique cache structure. Since this one can be built online it can mostly use normal cache code.
  - This also supports online appends to the cache.
  - Finally, this has a timeout to guarantee a ceiling on the worst case: the worst case is something like a query for a file that has never existed, in a repository which receives exactly 1 commit every time other repositories receive 4095 commits, on a cold cache. If we hit cases like this we can bail after warming the cache up a bit and fall back to asking the VCS for an answer.

This cache isn't perfect, but I believe it will give us substantial gains in the average case. It can often satisfy "average-looking" queries in 4-8ms, and pathological-ish queries in 20ms on my machine; `hg` usually can't even start up in less than 100ms. The major thing that's attractive about this approach is that it does not require anything external or complicated, and will "just work", even producing reasonble improvements for users without APC.

In followups, I'll modify queries to use this cache and see if it holds up in more realistic workloads.

Test Plan:
  - Used `bin/repository cache` to examine the behavior of this cache.
  - Did some profiling/testing from the web UI using `debug.php`.
  - This //appears// to provide a reasonable fast way to issue this query very quickly in the average case, without the various issues that plagued D5257.

Reviewers: btrahan

Reviewed By: btrahan

Subscribers: epriestley, jhurwitz

Maniphest Tasks: T2683

Differential Revision: https://secure.phabricator.com/D9045
2014-05-12 11:47:23 -07:00
epriestley 95eab2f3b0 Record parent relationships when discovering commits
Summary:
Ref T4455. This adds a `repository_parents` table which stores `<childCommitID, parentCommitID>` relationships.

For new commits, it is populated when commits are discovered.

For older commits, there's a `bin/repository parents` script to rebuild the data.

Right now, there's no UI suggestion that you should run the script. I haven't come up with a super clean way to do this, and this table will only improve performance for now, so it's not important that we get everyone to run the script right away. I'm just leaving it for the moment, and we can figure out how to tell admins to run it later.

The ultimate goal is to solve T2683, but solving T4455 gets us some stuff anyway (for example, we can serve `diffusion.commitparentsquery` faster out of this cache).

Test Plan:
  - Used `bin/repository discover` to discover new commits in Git, SVN and Mercurial repositories.
  - Used `bin/repository parents` to rebuild Git and Mercurial repositories (SVN repos just exit with a message).
  - Verified that the table appears to be sensible.

Reviewers: btrahan

Reviewed By: btrahan

Subscribers: jhurwitz, epriestley

Maniphest Tasks: T4455

Differential Revision: https://secure.phabricator.com/D9044
2014-05-12 11:47:22 -07:00
epriestley 417056932e Make discovery slightly cheaper in the common case
Summary:
Ref T4605. Before discovering branches, try to prefill the cache in bulk. For repositories with large numbers of branches, this allows us to issue dramatically fewer queries.

(Before D8780, this cache was usually held across discovery events, so being able to fill it cheaply was not as relevant.)

Test Plan: Ran discovery on Git, Mercurial and SVN repositories. Observed fewer queries for Git/Mercurial.

Reviewers: btrahan

Reviewed By: btrahan

Subscribers: epriestley

Maniphest Tasks: T4605

Differential Revision: https://secure.phabricator.com/D8781
2014-04-16 13:00:38 -07:00
epriestley d86bb086ca Reduce initial discovery from O(branches * commits) to O(commits)
Summary:
Fixes T4414. Currently, when we discover a new repository, we do something like this:

  foreach (branch) {
    foreach (commit on this branch) {
      do_something();
    }
  }

In cases where there are a lot of branches which mostly just branch `master`, this leads to us doing roughly `O(branches * commits)` work.

We have a `commitCache` to prevent this, but it has two problems:

  - It only fills out of the DB, and we do this whole thing before writing to the DB, which is the entire point.
  - It has a fixed size (2048) and on initial discovery we're usually dealing with way more commits than that.

Instead, also stop doing work if we hit a commit which is known already.

Test Plan:
  - Added `print` on the number of discovered refs and number of unique refs.
  - Ran `bin/repository discover --repair X` on a repo with several branches.
  - Before the patch, got 397 refs and 135 unique refs.
  - After the patch, got 135 refs and 135 unique refs.

Reviewers: btrahan

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T4414

Differential Revision: https://secure.phabricator.com/D8374
2014-02-28 13:02:14 -08:00
epriestley c9a0ffa1cf Verify that SVN repository roots really are repository roots
Summary: Fixes T3238. Ref T4327. Although the instructions are fairly clear on this, it's easy to miss them. Make sure the root the user enters matches the real root.

Test Plan: Added unit tests. Used `bin/repository discover` to hit the check explicitly.

Reviewers: btrahan

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T3238, T4327

Differential Revision: https://secure.phabricator.com/D8020
2014-01-21 14:02:58 -08:00
epriestley 8520d9e070 Move more discovery responsibilities into DiscoveryEngine
Summary:
Ref T4327. This moves the last pieces of discovery responsibility out of the PullLocal daemon and into the DiscoveryEngine.

(This makes it easier to discover repositories in unit tests in the future, since we don't need to build a PullLocal daemon and can just build a DiscoveryEngine.)

Test Plan: Ran `phd debug pulllocal`. Ran `repostory discover`.

Reviewers: btrahan

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T4327

Differential Revision: https://secure.phabricator.com/D7997
2014-01-17 16:09:24 -08:00
epriestley 220addb249 Move Git discovery into DiscoveryEngine
Summary:
Ref T4327. Consolidates and simplifies infrastructure:

  - Moves Git discovery into DiscoveryEngine.
  - Collapses a bunch of the Git and Mercurial code related to stream discovery.
  - Removes all cach code from PullLocal daemon (it's no longer called).
  - Adds basic unit tests for Git and Mercurial discovery.

Various cleanup:

  - Makes GitStream and MercurialStream extend a common base.
  - Improves performance of MercurialStream in some cases, by requiring fewer commits be output and parsed.
  - Makes mirroring exceptions easier to debug.
  - Fixes discovery of Mercurial repositories with multiple branch heads.
  - Adds some missing `pht()`.

Test Plan:
I tested this fairly throughly because I think this phase is complete:

  - Made new repositories in multiple VCSes and did full imports.
    - Particularly, I reimported Arcanist to make sure that TODO was resolved. I think it was related to the toposort stuff.
  - Pushed commits to multiple VCSes.
  - Pushed commits to a non-close branch, then pushed a merge commit. Observed commits import initially as non-close, then get flagged for close.
  - Started full daemons and resolved various minor issues that showed up in the daemon log until everything ran cleanly.
  - Basically spent about 30 minutes banging on this in every way I could think of to try to break it. I found and fixed some minor stuff, but it seems solid.

Reviewers: btrahan

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T4327

Differential Revision: https://secure.phabricator.com/D7987
2014-01-17 11:48:53 -08:00
epriestley e99c53da2e Fix an issue with SVN path construction in the presence of subpath configuration
Summary: D7590 made path construction more consistent, but affected this callsite if a subpath is configured. Currently, we end up with double `@@` in the URI.

Test Plan:
  - Ran unit tests.
  - Ran `bin/repostitory discover`.

Reviewers: staticshock, btrahan

Reviewed By: btrahan

CC: aran

Differential Revision: https://secure.phabricator.com/D7619
2013-11-21 14:41:38 -08:00
epriestley 5b873a74de Move Mercurial discovery to PhabricatorRepositoryDiscoveryEngine
Summary: Ref T4068. Partly, this moves discovery to the more unit-testable PhabricatorRepositoryDiscoveryEngine. It also fixes some issues, see inlines.

Test Plan: In a Mercurial repository, ran `bin/repository discover --repair`, verified commits came out topographically sorted. Ran without `--repair` and in various other contexts, like with no commits to discover and some-but-not-all commits to discover.

Reviewers: btrahan

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T4068

Differential Revision: https://secure.phabricator.com/D7518
2013-11-06 17:20:22 -08:00
epriestley e7fde9a77c Make repository discovery partially testable
Summary:
Ref T2784. Begins pulling discovery into Engines and covering it with tests. In particular:

  - Discovery is currently a one-shot process where we find all the new commits and write them to the database in one go. Split it apart so we find and return the new commits first, then write them to the database separately. This makes things simpler and more testable.
  - This diff only brings SVN into an engine (and only the "find the commits" part), since it's simpler than Git or Mercurial.
  - Creates a base Engine class and moves common functionality there.
  - Restores the `--verbose` flag to `repository pull`.

Test Plan: Added unit tests. Ran `bin/repository discover`. Ran `bin/phd debug pulllocal`.

Reviewers: btrahan

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T2784

Differential Revision: https://secure.phabricator.com/D5906
2013-05-12 19:08:37 -07:00