Summary:
Fixes T12145. Ref T13210. See PHI570. See PHI536.
Currently, when you give Drydock an Almanac host pool with more than one host, it never voluntarily builds a second host resource: there is no way to say "maximum X working copies per host" (only "maximum X global working copies") to make the first host overflow, and the allocator tries to pack resources as tightly as possible.
If you can force it to allocate the 2nd..Nth host, things will work reasonably well from there (it will spread working copies across the hosts randomly), but tricking it is very hard, especially before D19761.
To deal with this, give blueprints a new behavior around "supplemental allocations". The idea here is that a blueprint may decide that it would prefer to allocate a fresh new resource instead of allowing an otherwise valid acquisition to occur.
These supplemental allocations follow all the normal allocation rules (they can't exceed limits or actually replace existing resources), so they can only happen if there's free space in the resource pool. But a blueprint can elect for a supplemental allocation to provide a "grow the pool" hint.
The only useful policies here are probably "true" (immediately use all resources, like Almanac) or "false" (pack resources as efficiently as possible) but some other policies //might// be useful (perhaps "start growing the pool when we're getting a bit full even if we aren't at the limit yet, since our workload is bursty").
Then, give Almanac host resources a "true" policy (always allocate supplemental resources) so they use all hosts once a similar number of concurrent jobs arrive.
One aspect of this approach is that we only do supplemental resources if the normal allocation algorithm already decided that the best resource to acquire was part of the same blueprint. I started with an approach like "look at all the blueprints and see if any of them want to be greedy", but then a not-very-desirable blueprint would end up filling up its whole pool before we skipped the supplemental allocation part and ended up picking a different resource. That felt a bit silly and this feels a little cleaner and more focused.
Test Plan:
- Without changing the Almanac blueprint policy, allocated hosts. Got A, A, A, A, ... (second host never used).
- Changed the Almanac policy.
- Allocated hosts, got A, B, random mix of A and B.
- Destroyed B. Destroyed all leases on A. Allocated. Got A. This tests the "don't build a supplemental resource if there are no leases on the natural resource".
Reviewers: amckinley
Reviewed By: amckinley
Subscribers: yelirekim, PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T13210, T12145
Differential Revision: https://secure.phabricator.com/D19762
Summary:
This has been replaced by `PolicyCodex` after D16830. Also:
- Rebuild Celerity map to fix grumpy unit test.
- Fix one issue on the policy exception workflow to accommodate the new code.
Test Plan:
- `arc unit --everything`
- Viewed policy explanations.
- Viewed policy errors.
Reviewers: chad
Reviewed By: chad
Subscribers: hach-que, PHID-OPKG-gm6ozazyms6q6i22gyam
Differential Revision: https://secure.phabricator.com/D16831
Summary:
This search engine ports cleanly to Conduit out of the box.
Ref T11694
Test Plan: called the API method from the console, browsed blueprints in the ui
Reviewers: epriestley, #blessed_reviewers
Reviewed By: epriestley, #blessed_reviewers
Subscribers: Korvin, epriestley
Maniphest Tasks: T11694
Differential Revision: https://secure.phabricator.com/D16593
Summary:
Ref T10537. More infrastructure:
- Put a `bin/nuance` in place with `bin/nuance import`. This has no useful behavior yet.
- Allow sources to be searched by substring. This supports `bin/nuance import --source whatever` so you don't have to dig up PHIDs.
Test Plan:
- Applied migrations.
- Ran `bin/nuance import --source ...` (no meaningful effect, but works fine).
- Searched for sources by substring in the UI.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10537
Differential Revision: https://secure.phabricator.com/D15436
Summary:
Ref T10457.
- Let blueprints be tagged so you can search and annotate them a little more easily.
- Give each blueprint type an optional icon to make things a little easier to parse visually.
Test Plan:
- Tagged blueprints.
- Searched by tags.
- Looked at nice little icons.
{F1139712}
Reviewers: chad
Reviewed By: chad
Subscribers: yelirekim
Maniphest Tasks: T10457
Differential Revision: https://secure.phabricator.com/D15392
Summary:
Ref T10457. The ngram indexing seems to be working well; extend it into Drydock.
Also clean up the list controller a little bit.
Test Plan:
- Ran migrations.
- Searched for blueprints by name.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10457
Differential Revision: https://secure.phabricator.com/D15389
Summary:
Ref T9252. This primarily allows Harbormaster to request (and Drydock to fulfill) working copies with a patch from a staging area. Doing this means we can do builds on in-review changes from `arc diff`.
This is a little cobbled-together but should basically work.
Also fix some other issues:
- Yielded, awakend workers are fine to update but could complain.
- We can't log slot lock failures to resources if we don't end up saving them.
- Killing the transaction would wipe out the log.
- Fix some TODOs, etc.
Test Plan: Ran Harbormaster builds on a local revision.
Reviewers: hach-que, chad
Reviewed By: chad
Maniphest Tasks: T9252
Differential Revision: https://secure.phabricator.com/D14214
Summary:
Ref T9252. Long ago you sometimes manually created resources, so they had human-enterable names. However, users never make resources manually any more, so this field isn't really useful any more.
In particular, it means we write a lot of untranslatable strings like "Working Copy" to the database in the default locale. Instead, do the call at runtime so resource names are translatable.
Also clean up a few minor things I hit while kicking the tires here.
It's possible we might eventually want to introduce a human-choosable label so you can rename your favorite resources and this would just be a default name. I don't really have much of a use case for that yet, though, and I'm not sure there will ever be one.
Test Plan:
- Restarted a Harbormaster build, got a clean build.
- Released all leases/resources, restarted build, got a clean build with proper resource names.
Reviewers: hach-que, chad
Reviewed By: hach-que, chad
Maniphest Tasks: T9252
Differential Revision: https://secure.phabricator.com/D14213
Summary: Ref T9252. If you have a blueprint and you do not like that blueprint very much, you can disable it.
Test Plan: Disabled / enabled some blueprints.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9252
Differential Revision: https://secure.phabricator.com/D14156
Summary:
Ref T9252. Move these to the more modern stuff to pick up ordering and interface support for free.
Also work around the blueprint / custom field integration a little more gracefully.
Test Plan: Searched for blueprints, resources and leases.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9252
Differential Revision: https://secure.phabricator.com/D14155
Summary: Ref T9252. Some leases or resources may need to remove data, tear down VMs, etc., during cleanup. After they are released, queue a "destroy" phase for performing teardown.
Test Plan:
- Used `bin/drydock lease ...` to create a working copy lease.
- Used `bin/drydock release-lease` and `bin/drydock release-resource` to release the lease and then the working copy and host.
- Saw working copy and host get destroyed and cleaned up properly.
Reviewers: hach-que, chad
Reviewed By: chad
Maniphest Tasks: T6569, T9252
Differential Revision: https://secure.phabricator.com/D14144
Summary:
Ref T9253. For resources and leases that need to do something which takes a lot of time or requires waiting, allow them to allocate/acquire first and then activate later.
When we allocate a resource or acquire a lease, the blueprint can either activate it immediately (if all the work can happen quickly/inline) or activate it later. If the blueprint activates it later, we queue a worker to handle activating it.
Rebuild the "working copy" blueprint to work with this model: it allocates/acquires and activates in a separate step, once it is able to acquire a host.
Test Plan: With some power of imagination, brought up a bunch of working copies with `bin/drydock lease --type working-copy ...`
Reviewers: hach-que, chad
Reviewed By: hach-que, chad
Maniphest Tasks: T9253
Differential Revision: https://secure.phabricator.com/D14127
Summary: Ref T9253. Provide a meaningful command interface for Almanac hosts.
Test Plan:
Configued and leased a real host (`sbuild001.phacility.net`) and ran a command on it.
```
$ ./bin/drydock command --lease 90 -- ls /
bin
boot
core
dev
etc
home
initrd.img
lib
lib64
lost+found
media
mnt
opt
proc
root
run
sbin
srv
sys
tmp
usr
var
vmlinuz
```
Reviewers: chad, hach-que
Reviewed By: chad, hach-que
Maniphest Tasks: T9253
Differential Revision: https://secure.phabricator.com/D14126
Summary:
See discussion in D10304. There's a lot of context there, but the general idea is:
- Blueprints should manage locks in a granular way during the actual allocation/acquisition phase.
- Optimistic "slot locks" might a pretty good primitive to make that easy to implement and reason about in most cases.
The way these locks work is that you just pick some name for the lock (like the PHID of a resource) and say that it needs to be acquired for the allocation/acquisition to work:
```
...
->needSlotLock("mylock(PHID-XYZQ-...)")
...
```
When you fire off the acquisition or allocation, it fails unless it could acquire the slot with that name. This is really simple (no explicit lock management) and a pretty good fit for most of the locking that blueprints and leases need to do.
If you need to do limit-based locks (e.g., maximum of 3 locks) you could acquire a lock like this:
```
mylock(whatever).slot(2)
```
Blueprints generally only contend with themselves, so it's normally OK for them to pick whatever strategy works best for them in naming locks.
This may not work as well if you have a huge number of slots (e.g., 100TB you want to give out in 1MB chunks), or other complex needs for locks (like you have to synchronize access to some external resource), but slot locks don't need to be the only mechanism that blueprints use. If they run into a problem that slot locks aren't a good fit for, they can use something else instead. For now, slot locks seem like a good fit for the problems we currently face and most of the problems I anticipate facing.
(The release workflows have other race issues which I'm not addressing here. They work fine if nothing races, but aren't race-safe.)
Test Plan:
To create a race where the same binding is allocated as a resource twice:
- Add `sleep(10)` near the beginning of `allocateResource()`, after the free bindings are loaded but before resources are allocated.
- (Comment out slot lock acquisition if you have this patch.)
- Run `bin/drydock lease ...` in two windows, within 10 seconds of one another.
This will reliably double-allocate the binding because both blueprints see a view of the world where the binding is free.
To verify the lock works, un-comment it (or apply this patch) and run the same test again. Now, the lock fails in one process and only one resource is allocated.
Reviewers: hach-que, chad
Reviewed By: hach-que, chad
Differential Revision: https://secure.phabricator.com/D14118
Summary:
Ref T9253. Broadly, this realigns Allocator behavior to be more consistent and straightforward and amenable to intended future changes.
This attempts to make language more consistent: resources are "allocated" and leases are "acquired".
This prepares for (but does not implement) optimistic "slot locking", as discussed in D10304. Although I suspect some blueprints will need to perform other locking eventually, this does feel like a good fit for most of the locking blueprints need to do.
In particular, I've made the blueprint operations on `$resource` and `$lease` objects more purposeful: they need to invoke an activator on the appropriate object to be implemented correctly. Before they invoke this activator method, they configure the object. In a future diff, this configuration will include specifying slot locks that the lease or resource must acquire. So the API will be something like:
$lease
->setActivateWhenAcquired(true)
->needSlotLock('x')
->needSlotLock('y')
->acquireOnResource($resource);
In the common case where slot locks are a good fit, I think this should make correct blueprint implementation very straightforward.
This prepares for (but does not implement) resources and leases which need significant setup steps. I've basically carved out two modes:
- The "activate immediately" mode, as here, immediately opens the resource or activates the lease. This is appropriate if little or no setup is required. I expect many leases to operate in this mode, although I expect many resources will operate in the other mode.
- The "allocate now, activate later" mode, which is not fully implemented yet. This will queue setup workers when the allocator exits. Overall, this will work very similarly to Harbormaster.
- This new structure makes it acceptable for blueprints to sleep as long as they want during resource allocation and lease acquisition, so long as they are not waiting on anything which needs to be completed by the queue. Putting a `sleep(15 * 60)` in your EC2Blueprint to wait for EC2 to bring a machine up will perform worse than using delayed activation, but won't deadlock the queue or block any locks.
Overall, this flow is more similar to Harbormaster's flow. Having consistency between Harbormaster's model and Drydock's model is good, and I think Harbormaster's model is also simply much better than Drydock's (what exists today in Drydock was implemented a long time ago, and we had more support and infrastructure by the time Harbormaster was implemented, as well as a more clearly defined problem).
The particular strength of Harbormaster is that objects always (or almost always, at least) have a single, clearly defined writer. Ensuring objects have only one writer prevents races and makes reasoning about everything easier.
Drydock does not currently have a clearly defined single writer, but this moves us in that direction. We'll probably need more primitives eventually to flesh this out, like Harbormaster's command queue for messaging objects which you can't write to.
This blueprint was originally implemented in D13843. This makes a few changes to the blueprint itself:
- A bunch of code from that (e.g., interfaces) doesn't exist yet.
- I let the blueprint have multiple services. This simplifies the code a little and seems like it costs us nothing.
This also removes `bin/drydock create-resource`, which no longer makes sense to expose. It won't get locking, leasing, etc., correct, and can not be made correct.
NOTE: This technically works but doesn't do anything useful yet.
Test Plan: Used `bin/drydock lease --type host` to acquire leases against these blueprints.
Reviewers: hach-que, chad
Reviewed By: hach-que, chad
Subscribers: Mnkras
Maniphest Tasks: T9253
Differential Revision: https://secure.phabricator.com/D14117
Summary:
Ref T9253. The Drydock allocator is very pseudocodey right now. Particularly, it was written before Blueprints were concrete.
Reorganize it to make its responsibilities and error handling behaviors more clear.
In particular, the Allocator does not manage locks. It's primarily trying to reject allocations which can not possibly work. Blueprints are responsible for locks. See some discussion in D10304.
NOTE: This code probably doesn't work as written, see future diffs.
Test Plan: See future diffs.
Reviewers: hach-que, chad
Reviewed By: hach-que, chad
Subscribers: cburroughs
Maniphest Tasks: T9253
Differential Revision: https://secure.phabricator.com/D14114
Summary: Ref T2015. This allows searching based on blueprints, resources or leases when viewing the logs, which helps when searching for events that occured to a particular blueprint / resource / lease. Unlike the logs shown on the resource / lease pages, the search engine supports paging properly, which means it can be used to find entries in the past.
Test Plan: Used the Drydock log search page.
Reviewers: epriestley, #blessed_reviewers
Reviewed By: epriestley, #blessed_reviewers
Subscribers: joshuaspence, Korvin, epriestley
Maniphest Tasks: T2015
Differential Revision: https://secure.phabricator.com/D10874
Summary: Fixes T6693.
Test Plan:
Made a bunch of comments on a diff with differential, being sure to leave inlines here and there. This reproduced the issue in T6693. With this patch this issue no longer reproduces!
Successfully "showed older changes" in Maniphest too.
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin, epriestley
Maniphest Tasks: T6693
Differential Revision: https://secure.phabricator.com/D10931
Summary:
Ref T1191. Notable:
- Allowed objects to remove default columns (some feed tables have no `id`).
- Added a "note" severity and moved all the charset stuff down to that to make progress more clear.
Test Plan:
Trying to make the whole thing blue...
{F205970}
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T1191
Differential Revision: https://secure.phabricator.com/D10519
Summary: Ref T5655. Rename `PhabricatorPHIDType` subclasses for clarity (see discussion in D9839). I'm not too keen on some of the resulting class names, so feel free to suggest alternatives.
Test Plan: Ran unit tests.
Reviewers: epriestley, #blessed_reviewers
Reviewed By: epriestley, #blessed_reviewers
Subscribers: epriestley, Korvin, hach-que
Maniphest Tasks: T5655
Differential Revision: https://secure.phabricator.com/D9986
Summary: Ref T5655. Some discussion in D9839. Generally speaking, `Phabricator{$name}Application` is clearer than `PhabricatorApplication{$name}`.
Test Plan:
# Pinned and uninstalled some applications.
# Applied patch and performed migrations.
# Verified that the pinned applications were still pinned and that the uninstalled applications were still uninstalled.
# Performed a sanity check on the database contents.
Reviewers: btrahan, epriestley, #blessed_reviewers
Reviewed By: epriestley, #blessed_reviewers
Subscribers: hach-que, epriestley, Korvin
Maniphest Tasks: T5655
Differential Revision: https://secure.phabricator.com/D9982
Summary: Ref T2015. Allow configuration of default edit/view policies for blueprints. Add create policy. Remove administrative exception in policies.
Test Plan: Configured these settings and created (or, with a restrictive create setting, tried to create) blueprints.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T2015
Differential Revision: https://secure.phabricator.com/D7921
Summary:
Ref T2015. Adds human-readable names to Drydock blueprints.
Also the new patches stuff is so much nicer.
Test Plan: Edited, created, and reviewed migrated blueprints.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T2015
Differential Revision: https://secure.phabricator.com/D7918
Summary:
Ref T2015. These never got updated to the new stuff, move them out of the old `Constants` class and let them load handles, etc.
Also some half-cleanup of some Blueprint/BlueprintImplementation stuff.
Test Plan: Used `phid.query` to query a Resource, Lease, and Blueprint.
Reviewers: btrahan
Reviewed By: btrahan
CC: hach-que, aran
Maniphest Tasks: T2015
Differential Revision: https://secure.phabricator.com/D7828
Summary:
//(this diff used to be about applying policies to blueprints)//
This restructures Drydock so that blueprints are instances in the DB, with an associated implementation class. Thus resources now have a `blueprintPHID` instead of `blueprintClass` and DrydockBlueprint becomes a DAO. The old DrydockBlueprint is renamed to DrydockBlueprintImplementation, and the DrydockBlueprint DAO has a `blueprintClass` column on it.
This now just implements CAN_VIEW and CAN_EDIT policies for blueprints, although they are probably not enforced in all of the places they could be.
Test Plan: Used the `create-resource` and `lease` commands. Closed resources and leases in the UI. Clicked around the new and old lists to make sure everything is still working.
Reviewers: epriestley, #blessed_reviewers
Reviewed By: epriestley
CC: Korvin, epriestley, aran
Maniphest Tasks: T4111, T2015
Differential Revision: https://secure.phabricator.com/D7638