
Proposal: Sweepable API #5993

Open · wants to merge 1 commit into base: main
5 participants
@LittleLittleCloud (Contributor) commented Nov 2, 2021

This is the initial design/proposal doc for a Sweepable API.

I would appreciate it if you could each review it and give me your feedback.

@codecov (codecov bot) commented Nov 2, 2021

Codecov Report

No coverage uploaded for pull request base (main@b31b7ca). Click here to learn what that means.
The diff coverage is n/a.

```
@@           Coverage Diff           @@
##             main    #5993   +/-   ##
=======================================
  Coverage        ?   68.23%
=======================================
  Files           ?     1143
  Lines           ?   242869
  Branches        ?    25388
=======================================
  Hits            ?   165721
  Misses          ?    70449
  Partials        ?     6699
```

| Flag | Coverage Δ |
| --- | --- |
| Debug | 68.23% <0.00%> (?) |
| production | 62.89% <0.00%> (?) |
| test | 88.62% <0.00%> (?) |

Flags with carried forward coverage won't be shown. Click here to find out more.


Review comment on the diff:

```
@@ -0,0 +1,162 @@
# AutoML.Net Sweepable API proposal
```

@JakeRadMSFT (Contributor) commented Nov 3, 2021

@ericstj @eerhardt @michaelgsharp @justinormont

Hey All!
This proposal is to bring over the lower layers of our tooling's AutoML. All of the techniques that we worked with MSR to use in tooling (NNI, FLAML) are built on top of this.

@LittleLittleCloud is proposing that we bring over these lower layers first, and then either bring over the actual NNI+FLAML based AutoML next or upgrade AutoML.NET to the NNI+FLAML based approach. I think it's possible we could keep, or nearly keep, the same AutoML.NET API.

I'm open to keeping these layers internal if we need to ... but sweeping is a common thing to do in the ML world. We've had several community members come up with their own ways to sweep over models and parameters. This would just expose our method of doing it.

The end goal is to only have one ML.NET AutoML.


@JakeRadMSFT (Contributor) commented Nov 3, 2021

@jwood803 (I saw your like) we'd love to hear your feedback as well.


@jwood803 (Contributor) commented Nov 4, 2021

Hey, @JakeRadMSFT. I definitely like having this API! It helps make ML.NET more on par with scikit-learn, which has grid search. Plus, I believe this will help model creators get better models.


@eerhardt (Member) commented Nov 4, 2021

> I'm open to keeping these layers internal if we need to

My suggestion would be to start with them internal, and ensure it can meet the existing AutoML scenarios. Then make them public as we go, once we have data showing they need to be public.

> The end goal is to only have one ML.NET AutoML.

💯


@LittleLittleCloud (Contributor, Author) commented Nov 8, 2021

Sounds good. If we want to start internal first, a smooth start without breaking changes could be the search space, either through a package reference or as source code. The reasons are:

  • The search space has the fewest dependencies. It only depends on Newtonsoft.Json, which makes it easy to introduce.
  • After introducing it, we can use it to create search spaces for the existing trainers in AutoML.Net, which also gives us a chance to improve the performance of AutoML.Net experiments by using larger, better search spaces in HPO.


@justinormont (Contributor) commented Nov 8, 2021

Sounds like a fine plan, including keeping it internal for a bit.

If it fits within your methods, I'd recommend having the default sweeping space { range, scaling, data type } be part of the component being swept (as current params are). For AutoML.NET, due to non-great reasons, we kept a duplicate copy. Traveling with the component is cleaner, since the ranges show up (and disappear) with the availability of the component. For instance, as the user includes the NuGet for FastTree, the ranges are immediately available. In addition, the ranges can be created inline with the new component.

Some historical perspective: there is also a command-line sweeper in ML.NET, which may be working (anything untested is assumed non-functional). It does let users sweep an individual component (or sets of components), including fully switching out portions of the pipeline. The MAML command-line sweeper was very powerful, but hard to use. Mentioned previously: #5019 (comment)
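The "ranges travel with the component" idea could be sketched roughly as below. This is a hypothetical illustration, not the actual ML.NET API: `SweepableRangeAttribute` and `FastTreeOptions` are made-up names standing in for whatever the proposal ultimately ships.

```csharp
using System;

// Hypothetical sketch: the default sweep space { range, scaling, data type }
// is declared inline with the component's own options class.
[AttributeUsage(AttributeTargets.Field | AttributeTargets.Property)]
public sealed class SweepableRangeAttribute : Attribute
{
    public double Min { get; }
    public double Max { get; }
    public double Init { get; }
    public bool LogBase { get; }

    public SweepableRangeAttribute(double min, double max, double init = 0, bool logBase = false)
    {
        Min = min; Max = max; Init = init; LogBase = logBase;
    }
}

// Because the ranges live on the trainer's options type, they ship (and
// disappear) with the component's NuGet package, with no duplicate copy
// maintained inside AutoML itself.
public sealed class FastTreeOptions
{
    [SweepableRange(2, 1024, init: 20, logBase: true)]
    public int NumberOfLeaves = 20;

    [SweepableRange(1e-4, 1.0, init: 0.2, logBase: true)]
    public double LearningRate = 0.2;
}
```

A sweeper would then discover these attributes via reflection when the assembly is loaded, instead of consulting a separately maintained table of ranges.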


```csharp
public class Option
{
    [Range(2, 32768, init: 2, logBase: true)]
```

@eerhardt (Member) commented Nov 4, 2021
Do we need to have a reflection-based solution to start? Or would the "weakly-typed" API solve most of the scenarios:

```csharp
var ss = new SearchSpace();
ss.Add("WindowSize", new UniformIntOption(2, 32768, true, 2));
ss.Add("SeriesLength", new ChoiceOption(2, 3, 4));
ss.Add("UseSoftmax", new ChoiceOption(true, false));
ss.Add("AnotherOption", ss.Clone());
```
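A minimal sketch of what such a weakly-typed search space could look like. The type names follow the snippet above, but the implementation details (a `Sample` method driven by a uniform draw, sharing one draw across options) are assumptions for illustration, not the proposal's actual design.

```csharp
using System;
using System.Collections.Generic;

// Every option maps a uniform draw in [0, 1] to a concrete parameter value.
public abstract class OptionBase
{
    public abstract object Sample(double unit);
}

public sealed class UniformIntOption : OptionBase
{
    private readonly int _min, _max;
    private readonly bool _logBase;

    public UniformIntOption(int min, int max, bool logBase = false, int init = 0)
    { _min = min; _max = max; _logBase = logBase; }

    public override object Sample(double unit) => _logBase
        // Log scaling: interpolate in log space, so small values get
        // as much of the sweep budget as large ones.
        ? (int)Math.Round(Math.Exp(Math.Log(_min) + unit * (Math.Log(_max) - Math.Log(_min))))
        : (int)Math.Round(_min + unit * (_max - _min));
}

public sealed class ChoiceOption : OptionBase
{
    private readonly object[] _choices;
    public ChoiceOption(params object[] choices) => _choices = choices;

    public override object Sample(double unit)
        => _choices[Math.Min((int)(unit * _choices.Length), _choices.Length - 1)];
}

// A search space is itself an option, which is what allows the nested
// ss.Add("AnotherOption", ss.Clone()) composition in the snippet above.
public sealed class SearchSpace : OptionBase
{
    private readonly Dictionary<string, OptionBase> _options = new();

    public void Add(string name, OptionBase option) => _options[name] = option;

    public SearchSpace Clone()
    {
        var copy = new SearchSpace();
        foreach (var kv in _options) copy._options[kv.Key] = kv.Value;
        return copy;
    }

    public override object Sample(double unit)
    {
        var sampled = new Dictionary<string, object>();
        foreach (var kv in _options) sampled[kv.Key] = kv.Value.Sample(unit);
        return sampled;
    }
}
```

A real sampler would draw one independent unit value per dimension rather than reusing one across options; the single-draw version above just keeps the sketch short.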


@eerhardt (Member) commented Nov 4, 2021

The reason I ask is: I would not add multiple ways to do something right away. Instead, make the "core" thing first, and the simpler ones can be built later, if necessary.


@LittleLittleCloud (Contributor, Author) commented Nov 8, 2021

The reflection API (let's call it "strongly-typed", corresponding to "weakly-typed") is just a handy way of creating a search space, and it's built on top of the "weakly-typed" API. So if we must make a choice at the beginning, I would pick "weakly-typed" to implement/migrate first.
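The lowering from the strongly-typed (reflection) layer to the weakly-typed one can be sketched as follows. `RangeAttribute` and `SearchSpaceBuilder` here are illustrative stand-ins, not the proposal's actual types; the point is only that the attribute walk produces exactly the entries the weakly-typed `Add` calls would contain.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical attribute mirroring the [Range(2, 32768, init: 2, logBase: true)]
// annotation in the reviewed snippet.
[AttributeUsage(AttributeTargets.Field)]
public sealed class RangeAttribute : Attribute
{
    public int Min { get; }
    public int Max { get; }
    public bool LogBase { get; }

    public RangeAttribute(int min, int max, int init = 0, bool logBase = false)
    { Min = min; Max = max; LogBase = logBase; }
}

public sealed class Option
{
    [Range(2, 32768, init: 2, logBase: true)]
    public int WindowSize;
}

public static class SearchSpaceBuilder
{
    // Walks the annotated public fields and emits (min, max, logBase) entries
    // keyed by field name — i.e. the same information a hand-written
    // weakly-typed SearchSpace would hold.
    public static Dictionary<string, (int Min, int Max, bool LogBase)> FromType(Type t)
    {
        var space = new Dictionary<string, (int Min, int Max, bool LogBase)>();
        foreach (var field in t.GetFields())
        {
            var range = field.GetCustomAttribute<RangeAttribute>();
            if (range != null)
                space[field.Name] = (range.Min, range.Max, range.LogBase);
        }
        return space;
    }
}
```

Because the strongly-typed layer is purely additive sugar over this walk, migrating the weakly-typed core first (as suggested above) loses nothing.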

