
Proposal: Sweepable API #5993

Open · wants to merge 1 commit into base: main
5 participants
@LittleLittleCloud (Contributor) commented Nov 2, 2021

This is the initial design/proposal doc for a Sweepable API.

I would appreciate it if you could each review it and give me your feedback.

@codecov (codecov bot) commented Nov 2, 2021

Codecov Report

No coverage uploaded for pull request base (main@b31b7ca). Click here to learn what that means.
The diff coverage is n/a.

```
@@           Coverage Diff           @@
##             main    #5993   +/-   ##
=======================================
  Coverage        ?   68.23%
=======================================
  Files           ?     1143
  Lines           ?   242869
  Branches        ?    25388
=======================================
  Hits            ?   165721
  Misses          ?    70449
  Partials        ?     6699
```

| Flag | Coverage Δ |
| --- | --- |
| Debug | 68.23% <0.00%> (?) |
| production | 62.89% <0.00%> (?) |
| test | 88.62% <0.00%> (?) |

Flags with carried forward coverage won't be shown. Click here to find out more.


Review comment on the diff:

```
@@ -0,0 +1,162 @@
# AutoML.Net Sweepable API proposal
```

@JakeRadMSFT (Contributor) commented Nov 3, 2021

@ericstj @eerhardt @michaelgsharp @justinormont

Hey All!
This proposal is to bring over the lower layers of our tooling's AutoML. All of the techniques that we worked with MSR to use in tooling (NNI, FLAML) are built on top of this.

@LittleLittleCloud is proposing that we bring over these lower layers first, and then either bring over the actual NNI+FLAML based AutoML next or upgrade AutoML.NET to the NNI+FLAML based approach. I think it's possible we could keep, or nearly keep, the same AutoML.NET API.

I'm open to keeping these layers internal if we need to ... but sweeping is a common thing to do in the ML world. We've had several community members come up with their own ways to sweep over models and parameters. This would just expose our method of doing it.

The end goal is to only have one ML.NET AutoML.


@JakeRadMSFT (Contributor) commented Nov 3, 2021

@jwood803 (I saw your like) we'd love to hear your feedback as well.


@jwood803 (Contributor) commented Nov 4, 2021

Hey, @JakeRadMSFT. I definitely like having this API! It helps make ML.NET more on par with scikit-learn, which has grid search. Plus, I believe this will help model creators get better models.


@eerhardt (Member) commented Nov 4, 2021

> I'm open to keeping these layers internal if we need to

My suggestion would be to start with them internal, and ensure it can meet the existing AutoML scenarios. Then make them public as we go, once we have data showing they need to be public.

> The end goal is to only have one ML.NET AutoML.

💯


@LittleLittleCloud (Contributor, Author) commented Nov 8, 2021

Sounds good. If we want to start internal first, a smooth start without breaking changes could be the search space, either through a package reference or as source code. The reasons are:

  • The search space has the fewest dependencies. It only depends on Newtonsoft.Json, which makes it easy to introduce.
  • After introducing it, we can use it to create search spaces for the existing trainers in AutoML.Net, which also gives us a chance to improve the performance of AutoML.Net experiments by using larger, better search spaces in HPO.


@justinormont (Contributor) commented Nov 8, 2021

Sounds like a fine plan, including keeping it internal for a bit.

If it fits within your methods, I'd recommend having the default sweeping space { range, scaling, data type } be part of the component being swept (as current params are). For AutoML.NET, due to non-great reasons, we kept a duplicate copy. Traveling with the component is cleaner, since the ranges show up (and disappear) with the availability of the component. For instance, as the user includes the NuGet for FastTree, the ranges are immediately available. In addition, the ranges can be created inline with the new component.

Some historical perspective: there is also a command-line sweeper in ML.NET, which may be working (anything untested is assumed non-functional). It does let users sweep an individual component (or sets of components), including fully switching out portions of the pipeline. The MAML command-line sweeper was very powerful, but hard to use. Mentioned previously: #5019 (comment)
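The "ranges travel with the component" idea could be sketched roughly as below. This is a hypothetical illustration, not the actual ML.NET API: `SweepableRangeAttribute` and `FastTreeOptions` are made-up names standing in for whatever the proposal ultimately ships.

```csharp
using System;

// Hypothetical sketch: the default sweep space { range, scaling, data type }
// is declared inline with the component's own options class.
[AttributeUsage(AttributeTargets.Field | AttributeTargets.Property)]
public sealed class SweepableRangeAttribute : Attribute
{
    public double Min { get; }
    public double Max { get; }
    public double Init { get; }
    public bool LogBase { get; }

    public SweepableRangeAttribute(double min, double max, double init = 0, bool logBase = false)
    {
        Min = min; Max = max; Init = init; LogBase = logBase;
    }
}

// Because the ranges live on the trainer's options type, they ship (and
// disappear) with the component's NuGet package, with no duplicate copy
// maintained inside AutoML itself.
public sealed class FastTreeOptions
{
    [SweepableRange(2, 1024, init: 20, logBase: true)]
    public int NumberOfLeaves = 20;

    [SweepableRange(1e-4, 1.0, init: 0.2, logBase: true)]
    public double LearningRate = 0.2;
}
```

A sweeper would then discover these attributes via reflection when the assembly is loaded, instead of consulting a separately maintained table of ranges.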


```csharp
public class Option
{
    [Range(2, 32768, init: 2, logBase: true)]
```

@eerhardt (Member) commented Nov 4, 2021
Do we need to have a reflection-based solution to start? Or would the "weakly-typed" API solve most of the scenarios:

```csharp
var ss = new SearchSpace();
ss.Add("WindowSize", new UniformIntOption(2, 32768, true, 2));
ss.Add("SeriesLength", new ChoiceOption(2, 3, 4));
ss.Add("UseSoftmax", new ChoiceOption(true, false));
ss.Add("AnotherOption", ss.Clone());
```
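A minimal sketch of what such a weakly-typed search space could look like. The type names follow the snippet above, but the implementation details (a `Sample` method driven by a uniform draw, sharing one draw across options) are assumptions for illustration, not the proposal's actual design.

```csharp
using System;
using System.Collections.Generic;

// Every option maps a uniform draw in [0, 1] to a concrete parameter value.
public abstract class OptionBase
{
    public abstract object Sample(double unit);
}

public sealed class UniformIntOption : OptionBase
{
    private readonly int _min, _max;
    private readonly bool _logBase;

    public UniformIntOption(int min, int max, bool logBase = false, int init = 0)
    { _min = min; _max = max; _logBase = logBase; }

    public override object Sample(double unit) => _logBase
        // Log scaling: interpolate in log space, so small values get
        // as much of the sweep budget as large ones.
        ? (int)Math.Round(Math.Exp(Math.Log(_min) + unit * (Math.Log(_max) - Math.Log(_min))))
        : (int)Math.Round(_min + unit * (_max - _min));
}

public sealed class ChoiceOption : OptionBase
{
    private readonly object[] _choices;
    public ChoiceOption(params object[] choices) => _choices = choices;

    public override object Sample(double unit)
        => _choices[Math.Min((int)(unit * _choices.Length), _choices.Length - 1)];
}

// A search space is itself an option, which is what allows the nested
// ss.Add("AnotherOption", ss.Clone()) composition in the snippet above.
public sealed class SearchSpace : OptionBase
{
    private readonly Dictionary<string, OptionBase> _options = new();

    public void Add(string name, OptionBase option) => _options[name] = option;

    public SearchSpace Clone()
    {
        var copy = new SearchSpace();
        foreach (var kv in _options) copy._options[kv.Key] = kv.Value;
        return copy;
    }

    public override object Sample(double unit)
    {
        var sampled = new Dictionary<string, object>();
        foreach (var kv in _options) sampled[kv.Key] = kv.Value.Sample(unit);
        return sampled;
    }
}
```

A real sampler would draw one independent unit value per dimension rather than reusing one across options; the single-draw version above just keeps the sketch short.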


@eerhardt (Member) commented Nov 4, 2021

The reason I ask is: I would not add multiple ways to do something right away. Instead, make the "core" thing first, and the simpler ones can be built later, if necessary.


@LittleLittleCloud (Contributor, Author) commented Nov 8, 2021

The reflection API (let's call it "strongly-typed", corresponding to "weakly-typed") is just a handy way of creating a search space, and it's built on top of the "weakly-typed" API. So if we must make a choice at the beginning, I would pick "weakly-typed" to implement/migrate first.
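The lowering from the strongly-typed (reflection) layer to the weakly-typed one can be sketched as follows. `RangeAttribute` and `SearchSpaceBuilder` here are illustrative stand-ins, not the proposal's actual types; the point is only that the attribute walk produces exactly the entries the weakly-typed `Add` calls would contain.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical attribute mirroring the [Range(2, 32768, init: 2, logBase: true)]
// annotation in the reviewed snippet.
[AttributeUsage(AttributeTargets.Field)]
public sealed class RangeAttribute : Attribute
{
    public int Min { get; }
    public int Max { get; }
    public bool LogBase { get; }

    public RangeAttribute(int min, int max, int init = 0, bool logBase = false)
    { Min = min; Max = max; LogBase = logBase; }
}

public sealed class Option
{
    [Range(2, 32768, init: 2, logBase: true)]
    public int WindowSize;
}

public static class SearchSpaceBuilder
{
    // Walks the annotated public fields and emits (min, max, logBase) entries
    // keyed by field name — i.e. the same information a hand-written
    // weakly-typed SearchSpace would hold.
    public static Dictionary<string, (int Min, int Max, bool LogBase)> FromType(Type t)
    {
        var space = new Dictionary<string, (int Min, int Max, bool LogBase)>();
        foreach (var field in t.GetFields())
        {
            var range = field.GetCustomAttribute<RangeAttribute>();
            if (range != null)
                space[field.Name] = (range.Min, range.Max, range.LogBase);
        }
        return space;
    }
}
```

Because the strongly-typed layer is purely additive sugar over this walk, migrating the weakly-typed core first (as suggested above) loses nothing.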

