dotnet / machinelearning Public
Proposal: Sweepable API #5993
Codecov Report
Coverage Diff

| | main | #5993 |
|---|---|---|
| Coverage | ? | 68.23% |
| Files | ? | 1143 |
| Lines | ? | 242869 |
| Branches | ? | 25388 |
| Hits | ? | 165721 |
| Misses | ? | 70449 |
| Partials | ? | 6699 |
# AutoML.Net Sweepable API proposal
@ericstj @eerhardt @michaelgsharp @justinormont
Hey All!
This proposal is to bring over the lower layers of our tooling's AutoML. All of the techniques that we worked with MSR to use in tooling (NNI, FLAML) are built on top of this.
@LittleLittleCloud is proposing that we bring over these lower layers first, and then either bring over the actual NNI+FLAML based AutoML or upgrade AutoML.NET to the NNI+FLAML based approach. I think it's possible we could keep, or nearly keep, the same AutoML.NET API.
I'm open to keeping these layers internal if we need to ... but sweeping is a common thing to do in the ML world. We've had several community members come up with their own ways to sweep over models and parameters. This would just expose our method of doing it.
The end goal is to only have one ML.NET AutoML.
@jwood803 (I saw your like), we'd love to hear your feedback as well.
Hey, @JakeRadMSFT. I definitely like having this API! It helps bring ML.NET more on par with scikit-learn, which has grid search. Plus, I believe this will help model creators get better models.
> I'm open to keeping these layers internal if we need to

My suggestion would be to start with them internal and ensure they can meet the existing AutoML scenarios. Then make them public as we go, once we have data showing that they need to be public.

> The end goal is to only have one ML.NET AutoML.
Sounds good. If we want to start with everything internal, a smooth first step that avoids breaking changes would be the search space, brought in either through a package reference or as source code. The reasons are:
- The search space has the fewest dependencies. It only depends on Newtonsoft.Json, which makes it easy to introduce.
- Once it is introduced, we can use it to create search spaces for the existing trainers in AutoML.Net. That also gives us a chance to improve the performance of AutoML.Net experiments by using larger, better search spaces in hyperparameter optimization (HPO).
Sounds like a fine plan, including keeping as internal for a bit.
If it fits within your methods, I'd recommend having the default sweeping space { range, scaling, data type } be part of the component being swept (as with the current params). For AutoML.NET, due to non-great reasons, we kept a duplicate copy. Having the ranges travel with the component is cleaner, since they show up (and disappear) with the availability of the component. For instance, as soon as the user includes the NuGet package for FastTree, the ranges are available. In addition, the ranges can be created inline with the new component.
Some historical perspective: there is also a command-line sweeper in ML.NET, which may be working (anything untested is assumed non-functional). It lets users sweep an individual component (or sets of components), including fully switching out portions of the pipeline. The MAML command-line sweeper was very powerful, but hard to use. Mentioned previously: #5019 (comment)
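To illustrate the "ranges travel with the component" suggestion, here is a minimal sketch. Everything in it is hypothetical: `DefaultRange`, `ISweepableComponent`, `ExampleTreeTrainer`, and `SweepRegistry` are illustrative names, not ML.NET or AutoML.NET types, and the range values are made up. The idea is only that a sweeper sees ranges for exactly the components that are present.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical: one default sweeping entry = { range, scaling, data type }.
public sealed class DefaultRange
{
    public string Name { get; }
    public double Min { get; }
    public double Max { get; }
    public bool LogScale { get; }
    public Type DataType { get; }

    public DefaultRange(string name, double min, double max, bool logScale, Type dataType)
    {
        Name = name; Min = min; Max = max; LogScale = logScale; DataType = dataType;
    }
}

// Hypothetical interface that a component's own package would implement.
public interface ISweepableComponent
{
    IReadOnlyList<DefaultRange> DefaultRanges { get; }
}

// Stand-in for a trainer shipped in its own NuGet package (values are made up).
public sealed class ExampleTreeTrainer : ISweepableComponent
{
    public IReadOnlyList<DefaultRange> DefaultRanges { get; } = new[]
    {
        new DefaultRange("NumberOfLeaves", 4, 1024, true, typeof(int)),
        new DefaultRange("LearningRate", 0.001, 1.0, true, typeof(double)),
    };
}

public static class SweepRegistry
{
    // Collects ranges only from the components that are actually available,
    // so no duplicate copy of the ranges has to be maintained elsewhere.
    public static Dictionary<string, DefaultRange> Collect(IEnumerable<ISweepableComponent> components)
    {
        var all = new Dictionary<string, DefaultRange>();
        foreach (var component in components)
            foreach (var range in component.DefaultRanges)
                all[component.GetType().Name + "." + range.Name] = range;
        return all;
    }
}
```

With this shape, removing the package that contains `ExampleTreeTrainer` removes its ranges too, which is the property described above.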
```csharp
public class Option
{
    [Range(2, 32768, init: 2, logBase: true)]
```
Do we need to have a reflection based solution to start? Or would the "weakly-typed" API solve most of the scenarios:
```csharp
var ss = new SearchSpace();
ss.Add("WindowSize", new UniformIntOption(2, 32768, true, 2));
ss.Add("SeriesLength", new ChoiceOption(2, 3, 4));
ss.Add("UseSoftmax", new ChoiceOption(true, false));
ss.Add("AnotherOption", ss.Clone());
```

The reason I ask is: I would not add multiple ways to do something right away. Instead, make the "core" thing first, and the simpler ones can be built later, if necessary.
The reflection API (let's call it "strongly-typed", as the counterpart to "weakly-typed") is just a handy way to create a search space, and it's built on top of the "weakly-typed" API. So if we must make a choice at the beginning, I would pick "weakly-typed" to implement/migrate first.
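A rough sketch of that layering, under stated assumptions: the option classes below reuse the names from the snippet in this thread, but their constructor signatures and sampling behavior are guesses (the `init` argument is omitted, and nesting via `Clone()` is left out). `OptionBase`, `SweepRangeAttribute`, `SearchSpaceBuilder`, and `ExampleOption` are hypothetical names. The point is that the reflection helper only translates attributes into weakly-typed `Add` calls.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical base type: maps a value in [0, 1] to a concrete parameter value.
public abstract class OptionBase
{
    public abstract object Sample(double unit);
}

public sealed class UniformIntOption : OptionBase
{
    private readonly int _min, _max;
    private readonly bool _logBase;

    public UniformIntOption(int min, int max, bool logBase = false)
    {
        if (min > max) throw new ArgumentException("min must be <= max");
        if (logBase && min <= 0) throw new ArgumentException("log scale needs min > 0");
        _min = min; _max = max; _logBase = logBase;
    }

    public override object Sample(double unit)
    {
        // On a log scale, interpolate between log(min) and log(max).
        double value = _logBase
            ? Math.Exp(Math.Log(_min) + unit * (Math.Log(_max) - Math.Log(_min)))
            : _min + unit * (_max - _min);
        return Math.Min(_max, Math.Max(_min, (int)Math.Round(value)));
    }
}

public sealed class ChoiceOption : OptionBase
{
    private readonly object[] _choices;

    public ChoiceOption(params object[] choices)
    {
        if (choices == null || choices.Length == 0) throw new ArgumentException("need at least one choice");
        _choices = choices;
    }

    public override object Sample(double unit)
    {
        int index = Math.Max(0, Math.Min(_choices.Length - 1, (int)(unit * _choices.Length)));
        return _choices[index];
    }
}

// The "weakly-typed" core: a named bag of options.
public sealed class SearchSpace
{
    private readonly Dictionary<string, OptionBase> _options = new Dictionary<string, OptionBase>();

    public void Add(string name, OptionBase option) => _options.Add(name, option);

    public Dictionary<string, object> Sample(Random rng)
    {
        var result = new Dictionary<string, object>();
        foreach (var pair in _options)
            result[pair.Key] = pair.Value.Sample(rng.NextDouble());
        return result;
    }
}

// The "strongly-typed" layer: a hypothetical attribute plus a reflection helper.
[AttributeUsage(AttributeTargets.Property)]
public sealed class SweepRangeAttribute : Attribute
{
    public int Min { get; }
    public int Max { get; }
    public bool LogBase { get; }

    public SweepRangeAttribute(int min, int max, bool logBase = false)
    {
        Min = min; Max = max; LogBase = logBase;
    }
}

public static class SearchSpaceBuilder
{
    // Translates annotated properties into weakly-typed Add calls.
    public static SearchSpace FromType<T>()
    {
        var space = new SearchSpace();
        foreach (PropertyInfo property in typeof(T).GetProperties())
        {
            var range = property.GetCustomAttribute<SweepRangeAttribute>();
            if (range != null)
                space.Add(property.Name, new UniformIntOption(range.Min, range.Max, range.LogBase));
        }
        return space;
    }
}

public class ExampleOption
{
    [SweepRange(2, 32768, logBase: true)]
    public int WindowSize { get; set; }
}
```

In this sketch, `SearchSpaceBuilder.FromType<ExampleOption>()` produces the same search space as calling `ss.Add("WindowSize", new UniformIntOption(2, 32768, logBase: true))` by hand, so shipping the weakly-typed core first would not block adding the attribute layer later.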
This is the initial design/proposal doc for a Sweepable API.
I would appreciate it if you could each review it and give me your feedback.