solar sample #688

Closed
wants to merge 7 commits into from

Conversation

@moonlanderr
Collaborator

moonlanderr commented May 13, 2020

solar energy prediction


Checklist

Please go through each entry in the below checklist and mark an 'X' if that condition has been met. Every entry should be marked with an 'X' to get the Pull Request approved.

  • All imports are in the first cell? First block of imports are standard libraries, second block are 3rd party libraries, third block are all arcgis imports? Note that in some cases, for samples, it is a good idea to keep the imports next to where they are used, particularly for uncommonly used features that we want to highlight.
  • All GIS object instantiations are one of the following?
    • gis = GIS()
    • gis = GIS('https://www.arcgis.com', 'arcgis_python', 'P@ssword123')
    • gis = GIS(profile="your_online_profile")
    • gis = GIS('https://pythonapi.playground.esri.com/portal', 'arcgis_python', 'amazing_arcgis_123')
    • gis = GIS(profile="your_enterprise_portal")
  • If this notebook requires setup or teardown, did you add the appropriate code to ./misc/setup.py and/or ./misc/teardown.py?
  • If this notebook references any portal items that need to be staged on AGOL/Python API playground, did you coordinate with a Python API team member to stage the item the correct way with the api_data_owner user?
  • Code refactored & split out across multiple cells, useful comments?
  • Consistent voice/tense/narrative style? Thoroughly checked for typos?
  • All images used like <img src="base64str_here"> instead of <img src="https://some.url">? All map widgets contain a static image preview? (Call mapview_inst.take_screenshot() to do so)
  • All file paths are constructed in an OS-agnostic fashion with os.path.join()? (Instead of r"\foo\bar", os.path.join(os.path.sep, "foo", "bar"), etc.)
  • IF YOU WANT THIS SAMPLE TO BE DISPLAYED ON THE DEVELOPERS.ARCGIS.COM WEBSITE, ping @ DavidJVitale so he can add it to the list for the next deploy
Supratim Banik
@review-notebook-app

review-notebook-app bot commented May 13, 2020

Check out this pull request on  ReviewNB

Review Jupyter notebook visual diffs & provide feedback on notebooks.


Powered by ReviewNB

@priyankatuteja
Collaborator

priyankatuteja commented May 21, 2020

@guneetmutreja can you do the first round of review for this?

@guneetmutreja guneetmutreja self-requested a review May 21, 2020
@review-notebook-app

review-notebook-app bot commented Jun 8, 2020

View / edit / reply to this conversation on ReviewNB

guneetmutreja commented on 2020-06-08T11:21:02Z
----------------------------------------------------------------

Accessing & Visualizing the datasets

1 — Fully Connected Network (FCN)

FCN Model Result Visualization

ML Model Result Visualization

These entries in the TOC do not match the headings in the NB below.


@review-notebook-app

review-notebook-app bot commented Jun 8, 2020

View / edit / reply to this conversation on ReviewNB

guneetmutreja commented on 2020-06-08T11:21:03Z
----------------------------------------------------------------

Some grammatical fixes:

Recently, there has been a great emphasis on reducing carbon footprints by moving away from fossil fuels to renewable energy sources for running our cities. Local governments across the world, in this case the City of Calgary in Canada, are leading this change by becoming energy independent, installing solar power plants either on rooftops or within the sites of their city utilities to run their operations. With this in view, this notebook computes the amount of energy a solar power plant would produce at any such site using weather variables, and subsequently estimates the total plant capacity required to satisfy the site's daily needs.

Given a location's latitude and longitude, this notebook can predict the daily, and hence annual, solar energy generated by a solar power station at the site. The hypothesis is that weather parameters such as temperature, wind speed, vapor pressure, solar radiation, day length, precipitation, and snowfall, along with the altitude of a place, affect the solar energy generated on a given day.

Accordingly, these variables are used to train a model on the actual solar power generated by solar stations located in Calgary, Canada, which can then be used to predict solar generation for prospective solar plants at other locations. Besides, the total energy generated also depends on the capacity of the solar station established. For example, a 100 kWp solar plant will generate more energy than a 50 kWp plant, so the capacity of the plant must be taken into consideration for the final output.


@review-notebook-app

review-notebook-app bot commented Jun 8, 2020

View / edit / reply to this conversation on ReviewNB

guneetmutreja commented on 2020-06-08T11:21:04Z
----------------------------------------------------------------

Some grammatical changes.

Out of the several solar photovoltaic power plants in the City of Calgary, 11 were selected for the study. The dataset contains two components:

1) Daily solar energy production for each power plant from September 2015 to December 2019.

2) Corresponding daily weather measurements for the given sites.

The datasets were obtained from multiple sources, as mentioned here (Data resources), and preprocessed to obtain the main dataset used here. Two feature layers were subsequently created from them.

The hyperlink to "Data Resources" does not lead to the intended location.


@review-notebook-app

review-notebook-app bot commented Jun 8, 2020

View / edit / reply to this conversation on ReviewNB

guneetmutreja commented on 2020-06-08T11:21:05Z
----------------------------------------------------------------

Please add a screenshot of the map.


@review-notebook-app

review-notebook-app bot commented Jun 8, 2020

View / edit / reply to this conversation on ReviewNB

guneetmutreja commented on 2020-06-08T11:21:05Z
----------------------------------------------------------------

Some grammatical changes:

In the above table, each row represents one day in the period from September 2015 to December 2019. The corresponding dates are shown in the field Field1, and the field solar_plan gives the names of the solar sites.


@review-notebook-app

review-notebook-app bot commented Jun 8, 2020

View / edit / reply to this conversation on ReviewNB

guneetmutreja commented on 2020-06-08T11:21:06Z
----------------------------------------------------------------

Please add a screenshot of the map.


@review-notebook-app

review-notebook-app bot commented Jun 8, 2020

View / edit / reply to this conversation on ReviewNB

guneetmutreja commented on 2020-06-08T11:21:07Z
----------------------------------------------------------------

Once the training and validation datasets are processed and analyzed, they are ready to be used for modeling. In this sample, two methods are used for modeling:

1) Fully Connected Network - First, a deep learning framework called Fully Connected Network (FCN), available in the arcgis.learn module in ArcGIS API for Python, is used.

2) Machine Learning Model - In the second option, one of the machine learning algorithms from scikit-learn will be implemented via the MLModel framework available in arcgis.learn. This framework can deploy any ML algorithm from the scikit-learn library just by passing the name of the algorithm and its relevant parameters as keyword arguments.

Finally, performance between the two methods will be compared in terms of model training and validation accuracy.


@review-notebook-app

review-notebook-app bot commented Jun 8, 2020

View / edit / reply to this conversation on ReviewNB

guneetmutreja commented on 2020-06-08T11:21:08Z
----------------------------------------------------------------

First, a list is made of the feature data that will be used for predicting daily solar energy generation. By default, each entry is treated as a continuous variable, while for a categorical variable the value True should be passed inside a tuple along with the variable.


@review-notebook-app

review-notebook-app bot commented Jun 8, 2020

View / edit / reply to this conversation on ReviewNB

guneetmutreja commented on 2020-06-08T11:21:09Z
----------------------------------------------------------------

Here, the learning rate suggested by the lr_find method was around 0.000575. The automatic lr_finder takes a conservative estimate of the learning rate, but an expert can interpret the graph more closely and may find a better learning rate to use for the final training of the model.


@review-notebook-app

review-notebook-app bot commented Jun 8, 2020

View / edit / reply to this conversation on ReviewNB

guneetmutreja commented on 2020-06-08T11:21:10Z
----------------------------------------------------------------

In the above table, the predicted values by the model on the test set in the last column named prediction_results and the actual values in the column named capacity_f of the target variable are highly similar.

Accordingly, the model metrics of the trained model are now estimated as follows: the mean absolute error score and r-square of the model fit are checked for the trained model.


@review-notebook-app

review-notebook-app bot commented Jun 9, 2020

View / edit / reply to this conversation on ReviewNB

guneetmutreja commented on 2020-06-09T05:26:03Z
----------------------------------------------------------------

In the above table the predicted values by the model on the test set in the last column named prediction_results and the actual values in the column named capacity_f of the target variable are highly similar.

Accordingly, the model metrics of the trained model are now estimated as follows: the mean absolute error score and r-square of the model fit are checked for the trained model.


@review-notebook-app

review-notebook-app bot commented Jun 9, 2020

View / edit / reply to this conversation on ReviewNB

guneetmutreja commented on 2020-06-09T05:26:04Z
----------------------------------------------------------------

  • I faced the following error:

KeyError: "['prediction'] not in index"

in the line: test_pred_datetime = test_pred_layer_sdf[['Field1','capacity_f','prediction']].copy()

  • Can you please check this once?

Also, it would be good to merge some calls into a single line. For example, the two lines:

test_pred_datetime = test_pred_datetime.drop(['date','capacity_f','prediction'], axis=1)

test_pred_datetime = test_pred_datetime.sort_index() 

can be merged into one as:

test_pred_datetime = test_pred_datetime.drop(['date','capacity_f','prediction'], axis=1).sort_index() 
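
A minimal sketch of both points above, assuming pandas and a small made-up frame (the values are invented, and pick_prediction_column is a hypothetical helper; only the column names come from this thread). It looks for the prediction field under either name before selecting it, which avoids the KeyError when the field is called prediction_results rather than prediction, and then chains drop and sort_index in one statement.

```python
import pandas as pd

# Made-up rows; only the column names are taken from the review thread.
sdf = pd.DataFrame({
    "Field1": ["2015-09-03", "2015-09-01", "2015-09-02"],
    "capacity_f": [0.12, 0.10, 0.15],
    "prediction_results": [0.11, 0.09, 0.16],
})


def pick_prediction_column(frame, candidates=("prediction", "prediction_results")):
    """Return the first candidate column present in the frame (hypothetical helper)."""
    for name in candidates:
        if name in frame.columns:
            return name
    raise KeyError(f"none of {candidates} found in {list(frame.columns)}")


pred_col = pick_prediction_column(sdf)
subset = sdf[["Field1", "capacity_f", pred_col]].copy()
subset.index = pd.to_datetime(subset["Field1"])

# drop() and sort_index() each return a new DataFrame, so they can be chained.
tidy = subset.drop(["Field1"], axis=1).sort_index()
print(tidy)
```

Chaining keeps the intermediate frame from being reassigned twice, at the cost of a slightly longer line.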


@review-notebook-app

review-notebook-app bot commented Jun 9, 2020

View / edit / reply to this conversation on ReviewNB

guneetmutreja commented on 2020-06-09T06:58:59Z
----------------------------------------------------------------

Can we also mention the variable field names in brackets after their descriptions in the above line, like:

The plot shows that variable of shortwave radiation per meter square (srad__W_m_) ....


@review-notebook-app

review-notebook-app bot commented Jun 9, 2020

View / edit / reply to this conversation on ReviewNB

guneetmutreja commented on 2020-06-09T06:59:00Z
----------------------------------------------------------------

Can we have a little more detail of this here? I feel this is the first sample featuring FCN so a little more detail will be required.


@review-notebook-app

review-notebook-app bot commented Jun 9, 2020

View / edit / reply to this conversation on ReviewNB

guneetmutreja commented on 2020-06-09T06:59:00Z
----------------------------------------------------------------

Please shorten the comment or wrap it onto the next line to avoid horizontal scrolling in the cells.


@review-notebook-app

review-notebook-app bot commented Jun 9, 2020

View / edit / reply to this conversation on ReviewNB

guneetmutreja commented on 2020-06-09T06:59:01Z
----------------------------------------------------------------

Are we checking MAE also? If so, please add that too, as the code below reports only the R-squared value.


@review-notebook-app

review-notebook-app bot commented Jun 9, 2020

View / edit / reply to this conversation on ReviewNB

guneetmutreja commented on 2020-06-09T06:59:02Z
----------------------------------------------------------------

# the model.score method from the tabular learner returns mean squared error


@review-notebook-app

review-notebook-app bot commented Jun 9, 2020

View / edit / reply to this conversation on ReviewNB

guneetmutreja commented on 2020-06-09T06:59:02Z
----------------------------------------------------------------

"The table above returns the predicted values for the Southland photovoltaic power plant stored in the field called prediction which has the model estimated daily capacity factor of energy generation, whereas the actual capacity factor is in the field named capacity_f. "

I saw prediction_results instead of prediction.


@review-notebook-app

review-notebook-app bot commented Jun 9, 2020

View / edit / reply to this conversation on ReviewNB

guneetmutreja commented on 2020-06-09T06:59:03Z
----------------------------------------------------------------

It seems seaborn does not come as a default package with the installation.

I would suggest including a cell with the command to install seaborn just before these lines:

conda install -c anaconda seaborn
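
As an alternative to installing unconditionally, a guard cell could first check whether the package is importable. This is only a sketch using the standard library; is_installed is a hypothetical helper name, and the message simply repeats the conda command suggested above.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the named top-level package can be found for import."""
    return importlib.util.find_spec(package_name) is not None


if not is_installed("seaborn"):
    # Nothing is installed here; the cell only tells the reader what to run.
    print("seaborn is missing; install it first, e.g. conda install -c anaconda seaborn")
```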


@review-notebook-app

review-notebook-app bot commented Jun 9, 2020

View / edit / reply to this conversation on ReviewNB

guneetmutreja commented on 2020-06-09T06:59:04Z
----------------------------------------------------------------

Similar to the data preparation process for the neural network, first a list is made of the feature data that will be used for predicting daily solar energy generation. By default, each entry is treated as a continuous variable, whereas for a categorical variable the value True should be passed inside a tuple along with the variable. These variables are then transformed by the RobustScaler function from scikit-learn by passing it along with the variable list into the column transformer function as follows:


@review-notebook-app

review-notebook-app bot commented Jun 9, 2020

View / edit / reply to this conversation on ReviewNB

guneetmutreja commented on 2020-06-09T06:59:05Z
----------------------------------------------------------------

The input parameters required for the tool are similar to the ones mentioned previously:


@review-notebook-app

review-notebook-app bot commented Jun 9, 2020

View / edit / reply to this conversation on ReviewNB

guneetmutreja commented on 2020-06-09T06:59:06Z
----------------------------------------------------------------

Finally, the model is now ready for training, and the model.fit method is used for fitting the machine learning model with its defined parameters mentioned in the previous step.


@review-notebook-app

review-notebook-app bot commented Jun 9, 2020

View / edit / reply to this conversation on ReviewNB

guneetmutreja commented on 2020-06-09T06:59:07Z
----------------------------------------------------------------

Subsequently, the model metrics of the trained model are now estimated as follows: the mean absolute error score and r-square of the model fit are checked for the trained model. Currently the model.score() function gives the r-square, while the mean squared error is obtained using scikit-learn metrics.
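
For reference, the three metrics mentioned in this thread (R-squared, MAE, and MSE) can be written out by hand as below. This is a sketch on invented capacity-factor values, not the notebook's code; in practice sklearn.metrics provides r2_score, mean_absolute_error, and mean_squared_error for the same quantities.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    """Average squared difference between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Invented capacity-factor values, for illustration only.
actual = [0.10, 0.20, 0.30, 0.40]
predicted = [0.12, 0.18, 0.33, 0.37]

print("MAE:", mean_absolute_error(actual, predicted))
print("MSE:", mean_squared_error(actual, predicted))
print("R2: ", r_squared(actual, predicted))
```

Reporting an error metric in the units of the target alongside R-squared makes it easier to judge whether a high R-squared also means small absolute errors.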


@review-notebook-app

review-notebook-app bot commented Jun 9, 2020

View / edit / reply to this conversation on ReviewNB

guneetmutreja commented on 2020-06-09T06:59:08Z
----------------------------------------------------------------

The low MSE and high r-square values indicate that the model has been trained well. This model also achieved a higher r-square and a lower MSE than the previous fully connected network model.


@review-notebook-app

review-notebook-app bot commented Jun 9, 2020

View / edit / reply to this conversation on ReviewNB

guneetmutreja commented on 2020-06-09T06:59:08Z
----------------------------------------------------------------

The trained RandomForestRegressor model implemented via the MLModel will now similarly be used to predict the daily lifetime solar energy generation for the solar plant installed at the Southland Leisure Centre since its installation in 2015. The aim is to validate its performance and compare it with that obtained previously by the FCN model.


@review-notebook-app

review-notebook-app bot commented Jun 9, 2020

View / edit / reply to this conversation on ReviewNB

guneetmutreja commented on 2020-06-09T06:59:09Z
----------------------------------------------------------------

This shows an error for me:

KeyError: "['prediction'] not in index"

Request you to have a look.


@review-notebook-app

review-notebook-app bot commented Aug 6, 2020

View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2020-08-06T22:23:47Z
----------------------------------------------------------------

In the plots above, it can be seen that each of the variables has a high seasonality, and it seems that there is a relationship between the dependent variable kWh_filled and the explanatory variables. As such, a correlation plot should be created to check the correlation between the variables.


@review-notebook-app

review-notebook-app bot commented Aug 6, 2020

View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2020-08-06T22:23:48Z
----------------------------------------------------------------

The resulting correlation plot shows that the variable of shortwave radiation per meter square (sradW_m_) has the largest correlation with the dependent variable of total solar energy produced expressed in terms of capacity factor (capacity_f). This is followed by the variable of day length (dayls_), as longer days are likely to produce more solar energy. These two are closely followed by max (tmaxdeg) and min (tmindeg) daily temperatures, and lastly the remaining variables with weaker correlation values.
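
To make the ranking behind such a correlation plot concrete, here is a small sketch that computes Pearson correlations against the target and sorts the explanatory variables by absolute strength. The numbers and the two explanatory column names (radiation, precipitation) are invented for illustration; only capacity_f is a field name from the notebook.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented daily values; column names other than capacity_f are placeholders.
columns = {
    "capacity_f": [0.05, 0.10, 0.18, 0.22, 0.12],
    "radiation": [100, 180, 310, 390, 200],
    "precipitation": [4.0, 0.0, 1.0, 0.0, 6.0],
}
target = "capacity_f"

# Rank explanatory variables by the absolute value of their correlation.
ranked = sorted(
    ((name, pearson(values, columns[target]))
     for name, values in columns.items() if name != target),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for name, r in ranked:
    print(f"{name}: {r:+.3f}")
```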


@review-notebook-app

review-notebook-app bot commented Aug 6, 2020

View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2020-08-06T22:23:48Z
----------------------------------------------------------------

The validation set consists of daily solar generation data from September 2015 to December 2019 for one solar site, known as Southland Leisure Centre, and will be used to validate the trained model.
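
Holding out one whole site, rather than random rows, can be sketched as follows. The rows are invented and split_by_site is a hypothetical helper; solar_plan is the site-name field mentioned earlier in this thread, and the site names other than Southland Leisure Centre are placeholders.

```python
def split_by_site(rows, site_field, holdout_site):
    """Split rows into (training, validation) by holding out one whole site."""
    training = [row for row in rows if row[site_field] != holdout_site]
    validation = [row for row in rows if row[site_field] == holdout_site]
    return training, validation


# Invented rows for illustration only.
rows = [
    {"solar_plan": "Southland Leisure Centre", "capacity_f": 0.12},
    {"solar_plan": "Site A", "capacity_f": 0.10},
    {"solar_plan": "Site B", "capacity_f": 0.14},
    {"solar_plan": "Southland Leisure Centre", "capacity_f": 0.09},
]

training, validation = split_by_site(rows, "solar_plan", "Southland Leisure Centre")
print(len(training), "training rows,", len(validation), "validation rows")
```

Because no row from the held-out site reaches training, the validation score reflects how the model might transfer to a site it has never seen.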


@review-notebook-app

review-notebook-app bot commented Aug 6, 2020

View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2020-08-06T22:23:49Z
----------------------------------------------------------------

Model Building

Once the training and the validation datasets have been processed and analyzed, they are ready to be used for modeling.

In this sample, two methods are used for modeling:

1) FullyConnectedNetwork - First, a deep learning framework called FullyConnectedNetwork, available in the arcgis.learn module in ArcGIS API for Python, is used.

2) MLModel - In the second method, a regression model from scikit-learn is implemented via the MLModel framework in arcgis.learn. This framework can deploy any regression or classification model from the library by passing the name of the algorithm and its relevant parameters as keyword arguments.

Finally, performance between the two methods will be compared in terms of model training and validation accuracy.

Further details on FullyConnectedNetwork & MLModel are available here in the Deep Learning with ArcGIS section.
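
The "name of the algorithm plus keyword arguments" pattern described for MLModel can be illustrated generically. The sketch below is not how MLModel is implemented; build_from_name is a hypothetical helper that resolves a dotted class path with importlib and forwards the keyword arguments, shown with a standard-library class so that it runs without scikit-learn.

```python
import importlib


def build_from_name(dotted_name, **params):
    """Import the class named by a dotted path and instantiate it with params."""
    module_name, _, class_name = dotted_name.rpartition(".")
    module = importlib.import_module(module_name)
    cls = getattr(module, class_name)
    return cls(**params)


# Standard-library stand-in so the sketch runs anywhere.
frac = build_from_name("fractions.Fraction", numerator=3, denominator=4)
print(frac)

# With scikit-learn installed, the same helper could be called as, for example:
# build_from_name("sklearn.ensemble.GradientBoostingRegressor", n_estimators=100)
```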


@review-notebook-app

review-notebook-app bot commented Aug 6, 2020

View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2020-08-06T22:23:50Z
----------------------------------------------------------------

This is an Artificial Neural Network model from the arcgis.learn module, which is used here for modeling.


@review-notebook-app

review-notebook-app bot commented Aug 6, 2020

View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2020-08-06T22:23:51Z
----------------------------------------------------------------

Data Preprocessing

First, a list is made that consists of the feature data that will be used for predicting daily solar energy generation. By default, it will receive continuous variables, and in the case of categorical variables, the True value should be passed inside a tuple along with the variable. In this example, all of the variables are continuous.


@review-notebook-app

review-notebook-app bot commented Aug 6, 2020

View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2020-08-06T22:23:51Z
----------------------------------------------------------------

Once the explanatory variables are identified, the main preprocessing of the data is carried out by the prepare_tabulardata method from the arcgis.learn module in the ArcGIS API for Python. This function will take either a feature layer or a spatial dataframe containing the dataset as an input and will return a TabularDataObject that can then be fed into the model.

The input parameters required for the tool are:

  • input_features : feature layer or spatial dataframe containing the primary dataset
  • variable_predict : field name containing the y-variable from the input feature layer/dataframe
  • explanatory_variables : list of the field names as 2-sized tuples containing the explanatory variables as mentioned above
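
The shape of these inputs can be sketched without arcgis installed. The helper below is hypothetical and only validates the explanatory-variable list as described in this thread (a plain name for a continuous variable, a (name, True) tuple for a categorical one). The field names altitude_m and season and the layer placeholder are assumptions; the three keyword names are the parameters listed above.

```python
def split_explanatory_variables(variables):
    """Return (continuous, categorical) field-name lists from a mixed list."""
    continuous, categorical = [], []
    for item in variables:
        if isinstance(item, str):
            continuous.append(item)
        elif isinstance(item, tuple) and len(item) == 2 and item[1] is True:
            categorical.append(item[0])
        else:
            raise ValueError(f"expected a name or a (name, True) tuple, got {item!r}")
    return continuous, categorical


# Hypothetical field names, apart from srad__W_m_ and capacity_f.
explanatory = ["srad__W_m_", "altitude_m", ("season", True)]
continuous, categorical = split_explanatory_variables(explanatory)

# Keyword arguments in the form the parameter list above describes.
# In the notebook, input_features would be a feature layer or spatial dataframe.
kwargs = {
    "input_features": "training_layer_placeholder",
    "variable_predict": "capacity_f",
    "explanatory_variables": explanatory,
}
print(continuous, categorical)
```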


@review-notebook-app

review-notebook-app bot commented Aug 6, 2020

View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2020-08-06T22:23:52Z
----------------------------------------------------------------

Once the data has been prepared by the prepare_tabulardata method, it is ready to be passed to the ANN for training. First, the ANN, known as FullyConnectedNetwork, is imported from arcgis.learn and initialized as follows:


@review-notebook-app

review-notebook-app bot commented Aug 6, 2020

View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2020-08-06T22:23:53Z
----------------------------------------------------------------

Model Training

Finally, the model is now ready for training. To train the model, the model.fit method is called and provided with the number of epochs for training and the estimated learning rate suggested by lr_find in the previous step:


@review-notebook-app

review-notebook-app bot commented Aug 6, 2020

View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2020-08-06T22:23:54Z
----------------------------------------------------------------

The train_loss and valid_loss fields are plotted to check whether the model is over-fitting. The resulting plot shows that the model has been trained well and that the losses are gradually decreasing, but not significantly.
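
Besides reading the plot, the same check can be expressed as a rough heuristic on the per-epoch loss values. This is an illustrative sketch with invented losses and hypothetical helper names, not a method from arcgis.learn: it flags over-fitting when validation loss ends above its minimum while training loss has kept falling.

```python
def best_epoch(valid_loss):
    """Index of the epoch with the lowest validation loss."""
    return min(range(len(valid_loss)), key=valid_loss.__getitem__)


def looks_overfit(train_loss, valid_loss, tolerance=0.0):
    """Heuristic: validation loss rose after its minimum while training loss fell."""
    best = best_epoch(valid_loss)
    validation_rose = valid_loss[-1] > valid_loss[best] + tolerance
    training_fell = train_loss[-1] < train_loss[best]
    return validation_rose and training_fell


# Invented loss curves for illustration only.
diverging = looks_overfit([1.0, 0.6, 0.4, 0.2], [1.0, 0.7, 0.8, 0.9])
healthy = looks_overfit([1.0, 0.6, 0.5, 0.45], [1.1, 0.7, 0.6, 0.55])
print(diverging, healthy)
```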


@review-notebook-app

review-notebook-app bot commented Aug 6, 2020

View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2020-08-06T22:23:54Z
----------------------------------------------------------------

In the table above, the values predicted by the model when applied to the test set, prediction_results, are similar to the actual values of the test set, capacity_f.

As such, the model metrics of the trained model can now be estimated using the model.score function, which returns the r-squared measure of the fitted model.


@review-notebook-app

review-notebook-app bot commented Aug 6, 2020

View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2020-08-06T22:23:55Z
----------------------------------------------------------------

Solar Energy Generation Forecast & Validation

The trained model (FullyConnectedNetwork) will now be used to predict the daily lifetime solar energy generation for the solar plant installed at the Southland Leisure Centre, since its installation in 2015. The aim is to validate the trained model and measure its performance of solar output estimation using only weather variables from the Southland Leisure Centre.

Accordingly, the model.predict method from arcgis.learn is used with the daily weather variables as input for the mentioned site, ranging from September 2015 to December 2019, to predict daily solar energy output in kWh for that same time period. The predictors are automatically chosen from the input feature layer of southland_layer by the trained model without mentioning them explicitly, since their names are exactly the same as those used to train the model.


@review-notebook-app

review-notebook-app bot commented Aug 6, 2020

View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2020-08-06T22:23:56Z
----------------------------------------------------------------

The table above returns the predicted values for the Southland photovoltaic power plant stored in the field called prediction_results, which holds the model-estimated daily capacity factor of energy generation, whereas the actual capacity factor is in the field named capacity_f.

The capacity factor is a normalized value that will be rescaled back to the original unit of kWh by using the peak capacity of the Southland photovoltaic power plant of 153 kWp.
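
As a worked example of this rescaling, the sketch below assumes the standard definition of capacity factor, namely daily energy divided by peak capacity times 24 hours. The notebook may normalize differently, so treat the formula as an assumption; the 153 kWp figure is from the text above and the 0.127 capacity factor is invented.

```python
HOURS_PER_DAY = 24
PEAK_CAPACITY_KWP = 153  # Southland plant peak capacity, as stated above


def daily_energy_kwh(capacity_factor, peak_kwp=PEAK_CAPACITY_KWP):
    """Rescale a daily capacity factor to energy in kWh (assumed definition)."""
    return capacity_factor * peak_kwp * HOURS_PER_DAY


example_cf = 0.127  # invented value for illustration
daily = daily_energy_kwh(example_cf)
annual_mwh = daily * 365 / 1000

print(f"{daily:.1f} kWh per day, about {annual_mwh:.1f} MWh per year")
```

Under that assumption, an average capacity factor near 0.127 corresponds to roughly 170 MWh per year, the same order as the annual figures quoted in this review.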


@review-notebook-app

review-notebook-app bot commented Aug 6, 2020

View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2020-08-06T22:23:57Z
----------------------------------------------------------------

The comparison returns a high r-square of 0.86, showing a high similarity between the actual and predicted values.


@review-notebook-app

review-notebook-app bot commented Aug 6, 2020

View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2020-08-06T22:23:57Z
----------------------------------------------------------------

Summarizing the values, the actual average annual energy generated by the solar plant is 170.03 MWh, which is close to the predicted annual average generated energy of 170.08 MWh, indicating a high level of precision.
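
The size of that gap can be stated as a relative error. The sketch below uses the two annual averages quoted above; percent_error is a hypothetical helper name.

```python
def percent_error(actual, predicted):
    """Signed relative error of the prediction, in percent of the actual value."""
    return (predicted - actual) / actual * 100.0


# Annual averages in MWh, as quoted in the comment above.
fcn_error = percent_error(170.03, 170.08)
print(f"{fcn_error:.3f} %")
```

Note that close agreement of annual averages does not by itself show that individual daily predictions are close, since daily errors can cancel out over a year.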


@review-notebook-app

review-notebook-app bot commented Aug 6, 2020

View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2020-08-06T22:23:58Z
----------------------------------------------------------------

In the plot above, the blue line represents the actual generation values, and the orange line represents the predicted generation values. The two show a high degree of overlap, indicating that the model has a high predictive capacity.


@review-notebook-app

review-notebook-app bot commented Aug 6, 2020

View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2020-08-06T22:23:59Z
----------------------------------------------------------------

2 - MLModel

In the second method, a machine learning model is applied to model the same data using the MLModel framework from arcgis.learn. This framework can be used to import and apply any machine learning model from the scikit-learn library on the data returned by the prepare_tabulardata function from arcgis.learn.


@review-notebook-app

review-notebook-app bot commented Aug 6, 2020

View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2020-08-06T22:23:59Z
----------------------------------------------------------------

Data Preprocessing

Like the data preparation process for the neural network, first a list is made consisting of the feature data that will be used for predicting daily solar energy generation. By default, it will receive continuous variables, whereas for categorical variables, the True value should be passed inside a tuple along with the variables. These variables are then transformed by the RobustScaler function from scikit-learn by passing it, along with the variable list, into the column transformer function as follows:
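
For context on the scaling step named above, and separate from the notebook cell itself, the sketch below reproduces by hand what RobustScaler does under its documented defaults: it subtracts the median and divides by the interquartile range. The input values are invented, and robust_scale is a hypothetical stand-in written with the standard library only.

```python
import statistics


def robust_scale(values):
    """Center on the median and scale by the interquartile range (Q3 - Q1)."""
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [(value - median) / iqr for value in values]


# Invented values with one outlier.
scaled = robust_scale([1.0, 2.0, 3.0, 4.0, 100.0])
print(scaled)
```

Because the median and quartiles barely move when an outlier is present, the typical values keep a sensible spread, which is the usual reason for preferring this scaler on noisy weather data.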


@review-notebook-app

review-notebook-app bot commented Aug 6, 2020

View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2020-08-06T22:24:00Z
----------------------------------------------------------------

Once the explanatory variables list is defined and the preprocessors are computed, they are used as input for the prepare_tabulardata method in arcgis.learn. The method takes a feature layer or a spatial dataframe containing the dataset and returns a TabularDataObject that can be fed into the model.

The input parameters required for the tool are similar to the ones mentioned previously:


@review-notebook-app

review-notebook-app bot commented Aug 6, 2020

View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2020-08-06T22:24:01Z
----------------------------------------------------------------

Model Initialization

Once the data has been prepared by the prepare_tabulardata method, it is ready to be passed to the selected machine learning model for training. Here, the GradientBoostingRegressor model from scikit-learn is used, which is passed into the MLModel function, along with its parameters, as follows:


@review-notebook-app

review-notebook-app bot commented Aug 6, 2020

View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2020-08-06T22:24:02Z
----------------------------------------------------------------

In the table above, the last column, capacity_f_results, returns the values predicted by the model, which are similar to the actual values in the target variable column, capacity_f.

Subsequently, the model metrics of the trained model are now estimated using the model.score() function, which currently returns the r-squared of the model fit as follows:


@review-notebook-app

review-notebook-app bot commented Aug 6, 2020

View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2020-08-06T22:24:03Z
----------------------------------------------------------------

Solar Energy Generation Forecast & Validation

The trained GradientBoostingRegressor model, implemented via the MLModel, will now be used to predict the daily lifetime solar energy generation for the solar plant installed at the Southland Leisure Centre, since its installation in 2015. The aim is to validate its performance and compare it to the performance of the FullyConnectedNetwork model developed earlier in this lesson.

To reiterate, the model.predict method from arcgis.learn is used with the daily weather variables as input for the mentioned site, ranging from September 2015 to December 2019, to predict daily solar energy output in kWh for the same time period. The predictors are automatically chosen from the input feature layer of southland_layer by the trained model, without mentioning them explicitly, as their names are the same as those used for training the model.


@review-notebook-app

review-notebook-app bot commented Aug 6, 2020

View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2020-08-06T22:24:04Z
----------------------------------------------------------------

The table above returns the MLModel predicted values for the Southland plant stored in the field prediction, while the actual capacity factor is stored in the field named capacity_f.

The capacity factor is a normalized value that will be rescaled back to the original unit of kWh by using the peak capacity of the Southland photovoltaic power plant of 153 kWp.


@review-notebook-app

review-notebook-app bot commented Aug 6, 2020

View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2020-08-06T22:24:04Z
----------------------------------------------------------------

The comparison returns a high R-squared of 0.84, indicating a high similarity between the actual and predicted values.


@review-notebook-app

review-notebook-app bot commented Aug 6, 2020

View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2020-08-06T22:24:06Z
----------------------------------------------------------------

Summarizing the values, the actual average annual energy generated by the solar plant is 170.03 MWh, which is close to the predicted annual average generated energy of 171.48 MWh. This indicates a high level of precision.


@review-notebook-app

review-notebook-app bot commented Aug 6, 2020

View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2020-08-06T22:24:06Z
----------------------------------------------------------------

The goal of this project was to create a model that could predict the daily solar energy efficiency of a location, that is, its actual photovoltaic energy output, using the site's daily weather variables as inputs, thereby demonstrating the newly implemented artificial neural network, FullyConnectedNetwork, and the machine learning framework, MLModel, both available in the arcgis.learn module in ArcGIS API for Python.

Accordingly, data from 10 solar energy installation sites in the City of Calgary in Canada were used to train two different models — the first being the FullyConnectedNetwork model and the second being the MLModel framework from the arcgis.learn module. These were eventually used to predict the daily solar output of a different solar plant in Calgary, which was withheld from the training set. The steps for implementing these models are elaborated on in the notebook, and include the steps of data preprocessing, model training, and final inferencing.

A comparison of the results shows that both models successfully predicted the solar energy output of the test solar plant, with predicted values of 171.76 MWh and 171.51 MWh by the FullyConnectedNetwork and the MLModel algorithm respectively, compared to the actual average annual solar generation of 170.74 MWh for the station.

Finally, to expand on this model in the future, it would be interesting to apply it to other solar generation plants located across different geographies and to record its performance to understand the generalizability of the model.


Collaborator

BP-Ent left a comment

Suggested changes noted in ReviewNB.

@priyankatuteja
Collaborator

priyankatuteja commented Sep 9, 2020

#782 duplicate

@priyankatuteja
Collaborator

priyankatuteja commented Sep 9, 2020

@BP-Ent Thanks for the review. Your suggestions are incorporated and a new PR is created.

5 participants