
Convex and concave envelopes of artificial neural network activation functions for deterministic global optimization

Journal of Global Optimization

Abstract

In this work, we present general methods to construct convex/concave relaxations of the activation functions commonly used in artificial neural networks (ANNs). The choice of these functions is often informed by broader modeling considerations balanced against the need for high computational performance. Directly applying factorable programming techniques to compute bounds and convex/concave relaxations of such functions often leads to weak enclosures due to the dependency problem. Moreover, the piecewise formulations that define several popular activation functions violate the factorable function requirement and thus prevent the computation of convex/concave relaxations altogether. To improve the performance of relaxations of ANNs in deterministic global optimization applications, this study develops a library of envelopes of the thoroughly studied rectifier-type and sigmoid activation functions, in addition to the novel self-gated sigmoid-weighted linear unit (SiLU) and Gaussian error linear unit (GELU) activation functions. We demonstrate that the envelopes of activation functions directly lead to tighter relaxations of ANNs on their input domains. In turn, these improvements translate to a dramatic reduction in the CPU runtime required to solve optimization problems involving ANN models to ε-global optimality. We further demonstrate that the factorable programming approach offers superior computational performance over alternative state-of-the-art approaches.
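To make the notion of an envelope concrete, consider the ReLU activation on a box domain: because ReLU is convex, its convex envelope on an interval [xL, xU] is the function itself, and its concave envelope is the secant line through the interval endpoints. The following minimal Julia sketch illustrates this envelope pair only; it is not the authors' library, and names such as relu_cv and relu_cc are placeholders introduced here for illustration.

```julia
# Illustrative sketch (not the authors' library): convex/concave envelope of
# the ReLU activation x -> max(x, 0) on a box domain [xL, xU].
# ReLU is convex, so its convex envelope is ReLU itself; its concave envelope
# is the secant line through (xL, relu(xL)) and (xU, relu(xU)).

relu(x) = max(x, 0.0)

# Convex envelope (tightest convex underestimator) of ReLU on [xL, xU].
relu_cv(x, xL, xU) = relu(x)

# Concave envelope (tightest concave overestimator) of ReLU on [xL, xU].
function relu_cc(x, xL, xU)
    xL >= xU && return relu(xL)                       # degenerate interval
    slope = (relu(xU) - relu(xL)) / (xU - xL)
    return relu(xL) + slope * (x - xL)
end

# Example: on [-1, 2] the pair (relu_cv, relu_cc) encloses ReLU exactly,
# avoiding the weakening that factor-by-factor composition can introduce.
xL, xU = -1.0, 2.0
for x in range(xL, xU; length = 5)
    println((x, relu_cv(x, xL, xU), relu(x), relu_cc(x, xL, xU)))
end
```

Nonconvex activations such as sigmoid, SiLU, and GELU require case analysis (e.g., locating tangent points) to obtain their envelopes, which is the substance of the library developed in the paper.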


Data Availability

The datasets generated and analysed during the current study are available in the GitHub repository https://github.com/PSORLab/RSActivationFunctions.


Acknowledgements

This material is based upon work supported by the National Science Foundation under Grant No. 1932723. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Author information

Corresponding author

Correspondence to Matthew D. Stuber.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wilhelm, M.E., Wang, C. & Stuber, M.D. Convex and concave envelopes of artificial neural network activation functions for deterministic global optimization. J Glob Optim 85, 569–594 (2023). https://doi.org/10.1007/s10898-022-01228-x

