What loss function to use when labels are probabilities?



What loss function is most appropriate when training a model with target values that are probabilities? For example, I have a 3-output model with x=[some features] and y=[0.2, 0.3, 0.5].



It seems like cross-entropy doesn't make sense here, since it assumes that a single target is the correct label.



Would something like MSE (after applying softmax) make sense, or is there a better loss function?










Tags: neural-networks, loss-functions, probability-distribution







asked 7 hours ago by Thomas Johnson




1 Answer

Actually, the cross-entropy loss function is appropriate here, since it measures the "distance" between a predicted distribution $q$ and the "true" distribution $p$.
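The "distance" intuition can be made precise with a standard identity (added here for illustration): the cross-entropy splits into the entropy of $p$ plus the Kullback-Leibler divergence from $p$ to $q$,

$$-\sum_{x} p(x)\log q(x) = \underbrace{-\sum_{x} p(x)\log p(x)}_{\text{entropy of } p} + \underbrace{\sum_{x} p(x)\log\frac{p(x)}{q(x)}}_{D_{\mathrm{KL}}(p\,\|\,q)}.$$

The first term does not depend on the model's output $q$, and $D_{\mathrm{KL}}(p\,\|\,q) \ge 0$ with equality only when $q = p$, so minimizing the cross-entropy over $q$ pulls $q$ toward the target distribution $p$. Neither quantity is a true metric (KL divergence is not symmetric), which is why "distance" is in quotes.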



You are right, though, that using a loss function named "cross_entropy" in many APIs would be a mistake, because, as you said, these functions assume one-hot labels (or, equivalently, integer class indices). You would need to use the general cross-entropy function,



$$H(p,q) = -\sum_{x \in X} p(x) \log q(x).$$
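As a minimal sketch (plain Python, no ML framework assumed), the general formula can be applied directly to soft targets such as the question's $y = [0.2, 0.3, 0.5]$. It assumes the model outputs raw scores (logits) that are turned into $q$ with a softmax; the helper names `softmax` and `cross_entropy` and the candidate logits are hypothetical, chosen only for illustration.

```python
import math


def softmax(logits):
    """Turn raw model scores (logits) into a probability distribution q."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(p, q, eps=1e-12):
    """General cross-entropy H(p, q) = -sum_x p(x) log q(x).

    p: target distribution (soft labels); q: predicted distribution.
    eps clamps q away from 0 so log() never sees an exact zero.
    """
    return -sum(pi * math.log(max(qi, eps)) for pi, qi in zip(p, q))


# Soft target from the question, and three hypothetical model outputs (logits).
p = [0.2, 0.3, 0.5]
candidates = {
    "matches target": [math.log(x) for x in p],  # softmax gives back p (up to rounding)
    "uniform": [0.0, 0.0, 0.0],
    "overconfident": [0.0, 0.0, 5.0],
}

for name, logits in candidates.items():
    q = softmax(logits)
    print(f"{name:15s} q={[round(x, 3) for x in q]}  H(p, q)={cross_entropy(p, q):.4f}")
```

With these inputs, the logits that reproduce the target give the smallest loss, equal to the entropy of $p$ (about 1.03 nats); the uniform prediction gives $\log 3 \approx 1.10$, and the overconfident one is penalized much more heavily. Note that the loss does not reach zero for soft targets; its minimum is the entropy of $p$, which is expected rather than a bug.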



Note that one-hot labels would mean that

$$
p(x) =
\begin{cases}
1 & \text{if } x \text{ is the true label,}\\
0 & \text{otherwise,}
\end{cases}
$$

which causes the cross-entropy $H(p,q)$ to reduce to the form you're familiar with:

$$H(p,q) = -\log q(x_{\text{label}}).$$
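A quick numerical check of this reduction, again as a plain-Python sketch with a hypothetical `cross_entropy` helper (not a library function): with a one-hot $p$, every term except the true label's vanishes, so the general formula and the familiar $-\log q(x_{\text{label}})$ agree.

```python
import math


def cross_entropy(p, q):
    """General cross-entropy H(p, q) = -sum_x p(x) log q(x).

    Terms with p(x) = 0 are skipped (the usual 0 * log 0 = 0 convention).
    """
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


q = [0.1, 0.7, 0.2]  # hypothetical predicted distribution
label = 1            # hypothetical index of the true class
p_one_hot = [1.0 if i == label else 0.0 for i in range(len(q))]

general = cross_entropy(p_one_hot, q)
familiar = -math.log(q[label])
print(general, familiar)  # both equal -log(0.7)
```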






answered 6 hours ago by Philip Raeisghasem



















