
Maximum likelihood parameters deviate from posterior distributions




I have a likelihood function $\mathcal{L}(d \mid \theta)$ for the probability of my data $d$ given some model parameters $\theta \in \mathbf{R}^N$, which I would like to estimate. Assuming flat priors on the parameters, the likelihood is proportional to the posterior probability. I use an MCMC method to sample this posterior.

Looking at the resulting converged chain, I find that the maximum likelihood parameters are not consistent with the posterior distributions. For example, the marginalized posterior probability distribution for one of the parameters might be $\theta_0 \sim N(\mu=0, \sigma^2=1)$, while the value of $\theta_0$ at the maximum likelihood point is $\theta_0^{\mathrm{ML}} \approx 4$, essentially the largest value of $\theta_0$ traversed by the MCMC sampler.

This is an illustrative example, not my actual results. The real distributions are far more complicated, but some of the ML parameters have similarly unlikely p-values in their respective posterior distributions. Note that some of my parameters are bounded (e.g. $0 \leq \theta_1 \leq 1$); within the bounds, the priors are always uniform.

My questions are:

1. Is such a deviation a problem per se? Obviously I do not expect the ML parameters to coincide exactly with the maxima of each of their marginalized posterior distributions, but intuitively it feels like they should also not be found deep in the tails. Does this deviation automatically invalidate my results?

2. Whether or not this is necessarily problematic, could it be symptomatic of specific pathologies at some stage of the data analysis? For example, is it possible to make any general statement about whether such a deviation could be induced by an improperly converged chain, an incorrect model, or excessively tight bounds on the parameters?
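Spelling the flat-prior premise out (standard Bayes' rule; $p(d)$ denotes the marginal likelihood, and the flat prior is taken on the allowed support of $\theta$):

$$p(\theta \mid d) = \frac{\mathcal{L}(d \mid \theta)\,\pi(\theta)}{p(d)} \propto \mathcal{L}(d \mid \theta) \quad \text{when } \pi(\theta) \propto 1,$$

so the posterior mode (MAP) and the maximum likelihood point coincide within the bounds.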










bayesian maximum-likelihood optimization inference mcmc

asked 14 hours ago by mgc70 (new contributor)
          2 Answers
With flat priors, the posterior is identical to the likelihood up to a constant. Thus:

1. The MLE (estimated with an optimizer) should be identical to the MAP (maximum a posteriori value = multivariate mode of the posterior, estimated with MCMC). If you don't get the same value, you have a problem with your sampler or optimiser.

2. For complex models, it is very common that the marginal modes differ from the MAP. This happens, for example, if correlations between parameters are nonlinear (see the sketch after this list). This is perfectly fine, but marginal modes should therefore not be interpreted as points of highest posterior density, nor compared to the MLE.

3. In your specific case, however, I suspect that the posterior runs up against the prior boundary. In that case the posterior will be strongly asymmetric, and it doesn't make sense to summarize it by a mean and standard deviation. There is no problem in principle with this situation, but in practice it often hints at model misspecification or poorly chosen priors.
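As an illustration of points 2 and 3, here is a minimal sketch in R. It uses a hypothetical hierarchical example (Neal's funnel), not the OP's model: $v \sim N(0, 3^2)$ and $x_i \mid v \sim N(0, e^v)$ for $i = 1, \dots, 9$. The joint mode of $(v, x)$ sits at $v = -40.5$, far out in the tail of the marginal density of $v$ (which is simply $N(0, 3^2)$), because the spike at the joint mode is extremely thin and carries essentially no posterior volume.

    # Hypothetical funnel example:  v ~ N(0, 3^2),  x_i | v ~ N(0, exp(v)),  i = 1,...,9.
    # Profiling the joint log-density over x (i.e. setting x = 0) gives
    #   log p(v, x = 0) = -v^2/18 - 9*v/2 + const,
    # maximised at v = -40.5, while the marginal density of v is N(0, 9).
    v <- seq(-60, 20, by = 0.1)
    profile_logpost <- -v^2 / 18 - 9 * v / 2
    v[which.max(profile_logpost)]              # -40.5: v-coordinate of the joint mode
    qnorm(c(0.025, 0.975), mean = 0, sd = 3)   # central 95% of the marginal of v: about (-5.9, 5.9)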






answered 14 hours ago by Florian Hartig
• Ah, sorry for the almost identical answer, typed in parallel! – Xi'an, 14 hours ago

Some possible generic explanations for this perceived discrepancy, assuming of course that there is no issue with the code, the likelihood definition, the MCMC implementation, the number of MCMC iterations, or the convergence of the likelihood maximiser (thanks, Jacob Socolar):

1. In large dimensions $N$, the posterior does not concentrate at the maximum but at a distance of order $\sqrt{N}$ from the mode, meaning that the largest values of the likelihood function encountered by an MCMC sampler are often well below the value of the likelihood at its maximum. For instance, if the posterior is $\theta \mid \mathbf{x} \sim \mathcal{N}_N(0, I_N)$, the squared distance $\|\theta\|^2$ from the mode $0$ is $\chi^2_N$-distributed and is therefore at least $N - 2\sqrt{2N}$ with high probability.

2. While the MAP and the MLE are indeed confounded under a flat prior, the marginal densities of the different parameters of the model may have (marginal) modes that are far away from the corresponding MLEs (i.e., MAPs).

3. The MAP is a position in the parameter space where the posterior density is highest, but this does not convey any indication of posterior weight or volume for neighbourhoods of the MAP. A very thin spike carries no posterior weight. This is also the reason why MCMC exploration of a posterior may face difficulties in identifying the posterior mode.

4. The fact that most parameters are bounded may lead to some components of the MAP (= MLE) occurring at a boundary; a short sketch of this boundary effect follows the example below.

See, e.g., Druilhet and Marin (2007) for arguments on the un-Bayesian nature of MAP estimators.



As an example of point 1 above, here is a short piece of R code (it requires the mvtnorm package):

    library(mvtnorm)                           # rmvnorm() and dmvnorm()

    N <- 100                                   # dimension
    T <- 1e4                                   # number of MCMC iterations
    lik <- dis <- rep(0, T)
    mu   <- rmvnorm(1, mean = rep(0, N))       # starting value
    xobs <- rmvnorm(1, mean = rep(0, N))       # observation
    lik[1] <- dmvnorm(xobs, mu, log = TRUE)
    dis[1] <- (xobs - mu) %*% t(xobs - mu)     # squared distance to the observation
    for (t in 2:T) {
      prop   <- rmvnorm(1, mean = mu, sigma = diag(1/N, N))  # random-walk proposal
      proike <- dmvnorm(xobs, prop, log = TRUE)
      if (log(runif(1)) < proike - lik[t-1]) {
        mu <- prop; lik[t] <- proike           # accept
      } else {
        lik[t] <- lik[t-1]                     # reject
      }
      dis[t] <- (xobs - mu) %*% t(xobs - mu)
    }


which mimics a random-walk Metropolis–Hastings sequence in dimension $N = 100$. The value of the log-likelihood at the MAP (that is, at $\mu = x_{\text{obs}}$) is $-91.89$, but the likelihoods visited along the chain never come close:

    > range(lik)
    [1] -183.9515 -126.6924

which is explained by the fact that the sequence never comes near the observation:

    > range(dis)
    [1] 69.59714 184.11525
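As a complementary sketch of point 4 (and of the boundary situation mentioned in the other answer), here is a small hypothetical example, unrelated to the OP's model: a Binomial likelihood with a flat prior restricted to $0 \le p \le 0.5$ while the data favour $p \approx 0.7$. The MAP (= MLE) sits exactly on the boundary, and the posterior is strongly asymmetric, so its mean and standard deviation are poor summaries of it.

    # Hypothetical boundary example: y ~ Binomial(n, p), flat prior on p in [0, 0.5]
    n <- 20; y <- 14
    p <- seq(0.001, 0.5, by = 0.001)       # grid over the prior support
    post <- dbinom(y, n, p)                # unnormalised posterior under the flat prior
    post <- post / sum(post * 0.001)       # normalise on the grid
    p[which.max(post)]                     # MAP = 0.5, i.e. on the prior boundary
    sum(p * post * 0.001)                  # posterior mean, noticeably below the MAP of 0.5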





answered 14 hours ago by Xi'an (edited 7 hours ago)
• I'd just add that in addition to worrying about the code or likelihood definition or MCMC implementation, the OP might also worry about whether the software used to obtain the ML estimate got trapped in a local optimum. stats.stackexchange.com/questions/384528/… – Jacob Socolar, 10 hours ago
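A minimal sketch of that check in R (assuming a user-supplied negative log-likelihood nll() and an MCMC sample matrix chain with one row per draw, neither of which appears in this thread): restart the optimiser from the best draw visited by the chain and from a few random draws, then keep the best of all runs.

    # Hypothetical local-optimum check; nll() and chain are placeholders.
    best_start <- chain[which.min(apply(chain, 1, nll)), ]          # best MCMC draw
    starts <- rbind(best_start, chain[sample(nrow(chain), 5), ])    # plus 5 random draws
    fits <- lapply(seq_len(nrow(starts)),
                   function(i) optim(starts[i, ], nll, method = "BFGS"))
    fits[[which.min(sapply(fits, `[[`, "value"))]]$par              # best optimum found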










