THIS.
But, just to add to it, a big part of it is whether you are sampling over a fully identified space. For example, with a finite mixture you can post-process the draws, or you can build identification into the model, e.g., an ordering constraint on the size of the components. Imposing that constraint used to destroy conjugacy, but with HMC that's no longer an issue. My own experience with Stan and with writing my own code is that mixture models are EXTREMELY hit-and-miss. Widely spaced peaks? Yeah, fine. But for most real problems label switching is nontrivial.
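To make the post-processing option concrete, here's a minimal Python sketch (NumPy only; the "draws" are synthetic stand-ins for real MCMC output, and all names are mine): relabel each draw by sorting on the component means so the identification constraint holds after the fact, carrying the weights along.

```python
import numpy as np

# Synthetic stand-in for MCMC output from a K-component mixture:
# row s holds draw s of the component means mu and weights w.
# Label switching means "component k" in one draw need not be
# "component k" in another, so naive column-wise summaries are junk.
rng = np.random.default_rng(0)
S, K = 4000, 3
perms = rng.random((S, K)).argsort(axis=1)  # random relabeling per draw
mu = np.array([-5.0, 0.0, 5.0])[perms] + 0.1 * rng.standard_normal((S, K))
w = np.take_along_axis(rng.dirichlet([5.0, 10.0, 20.0], size=S), perms, axis=1)

# Post-process: impose the identifying constraint mu_1 < ... < mu_K
# within each draw by sorting, permuting the weights the same way.
order = np.argsort(mu, axis=1)
mu_id = np.take_along_axis(mu, order, axis=1)
w_id = np.take_along_axis(w, order, axis=1)

print(mu.mean(axis=0))     # ~[0, 0, 0]: label switching washes everything out
print(mu_id.mean(axis=0))  # ~[-5, 0, 5]: sensible component-wise summaries
```

Note this works precisely in the "widely spaced peaks" regime; when components overlap, sorting on one parameter can slice through genuine posterior modes, which is part of why it's hit-and-miss.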
There's some work from 20+ years ago by Ansari and by Kim and others using DPs for this, but that approach never caught on. Maybe it would if someone implemented it in Stan. Not sure I believe it for multimodal posteriors anyway.
You can easily get combinatorial multimodality in the posterior: a K-component mixture has K! symmetric modes from label permutations alone, before any genuine multimodality. Betancourt has a case study on mixture models, and Bob Carpenter and others on the Stan team have discussed it, saying they don't focus on nonparametric Bayes because of multimodality and tractability concerns. You can also show that VB approximations to multimodal posteriors get locked into local optima. IMO a lot of Bayesian models and the claims made around them outstrip the samplers and fitting algorithms. The problem is not the model or the theory but rather that the samplers and solvers are orders of magnitude underpowered. Running vanilla MCMC on a multimodal posterior is fine in theory, given infinite time and infinite starts; it's only problematic when you have to pin down how much time and how many starts. VB with state-of-the-art trust-region solvers is different from VB with line-search solvers.
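To see the "locked into local optima" point without any VB machinery, here's a small Python sketch (the bimodal target and all names are illustrative, not any particular VB implementation): a gradient-based optimizer, which is what sits underneath ELBO maximization, converges to whichever basin its start lands in and never reports that the other mode exists.

```python
import numpy as np
from scipy.optimize import minimize

# A deliberately bimodal negative log-density, standing in for the
# objective surface of a mixture fit: two well-separated Gaussian modes.
def neg_log_p(x):
    x = np.asarray(x)
    return -np.logaddexp(-0.5 * np.sum((x - 3.0) ** 2),
                         -0.5 * np.sum((x + 3.0) ** 2))

# Each run converges to the basin its start falls in and stops there,
# with no signal that the other mode was missed.
rng = np.random.default_rng(1)
starts = rng.uniform(-6, 6, size=(20, 1))
modes = [minimize(neg_log_p, x0).x[0] for x0 in starts]
print(sorted(round(m, 1) for m in modes))
# roughly half the runs report -3.0 and half +3.0: each start is
# "locked in" to one local optimum.
```

With infinite starts you would of course find both modes; the practical question raised above is exactly how many starts and how much time that takes.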
Give it 20 years and return to the model. If it still doesn't work, then it's the model; else it's the sampler.