Drawing from a PDF in python

  • #1
ergospherical
943
1,261
Some python function f(x) defines an (unnormalised) pdf between x_min and x_max
and say we want to draw x randomly from this distribution

if we had the CDF F(x) and its inverse F^{-1}(x), we could take values y uniformly in [0,1], and then our random values of x would be x = F^{-1}(y).

but without the CDF is there a direct way, maybe even built into some python library?
 
Computer science news on Phys.org
  • #3
How is this related to PDFs?
 
  • #4
Greg Bernhardt said:
How is this related to PDFs?
Probability density function
 
  • Haha
Likes Greg Bernhardt
  • #5
Frabjous said:
Probability density function
oh boy, I had the wrong acronym :biggrin:
 
  • Like
Likes Frabjous
  • #6
ergospherical said:
but without the CDF is there a direct way, maybe even built into some python library?
Yes, a generalised version of rejection sampling (which allows for support over ## [-\infty, +\infty] ##) as pointed to by @f95toli is available as scipy.stats.sampling.RatioUniforms as well as other methods including some based on numerical integration and inversion of the PDF to get the CDF which is then interpolated.
 
  • Like
Likes ergospherical
  • #7
That is an interesting idea. I'd gone ahead and implemented a method based on the inverse cumulative density function, which is a bit cumbersome, namely

- define the pdf, i.e. pdf = lambda x: __some function of x__
- calculate the cdf, i.e. cdf = lambda x: quad(pdf, 0, x)[0]
- invert the cdf, i.e. return fsolve(lambda x: cdf(x) - random.random(), init_guess)[0]

where for init_guess I can simply feed in what is roughly the expected value.

I think it works, but I'm curious if rejection sampling would be faster (?). In particular, I'm now sampling from ##[0, \infty]## so I would need a generalized implementation.
 
  • #8
Also, if the pdf is nasty then the numerical integration can go wrong...
 
  • #9
Rejection sampling is usually quite fast. I would start there and only move to other methods if rejection sampling doesn't work.
 
  • #10
ergospherical said:
Also, if the pdf is nasty then the numerical integration can go wrong...
Yes, but a difficult integration often goes with a very small acceptance area so naive rejection sampling may be impossibly slow.

There is of course a trade-off depending on how many samples you want: if it is only a handful there is less to be gained by spending time on the integration, but if you want a few million and the acceptance is one in ## 10^9 ##...

The prebuilt SciPy routines are pretty good and easy to use, I'd experiment to see what works best.
 

Similar threads

  • Programming and Computer Science
Replies
15
Views
1K
  • Set Theory, Logic, Probability, Statistics
Replies
8
Views
869
  • Programming and Computer Science
Replies
5
Views
2K
  • Set Theory, Logic, Probability, Statistics
Replies
5
Views
854
  • Set Theory, Logic, Probability, Statistics
Replies
4
Views
990
  • Set Theory, Logic, Probability, Statistics
Replies
1
Views
795
  • Programming and Computer Science
Replies
1
Views
1K
  • Programming and Computer Science
2
Replies
50
Views
4K
  • Set Theory, Logic, Probability, Statistics
Replies
6
Views
3K
Back
Top