Skip to main content


Some basic stat computations with Perl , Python and RLessons learned:
A) Performance freaks to stop using #rstat 's runif for random generation. The Hoshiro random number generator arxiv.org/abs/1805.01407 is 10x faster.
Implementations in #perl 's #PDL, #rstats (dqrng) and #python #numpy are within 20% of each other

B) But does it make a difference in applications? To get to the bottom of this, I coded a truncated random variate generator in #rstats and #perl using #pdl (as well as standard u/perl) using the #GSL packages metacpan.org/pod/PDL::GSL::CDF & metacpan.org/pod/Math::GSL for accessing the CDF & quantile functions. In this context, it's the calculation of the #CDF that is the computationally intensive part, not the drawing of the random number itself.
Well even in these case, the choice of the generator did matter. Note that the fully vectorized #PDL #perl versions were faster than #rstats

C) I should probably blog about these experiments at some point. Note that #pdl (but not base #perl) are rather competitive choices for large array processing with numerical operations. I mostly stay away of #python , but would not surprise me that for compute intensive stuff (where the heavy duty work is done in C/C++), it does not matter (much) which high level language one uses to build data applications

preview.redd.it/qn00sx78gbuc1.…

preview.redd.it/4by4jbh9gbuc1.…

submitted by /u/ReplacementSlight413
[link] [comments]