Skip to main content


Some basic stat computations with Perl , Python and RLessons learned:
A) Performance freaks to stop using #rstat 's runif for random generation. The Hoshiro random number generator https://arxiv.org/abs/1805.01407 is 10x faster.
Implementations in #perl 's #PDL, #rstats (dqrng) and #python #numpy are within 20% of each other

B) But does it make a difference in applications? To get to the bottom of this, I coded a truncated random variate generator in #rstats and #perl using #pdl (as well as standard u/perl) using the #GSL packages https://metacpan.org/pod/PDL::GSL::CDF & https://metacpan.org/pod/Math::GSL for accessing the CDF & quantile functions. In this context, it's the calculation of the #CDF that is the computationally intensive part, not the drawing of the random number itself.
Well even in these case, the choice of the generator did matter. Note that the fully vectorized #PDL #perl versions were faster than #rstats

C) I should probably blog about these experiments at some point. Note that #pdl (but not base #perl) are rather competitive choices for large array processing with numerical operations. I mostly stay away of #python , but would not surprise me that for compute intensive stuff (where the heavy duty work is done in C/C++), it does not matter (much) which high level language one uses to build data applications

https://preview.redd.it/qn00sx78gbuc1.png?width=1538&format=png&auto=webp&s=1874b9e710c239e9acea36fb54d957167a69b270

https://preview.redd.it/4by4jbh9gbuc1.png?width=1538&format=png&auto=webp&s=dc9944347983445126e4ab57b43c76202ca719d6

submitted by /u/ReplacementSlight413
[link] [comments]

Programming Feed reshared this.