
https://velociperl.com/
This is an interesting fork of #perl from a data science consultancy.
My 2c is that there is plenty of low-hanging fruit in #Perl's underlying #clang code base that can boost performance even further.
in reply to Christos Argyropoulos MD, PhD

“[#vPerl] is 100% compatible with all existing #Perl and XS code,” but since we opted for the “rename things” clause of Perl’s Artistic License, you have to exchange your email address for closed-source, OS-specific binaries and will just have to trust us!
in reply to Mark Gardner

I only use the email accounts that Gmail has leaked multiple times on my behalf.
There is a real need for high-performance Perl.
in reply to Christos Argyropoulos MD, PhD

I agree, but “100% compatibility” with normal #Perl is a strong claim that not even #RPerl tried to make, which is why the latter is unacceptable. So if this https://VelociPerl.com thing scratches your itch, go sign up and tell us what you find.

A couple of red flags:
* different speed claims for “Personal” and “Enterprise” editions
* "benchmarks" links to a GitHub repo and user established just this month
* posted on r/perl from a similarly young account

in reply to Mark Gardner

@mjgardner Here is the problem: I need to deduplicate a massive dataset that cannot fit in memory. A #Perl script (on an AMD Ryzen server) takes about 3.7 sec for 2M rows (1/1000th of the dataset), #Python takes 3.4 sec, and #clang (using the glib hash) about 2.8 sec. Perl is actually faster than Python at 1M rows, but the larger the chunk, the faster some downstream tasks will run (and 2M rows is about the optimal chunk size for this project). The claimed 40% improvement (if verified) kills both C and Python.
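
For concreteness, here is a minimal sketch of the kind of hash-based deduplication being compared; the file names and the use of the whole line as the dedup key are illustrative assumptions, not the actual benchmark script:

    #!/usr/bin/env perl
    # Hash-based deduplication of one in-memory-sized chunk:
    # keep the first occurrence of each line, drop repeats.
    use strict;
    use warnings;

    my %seen;
    open my $in,  '<', 'chunk.tsv'       or die "open input: $!";
    open my $out, '>', 'chunk.dedup.tsv' or die "open output: $!";
    while ( my $line = <$in> ) {
        print {$out} $line unless $seen{$line}++;
    }
    close $in;
    close $out;

The %seen hash is the hot spot here, which is why hash performance dominates the Perl/Python/C comparison.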
in reply to Christos Argyropoulos MD, PhD

I understand your problem.

I hope the sketchy closed-source solution doesn’t come with malware on the side.

in reply to Mark Gardner

@mjgardner It is not really a problem for this project if one is willing to use coarse-grained parallelism: we estimated that forking 40 subprocesses, with all I/O done on an NVMe drive, will take less than 10 min, with about 30%-40% of the time spent on I/O. It would be nice to illustrate that one can achieve near-compiled performance, though, for some rather obvious reasons!
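
A rough sketch of that coarse-grained approach, assuming the dataset has already been split into per-worker chunk files (the chunk naming and the dedup-per-chunk body are my assumptions, not the actual pipeline):

    #!/usr/bin/env perl
    # Fork one worker per pre-split chunk; each child deduplicates
    # its own file independently, then the parent waits for all of them.
    use strict;
    use warnings;

    my @chunks = glob 'chunks/part-*.tsv';    # e.g. ~40 pre-split chunk files
    my @pids;

    for my $chunk (@chunks) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ( $pid == 0 ) {                    # child: dedup one chunk, then exit
            my %seen;
            open my $in,  '<', $chunk         or die "open $chunk: $!";
            open my $out, '>', "$chunk.dedup" or die "open output: $!";
            while ( my $line = <$in> ) {
                print {$out} $line unless $seen{$line}++;
            }
            exit 0;
        }
        push @pids, $pid;                     # parent: remember the child PID
    }
    waitpid $_, 0 for @pids;                  # wait for all workers

With the rows pre-partitioned, the workers need no coordination, which is what makes the coarse-grained approach attractive here.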
in reply to Mark Gardner

@mjgardner
I actually met the author at TPRC in Toronto. They mentioned doing this back then.