My job has led me down the rabbit hole of doing some scripting work in Perl, mainly utility tools. The challenge is that these tools need to parse several thousand source files, which takes quite some time.
I initially dabbled in doing very light stuff with a perl -e one-liner from within a shell script, which meant I could use xargs. However, as my parsing needs evolved on the Perl side of things, I ended up switching to an actual Perl file, which hindered my ability to do parallel processing, as the Perl interpreter on our VMs was not built with threads support. In addition, installing non-core modules from CPAN was not possible on my target system, so I had limited options, some of which I would assume to be safer and/or less quirky than this one.
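For anyone wanting to check the same constraint on their own machine, here is a minimal sketch using only the core Config module. The variable name $has_ithreads is just illustrative; $Config{useithreads} is set when the interpreter was built with ithreads and undefined otherwise.

```perl
use strict;
use warnings;
use Config;   # core module describing how this perl binary was built

# $Config{useithreads} is 'define' on a threaded build, undef otherwise.
my $has_ithreads = $Config{useithreads} ? 1 : 0;

print "ithreads support: ", ($has_ithreads ? "yes" : "no"), "\n";
```

Running `perl -V:useithreads` from the shell reports the same setting without a script.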
So then I came up with a rather ugly solution: invoking xargs via backticks, which in turn called a perl one-liner (again) for the more computation-heavy parts, with xargs splitting the input array into argument batches for each mini-program to process. It looked like this:
    my $out = `echo "$str_in" | xargs -P $num_threads -n $chunk_size perl -e '
        my \@args = \@ARGV;
        foreach my \$arg (\@args) {
            for my \$idx (1 .. 100000) {
                my \$var = \$idx;
            }
            print "\$arg\n";
        }
    '`;
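To make the batching concrete, here is a small pure-Perl sketch of how a list gets cut into groups of at most $chunk_size arguments, which is what xargs -n does before handing each group to one process. It does not call xargs; the file names and worker count are made up for illustration, and the ceil-based chunk size is one reasonable choice rather than a requirement.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

my @items       = map { "file$_.c" } 1 .. 10;   # hypothetical inputs
my $num_threads = 4;                            # hypothetical worker count

# One batch per worker, rounded up so nothing is left over.
my $chunk_size = ceil(@items / $num_threads);

# Cut the list into batches of at most $chunk_size items each.
my @batches;
my @queue = @items;
while (@queue) {
    push @batches, [ splice(@queue, 0, $chunk_size) ];
}

printf "%d batches: %s\n", scalar(@batches), join(' | ', map { "@$_" } @batches);
```

With 10 items and 4 workers the chunk size is 3, so the last batch holds a single item.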
However, this had some drawbacks:
- No editor syntax highlighting (in my case, VSCode), since the inline program is a string.
- All variables within the inline program had to be escaped so as not to be interpolated themselves, which hindered readability quite a bit.
- Every time you would want to use this technique in different parts of the code, you'd have to copy-paste the entire shell command together with the mini-program, even if that very logic was somewhere else in your code.
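On the escaping drawback specifically, one possible workaround is sketched below (it is not the approach this post ends up with): keep the inline program in a non-interpolating heredoc and start the child perl with the list form of a pipe open, so neither Perl interpolation nor shell quoting touches it. The input names are hypothetical, and the sketch runs a single child rather than a parallel pool.

```perl
use strict;
use warnings;

# Single-quoted heredoc: nothing inside is interpolated, so no backslashes.
my $program = <<'END_PROGRAM';
foreach my $arg (@ARGV) {
    print "$arg\n";
}
END_PROGRAM

my @items = ('a.c', 'b.c', 'c.c');   # hypothetical inputs

# List-form pipe open bypasses the shell entirely.
# $^X is the path of the perl binary currently running.
open(my $fh, '-|', $^X, '-e', $program, @items)
    or die "Cannot start child perl: $!";
my @out = <$fh>;
close($fh) or die "Child perl failed with status $?";
chomp @out;

print "$_\n" for @out;
```

This still leaves the highlighting and copy-paste drawbacks, since the program remains a string.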
After some playing around, I've come to a nifty almost-metaprogramming solution, which still isn't perfect, but fits my needs decently well:
    sub processing_fct {
        my @args = @ARGV;
        foreach my $arg (@args) {
            for my $idx (1 .. 100000) {
                my $var = $idx;
            }
            print "A very extraordinarily long string that contains $arg words and beyond\n";
        }
    }

    sub parallel_invoke {
        use POSIX qw{ceil};
        my $src_file = $0;
        my $fct_name = shift;
        my $input_arg_array = shift;
        my $n_threads = shift;

        my $str_in = join("\n", @{$input_arg_array});
        my $chunk_size = ceil(@{$input_arg_array} / $n_threads);

        # Slurp this script's own source.
        open(my $src_fh, "<", $src_file) or die("parallel_invoke(): Unable to open source file");
        my $src_content = do { local $/; <$src_fh> };

        # Second capture group: the text between the sub's outermost braces.
        my $fct_body = ($src_content =~ /sub\s+$fct_name\s*({((?:[^}{]*(?1)?)*+)})/m)[1]
            or die("Unable to find function $fct_name in source file");

        # Run the body as a one-liner, one batch of arguments per process.
        return `echo '$str_in' | xargs -P $n_threads -n $chunk_size perl -e '$fct_body'`;
    }

    my $out = parallel_invoke("processing_fct", \@array, $num_threads);
All parallel_invoke() does is open its own source file, find the subroutine declaration, and pass the function body captured by the regex to the xargs perl call. The regex isn't too pretty, but it was necessary to reliably match a balanced construct of nested brackets.
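Here is the balanced-brace match on its own, as a minimal sketch. The source text in $src is a made-up stand-in for the script's file contents, and the braces are backslash-escaped and the name wrapped in \Q...\E as a matter of caution, which differs slightly from the pattern used above.

```perl
use strict;
use warnings;

# Hypothetical source text standing in for the script's own file.
my $src = <<'END_SRC';
sub helper { return 1; }
sub processing_fct {
    foreach my $arg (@ARGV) {
        if ($arg) { print "$arg\n"; }
    }
}
END_SRC

my $name = 'processing_fct';

# Group 1 matches one balanced {...} block; (?1) recurses into group 1
# each time a nested block opens. Group 2 captures everything between
# the outermost braces.
my ($block, $body) =
    $src =~ /sub\s+\Q$name\E\s*(\{((?:[^{}]*(?1)?)*+)\})/;

die "Unable to find $name" unless defined $body;
print $body;
```

A pattern like this only counts brace characters, so a lone brace inside a string literal, regex, or comment in the function body would throw the match off.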
My limited benchmarking has found this to be as fast as, if not faster than, the perl-with-threads equivalent, and it also avoids the performance penalty of thread safety.
I'd be curious to hear your opinion of this method, or whether you've solved a similar issue differently.
submitted by /u/Wynaan
submitted by /u/perlancar
List of new CPAN distributions – Jun 2024
dist | author | abstract | date
Alien-RtAudio | JBARRETT | Install RtAudio | 2024-06-23T15:44:22
Alien-SunVox | JBARRETT | Install The SunVox Library – Alexander Zolotov's SunVox modular synthesizer and…
perlancar's blog
I need some help with Gunnar Hjalmarsson's old Perl program Ringlink on my site. The forms work, the database gets added to and everything seems ready to go except for the email functions that depend on sendmail.
I have tried several things, installed the CPAN dependencies the program needs, tried Auron SendEmail and other programs and have thoroughly confused myself.
There's a test installation on my site, where the admin username and password are both 'test'. There are copies of the CGI files, and what probably needs looking at are rlmain.pm, rlconfig.pm and sender.pm.
I am running Apache 2.4.54 on Windows 10 with Strawberry Perl installed. I am using the last published version of Ringlink (v3.4).
I know this is an old program and the project is probably not worth pursuing, but I really would like to give this a go to get it working and would be grateful for any suggestions.
submitted by /u/brisray
Auron SendEmail - Auron Software Portable E-mail Freeware
The Auron SendEmail set of portable freeware tools for Windows enables you to send e-mails from either the GUI or the command line.
Auron Software
(dii) 7 great CPAN modules released last week
Updates for great CPAN modules released last week. A module is considered great if its favorites count is greater than or equal to 12. App...
niceperl.blogspot.com
Damian on top form as always. The modules this talk is based on are, of course, all both brilliant and incredibly useful. But the thing that's really impressed me here is the way he has taken some of his modules from a couple of decades ago and replaced them with calls to LLMs. That's food for thought.
submitted by /u/davorg
The Once and Future Perl - Damian Conway - TPRC 2024
#tprc24 #perl #raku
Retroemotions, statistical outliers, lunar excursions, Lorentz contraction, extrasolar planets, atomic clocks, lucky bullets, Greek mythol...
YouTube