This Week on perl5-porters - 10-16 July 2006

"Kindly avoid inflicting upon this one the same fate shared by so much other documentation produced by a patchwork of well-meaning but disparate contributors seeing more the trees than the forest as they continually apply peephole optimizations that inadvertently compromise the overall integrity of the work, where the presentation eventually more resembles an illegible patchwork quilt of no especial application or intended purpose than it does a dedicated garment of one cloth woven." -- Tom Christiansen, at the height of his form.

Topics of Interest

Adding examples to the core documentation

Gabor Szabo continued his quest to add examples to the core documentation, based on feedback he had received giving training lessons. First up, he wanted to rewrite perlfunc to use the modern approach for opening files (using lexical file handles and the three argument form).
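For readers who have not met the idiom, here is a minimal sketch of the difference. It is not Gabor's patch: the temporary file and the variable names are assumptions made only so the example is self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file so the example does not touch anything real.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# Old style: a bareword (global) handle and two-argument open, where the
# mode is parsed out of the same string as the filename:
#     open(OUT, ">$path") or die "Cannot write $path: $!";

# Modern style: a lexical handle, with the mode as a separate argument.
open(my $out, '>', $path) or die "Cannot write $path: $!";
print {$out} "hello\n";
close($out) or die "Cannot close $path: $!";

open(my $in, '<', $path) or die "Cannot read $path: $!";
my $line = <$in>;
close($in);

print $line;
```

The lexical handle is closed automatically when it goes out of scope, and a filename that happens to begin with a mode character can no longer change how the file is opened.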

While no-one could dispute that this was a good idea, Nicholas objected to the fact that it meant that new stuff was being added but no old stuff was being pruned away.

Other changes that Gabor wished to make involved adding more examples to the documentation. Again, Nicholas wondered why people wouldn't have the reflex to toss off a few one-liners to see how things worked.

H.Merijn Brand suggested creating an examples directory. Gabor liked the idea, but regretted that the usual installer tools (CPAN and CPANPLUS) do not install the eg/ directories, making them all but invisible to mere mortals.

  Opening gambit

Gabor then continued with examples for index/rindex. Given that the debate about open was not resolved, this was not applied.

He then added a brief discussion of use strict in perlintro and replaced the use of $a and $b by some less-special variables. H.Merijn liked it enough to apply it.

perlintro also received another two batches of changes, here:

and here:

Gabor was briefly confused as to whether he should be patching blead or maint. H.Merijn Brand and Yves Orton explained that he should patch blead, and that the maint pumpking (currently Nicholas Clark, but will be Dave Mitchell in the future) will backport patches to maint as appropriate.

So Gabor patched perlhack in order to point future porters to the right thing to hack.

  X marks the spot

Gabor then gave perlopentut.pod a work-over to begin recommending the use of lexical file handles rather than typeglobs. H.Merijn Brand had a number of issues with the proposed changes, and the thread continued on for quite some time. People seemed to be in agreement that some sort of improvement was needed, but the hard question was what was truly essential for a tutorial, and in what order to present it.

  Opening night

After digesting the suggestions that everyone put forward, Gabor produced a second cut. After an initial false start, Tom Christiansen explained that from a tutorial perspective, omitting parentheses and preferring or over || is incorrect. Having taught thousands of people Perl, Tom has come to the conclusion that parens and the higher-precedence || help people come to grips with the language. Omitting needless parentheses comes further up the curve of Perl mastery.
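A small sketch of the precedence trap behind this advice may help. The lookup sub is a hypothetical helper invented for the illustration; it is not taken from the thread or from perlopentut.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a value for one known key, '' otherwise.
sub lookup {
    my ($key) = @_;
    return $key eq 'known' ? 'value' : '';
}

# Parentheses plus ||: the default applies to the result of the call.
my $with_parens = lookup('missing') || 'default';

# No parentheses: || binds tighter than the call's argument list, so this
# parses as lookup('missing' || 'default') and the default is never used.
my $without_parens = lookup 'missing' || 'default';

# Low-precedence "or" binds more loosely than the call and the assignment,
# so it needs a complete statement on its right-hand side.
my $with_or;
$with_or = lookup 'missing' or $with_or = 'default';

print "with parens:    '$with_parens'\n";
print "without parens: '$without_parens'\n";
print "with or:        '$with_or'\n";
```

With explicit parentheses there is only one possible reading, which is presumably why the form is easier on beginners.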

On the correct use of my_strlcpy

Steve Peters had tried to replace some uses of strcpy (which is deemed "unsafe" in that it can copy without limit) with the safer my_strlcpy. Except that they don't return the same thing (strcpy returns a pointer to the destination, whereas strlcpy-style functions return the length of the source string), which caused Steve Hay's smoke machine to have a fit. The first Steve was rather embarrassed and apologised.

On the plus side, this revealed a place where more tests are needed.

  Copy right

Hash::Util::FieldHash versus global destruction

Joshua ben Jore wanted to know whether Hash::Util::FieldHash makes inside-out objects safe in the face of an object that defines a DESTROY method. Anno Siegel explained that everything should be fine during the normal course of events, but wasn't too sure whether global destruction (what happens when the interpreter is being taken down just before the perl process expires) changes the rules so much that this no longer holds true.

Joshua spelt out some of the potential problem scenarios he envisaged, and how best to deal with them. Anno thought that the proposed work-arounds were all rather bletcherous, and promised to think about it to see whether he could come up with a good solution.

The underlying problem is that inside-out objects reveal an aspect of global destruction that was never apparent before, and if you're really unlucky, it could bite your code.

  Unintended consequences
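As background, here is a minimal sketch of an inside-out object built on a field hash, assuming a perl that ships Hash::Util::FieldHash. The Person class and its name field are hypothetical. The sketch only shows the normal, well-behaved case; it does not exercise DESTROY or global destruction, which is where the open question lies.

```perl
use strict;
use warnings;
use Hash::Util::FieldHash qw(fieldhash);

# One field hash per attribute; the objects themselves serve as keys.
fieldhash my %name_of;

{
    package Person;   # hypothetical class, for illustration only

    sub new {
        my ($class, $name) = @_;
        # The object is a blessed reference to an empty anonymous scalar;
        # all of its data lives outside it, in the field hash.
        my $self = bless \do { my $anon }, $class;
        $name_of{$self} = $name;
        return $self;
    }

    sub name { return $name_of{ $_[0] } }
}

my $person       = Person->new('Anno');
my $name         = $person->name;
my $count_before = keys %name_of;   # one entry while the object is alive

undef $person;                      # last reference goes away

my $count_after = keys %name_of;    # the entry was garbage-collected

print "$name: $count_before entry before, $count_after after\n";
```

A plain hash keyed on the object's address would need a DESTROY method to clean up after itself, which is exactly the bookkeeping a field hash is meant to take over.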

Confusing SvCUR and SvLEN documentation

Yves Orton read the documentation concerning these two functions and had difficulty in reconciling it with what Devel::Peek was telling him. Jan thought it made sense to him, and so he wondered what it was that Yves didn't understand. There ensued a long conversation between Jan and Yves that discussed the matter in more detail.

Yves finally understood the issue, and suggested that Jan write up some documentation to clarify the matter. Jan hoped that someone else could summarise the issue instead (who, me?) and Yves said that he would take a stab at it, and allow Jan to make fun of it until it all made sense.

  String theory
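For anyone wanting to poke at the two values, here is a rough sketch using the core B module rather than Devel::Peek, so that the numbers can be captured in variables. It assumes only the usual reading of the two macros (SvCUR is the length of the string currently stored, SvLEN the size of the buffer allocated for it). It makes no claim about which point of the documentation confused Yves.

```perl
use strict;
use warnings;
use B qw(svref_2object);

my $string = 'x' x 100;                            # a 100-character string
substr($string, 3, length($string) - 3, '');       # shrink it in place to 'xxx'

my $sv  = svref_2object(\$string);
my $cur = $sv->CUR;   # SvCUR: bytes currently in use by the string
my $len = $sv->LEN;   # SvLEN: bytes allocated for the string buffer

# The exact LEN depends on the perl build, but the buffer is not shrunk
# when the string gets shorter, so LEN stays larger than CUR.
print "string='$string' CUR=$cur LEN=$len\n";
```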

CPAN contributor feedback

Responding to the itchy thread from last week, David Muir Sharnoff explained that his number one gripe with the core was its less than complete support for tied arrays, magic on tied variables, and the lack of a GETREF method for ties. All of which would no doubt make his life easier.

  Not to mention closing a bug or two

Code that is only executed once

David Nicol proposed adding a 'once' statement modifier, that, after having been executed, would then delve into the optree and excise the statement out of the optree by rethreading the prior opcode to the next opcode.

Rafael agreed that Perl 6 has a similar construct, but said that there were no plans to add it to 5.10. He was happy that state variables resolved the ugly my $x if 0 hack, and thought that UNITCHECK, from the perltodo list, was more worthy of burning up some tuits.

  COBOL has an ALTER verb, after all
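For reference, a minimal sketch of the state variable that replaces the hack Rafael mentions, assuming a perl with the state feature (5.10 or later). The next_id sub is a made-up example.

```perl
use strict;
use warnings;
use feature 'state';

# The old hack relied on "my $counter if 0;" leaving the variable
# allocated but never reinitialised. A state variable says so directly.
sub next_id {
    state $counter = 0;   # initialised once, keeps its value between calls
    return ++$counter;
}

my @ids = map { next_id() } 1 .. 3;
print "@ids\n";   # prints "1 2 3"
```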

Removing the succeed path from the super-linear cache code

This is an optimisation in the regular expression engine that, as far as Dave Mitchell can tell, is never exercised by the test suite, and, try as he might, was unable to concoct any example that could. So after talking things over with Hugo van der Sanden, he decided to fire up the chain-saw.

So if your fancy patterns now start taking longer than the scheduled moment of the heat death of the universe, you know who to blame. Hugo had hoped that some sort of instrumentation under a debugging build might be added to the engine to allow the measurement of different optimisation strategies. Dave wasn't planning to, although he figured it would probably be doable, even if he didn't know right now what shape or form it would take.

  Look for a patch tomorrow morning

Comments on recent API changes to threads

Artur Bergman, one of the original authors of the threads module, discovered all the work that had been done recently, and objected to the change that allowed exit to end a thread silently (and not the entire process), saying that it blurred the distinction between threads and processes.

He also spotted a section in the code that dealt with passing signals to threads, noting that it was not thread-safe. Jerry wasn't sure whether this was true, and asked for a patch to help him understand the situation.

Jerry D. Hedden regretted that some of the design decisions that Artur mentioned, as a way of explaining why things were the way they were, were not mentioned in the documentation.

Jan defended the design decision to allow a nested function to terminate the thread, and not have to percolate the desire back up the call stack to the main thread function.

Liz Mattijsen voiced the opinion that it was important for a thread encountering some naughty library code, containing an evil exit somewhere, not to be able to take out the whole application.

Artur explained that a technique already existed for dealing with this situation.

  Picking up the thread

Threads signalling issue

Jerry D. Hedden was sufficiently worried by Artur's claim that the signal dispatching code was not safe, that he broke the discussion out into a separate thread. He stated that as far as he could see, the code was thread-safe. Jan Dubois confirmed that he was certain that it was fine, as well.

Artur described the problem he envisaged in more detail, which allowed Jan to point out where the argument was flawed, since the code already took the situation Artur outlined into account.

And so now Artur owes Jan a beer.

Nicholas Clark pointed out that even though the above code is race-safe on a single-processor box, on an SMP machine it might be possible to have a race, and he showed where it could happen. The jury is still out on this matter.

Storable has poor version compatibility

Kari Pahula was shuffling binary images between two computers using RPC::PlServer and RPC::PlClient, with Storable acting as a serialising mechanism, and shipping the result over the wire.

Things started to fall apart when the two computers in question were running different versions of Storable (the 2.6 receiver versus the 2.7 sender), which rendered the received data essentially worthless. Kari wished that Storable had a mechanism to specify that version N should produce a data stream compatible with version N-x, for some small value of x. That is, Kari would be quite happy to go back to his 2.7 server and explicitly add a "make this 2.6 compatible" flag, except that this is currently not possible.

Patches of Interest

Test scripts for I18N::Langinfo and POSIX

Last week, Sébastien Aperghis-Tramoni wrote some tests to improve the coverage of these two modules. In doing so, he discovered a couple of bugs: fields documented as existing, in fact, don't. Steve Peters had problems with some of the test results, and applied a partial patch. Craig Berry looked at some of the skipped tests this week, and made a few more changes.

Here are the changes, concerning missing POSIX functionality, that Craig realised was being dealt with appropriately.

Sébastien came back later on with a way to improve POSIX::localeconv()'s usefulness, by defining all keys in the underlying hash, and making the undefined ones point to an empty string. This would make the test suite easier to manage, and would lead to less make-work code being required in client code.
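To make the "make-work code" concrete, here is a sketch of the kind of guard that client code needs while keys can be missing from the hash. The choice of thousands_sep and mon_thousands_sep is only illustrative, and which keys are actually present depends on the platform and the current locale.

```perl
use strict;
use warnings;
use POSIX qw(localeconv);

my $conv = localeconv();   # hash reference describing the current locale

# A key may simply be absent, so each lookup has to be guarded
# before the value can be used without triggering a warning.
my $thousands_sep = exists $conv->{thousands_sep}
                  ? $conv->{thousands_sep}
                  : '';
my $mon_thousands_sep = exists $conv->{mon_thousands_sep}
                      ? $conv->{mon_thousands_sep}
                      : '';

print "thousands separator: '$thousands_sep'\n";
print "monetary thousands separator: '$mon_thousands_sep'\n";
```

Under Sébastien's proposal the guards would become unnecessary, since every key would exist and the unset ones would already hold an empty string.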

perlhack: reflections on portability

Jarkko Hietaniemi is undergoing therapy to recover from doing combat for far too long with C compilers that conspire to make the perl codebase such a fun place to hang out.

  The tao of portability

And another batch of things to watch out for.

Making sense of the z/OS patches

Jarkko then started picking away at the z/OS patches that had recently been sent to the list from IBM, to see what he could salvage. First up was some low-hanging fruit in the shape of hints/ and Makefile.SH.

And then a number of EBCDIC fixes in the test suite.

He carried on with some more library files and tests. Stephen McCamant wondered whether there wasn't an elegant way to factorise the EBCDIC/ASCII differences so that less code wound up being duplicated.

And something to do with main that H.Merijn was unable to apply, despite his best intentions.

Taking that, and Stephen's remarks, into account, Jarkko brewed a fresh patch.

At the end of the week, Jarkko was able to tease out a fix for MIME::Base64's quoted-print.t test. He wasn't precisely sure how EBCDIC and quoted-printable are supposed to work together at the best of times, but sent the patch on its way.


Various additions to cflags.SH and sundry files

Jarkko discovered with delight that recent versions of gcc have a -std option to specify which C standard the source code should be held to. So he taught cflags.SH to specify that C89 is the only game in town.

  // is not a comment

And in the interest of perfection, Jarkko also ensured that a number of recent additions to the code base were cited in cflags.SH.

In a grab-bag of assorted fixes, Jarkko patched dump.c, pp_sys.c, sv.c and util.c and snuck in a few more additions to perlhack.

In a last glorious attack on perlhack, Jarkko wrote about the Configure -Dgccansipedantic switch, that forces the gcc compiler to become extremely fussy about what it considers acceptable practices in code.

  Squeaky clean

Updated escapes and debug output improvements

Yves Orton landed a fairly hefty patch of tweakery, in part to improve the escaping code, making it Unicode aware. The second part was to clean up the way debugging output is displayed by the regexp engine. The latter work was not yet complete, but Yves felt it was time to commit.

Dave Mitchell applied the changes, and noted a number of compiler warnings about char */U8 differences. He also mentioned that the debug output is currently a bit of a mess, and described the direction in which he wished to push it.

Yves explained what he was tinkering with at the moment, and asked for guidance on dealing with the variables used in the code. He thought that the de-recursion work that Dave had put into the engine offered a suitable place for squirrelling away new variables, and thought that the current way of dealing with variables was weird.

Dave Mitchell explained that the current state of affairs can be traced back to the heritage of Henry Spencer's original regular expression library, twisted beyond belief to work in a multi-interpreter, multi-threaded environment.

  *sounds of hammers, saws and arc-welders in the basement*

Yves fixed up a number of warnings that the new code issued, in a subsequent patch.

Watching the smoke signals

Smoke [5.9.4] 28549 FAIL(F) linux 2.6.12-9-powerpc [debian] (ppc/1 cpu)

This failure, Rafael Garcia-Suarez believed, was due to the recent work performed on dump.c whose aim was to improve the output by escaping "invisible" characters. Yves promised to take a look (which happened just above).

New and old bugs from RT

5.7.2 and perlPod::Text::Overstrike (#7959)

Whatever the problem was, at some point in the past four years, Steve Peters discovered that it had been fixed.


localtime(3) calls tzset(3), but localtime_r(3) may not. (#26136)

Jason Vas Dias posted a patch to fix a bug for Perl on Fedora, following on from a revival of the patch by Benjamin Holzman. Benjamin followed up on Jason's work and tweaked the patch to make it suitable for both maint and blead.

  It's about time

defined-ness of substrings disappear over repeated calls (#39247)

A few words of wisdom from Yitzchak Scott-Thoennes on the matter of XS code dealing with magic.

Bug in system calls when %ENV is very large (#39547)

Steve Hay traced this problem down to the way perl gets compiled on the Win32 platform and thought that as things stand, the restriction appears to be an intrinsic limitation of the platform, and doubted it could be solved.

Randy W. Sims forwarded the message to Microsoft, who confirmed that the environment block for a process is indeed limited to just shy of 32 kilobytes.

  Their bug, not ours

gcc 3.3 has problems with __attribute__((unused)) (#39634)

Andy Dougherty tweaked the setting of the HAS_ATTRIBUTE define in a way that he felt was the clearest to understand. Rafael applied it.

filetest problem with STDIN/OUT on Windows (#39637)

Steve Hay wrote some more C code to analyse this problem and found that indeed, the -r operator returns false for STDIN. Trying it out with various C compilers led him to conclude that this is also some sort of Win32 limitation. Steve and Yves kicked around a few ideas, but at the end of the week it looks like it will wind up documented in perlport as something that doesn't work.

  The tailor's face and hands

S_regmatch produces a stack overflow (#39774)

A pattern that results in a stack overflow was reported. Steve Peters explained that in blead, the overflow no longer exists, although an error message describing another sort of exhaustion is emitted instead.

Yves Orton explained how the pattern was one of the sort that tend to give NFAs such as perl's regexp engine fits, as it forces backtracking information to be stashed away at every single point in the string. He went on to show how the pattern could be rewritten in a much more efficient manner.

He then expressed hope that some day, the optimiser might be able to compose the transform all by itself, but reasoned that it might be difficult.

  When he says it's not easy

Cygwin Perl bug -- pod2usage(-verbose => 0) and pod2usage(-verbose => 1) (#39775)

David Christensen wanted to know what the status was on a patch that fixes a problem with pod2usage.

Switch module bug (#39789)

Artyom Tseitlin, poor chap, was bitten by too-clever-by-half magic of Text::Balanced (upon which Switch relies). Adriano Ferreira posted a semi-evil hack that would allow Artyom to continue to use Switch.

  Divide and conquer

Unable to build Perl under Irix with -Duseshrplib (#39797)

Philippe Schaffnit was going mad trying to discover what his mistake was in trying to build Perl on Irix (no, using Irix is not an acceptable answer). The problem sounded familiar to Andy Dougherty, who suggested an approach to narrow things down. At the end of the week we still had not heard back from Philippe.

Mortality of objects (e.g. %$_) passed as args (#39800)

Vaclav Ovsik ran afoul of the absence of refcounting to @_ and the perl stack.

  Don't do that then

bleadperl -Dm -e1 segfaults on win32 (#39806)

Rafael made an educated guess that a thread was trying to print something using a data structure that had already been freed...

eval and hash access in subroutine (#39816)

Thomas Ziehmer composed a clever little ditty that shows how he was forced to make a bogus %hash = %hash assignment to make things work correctly. I shall have commented on this next week, since nobody did this week.

  Which was last week, anyway

PERL5SHELL is not checked for tainted data (#39832)

Paul Fenwick showed how an evil hacker could make Perl inadvertently launch Notepad through a system call, even in taint mode, since PERL5SHELL is not subject to the usual tainting strictures.

Rafael took a stab at closing the hole, and asked for someone with access to a Windows machine to validate the fix. Yves confirmed that the problem was indeed resolved. Dr. Ruud suggested applying similar checks upon $ENV{SystemRoot} and $ENV{windir}.

Rafael wanted to know what purpose those two variables served.

  Could have been worse

Update perlipc.pod TCP server example with respect to safe signals and accept() (#39835)

Andy Wardley discovered that the introduction of Safe Signals, from 5.7.3 onwards, had rendered the TCP server examples in perlipc obsolete. So he updated the POD, in the hope that some kind soul would apply it to blead (and even maint) so that others may avoid running into the same problem.

Script with threads hang when there are nested subthreads (#39839)

Erland Sommarskog discovered that nested threads don't.

Rafael Garcia-Suarez noticed that a Data::Bind compilation failure could be fixed by diddling with how Carp was used, and therefore wondered if something hadn't been broken somewhere. The symptoms reminded David Landgren that he had encountered similar problems testing modules under blead.

Perl5 Bug Summary

An internal server error prevented the bug summary from being posted this week, but I assume we are somewhere around 1500.

  Check 'em out

New Core Modules

In Brief

Marcus Holland-Moritz commented on the state variables progress, saying that if making iffy constructs work results in a slowdown in the general case, then those constructs should simply be outlawed as errors.

  But the slowdown is only wafer thin

David Nicol suggested that INIT blocks could be used to initialise state variables, and proposed a suitable error message to be used when indicating that a state variable was not going to be initialised as expected.

Ron Savage followed up on the followup from Jan Dubois regarding Oracle on ActiveState, noting that he was able to install DBD::Oracle, but unable to do anything with it. Jan suggested a few things to try out, and outlined what ActiveState would have to do to get it to work in all cases.

Steve Hay noted that a recent version of ext/threads doesn't build with Borland on Win32, due to differences in header files. Jerry D. Hedden found an appropriate #define upon which to hang the missing definitions.

And Jerry delivered the patch, applied by Rafael.

He then bumped up the version with a fix to stop signalling terminated threads from dumping core.

Dr. Ruud wondered whether it was worth chasing down 0 and replacing it with NULL, in pp.c to begin with.

Statefulness is a property of the code, not the variable, sayeth Dave Mitchell.

Steve Hay encountered a strange problem with perly.h on Win32, that he traced tentatively back to the switch-over from byacc to bison. A tweak to fixed the problem. So Steve applied the fix, but regretted not being able to understand why the error was occurring.

  Action and reaction

Jarkko tweaked hv.c and sv.c for -DPERL_GLOBAL_STRUCT_PRIVATE and expressed disgust at the very existence of a global variable named done_sanity_check. After having stepped back to look at his handiwork, he realised that recent work to bring DynaLoader.o into libperl has rendered the Symbian port of Perl unusable.

Steve Hay discovered that mod_perl 1.x and apache 1.x do not like blead, possibly due to the way constants have been rejigged.

Sadahiro Tomoyuki produced a patch to update comments in scan_const, since much of it remains as it did at change #780 in the repository, which, if the records are correct, is roughly about the same time that the porters stopped using punched cards.

He also edited perlop to clarify the parsing of quoted constructs, so that people could apply some scientific method rather than just guessing to see how things behaved.

Yves Orton silenced a system warning during tests. He was only able to do so for Win32, although he thought that it should be possible to tweak it to work in a cross-platform manner.

About this summary

This summary was written by David Landgren.

Last week's summary, here:

elicited a response from H.Merijn who was concerned with some smoke tests throwing out black smoke that was not reproducible with a test run by hand.

Yves Orton also mentioned that he had a documentation patch for the heredoc/escaping muddle that was in limbo, and was wondering whether it needed improvement or was just pushed down on the stack of things to apply.

If you want a bookmarklet approach to viewing bugs and change reports, there are a couple of bookmarklets that you might find useful on my page of Perl stuff:

Weekly summaries are published on and posted on a mailing list, (subscription: The archive is at Corrections and comments are welcome.

If you found this summary useful, please consider contributing to the Perl Foundation to help support the development of Perl.