Apocalypse 6: Subroutines
Larry Wall <larry@wall.org>
Maintainer: Larry Wall <larry@wall.org>
Date: 7 Mar 2003
Last Modified: 25 May 2006
Number: 6
Version: 6
This is the Apocalypse on Subroutines. In Perl culture the term "subroutine" conveys the general notion of calling something that returns control automatically when it's done. This "something" that you're calling may go by a more specialized name such as "procedure", "function", "closure", or "method". In Perl 5, all such subroutines were declared using the keyword sub regardless of their specialty. For readability, Perl 6 will use alternate keywords to declare special subroutines, but they're still essentially the same thing underneath. Insofar as they all behave similarly, this Apocalypse will have something to say about them. (And if we also leak a few secrets about how method calls work, that will make Apocalypse 12 all the easier--presuming we don't have to un-invent anything between now and then...)
Here are the RFCs covered in this Apocalypse. PSA stands for "problem, solution, acceptance", my private rating of how this RFC will fit into Perl 6. I note that none of the RFCs achieved unreserved acceptance this time around. Maybe I'm getting picky in my old age. Or maybe I just can't incorporate anything into Perl without "marking" it...
RFC  PSA  Title
---  ---  -----
 21  abc  Subroutines: Replace C<wantarray> with a generic C<want> function
 23  bcc  Higher order functions
 57  abb  Subroutine prototypes and parameters
 59  bcr  Proposal to utilize C<*> as the prefix to magic subroutines
 75  dcr  structures and interface definitions
107  adr  lvalue subs should receive the rvalue as an argument
118  rrr  lvalue subs: parameters, explicit assignment, and wantarray() changes
128  acc  Subroutines: Extend subroutine contexts to include name parameters and lazy arguments
132  acr  Subroutines should be able to return an lvalue
149  adr  Lvalue subroutines: implicit and explicit assignment
154  bdr  Simple assignment lvalue subs should be on by default
160  acc  Function-call named parameters (with compiler optimizations)
168  abb  Built-in functions should be functions
176  bbb  subroutine / generic entity documentation
194  acc  Standardise Function Pre- and Post-Handling
271  abc  Subroutines : Pre- and post- handlers for subroutines
298  cbc  Make subroutines' prototypes accessible from Perl
334  abb  Perl should allow specially attributed subs to be called as C functions
344  acb  Elements of @_ should be read-only by default
In Apocalypses 1 through 4, I used the RFCs as a springboard for discussion. In Apocalypse 5 I was forced by the complexity of the redesign to switch strategies and present the RFCs after a discussion of all the issues involved. That was so well received that I'll try to follow the same approach with this and subsequent Apocalypses.
But this Apocalypse is not trying to be as radical as the one on regexes. Well, okay, it is, and it isn't. Alright, it is radical, but you'll like it anyway (we hope). At least the old way of calling subroutines still works. Unlike regexes, Perl subroutines don't have a lot of historical cruft to get rid of. In fact, the basic problem with Perl 5's subroutines is that they're not crufty enough, so the cruft leaks out into user-defined code instead, by the Conservation of Cruft Principle. Perl 6 will let you migrate the cruft out of the user-defined code and back into the declarations where it belongs. Then you will think it to be very beautiful cruft indeed (we hope).
Perl 5's subroutines have a number of issues that need to be dealt with. First of all, they're just awfully slow, for various reasons:
@_ array
Quite apart from performance, however, there are a number of problems with usability:
In general, the consensus is that Perl 5's simple subroutine syntax is just a little too simple. Well, okay, it's a lot too simple. While it's extremely orthogonal to always pass all arguments as a single variadic array, that mechanism does not always map well onto the problem space. So in Perl 6, subroutine syntax has blossomed in several directions.
But the most important thing to note is that we haven't actually added a lot of syntax. We've added some, but most of the new capabilities come in through the generalized trait/property system, and the new type system. But in those cases where specialized syntax buys us clarity, we have not hesitated to add it. (Er, actually, we hesitated quite a lot. Months, in fact.)
One obvious difference is that the sub on closures is now optional, since every brace-delimited block is now essentially a closure. You can still put the sub if you like. But it is only required if the block would otherwise be construed as a hash value; that is, if it appears to contain a list of pairs. You can force any block to be considered a subroutine with the sub keyword; likewise you can force any block to be considered a hash value with the hash keyword. But in general Perl just dwims based on whether the top-level is a list that happens to have a first argument that is a pair or hash:
Block                   Meaning
-----                   -------
{ 1 => 2 }              hash { 1 => 2 }
{ 1 => 2, 3 => 4 }      hash { 1 => 2, 3 => 4 }
{ 1 => 2, 3, 4 }        hash { 1 => 2, 3 => 4 }
{ %foo, 1 => 2 }        hash { %foo.pairs, 1 => 2 }
[Update: hash has been demoted to a list operator, actually.]
Anything else that is not a list, or does not start with a pair or hash, indicates a subroutine:
{ 1 }                   sub { return 1 }
{ 1, 2 }                sub { return 1, 2 }
{ 1, 2, 3 }             sub { return 1, 2, 3 }
{ 1, 2, 3 => 4 }        sub { return 1, 2, 3 => 4 }
{ pair 1,2,3,4 }        sub { return 1 => 2, 3 => 4 }
{ gethash() }           sub { return gethash() }
This is a syntactic distinction, not a semantic one. The last two examples are taken to be subs despite containing functions returning pairs or hashes. Note that it would save no typing to recognize the pair method specially, since hash automatically does pairing of non-pairs. So we distinguish these:
{ pair 1,2,3,4 }        sub { return 1 => 2, 3 => 4 }
hash { 1,2,3,4 }        hash { 1 => 2, 3 => 4 }
If you're worried about the compiler making bad choices before deciding whether it's a subroutine or hash, you shouldn't. The two constructs really aren't all that far apart. The hash keyword could in fact be considered a function that takes as its first argument a closure returning a hash value list. So the compiler might just compile the block as a closure in either case, then do the obvious optimization.
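The brace disambiguation can be tried out directly. The following is a minimal sketch in present-day Raku (the language Perl 6 became), whose rules are close to, but not guaranteed identical to, the table above; the variable names are purely illustrative.

```raku
# Assumption: present-day Raku (Rakudo); variable names are illustrative.
my $h = { 1 => 2, 3 => 4 };   # starts with a pair, so it is a hash
my $b = { 1, 2, 3 };          # plain list, so it is a block (closure)

say $h.^name;                 # Hash
say $b.^name;                 # Block
say $b().elems;               # 3  (calling the block returns its list)
```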
Although we say the sub keyword is now optional on a closure, the return keyword only works with an explicit sub. (There are other ways to return values from a block.)
[Update: This is slightly inaccurate; return works from any Routine. See below.]
You may still declare a sub just as you did in Perl 5, in which case it behaves much like it did in Perl 5. To wit, the arguments still come in via the @_ array. When you say:
sub foo { print @_ }
that is just syntactic sugar for this:
sub foo (*@_) { print @_ }
That is, Perl 6 will supply a default parameter signature (the precise meaning of which will be explained below) that makes the subroutine behave much as a Perl 5 programmer would expect, with all the arguments in @_. It is not exactly the same, however. You may not modify the arguments via @_ without declaring explicitly that you want to do so. So in the rare cases that you want to do that, you'll have to supply the rw trait (meaning the arguments should be considered "read-write"):
sub swap (*@_ is rw) { @_[0,1] = @_[1,0] };
The Perl5-to-Perl6 translator will try to catch those cases and add the parameter signature for you when you want to modify the arguments. (Note: we will try to be consistent about using "arguments" to mean the actual values you pass to the function when you call it, and "parameters" to mean the list of lexical variables declared as part of the subroutine signature, through which you access the values that were passed to the subroutine.)
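As a rough illustration of read-only arguments versus rw ones, here is a sketch in present-day Raku. It deliberately uses two named scalar parameters rather than the `*@_ is rw` form shown above, since that is the form assumed to work in current implementations; `swap`, `$x`, and `$y` are illustrative names.

```raku
# Assumption: present-day Raku syntax; all names are illustrative.
sub swap($a is rw, $b is rw) {
    ($a, $b) = ($b, $a);      # allowed only because both parameters are rw
}

my ($x, $y) = 1, 2;
swap($x, $y);                 # the caller's own variables are modified
say "$x $y";                  # 2 1
```

Without the rw trait the parameters are read-only, and the assignment inside the body would be refused.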
Perl 5 has rudimentary prototypes, but Perl 6 type signatures can be much more expressive if you want them to be. The entire declaration is much more flexible. Not only can you declare types and names of individual parameters, you can add various traits to the parameters, such as rw above. You can add traits to the subroutine itself, and declare the return type. In fact, at some level or other, the subroutine's signature and return type are also just traits. You might even consider the body of the subroutine to be a trait.
For those of you who have been following Perl 6 development, you'll wonder why we're now calling these "traits" rather than "properties". They're all really still properties under the hood, but we're trying to distinguish those properties that are expected to be set on containers at compile time from those that are expected to be set on values at run time. So compile-time properties are now called "traits". Basically, if you declare it with is, it's a trait, and if you add it onto a value with but, it's a property. The main reason for making the distinction is to keep the concepts straight in people's minds, but it also has the nice benefit of telling the optimizer which properties are subject to change, and which ones aren't.
A given trait may or may not be implemented as a method on the underlying container object. You're not supposed to care.
[Update: Actually, they're done as mixins if the container type doesn't already support the role. See A12.]
There are actually several syntactic forms of trait:
rule trait :w {
is <ident>[\( <traitparam> \)]?
| will <ident> <closure>
| of <type>
| returns <type>
}
[Update: the :w is no longer needed on a rule.]
(We're specifying the syntax here using Perl 6 regexes. If you don't know about those, go back and read Apocalypse 5.)
A <type> is actually allowed to be a junction of types:
sub foo returns Int|Str {...}
The will syntax specifically introduces a closure trait without requiring the extra parens that is would. Saying:
will flapdoodle { flap() and doodle() }
is exactly equivalent to:
is flapdoodle({ flap() and doodle() })
but reads a little better. More typically you'll see traits like:
will first { setup() }
will last { teardown() }
The final block of a subroutine declaration is the "do" trait. Saying:
sub foo { ... }
is like saying:
sub foo will do { ... }
Note however that the closure eventually stored under the do trait may in fact be modified in various ways to reflect argument processing, exception handling, and such.
We'll discuss the of and returns traits later when we discuss types. Back to syntax.
sub form
A subroutine can be declared as lexically scoped, package scoped, or unscoped:
rule lexicalsub :w {
<lexscope> <type>?
<subintro> <subname> <psignature>?
<trait>*
<block>
}
rule packagesub :w {
<subintro> <subname> <psignature>?
<trait>*
<block>
}
rule anonsub :w {
<subintro> <psignature>?
<trait>*
<block>
}
The non-lexically scoped declaration cannot specify a return type in front. The return type can only be specified as a trait in that case.
[Update: These days the return type may be specified as part of the signature after a -->. It's also possible to use a declarator in front of a declaration that introduces no name. And again, the :w is no longer used in rules.]
As in Perl 5, the difference between a package sub and an anonymous sub depends on whether you specify the <subname>. If omitted, the declaration (which is not really a declaration in that case) generates and returns a closure. (Which may not really be a closure if it doesn't access any external lexicals, but we call them all closures anyway just in case...)
A lexical subroutine is declared using either my or our:
rule lexscope { my | our }
This list doesn't include temp or let because those are not declarators of lexical scope but rather operators that initiate dynamic scoping. See the section below on Lvalue subroutines for more about temp and let.
In both lexical and package declarations, the name of the subroutine is introduced by the keyword sub, or one of its variants:
rule subintro { sub | method | submethod | multi | rule | macro }
A method participates in inheritance and always has an invocant (object or class). A submethod has an invocant but does not participate in inheritance. It's a sub pretending to be a method for the current class only. A multi is a multimethod, that is, a method that is called like a subroutine or operator, but is dispatched based on the types of one or more of its arguments.
[Update: These days multi is just a modifier on sub or method, but if you say multi the sub may be omitted since it's the default.]
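Dispatch on argument type can be sketched as follows in present-day Raku, where (as the update above says) `multi` alone implies `sub`. The routine name `describe` is hypothetical.

```raku
# Assumption: present-day Raku; "describe" is a made-up name.
multi describe(Int $n) { "integer $n" }
multi describe(Str $s) { "string '$s'" }

say describe(42);      # integer 42
say describe('moo');   # string 'moo'
```

Both candidates are called like an ordinary subroutine; the type of the argument selects which body runs.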
Another variant is the regex rule, which is really a special kind of method; but in actuality rules probably get their own set of parse rules, since the body of a rule is a regex. I just put "rule" into <subintro> as a placeholder of sorts, because I'm lazy.
[Update: Rules are now split up into regex, token, and rule declarations. These differ in backtracking policy and whitespace matching. See S05.]
A macro is a subroutine that is called immediately upon completion of parsing. It has a default means of parsing arguments, or it may be bound to an alternate grammar rule to parse its arguments however you like.
These syntactic forms correspond to the various Routine types in the Code type hierarchy:
                               Code
                 _______________|________________
                 |                              |
              Routine                         Block
 ________________|_________________         ____|____
 |    |        |       |    |     |         |       |
Sub Method Submethod Multi Rule Macro      Bare Parametric
The Routine/Block distinction is fairly important, since you always return out of the current Routine, that is, the current Sub, Method, Submethod, Multi, Rule, or Macro. Also, the &_ variable refers to your current Routine. A Block, whether Bare or Parametric, is invisible to both of those notions.
(It's not yet clear whether the Bare vs Parametric distinction is useful. Some apparently Bare blocks are actually Parametric if they refer to $_ internally, even implicitly. And a Bare block is just a Parametric block with a signature of (). More later.)
[Update: There's no longer a Bare/Parametric distinction.]
A <psignature> is a parenthesized signature:
rule psignature :w { \( <signature> \) }
And there is a variant that doesn't declare names:
rule psiglet :w { \( <siglet> \) }
(We'll discuss "siglets" later in their own section.)
It's possible to declare a subroutine in an lvalue or a signature as if it were an ordinary variable, in anticipation of binding the symbol to an actual subroutine later. Note this only works with an explicit name, since the whole point of declaring it in the first place is to have a name for it. On the other hand, the formal subroutine's parameters aren't named, hence they are specified by a <psiglet> rather than a <psignature>:
rule scopedsubvar :w {
<lexscope> <type>? &<subname> <psiglet>? <trait>*
}
rule unscopedsubvar :w {
&<subname> <psiglet>? <trait>*
}
If no <psiglet> is supplied for such a declaration, it just uses whatever the signature of the bound routine is. So instead of:
my sub foo (*@_) { print @_ }
you could equivalently say:
my &foo ::= sub (*@_) { print @_ };
(You may recall that ::= does binding at compile time. Then again, you may not.)
If there is a <psiglet>, however, it must be compatible with the signature of the routine that is bound to it:
my &moo(Cow) ::= sub (Horse $x) { $x.neigh }; # ERROR
"Pointy subs" declare a closure with an unparenthesized signature:
rule pointysub :w {
-\> <signature> <block>
}
They may not take traits.
A bare block generates a closure:
rule baresub :w {
<block> { .find_placeholders() }
}
A bare block declaration does not take traits (externally, anyway), and if there are any parameters, they must be specified with placeholder variables. If no placeholders are used, $_ may be treated as a placeholder variable, provided the surrounding control structure passes an argument to the closure. Otherwise, $_ is bound as an ordinary lexical variable to the outer $_. ($_ is also an ordinary lexical variable when explicit placeholders are used.)
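The three ways a block can receive parameters (pointy signature, placeholder variables, implicit $_) might look like this in present-day Raku. This is only a sketch; the names `add`, `square`, and `double` are illustrative.

```raku
# Assumption: present-day Raku; all names are illustrative.
my &add    = -> $a, $b { $a + $b };   # pointy block: unparenthesized signature
my &square = { $^x * $^x };           # bare block with a placeholder variable
my &double = { $_ * 2 };              # bare block using $_ as its parameter

say add(2, 3);                        # 5
say square(4);                        # 16
say double(21);                       # 42
say (1, 2, 3).map({ $_ * 2 });        # (2 4 6)
```

In the last line it is `map`, the surrounding construct, that passes each element to the block as $_.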
More on parameters below. But before we talk about parameters, we need to talk about types.
Well, what are types, anyway? Though known as a "typeless" language, Perl actually supports several built-in container types such as scalar, array, and hash, as well as user-defined, dynamically typed objects via bless.
Perl 6 will certainly support more types. These include some low-level storage types:
bit int str num ref bool
as well as some high-level object types:
Bit Int Str Num Ref Bool
Array Hash Code IO
Routine Sub Method Submethod Macro Rule
Block Bare Parametric
Package Module Class Object Grammar
List Lazy Eager
(These lists should not be construed as exhaustive.) We'll also need some way of at least hinting at representations to the compiler, so we may also end up with types like these:
int8 int16 int32 int64
uint8 uint16 uint32 uint64
Or maybe those are just extra size traits on a declaration somewhere. That's not important at this point.
The important thing is that we're adding a generalized type system to Perl. Let us begin by admitting that it is the height of madness to add a type system to a language that is well-loved for being typeless.
But mad or not, there are some good reasons to do just that. First, it makes it possible to write interfaces to other languages in Perl. Second, it gives the optimizer more information to think about. Third, it allows the S&M folks to inflict strongly typed compile-time semantics on each other. (Which is fine, as long as they don't inflict those semantics on the rest of us.) Fourth, a type system can be viewed as a pattern matching system for multi-method dispatch.
Which basically boils down to the notion that it's fine for Perl to have a type system as long as it's optional. It's just another area where Perl 6 will try to have its cake and eat it too.
This should not actually come as a surprise to anyone who has been following the development of Perl 5, since the grammatical slot for declaring a variable's effective type has been defined for some time now. In Perl 5 you can say:
my Cat $felix;
to declare a variable intended to hold a Cat object. That's nice, as far as it goes. Perl 6 will support the same syntax, but we'll have to push it much further than that if we're to have a type system that is good enough to specify interfaces to languages like C++ or Java. In particular, we have to be able to specify the types of composite objects such as arrays and hashes without resorting to class definitions, which are rather heavyweight--not to mention opaque. We need to be able to specify the types of individual function and method parameters and return values. Taken collectively, these parameter types can form the signature of a subroutine, which is one of the traits of the subroutine.
And of course, all this has to be intuitively obvious to the naive user.
Yeah, sure, you say.
Well, let's see how far we can get with it. If the type system is too klunky for some particular use, people will simply avoid using it. Which is fine--that's why it's optional.
First, let's clarify one thing that seems to confuse people frequently. Unlike some languages, Perl makes a distinction between the type of the variable, and the type of the value. In Perl 5, this shows up as the difference between overloading and tying. You overload the value, but you tie the variable. When you say:
my Cat $felix;
you are specifying the type of the value being stored, not the type of the variable doing the storing. That is, $felix must contain a reference to a Cat value, or something that "isa" Cat. The variable type in this case is just a simple scalar, though that can be changed by tying the variable to some class implementing the scalar variable operations.
In Perl 6, the type of the variable is just one of the traits of the variable, so if you want to do the equivalent of a tie to the Box class, you say something like:
my Cat $felix is Box;
That declares your intent to store a Cat value into a Box variable. (Whether the cat will then be dead or alive (or dead|alive) depends on the definition of the Box class, and whether the Box object's side effects extend to the Cat value stored in it.)
But by default:
my Cat $felix;
just means something like:
my Cat $felix is Scalar;
Likewise, if you say:
my Cat @litter;
it's like saying:
my Cat @litter is Array;
That is, @litter is an ordinary array of scalar values that happen to be references to Cats. In the abstract, @litter is a function that maps integers to cats.
Likewise,
my Cat %pet;
is like:
my Cat %pet is Hash;
You can think of the %pet hash as a function that maps cat names (strings) to cats. Of course, that's an oversimplification--for both arrays and hashes, subscripting is not the only operation. But it's the fundamental operation, so the declared type of the returned value reflects the return value of such a subscripted call.
Actually, it's not necessarily the return type. It's merely a type that is consistent with the returned type. It would be better to declare:
my Animal %pet;
and then you could return a Cat or a Dog or a Sponge, presuming all those are derived from Animal. You'd have to generalize it a bit further if you want to store your pet Rock. In the limit, you can just leave the type out. When you say:
my %pet;
you're really just saying:
my Object %pet is Hash;
...except that you're not. We have to push it further than that, because we have to handle more complicated structures as well. When you say:
my Cat @litter is Array;
it's really shorthand for:
my @litter is Array of Cat;
That is, "Cat" is really a funny parameter that says what kind of Array you have. If you like, you could even write it like this:
my @litter is Array(returns => Cat)
Likewise you might write:
my %pet is Hash(keytype => Str, returns => Cat)
and specify the key type of the hash. The "of" keyword is just syntactic sugar for specifying the return type of the previous storage class. So we could have
my %pet is Hash of Array of Array of Hash of Array of Cat;
which might really mean:
my %pet is Hash(keytype => Str,
                returns => Array(
                    returns => Array(
                        returns => Hash(
                            keytype => Str,
                            returns => Array(
                                returns => Cat)))))
or some such.
[Update: Type parameters are now written with square brackets, and we now distinguish the of property from the returns property.]
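To make the value-type declarations concrete, here is a small sketch in present-day Raku. The classes `Animal`, `Cat`, and `Dog` are stand-ins defined only for this example, and the square-bracket type name in the output reflects the updated notation rather than the `is Array of Cat` spelling used above.

```raku
# Assumption: present-day Raku; the classes are empty stand-ins.
class Animal { }
class Cat is Animal { }
class Dog is Animal { }

my Cat    @litter;            # elements must be Cats
my Animal %pet;               # values must be Animals

@litter.push(Cat.new);
%pet<felix> = Cat.new;
%pet<rex>   = Dog.new;        # accepted: a Dog is an Animal

say @litter.of.^name;         # Cat
say %pet.of.^name;            # Animal
say @litter.^name;            # Array[Cat]

try { @litter.push(Dog.new) };            # a Dog is not a Cat, so this throws
say "push rejected" if $!.defined;
```

The declared type constrains what the container will hold; it says nothing about the container's own class, which stays an ordinary Array or Hash.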
I suppose you could also write that as:
my Array of Array of Hash of Array of Cat %pet;
but for linguistic reasons it's probably better to keep the variable name near the left and put the long, heavy phrases to the right. (People tend to prefer to say the short parts of their sentences before the long parts--linguists call this the "end-weight" problem.) The Hash is implied by the %pet, so you could leave out the "is" part and just say:
my %pet of Array of Array of Hash of Array of Cat;
Another possibility is:
my Cat %pet is Hash of Array of Array of Hash of Array;
That one reads kinda funny if you leave out the "is Hash", though. Nevertheless, it says that we have this funny data structure that has multiple parameters that you can view as a funny function returning Cat. In fact, "returns" is a synonym for "of". This is also legal:
my @litter returns Cat;
[Update: Not any more.]
But the "returns" keyword is mostly for use by functions:
my Cat sub find_cat($name) {...}
is the same as:
my sub find_cat($name) returns Cat {...}
This is more important for things like closures that have no "my" on the front:
$closure = sub ($name) returns Cat {...}
Though for the closure case, it's possible we could define some kind of non-my article to introduce a type unambiguously:
$closure = a Camel sub ($name) {...}
$closure = an Aardvark sub () {...}
Presumably "a" or "an" is short for "anonymous". Which is more or less what the indefinite article means in English.
[Update: You can just use my there and leave the name out.]
However, we need returns anyway in cases where the return value is complicated, so that you'd rather list it later (for end-weight reasons):
my sub next_prisoner() returns (Nationality, Name, Rank, SerialNo) {...}
Note that the return type is a signature much like the parameter types, though of course there are no formal parameter names on a return value. (Though there could be, I suppose.) We're calling such nameless signatures "siglets".
[Update: It turns out that the returns property just constrains the return type as seen by the routine, and not the return type seen by the rest of the world. Use the type prefix or arrow form to declare a return type that is seen by the rest of the world, and in particular any type inferencing engine. We call that the of type to distinguish it from the where type that is specified by the returns property.]
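The arrow form mentioned in the update can be sketched like this in present-day Raku. The class `Cat` and the routine `find-cat` are hypothetical, loosely following the `find_cat` example above.

```raku
# Assumption: present-day Raku; Cat and find-cat are illustrative.
class Cat { has $.name }

sub find-cat($name --> Cat) { Cat.new(:$name) }

say find-cat('Felix').name;               # Felix
say &find-cat.signature.returns.^name;    # Cat
```

The second line of output shows that the declared return type is visible to the rest of the program through the signature, without calling the routine.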
When you declare a subroutine, it can change how the rest of the current file (or string) is compiled. So there is some pressure to put subroutine declarations early. On the other hand, there are good reasons for putting subroutine definitions later in the file too, particularly when you have mutually recursive subroutines. Beyond that, the definition might not even be supplied until run time if you use some kind of autoloading mechanism. (We'll discuss autoloading in Apocalypse 10, Packages.) Perl 5 has long supported the notion of "forward" declarations or "stubs" via a syntax that looks like this:
sub optimal;
Perl 6 also supports stubbing, but instead you write it like this:
sub optimal {...}
That is, the stub is distinguished not by leaving the body of the function out, but by supplying a body that explicitly calls the "..." operator (known affectionately as the "yada, yada, yada" operator). This operator emits a warning if you actually try to execute it. (It can also be made to pitch an exception.) There is no warning for redefining a {...} body.
We're moving away from the semicolon syntax in order to be consistent with the distinction made by other declarations:
package Fee;            # scope extends to end of file
package Fee { ... }     # scope extends over block
module Fie;             # scope extends to end of file
module Fie { ... }      # scope extends over block
class Foe;              # scope extends to end of file
class Foe { ... }       # scope extends over block
To be consistent, a declaration like:
sub foo;
would therefore extend to the end of the file. But that would be confusing for historical reasons, so we disallow it instead, and you have to say:
sub foo {...}
[Update: Perl 6 also allows you to defer declaring your subroutine till later in the file, but only if the delayed declaration declares the sub to be parsed consistent with how a list operator would be parsed. Basically, any unrecognized "bareword" is assumed to be a provisional list operator that must be declared by the end of the current compilation unit.]
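A stub followed by its real definition might look like the sketch below in present-day Raku. It assumes the implementation accepts a later definition as a replacement for a `{...}` body without complaint, as described above; the routine `twice` is a made-up caller.

```raku
# Assumption: present-day Raku (Rakudo) lets a {...} stub be redefined.
sub optimal {...}                  # stub: the body is the yada operator
sub twice() { optimal() * 2 }      # may already refer to the stubbed name
sub optimal { 21 }                 # the real definition arrives later

say twice();
```

In current implementations the stub is often unnecessary for calls written with parentheses, but it still documents the intent to define the routine later.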
Perl 5 gives subroutine names two scopes. Perl 6 gives them four.
All named subs in Perl 5 have package scope. (The body provides a lexical scope, but we're not talking about that. We're talking about where the name of the subroutine is visible from.) Perl 6 provides by default a package-scoped name for "unscoped" declarations such as these:
sub fee {...}
method fie {...}
submethod foe {...}
multi foo {...}
macro sic {...}
Methods and submethods are ordinarily package scoped, because (just as in Perl 5) a class's namespace is kept in a package.
It's sort of cheating to call this a subroutine scope, because it's really more of a non-scope. Scope is a property of the name of a subroutine. Since closures and anonymous subs have no name, they naturally have no intrinsic scope of their own. Instead, they rely on the scope of whatever variable contains a reference to them. The only way to get a lexically scoped subroutine name in Perl 5 was by indirection:
my $subref = sub { dostuff(@_) };
&$subref(...)
But that doesn't actually give you a lexically scoped name that is equivalent to an ordinary subroutine's name. Hence, Perl 6 also provides...
You can declare "scoped" subroutines by explicitly putting a my or our on the front of the declaration:
my sub privatestuff { ... }
our sub semiprivatestuff { ... }
Both of these introduce a name into the current lexical scope, though in the case of our this is just an alias for a package subroutine of the same name. (As with other uses of our, you might want to introduce a lexical alias if your strictness level prohibits unqualified access to package subroutines.)
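The difference between the two scopes can be seen from outside the declaring block. The following present-day Raku sketch uses a hypothetical module `Stuff` and hyphenated routine names chosen for the example.

```raku
# Assumption: present-day Raku; module and routine names are illustrative.
module Stuff {
    my  sub private-stuff     { 'lexical only' }
    our sub semiprivate-stuff { 'package: ' ~ private-stuff() }
}

say Stuff::semiprivate-stuff();   # package: lexical only
# Stuff::private-stuff() would fail: a "my" sub has no package entry.
```

The `our` sub is reachable through the package name, while the `my` sub can only be called from inside the block that declared it.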
You can also declare lexically scoped macros:
my macro sic { ... }
Perl 6 also introduces the notion of completely global variables that are visible from everywhere they aren't overridden by the current package or lexical scope. Such variables are named with a leading * on the identifier, indicating that the package prefix is a wildcard, if you will. Since subroutines are just a funny kind of variable, you can also have global subs:
sub *print (*@list) { $*DEFOUT.print(@list) }
In fact, that's more-or-less how some built-in functions like print could be implemented in Perl 6. (Methods like $*DEFOUT.print() are a different story, of course. They're defined off in a class somewhere. (Unless they're multimethods, in which case they could be defined almost anywhere, because multimethods are always globally scoped. (In fact, most built-ins including print will be multimethods, not subs. (But we're getting ahead of ourselves...))))
One of Perl's strong points has always been the blending of positional parameters with variadic parameters.
"Variadic" parameters are the ones that vary. They're the "...And The Rest" list of values that many functions--like print, map, and chomp--have at the end of their call. Whereas positional parameters generally tell a function how to do its job, variadic parameters are most often used to pass the arbitrary sequences of data the function is supposed to do its job on/with/to.
In Perl 5, when you unpack the arguments to a sub like so:
my ($a, $b, $c, @rest) = @_;
you are defining three positional parameters, followed by a variadic list. And if you give the sub a prototype of ($$$@) it will force the first three parameters to be evaluated in scalar context, while the remaining arguments are evaluated in list context.
The big problem with the Perl 5 solution is that the parameter binding is done at run time, which has run-time costs. It also means the metadata is not readily available outside the function body. We could just as easily have written it in some other form like:
my $a = shift;
my $b = shift;
my $c = shift;
and left the rest of the arguments in @_. Not only is this difficult for a compiler to analyze, but it's impossible to get the metadata from a stub declaration; you have to have the body defined already.
The old approach is very flexible, but the cost to the user is rather high.
Perl 6 still allows you to access the arguments via @_ if you like, but in general you'll want to hoist the metadata up into the declaration. Perl 6 still fully supports the distinction between positional and variadic data--you just have to declare them differently. In general, variadic items must follow positional items both in declaration and in invocation.
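Once the parameters live in the declaration, the metadata can be read without a body at all. This sketch in present-day Raku uses a stub named `handle` (a hypothetical name) and the introspection methods assumed to be available on routines and signatures.

```raku
# Assumption: present-day Raku; "handle" is a made-up routine name.
sub handle($a, $b, $c, *@rest) {...}        # a stub is enough

say &handle.arity;                          # 3  (required positionals)
say &handle.signature.params.elems;         # 4  (three positionals plus the slurpy)
say &handle.signature.params.map(*.name);   # ($a $b $c @rest)
```

None of this is recoverable from a Perl 5 body that unpacks @_ by hand.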
In turn, there are at least three kinds of positional parameters, and three kinds of variadic parameters. A declaration for all six kinds of parameter won't win a beauty contest, but might look like this:
method x ($me: $req, ?$opt, +$namedopt, *%named, *@list) {...}
[Update: that currently looks like this:
method x ($me: $req, $opt?, :$namedopt, *%named, *@list) {...}
]
Of course, you'd rarely write all of those in one declaration. Most declarations only use one or two of them. Or three or four... Or five or six...
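As a hedged illustration of how the different kinds of parameter bind, here is a sketch in present-day Raku using the updated markers (`$opt?`, `:$namedopt`). It is an ordinary sub rather than a method, so there is no invocant, and the slurpy array is placed directly after the positionals to stay within the ordering that current implementations are assumed to accept.

```raku
# Assumption: present-day Raku; parameter names follow the declaration above.
sub x($req, $opt?, *@list, :$namedopt, *%named) {
    say "req:      $req";
    say "opt:      {$opt // '(none)'}";
    say "list:     {@list.join(',')}";
    say "namedopt: {$namedopt // '(none)'}";
    say "named:    {%named.keys.sort.join(',')}";
}

x(1, 2, 3, 4, namedopt => 'n', color => 'red');
# req:      1
# opt:      2
# list:     3,4
# namedopt: n
# named:    color
```

The first two arguments fill the positional zone, the remaining positionals fall into the slurpy array, and any named argument without a matching parameter lands in the slurpy hash.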
There is some flexibility in how you pass some of these parameters, but the ordering of both formal parameters and actual arguments is constrained in several ways. For instance, positional parameters must precede non-positional, and required parameters must precede optional. Variadic lists must be attached either to the end of the positional list or the end of the named parameter list. These constraints serve a number of purposes:
Since there are constraints on the ordering of parameters, similar parameters tend to clump together into "zones". So we'll call the ?, +, and * symbols you see above "zone markers". The underlying metaphor really is very much like zoning regulations--you know, the ones where your city tells you what you may or may not do on a chunk of land you think you own. Each zone has a set of possible uses, and similar zones often have overlapping uses. But you're still in trouble if you put a factory in the middle of a housing division, just as you're in trouble if you pass a positional argument to a formal parameter that has no position.
I was originally going to go with a semicolon to separate required from optional parameters (as Perl 5 uses in its prototypes), but I realized that it would get lost in the traffic, visually speaking. It's better to have the zone markers line up, especially if you decide to repeat them in the vertical style:
method action ($self:
int $x,
int ?$y,
int ?$z,
Adverb +$how,
Beneficiary +$for,
Location +$at is copy,
Location +$toward is copy,
Location +$from is copy,
Reason +$why,
*%named,
*@list
) {...}
So optional parameters are all marked with zone markers.
In this section we'll be concentrating on the declaration's syntax rather than the call's syntax, though the two cannot be completely disintertwingled. The declaration syntax is actually the more complicated of the two for various good reasons, so don't get too discouraged just yet.
The three positional parameter types are the invocant, the required parameters, and the optional positional parameters. (Note that in general, positional parameters may also be called using named parameter notation, but they must be declared as positional parameters if you wish to have the option of calling them as positional parameters.) All positional parameters regardless of their type are considered scalars, and imply scalar context for the actual arguments. If you pass an array or hash to such a parameter, it will actually pass a reference to the array or hash, just as if you'd backslashed the actual argument.
The first argument to any method (or submethod) is its invocant, that is, the object or class upon which the method is acting. The invocant parameter, if present, is always declared with a colon following it. The invocant is optional in the sense that, if there's no colon, there's no explicit invocant declared. It's still there, and it must be passed by the caller, but it has no name, and merely sets the outer topic of the method. That is, the invocant's name is $_, at least until something overrides the current topic. (You can always get at the invocant with the self built-in, however. If you don't like "self", you can change it with a macro. See below.)
[Update: The topic is not set now unless you explicitly name the invocant "$_".]
Ordinary subs never have an invocant. If you want to declare a non-method subroutine that behaves as a method, you should declare a submethod instead.
Multimethods can have multiple invocants. A colon terminates the list of invocants, so if there is no colon, all parameters are considered invocants. Only invocants participate in multimethod dispatch. Only the first invocant is bound to $_.
[Update: Again, only if you actually name that parameter "$_".]
Macros are considered methods on the current parse state object, so they have an invocant.
[Update: The current parse state object is now called $?PARSER, so macros are now more like subroutines.]
Next (or first in the case of subs) come the required positional parameters. If, for instance, the routine declares three of these, you have to pass at least three arguments in the same order. The list of required parameters is terminated at the first optional parameter, that is the first parameter having any kind of zone marker. If none of those are found, all the parameters are required, and if you pass either too many or too few arguments, Perl will throw an exception as soon as it notices. (That might be at either compile time or run time.) If there are optional or variadic parameters, the required list merely serves as the minimum number of arguments you're allowed to pass.
Next come the optional positional parameters. (They have to come next because they're positional.) In the declaration, optional positional parameters are distinguished from required parameters by marking the optional parameters with a question mark. (The parameters are not distinguished in the call--you just use commas. We'll discuss call syntax later.) All optional positional parameters are marked with ?, not just the first one. Once you've made the transition to the optional parameter zone, all parameters are considered optional from there to the end of the signature, even after you switch zones to + or *. But once you leave the positional zone (at the end of the ? zone), you can't switch back to the positional zone, because positionals may not follow variadics.
If there are no variadic parameters following the optional parameters, the declaration establishes both a minimum and a maximum number of allowed arguments. And again, Perl will complain when it notices you violating either constraint. So the declaration:
sub *substr ($string, ?$offset, ?$length, ?$repl) {...}
says that substr can be called with anywhere from 1 to 4 scalar parameters.
[Update: Now written:
sub *substr ($string, $offset?, $length?, $repl?) {...}
because the ? is often replaced by a default.]
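The minimum and maximum can be read straight off such a declaration. The following Python sketch shows the idea with a hypothetical stand-in for substr (only its signature matters) and a hypothetical helper called arity. It is an analogy, not the Perl 6 mechanism, and Python only notices violations at call time.

```python
import inspect

def substr(string, offset=None, length=None, repl=None):
    # Stand-in body: just echo what was bound.
    return string, offset, length, repl

def arity(func):
    """Return (minimum, maximum) positional argument counts for func."""
    params = list(inspect.signature(func).parameters.values())
    positional = [p for p in params if p.kind is p.POSITIONAL_OR_KEYWORD]
    required = [p for p in positional if p.default is p.empty]
    return len(required), len(positional)

print(arity(substr))  # (1, 4)

# Too few or too many arguments are both rejected.
for bad_args in [(), ("text", 0, 2, "xx", "surplus")]:
    try:
        substr(*bad_args)
    except TypeError as err:
        print("rejected:", err)
```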
Following the positional parameters, three kinds of variadic parameters may be declared. Variadic arguments may be slurped into a hash or an array depending on whether they look like named arguments or not. "Slurpy" parameters are denoted by a unary * before the variable name, which indicates that an arbitrary number of values is expected for that variable.
Additional named parameters may be placed at the end of the declaration, or marked with a unary + (because they're "extra" parameters). Since they are--by definition--in the variadic region, they may only be passed as named arguments, never positionally. It is illegal to mark a parameter with ? after the first + or *, because you can't reenter a positional zone from a variadic zone.
Unlike the positional parameters, the variadic parameters are not necessarily declared in the same order as they will be passed in the call. They may be declared in any order (though the exact behavior of a slurpy array depends slightly on whether you declare it first or last).
[Update: The behavior of a slurpy array is the same now regardless of whether there are any named parameters in front of it. And named parameters are now marked with : rather than +.]
Parameters marked with a + zone marker are named-only parameters. Such a parameter may never be passed positionally, but only by name.
[Update: Now using : instead.]
A hash declaration like *%named indicates that the %named hash should slurp up all the remaining named arguments (that is, those that aren't bound explicitly to a specific formal parameter).
An array declaration like *@rest indicates that the @rest array should slurp up all the remaining items after the named parameters. (Later we'll discuss how to disambiguate the situation when the beginning of your list looks like named parameters.) If you shift or pop without an argument, it shifts or pops whatever slurpy array is in scope. (So in a sense, your main program has an implicit slurpy array of *@*ARGS because that's what shift shifts there.)
Formal parameters have lexical scope, as if they were declared with a my. (That is reflected in the pseudocode in Appendix B.) Their scope extends only to the end of the associated block. Formal parameters are the only lexically scoped variables that are allowed to be declared outside their blocks. (Ordinary my and our declarations are always scoped to their surrounding block.)
Any subroutine can have a method signature syntactically, but subsequent semantic analysis will reject mistakes like invocants on subroutines. This is not just motivated by laziness. I think that "You can't have an invocant on a subroutine" is a better error message than "Syntax error".
rule signature :w {
[<parameter> [<[,:]> <parameter> ]* ]?
}
In fact, we just treat colon as a funny comma here, so any use of extra colons is detected in semantic analysis. Similarly, zone markers are semantically restricted, not syntactically. Again, "Syntax error" doesn't tell you much. It's much more informative to see "You can't declare an optional positional parameter like ?$flag after a slurpy parameter like *@list", or "You can't use a zone marker on an invocant".
Here's what an individual parameter looks like:
rule parameter :w {
[ <type>? <zone>? <variable> <trait>* <defval>?
| \[ <signature> \] # treat single array ref as an arg list
]
}
rule zone {
[ \? # optional positional
| \* # slurpy array or hash
| \+ # optional named-only
]
}
rule variable { <sigil> <name> [ \( <siglet> \) ]? }
rule sigil { <[$@%&]> <[*.?^]>? } # "What is that, swearing?"
Likewise, we parse any sigil here, but semantically reject things like $*x or $?x. We also reject package-qualified names and indirect names. We could have a <simplevar> rule that only admits <ident>, but again, "Syntax error" is a lot less user-friendly than "You can't use a package variable as a parameter, dimwit!"
Similarly, the optional <siglet> in <variable> is allowed only on & parameters, to say what you expect the signature of the referenced subroutine to look like. We should talk about siglets.
The <siglet> in the <variable> rule is an example of a nameless signature, that is, a "small signature", or "siglet". Signatures without names are also used for return types and context traits (explained later). A siglet is a sequential list of paramlets. The paramlets do not refer to actual variable names, nor do they take defaults:
rule siglet :w {
[<paramlet> [<[,:]> <paramlet> ]* ]?
}
rule paramlet :w {
[ <type> <zone>? <varlet>? <trait>* # require type
| <zone> <varlet>? <trait>* # or zone
| <varlet> <trait>* # or varlet
| \[ <siglet> \] # treat single array ref as an arg list
]
}
In place of a <variable>, there's a kind of stub we'll call a "varlet":
rule varlet :w {
<sigil> [ \( <siglet> \) ]?
}
As with the <variable> rule, a <varlet>'s optional siglet is allowed only on & parameters.
Here's a fancy example with one signature and several siglets.
sub (int *@) imap ((int *@) &block(int $),
int *@vector is context(int)) {...}
You're not expected to understand all of that yet. What you should notice, however, is that a paramlet is allowed to be reduced to a type (such as int), or a zone (such as ?), or a varlet (such as $), or some sequence of those (such as int *@). But it's not allowed to be reduced to a null string. A signature of () indicates zero arguments, not one argument that could be anything. Use ($) for that. Nor can you specify four arguments by saying (,,,). You have to put something there.
Perl 6 siglets can boil down to something very much like Perl 5's "prototype pills". However, you can't leave out the comma between parameters in Perl 6. So you have to say ($,$) rather than ($$), when you want to indicate a list of two scalars.
If you use a <siglet> instead of a <signature> in declaring a subroutine, it will be taken as a Perl 5 style prototype, and all args still come in via @_. This is a sop to the Perl5-to-Perl6 translator, which may not be able to figure out how to translate a prototype to a signature if you've done something strange with @_. You should not use this feature in new code. If you use a siglet on a stub declaration, you must use the same siglet on the corresponding definition as well, and vice versa. You can't mix siglets and signatures that way. (This is not a special rule, but a natural consequence of the signature matching rules.)
For closure parameters like &block(int $), the associated siglet is considered part of its name. This is true not just for parameters, but anywhere you use the & form in your program, because with multimethods there may be several routines sharing the same identifier, distinguishable only by their type signature:
multi factorial(int $a) { $a<=1 ?? 1 :: $a*factorial($a-1) }
multi factorial(num $a) { gamma(1+$a) }
$ref = &factorial; # illegal--too ambiguous
$ref = &factorial($); # illegal--too ambiguous
$ref = &factorial(int); # good, means first one.
$ref = &factorial(num); # good, means second one.
$ref = &factorial(complex); # bad, no such multimethod.
Note that when following a name like "&factorial", parentheses do not automatically mean to make a call to the subroutine. (This Apocalypse contradicts earlier Apocalypses. Guess which one is right...)
[Update: Er, actually, the earlier Apocalypse is right. We now write those references above as &factorial:(complex) and such.]
$val = &factorial($x); # illegal, must use either
$val = factorial($x); # this or
$val = &factorial.($x); # maybe this.
In general, don't use the & form when you really want to call something.
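The idea of several routines sharing one name, with a type needed to say which one you mean, can be sketched in Python with functools.singledispatch. This is an analogy under stated assumptions: singledispatch looks only at the type of the first argument, float stands in for num, and the generic fallback that raises is our own choice.

```python
import math
from functools import singledispatch

@singledispatch
def factorial(a):
    # Fallback: reached when no variant matches the argument's type.
    raise TypeError(f"no factorial variant for {type(a).__name__}")

@factorial.register(int)
def _(a):
    return 1 if a <= 1 else a * factorial(a - 1)

@factorial.register(float)
def _(a):
    return math.gamma(1 + a)

# Picking a variant by type without calling it, roughly like naming
# the routine together with its signature:
int_variant = factorial.dispatch(int)
float_variant = factorial.dispatch(float)

print(int_variant(5))    # 120
print(factorial(5))      # 120, the ordinary call form
```

As in the text, a reference to a variant and a call are distinct things: dispatch hands back a function, and nothing runs until that function is called.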
Other than type, zone, and variable name, all other information about parameters is specified by the standard trait syntax, generally introduced by is. Internally even the type and zone are just traits, but syntactically they're out in front for psychological reasons. Whose psychological reasons we won't discuss.
is constant (default)
Every formal parameter is constant by default, meaning primarily that the compiler won't feel obligated to construct an lvalue out of the actual argument unless you specifically tell it to. It also means that you may not modify the parameter variable in any way. If the parameter is a reference, you may use it to modify the referenced object (if the object lets you), but you can't assign to it and change the original variable passed to the routine.
[Update: This is now the readonly trait. We now prefer to reserve the term "constant" to refer only to compile-time constants, and parameters naturally tend to vary at run time. Or so we hope.]
is rw
The rw trait is how you tell the compiler to ask for an lvalue when evaluating the actual argument for this parameter. Do not confuse this with the rw trait on the subroutine as a whole, which says that the entire subroutine knows how to function as an lvalue. If you set this trait, then you may modify the variable that was passed as the actual argument. A swap routine would be:
sub swap ($a is rw, $b is rw) { ($a,$b) = ($b,$a) }
If applied to a slurpy parameter, the rw trait distributes to each element of the list that is bound to the parameter. In the case of a slurpy hash, this implies that the named pairs are in an lvalue context, which actually puts the right side of each named pair into lvalue context.
Since normal lvalues assume "is rw", I suppose that also implies that you can assign to a pair:
(key => $var) = "value";
or even do named parameter binding:
(who => $name, why => $reason) := (why => $because, who => "me");
which is the same as:
$name := "me";
$reason := $because;
And since a slurpy hash soaks up the rest of the named parameters, this also seems to imply that binding a slurpy rw hash actually makes the hash values into rw aliases:
$a = "a"; $b = "b";
*%hash := (a => $a, b => $b);
%hash{a} = 'x';
print $a; # prints "x"
That's kinda scary powerful. I'm not sure I want to document that... ["Too late!" whispers Evil Damian.]
is copy
This trait requests copy-in semantics. The variable is modifiable by you, but you're only modifying your own private copy. It has the same effects as assigning the argument to your own my variable. It does not do copy-out.
If you want both copy-in and copy-out semantics, declare it rw and do your own copying back and forth, preferably with something that works even if you exit by exception (if that's what you want):
sub cico ($x is rw) {
my $copy = $x;
LAST { $x = $copy }
...
}
Though if you're using a copy you probably only want to copy-out on success, so you'd use a KEEP block instead. Or more succinctly, using the new will syntax:
sub cicomaybe ($x is rw) {
my $copy will keep { $x = $copy } = $x;
...
}
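The difference between the two copy-out policies only shows up when the routine exits by exception. Here is a Python sketch of that difference, as an analogy only: Python has no rw parameters, so a hypothetical Box class stands in for the caller's variable, try/finally plays the part of the LAST block, and a plain trailing assignment plays the part of KEEP.

```python
class Box:
    """Hypothetical mutable cell standing in for a variable passed 'is rw'."""
    def __init__(self, value):
        self.value = value

def cico(x, fail=False):
    copy = x.value                  # copy-in
    try:
        copy += 1
        if fail:
            raise RuntimeError("gave up halfway")
        copy += 1
    finally:
        x.value = copy              # copy-out on every exit, even by exception

def cicomaybe(x, fail=False):
    copy = x.value                  # copy-in
    copy += 1
    if fail:
        raise RuntimeError("gave up halfway")
    copy += 1
    x.value = copy                  # copy-out only when we get this far

for routine in (cico, cicomaybe):
    box = Box(10)
    try:
        routine(box, fail=True)
    except RuntimeError:
        pass
    print(routine.__name__, box.value)
```

After a failure, the first routine leaves the half-finished value in the caller's variable, while the second leaves the caller's variable untouched.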
is ref
This trait explicitly requests call-by-reference semantics. It lets you read and write an existing argument but doesn't attempt to coerce that argument to an lvalue (or autovivify it) on the caller end, as rw would. This trait is distinguished from a parameter of type Ref, which merely asserts that the return type of the parameter is a reference without necessarily saying anything about calling convention. You can without contradiction say:
sub copyref (Ref $ref is copy) {...}
meaning you can modify $ref, but that doesn't change whatever was passed as the argument for that parameter.
Default values are also traits, but are written as assignments and must come at the end of the formal parameter for psychological reasons.
rule defval :w { \= <item> }
That is:
sub trim ( Str $_ is rw, Rule ?$remove = /\s+/ ) {
s:each/^ <$remove> | <$remove> $//;
}
lets you call trim as either:
trim($input);
or:
trim($input, /\n+/);
It's very important to understand that the expression denoted by item is evaluated in the lexical scope of the subroutine definition, not of the caller. If you want to get at the lexical scope of the caller, you have to do it explicitly (see CALLER:: below). Note also that an item may not contain unbracketed commas, or the parser wouldn't be able to reliably locate the next parameter declaration.
Although the default looks like an assignment, it isn't one. Nor is it exactly equivalent to //=, because the default is set only if the parameter doesn't exist, not if it exists but is undefined. That is, it's used only if no argument is bound to the parameter.
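The same distinction exists in Python, which makes for a compact illustration. This is an analogy, with None standing in for undef and both function names hypothetical. A declared default applies only when no argument is bound, whereas a defined-or style fix-up in the body also replaces an explicitly passed undefined value.

```python
def with_default(x=1.0):
    # Default in the declaration: used only if the caller passes nothing.
    return x

def with_defined_or(x=None):
    # Fix-up in the body, in the spirit of //=: also replaces an explicit None.
    if x is None:
        x = 1.0
    return x

print(with_default())         # 1.0
print(with_default(None))     # None
print(with_defined_or(None))  # 1.0
```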
An rw parameter may only default to a valid lvalue. If you find yourself wanting it to default to an ordinary value because it's undefined, perhaps you really want //= instead:
sub multprob ($x is rw, $y) {
$x //= 1.0; # assume undef means "is certain"
$x *= $y;
}
Syntactically, you can put a default on a required parameter, but it would never be used because the argument always exists. So semantic analysis will complain about it. (And I'd rather not say that adding a default implies it's optional without the ? zone marker.)
[Update: By moving the ? to the end, it now may be omitted as redundant if there is a default. That is, $x? is really short for "$x = undef".]
Formal parameters may have any type that any other variable may have, though particular parameters may have particular restrictions. An invocant needs to be an object of an appropriate class or subclass, for instance. As with ordinary variable declarations the type in front is actually the return type, and you can put it afterwards if you like:
sub foo (int @array is rw) {...}
sub foo (@array of int is rw) {...}
sub foo (@array is Array of int is rw) {...}
The type of the actual argument passed must be compatible with (but not necessarily identical to) the formal type. In particular, for methods the formal type will often indicate a base class of the actual's derived class. People coming from C++ must remember that all methods are "virtual" in Perl.
[Update: Beyond that, most of the types specified in signatures are not even classes, but just roles used to name an interface. See S12.]
Closure parameters are typically declared with &:
sub mygrep (&block, *@list is rw) {...}
Within that subroutine, you can then call block() as an ordinary subroutine with a lexically scoped name. If such a parameter is declared without its own parameter signature, the code makes no assumptions about the actual signature of the closure supplied as the actual argument. (You can always inspect the actual signature at run time, of course.)
You may, however, supply a signature if you like:
sub mygrep (&block($foo), *@list is rw) {
block(foo => $bar);
}
[Update: That can be written &block:($foo) now.]
With an explicit signature, it would be an error to bind a block to &block that is not compatible. We're leaving "compatible" undefined for the moment, other than to point out that the signature doesn't have to be identical to be compatible. If the actual subroutine accepted one required parameter and one optional, it would work perfectly fine, for instance. The signature in mygrep is merely specifying what it requires of the subroutine, namely one positional argument named "$foo". (Conceivably it could even be named something different in the actual routine, provided the compiler turns that call into a positional one because it thinks it already knows the signature.)
The typical subroutine or method is called a lot more often than it is declared. So while the declaration syntax is rather ornate, we strive for a call syntax that is rather simple. Typically it just looks like a comma-separated list. Parentheses are optional on predeclared subroutine calls, but mandatory otherwise. Parentheses are mandatory on method calls with arguments, but may be omitted for argumentless calls to methods such as attribute accessors. Parentheses are optional on multimethod and macro calls because they always parse like list operators. A rule may be called like a method but is normally invoked within a regex via the <rule> syntax.
[Update: Parentheses are optional on "postdeclared" subroutines as well, provided the post-declaration is consistent with listop syntax.]
As in Perl 5, within the list there may be an implicit transition from scalar to list context. For example, the declaration of the standard push built-in in Perl 6 probably looks like this:
multi *push (@array, *@list) {...}
but you still generally call it as you would in Perl 5:
push(@foo, 1, 2, 3);
This call has two of the three kinds of call arguments. It has one positional argument, followed by a variadic list. We could imagine adding options to push sometime in the future. We could define it like this:
multi *push (@array, ?$how, *@list) {...}
That's just an optional positional parameter, so you'd call it like this:
push(@foo, "rapidly", 1,2,3)
But that won't do, actually, since we used to allow the list to start at the end of the positional parameters, and any pre-existing push(@foo,1,2,3) call to the new declaration would end up mapping the "1" onto the new optional parameter. Oops...
If instead we force new parameters to be in named notation, like this:
multi *push (@array, *@list, +$how) {...}
then we can say:
push(@foo, how => "rapidly", 1,2,3)
and it's no longer ambiguous. Since $how is in the named-only zone, it can never be set positionally, and the old calls to:
push(@foo, 1,2,3);
still work fine, because *@list is still at the end of the positional parameter zone. If we instead declare that:
multi *push (@array, +$how, *@list) {...}
we could still say:
push(@foo, how => "rapidly", 1,2,3)
but this becomes illegal:
push(@foo, 1,2,3);
[Update: Actually, it's still legal now.]
because the slurpy array is in the named-only zone. We'll need an explicit way to indicate the start of the list in this case. I can think of lots of (mostly bad) ways. You probably can too. We'll come back to this...
So the actual arguments to a Perl function are of three kinds: positional, named, and list. Any or all of these parts may be omitted, but whenever they are there, they must occur in that order. It's more efficient for the compiler (and less confusing to the programmer) if all the positional arguments come before all the non-positional arguments in the list. Likewise, the named arguments are constrained to occur before the list arguments for efficiency--otherwise the implementation would have to scan the entire list for named arguments, and some lists are monstrous huge.
[Update: These are now viewed as suggestions rather than hard rules.]
We'd call these three parts "zones" as well, but then people would get them confused with our six declarative zones. In fact, extending the zoning metaphor a bit, our three parts are more like houses, stores, and factories (real ones, not OO ones, sheesh). These are the kinds of things you actually find in residential, commercial, and industrial zones. Similarly, you can think of the three different kinds of argument as the things you're allowed to bind in the different parameter zones.
A house is generally a scalar item that is known for its position; after all, "there's no place like home". Um, yeah. Anyway, we usually number our houses. In the US, we don't usually name our houses, though in the UK they don't seem to mind it.
A store may have a position (a street number), but usually we refer to stores by name. "I'm going out to Fry's" does not refer to a particular location, at least not here in Silicon Valley. "I'm going out to McDonald's" doesn't mean a particular location anywhere in the world, with the possible exception of "not Antarctica".
You don't really care exactly where a factory is--as long as it's not in your back yard--you care what it produces. The typical factory is for mass producing a series of similar things. In programming terms, that's like a generator, or a pipe...or a list. And you mostly worry about how you get vast quantities of stuff into and out of the factory without keeping the neighbors awake at night.
So our three kinds of arguments map onto the various parameter zones in a similar fashion.
Obviously, actual positional arguments are mapped onto the formal parameters in the order in which the formal positional parameters are declared. Invocant parameters (if any) must match invocant arguments, the required parameters match positional arguments, and then any additional non-named arguments are mapped onto the optional positional parameters. However, as soon as the first named argument is seen (that cannot be mapped to an explicitly typed Pair or Hash parameter) this mapping stops, and any subsequent positional parameters may only be bound by name.
[Update: Named arguments are now distinguished syntactically from Pair arguments, so it's not necessary to do type matching.]
After the positional argument part, you may pass as many named pairs as you like. These may bind to any formal parameter named in the declaration, whether declared as positional or named. However, it is erroneous to simultaneously bind a parameter both by position and by name. Perl may (but is not required to) give you a warning or error about this. If the problem is ignored, the positional parameter takes precedence, since the name collision might have come in by accident as a result of passing extra arguments intended for a different routine. Problems like this can arise when passing optional arguments to all the base classes of the current class, for instance. It's not yet clear how fail-soft we should be here.
Named arguments can come in either as Pair or Hash references. When the parameter mapper sees an argument that is neither a Pair nor a Hash, it assumes it's the end of the named part and the beginning of the list part.
All unbound named arguments are bound to elements of the slurpy hash, if one was declared. If no slurpy hash is declared, an exception is thrown (although some standard methods, like BUILD, will provide an implicitly declared slurpy hash--known as %_ by analogy to @_--to handle surplus named arguments).
At the end of named argument processing, any unmapped optional parameter ends up with the value undef unless a default value is declared for it. Any unmapped required parameter throws an exception.
All remaining arguments are bound to the slurpy array, if any. If no slurpy array is specified, any remaining arguments cause an exception to be thrown. (You only get an implicit *@_ slurpy array when the signature is omitted entirely. Otherwise we could never validly give the error "Too many arguments".)
No argument processing is done on this list. If you go back to using named pairs at the end of the list, for instance, you'll have to pop those off yourself. But since the list is potentially very long, Perl isn't going to look for those on your behalf.
Indeed, the list could be infinitely long, and maybe even a little longer than that. Perl 5 always flattens lists before calling the subroutine. In Perl 6, list flattening is done lazily, so a list could contain several infinite entries:
print(1..Inf, 1..Inf);
That might eventually give the print function heartburn, of course...
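The binding order described above (positional, then named, then list) can be modeled in a few lines. What follows is a minimal sketch in Python, not the real mapper. The Pair tuple, the bind function, and its keyword names are all hypothetical, Hash arguments are left out, and a parameter that is already bound simply keeps its first value.

```python
from collections import namedtuple

# Hypothetical stand-in for a named argument such as  how => "rapidly".
Pair = namedtuple("Pair", ["key", "value"])

def bind(args, required=(), optional=None, named_only=None,
         slurpy_hash=False, slurpy_array=False):
    """Return (bound, extra_named, rest) for a flat argument list.

    optional and named_only map parameter names to their defaults.
    """
    optional = dict(optional or {})
    named_only = dict(named_only or {})
    args = list(args)
    positional = list(required) + list(optional)
    bound = {}

    # Positional part: fill parameters in declaration order until a named
    # argument shows up or the positional parameters run out.
    for name in positional:
        if not args or isinstance(args[0], Pair):
            break
        bound[name] = args.pop(0)

    # Named part: consume pairs until the first argument that is not a pair.
    extra_named = {}
    declared = set(positional) | set(named_only)
    while args and isinstance(args[0], Pair):
        key, value = args.pop(0)
        if key in bound:
            continue                    # already bound; the first binding wins
        if key in declared:
            bound[key] = value
        elif slurpy_hash:
            extra_named[key] = value
        else:
            raise TypeError(f"unexpected named argument {key!r}")

    # Unmapped required parameters fail; unmapped optional ones get defaults.
    for name in required:
        if name not in bound:
            raise TypeError(f"missing required argument {name!r}")
    for name, default in {**optional, **named_only}.items():
        bound.setdefault(name, default)

    # List part: the remainder goes to the slurpy array, if one was declared.
    if args and not slurpy_array:
        raise TypeError("too many arguments")
    return bound, extra_named, args

foo = ["existing"]
# Optional positional before the list: the first list item lands in "how".
print(bind([foo, 1, 2, 3], required=["array"],
           optional={"how": None}, slurpy_array=True))
# Named-only "how": the whole list survives.
print(bind([foo, 1, 2, 3], required=["array"],
           named_only={"how": None}, slurpy_array=True))
```

The first call reproduces the push problem from earlier: with an optional positional parameter in front of the list, the 1 is swallowed as the value of how.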
There are, then, two basic transitions in argument processing. First is the transition from positional to named arguments. The second is from named arguments to the variadic list. It's also possible to transition directly from positional arguments to the variadic list if optional positional arguments have been completely specified. That is, the slurp array could just be considered the next optional positional parameter in that case, as it is in push.
But what if you don't want to fill out all the optional parameters, and you aren't planning to use named notation to skip the rest of them? How can you make both transitions simultaneously? There are two workarounds. First, suppose we have a push-like signature such as this:
sub stuff (@array, ?$how, *@list) {...}
The declarative workaround is to move the optional parameters after the slurp array, so that they are required to be specified as named parameters:
sub stuff (@array, *@list, +$how) {...}
Then you can treat the slurp array as a positional parameter. That's the solution we used to add an extra argument to push earlier, where the list always starts at the second argument.
[Update: The slurpy array is now always treated as a positional parameter even if there are named parameters intervening.]
On the calling end, you don't have any control of the declaration, but you can always specify one of the arguments as named, either the final positional one, or the list itself:
stuff(@foo, how => undef, 1,2,3)
stuff(@foo, list => (1,2,3))
The latter is clearer and arguably more correct, but it has a couple of minor problems. For one thing, you have to know what the parameter name is. It's all very well if you have to know the names of optional parameters, but every list operator has a list that you really ought to be able to feed without knowing its name.
So we'll just say that the actual name of the slurpy list parameter is "*@". You can always say this:
stuff(@foo, '*@' => (1,2,3))
[Update: Actually, it's the null name that maps to the slurpy list.]
That's still a lot of extra unnecessary cruft--but we can do better. List operators are like commands in Unix, where there's a command line containing a program name and some options, and streams of data coming in and going out via pipes. The command in this case is stuff, and the option is @foo, which says what it is we're stuffing. But what about the streams of stuff going in and out? Perl 6 has lazy lists, so they are in fact more like streams than they used to be.
There will be two new operators, called pipe operators, that allow us to hook list generators together with list consumers in either order. So either of these works:
stuff @foo <== 1,2,3
1,2,3 ==> stuff @foo
The (ir)rationale for this is provided in Appendix A.
To be sure, these newfangled pipe operators do still pass the list as a "*@"-named argument, because that allows indirection in the entire argument list. Instead of:
1,2,3 ==> stuff @foo
you can pull everything out in front, including the positional and named parameters, and build a list that gets passed as "splat" arguments (described in the next section) to stuff:
list(@foo, how => 'scrambled' <== 1,2,3)
==> stuff *;
In other words:
list(@foo, how => 'scrambled' <== 1,2,3) ==> stuff *;
is equivalent to:
list(@foo, how => 'scrambled' <== 1,2,3) ==> stuff *();
which is equivalent to:
stuff *(list(@foo, how => 'scrambled' <== 1,2,3));
The "splat" and the list counteract each other, producing:
stuff(@foo, how => 'scrambled' <== 1,2,3);
So what stuff actually sees is exactly as if you called it like this:
stuff(@foo, how => 'scrambled', '*@' => (1,2,3));
which is equivalent to:
stuff @foo, how => 'scrambled', 1, 2, 3;
And yes, the ==> and <== operators are big, fat, and obnoxiously noticeable. I like them that way. I think the pipes are important and should stand out. In postmodern architecture the ducts are just part of the deconstructed decor. (Just don't anyone suggest a ==>= operator. Just...don't.)
The ==> and <== operators have the additional side effect of forcing their blunt end into list context and their pointy end into scalar context. (More precisely, it's not the expression on the pointy end that is in scalar context, but rather the positional arguments of whatever list function is pointed to by the pointy end.) See Appendix A for details.
[Update: We also now have a slurpy routine, *&block, that binds to an anonymous adverbial block.]
As with Perl 5, the scalar arguments are evaluated in scalar context, while the list arguments are evaluated in list context. However, there are a few wrinkles.
Perl 5 has a syntax for calling a function without paying any attention to its prototype, but in Perl 6 that syntax has been stolen for a higher purpose (referential purity). Also, sometimes you'd like to be able to ignore part of a signature rather than the whole signature. So Perl 6 has a different notation, unary *, for disabling signature checking, which we've mentioned in earlier Apocalypses, and which you've already seen in the form of the stuff * above. (Our splat in the stuff * above is in fact unary, but the optional argument is missing, because the list is supplied via pipe.)
The first splatted term in an argument list causes all prior terms to be evaluated in scalar context, and all subsequent terms to be evaluated in list context. (Splat is a no-op in list context, so it doesn't matter if there are more splatted terms.) If the function wants more positional arguments, they are assumed to come from the generated list, as if the list had been specified literally in the program at that point as comma-separated values.
[Update: This is now done using the [,] reduce operator. For * and "splat" below read [,] instead.]
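For readers who think better in running code, here is a rough analogy in Python rather than Perl 6. Python's unary * is its own feature, not an implementation of ours, and Python has no scalar or list context, so that half of the story is not modeled. The sketch illustrates only the last point above: when a function wants more positional arguments than precede the splat, they are taken from the splatted list as if its elements had been written out literally. The names stuff, target, first, and items are made up for the illustration.

```python
def stuff(target, first, *items, how="plain"):
    """Hypothetical list operator: two positionals, then a slurpy list."""
    return (target, first, items, how)


more_args = [1, 2, 3]

# The splatted list supplies the missing positional (first) and the slurpy rest.
splatted = stuff("foo", *more_args, how="scrambled")

# The same call with the list written out literally as comma-separated values.
literal = stuff("foo", 1, 2, 3, how="scrambled")
```

Both calls bind first to 1 and leave 2 and 3 for the slurpy list, which is the behavior the paragraph describes for the generated list.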
With splat lists, some of the argument processing may have to be deferred from compile time to runtime, so in general such a call may run slower than the ordinary form.
If Perl can't figure out the signature of a function at compile time (because, for instance, it's a method and not a function), then it may not be known which arguments are in scalar or list context at the time they are evaluated. This doesn't matter for Perl variables, because in Perl 6, they always return a reference in either scalar or list context. But if you call a function in such an indeterminate context, and the function doesn't have a return value declared that clarifies whether the function behaves differently in scalar or list context, then one of two things must happen. The function must either run in an indeterminate context, or the actual call to the function must be delayed until the context is known. It is not yet clear which of these approaches is the lesser evil. It may well depend on whether the function pays more attention to its dynamic context or to global values. A function with no side effects and no global or dynamic dependencies can be called whenever we like, but we're not here to enforce the functional paradigm. Interesting functions may pay attention to their context, and they may have side effects such as reading from an input stream in a particular order.
A variant of running in indeterminate context is to simply assume the function is running in list context. (That is, after all, what Perl 5 does on methods and on not-yet-declared subroutines.) In Perl 6, we may see most such ambiguities resolved by explicit use of the <== operator to force preceding args into scalar context, and the following args into list context. Individual arguments may also be forced into scalar or list context, of course.
By the way, if you mix unary splat with <==, only the args to the left of the splat are forced into scalar context. (It can do this because <== governs everything back to the list operator, since it has a precedence slightly looser than comma.) So, given something like:
@moreargs = (1,2,3);
mumble $a, @b, c(), *@moreargs <== @list;
we can tell just by looking that $a, @b, and c() are all evaluated in scalar context, while @moreargs and @list are both in list context. It is parsed like this:
mumble( ($a, @b, c(), (*@moreargs)) <== (@list) );
You might also write that like this:
@moreargs = list(1,2,3 <== @list);
mumble $a, @b, c(), *@moreargs;
In this case, we can still assume that $a, @b, and c() are in scalar context, because as we mentioned in the previous section, the * forces it. (That's because there's no reason to put the splat if you're already in list context.)
Before we continue, you probably need a break. Here, have a break:
*******************************************************
******************** Intermission *********************
*******************************************************
Welcome back.
We've covered the basics up till now, but there are a number of miscellaneous variations we left out in the interests of exposition. We'll now go back to visit some of those issues.
Sometimes you want to specify that the variadic list has a particular recurring type, or types. This falls out naturally from the slurp array syntax:
sub list_of_ints ($a, $b, Int *@ints) { ... }
sub list_of_scalars (Scalar *@scalars) { ... }
These still evaluate the list in list context. But if you declare them as:
sub intlist ($a, $b, Int *@ints is context(Int)) { ... }
sub scalarlist (Scalar *@scalars is context(Scalar)) { ... }
then these provide a list of Int or Scalar contexts to the caller. If you call:
scalarlist(@foo, %bar, baz())
you get two scalar references and the scalar result of baz(), not a flattened list. You can have lists without list context in Perl 6!
[Update: But only on ordinary subs.]
If you want to have alternating types in your list, you can. Just specify a tuple type on your context:
sub strintlist( *@strints is context(Str,Int)) { ... }
Perl 5's list context did not do lazy evaluation, but always flattened immediately. In Perl 6 the default list context "is context(Lazy)". But you can specify "is context(Eager)" to get back to Perl 5 semantics of immediate flattening.
As a sop to the Perl5-to-Perl6 translator (and to people who have to read translated programs), the Eager context can also be specified by doubling the slurpy * on the list to make it look like a pair of rollers that will squish anything flat:
sub p5func ($arg, **@list) { ... }
The "eager splat" is also available as a unary operator to attempt eager flattening on the rvalue side:
@foo = **1..Inf; # Test our "out of memory" handler...
[Update: this is now just done with the eager listop.]
It's often the case that you'd like to treat a single array argument as if it were an argument list of its own. Well, you can. Just put a sublist signature in square brackets. This is particularly good for declaring multimethods in a functional programming mindset:
multi apply (&func, []) { }
multi apply (&func, [$head, *@tail]) {
return func($head), apply(&func, @tail);
}
@squares := apply { $_ * $_ } [1...];
Of course, in this case, the first multimethod is never called because the infinite list is never null no matter how many elements we pull off the front. But that merely means that @squares is bound to an infinite list generator. No big deal, as long as you don't try to flatten the list...
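The head-and-tail recursion over an infinite list can be approximated in Python with a generator. This is an analogy under stated assumptions, not Perl 6 semantics: the early return stands in for the empty-list variant, the recursive step stands in for the [$head, *@tail] variant, and itertools.count(1) plays the role of the infinite list.

```python
from itertools import count, islice


def apply(func, items):
    """Lazily yield func(head), then recurse on the tail of items."""
    iterator = iter(items)
    try:
        head = next(iterator)
    except StopIteration:
        return  # the empty-list case: nothing left to produce
    yield func(head)
    yield from apply(func, iterator)  # the tail is whatever remains unread


# Bound to an infinite generator; nothing is computed until values are pulled.
squares = apply(lambda x: x * x, count(1))
first_five = list(islice(squares, 5))
```

As in the text, the empty case is never reached for an infinite source, and everything is fine as long as nobody tries to flatten the whole thing. Note that each element adds one level of nested generator here, so this sketch is only suitable for pulling a modest number of values.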
Note that, unlike the example in the previous section which alternated strings and integers, this:
sub strintlist( [Str, Int] *@strints ) { ... }
implies single array references coming in, each containing a string and an integer.
Of course, this may be a bad example insofar as we could just write:
multi apply (&func) { }
multi apply (&func, $head, *@tail) {
return func($head), apply(&func, *@tail);
}
@squares := apply { $_ * $_ } *1...;
It'd be nice to lose the * though on the calls. Maybe what we really want is a slurpy scalar in front of the slurpy array, where presumably the <== maps to the first slurpy scalar or hash (or it could be passed positionally):
multi apply (&func) { }
multi apply (&func, *$head, *@tail) {
return func($head), apply(&func <== @tail);
}
@squares := apply { $_ * $_ } 1...;
Yow, I think I could like that if I tried.
So let's say for now that a slurpy scalar parameter just pulls the first (or next) value off of the slurpy list. The [] notation is still useful though for when you really do have a single array ref coming in as a parameter.
[Update: A slurpy scalar might also be bound to an unnamed adverbial block if there is no slurpy block to bind it to. Since named parameter processing precedes slurpy list processing, any named parameter bound to an adverbial block is automatically excluded from binding to the slurpy list.]
It is typical in many languages to see object initializers that look like this (give or take a keyword):
function init (a_arg, b_arg, c_arg) {
a = a_arg;
b = b_arg;
c = c_arg;
}
Other languages try to improve the situation without actually succeeding. In a language resembling C++, it might look more like this:
method init (int a_arg, int b_arg, int c_arg)
: a(a_arg), b(b_arg), c(c_arg) {}
But there's still an awful lot of redundancy there, not to mention inconsistent special syntax.
Since (as proven by Perl 5) signatures are all about syntactic sugar anyway, and since Perl 6 intentionally makes attribute variables visually distinct from ordinary variables, we can simply write this in Perl 6 as:
submethod BUILD ($.a, $.b, $.c) {}
Any parameter that appears to be an attribute is immediately copied directly into the corresponding object attribute, and no lexical parameter is generated. You can mix these with ordinary parameters--the general rule of thumb for an initializer is that you should see each dotted attribute at least once:
submethod BUILD ($.a, $.b, $c) {
$.c = mung($c);
}
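A loose Python analogy, for comparison only: a dataclass copies its declared fields straight into attributes, while an InitVar behaves like the ordinary parameter $c, handed to an initializer hook instead of being stored. The class name Thing, the parameter name c_arg, and the body of mung are all stand-ins invented for this sketch.

```python
from dataclasses import InitVar, dataclass, field


def mung(value):
    """Stand-in for the mung() call in the text; here it merely doubles."""
    return value * 2


@dataclass
class Thing:
    a: int                      # copied directly, like $.a
    b: int                      # copied directly, like $.b
    c_arg: InitVar[int]         # ordinary parameter, like $c
    c: int = field(init=False)  # attribute set by hand below, like $.c

    def __post_init__(self, c_arg):
        self.c = mung(c_arg)


thing = Thing(a=1, b=2, c_arg=3)
```

Each attribute shows up exactly once as a field, which is the same rule of thumb given above for dotted attributes in an initializer.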
This feature is primarily intended for use in constructors and initializers, but Perl does not try to guess which subroutines fall into that category (other than the fact that Perl 6 will implicitly call certain conventional names like CREATE and BUILD).
However, submethods such as BUILD are assumed to have an extra *%_ parameter to soak up any extra unrecognized named arguments. Ordinarily you must declare a slurp-hash explicitly to get that behavior. But BUILD submethods are always called with named arguments (except for the invocant), and often have to ignore arguments intended for other classes participating in the current construction. It's likely that this implicit *%_ feature extends to other routines declared in all-caps as well, and perhaps all submethods.
[Update: Turns out that all methods and submethods work this way.]
As in Perl 5, subroutines declared in all-caps are expected to be called automatically most of the time--but not necessarily all the time. The BUILD routine is a good example, because it's only called automatically when you rely on the default class initialization rules. But you can override those rules, in which case you may have to call BUILD yourself. More on that in Apocalypse 12. Or go to one of Damian's Perl 6 talks...
All blocks are considered closures in Perl 6, even the blocks that declare modules or classes (presuming you use the block form). A closure is just an anonymous subroutine that has access to its lexical context. The fact that some closures are immediately associated with names or have other kinds of parameter declarations does not change the fact that an anonymous bare block without parameters is also a kind of subroutine. Of course, if the compiler can determine that the block is only executed inline, it's free to optimize away all the subroutine linkage--but not the lexical linkage. It can only optimize away the lexical linkage if no external lexicals are accessed (or potentially accessed, in the case of eval).
As introduced in Apocalypse 4, loops and topicalizers are often written with a special form of closure declaration known these days as "pointy subs". A pointy sub is exactly equivalent to a standard anonymous sub declaration having the same parameters. It's almost pure syntactic sugar--except that we embrace syntactic sugar in Perl when it serves a psychological purpose (not to be confused with a logical psycho purpose, which we also have).
Anyway, when you say:
-> $a, $b, $c { ... }
it's almost exactly the same as if you'd said:
sub ($a, $b, $c) { ... }
only without the parentheses, and with the cute arrow that indicates the direction of data flow to that part of your brain that consumes syntactic glucose at a prodigious rate.
Since the parentheses around the signature are missing, you can't specify anything that would ordinarily go outside the parentheses, such as the return type or other subroutine traits. But you may still put traits or zone markers on each individual formal parameter.
Also, as a "sub-less" declaration, you can't return from it using return, because despite being a closure, it's supposed to look like a bare Block embedded in a larger Routine, and users will expect return to exit from the "real" subroutine. All of which just means that, if you need those fancy extras, use a real sub sub, not a pointy one.
Also as discussed in Apocalypse 4, a bare block functioning as a closure can have its parameters declared internally. Such parameters are of the form:
rule placeholder { <sigil> \^ <ident> }
Placeholder parameters are equivalent to required positional parameters declared in alphabetical order. (Er, Unicodical order, really.) For example, the closure:
{ $^fred <=> $^barney }
has the same signature as the pointy sub:
-> $barney, $fred { $fred <=> $barney }
or the standard anonymous sub:
sub ($barney, $fred) { $fred <=> $barney }
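The derivation of a signature from placeholders can be sketched in a few lines of Python. This is a toy text scanner, not how any Perl 6 compiler works: it assumes placeholders look like the rule above (a sigil, a caret, an identifier), that identifiers are plain ASCII words, and that sorting ignores the sigil, none of which the text pins down. The function name placeholder_signature is invented.

```python
import re

# sigil, caret, identifier: mirrors the placeholder rule given above
PLACEHOLDER = re.compile(r"([$@%&])\^([A-Za-z_]\w*)")


def placeholder_signature(block_source):
    """Return the implied positional parameters, sorted by code point."""
    names = {sigil + ident for sigil, ident in PLACEHOLDER.findall(block_source)}
    # Python compares strings by code point, i.e. "Unicodical" order.
    return sorted(names, key=lambda name: name[1:])


signature = placeholder_signature("{ $^fred <=> $^barney }")
```

Using a set means a placeholder mentioned twice in the block still yields only one parameter.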
On first hearing about the alphabetical sorting policy, some otherwise level-headed folks immediately panic, imagining all sorts of ways to abuse the mechanism for the purposes of obfuscation. And surely there are many ways to abuse many of the features in Perl, more so in Perl 6. The point of this mechanism, however, is to make it drop-dead easy to write small, self-contained closures with a small number of parameters that you'd probably give single-character alphabetical names to in any event. If you want to get fancier than that, you should probably be using a fancier kind of declaration. I define "small number" as approximately e ± π. But as is generally the case in Perl, you get to pick your own definition of "small number". (Or at the very least, you get to pick whether to work with a company that has already defined "small number" for you.)
As bare rvalue variables embedded in the code, you may not put any traits or zone markers on the placeholders. Again, the desire to do so indicates you should be using a fancier form of declaration.
Perl 5 just used subroutines for methods. This is okay as long as you don't want to declare any utility subroutines in your class. But as soon as you do, they're inherited in Perl 5, which is not what you want. In Perl 6, methods and subroutines still share the same namespace, but a method must be declared using the method keyword. This is good documentation in any event, and further allows us to intuit an invocant where none is declared. (And we know that none is declared if there's no colon after the first argument, at least in the case of an ordinary method.)
There are certain implementation methods that want to be inherited in general so that you can specify a default implementation, but that you want the class to be able to override without letting derived classes inherit the overridden method from this class. That is, they are scoped like utility subroutines, but can be called as if they are methods, without being visible outside the class. We call these hybrids "submethods", and so there's a submethod keyword to declare them. Submethods are simultaneously subs and methods. You can also think of them as something less than a method, as the "sub" works in the word "subhuman". Or you can think of them as underneath in the infrastructural sense, as in "subterranean".
Routines that create, initialize, or destroy the current object tend to fall into this category. Hence, the BUILD routine we mentioned earlier is ordinarily declared as a submethod, if you don't want to inherit the standard BUILD method defined in the Object class. But if you override it, your children still inherit BUILD from Object.
Contrariwise, if you don't like Object's default BUILD method, you can define an entire new class of classes that all default to your own BUILD method, as long as those classes derive from your new base object with superior characteristics. Each of those derived classes could then define a submethod to override your method only for that class, while classes derived from those classes could still inherit your default.
And so on, ad OOium.
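One way to picture the scoping of submethods is a lookup that consults only the object's own class before falling back to ordinary inheritance. The following Python sketch is an analogy under that assumption; the SUBMETHODS table, the dispatch helper, and the class names are all hypothetical and are not part of any Perl 6 design.

```python
class BaseObject:
    def BUILD(self):
        return "default BUILD"  # ordinary method: inherited by everyone


class Child(BaseObject):
    # Hypothetical per-class table: visible for Child itself, never inherited.
    SUBMETHODS = {"BUILD": lambda self: "Child-only BUILD"}


class Grandchild(Child):
    pass


def dispatch(obj, name):
    """Try the exact class's submethods first, then normal method lookup."""
    own = type(obj).__dict__.get("SUBMETHODS", {})
    if name in own:
        return own[name](obj)
    return getattr(obj, name)()


child_result = dispatch(Child(), "BUILD")
grandchild_result = dispatch(Grandchild(), "BUILD")
```

Child gets its own override, while Grandchild skips it and inherits the default from the base class, as described above for BUILD.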
Some kinds of programming map easily onto the standard model in which a method has a single invocant. Other kinds of programming don't. Perl 6 supplies support for the latter kind of programming, where the relationships between classes are just as interesting as the classes themselves. In some languages, all methods are multimethods. Perl 6 doesn't go quite that far--you must declare your multimethods explicitly. To do so, use the multi keyword in place of method, and optionally place a colon after the list of invocants in the declaration, unless you want them all to be invocants. Then your multimethod will be registered globally as being of interest to all the types of its invocants, and will participate in multimethod dispatch.
It is beyond the scope of this Apocalypse to specify exactly how multimethod dispatch works (see Apocalypse 12, someday), but we can tell you that, in general, you call a multimethod as if it were an ordinary subroutine, and the dispatcher figures out on your behalf how many of the arguments are invocants. This may sound fancy to you, but many of the functions that are built into Perl 5 are not built into Perl 6, at least, not as keywords. Instead they are either defined as global subroutines or as multimethods, single invocant multimethods in many cases. When you call a function like close($handle), it'll first look to see if there's a close subroutine defined in your scope, and if not, it will dispatch it as a multimethod. Likewise, for something like sysread, you can call it either as a method:
sysread $handle: $buffer, $length
or as a function:
sysread $handle, $buffer, $length
In the first case, it's explicitly dispatching on the handle, because a colon in place of the first comma indicates an invocant. (That's our new indirect object syntax, in fact. Perl 6 does not support the Perl 5 syntax of just leaving whitespace between the indirect object and the subsequent arguments.)
In the second case, it looks for a sysread subroutine, doesn't find it (we hope), and calls multimethod dispatch on it. And it happens that the multimethod dispatch is smart enough to find the ordinary single-invocant sysread method, even though it may not have been explicitly declared a multimethod. Multimethod dispatch happens to map directly onto ordinary method dispatch when there's only one invocant.
At least, that's how it works this week...
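The fallback just described (look for a subroutine in scope first, otherwise dispatch on the first argument) can be sketched in Python. This models only the single-invocant case and says nothing about real multimethod dispatch, which is out of scope here; the Handle class, the scope dictionary, and call_function are invented for the illustration.

```python
class Handle:
    """Hypothetical handle with an ordinary single-invocant method."""

    def __init__(self, data):
        self.data = data

    def sysread(self, length):
        return self.data[:length]


def call_function(name, scope, *args):
    """Prefer a subroutine in scope; otherwise dispatch on the first argument."""
    sub = scope.get(name)
    if sub is not None:
        return sub(*args)
    invocant, *rest = args
    return getattr(invocant, name)(*rest)


handle = Handle("abcdef")
via_method = call_function("sysread", {}, handle, 3)
via_sub = call_function("sysread", {"sysread": lambda h, n: "shadowed"}, handle, 3)
```

With an empty scope the call lands on the method; once a subroutine of that name is in scope, it wins.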
Rules were discussed in Apocalypse 5. They are essentially methods with an implicit invocant, consisting of the object containing the current pattern matching context. To match the internals of regex syntax, traits attached to rules are typically written as ":w" rather than "is w", but they're essentially the same thing underneath.
It's possible to call a rule as if it were a method, as long as you give it the right arguments. And a method defined in a grammar can be called as if it were a rule. They share the same namespace, and a rule really is just a method with a funny syntax.
A macro is a function that is called immediately upon completion of the parsing of its arguments. Macros must be defined before they are used--there are no forward declarations of macros, and while a macro's name may be installed in either a package or a lexical scope, its syntactic effect can only be lexical, from the point of declaration (or importation) to the end of the current lexical scope.
Every macro is associated (implicitly or explicitly) with a particular grammar rule that parses and reduces the arguments to the macro. The formal parameters of a macro are special in that they must be derived somehow from the results of that associated grammar rule. We treat macros as if they were methods on the parse object returned by the grammar rule, so the first argument is passed as if it were an invocant, and it is always bound to the current parse tree object, known as $0 in Apocalypse 5. (A macro is not a true method of that class, however, because its name is in your scope, not the class's.)
[Update: That's now the $/ object. $0 has been "demoted" to being the first submatch.]
Since the first parameter is treated as an invocant, you may either declare it or leave it implicit in the actual declaration. In either case, the parse tree becomes the current topic for the macro. Hence you may refer to it as either $_ or $0, even if you don't give it a name.
[Update: $/ would return the parse tree. $0 would only return the first submatch.]
Subsequent parameters may be specified, in which case they bind to internal values of $0 in whatever way makes sense. Positional parameters bind to $1, $2, etc. Named parameters bind to named elements of $0. A slurpy hash is really the same as $0, since $0 already behaves as a hash. A slurpy array gets $1, $2, etc., even if already bound to a positional parameter.
[Update: For $0 read $/ above.]
A macro can do anything it likes with the parse tree, but the return value is treated specially by the parser. You can return one of several kinds of values:
[Update: instead of trying to dwim a bare closure, we now have a "code" quasiquote that returns a parse tree. See S06.]
undef, indicating that the macro is only used for its side effects. Such a macro would be one way of introducing an alternate commenting mechanism, for instance. I suppose returning "" has the same effect, though.
A macro by default parses any subsequent text using whatever macro rule is currently in effect. Generally this will be the standard Perl::macro rule, which parses subsequent arguments as a list operator would--that is, as a comma-separated list with the same policy on using or omitting parentheses as any other list operator. This default may be overridden with the "is parsed" trait.
[Update: There is probably a different default macro rule for each syntactic category. An infix macro wants to parse the right argument as a single value, not a list, for instance.]
If there is no signature at all, macro defaults to using the null rule, meaning it looks for no argument at all. You can use it for simple word substitutions where no argument processing is needed. Instead of the long-winded:
my macro this () is parsed(/<null>/) { "self" }
you can just quietly turn your program into C++:
my macro this { "self" }
A lot of Perl is fun, and macros are fun, but in general, you should never use a macro just for the fun of it. It's far too easy to poke someone's eye out with a macro.
Certain kinds of routines want extra parameters in addition to the ordinary parameter list. Autoloading routines for instance would like to know what function the caller was trying to call. Routines sensitive to topicalizers may wish to know what the topic is in their caller's lexical scope.
There are several possible approaches. The Perl 5 autoloader actually pokes a package variable into the package with the AUTOLOAD subroutine. It could be argued that something that's in your dynamic scope should be accessed via dynamically scoped variables, and indeed we may end up with a $*AUTOLOAD variable in Perl 6 that works somewhat like Perl 5's, only better, because AUTOLOAD kinda sucks. We'll address that in Apocalypse 10, for some definition of "we".
Another approach is to give access to the caller's lexical scope in some fashion. The magical caller() function could return a handle by which you can access the caller's my variables. And in general, there will be such a facility under the hood, because we have to be able to construct the caller's lexical scope while it's being compiled.
In the particular case of grabbing the topic from the caller's lexical scope (and it has to be in the caller's lexical scope because $_ is now lexically scoped in Perl 6), we think it'll happen often enough that there should be a shorthand for it. Or maybe it's more like a "midhand". We don't want it too short, or people will unthinkingly abuse it. Something on the order of a CALLER:: prefix, which we'll discuss below.
Works just like in Perl 5. Why change something that works?
Well, okay, we are tweaking a few things related to lexical scopes. $_ (also known as the current topic) is always a lexically scoped variable now. In general, each subroutine will implicitly declare its own $_. Methods, submethods, macros, rules, and pointy subs all bind their first argument to $_; ordinary subs declare a lexical $_ but leave it undefined. Every sub definition declares its own $_ and hides any outer $_. The only exception is bare closures that are pretending to be ordinary blocks and don't commandeer $_ for a placeholder. These continue to see the outer scope's $_, just as they would any other lexically scoped variable declared in the outer scope.
[Update: Methods and subs no longer bind their first argument to $_ by default.]
On the flipside, $_ is no longer visible in the dynamic context. You can still temporize (localize) it, but you'll be temporizing the current subroutine's lexical $_, not the global $_. Routines which used to use dynamic scoping to view the $_ of a calling subroutine will need some tweaking. See CALLER:: below.
caller function
As in Perl 5, the caller function will return information about the dynamic context of the current subroutine. Rather than always returning a list, it will return an object that represents the selected caller's context. (In a list context, the object can still return the old list as Perl 5-ers are used to.) Since contexts are polymorphic, different context objects might in fact supply different methods. The caller function doesn't have to know anything about that, though.
What caller does know in Perl 6 is that it takes an optional argument. That argument says where to stop when scanning up the call stack, and so can be used to tell caller which kinds of context you're interested in. By default, it'll skip any "wrapper" functions (see "The .wrap method" below) and return the outermost context that thought it was calling your routine directly. Here's a possible declaration:
multi *caller (?$where = &CALLER::_, Int +$skip = 0, Str +$label)
returns CallerContext {...}
The $where argument can be anything that matches a particular context, including a subroutine reference or any of these Code types:
Code Routine Block Sub Method Submethod Multi Macro Bare Parametric
&_ produces a reference to your current Routine, though in the signature above we have to use &CALLER::_ to get at the caller's &_.
[Update: Since the caller's sub is now named &?ROUTINE, that'd presumably be CALLER::<&?ROUTINE> instead.]
Note that use of caller can prevent certain kinds of optimizations, such as tail recursion elimination.
want function
The want function is really just the caller function in disguise. It also takes an argument telling it which context to pay attention to, which defaults to the one you think it should default to. It's declared like this:
multi *want (?$where = &CALLER::_, Int +$skip = 0, Str +$label)
returns WantContext {...}
Note that, as a variant of caller, use of want can prevent certain kinds of optimizations.
When want is called in a scalar context:
$primary_context = want;
it returns a synthetic object whose type behaves as the junction of all the valid contexts currently in effect, whose numeric overloading returns the count of arguments expected, and whose string overloading produces the primary context as one of 'Void', 'Scalar', or 'List'. The boolean overloading produces true unless in a void context.
When want is called in a list context like this:
($primary, $count, @secondary) = want;
it returns a list of at least two values, indicating the contexts in which the current subroutine was called. The first two values in the list are the primary context (i.e., the scalar return value) and the expectation count (see Expectation counts below). Any extra contexts that want may detect (see Valid contexts below) are appended to these two items.
When want is used as an object, it has methods corresponding to its valid contexts:
if want.rw { ... }
unless want.count < 2 { ... }
when want.List { ... }
The want function can be used with smart matching:
if want ~~ List & 2 & Lvalue { ... }
Which means it can also be used in a switch:
given want {
when List & 2 & Lvalue { ... }
when .count > 2 {...}
}
The numeric value of the want object is the "expectation count". This is an integer indicating the number of return values expected by the subroutine's caller. For void contexts, the expectation count is always zero; for scalar contexts, it is always zero or one; for list contexts it may be any non-negative number. The want value can simply be used as a number:
if want >= 2 { return ($x, $y) } # context wants >= 2 values
else { return ($x); } # context wants < 2 values
Note that Inf >= 2 is true. (Inf is not the same as undef.) If the context is expecting an unspecified number of return values (typically because the result is being assigned to an array variable), the expectation count is Inf. You shouldn't actually return an infinite list, however, unless want ~~ Lazy. The opposite of Lazy context is Eager context (the Perl 5 list context, which always flattened immediately). Eager and Lazy are subclasses of List.
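Python cannot see its caller's context, so the following sketch builds the want object by hand and passes it in explicitly; that is the main liberty taken. It shows only the overloading described above: string value for the primary context, truth value false in void context, and a numeric value for the expectation count, with math.inf standing in for Inf. The class Want and the function pick_results are hypothetical.

```python
import math


class Want:
    """Hand-built stand-in for the object that want would return."""

    def __init__(self, primary, count):
        self.primary = primary  # 'Void', 'Scalar', or 'List'
        self.count = count      # expectation count; math.inf if unbounded

    def __str__(self):
        return self.primary

    def __bool__(self):
        return self.primary != "Void"

    def __float__(self):
        return float(self.count)


def pick_results(want, x, y):
    """Return two values only if the (simulated) caller expects at least two."""
    if float(want) >= 2:
        return (x, y)
    return (x,)


array_assignment = Want("List", math.inf)
scalar_assignment = Want("Scalar", 1)
```

An unbounded expectation count compares as at least 2, matching the remark that Inf >= 2 is true.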
The valid contexts are pretty much as listed in RFC 21, though to the extent that the various contexts can be considered types, they can be specified without quotes in smart matches. Also, types are not all-caps any more. We know we have a Scalar type--hopefully we also get types or pseudo-types like Void, List, etc. The List type in particular is an internal type for the temporary lists that are passed around in Perl. Preflattened lists are Eager, while those lists that are not preflattened are Lazy. When you call @array.specs, for instance, you actually get back an object of type Lazy. Lists (Lazy or otherwise) are internal generator objects, and in general you shouldn't be doing operations on them, but on the arrays to which they are bound. The bound array manages its hidden generators on your behalf to "harden" the abstract list into concrete array values on demand.
[Update: List contexts are no longer required to keep track of how many arguments they want. The only meaningful values are 0, 1, and infinity.]
CALLER:: pseudopackage
Just as the SUPER:: pseudopackage lets you name a method somewhere in your set of superclasses, the CALLER:: pseudoclass lets you name a variable that is in the lexical scope of your (dynamically scoped) caller. It may not be used to create a variable that does not already exist in that lexical scope. As such, it is primarily intended for a particular variable that is known to exist in every caller's lexical scope, namely $_. Your caller's current topic is named $CALLER::_. Your caller's current Routine reference is named &CALLER::_.
[Update: The latter is now CALLER::<&?ROUTINE>.]
Note again that, as a form of caller, use of CALLER:: can prevent certain kinds of optimizations. However, if your signature uses $CALLER::_ as a default value, the optimizer may be able to deal with that as a special case. If you say, for instance:
sub myprint (IO $handle, *@list = ($CALLER::_)) {
print $handle: *@list;
}
then the compiler can just turn the call:
myprint($*OUT);
into:
myprint($*OUT, $_);
Our earlier example of trim might want to default the first argument to the caller's $_. In which case you can declare it as:
sub trim ( Str ?$_ is rw = $CALLER::_, Rule ?$remove = /\s+/ ) {
s:each/^ <$remove> | <$remove> $//;
}
which lets you call it like this:
trim; # trims $_
or even this:
trim remove => /\n+/;
Do not confuse the caller's lexical scope with the callee's lexical scope. In particular, when you put a bare block into your program that uses $_ like this:
for @array {
mumble { s/foo/bar/ };
}
the compiler may not know whether or not the mumble routine is intending to pass $_ as the first argument of the closure, which mumble needs to do if it's some kind of looping construct, and doesn't need to do if it's a one-shot. So such a bare block actually compiles down to something like this:
for @array {
mumble(sub ($_ is rw = $OUTER::_) { s/foo/bar/ });
}
(If you put $CALLER::_ there instead, it would be wrong, because that would be referring to mumble's $_.)
With $OUTER::_, if mumble passes an argument to the block, that argument becomes $_ each time mumble calls the block. Otherwise, it's just the same outer $_, as if ordinary lexical scoping were in effect. And, indeed, if the compiler knows that mumble takes a sub argument with a signature of (), it may optimize it down to ordinary lexical scoping, and if it has a signature of ($), it can assume it doesn't need the default. A signature of (?$) means all bets are off again.
[Update: the CALLER:: mechanism has been refined into the ENV:: variable system. Only variables declared with the env declarator are visible to callees, but implicitly declared variables like $_ and $/ are included in this category. CALLER:: still works the same, but ENV:: searches all the way up the dynamic call stack for a lexically scoped variable of that name that has been declared with env. Just as $*foo is shorthand for GLOBAL::<$foo>, so too $+foo is shorthand for ENV::<$foo>.]
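As a loose analogy only, the CALLER:: lookup resembles reading a local variable out of the calling frame. The Python sketch below relies on `sys._getframe`, a CPython-specific function. The names `caller_var`, `myprint`, `loop_body`, and `topic` are hypothetical stand-ins, with `topic` playing the part of `$_`. Like CALLER::, the helper reads an existing variable and refuses to invent one.

```python
import sys


def caller_var(name, depth=1):
    """Read `name` from the local scope of the caller of the calling routine."""
    # Frame 0 is caller_var itself and frame 1 is the routine asking,
    # so that routine's caller sits at depth + 1.
    frame = sys._getframe(depth + 1)
    try:
        if name not in frame.f_locals:
            # Mirrors the rule that CALLER:: may not create a variable.
            raise NameError(f"caller has no variable named {name!r}")
        return frame.f_locals[name]
    finally:
        del frame  # avoid keeping a reference cycle through the frame


def myprint(*items):
    # Default the list to the caller's topic, like *@list = ($CALLER::_).
    if not items:
        items = (caller_var("topic"),)
    return " ".join(str(item) for item in items)


def loop_body():
    topic = "hello"  # stands in for the caller's $_
    return myprint()
```

The frame lookup happens at run time on every call, which is the cost the text warns about. The special case described above lets a compiler rewrite the call site instead.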
return/leave returns to
A return statement needs to return to where the user thinks it ought to return to. Since any block is a closure, any block is really a subroutine in disguise. But the user doesn't generally want return to return from the innermost block, but from the innermost block that was actually defined using an explicit sub-ish keyword. So that's what Perl 6 does. If it can, it will implement the return internally as a simple jump to the end of the subroutine. If it can't, it implements return by throwing a control exception that is caught by the proper context frame.
There will be a leave function that can return from other scopes. By default it exits from the innermost block (anything matching base class Code), but, as with caller and want, you can optionally select the scope you want to return from. It's declared like this:
multi leave (?$where = Code, *@value, Int +$skip, Str +$label) {...}
which lets you say things like:
leave;
leave Block;
leave &_ <== 1,2,3; # same as "return 1,2,3"
leave where => Parametric, value => (1,2,3);
leave Loop, label => 'LINE', $retval;
leave { $_ ~~ Block and $_ !~ Sub } 1,2,3;
leave () <== 1,2,3;
As it currently stands, the parens aren't optional on that last one, because <== is a binary operator. You could always define yourself a "small" return, ret, that leaves the innermost block:
my macro ret { "leave Code <== " }
# and later...
{ ret 1,2,3 }
Note that unlike a return, leave always evaluates any return value in list context. Another thing to iron out is that the context we choose to leave must have set up an exception handler that can handle the control exception that leave must in some cases throw. This seems to imply that any context must minimally catch a control exception that is bound to its own identity, since leave is doing the picking, not the exception handlers.
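The control-exception idea can be sketched in Python under some stated assumptions. The `Scope` and `LeaveException` classes are hypothetical names, and the scope to leave is passed around explicitly rather than selected by type or label. Each scope catches only the exception bound to its own identity and lets every other one propagate outward.

```python
class LeaveException(Exception):
    """Control exception carrying the target scope and the values to return."""

    def __init__(self, target, values):
        super().__init__()
        self.target = target
        self.values = values


class Scope:
    """Hypothetical stand-in for a block that can be left early."""

    def __init__(self, label):
        self.label = label

    def run(self, body):
        try:
            return body(self)
        except LeaveException as exc:
            if exc.target is not self:
                raise  # aimed at some outer scope, so keep unwinding
            return exc.values

    def leave(self, *values):
        # Values are always collected as a list, echoing list context.
        raise LeaveException(self, list(values))


def demo():
    trace = []

    def outer_body(outer):
        def inner_body(inner):
            trace.append("inner start")
            outer.leave(1, 2, 3)  # skips the rest of both blocks
            trace.append("inner end")

        Scope("inner").run(inner_body)
        trace.append("outer end")

    result = Scope("outer").run(outer_body)
    return result, trace
```

Here the inner scope sees an exception that is not its own and re-raises it, so only the outer scope turns it into a return value.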
.wrap method
You may ask a subroutine to wrap itself up in another subroutine in place, so that calls to the original are intercepted and interpreted by the wrapper, even if access is only through the reference:
$id = $subref.wrap({
# preprocessing here
call;
# postprocessing here
})
The call built-in knows how to call the inner function that this function is wrapped around. In a void context, call arranges for the return value of the wrapped routine to be returned implicitly. Alternately, you can fetch the return value yourself from call and return it explicitly:
$id = $subref.wrap({
my @retval = call;
push(@retval, "...and your little dog, too!");
return @retval;
})
The arguments arrive in whatever form you request them, independently of how the parameters look to the wrapped routine. If you wish to modify the parameters, supply a new argument list to call:
$id = $subref.wrap(sub (*@args) {
call(*@args,1,2,3);
})
You need to be careful not to preflatten those generators, though.
[Update: Not-yet-bound arguments and return values are now represented by a Capture object.]
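For readers who want to experiment with the in-place wrapping idea, here is a rough Python analogy. It is a sketch under assumptions, not the Perl 6 mechanism: the `Routine` class is hypothetical, and `call` is handed to each wrapper as an explicit first argument instead of being a built-in. Calling `call()` with no arguments forwards the original arguments, and passing arguments replaces them.

```python
import itertools


class Routine:
    """Hypothetical callable that can be wrapped in place."""

    def __init__(self, func):
        self._func = func
        self._wrappers = []             # (id, wrapper) pairs, innermost first
        self._ids = itertools.count(1)

    def wrap(self, wrapper):
        # Every existing reference to this object sees the new wrapper.
        wrap_id = next(self._ids)
        self._wrappers.append((wrap_id, wrapper))
        return wrap_id

    def __call__(self, *args, **kwargs):
        return self._invoke(len(self._wrappers) - 1, args, kwargs)

    def _invoke(self, level, args, kwargs):
        if level < 0:
            return self._func(*args, **kwargs)
        _, wrapper = self._wrappers[level]

        def call(*new_args, **new_kwargs):
            # With no arguments, forward the ones this wrapper received.
            if not new_args and not new_kwargs:
                new_args, new_kwargs = args, kwargs
            return self._invoke(level - 1, new_args, new_kwargs)

        return wrapper(call, *args, **kwargs)


greet = Routine(lambda *names: ["hello " + name for name in names])
alias = greet  # a second reference to the same routine

# Postprocess the return value of the wrapped routine.
first_id = greet.wrap(lambda call, *args: call() + ["...and your little dog, too!"])
after_first = alias("Dorothy")

# Supply a modified argument list to the inner routine.
second_id = greet.wrap(lambda call, *args: call(*args, "Toto"))
after_second = alias("Dorothy")
```

Because the wrappers live on the routine object itself, the call through `alias` is intercepted even though only `greet` was wrapped by name.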
The $id is useful for removing a particular wrapper:
$