TPC 5.0
"Here there be dragons"
The big goals of perl 6's internals
- Speed
- Extendibility
- Cleanliness
- Compatibility
- Modularity
- Thread Safety
- Flexibility
Some global decisions
- The core will be in C. (Like it or not, it's appropriate for code at this level)
- The core must be modular, so pieces can be swapped out without rebuilding
- It must be fast
- Long-term binary compatibility is a must
- Your average perl coder or extension writer shouldn't need any info about the guts
- Things should generally be thought out, documented, and engineered
The quick overview
- Parser
- Compiler
- Optimizer
- Runtime engine
The parser
- Where the whole thing starts
- Generally takes source of some sort and turns it into a syntax tree
The Bytecode Compiler
- Turns a syntax tree into bytecode
- Performs some simple optimization
The optimizer
- Takes the plain bytecode from the compiler and abuses it heavily
- An optional step, generally skipped for compile-and-go execution
- Should be able to work on small parts of a program for JIT optimization
The Interpreter
- Takes compiled (and possibly optimized) bytecode and does something with it
- Generally that something is execute, but it might also be:
- Save to disk
- Translate to another format (.NET, Java bytecode)
- Compile to machine code
The Parser
- "Double, double, toil and trouble
Fire burn, and cauldron bubble"
Parser goals
- Extendible in perl
- More powerful than what we have now
- Retargetable
- Self-contained and removable
Parsing perl isn't easy
- May well be one of the toughest languages to properly parse
- If we get perl right other languages are easy. Or at least easier
- We have the full power of perl to draw on to do the parsing (including the regex engine and Damian's Bizarre Idea du Jour)
The Compiler
- "Mmmmm, tasty!"
From syntax tree to bytecode
- The compiler takes a syntax tree and turns it into bytecode
- Very little optimization is done here.
- Optimization is expensive and optional
- Pretty straightforward; this isn't rocket science
The Optimizer
- "We can rebuild it. Make it better, faster, stronger"
- Takes plain bytecode and makes it faster
- Does all the sorts of things that you expect an optimizer to do: code motion, loop unrolling, common subexpression work, etc.
- Will be an iterative process
- This will be interesting, as perl's a pain to optimize
- An optional step, of course
Things that make optimizing perl tough
- Active data
- Runtime redefinitions of everything
- Really, really late binding (Waiting for Godot late)
- Perl programmers are used to more predictable runtime characteristics than, say, C programmers.
The Interpreter
- "Polly want a cracker?"
Interpreter goals
- Fast
- Tuned for perl
- Language neutral where possible
- Event capable
- Sandboxable
- Asynchronous I/O built in
- Built with an eye towards TIL and/or native code compilation
- Better debugging support than perl 5
The perl 6 interpreter is a software CPU
- Complete with registers and an assembly language
- This can make translating perl 6 bytecode into native machine code easier
- There's a lot of literature on building optimizing compilers that can be leveraged
- While more complex than a pure stack-based machine, it's also faster
- Opcode dispatch needs to be faster than perl 5
- Opcode functions can be written in perl
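A minimal sketch of what a register-based software CPU with table-driven opcode dispatch could look like. The opcode names (OP_SET_I, OP_ADD_I, OP_END), the Interp struct, and the instruction encoding are all assumptions made for illustration; the talk does not define an instruction set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcode numbers, invented for this sketch. */
enum { OP_END, OP_SET_I, OP_ADD_I, OP_COUNT };

#define NUM_REGS 64

typedef struct Interp {
    int64_t int_reg[NUM_REGS];   /* integer register file only, for brevity */
} Interp;

/* Each opcode function executes one instruction and returns the address
 * of the next one, or NULL to stop the run loop. */
typedef const int64_t *(*op_func)(Interp *in, const int64_t *pc);

static const int64_t *op_end(Interp *in, const int64_t *pc) {
    (void)in; (void)pc;
    return NULL;
}

/* set_i Rdest, constant */
static const int64_t *op_set_i(Interp *in, const int64_t *pc) {
    in->int_reg[pc[1]] = pc[2];
    return pc + 3;
}

/* add_i Rdest, Rleft, Rright */
static const int64_t *op_add_i(Interp *in, const int64_t *pc) {
    in->int_reg[pc[1]] = in->int_reg[pc[2]] + in->int_reg[pc[3]];
    return pc + 4;
}

/* Dispatch table indexed by opcode number. */
static const op_func op_table[OP_COUNT] = { op_end, op_set_i, op_add_i };

void run(Interp *in, const int64_t *pc) {
    while (pc != NULL) {
        assert(*pc >= 0 && *pc < OP_COUNT);
        pc = op_table[*pc](in, pc);
    }
}
```

Because operands name registers directly, an addition is a single instruction here rather than the push, push, add, pop sequence a pure stack machine would need.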
CPU specs
- 64 int, float, string, and PMC registers
- A segmented multiple stack architecture
- Interrupt-capable (for events)
- Pretty much completely position independent: everything is referenced via register, pad entry, or name
The regex engine
- The regex engine is going to be part of the perl 6 CPU, not separate as it is now
- A good incentive to get opcode dispatch fast
- Makes expanding the regex engine a bit easier
- Details will be hidden as a set of regex opcodes
A few words on the stack system
- Each register file has an associated stack
- All registers of a particular type can be pushed onto or popped off the stack in one go
- Individual registers or groups of registers can be pushed or popped
- The stacks are all segmented so we're not relying on finding contiguous chunks of memory for them
- There's also a set of call and scratch stacks
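One way the segmented register stack described above might be laid out, as a rough sketch. The names (Segment, IntRegStack, push_all, pop_all) and the tiny segment size are assumptions for illustration only. Each segment is a separate allocation, so the stack never needs one large contiguous block.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NUM_REGS 64
#define FRAMES_PER_SEGMENT 4   /* deliberately small so growth is easy to see */

/* One separately allocated chunk of the stack. */
typedef struct Segment {
    struct Segment *prev;
    int used;                                      /* frames stored here */
    int64_t frames[FRAMES_PER_SEGMENT][NUM_REGS];
} Segment;

typedef struct IntRegStack {
    Segment *top;   /* NULL when the stack is empty */
} IntRegStack;

/* Save all integer registers in one go. Returns 0, or -1 if out of memory. */
int push_all(IntRegStack *s, const int64_t regs[NUM_REGS]) {
    if (s->top == NULL || s->top->used == FRAMES_PER_SEGMENT) {
        Segment *seg = malloc(sizeof *seg);
        if (seg == NULL) return -1;
        seg->prev = s->top;
        seg->used = 0;
        s->top = seg;
    }
    memcpy(s->top->frames[s->top->used++], regs, sizeof(int64_t) * NUM_REGS);
    return 0;
}

/* Restore all integer registers. Returns 0, or -1 if the stack is empty. */
int pop_all(IntRegStack *s, int64_t regs[NUM_REGS]) {
    if (s->top == NULL) return -1;
    assert(s->top->used > 0);
    memcpy(regs, s->top->frames[--s->top->used], sizeof(int64_t) * NUM_REGS);
    if (s->top->used == 0) {          /* release the drained segment */
        Segment *empty = s->top;
        s->top = empty->prev;
        free(empty);
    }
    return 0;
}
```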
Bytecode
- "Could you say that a little differently?"
What is bytecode?
- A distilled version of a program
- Machine language for the PVM
- Can contain a lot of 'extra' information, including full source
- Designed to be platform independent
- Should be mostly mappable as shared data (modulo the fixup sections)
Data Structures
- "Vtables and strings and floats, oh my!"
Variables
- Generically called a PMC
- Bigger than Perl 5's base data structure
- Synchronization data built-in
- Same for all variable types
- GC data is not part of base structure
Scalars
- Built off the base PMC structure
- Use the integer and float areas as caches
- Data pointer points off to string, large int, or large float
- Vtable functions determine how it all works
Arrays
- Built off the base PMC structure
- Data pointer points to array data
- All perl 6 arrays are typed
- May have an array of scalars, strings, integers, or floats
- Arrays only take up enough memory to hold their types
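A small sketch of why a typed array can be compact: the element type is recorded once and the data is stored packed, instead of one full variable per element. TypedArray and its helper functions are hypothetical names, and only the int and float cases are shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef enum { ELEM_INT, ELEM_FLOAT } ElemType;   /* scalars, strings omitted */

typedef struct TypedArray {
    ElemType type;     /* recorded once for the whole array */
    size_t   length;
    void    *data;     /* packed int64_t[] or double[] */
} TypedArray;

size_t elem_size(ElemType type) {
    return type == ELEM_INT ? sizeof(int64_t) : sizeof(double);
}

/* Allocate zeroed storage for `length` elements. Returns 0 or -1. */
int typed_array_init(TypedArray *a, ElemType type, size_t length) {
    assert(length > 0);
    a->type = type;
    a->length = length;
    a->data = calloc(length, elem_size(type));
    return a->data != NULL ? 0 : -1;
}

/* Bytes used by the element data: just length times element size. */
size_t typed_array_bytes(const TypedArray *a) {
    return a->length * elem_size(a->type);
}

void set_int(TypedArray *a, size_t i, int64_t v) {
    assert(a->type == ELEM_INT && i < a->length);
    ((int64_t *)a->data)[i] = v;
}

int64_t get_int(const TypedArray *a, size_t i) {
    assert(a->type == ELEM_INT && i < a->length);
    return ((const int64_t *)a->data)[i];
}
```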
Hashes
- Built off the base PMC structure
- Data pointer points to hash data
- All perl 6 hashes are typed
- May have a hash of scalars, strings, integers, or floats
- Hashes only take up enough memory to hold their types
- Hashing function is overridable
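An overridable hashing function could be as simple as a function pointer stored with the hash data. This is a sketch under that assumption; HashData, bucket_for, and length_hash are illustrative names, and FNV-1a is used as the default only because it is short, not because the talk names it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*hash_func)(const char *key, size_t len);

/* Default hash for this sketch: 32-bit FNV-1a. */
uint32_t fnv1a_hash(const char *key, size_t len) {
    uint32_t h = 2166136261u;              /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)key[i];
        h *= 16777619u;                    /* FNV prime */
    }
    return h;
}

/* A deliberately poor replacement, just to show the override taking effect. */
uint32_t length_hash(const char *key, size_t len) {
    (void)key;
    return (uint32_t)len;
}

typedef struct HashData {
    hash_func hash;          /* swap this pointer to override hashing */
    size_t    num_buckets;
} HashData;

/* Pick the bucket for a key using whichever hash function is installed. */
size_t bucket_for(const HashData *h, const char *key, size_t len) {
    assert(h->num_buckets > 0);
    return h->hash(key, len) % h->num_buckets;
}
```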
Strings
- Strings are sort of abstract
- Perl 6 can mix and match string data (Unicode, ASCII, EBCDIC, etc)
- New string types can be loaded on the fly
String handling
- Perl 6 has no 'built-in' string support; all string support is via loadable libraries
- There'll be Unicode, ASCII, and EBCDIC support provided (at least) to start
Numbers
- Bigints and bigfloats share the same header
- Arbitrary-length floating point and integer numbers are supported
- Perl automagically upgrades ints and floats when needed
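A sketch of the upgrade-when-needed idea for integer addition. A real implementation would switch to an arbitrary-length integer; here a double stands in for the bigger type to keep the example short, so precision is lost on upgrade. Number and add_ints are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef struct Number {
    int     upgraded;   /* 0: value is in i, 1: value was upgraded into f */
    int64_t i;
    double  f;          /* stand-in for a bigint or bigfloat (assumption) */
} Number;

/* Add two native integers, upgrading only if the result would overflow. */
Number add_ints(int64_t a, int64_t b) {
    Number r = { 0, 0, 0.0 };
    /* Detect overflow before it happens; signed overflow is undefined in C. */
    if ((b > 0 && a > INT64_MAX - b) || (b < 0 && a < INT64_MIN - b)) {
        r.upgraded = 1;
        r.f = (double)a + (double)b;
    } else {
        r.i = a + b;
    }
    return r;
}
```

The common case stays on the fast native path; only the rare overflowing case pays for the larger representation.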
Vtables
- All variable data access is done through a table of functions that the variable carries around with it
- This allows us faster access, since code paths are specialized for just the functions they need to perform
- Isolates us from the implementation of variables internally
- Allows special purpose behaviour (like perl 5's magic) to be attached without cost to the rest of perl
Vtables (cont'd)
- Makes thread safety easier
- A little bit more overhead because of the extra level of indirection, but the smaller functions make up for that
- Vtable functions can be written in perl. (Each class with objects blessed into it will have at least one)
- There may be more than one vtable per package
Vtables hide data manipulation
- Pretty much all the code to handle data manipulation will be done via variable vtables
- This allows the variable implementation to change without perl needing to know
- Allows far more flexibility in what you can make a variable do
- Shortens the code path for data functions and trims out extraneous conditionals
For example:
Fetching the string value of a scalar
For scalars with strings:
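    String *get_str(PMC *my_PMC) {
        /* The string is already there; just hand it back */
        return my_PMC->data_pointer;
    }
For an int-only scalar:
    String *get_str(PMC *my_PMC) {
        /* Build the string from the integer and cache it */
        my_PMC->data_pointer = make_string(my_PMC->integer);
        /* Swap vtables so later fetches take the short path */
        my_PMC->vtable = int_and_string_vtable;
        return my_PMC->data_pointer;
    }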
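The two fragments above can be fleshed out into a self-contained sketch. The struct layouts, the use of a plain char * in place of the String type, and the helper pmc_init_int are assumptions made so the example compiles; only the vtable-swapping idea comes from the slides.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct PMC PMC;

typedef struct VTable {
    const char *(*get_str)(PMC *);
} VTable;

struct PMC {
    const VTable *vtable;
    long          integer;        /* integer cache area */
    char         *data_pointer;   /* string data, once it exists */
};

/* Scalar that already has a string: no conditionals, just return it. */
static const char *get_str_cached(PMC *p) {
    return p->data_pointer;
}
const VTable int_and_string_vtable = { get_str_cached };

/* Int-only scalar: build the string once, then switch vtables. */
static const char *get_str_int_only(PMC *p) {
    char buf[32];
    snprintf(buf, sizeof buf, "%ld", p->integer);
    p->data_pointer = malloc(strlen(buf) + 1);
    if (p->data_pointer == NULL) return NULL;
    strcpy(p->data_pointer, buf);
    p->vtable = &int_and_string_vtable;   /* later calls take the short path */
    return p->data_pointer;
}
const VTable int_only_vtable = { get_str_int_only };

/* Hypothetical constructor for an int-only scalar. */
void pmc_init_int(PMC *p, long value) {
    p->vtable = &int_only_vtable;
    p->integer = value;
    p->data_pointer = NULL;
}
```

Callers always go through p->vtable->get_str(p); the first call on an int-only scalar does the conversion, and every later call lands in get_str_cached without testing any flags.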
Memory Management
- "Now where did I put that?"
Getting headers
- All the fixed-size things (PMCs, string/number headers) get allocated from arenas
- All headers, with the exception of PMCs (maybe), are moveable by the garbage collector
- Non-PMC header allocation is very fast
- PMC allocation is only mostly fast
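A rough sketch of arena allocation for fixed-size headers, using a free list threaded through the unused slots. Header, Arena, HeaderPool, and the arena size are illustrative assumptions, not the actual perl 6 layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define HEADERS_PER_ARENA 256

/* A slot is either on the free list or in use as a string/number header. */
typedef union Header {
    union Header *next_free;
    struct { void *buffer; size_t length; } in_use;
} Header;

typedef struct Arena {
    struct Arena *next;
    Header slots[HEADERS_PER_ARENA];
} Arena;

typedef struct HeaderPool {
    Arena  *arenas;      /* every arena ever allocated */
    Header *free_list;   /* unused slots across all arenas */
} HeaderPool;

/* Grab a new arena and thread all its slots onto the free list. */
static int add_arena(HeaderPool *pool) {
    Arena *a = malloc(sizeof *a);
    if (a == NULL) return -1;
    a->next = pool->arenas;
    pool->arenas = a;
    for (size_t i = 0; i < HEADERS_PER_ARENA; i++) {
        a->slots[i].next_free = pool->free_list;
        pool->free_list = &a->slots[i];
    }
    return 0;
}

/* The common case is a single pointer pop. Returns NULL if out of memory. */
Header *header_alloc(HeaderPool *pool) {
    if (pool->free_list == NULL && add_arena(pool) != 0) return NULL;
    Header *h = pool->free_list;
    pool->free_list = h->next_free;
    return h;
}

void header_free(HeaderPool *pool, Header *h) {
    h->next_free = pool->free_list;
    pool->free_list = h;
}
```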
Buffer Management
- Anything that isn't a fixed size gets allocated from the buffer pools
- All buffered data, with the exception of data allocated in special pools, is moveable by the garbage collector
- Because the garbage collector keeps the heap compact, allocation is very quick
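With a compacted heap, allocating a buffer can be little more than bumping an offset. The sketch below shows that idea only; BufferPool and its functions are hypothetical, and where it returns NULL a real system would presumably run the collector and retry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct BufferPool {
    unsigned char *base;
    size_t size;
    size_t used;    /* everything below this offset is allocated */
} BufferPool;

int pool_init(BufferPool *p, size_t size) {
    p->base = malloc(size);
    p->size = size;
    p->used = 0;
    return p->base != NULL ? 0 : -1;
}

/* Bump allocation: round the request up to 8 bytes and advance the offset. */
void *pool_alloc(BufferPool *p, size_t n) {
    size_t aligned = (n + 7u) & ~(size_t)7u;
    if (aligned < n || aligned > p->size - p->used) {
        return NULL;   /* pool exhausted (or request overflowed) */
    }
    void *mem = p->base + p->used;
    p->used += aligned;
    return mem;
}
```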
Garbage Collection
- "Bring out yer dead!"
The perl 6 GC is a copying collector
- Everything except PMCs is moveable in Perl 6
- PMCs might be moveable too
- We get a compact memory heap out of this, which allows for fast allocation
- Perl 6 will release empty memory back to the system when it can
- Refcounts are used only to note object lifetimes, not for GC
- Refcounts, for the most part, are dead
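A toy illustration of the copying idea: live buffers are copied into a second space and their headers are updated, leaving a compact heap. Here the headers stay put and only the buffers move, and a simple live flag stands in for a real reachability scan. All names and sizes are assumptions, and header slots are never reused in this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SPACE_SIZE  1024
#define MAX_HEADERS 16

typedef struct BufHeader {
    unsigned char *data;   /* updated whenever the buffer moves */
    size_t length;
    int    live;           /* stand-in for "reachable" (assumption) */
} BufHeader;

typedef struct Heap {
    unsigned char space[2][SPACE_SIZE];   /* from-space and to-space */
    int    active;                        /* index of the space in use */
    size_t used;
    BufHeader headers[MAX_HEADERS];
    size_t num_headers;
} Heap;

/* Allocate n bytes from the active space. Returns NULL when full. */
BufHeader *heap_alloc(Heap *h, size_t n) {
    if (h->num_headers == MAX_HEADERS || n > SPACE_SIZE - h->used) return NULL;
    BufHeader *hd = &h->headers[h->num_headers++];
    hd->data = h->space[h->active] + h->used;
    hd->length = n;
    hd->live = 1;
    h->used += n;
    return hd;
}

/* Copy every live buffer into the other space, packed end to end. */
void heap_collect(Heap *h) {
    int to = 1 - h->active;
    size_t used = 0;
    for (size_t i = 0; i < h->num_headers; i++) {
        BufHeader *hd = &h->headers[i];
        if (!hd->live) {            /* dead: drop its storage */
            hd->data = NULL;
            hd->length = 0;
            continue;
        }
        memcpy(h->space[to] + used, hd->data, hd->length);
        hd->data = h->space[to] + used;
        used += hd->length;
    }
    h->active = to;
    h->used = used;
}
```

After a collection the free region is one contiguous block at the end of the active space, which is what makes plain offset-bumping allocation possible.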
GC considerations for Objects
- Garbage collection and object death are now separate things
- Perl's guarantee of timely object death is stronger
- We still don't guarantee perfect collection (but it sucks less)
- We still refcount for real perl references, but only 2 bits are used
- Objects with more than two simultaneous references won't get collected until a full dead variable scan is made
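A two-bit reference count can be modelled as a saturating counter: counts of one and two are tracked exactly, and anything higher sticks at the maximum until a full scan sorts it out. This is one possible reading of the slide; PMCFlags, ref_inc, and ref_dec are hypothetical names.

```c
#include <assert.h>

#define REFCOUNT_MAX 3u   /* largest value two bits can hold; means "many" */

typedef struct PMCFlags {
    unsigned refcount : 2;
} PMCFlags;

/* Count up, but never past the saturation value. */
void ref_inc(PMCFlags *f) {
    if (f->refcount < REFCOUNT_MAX) f->refcount++;
}

/* Returns 1 if the object is known to be dead right now, 0 otherwise. */
int ref_dec(PMCFlags *f) {
    if (f->refcount == REFCOUNT_MAX) {
        return 0;   /* saturated: the true count is unknown, wait for a scan */
    }
    assert(f->refcount > 0);
    f->refcount--;
    return f->refcount == 0;
}
```

An object with one or two references dies promptly when the last one goes away; once a third reference appears, the count is no longer trusted and only a full dead variable scan can reclaim the object.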
Extensions beware!
- Since we have no refcounts, extensions must tell perl when they hold on to PMCs
- Not a huge deal, as we piggy-back on the cross-interpreter PMC tracking we use for threads
- No more struct PMC; in extensions...
Extending Perl 6
Extensions Made Easier
- Perl 6 will have a real API
- The API is multilevel
- Simple for embedders
- More complex for extension authors
- Pretty messy for vtable or opcode writers
- Binary compatibility is a very strong consideration
Embedding
- Guaranteed stable and binary compatible for the life of perl 6
- Very simple API
- Create interpreter
- Destroy interpreter
- Parse source
- Run code
- Register native functions
Extensions
- Much simpler interface to perl's internals
- The gory details are hidden
- Stable binary compatibility is a very strong goal
- We may add functions or options, but we won't take them away
- Extensions built for perl 6.0.1 should still run with perl 6.8.12 without rebuilding
- Manipulating perl data should be much easier
- If you have to resort to Inline to wrap a library then it means we've not got it right
Extensions (cont'd)
- Inline, or something like it, is probably going to be the standard for extending perl
- XS, when you have to resort to it, will be far less nasty than it is now
Homegrown Opcodes and Vtables
- This is part of the grubby inside of perl 6
- You can use any of the internal routines of perl
- If you do, though, you may run into backward-compatibility issues at some point. (If it's not part of the embedding, utility, or extension API, we make no promises)
- There's no guarantee that calling conventions won't change.
- No guarantees that perl 6.4 will even use vtables or opcodes
Utility library
- Perl 6 will provide a set of utility routines to handle common tasks
- String manipulation
- Encoding changes (Shift-JIS to Unicode, EBCDIC to ASCII)
- Conversion routines (string to int or float)
- Extended precision math (int and float)
- These will be stable, like the rest of the API
Variations on a Theme
- "Toccata and Fugue in perl minor by Wall"
The source doesn't have to be perl
- The parser isn't obligated to be parsing perl
- Input source could be Python, Ruby, Java, or INTERCAL
- The full perl parser is optional
The interpreter doesn't have to interpret
- The interpreter is the destination for bytecode, but it doesn't have to interpret it
- It might save directly to disk
- It might translate the bytecode into an alternate form: Java bytecode, .NET code, or executable code, for example
- The interpreter might translate to machine code on the fly, as a sort of JIT compiler. (Well, really a TIL, but...)
HTMLified from Dan's powerpoint talk by Daniel Allen (da@coder.com) on 15 Aug. 2001.

