Perl 6 Internals
Dan Sugalski
TPC 5.0

"Here there be dragons"

The big goals of perl 6's internals

  • Speed
  • Extendibility
  • Cleanliness
  • Compatibility
  • Modularity
  • Thread Safety
  • Flexibility

Some global decisions

  • The core will be in C. (Like it or not, it's appropriate for code at this level)
  • The core must be modular, so pieces can be swapped out without rebuilding
  • It must be fast
  • Long-term binary compatibility is a must
  • Your average perl coder or extension writer shouldn't need any info about the guts
  • Things should generally be thought out, documented, and engineered

The quick overview

  • Parser
  • Compiler
  • Optimizer
  • Runtime engine

The parser

  • Where the whole thing starts
  • Generally takes source of some sort and turns it into a syntax tree

The Bytecode Compiler

  • Turns a syntax tree into bytecode
  • Performs some simple optimization

The optimizer

  • Takes the plain bytecode from the compiler and abuses it heavily
  • An optional step, generally skipped for compile-and-go execution
  • Should be able to work on small parts of a program for JIT optimization

The Interpreter

  • Takes compiled (and possibly optimized) bytecode and does something with it
  • Generally that something is execute, but it might also be:
    • Save to disk
    • Translate to another format (.NET, Java bytecode)
    • Compile to machine code

The Parser

  • "Double, double, toil and trouble
    Fire burn, and cauldron bubble"

Parser goals

  • Extendible in perl
  • More powerful than what we have now
  • Retargetable
  • Self-contained and removable

Parsing perl isn't easy

  • May well be one of the toughest languages to properly parse
  • If we get perl right other languages are easy. Or at least easier
  • We have the full power of perl to draw on to do the parsing (including the regex engine and Damian's Bizarre Idea du Jour)

The Compiler

  • "Mmmmm, tasty!"

From syntax tree to bytecode

  • The compiler takes a syntax tree and turns it into bytecode
  • Very little optimization is done here.
  • Optimization is expensive and optional
  • Pretty straightforward; this isn't rocket science

The Optimizer

  • "We can rebuild it. Make it better, faster, stronger"
  • Takes plain bytecode and makes it faster
  • Does all the sorts of things that you expect an optimizer to do: code motion, loop unrolling, common subexpression work, etc.
  • Will be an iterative process
  • This will be interesting, as perl's a pain to optimize
  • An optional step, of course

Things that make optimizing perl tough

  • Active data
  • Runtime redefinitions of everything
  • Really, really late binding (Waiting for Godot late)
  • Perl programmers are used to more predictable runtime characteristics than, say, C programmers.

The Interpreter

  • "Polly want a cracker?"

Interpreter goals

  • Fast
  • Tuned for perl
  • Language neutral where possible
  • Event capable
  • Sandboxable
  • Asynchronous I/O built in
  • Built with an eye towards TIL and/or native code compilation
  • Better debugging support than perl 5

The perl 6 interpreter is a software CPU

  • Complete with registers and an assembly language
  • This can make translating perl 6 bytecode into native machine code easier
  • There's a lot of literature on building optimizing compilers that can be leveraged
  • While more complex than a pure stack-based machine, it's also faster
  • Opcode dispatch needs to be faster than perl 5
  • Opcode functions can be written in perl

CPU specs

  • 64 int, float, string, and PMC registers
  • A segmented multiple stack architecture
  • Interrupt-capable (for events)
  • Pretty much completely position independent: everything is referenced via register, pad entry, or name
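
The register machine described above can be sketched in a few lines of C. This is a minimal illustration, not the actual perl 6 design: the opcode names (OP_SET_I, OP_ADD_I, OP_END), the instruction encoding, and the Interp struct are all assumptions made for the example. Only the integer register file is modelled. Each opcode function executes one instruction and returns the address of the next one, so the dispatch loop is a table lookup and an indirect call.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_REGS 64   /* the talk specifies 64 registers per type */

/* Hypothetical opcode numbers; the talk does not list the real opcode set. */
enum { OP_END, OP_SET_I, OP_ADD_I, OP_COUNT };

typedef struct Interp {
    long int_reg[NUM_REGS];   /* integer register file only, for brevity */
} Interp;

/* An opcode function runs one instruction and returns the next pc. */
typedef const long *(*op_func)(Interp *, const long *pc);

/* end: stop the interpreter */
static const long *op_end(Interp *in, const long *pc) {
    (void)in; (void)pc;
    return NULL;
}

/* set_i Rdest, constant */
static const long *op_set_i(Interp *in, const long *pc) {
    in->int_reg[pc[1]] = pc[2];
    return pc + 3;
}

/* add_i Rdest, Rsrc1, Rsrc2 */
static const long *op_add_i(Interp *in, const long *pc) {
    in->int_reg[pc[1]] = in->int_reg[pc[2]] + in->int_reg[pc[3]];
    return pc + 4;
}

/* Table order must match the enum above. */
static const op_func op_table[OP_COUNT] = { op_end, op_set_i, op_add_i };

/* The dispatch loop: look up the opcode, call it, repeat. */
static void run(Interp *in, const long *pc) {
    while (pc != NULL) {
        assert(*pc >= 0 && *pc < OP_COUNT);
        pc = op_table[*pc](in, pc);
    }
}

/* Usage: add the contents of registers 0 and 1 into register 2. */
static long demo_add(void) {
    static const long program[] = {
        OP_SET_I, 0, 2,
        OP_SET_I, 1, 40,
        OP_ADD_I, 2, 0, 1,
        OP_END
    };
    Interp in = { {0} };
    run(&in, program);
    return in.int_reg[2];
}
```

Because operands name registers directly, the add needs no push or pop traffic, which is the speed argument made for registers over a pure stack machine.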

The regex engine

  • The regex engine is going to be part of the perl 6 CPU, not separate as it is now
  • A good incentive to get opcode dispatch fast
  • Makes expanding the regex engine a bit easier
  • Details will be hidden as a set of regex opcodes

A few words on the stack system

  • Each register file has an associated stack
  • All registers of a particular type can be pushed onto or popped off the stack in one go
  • Individual registers or groups of registers can be pushed or popped
  • The stacks are all segmented so we're not relying on finding contiguous chunks of memory for them
  • There's also a set of call and scratch stacks
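
A segmented stack that saves a whole register file in one go might look like the sketch below. The names (IntRegs, Segment, push_int_regs, pop_int_regs) and the tiny segment size are assumptions for illustration only. The point is that the stack grows one malloc'd segment at a time, so it never needs a single contiguous block.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NUM_REGS 64
#define FRAMES_PER_SEGMENT 4   /* deliberately small for the example */

typedef struct IntRegs { long r[NUM_REGS]; } IntRegs;

/* One stack segment: a fixed number of saved register files. */
typedef struct Segment {
    struct Segment *prev;      /* older segment, or NULL */
    int used;                  /* frames in use in this segment */
    IntRegs frames[FRAMES_PER_SEGMENT];
} Segment;

typedef struct IntStack { Segment *top; } IntStack;

/* Push all integer registers at once. Returns 0, or -1 if out of memory. */
static int push_int_regs(IntStack *s, const IntRegs *regs) {
    if (s->top == NULL || s->top->used == FRAMES_PER_SEGMENT) {
        Segment *seg = malloc(sizeof *seg);
        if (seg == NULL) return -1;
        seg->prev = s->top;
        seg->used = 0;
        s->top = seg;
    }
    s->top->frames[s->top->used++] = *regs;
    return 0;
}

/* Pop all integer registers at once. Returns 0, or -1 if the stack is empty. */
static int pop_int_regs(IntStack *s, IntRegs *regs) {
    if (s->top == NULL) return -1;
    *regs = s->top->frames[--s->top->used];
    if (s->top->used == 0) {   /* segment drained: release it */
        Segment *dead = s->top;
        s->top = dead->prev;
        free(dead);
    }
    return 0;
}

/* Usage: ten pushes span three segments; popping restores each frame. */
static long demo_stack(void) {
    IntStack s = { NULL };
    IntRegs regs;
    long sum = 0;
    memset(&regs, 0, sizeof regs);
    for (long i = 0; i < 10; i++) {
        regs.r[0] = i;
        if (push_int_regs(&s, &regs) != 0) return -1;
    }
    while (pop_int_regs(&s, &regs) == 0) sum += regs.r[0];
    return sum;
}
```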

Bytecode

  • "Could you say that a little differently?"

What is bytecode?

  • A distilled version of a program
  • Machine language for the PVM
  • Can contain a lot of 'extra' information, including full source
  • Designed to be platform independent
  • Should be mostly mappable as shared data (modulo the fixup sections)

Data Structures

  • "Vtables and strings and floats, oh my!"

Variables

  • Generically called a PMC
  • Bigger than Perl 5's base data structure
  • Synchronization data built-in
  • Same for all variable types
  • GC data is not part of base structure

Scalars

  • Built off the base PMC structure
  • Use the integer and float areas as caches
  • Data pointer points off to string, large int, or large float
  • Vtable functions determine how it all works

Arrays

  • Built off the base PMC structure
  • Data pointer points to array data
  • All perl 6 arrays are typed
  • May have an array of scalars, strings, integers, or floats
  • Arrays only take up enough memory to hold their types
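
One way to read "typed arrays that only take up enough memory to hold their types" is an array header that records an element type and sizes its storage accordingly. The sketch below assumes that reading. The TypedArray struct and its functions are hypothetical, and only integer and float elements are shown.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { ELEM_INT, ELEM_FLOAT } ElemType;

typedef struct TypedArray {
    ElemType type;      /* what every element is */
    size_t elem_size;   /* bytes per element, fixed by the type */
    size_t length;
    void *data;         /* packed elements, no per-element header */
} TypedArray;

static size_t elem_size_for(ElemType t) {
    return t == ELEM_INT ? sizeof(long) : sizeof(double);
}

/* Returns 0 on success, -1 if allocation fails. */
static int typed_array_init(TypedArray *a, ElemType type, size_t length) {
    a->type = type;
    a->elem_size = elem_size_for(type);
    a->length = length;
    a->data = calloc(length, a->elem_size);
    return a->data != NULL ? 0 : -1;
}

static void typed_array_set_int(TypedArray *a, size_t i, long v) {
    assert(a->type == ELEM_INT && i < a->length);
    ((long *)a->data)[i] = v;
}

static long typed_array_get_int(const TypedArray *a, size_t i) {
    assert(a->type == ELEM_INT && i < a->length);
    return ((const long *)a->data)[i];
}

/* Storage used by the elements themselves. */
static size_t typed_array_bytes(const TypedArray *a) {
    return a->length * a->elem_size;
}
```

An array of eight integers here costs eight longs of element storage, rather than eight full variable structures.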

Hashes

  • Built off the base PMC structure
  • Data pointer points to hash data
  • All perl 6 hashes are typed
  • May have a hash of scalars, strings, integers, or floats
  • Hashes only take up enough memory to hold their types
  • Hashing function is overridable
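
An overridable hashing function can be modelled as a function pointer stored in the hash itself. The sketch below is an assumption about how that could look, not the perl 6 implementation. It uses a fixed-size open-addressing table with borrowed string keys and integer values, and every name in it is made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NUM_SLOTS 16   /* fixed size; a real table would grow */

typedef unsigned long (*hash_func)(const char *key);

typedef struct Hash {
    hash_func hash;                 /* the overridable part */
    const char *keys[NUM_SLOTS];    /* NULL marks an empty slot; keys are not copied */
    long values[NUM_SLOTS];
} Hash;

/* Default: the well-known djb2 string hash. */
static unsigned long default_hash(const char *key) {
    unsigned long h = 5381;
    while (*key) h = h * 33 + (unsigned char)*key++;
    return h;
}

/* A deliberately poor override: hash on the first character only. */
static unsigned long first_char_hash(const char *key) {
    return (unsigned char)key[0];
}

/* Pass NULL to get the default hashing function. */
static void hash_init(Hash *h, hash_func f) {
    h->hash = f != NULL ? f : default_hash;
    for (size_t i = 0; i < NUM_SLOTS; i++) h->keys[i] = NULL;
}

/* Slot holding key, or the empty slot where it belongs; -1 if the table is full. */
static int hash_find_slot(const Hash *h, const char *key) {
    unsigned long start = h->hash(key) % NUM_SLOTS;
    for (unsigned long n = 0; n < NUM_SLOTS; n++) {
        unsigned long i = (start + n) % NUM_SLOTS;   /* linear probing */
        if (h->keys[i] == NULL || strcmp(h->keys[i], key) == 0) return (int)i;
    }
    return -1;
}

static int hash_store(Hash *h, const char *key, long value) {
    int i = hash_find_slot(h, key);
    if (i < 0) return -1;
    h->keys[i] = key;
    h->values[i] = value;
    return 0;
}

static int hash_fetch(const Hash *h, const char *key, long *value) {
    int i = hash_find_slot(h, key);
    if (i < 0 || h->keys[i] == NULL) return -1;
    *value = h->values[i];
    return 0;
}
```

The store and fetch code never names a particular hashing function, so swapping it changes slot placement without changing the lookup logic.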

Strings

  • Strings are sort of abstract
  • Perl 6 can mix and match string data (Unicode, ASCII, EBCDIC, etc)
  • New string types can be loaded on the fly

String handling

  • Perl 6 has no 'built-in' string support; all string support is via loadable libraries
  • There'll be Unicode, ASCII, and EBCDIC support provided (at least) to start

Numbers

  • Bigints and bigfloats share the same header
  • Arbitrary-length floating point and integer numbers are supported
  • Perl automagically upgrades ints and floats when needed
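
The automatic upgrade can be illustrated with a tagged number that changes representation when an integer add would overflow. This is only a sketch under loud assumptions: the Number type and its functions are hypothetical, and a double stands in for the arbitrary-length types the talk describes, so the upgraded result loses precision where a real bigint would not.

```c
#include <assert.h>
#include <limits.h>

typedef enum { NUM_INT, NUM_FLOAT } NumKind;

typedef struct Number {
    NumKind kind;
    union { long i; double f; } as;
} Number;

static Number num_from_int(long i) {
    Number n;
    n.kind = NUM_INT;
    n.as.i = i;
    return n;
}

static double num_to_double(Number n) {
    return n.kind == NUM_INT ? (double)n.as.i : n.as.f;
}

/* Check before adding: signed overflow is undefined behaviour in C. */
static int add_would_overflow(long a, long b) {
    return (b > 0 && a > LONG_MAX - b) || (b < 0 && a < LONG_MIN - b);
}

/* Stay in integers while the result fits; otherwise upgrade.
   (Assumption: double is a stand-in for a bigint or bigfloat.) */
static Number num_add(Number a, Number b) {
    Number r;
    if (a.kind == NUM_INT && b.kind == NUM_INT &&
        !add_would_overflow(a.as.i, b.as.i)) {
        r.kind = NUM_INT;
        r.as.i = a.as.i + b.as.i;
    } else {
        r.kind = NUM_FLOAT;
        r.as.f = num_to_double(a) + num_to_double(b);
    }
    return r;
}
```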

Vtables

  • All variable data access is done through a table of functions that the variable carries around with it
  • This allows us faster access, since code paths are specialized for just the functions they need to perform
  • Isolates us from the implementation of variables internally
  • Allows special purpose behaviour (like perl 5's magic) to be attached without cost to the rest of perl

Vtables (cont'd)

  • Makes thread safety easier
  • A little bit more overhead because of the extra level of indirection, but the smaller functions make up for that
  • Vtable functions can be written in perl. (Each class with objects blessed into it will have at least one)
  • There may be more than one vtable per package

Vtables hide data manipulation

  • Pretty much all the code to handle data manipulation will be done via variable vtables
  • This allows the variable implementation to change without perl needing to know
  • Allows far more flexibility in what you can make a variable do
  • Shortens the code path for data functions and trims out extraneous conditionals

For example:

Fetching the string value of a scalar

For scalars with strings:


String *get_str(PMC *my_PMC) {
  return my_PMC->data_pointer;
}

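For int-only scalars:

String *get_str(PMC *my_PMC) {
  /* Build the string form once and cache it in the PMC */
  my_PMC->data_pointer =
    make_string(my_PMC->integer);
  /* Swap vtables so later fetches use the plain string version */
  my_PMC->vtable = int_and_string_vtable;
  return my_PMC->data_pointer;
}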
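
The two fragments above leave out the type definitions. A self-contained version might look like the following sketch, in which a plain char * stands in for String *, snprintf stands in for make_string, and the struct layouts are assumptions rather than the real perl 6 definitions. It shows the key trick: the int-only get_str converts once, then replaces the PMC's vtable so later calls go straight to the cached string.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct PMC PMC;

/* A vtable with a single entry; a real one would have many. */
typedef struct VTable {
    const char *(*get_str)(PMC *);
} VTable;

struct PMC {
    const VTable *vtable;
    long integer;         /* integer value */
    char *data_pointer;   /* cached string form, or NULL */
};

static const char *str_get_str(PMC *p);
static const char *int_get_str(PMC *p);

static const VTable int_and_string_vtable = { str_get_str };
static const VTable int_only_vtable = { int_get_str };

/* Scalar that already has a string: just hand it back. */
static const char *str_get_str(PMC *p) {
    return p->data_pointer;
}

/* Int-only scalar: stringify, cache, and switch vtables. */
static const char *int_get_str(PMC *p) {
    char buf[32];
    snprintf(buf, sizeof buf, "%ld", p->integer);
    p->data_pointer = malloc(strlen(buf) + 1);
    if (p->data_pointer == NULL) return NULL;
    strcpy(p->data_pointer, buf);
    p->vtable = &int_and_string_vtable;   /* no conversion next time */
    return p->data_pointer;
}
```

A caller always writes p.vtable->get_str(&p) and never tests what kind of scalar it holds, which is how the conditionals get trimmed out of the data path.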

Memory Management

  • "Now where did I put that?"

Getting headers

  • All the fixed-size things (PMCs, string/number headers) get allocated from arenas
  • All headers, with the exception of PMCs (maybe), are moveable by the garbage collector
  • Non-PMC header allocation is very fast
  • PMC allocation is only mostly fast

Buffer Management

  • Anything that isn't a fixed size gets allocated from the buffer pools
  • All buffered data, with the exception of data allocated in special pools, is moveable by the garbage collector
  • Because of GC, allocation is very quick

Garbage Collection

  • "Bring out yer dead!"

The perl 6 GC is a copying collector

  • Everything except PMCs is moveable in Perl 6
  • PMCs might be moveable too
  • We get a compact memory heap out of this, which allows for fast allocation
  • Perl 6 will release empty memory back to the system when it can
  • Refcounts are used only to note object lifetimes, not for GC
  • Refcounts, for the most part, are dead

GC considerations for Objects

  • Garbage collection and object death are now separate things
  • Perl's guarantee of timely object death is stronger
  • We still don't guarantee perfect collection (but it sucks less)
  • We still refcount for real perl references, but only 2 bits are used
  • Objects with more than two simultaneous references won't get collected until a full dead variable scan is made
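
A 2-bit reference count can represent 0, 1, 2, and one more value. One plausible reading of the bullets above, and it is only an assumption, is that the fourth value means "more than two, exact count unknown" and is sticky. The sketch below implements that reading with made-up names: once the count saturates, decrements no longer report the object as dead, and it is left for the full scan.

```c
#include <assert.h>

#define REFCOUNT_STICKY 3u   /* assumed meaning: too many references to count */

typedef struct PMC {
    unsigned refcount : 2;   /* holds 0..3 only */
} PMC;

/* Count up, but stop at the sticky value instead of wrapping. */
static void ref_inc(PMC *p) {
    if (p->refcount < REFCOUNT_STICKY) p->refcount++;
}

/* Returns 1 if the object can die right now, 0 otherwise. */
static int ref_dec(PMC *p) {
    if (p->refcount == REFCOUNT_STICKY) return 0;   /* true count is lost */
    if (p->refcount > 0) p->refcount--;
    return p->refcount == 0;
}
```

With one or two references the object dies promptly when the last one goes away. With three or more it never reports death through ref_dec, matching the claim that such objects wait for a full dead variable scan.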

Extensions beware!

  • Since we have no refcounts, extensions must tell perl when they hold on to PMCs
  • Not a huge deal, as we piggy-back on the cross-interpreter PMC tracking we use for threads
  • No more struct PMC; in extensions...

Extending Perl 6

Extensions Made Easier

  • Perl 6 will have a real API
  • The API is multilevel
    • Simple for embedders
    • More complex for extension authors
    • Pretty messy for vtable or opcode writers
  • Binary compatibility is a very strong consideration

Embedding

  • Guaranteed stable and binary compatible for the life of perl 6
  • Very simple API
    • Create interpreter
    • Destroy interpreter
    • Parse source
    • Run code
    • Register native functions

Extensions

  • Much simpler interface to perl's internals
  • The gory details are hidden
  • Stable binary compatibility is a very strong goal
    • We may add functions or options, but we won't take them away
    • Extensions built for perl 6.0.1 should still run with perl 6.8.12 without rebuilding
  • Manipulating perl data should be much easier
  • If you have to resort to Inline to wrap a library then it means we've not got it right

Extensions (cont'd)

  • Inline, or something like it, is probably going to be the standard for extending perl
  • XS, when you have to resort to it, will be far less nasty than it is now

Homegrown Opcodes and Vtables

  • This is part of the grubby inside of perl 6
  • You can use any of the internal routines of perl
  • If you do, though, you may run into backward-compatibility issues at some point. (If it's not part of the embedding, utility, or extension API, we make no promises)
  • There's no guarantee that calling conventions won't change.
  • No guarantees that perl 6.4 will even use vtables or opcodes

Utility library

  • Perl 6 will provide a set of utility routines to handle common tasks
    • String manipulation
    • Encoding changes (Shift-JIS to Unicode, EBCDIC to ASCII)
    • Conversion routines (string to int or float)
    • Extended precision math (int and float)
  • These will be stable, like the rest of the API

Variations on a Theme

  • "Toccata and Fugue in perl minor by Wall"

The source doesn't have to be perl

  • The parser isn't obligated to be parsing perl
  • Input source could be Python, Ruby, Java, or INTERCAL
  • The full perl parser is optional

The interpreter doesn't have to interpret

  • The interpreter is the destination for bytecode, but it doesn't have to interpret it
  • It might save directly to disk
  • It might translate the bytecode into an alternate form: Java bytecode, .NET code, or executable code, for example
  • The interpreter might translate to machine code on the fly, as a sort of JIT compiler. (Well, really a TIL, but...)

HTMLified from Dan's powerpoint talk by Daniel Allen (da@coder.com) on 15 Aug. 2001.