Tag Archives: crazy project

The Bootstrap Chronicles – The end of a phase

As some of you already know, the 2012 edition of GSoC is coming to an end, and along with it the bootstrap project reaches a checkpoint. This post covers the news since the last chapter and discusses the future steps. In the next post I’ll document the details of the project deliverables.

Where were we?

The first product of this project was a renewal of an image serializer: the SystemTracer. It takes an object graph and serializes it into the image format a VM works with. The SystemTracer was refactored to reify the memory object formats and updated to write Cog images.

The second step was to work on bootstrapping, and it was successful. Hazelnut, the bootstrap process tool, is able to build a Smalltalk image from a description. To ensure the quality and health of these newly created images, I set up Jenkins jobs that load different packages on top of them and run tests over them.

What happened since last time: Declarative kernel descriptions

A kernel definition has two main parts:

  • the code and definitions of the entities of that kernel
  • a definition of how to build the basic model of that kernel and, finally, how to initialize it

The first one can take the form of source files like the ones in https://github.com/guillep/PharoKernel, which is the source code Hazelnut actually uses to bootstrap Pharo images. The second part of the kernel definition contains some imperative parts, and for now they are declared in simple Pharo classes you download from Monticello.

So now, the bootstrap loads the kernel definition from those source files, generates the bootstrapped environment, and serializes it into a new image file.

The future of bootstrapping Pharo

Since our goal is to bootstrap Pharo to support its modularity and evolution, there are some key points to attack in the near future:

  • getting the Pharo source code in sync with this bootstrap representation
  • choosing the really important parts for a kernel. What should be in those source files and what should not? Where do we package what does not go into the kernel?
  • building Pharo from bootstrapped images.

Also, looking at upcoming Pharo changes like first-class slots and class layouts, or the new Tanker package manager, the bootstrap will surely need some updates.

Conclusions

I hope this project makes Pharo grow and get better! We can now generate images with the source code defined statically in source files, so for the GSoC program the scope has been fulfilled.

See you in the next post documenting the project!

Salut!

The Bootstrap Chronicles Chapter 4 – Pump up with Fuel

Last time we were generating a new image with useful information. This post tells the story after that first bootstrap: how we ensure that little monster is healthy, how we ensure that our process is flexible and robust enough, and how it helps Pharo in the modularization crusade.

Fuel – The modularization sword

Probably some of you already know about Fuel: a fast object serializer written by Mariano Peck and Martín Dias. And maybe you are also aware of another Google Summer of Code project Martín is working on: a binary package manager on top of Fuel. Mariano already showed us a first proof of concept of that idea in this post: http://marianopeck.wordpress.com/2011/09/24/importing-and-exporting-packages-with-fuel/.

So, I want to work on building stuff on top of my little bootstrapped image.  And I thought Fuel was a nice gun to attack that problem.

Detecting illnesses

In my last post about the bootstrap I’ve already shown you how to get a list of broken/uninitialized stuff. That gives us an idea of how healthy our image is.  But there are other tools that we already use for that: tests.

So, can we run SUnit on the bootstrap? Yes.

When I thought about Fuel and the bootstrap, I thought Fuel should be included by default, otherwise it would be difficult to make the image grow. Of course I could have chosen the compiler for the same purpose, but I’d like to enable modularization with binary packages.

So, what about exporting SUnit as a binary package and importing it into the bootstrap? And what if we also export the tests of SUnit and include them too? Then we should be able to run the tests of SUnit. Nice. Then this same idea can be applied to test the kernel, or the compiler…

And I did it :)

Completeness (or “does it have all its essential parts?”)

Another thing we can think about is testing completeness. When should you consider that the bootstrap is complete? A fun definition could be “when it is able to define itself”. Ok, let’s export the bootstrapping code with Fuel, import it into the bootstrapped image, and try to bootstrap from the bootstrap. And do it again, just for fun.

Once the bootstrap bootstrap was working, I put the tests to work on the last generation of bootstraps.

The results

I’ve created some jobs to test all this stuff in the Ecole de Mines’ Jenkins.

The test results are exported in JUnit format, so we can tell what’s broken and look at the stack trace. All these jobs work on the bleeding edge of the project, using the latest Pharo image and the latest CogVM.

Have Fun!

Guille

The Bootstrap Chronicles Chapter 3 – It’s Alive!

So, now that you know a bit about what the bootstrap is and what some of the problems to face are, I’ll show you some solutions and progress for real.

How does the bootstrap implementation work

To bootstrap a Smalltalk environment, you need to create a new environment, with its classes and objects, and initialize some state in them. I could have done that in C, using mallocs and initializing everything by hand using plain memory :). But doing it in Smalltalk is easier: you have late binding, polymorphism, closures… Even better, once you have done your first steps in the bootstrap, you can send messages to your objects. THAT is nice.

So, that is the way we chose to go: a new environment (the guest) is created inside the current environment (the host). The guest will have its own classes and objects.

What about the special objects of the VM? We share them with the host environment, because if we did not, our new image would not be able to run… Afterwards, when this new image is written into an image file, we swap references to point to our own new special objects array.

Here is a picture of what it looks like:

How the host and guest are related

The Current Version

I’ve been through several versions of the image with different capabilities, sizes and degrees of correctness. You have to know that if you forget to initialize something, initialize things in the wrong order, or take an extra object you do not need, you will end up with an image that does not run, or one that also carries all the objects of the host…

Another thing is that the current version takes a sample of the objects in the host to build the guest. I’m already working on starting from scratch + source code, but that is the future and I like enjoying the present :).

So, how do you load the current code?

Gofer it
    url: 'http://www.smalltalkhub.com/mc/Guille/Seed/main';
    package: 'ConfigurationOfHazelnut';
    load.

(ConfigurationOfHazelnut project version: '1.3') load.

Also, as the code is very sensitive to what you do, it is also sensitive to which image you’re running it on. If you play with it in a wrong/different image, you will get different/unexpected results. So, I suggest you use the same image as me for testing it: the latest Pharo 2.0. In particular, I’ve tested it on versions 20133 and 20134.

Do not be scared when loading the configuration on Pharo 2.0 for the first time: it will raise an error. It is an issue related to unzipping old mcz files in Pharo (http://code.google.com/p/pharo/issues/detail?id=6054), which will fortunately be fixed soon. Just close the debugger, try again, and get the project working.

And how do I try this weird stuff?

With Nicolas Petton, we have written some examples in the class named HazelBuilderExamples.

To run them, you can try the following scripts:

"Writes an image which when opened prints a spaceTally on fileok.txt"
HazelBuilderExample new buildImageWithSpaceTally

"Writes an image which when opened prints a report on all the packaging/initialization deficiencies of the image on fileok.txt"
HazelBuilderExample new buildImageWithBrokenReferencesReport

"Writes an image which when opened prints a report on all the packaging/initialization deficiencies of the image on fileok.txt. The report is written in the xml format Jenkins Junit plugin likes."
HazelBuilderExample new buildImageWithBrokenReferencesReportForJenkins

After evaluating this code, you’ll have some bootstrapped image and changes files. Open that image with your CogVM and wait until it closes. Then have a look at the fileok.txt file :).

Of course you can look at the code in the examples and try to build your own. Email me if you have ideas to improve this :).

There are also probably some problems with file overwriting that came with some of the latest Pharo changes to file management. If you notice this, please just remove the bootstrapped* files from the folder where your image is and try again.

Hey! How is this thingy useful?

Well, if you’ve had a look at the examples, the three examples I’ve shown you are very useful:

  • The first one tells you how the space is distributed in the new image. You can use this knowledge to attack space problems if you want an even smaller image.
  • The second one is used to detect bugs in the Pharo packaging or in the initialization process: you have not initialized some classes, or they have been initialized but the initialization code does not initialize all the variables.
  • The third one is the same as the second, but adapted to have this kind of CI integration: http://car.mines-douai.fr/ci/job/Seed%20Broken%20References%20Report/

Nice huh? Now we can use Jenkins to validate the core of Pharo: once that list becomes empty, we know it is well initialized and well packaged.
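To give a rough idea of what such a report looks for, here is a minimal workspace sketch that lists the class variables still holding nil in the running image. It is not the code behind the Hazelnut examples, only an illustration of the idea; the variable name uninitialized is made up, and a nil class variable is not necessarily a bug, so read the result as a list of candidates.

```smalltalk
"Sketch only: collect class variables whose value is still nil.
 Not Hazelnut's report, just the general idea behind it."
| uninitialized |
uninitialized := OrderedCollection new.
Smalltalk allClasses do: [ :each |
    each classPool keysAndValuesDo: [ :varName :value |
        value isNil
            ifTrue: [ uninitialized add: each name -> varName ] ] ].
"Inspect or print the result: associations of class name -> variable name"
uninitialized
```

Run in a host image this only tells you about the host; the interesting part of the bootstrap examples is that the same kind of check runs inside the freshly written image.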

What’s next?

  • Each one of those things in the list should be tracked as issues.
  • continue working on the bootstrap from sourcecode.
  • Making the fuel seed work so we can install fuel packages on our little image.

Keep u updated!

Hasta Luego!

Guille

The Bootstrap Chronicles Chapter 2 – Do not mess with the VM

Everything you want in life has a price connected to it. There’s a price to pay if you want to make things better, a price to pay just for leaving things as they are, a price for everything.

Harry Browne

And I found myself trying to bootstrap for real. But of course it was not going to be easy. I had to pay the Iron price.

The VM we use plays a very important role in day-to-day development. It is the one in charge of defining the method lookup, garbage collection, some platform-dependent code, some optimizations. And as it does some nice things for us, it also puts restrictions on what we do. Have you ever heard about the special objects array? The compact classes array? Primitives? We are going to talk a bit about them and other secrets, and how they get in the way of the bootstrap process :).

The Special Objects Array

The special objects array is an array shared between the VM and the image. This array points to some objects that are important or interesting to the VM. You can have a look at it by inspecting the following expression in Pharo:

Smalltalk specialObjectsArray

If you skim over it, you will see things like the Processor instance, the Array, Smalltalk, SmallInteger, Float, CompiledMethod and Semaphore classes, some Semaphore instances…

What does the VM do with them? For example, it hard-codes validations against concrete classes. Yep, like checking whether an instance’s class is the same as the object in its slot 20, which BTW is Character…
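You can poke at that slot yourself. The snippet below is a small workspace sketch that assumes an image of that era (before Spur), where the special objects array has the layout described in this post; the variable name specialObjects is only for illustration.

```smalltalk
"Sketch: look at the slot the VM checks against for Character.
 Assumes the layout described in this post (slot 20 holds Character)."
| specialObjects |
specialObjects := Smalltalk specialObjectsArray.
"Print this to see what lives in slot 20"
specialObjects at: 20.
"An identity comparison, the same kind of check the VM performs internally"
(specialObjects at: 20) == Character
```

The point is that the comparison is by identity: a brand new Character class in a guest environment is a different object, so the VM will not treat it as special unless it ends up in that slot.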

Ok…

Doing those validations through messages could be too expensive in terms of speed. If you want to be fast, you have to pay some price. If you want to have a tiny memory footprint, you have to pay some price. Decisions have side effects in general…

So, some may wonder: why is this array so important for the bootstrap? Imagine I want to have a new Array class, Class class, Character class, and a new CompiledMethod class. What should happen if the VM does not recognize them as I would like? CogVM only recognizes one special objects array.

The solution? Hack and cheat. You choose: you can cheat on the VM side or on the image side. Each has a price to pay. But today is not the day for telling you how I cheated :).

Now, look at the field 29 of the special objects array. It is another array, …

The Compact Classes Array

Do you remember compact objects? If not, you can refresh your memory here: https://playingwithobjects.wordpress.com/2012/05/30/understanding-object-formats-in-cogvm/.

In a few words, compact objects do not have an extra header for the class pointer: they have some bits in their base header that are an index into this nice compact classes array, where their class is. This mechanism is normally used for classes with tons of instances, saving one header for each object. Complexity against space.

Here again, we have the same problem. Even worse, having this guy here means that if I have my nice Array’ class, which is also compact, and its compact class index points to the original Array class, the method lookup will end up in the original class instead of mine :(.
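To see the indirection at work, here is a small workspace sketch. It assumes a pre-Spur image of that era, that field 29 of the special objects array is the compact classes array, and that Behavior still understands indexIfCompact; on other images these assumptions may not hold.

```smalltalk
"Sketch: follow Array's compact class index back to the class it names.
 Assumes a pre-Spur image where field 29 is the compact classes array."
| compactClasses index |
compactClasses := Smalltalk specialObjectsArray at: 29.
"0 means the class is not compact"
index := Array indexIfCompact.
index > 0
    ifTrue: [ (compactClasses at: index) == Array ]
    ifFalse: [ false ]
```

Instances only carry the index, so whatever class sits at that position in the array is the one the lookup uses. That is exactly why a second Array class sharing the same index ends up resolved to the original one.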

The solution? Hack, and cheat.

The Primitives

Now think about what happens when my bootstrap classes use primitive methods. It’s nice because the VM returns me the objects it wants :). It’s actually a symptom of the last two points. But it is good to know that primitives can give you headaches…

Other problems? Of course…

Literals: I can’t easily change SmallInteger’s class, for example, because its instances are immediate objects. The same happens with the other numbers, or strings, or blocks. They all give you headaches. Even if I could make it work with the VM, I would have to change the compiler to use a different set of classes…

VM magic assumptions: like class instances’ first three instance variables being superclass, method dictionary and format. In that order. Try changing the order :). Or doing something like:

myA := A new.
A become: 'hello'.
myA crashTheVM.

And there are some others like this. So far I know of LinkedList, ProcessScheduler, and Class. Find your own Waldo!

You already know what the solution is, don’t you?
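The first of those assumptions is easy to observe without crashing anything. This is a sketch for a Pharo image of that era; it only reads the slots by position, which is how the VM gets at them.

```smalltalk
"Print this to see the instance variables every class inherits from Behavior, in order"
Behavior instVarNames.

"The VM reads these slots by position, not by name:"
(Array instVarAt: 1) == Array superclass.
(Array instVarAt: 3) == Array format
```

Since the VM never asks for the names, reordering those three variables in Behavior changes what it finds in slots 1 to 3 without it ever noticing.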

HACK AND CHEAT

Yeah, this is what the bootstrap is really about. And learning hardcore stuff too :). Of course I can’t tell you every detail because this post would be longer than anyone would care to read.

So, I’ll keep you updated, I have now to continue paying the iron price.

$33 ¥0µ £473r!

9µ1££3

Bootstrapping: finding the missing link

A few months ago I got involved in a crazy project: bootstrapping Pharo. I took some existing code, played with it, hacked it, modified it, understood it. Now I think I have some idea of what a bootstrap is and what its advantages are. I’ll try to give a brief introduction to the project: what it is about, its advantages, an overview of the current military secret results, and an insight into what is to come.

I recommend you have a look at my last post (the image dilemma) before reading.

This project is one of the ESUG projects supported this year by the Google Summer of Code program.

What is a Bootstrap

The encyclopedic definition: a bootstrap of a system is a process that can generate the smallest subset of that system that may be used to reproduce the complete one.

I mean, you have an explicit process that can generate the minimal version of your system.

Ok, easier: You kick yourself to get impulse and start from a better place :).

Then, bootstrapping software systems or languages normally means that you will somehow enhance the original process that created the system.

Some examples to clarify:

  • When your computer is turned on, someone has to bootstrap the little program able to load other programs :).
  • Generating a development environment with very basic tools will improve your work a lot (when you use your development environment).
  • The C compiler is written in C. That means that it somehow compiles itself. Of course, if you do not like assembly much, reading the C implementation is a great improvement.
  • Pharo implements traits and uses them in the core of the system to empower the design.
  • A big part of Pharo’s VM is written in smalltalk!

The need of bootstrapping Pharo

Did you know the image you download from the Pharo website is the same one that comes from years ago? I mean, not exactly the same one, but a binary copy :). The fact is that years ago (yeah, ancient history) god created the first image and it started evolving, one little change after the other, into the Pharo we know today.

Now, as in evolution, we find missing links in Pharo. We do not know how some object became the way it is, or how it was initialized. The code is simply nowhere; it’s a missing link. Also, as years passed, our ancestor became chaotic. It grew in many different uncontrolled and unordered ways. Since the Pharo project started, one of its goals has been cleaning this mess, but re-modularising and cleaning the system is a hard, long, and bothersome process.

The outputs and advantages of Bootstrapping Pharo will be:

  • getting tools to detect problems: bad dependencies, nonexistent initializations, code that really does not work but was never executed before.
  • This initialization process will be explicit and open.
  • We will be able to start the next Pharo from scratch, and since we will be able to change this explicit process, our next generation Pharo will be cleaner and fancier. It will be able to acquire easily new features: namespaces/modules, security, remote tools, mirrors.
  • But also, since it will allow people to create a custom system, researchers will have an invaluable tool to fulfil their own purposes.  They will be able to experiment

Current status of the project

The project has already had a first output, which is the image writer. The image writer is a little tool that traces a graph from a SmalltalkImage object and deploys that graph into a .image file. I’ll talk about this sub-project in a future post.

The rest of the code is still a military secret. Ok, I can give it to you, but you have to take responsibility if it blows up in your face :).
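To make "traces a graph" a bit more concrete, here is a toy workspace sketch of such a traversal. It is not the image writer’s code: it starts from a tiny made-up root instead of a SmalltalkImage object, does not follow each object’s class, ignores the literals of compiled methods, and writes nothing to disk. The names visited and pending are only for illustration.

```smalltalk
"Toy sketch of an object graph trace (not the real image writer)."
| visited pending |
visited := IdentitySet new.
"A tiny made-up root standing in for the real starting object"
pending := OrderedCollection with: (Array with: 3 @ 4 with: 'hello').
[ pending isEmpty ] whileFalse: [
    | object |
    object := pending removeFirst.
    (object isNil or: [ visited includes: object ]) ifFalse: [
        visited add: object.
        "Follow named instance variables"
        1 to: object class instSize do: [ :i |
            pending add: (object instVarAt: i) ].
        "Follow indexed pointer slots; byte and word objects hold no references"
        (object class isVariable and: [ object class isPointers ]) ifTrue: [
            1 to: object basicSize do: [ :i |
                pending add: (object basicAt: i) ] ] ] ].
"Number of distinct objects reached from the root"
visited size
```

A real tracer then has to decide, for every object it reaches, how to lay it out in the image file format, which is where the object formats of the VM come into play.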

The results of the project so far are:

  • It can create a Smalltalk image living inside another image.
  • This inner/guest image can be written in a new .image file.
  • With this approach a small kernel of 1.1MB has been reached.
  • SpaceTally runs and prints reports to understand how the space is distributed among objects.
  • A tool to detect every uninitialized class variable/class instance variable and every reference to nonexistent globals was developed.

Soon we will have all of this public on the Pharo Jenkins server.

Next Steps

  • Jenkins jobs :)
  • Taking Jenkins feedback to speed up the cleaning process
  • Remodularization to get an even smaller kernel
  • Maybe some little experiment to bootstrap MicroSqueak and learn from it
  • Bootstrap from source code.

I’ll try to keep you updated often. BTW, any ideas, criticism or suggestions are welcome. I’m not a guru, I’m just learning, like everyone :).

Saludos!