The Bootstrap Chronicles Chapter 3 – It’s Alive!

So, now that you know a bit about what the bootstrap is and what some of the problems to face are, I’ll show you some real solutions and progress.

How does the bootstrap implementation work

To bootstrap a Smalltalk environment, you need to create a new environment, with its classes and objects, and initialize some state in them. I could have done that in C, using mallocs and initializing everything by hand using plain memory :). But doing it in Smalltalk is easier: you have late binding, polymorphism, closures… Even better, once you have done your first steps in the bootstrap, you can send messages to your objects. THAT is nice.

So, that is the way we chose to go: a new environment (the guest) is created inside the current environment (the host). The guest will have its own classes and objects.

What about the special objects of the VM? We share them with the host environment, because otherwise our new image would not be able to run… Afterwards, when this new image is written into an image file, we swap references to point to our own new special objects array.

Here is a picture of what it looks like:

[Figure: How the host and guest are related]

The Current Version

I’ve been through several versions of the image with different capabilities, sizes and degrees of correctness. You have to know that if you forget to initialize something, initialize things in the wrong order, or take an extra object you do not need, you will end up with an image that does not run, or one that also carries all the objects of the host…

Another thing is that the current version takes a sample of the objects in the host to build the guest. I’m already working on starting from scratch plus source code, but that is the future and I like enjoying the present :).

So, how do you load the current code?

Gofer it
    url: 'http://www.smalltalkhub.com/mc/Guille/Seed/main';
    package: 'ConfigurationOfHazelnut';
    load.

(ConfigurationOfHazelnut project version: '1.3') load.

Also, as the code is very sensitive to what you do, it is also sensitive to which image you’re running it on. If you play with it in a wrong/different image, you will get different/unexpected results. So I suggest you use the same image as me for testing it: the latest Pharo 2.0. In particular, I’ve tested it on versions 20133 and 20134.

Do not be scared when loading the configuration on Pharo 2.0 for the first time: it will raise an error. It is an issue related to unzipping old mcz files in Pharo (http://code.google.com/p/pharo/issues/detail?id=6054), which will fortunately be fixed soon. Just close the debugger, try again, and you will get the project working.

And how do I try this weird stuff?

Nicolas Petton and I have written some examples in the class named HazelBuilderExamples.

To run them, you can try the following scripts:

"Writes an image which when opened prints a spaceTally on fileok.txt"
HazelBuilderExample new buildImageWithSpaceTally

"Writes an image which when opened prints a report on all the packaging/initialization deficiencies of the image on fileok.txt"
HazelBuilderExample new buildImageWithBrokenReferencesReport

"Writes an image which when opened prints a report on all the packaging/initialization deficiencies of the image on fileok.txt. The report is written in the xml format Jenkins Junit plugin likes."
HazelBuilderExample new buildImageWithBrokenReferencesReportForJenkins

After evaluating this code, you’ll have bootstrapped image and changes files. Open the image file with your CogVM and wait until it closes. Then have a look at the fileok.txt file :).

Of course you can look at the code in the examples, and try to build your own. Email me if you have ideas to improve this :).

There are also probably some problems with file overwriting that came with some recent Pharo changes to file management. If you notice this, please just remove the bootstrapped* files from the folder where your image is and try again.

Hey! How is this thingy useful?

Well, if you’ve had a look at the examples, the three examples I’ve shown you are very useful:

  • The first one tells you how the space is distributed in the new image. You can use this knowledge to attack space problems if you want an even smaller image.
  • The second one is used to detect bugs in the Pharo packaging or in the initialization process: either you have not initialized some classes, or they have been initialized but the initialization code does not initialize all the variables.
  • The third one is the same as the second, but adapted to have this kind of CI integration: http://car.mines-douai.fr/ci/job/Seed%20Broken%20References%20Report/

Nice, huh? Now we can use Jenkins to validate that the core of Pharo is well initialized and well packaged: it will be once that list becomes empty.

What’s next?

  • Track each one of those things in the list as an issue.
  • Continue working on the bootstrap from source code.
  • Make the Fuel seed work so we can install Fuel packages on our little image.

I’ll keep you updated!

Hasta Luego!

Guille

The Image Dilemma

Without self knowledge, without understanding the working and functions of his machine, man cannot be free, he cannot govern himself and he will always remain a slave.
George Gurdjieff

Many people I’ve talked to think Smalltalk is weird because it has an image. The funny thing is that most people who think that do so because they feel it’s different. Or because they do Java. Of course, Java and Smalltalk are different :). OK, OK, just joking, I’m not interested in flamewars. Seriously, what puzzles me is that people’s opinions on technology are normally based on feelings instead of concrete rational arguments.

So, let’s think a bit about the pros and cons of using an image and of not using one. Really. Only after understanding a bit can you decide what you want to use. That’s what the quote at the start is there for :).

Also, this post will be one of the cornerstones for explaining the basics of the bootstrap GSoC project later ;).

Non Image Based Development Environments

OK, we all know these. It’s the kind of environment we use when we code in C, Java, Python, Ruby, JavaScript and many others.

We have source files which will be either interpreted at runtime or translated to machine code in an executable file. Naively explained, the interpreted ones normally depend on another program, usually called an interpreter or VM, and the others depend on a compiler only the first time. So far, nothing strange.

Image Based Development Environments

Image based environments are the main topic of this post. But before talking about their pros and cons, we have to define what an image is. And to avoid preconceptions, let’s call them snapshots. A snapshot is a photo taken of a system at a moment in time x, from which we can rebuild the whole system again.

If you have ever used VirtualBox or VMware solutions you know what I am talking about: you have, for example, a Linux Mint virtual machine, which can be opened with your VirtualBox software. If you save the state of your Linux Mint with VirtualBox, the next time you open it, it will wake up in the same state you left it. It will not start everything from scratch unless you tell it to explicitly. And if you quit without saving, the next time it will open without your changes.

It’s a very simple concept: you are treating a whole running system (a Linux machine in this case) like any of your other data files.

So that’s how most Smalltalk systems work: you have a running Smalltalk system, and when you wake it up, it will start from where it was left. Even if you were in the middle of a computation, this computation will continue running.
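To make the snapshot idea concrete, here is a minimal sketch. It uses Python rather than Smalltalk, purely as an analogy: the state dictionary, the step function and the system.image file name are all invented for this illustration, and a real Smalltalk image stores far more (every object, class and running process). The point is only that a computation saved halfway wakes up halfway, and that changes made after the save are not in the file.

```python
import os
import pickle
import tempfile


def save_snapshot(state, path):
    """Write the whole state of our tiny 'system' to a file."""
    with open(path, "wb") as f:
        pickle.dump(state, f)


def load_snapshot(path):
    """Rebuild the system exactly as it was when the snapshot was taken."""
    with open(path, "rb") as f:
        return pickle.load(f)


def step(state):
    """Advance the 'computation' by one step: add the next pending number."""
    state["total"] += state["pending"].pop(0)


# A hypothetical running system in the middle of summing some numbers.
state = {"pending": [1, 2, 3, 4], "total": 0}
step(state)
step(state)

path = os.path.join(tempfile.mkdtemp(), "system.image")  # made-up file name
save_snapshot(state, path)

step(state)  # work done after saving: it never reaches the snapshot file

woken = load_snapshot(path)
total_at_wakeup = woken["total"]  # the value at the moment of the save

while woken["pending"]:           # the computation simply continues
    step(woken)
final_total = woken["total"]
```

Here total_at_wakeup is 3, the value at the time of the save, even though the original system had moved on to 6 before we reloaded; the woken system then finishes the sum on its own.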

Comparison

System Configuration

How heavy or reproducible is our system configuration?

non image based

When a program starts it normally does some basic initialization, configuration loading, setup stuff… Every time, our program does this same basic stuff. It’s a repetitive task, but it’s done by the machine, so I’m OK with it.

Even more, one of the advantages of loading the application again from scratch is that you are testing your build process. And having a working artifact afterwards ensures that you will be able to rebuild it in the future following the same steps.

But have you ever wondered how much time a Java class loader takes to load the classes you will use? And how much time Hibernate, Spring or JBoss takes to read your configuration and provide a usable environment? And what about compiling a large C/C# application? OK, those times are machine time. You can go get a coffee every time you restart your Tomcat, or change a source file with many dependencies. That time is not even important in a deployed product, since it is passed on to the user or web server, which may run it once every few hours or days. But it is important sometimes, for example when you are developing software. Software development is a highly demanding task which requires concentration. Taking a coffee every 15 minutes makes you lose concentration. And you can’t avoid checking/compiling/testing your code for long periods of time, because handling lots of changes only in your head is a really hard task.

image based

When a snapshot is taken, all the objects and computations that were occupying memory in your program are saved to a file in the state they were in. Object references are not broken. Then, when you load your Smalltalk snapshot again, the objects take their place in memory and continue working as they were. Normally there is no heavy initialization on image startup other than reallocating resources from outside the environment (files and sockets are OS resources and become invalid for sure when the system is halted). You hardly have to configure any of your program either, because it is already configured!
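A small sketch of those two points, again in Python as a stand-in for a real image and with every name (alice, the log entry, take_snapshot, start_up) made up for the example: references between objects survive the snapshot, while an OS resource is dropped when saving and reacquired at startup. io.StringIO plays the role of a reopened file or socket.

```python
import io
import pickle


def take_snapshot(system):
    """Save everything except OS resources, which cannot outlive the process."""
    saved = dict(system)   # shallow copy: the objects inside are still shared
    saved["log"] = None    # a handle to a file or socket is meaningless later
    return pickle.dumps(saved)


def start_up(snapshot):
    """Load the objects back, then reacquire the external resources."""
    system = pickle.loads(snapshot)
    system["log"] = io.StringIO()   # stand-in for reopening a file or socket
    return system


# One account object, referenced from two different places.
alice = {"name": "Alice", "balance": 100}
system = {"accounts": [alice], "vip": alice, "log": io.StringIO()}

restored = start_up(take_snapshot(system))

# The two references still point to one and the same object after loading.
same_object = restored["vip"] is restored["accounts"][0]
restored["vip"]["balance"] += 50
balance_seen_through_list = restored["accounts"][0]["balance"]

# The reacquired resource is usable again.
restored["log"].write("system resumed\n")
```

Nothing had to be reconfigured after loading: only the external handle was rebuilt, which mirrors the small amount of work an image does at startup.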

Now, since you don’t have to reinitialize your configuration from scratch every time, maybe you initialized it and then accidentally deleted the method that performed the configuration. And then you are screwed, because if you want to start from scratch you will be missing pieces of your program, which makes it impossible. This is why we may consider an image based system an evolving environment: the system is changed by mutations that may get lost or go untracked.

The Development process

How is our development process altered by the environment we use?

non image based

  1. Write code in a file
  2. Compile or similar or however you want to call it :).
  3. Run the whole program from scratch
  4. Test
  5. Go to 1

This process is slooow. Especially because of steps 2 and 3. For every change you make, no matter whether it is large or small, you have to rebuild everything and lose time with it, and reinitialize, and reconfigure… Even if you only wanted to replace a 2 by a 3, or fix a typo in a string.

However, there are currently some tools that implement what they call hot deployment, which is nothing other than exploiting some dynamic features that let a program change while it’s running. But hey! That’s what we do with an image based approach: we change the program while it’s running.

Anyway, those extra steps make you gain reproducibility by making some parts of the process a bit more explicit, at the cost of extra bureaucracy.

image based

  1. Open your image
  2. Write code
  3. Evaluate/Accept
  4. Test
  5. Go to 2

Here the main difference is that the modification of our program is made of little deltas applied during execution. We replace/change/create methods while our whole program is running. We create classes and objects and modify class hierarchies while our whole program is running. We alter the state of our objects while our whole program is running. This really improves the development process, since the time you have to wait to receive feedback from execution or testing is almost none. And there is less overhead, less context switching in your head.
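As a rough illustration of such a delta, the following sketch replaces a method while an object created earlier is still alive. It is written in Python, which allows a similar trick, and not in Pharo; Account and its methods are hypothetical names. In a Smalltalk image, accepting a method in the browser has this same effect on the objects that already exist.

```python
class Account:
    """A hypothetical domain class living in the running program."""

    def __init__(self, balance):
        self.balance = balance

    def describe(self):
        return "balance: " + str(self.balance)


# A live object, created before the change is made.
account = Account(100)
before = account.describe()


def describe_with_currency(self):
    """The new version of the method, written while the program runs."""
    return "balance: " + str(self.balance) + " EUR"


# The 'accept' step: install the new method in the class. No restart and
# no reinitialization; the existing object keeps its state.
Account.describe = describe_with_currency
after = account.describe()
```

The same account object answers with the old text before the change and with the new text right after it, without ever being recreated.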

Now the problem is that to reproduce our program we have to track all the little changes we’ve made. And understand them, and discard the ones that overlap or do not make sense. In Pharo, a Smalltalk system, we have a .changes file that works as a log of the changes made. This changes file aims to reduce the impact of the disadvantages of this dynamic feature.
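The idea of a change log can be sketched in a few lines. This is a toy model in Python under loud assumptions: the log is an in-memory list of (name, source) pairs, and accept, replay and latest_only are invented helpers. It does not reproduce the actual format or behaviour of Pharo’s .changes file; it only shows why replaying logged deltas rebuilds the program and why overlapping changes have to be sorted out.

```python
changes_log = []   # every accepted change, in order (hypothetical format)


def accept(namespace, name, source):
    """Apply a change to the running system and record it in the log."""
    exec(source, namespace)
    changes_log.append((name, source))


def replay(log):
    """Rebuild a fresh system by applying the logged changes in order."""
    namespace = {}
    for _name, source in log:
        exec(source, namespace)
    return namespace


def latest_only(log):
    """Discard overlapping changes, keeping the last version of each name."""
    return list(dict(log).items())


running = {}
accept(running, "area", "def area(w, h):\n    return w + h\n")   # a mistake
accept(running, "area", "def area(w, h):\n    return w * h\n")   # the fix
accept(running, "perimeter", "def perimeter(w, h):\n    return 2 * (w + h)\n")

rebuilt = replay(changes_log)        # reproduces the running system
compact = latest_only(changes_log)   # the first 'area' overlaps and is dropped
```

Replaying either the full log or the compacted one gives a system where area multiplies, but only because every delta was logged in the first place; a change that bypasses the log is exactly the kind of mutation that gets lost.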

IDE Features

Have you ever used an IDE with refactoring support, syntax highlighting and nice code browsing capabilities? What do our tools need in order to provide us with all those gorgeous features?

IDEs are meta-programs. They are programs that help us manipulate programs: modify them, understand them, query them. They can do that by:

  • directly modifying source code without a model behind.  Painful and Hard.  I would never try to do it that way :).
  • modeling the concepts of the language they will manipulate to make it easier.

non image based

A program’s meta model (a model representing program entities such as classes or variables instead of domain entities like bank accounts and clients) is often generated when the program runs if you have reflective capabilities in the manipulated language. But IDEs should work while the program is not running, or even if the language does not provide reflection. So, how does this approach build these kinds of tools?

They have an alternative model and an alternative parser/compiler which generates this model from source code. A lot of extra work is needed for information that is already there. The problem is that the information is lost in text files with source code instead of being stored in a more malleable format.

image based

Our image based approach can keep all the meta information in the snapshot, alive, as first class objects. This means that we do not have to create a duplicated meta-model, nor an extra parser/compiler: we can use the ones that are alive in the image. We make use of the model created by the language, which we can’t do in the non image based approach.
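The difference between the two routes can be sketched with a toy example. Python is used here only because it offers both a parser module (ast) and live class objects; the Account class and the two helper functions are hypothetical. The first helper rebuilds a model by parsing text, as a file based IDE has to; the second simply asks the class object that already exists in the running system, which is the situation inside an image.

```python
import ast
import inspect

SOURCE = '''
class Account:
    def deposit(self, amount):
        self.balance += amount

    def withdraw(self, amount):
        self.balance -= amount
'''


def methods_from_source(source, class_name):
    """File based route: parse the text to rebuild a model of the class."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            return sorted(
                item.name for item in node.body
                if isinstance(item, ast.FunctionDef)
            )
    return []


def methods_from_live_class(cls):
    """Image style route: query the class object that is already alive."""
    return sorted(
        name for name, value in vars(cls).items()
        if inspect.isfunction(value)
    )


# Bring the class to life, then compare the two ways of asking about it.
namespace = {}
exec(SOURCE, namespace)

parsed = methods_from_source(SOURCE, "Account")
live = methods_from_live_class(namespace["Account"])
```

Both return the same list of method names, but only the first needed a parser and a second model of what a class is.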

Files or no files

Sometimes you feel comfortable modifying files, sometimes not.

non image based

You have files with your source code. You can use them to execute/compile your program, you can store them in svn/git/bazaar repositories, and you can diff/grep/more/less/find over them just using a terminal.

I think this is the main pro over the image based approach. There are plenty of tools working on plain text files you can use if your source code lives in this kind of storage.

But try to write a program implementing an extract method refactoring over a piece of text.  You have to write a very complex and large program analyzing classes, methods, scopes…

In short: it’s really cool to use existing tools. But it is not that nice to write your own.

image based

In contrast to not having an image, we can’t just use our nice text manipulation tools on our image file. We have to rebuild them in our system, or externalize the source code to files in order to use them, and then re-insert the feedback into the system.

But there are other tasks, like refactorings and metaprogramming in general, that become very simple just by the fact of manipulating objects instead of text.

In the Pharo project there are plenty of existing and emerging tools that help us interact with the outer world, such as:

Conclusion

As I’ve shown you, these two approaches have their good and bad stuff.  None of them is the silver bullet, and we should be aware of that to be better developers.

I hope you are a little more free now ;).

さようなら!