Many times I have come across the need to serialize a whole bunch of Ruby objects, with all their references.
This is often useful to save the state of a program at one point, and retrieve it later in another session.
So I looked for a solution that could guarantee the following:
- Serialize Ruby objects directly, without having to convert/copy them to a specific format (no ORM, just native API).
- Be efficient enough to handle a lot of objects (either big objects or a ton of smaller ones), namely several hundred MB of data.
- Serialize user-made objects (not just native types).
- Serialize/deserialize using the same format in different Ruby versions.
- Be backward compatible when new serialization versions come out.
- Be optimized enough not to serialize the same object twice when it is shared among several Ruby objects.
- Be able to deserialize shared objects without duplicating them (keep references, don't copy memory).
And I could not find a simple solution to handle that.
Most of the serialization libraries I found either
- required a JSON/YAML kind of data conversion (and memory copy),
- were not compatible across Ruby versions (yes, I'm looking at you, Marshal),
- or wasted space and memory by keeping a human-readable format.
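To make the first and last points concrete, here is a minimal sketch using only Ruby's standard library. It shows that a JSON round trip writes a shared string twice and returns two separate copies, and that every Marshal dump starts with a format version. The variable names are only for illustration.

```ruby
require 'json'

shared = 'This string instance will be shared'
obj = [shared, shared] # the same String instance, referenced twice

# JSON has no notion of references: the text is written once per reference
json = JSON.generate(obj)
written_twice = json.scan(shared).size == 2
puts "Written twice? #{written_twice}" # => true

# ...and parsing gives back two independent String instances
copy = JSON.parse(json)
still_shared = copy[0].equal?(copy[1])
puts "Still shared after JSON? #{still_shared}" # => false

# Marshal stamps each dump with its format version in the first two bytes
marshal_version = Marshal.dump(obj).bytes.first(2)
puts "Marshal format version: #{marshal_version.join('.')}"
```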
So I decided to create one using what I found best in other solutions: ruby-serial.
Basically, ruby-serial uses the great MessagePack serialization library to encode Ruby objects on the fly (without memory copy), simulating a JSON-like structure for user-made objects and shared object references.
Its API is quite straightforward (the same as Marshal's) and can be customized easily.
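To give an idea of what simulating shared references in a JSON-like structure can mean, here is a rough sketch in plain Ruby. It is not ruby-serial's actual format or code: the method names to_plain and from_plain and the keys 'root', 'objects' and 'ref' are made up for this illustration, and only Integers, nil, booleans, Strings and Arrays are handled. Each String or Array is stored once in a table, and every use of it becomes a small reference hash.

```ruby
# Hypothetical illustration only: NOT ruby-serial's real wire format.

# Turn an object graph into plain arrays/hashes with explicit references.
def to_plain(root)
  ids = {}    # object_id => index in the table
  table = []  # each distinct String/Array, encoded once
  visit = lambda do |obj|
    case obj
    when Integer, NilClass, TrueClass, FalseClass
      obj # immediate values are stored inline
    when String, Array
      index = ids[obj.object_id]
      unless index
        index = ids[obj.object_id] = table.size
        table << nil # reserve the slot before recursing (handles cycles)
        table[index] = obj.is_a?(Array) ? obj.map { |e| visit.call(e) } : obj.dup
      end
      { 'ref' => index }
    else
      raise ArgumentError, "unsupported type: #{obj.class}"
    end
  end
  { 'root' => visit.call(root), 'objects' => table }
end

# Rebuild the graph, creating each referenced object exactly once.
def from_plain(plain)
  table = plain['objects']
  built = {} # index => rebuilt object
  build = lambda do |node|
    next node unless node.is_a?(Hash) # immediate value
    index = node['ref']
    built.fetch(index) do
      entry = table[index]
      built[index] = entry.is_a?(Array) ? [] : entry.dup
      entry.each { |e| built[index] << build.call(e) } if entry.is_a?(Array)
      built[index]
    end
  end
  build.call(plain['root'])
end

shared = 'This string instance will be shared'
original = ['My String', shared, 1, [shared]]
restored = from_plain(to_plain(original))
puts "Same? #{original == restored}"                      # => true
puts "Shared? #{restored[1].equal?(restored[3][0])}"      # => true
```

Because the intermediate structure only contains arrays, hashes, strings and integers, it could be handed to any compact encoder that knows nothing about references.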
Here is an example of its installation and usage:
> gem install ruby-serial
require 'ruby-serial'

# Create example
class User
  attr_accessor :name
  attr_accessor :comment

  def ==(other)
    other.is_a?(User) && (@name == other.name) && (@comment == other.comment)
  end
end

shared_obj = 'This string instance will be shared'

user = User.new
user.name = 'John'
user.comment = shared_obj # shared_obj is referenced here

obj = [
  'My String',
  shared_obj, # shared_obj is also referenced here
  1,
  user
]

# Get obj as a serialized String
serialized_obj = RubySerial::dump(obj)

# Get back our objects from the serialized String
deserialized_obj = RubySerial::load(serialized_obj)

# Both objects are the same
puts "Same? #{obj == deserialized_obj}"
# => true

# The shared object is still shared!
puts "Shared? #{deserialized_obj[1].object_id == deserialized_obj[3].comment.object_id}"
# => true
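Since the API mirrors Marshal's, here is the equivalent round trip with the standard library for comparison. This is only a small sketch with built-in types; it needs no gem, and the variable names are for illustration.

```ruby
shared = 'This string instance will be shared'
obj = ['My String', shared, 1, [shared]]

# Same two calls as above, on Marshal instead of RubySerial
data = Marshal.dump(obj)
restored = Marshal.load(data)

puts "Same? #{obj == restored}"                       # => true
puts "Shared? #{restored[1].equal?(restored[3][0])}"  # => true
```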
Currently ruby-serial is still young, and may lack some features.
It is under active development, so feel free to report any problem you may find in using it.
Complete documentation is accessible here. RDoc here.
Contributions are very welcome!
- You can open tickets for bugs and features.
- Feel free to fork the source code from GitHub.
- The tests can be run with the following command: rake test
Enjoy!