201 lines
11 KiB
HTML
201 lines
11 KiB
HTML
|
<html>
|
||
|
<body>
|
||
|
<h1>WARNING incomplete and missleading doc!!!</h1>
|
||
|
<p>This package contains public API and introduction</p>
|
||
|
|
||
|
<h2>JDBM intro</h2>
|
||
|
Key-Value databases have got a lot of attention recently, but their history is much older. GDBM (predecessor of JDBM)
|
||
|
started
|
||
|
in 1970 and was called 'pre rational' database. JDBM is under development since 2000. Version 1.0 was in production
|
||
|
since 2005 with only a few bugs reported. Version 2.0 adds some features on top of JDBM (most importantly <code>java.util.Map</code>
|
||
|
views)
|
||
|
<p/>
|
||
|
JDBM 2.0 goal is to provide simple and fast persistence. It is very simple to use, it has minimal overhead and
|
||
|
standalone
|
||
|
JAR takes only 130KB. It is excelent choice for Swing application or Android phone. JDBM also handles huge datasets well
|
||
|
and can be used for data processing (author is using it to process astronomical data).
|
||
|
The source code is not complicated; it is well readabable and can also be used for teaching.
|
||
|
On the other hand, it does not have some important features (concurrent scalability, multiple transaction, annotations,
|
||
|
clustering...), which is the reason why it is so simple and small. For example, multiple transaction would introduce a
|
||
|
new dimension of problems, such as concurrent updates, optimistic/pesimistic record locking, etc.
|
||
|
JDBM does not try to replicate Valdemort, HBase or other more advanced Key Value databases.
|
||
|
<p/>
|
||
|
|
||
|
<h2>JDBM2 is </h2>
|
||
|
|
||
|
<p/><b>Not a SQL database</b><br/>
|
||
|
JDBM2 is more low level. With this comes great power (speed, resource usage, no ORM)
|
||
|
but also big responsibility. You are responsible for data integrity, partioning, typing etc...
|
||
|
Excelent embedded SQL database is <a href="http://www.h2database.com">H2</a> (in fact it is faster than JDBM2 in many
|
||
|
cases).
|
||
|
|
||
|
<p/><b>Not an Object database</b><br/>
|
||
|
The fact that JDBM2 uses serialization may give you a false sense of security. It does not
|
||
|
magically split a huge object graph into smaller pieces, nor does it handle duplicates.
|
||
|
With JDBM you may easily end up with single instance being persisted in several copies over a datastore.
|
||
|
An object database would do this magic for you as it traverses object graph references and
|
||
|
makes sure there are no duplicates in a datastore. Have look at
|
||
|
<a href="http://www.neodatis.org/">NeoDatis</a> or <a href="http://www.db4o.com/">DB4o</a>
|
||
|
|
||
|
<p/><b>Not at enterprise level</b><br/>
|
||
|
JDBM2 codebase is propably very good and without bugs, but it is a community project. You may easily endup without
|
||
|
support. For something more enterprisey have a look at
|
||
|
<a href="http://www.oracle.com/database/berkeley-db/je/index.html ">Berkley DB Java Edition</a> from Oracle. BDB has
|
||
|
more
|
||
|
features, it is more robust, it has better documentation, bigger overhead and comes with a pricetag.
|
||
|
|
||
|
<p/><b>Not distributed</b><br/>
|
||
|
Key Value databases are associated with distributed stores, map reduce etc. JDBM is not distributed, it runs on single
|
||
|
computer only.
|
||
|
It does not even have a network interface and can not act as a server.
|
||
|
You would be propably looking for <a href="http://project-voldemort.com/">Valdemort</a>.
|
||
|
|
||
|
<h2>JDBM2 overview</h2>
|
||
|
JDBM2 has some helpfull features to make it easier to use. It also brings it closer to SQL and helps with data
|
||
|
integrity checks and data queries.
|
||
|
<p/><b>Low level node store</b><br/>
|
||
|
This is Key-Value database in its literal mean. Key is a record identifier number (recid) which points to a location in
|
||
|
file.
|
||
|
Since recid is a physical pointer, new key values must be assgned by store (wherever the free space is found).
|
||
|
Value can be any object, serializable to a byte[] array. Page store also provides transaction and cache.
|
||
|
<p/><b>Named objects</b><br/>
|
||
|
Number as an identifier is not very practical. So there is a table that translates Strings to recid. This is recommended
|
||
|
approach for persisting singletons.
|
||
|
<p/><b>Primary maps</b><br/>
|
||
|
{@link jdbm.PrimaryTreeMap} and {@link jdbm.PrimaryHashMap} implements <code>java.util.map</code> interface
|
||
|
from Java Collections. But they use node store for persistence. So you can create HashMap with bilions of items and
|
||
|
worry only about the commits.
|
||
|
<p/><b>Secondary maps</b><br/>
|
||
|
Secondary maps (indexes) provide side information and associations for the primary map. For example, if there is a
|
||
|
Person class persisted in the primary map,
|
||
|
the secondary maps can provide fast lookup by name, address, age... The secondary maps are 'views' to the primary map
|
||
|
and are readonly.
|
||
|
They are updated by the primary map automatically.
|
||
|
<p/><b>Cache</b><br/>
|
||
|
JDBM has object instance cache. This reduces the serialization time and disk IO. By default JDBM uses SoftReference
|
||
|
cache. If JVM have
|
||
|
less then 50MB heap space available, MRU (Most Recently Used) fixed size cache is used instead.
|
||
|
<p/><b>Transactions</b><br/>
|
||
|
JDBM provides transactions with commit and rollback. The transaction mechanism is safe and tested (in usage for the last
|
||
|
5 years). JDBM allows only
|
||
|
single concurrent transactions and there are no problems with concurrent updates and locking.
|
||
|
|
||
|
<h1>10 things to keep in mind</h1>
|
||
|
<ul>
|
||
|
<li>Uncommited data are stored in memory, and if you get <code>OutOfMemoryException</code> you have to make commits
|
||
|
more
|
||
|
frequently.
|
||
|
<li>Keys and values are stored as part of the index nodes. They are instanciated each time the index is searched.
|
||
|
If you have larger values (>512 bytes), these may hurt performance and cause <code>OutOfMemoryException</code>
|
||
|
<li>If you run into performance problems, use the profiler rather then asking for it over the internet.
|
||
|
<li>JDBM caches returned object instances. If you modify an object (like set new name on a person),
|
||
|
next time RecordManager may return the object with this modification.
|
||
|
<li>Iteration over Maps is not guaranteed if there are changes
|
||
|
(for example adding a new entry while iterating). There is no fail fast policy yet.
|
||
|
So all iterations over Maps should be synchronized on RecordManager.
|
||
|
<li>More memory means better performance; use <code>-Xmx000m</code> generously. JDBM has good SoftReference cache.
|
||
|
<li>SoftReference cache may be blocking some memory for other tasks. The memory is released automatically, but it
|
||
|
may take longer then you expect.
|
||
|
Consider clearing the cache manually with <code>RecordManager.clearCache()</code> before starting a new type
|
||
|
of task.
|
||
|
<li>It is safe not to close the db before exiting, but if you that there will be a long cleanup upon the next start.
|
||
|
<li>JDBM may have problem reclaiming free space after many records are delete/updated. You may want to run
|
||
|
<code>RecordManager.defrag()</code> from time to time.
|
||
|
<li>A Key-Value db does not support N-M relations easily. It takes a lot of care to handle them correctly.
|
||
|
</ul>
|
||
|
|
||
|
<dl>
|
||
|
</dl>
|
||
|
|
||
|
|
||
|
|
||
|
<!-- $Id: package.html,v 1.1 2001/05/19 16:01:33 boisvert Exp $ -->
|
||
|
<html>
|
||
|
<body>
|
||
|
<p>Core classes for managing persistent objects and processing transactions.</p>
|
||
|
|
||
|
<h1>Memory allocation</h1>
|
||
|
This document describes the memory allocation structures and
|
||
|
algorithms used by jdbm. It is based on a thread in the
|
||
|
jdbm-developers mailing list.
|
||
|
<p/>
|
||
|
<ul>
|
||
|
<li> A block is a fixed length of bytes. Also known as a node.
|
||
|
<li> A row is a variable length of bytes. Also known as a record.
|
||
|
<li> A slot is a fixed length entry in a given block/node.
|
||
|
<li> A node list is a linked list of pages. The head and tail of each
|
||
|
node list is maintained in the file header.
|
||
|
</ul>
|
||
|
Jdbm knows about a few node lists which are pre-defined in Magic,
|
||
|
e.g., Magic.USED_PAGE. The FREE, USED, TRANSLATION, FREELOGIDS, and
|
||
|
FREEPHYSIDS node lists are used by the jdbm memory allocation policy
|
||
|
and are described below.
|
||
|
<p/>
|
||
|
The translation list consists of a bunch of slots that can be
|
||
|
available (free) or unavailable (allocated). If a slot is available,
|
||
|
then it contains junk data (it is available to map the logical row id
|
||
|
associated with that slot to some physical row id). If it is
|
||
|
unavailable, then it contains the block id and offset of the header of
|
||
|
a valid (non-deleted) record. "Available" for the translation list
|
||
|
is marked by a zero block id for that slot.
|
||
|
<p/>
|
||
|
The free logical row id list consists of a set of pages that contain
|
||
|
slots. Each slot is either available (free) or unavailable
|
||
|
(allocated). If it is unavailable, then it contains a reference to
|
||
|
the location of the available slot in the translation list. If it is
|
||
|
available, then it contains junk data. "Available" slots are marked by
|
||
|
a zero block id. A count is maintained of the #of available slots
|
||
|
(free row ids) on the node.
|
||
|
<p/>
|
||
|
As you free a logical row id, you change it's slot in the translation
|
||
|
list from unavailable to available, and then *add* entries to the free
|
||
|
logical row list. Adding entries to the free logical row list is done
|
||
|
by finding an available slot in the free logical row list and
|
||
|
replacing the junk data in that slot with the location of the now
|
||
|
available slot in the translation list. A count is maintained of the
|
||
|
#of available slots (free row ids) on the node.
|
||
|
<p/>
|
||
|
Whew... now we've freed a logical row id. But what about the physical
|
||
|
row id?
|
||
|
<p/>
|
||
|
Well, the free physical row id list consists of a set of pages that
|
||
|
contain slots. Each slot is either available (free) or unavailable
|
||
|
(allocated). If it is unavailable, then it contains a reference to
|
||
|
the location of the newly freed row's header in the data node. If it
|
||
|
is available, then it contains junk data. "Available" slots are
|
||
|
marked by a zero block id. A count is maintained of the #of available
|
||
|
slots (free row ids) on the node. (Sound familiar?)
|
||
|
<p/>
|
||
|
As you free a physical row id, you change it's header in the data node
|
||
|
from inuse to free (by zeroing the size field of the record header),
|
||
|
and then *add* an entry to the free physical row list. Adding entries
|
||
|
to the free physical row list consists of finding an available slot,
|
||
|
and replacing the junk data in that slot with the location of the
|
||
|
newly freed row's header in the data node.
|
||
|
<p/>
|
||
|
The translation list is used for translating in-use logical row ids
|
||
|
to in-use physical row ids. When a physical row id is freed, it is
|
||
|
removed from the translation list and added to the free physical row
|
||
|
id list.
|
||
|
<p/>
|
||
|
This allows a complete decoupling of the logical row id from the
|
||
|
physical row id, which makes it super easy to do some of the fiddling
|
||
|
I'm talking about the coallescing and splitting records.
|
||
|
<p/>
|
||
|
If you want to get a list of the free records, just enumerate the
|
||
|
unavailable entries in the free physical row id list. You don't even
|
||
|
need to look up the record header because the length of the record is
|
||
|
also stored in the free physical row id list. As you enumerate the
|
||
|
list, be sure to not include slots that are available (in the current
|
||
|
incarnation of jdbm, I believe the available length is set to 0 to
|
||
|
indicate available - we'll be changing that some time soon here, I'm
|
||
|
sure).
|
||
|
<p/>
|
||
|
|
||
|
</body>
|
||
|
</html>
|
||
|
|
||
|
|
||
|
</body>
|
||
|
</html>
|