OrientDB and MongoDB* share many features, but the engines are fundamentally different. While MongoDB is a pure Document Database, OrientDB has a hybrid Document-Graph engine that adds some compelling features to the Document Database model.
This page will outline only the most important differences.
This is a JSON document representing a simplified Order. Note the “customer” property, which is embedded inside the parent Order object.
Embedding is a very common technique in Document Databases for avoiding the performance bottleneck that Relational DBMSs hit when a JOIN is executed. With MongoDB, you can instead store the _id of the connected customer, but resolving it is still similar to a JOIN: it has a high run-time cost and does not scale as the database grows (to know more about this topic, see the presentation “Why Relationships are cool, but JOIN sucks”).
OrientDB can embed documents like any other Document Database, but it can also connect documents like a Relational Database. The main difference is that OrientDB doesn’t use the costly JOIN, but rather uses direct, super-fast links taken from the Graph Database world.
The illustration on the right shows how the original document has been split into two documents, using the Customer’s Record ID #8:124 to connect the Order to the Customer document. Links can be thought of as in-memory pointers that are persisted on disk.
Why connect documents rather than embed them? For two reasons: 1. there are no duplicates, resulting in a smaller and lighter database, and 2. it is faster. A smaller database makes better use of RAM, which allows more caching.
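To make the duplication argument concrete, here is a minimal Python sketch that models both layouts with plain dictionaries. It is not OrientDB code: the customer fields, the number of Orders, and the “#9:…” Order Record IDs are illustrative assumptions. Only the Customer’s Record ID #8:124 comes from the text above, and JSON length is used as a rough stand-in for storage size.

```python
import json

CUSTOMER = {"name": "Jay", "surname": "Miner"}  # hypothetical customer fields
N_ORDERS = 1000

# Embedded layout: every Order carries its own copy of the customer.
embedded = [{"id": i, "customer": dict(CUSTOMER)} for i in range(N_ORDERS)]

# Linked layout: the customer is stored once under its Record ID and
# each Order holds only that ID. The "#9:i" Order IDs are made up.
linked = {"#8:124": dict(CUSTOMER)}
for i in range(N_ORDERS):
    linked[f"#9:{i}"] = {"id": i, "customer": "#8:124"}

def stored_size(obj):
    """Approximate storage cost as the length of the JSON encoding."""
    return len(json.dumps(obj))

embedded_size = stored_size(embedded)
linked_size = stored_size(linked)

# Correcting the customer is one write in the linked layout...
linked["#8:124"]["surname"] = "Minor"
# ...but one write per Order in the embedded layout.
for order in embedded:
    order["customer"]["surname"] = "Minor"

print(embedded_size, linked_size)
```

With many Orders pointing at the same customer, the linked layout stores the customer once, so it is smaller and a change to the customer is a single write.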
Upon loading the Order document, OrientDB will assemble the entire document by fetching all the connections transparently.
This transparent fetching of connections is one of the strong points of OrientDB. Instead of repeated calls to the database or costly JOIN operations, a Fetch Plan is supplied with the query, allowing the database to return a complete graph of interconnected documents, exactly as intended, in a single operation.
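The idea behind a Fetch Plan can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the `RECORDS` store, the `fetch` function, its `depth` parameter, and every Record ID except #8:124 are hypothetical and do not reproduce OrientDB’s actual Fetch Plan syntax or implementation.

```python
# Toy record store keyed by Record ID (hypothetical data).
RECORDS = {
    "#9:0": {"id": 1, "customer": "#8:124"},
    "#8:124": {"name": "Jay", "city": "#10:3"},
    "#10:3": {"name": "Rome"},
}

def is_link(value):
    """Treat any string that is a key of the store as a link to another record."""
    return isinstance(value, str) and value in RECORDS

def fetch(rid, depth):
    """Load a record and follow its links up to `depth` levels in one call.

    depth = 0 leaves links as Record IDs; depth = -1 follows every link
    (this toy version assumes the links contain no cycles).
    """
    doc = {}
    for field, value in RECORDS[rid].items():
        if is_link(value) and depth != 0:
            doc[field] = fetch(value, depth - 1 if depth > 0 else -1)
        else:
            doc[field] = value
    return doc

shallow = fetch("#9:0", 0)     # customer stays a Record ID
one_level = fetch("#9:0", 1)   # customer is expanded, its city is not
full = fetch("#9:0", -1)       # the whole connected graph is assembled
```

The caller states how deep to go once, and gets back an assembled document instead of issuing one request per link.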
MongoDB doesn’t support ACID Transactions. Instead, it supports Atomic Operations, so any single operation against a document is atomic. This means that you cannot have Atomicity across multiple documents. For some use cases this is acceptable, but for others it would be a big problem. OrientDB supports Atomic Operations as well as ACID Transactions, just like the Relational Database model. OrientDB uses a Write Ahead Logging (WAL) Journal to make all the changes durable, even in the event of failure.
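How a write-ahead log gives all-or-nothing behavior across several documents can be illustrated with a small Python sketch. The log format, the `recover` function, and the account keys are assumptions made for this example; they are not OrientDB’s actual journal format.

```python
def recover(log):
    """Rebuild the store from a write-ahead log, applying only committed transactions."""
    store = {}
    pending = {}  # transaction id -> list of (key, value) changes not yet committed
    for entry in log:
        kind, tx = entry[0], entry[1]
        if kind == "begin":
            pending[tx] = []
        elif kind == "set":
            pending[tx].append((entry[2], entry[3]))
        elif kind == "commit":
            for key, value in pending.pop(tx):
                store[key] = value
    # Changes of transactions that never logged a commit are discarded.
    return store

log = [
    ("begin", 1),
    ("set", 1, "account:A", 50),
    ("set", 1, "account:B", 150),
    ("commit", 1),
    ("begin", 2),
    ("set", 2, "account:A", 0),  # simulated crash: transaction 2 never commits
]
state = recover(log)
```

Because every change is logged before it is applied, replaying the log after a crash restores both documents of transaction 1 and none of the half-finished transaction 2.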
MongoDB has its own Query Language based on JSON, so developers need training to learn a new language. OrientDB’s query language is built on SQL and is augmented with a few extensions to manipulate trees and graphs. Since most developers are already familiar with SQL, working with OrientDB is easier.
UPDATE product SET price = 9.99 WHERE stock.qty > 2
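For comparison, the same update in MongoDB’s JSON-based query syntax might look like the following, written here as Python dictionaries. The `$gt` and `$set` operators and the dot notation for nested fields are standard MongoDB. No database connection is made in this sketch, and the PyMongo call mentioned in the comment is shown only as the usual way such dictionaries are passed to a driver.

```python
import json

# Filter: match documents whose nested field stock.qty is greater than 2.
query_filter = {"stock.qty": {"$gt": 2}}

# Update: set the price field on every matching document.
update = {"$set": {"price": 9.99}}

# With a driver such as PyMongo these would typically be passed as
# db.product.update_many(query_filter, update); here they are only printed.
print(json.dumps(query_filter), json.dumps(update))
```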
- SB-Tree index, a new-generation algorithm designed to handle a high number of concurrent clients. It is also durable by way of WAL (Write Ahead Logging), which avoids the need to rebuild indexes in case of failure
- Hash index, based on hashing, so it’s super fast on read and write operations, but doesn’t support range queries
- Lucene, based on Apache Lucene Project, provides fast Full-Text and Spatial indexes
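The trade-off between hash and tree-based indexes in the list above can be shown with a short Python sketch. It does not implement an SB-Tree: a sorted list searched with `bisect` stands in for any ordered index, and the Record IDs and prices are hypothetical.

```python
import bisect

prices = {"#12:0": 4.5, "#12:1": 9.99, "#12:2": 9.99, "#12:3": 20.0}  # hypothetical records

# Hash-style index: value -> Record IDs. An equality lookup is a single probe.
hash_index = {}
for rid, price in prices.items():
    hash_index.setdefault(price, []).append(rid)

# Ordered index: (value, Record ID) pairs kept sorted, as a tree-based index would.
sorted_index = sorted((price, rid) for rid, price in prices.items())

def equals(price):
    """Equality query served by the hash index."""
    return hash_index.get(price, [])

def between(low, high):
    """Inclusive range query: binary-search both bounds, then read the slice in order."""
    start = bisect.bisect_left(sorted_index, (low, ""))
    end = bisect.bisect_right(sorted_index, (high, "\uffff"))
    return [rid for _, rid in sorted_index[start:end]]

print(equals(9.99), between(5, 10))
```

The hash index has no notion of order, so answering `between` from it would mean scanning every key; the ordered index answers it with two binary searches.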
The entire storage of MongoDB is managed using the Memory Mapping technique. This is great, because it’s very fast and managed by the Operating System (OS). In the past, OrientDB used the exact same technique with its “LOCAL” Storage Engine. However, the problem with the Memory Mapping approach is that each OS manages Memory Mapped files in a different way and the available tools to tune the process are very limited and low level. This introduces many problems when databases require more space than the available RAM on the server.
Consider these experiences from MongoDB users:
“Poor Memory Management – MongoDB manages memory by memory mapping your entire data set, leaving page cache management and faulting up to the kernel. A more intelligent scheme would be able to do things like fault in your indexes before use as well as handle faulting in of cold/hot data more effectively. The result is that memory usage can’t be effectively reasoned about, and performance is non-optimal.” – One Year with MongoDB
And also “MongoDB … The global write lock (now just a database-level write lock, woo). The non-durable un-verifiable writes. The posts about how to scale to Big Data, where Big Data is 100gb. It makes more sense when you look at how the underlying storage layer is implemented. Basically, MongoDB consists of a collection of mmap’d linked lists of BSON documents, with dead simple B-tree indexing, and basic journaling as the storage durability mechanism (issues with what the driver considers a “durable write”, before data necessarily hits the storage layer, is something others have dealt with in depth). Eventually writes get fsync’d to disk by the OS, and reads result in the page with the data being loaded into memory by the OS.” – The Genius and Folly of MongoDB
Starting from release 1.4, OrientDB supports a new generation of Storage Engine named “PLOCAL”. It avoids Memory Mapping completely, in favor of direct management of disk pages. Pages are also compressed to maximize available RAM and take up less space on the disk. This is much more efficient than Memory Mapping, especially with large databases.
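What “direct management of disk pages” means in practice can be sketched as an application-level page cache. The Python below is a toy illustration, not PLOCAL’s design: the `PageCache` class, the 4096-byte page size, the LRU eviction policy, and the use of zlib are all assumptions chosen for the example.

```python
import zlib
from collections import OrderedDict

PAGE_SIZE = 4096  # bytes per page; an arbitrary choice for this sketch

class PageCache:
    """Tiny LRU cache holding compressed pages read from a stand-in 'disk'."""

    def __init__(self, disk, capacity):
        self.disk = disk            # bytes object standing in for a data file
        self.capacity = capacity    # maximum number of cached pages (at least 1)
        self.pages = OrderedDict()  # page number -> compressed bytes
        self.disk_reads = 0

    def read(self, page_no):
        if page_no in self.pages:
            self.pages.move_to_end(page_no)           # mark as most recently used
        else:
            start = page_no * PAGE_SIZE
            raw = self.disk[start:start + PAGE_SIZE]  # explicit read, no mmap
            self.disk_reads += 1
            self.pages[page_no] = zlib.compress(raw)
            if len(self.pages) > self.capacity:
                self.pages.popitem(last=False)        # evict least recently used
        return zlib.decompress(self.pages[page_no])

disk = b"".join(bytes([i]) * PAGE_SIZE for i in range(4))  # four pages of dummy data
cache = PageCache(disk, capacity=2)
for page_no in (0, 1, 0, 2):  # page 1 is evicted when page 2 is loaded
    cache.read(page_no)
```

Because the database, not the OS, decides which pages stay in memory and in what form, the eviction policy and the memory footprint can be reasoned about and tuned.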
Want to know more about OrientDB? Check out the OrientDB Manual.
*MongoDB is a registered trademark of MongoDB, Inc.