RC: W2 D1 — Difference between a database, a DBMS and a storage engine
February 19, 2024Until today, I was saying I wanted to re-implement a database from scratch, and would use interchangeably the terms “database”, “database management system (DBMS)” and “storage engine”. I figured it was about time to clarify the distinctions between those terms, so that I can describe precisely what I am doing.
So, here it is:
- A database is a structured collection of data. It ranges from a simple text file (where each line is one item) to more complex collections of tables, rows, indexes and so forth. For example, this could be the catalog of books that you stored in your PostgreSQL database.
- A database management system (DBMS) is a software that provides an interface between the database and its users (basically to manage and interact with the database). It includes functionalities to create, read, update and delete records, as well as data integrity and transactions.
- A storage engine is a component of a DBMS that deals specifically with data storage and retrieval on disk. It is responsible with the data’s physical representation, how it is organized in files, as well as indexing and locking. Some DBMS allow to configure a specific storage engine, depending on your needs.
Having these terms clarified, my goal during my RC batch will be to implement 2 or 3 storage engines so that I can understand better the differences between them and why each is optimized for different use cases. Then, I would like to implement a full DBMS from end-to-end, which, as far as my understanding goes for now, means adding a transaction layer and a layer containing query processing (with the query optimizer and what not).
Time will tell how far I can get in this endeavor!