Introduction
NoSQL, which stands for "Not Only SQL," is a broad term for a variety of database technologies that are not based on the traditional relational model used in SQL databases. NoSQL databases are designed to store and manage large amounts of unstructured or semi-structured data, which is a poor fit for relational databases.
Characteristics:
Some of the key characteristics of NoSQL databases include:
- Schemaless: NoSQL databases do not require a predefined schema, which means that the data structure can evolve over time. This is in contrast to relational databases, which require a rigid schema that is defined in advance.
- Flexible data structures: NoSQL databases support a variety of data structures, such as key-value pairs, documents, and graphs. This flexibility allows for more efficient storage and retrieval of different types of data.
- Horizontal scalability: NoSQL databases are designed to be horizontally scalable, which means that they can be easily scaled by adding more nodes to the cluster. This is in contrast to relational databases, which are typically vertically scaled by adding more resources to a single server.
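To make horizontal scaling concrete, here is a minimal sketch of how keys might be spread across nodes by hashing. The node names, the hash function, and the modulo placement are all assumptions made for illustration; they are not what any particular NoSQL database uses.

```javascript
// Hypothetical node names; a real cluster manages its own membership.
const nodes = ["node-a", "node-b", "node-c"];

// Simple deterministic string hash (illustrative only).
function hashKey(key) {
  let hash = 0;
  for (const ch of key) {
    // Keep the running value an unsigned 32-bit integer.
    hash = (hash * 31 + ch.codePointAt(0)) >>> 0;
  }
  return hash;
}

// Pick the node responsible for a key.
function nodeFor(key, nodeList) {
  return nodeList[hashKey(key) % nodeList.length];
}

const keys = ["user:1", "user:2", "product:42", "order:101"];
for (const key of keys) {
  console.log(key, "->", nodeFor(key, nodes));
}
```

With plain modulo placement, adding a node moves most keys to a different node, which is why production systems tend to use schemes such as consistent hashing or range-based partitioning instead.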
Key differences between SQL databases and NoSQL databases:
Feature | SQL Databases | NoSQL Databases |
---|---|---|
Schema | Rigid, predefined | Schemaless, flexible |
Data structure | Tables with rows and columns | Key-value pairs, documents, graphs |
Scalability | Vertical, limited by single server | Horizontal, scalable by adding nodes |
Use cases | Structured data, traditional applications | Unstructured or semi-structured data, modern applications |
Types of NoSQL Databases:
- Document-Oriented: Stores data in flexible, JSON-like documents. Examples include MongoDB and CouchDB.
Each document contains key-value pairs, and collections group related documents.
Collection 1
  Document 1 { "field1": "value1", "field2": "value2", "field3": "value3" }
  Document 2 { "field1": "value4", "field2": "value5", "field3": "value6" }
Collection 2
  Document 1 { "field1": "value7", "field2": "value8", "field3": "value9" }
Most common data structures:
- JSON or BSON Documents: Store data in a flexible, hierarchical format.
Example:
// JSON document
{
  "_id": 1,
  "name": "John Doe",
  "age": 30,
  "address": {
    "street": "123 Main St",
    "city": "Exampleville",
    "country": "Exampleland"
  }
}
- Collections: Groups related documents together for easier organization.
Example:
// Users Collection
{ "_id": 1, "name": "John Doe", "age": 30 }
// Orders Collection
{ "_id": 101, "user_id": 1, "total_amount": 50.00 }
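The order above points at its user through user_id. Here is a minimal sketch of how an application might follow that reference, using plain in-memory arrays as stand-ins for the two collections. The function name ordersWithUser is hypothetical and not part of any MongoDB API.

```javascript
// In-memory stand-ins for the Users and Orders collections.
const users = [{ _id: 1, name: "John Doe", age: 30 }];
const orders = [{ _id: 101, user_id: 1, total_amount: 50.0 }];

// Attach each order's user by following the user_id reference.
function ordersWithUser(orderDocs, userDocs) {
  const usersById = new Map(userDocs.map((user) => [user._id, user]));
  return orderDocs.map((order) => ({
    ...order,
    // null marks a dangling reference (no user with that _id).
    user: usersById.get(order.user_id) ?? null,
  }));
}

const joined = ordersWithUser(orders, users);
console.log(JSON.stringify(joined, null, 2));
```

The alternative design is to embed the related data inside one document, which avoids the lookup at the cost of duplicating data.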
- Key-Value Stores: Store data as simple key-value pairs. Redis and DynamoDB are examples.
Key: "user:1"      Value: "John Doe"
Key: "product:42"  Value: "{ name: 'Product A', price: 29.99 }"
Most common data structures:
- Hashtable: Uses keys to directly access data, providing fast retrieval.
Example:
{ "key1": "value1", "key2": "value2", "key3": "value3" }
- B-trees: Keep key-value pairs sorted by key, which makes range queries and sequential access efficient.
Example:
{ "key1": "value1", "key2": "value2", "key3": "value3" }
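The two examples look identical as JSON, so the difference is easier to see in code. The sketch below contrasts a hash-style lookup with a range query over sorted keys. A sorted array with binary search stands in for a B-tree here, which is a simplification, and the helper names are hypothetical.

```javascript
const entries = [
  ["key3", "value3"],
  ["key1", "value1"],
  ["key2", "value2"],
];

// Hash-table style: a Map gives direct lookup by exact key.
const table = new Map(entries);

// Sorted style (stand-in for a B-tree): keep entries ordered by key.
const sorted = [...entries].sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));

// Binary search: index of the first entry whose key is >= target.
function lowerBound(sortedEntries, target) {
  let lo = 0;
  let hi = sortedEntries.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (sortedEntries[mid][0] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Return all entries with from <= key <= to, in key order.
function rangeQuery(sortedEntries, from, to) {
  const result = [];
  for (let i = lowerBound(sortedEntries, from); i < sortedEntries.length; i++) {
    if (sortedEntries[i][0] > to) break;
    result.push(sortedEntries[i]);
  }
  return result;
}

console.log(table.get("key2"));
console.log(rangeQuery(sorted, "key1", "key2"));
```

A hash table cannot answer the range query without scanning every key, because hashing discards key order.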
- Column-Family Stores: Organizes data into columns rather than rows. Apache Cassandra is a popular choice.
- Graph Databases: Designed for storing and querying graph-like data structures. Neo4j is a well-known graph database. In this case, data is represented as nodes and relationships between nodes.
(Node) User1 - Knows -> (Node) User2 - Likes -> (Node) Product1
(Node) User3 - Follows -> (Node) User1
Most common data structures:
- Adjacency List: Represents a graph as a collection of vertices and their adjacent vertices.
Example:
{ "A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B"] }
- Adjacency Matrix: Stores relationships between vertices in a matrix format.
Example:
  A B C D
A 0 1 1 0
B 1 0 0 1
C 1 0 0 0
D 0 1 0 0
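The two representations describe the same graph. Here is a small sketch that derives the matrix from the adjacency list. It assumes every neighbor also appears as a key in the list, and the function name is hypothetical.

```javascript
const adjacencyList = { A: ["B", "C"], B: ["A", "D"], C: ["A"], D: ["B"] };

// Build a square 0/1 matrix; rows and columns follow the key order of the list.
function toAdjacencyMatrix(list) {
  const vertices = Object.keys(list);
  const index = new Map(vertices.map((vertex, i) => [vertex, i]));
  const matrix = vertices.map(() => new Array(vertices.length).fill(0));
  for (const [from, neighbors] of Object.entries(list)) {
    for (const to of neighbors) {
      matrix[index.get(from)][index.get(to)] = 1;
    }
  }
  return { vertices, matrix };
}

const { vertices, matrix } = toAdjacencyMatrix(adjacencyList);
console.log("  " + vertices.join(" "));
matrix.forEach((row, i) => console.log(vertices[i] + " " + row.join(" ")));
```

The list needs space proportional to the number of edges, while the matrix always needs one cell per pair of vertices, so lists tend to suit sparse graphs better.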
Feature | Key-value stores | Graph databases | Document-oriented databases |
---|---|---|---|
Data structure | Key-value pairs | Nodes, edges | Documents |
Flexibility | Moderate | Limited | Flexible |
Indexing | Simple indexing | Complex indexing | Efficient indexing |
Search | Simple keyword search | Advanced graph search | Flexible search |
Use cases:
- NoSQL databases are often used in scenarios where there's a need for quick and agile development.
- They excel in handling large volumes of data with varying structures.
- Commonly applied in real-time big data applications and analytics.
Popular NoSQL Databases:
- MongoDB: Document-oriented and widely used for its scalability and flexibility.
- Cassandra: Column-family store known for its high availability and fault tolerance.
- Redis: Key-value store with in-memory data storage.
- Neo4j: A graph database for handling interconnected data.
How does MongoDB work?
MongoDB runs as a server that hosts databases. Each database contains collections, and each collection stores documents. Documents, the basic data units, consist of field-value pairs and are stored in Binary JSON (BSON), a binary representation of JSON-like documents that accommodates additional data types. Fields are similar to relational database columns, and a document may include various data types. Collections, akin to tables, group sets of documents within a database. The mongo shell, an interactive JavaScript interface, supports querying, updating, and administrative operations. Automatic sharding enables horizontal scalability by distributing data across multiple systems. For data consistency, MongoDB uses a single-master architecture: one primary accepts writes, while secondaries maintain copies of the data through replication and can take over through automatic failover.
Why is MongoDB used?
An organization might want to use MongoDB for the following:
- Storage: MongoDB can store large volumes of structured and unstructured data and scales both vertically and horizontally. Indexes improve search performance, and searches can use field, range and expression queries.
- Data integration: MongoDB can integrate data for applications, including hybrid and multi-cloud applications.
- Complex data structure descriptions: Document databases enable the embedding of documents to describe nested structures (a structure within a structure) and can tolerate variations in data.
- Load balancing: MongoDB can run across multiple servers.
Features of MongoDB
Features of MongoDB include the following:
- Replication: A replica set is two or more MongoDB instances used to provide high availability. Replica sets are made of primary and secondary servers. The primary MongoDB server handles all write operations and, by default, all read operations, while the secondaries keep copies of the data. If the primary fails, a secondary takes over as the new primary.
- Scalability: MongoDB supports vertical and horizontal scaling. Vertical scaling works by adding more power to an existing machine, while horizontal scaling works by adding more machines to a user's resources.
- Load balancing: MongoDB handles load balancing without the need for a separate, dedicated load balancer, through either vertical or horizontal scaling.
- Schema-less: MongoDB is a schema-less database, which means the database can manage data without the need for a blueprint.
- Document: Data in MongoDB is stored in documents with key-value pairs instead of rows and columns, which makes the data more flexible when compared to SQL databases.
Advantages of MongoDB
MongoDB offers several potential benefits:
- Schema-less: Like other NoSQL databases, MongoDB doesn't require predefined schemas. It stores any type of data. This gives users the flexibility to create any number of fields in a document, making it easier to scale MongoDB databases compared to relational databases.
- Document-oriented: One of the advantages of using documents is that these objects map to native data types in several programming languages. Having embedded documents also reduces the need for database joins, which can lower costs.
- Scalability: A core function of MongoDB is its horizontal scalability, which makes it a useful database for companies running big data applications. In addition, sharding lets the database distribute data across a cluster of machines. MongoDB also supports the creation of zones of data based on a shard key.
- Third-party support: MongoDB supports several storage engines and provides pluggable storage engine APIs that let third parties develop their own storage engines for MongoDB.
- Aggregation: The DBMS also has built-in aggregation capabilities, which let users run MapReduce code directly on the database rather than running MapReduce on Hadoop. MongoDB also includes its own file system called GridFS, akin to the Hadoop Distributed File System. GridFS is used primarily for storing files larger than BSON's size limit of 16 MB per document. These similarities let MongoDB be used instead of Hadoop in some cases, though the database software also integrates with Hadoop, Spark and other data processing frameworks.
Disadvantages of MongoDB
Though MongoDB offers valuable benefits, it has some downsides as well:
- Continuity: With its automatic failover strategy, a user sets up just one master node in a MongoDB cluster. If the master fails, another node will automatically convert to the new master. This switch promises continuity, but it isn't instantaneous; it can take up to a minute. By comparison, the Cassandra NoSQL database supports multiple master nodes. If one master goes down, another is standing by, creating a highly available database infrastructure.
- Write limits: MongoDB's single master node also limits how fast data can be written to the database. Data writes must be recorded on the master, and writing new information to the database is limited by the capacity of that master node.
- Data consistency: MongoDB doesn't provide full referential integrity through the use of foreign-key constraints, which could affect data consistency.
- Security: User authentication isn't enabled by default in MongoDB databases. Attackers have targeted large numbers of unsecured MongoDB systems, which led to the addition of a default setting that blocks networked connections to databases that haven't been configured by a database administrator.
MongoDB query language (MQL):
MongoDB uses a query language called MongoDB Query Language (MQL) for interacting with its databases. MQL provides a set of operators and methods for querying, updating, and manipulating data within MongoDB collections. Here are some MQL queries for common operations (assume that the collection is named collection_db):
- Basics of Querying:
- Find Documents:
db.collection_db.find({ key: value });
Example:
db.students.find({ grade: "A" });
// Example: Search for documents containing the word "MongoDB" (requires a text index on the collection)
db.articles.find({ $text: { $search: "MongoDB" } });
// Example: Find documents where the array "scores" has at least one element greater than 90
db.students.find({ "scores": { $elemMatch: { $gt: 90 } } });
- Specify Multiple Conditions:
db.collection_db.find({ key1: value1, key2: value2 });
- Query with Operators:
db.collection_db.find({ age: { $gt: 25 } });
- Projection:
- Select Fields to Return:
db.collection_db.find({}, { key1: 1, key2: 1 });
- Exclude Fields:
db.collection_db.find({}, { keyToExclude: 0 });
- Sorting:
- Sort in Ascending Order:
db.collection_db.find({}).sort({ key: 1 });
- Sort in Descending Order:
db.collection_db.find({}).sort({ key: -1 });
- Limit and Skip:
- Limit Results:
db.collection_db.find({}).limit(10);
- Skip Results:
db.collection_db.find({}).skip(5);
- Update Documents: To update a document in a collection:
- Update a Single Document:
db.collection_db.updateOne({ filter }, { $set: { key: value } });
Example:
db.students.updateOne({ name: "John Doe" }, { $set: { age: 28 } });
- Update Multiple Documents:
db.collection_db.updateMany( { filter }, { $set: { key: value } });
- Delete Documents:
- Delete a Single Document:
db.collection_db.deleteOne({ filter });
Example:
db.students.deleteOne({ name: "John Doe" });
- Delete Multiple Documents:
db.collection_db.deleteMany({ filter });
- Aggregation Framework:
- Aggregate Data:
db.collection_db.aggregate([ { $match: { key: value } }, { $group: { _id: "$key", count: { $sum: 1 } } } ]);
Example:
// Calculate the average age of students in each grade, highest average first
db.students.aggregate([ { $group: { _id: "$grade", avgAge: { $avg: "$age" } } }, { $sort: { avgAge: -1 } } ]);
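To show what the $group and $sort stages in the average-age example compute, here is a rough plain-JavaScript equivalent over an in-memory array. The sample students and the function name averageAgeByGrade are made up for illustration. This only mirrors the result shape and is not how MongoDB executes the pipeline.

```javascript
// Hypothetical sample data.
const students = [
  { name: "Ann", grade: "A", age: 20 },
  { name: "Bob", grade: "A", age: 22 },
  { name: "Cy", grade: "B", age: 25 },
];

// Group by grade, average the ages, then sort by average descending.
function averageAgeByGrade(docs) {
  const totals = new Map(); // grade -> { sum, count }
  for (const doc of docs) {
    const entry = totals.get(doc.grade) ?? { sum: 0, count: 0 };
    entry.sum += doc.age;
    entry.count += 1;
    totals.set(doc.grade, entry);
  }
  return [...totals.entries()]
    .map(([grade, { sum, count }]) => ({ _id: grade, avgAge: sum / count }))
    .sort((a, b) => b.avgAge - a.avgAge);
}

const result = averageAgeByGrade(students);
console.log(result);
```

Running the grouping inside the database instead means only the small grouped result travels over the network, not every student document.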
- Indexes:
- Create an Index:
db.collection_db.createIndex({ key: 1 });
- Explain Query Execution Plan:
db.collection_db.find({ key: value }).explain("executionStats");
- Insert Document: To insert a document into a collection:
db.collection_db.insertOne({ key1: value1, key2: value2 });
Example:
db.students.insertOne({ name: "John Doe", age: 25, grade: "A" });
Advanced MQL queries:
Advanced MongoDB Query Language (MQL) queries can involve more complex operations, aggregations, and various stages in the aggregation pipeline. Below are some examples of advanced MQL queries:
- Operators: MongoDB operators are used in queries to perform various operations on documents. Some common operators include:
- Comparison Operators:
Operator | Description |
---|---|
$eq | Matches values that are equal to a specified value. |
$ne | Matches values that are not equal to a specified value. |
$gt, $gte | Matches values that are greater than (or equal to) a specified value. |
$lt, $lte | Matches values that are less than (or equal to) a specified value. |
- Logical Operators:
Operator | Description |
---|---|
$and, $or, $not | Perform logical AND, OR, and NOT operations, respectively. |
- Element Operators:
Operator | Description |
---|---|
$exists | Matches documents that have the specified field. |
$type | Selects documents if a field is of the specified type. |
- Array Operators:
Operator | Description |
---|---|
$in | Matches any of the values specified in an array. |
$all | Matches arrays that contain all elements specified in the query. |
Example:
// Find documents where age is greater than 25 and grade is "A"
db.students.find({ $and: [{ age: { $gt: 25 } }, { grade: "A" }] });
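To see how a filter document built from these operators is evaluated, here is a deliberately simplified matcher over plain JavaScript objects. The matches function and the sample data are hypothetical. Real MongoDB matching also handles array fields, nested paths, type ordering, and many more operators, none of which this sketch attempts.

```javascript
// Simplified versions of a few comparison operators.
const comparators = {
  $eq: (value, arg) => value === arg,
  $ne: (value, arg) => value !== arg,
  $gt: (value, arg) => value > arg,
  $gte: (value, arg) => value >= arg,
  $lt: (value, arg) => value < arg,
  $lte: (value, arg) => value <= arg,
  $in: (value, arg) => arg.includes(value),
};

// Return true if doc satisfies every condition in filter.
function matches(doc, filter) {
  return Object.entries(filter).every(([key, condition]) => {
    if (key === "$and") return condition.every((sub) => matches(doc, sub));
    if (key === "$or") return condition.some((sub) => matches(doc, sub));
    const value = doc[key];
    const isOperatorObject =
      condition !== null && typeof condition === "object" && !Array.isArray(condition);
    if (!isOperatorObject) return value === condition; // plain equality
    return Object.entries(condition).every(([op, arg]) => {
      if (op === "$exists") return (key in doc) === arg;
      const compare = comparators[op];
      if (!compare) throw new Error(`Unsupported operator: ${op}`);
      return compare(value, arg);
    });
  });
}

// Hypothetical sample data.
const students = [
  { name: "Ann", age: 30, grade: "A" },
  { name: "Bob", age: 22, grade: "A" },
  { name: "Cy", age: 28, grade: "B" },
];

const olderAStudents = students.filter((doc) =>
  matches(doc, { $and: [{ age: { $gt: 25 } }, { grade: "A" }] })
);
console.log(olderAStudents.map((doc) => doc.name));
```

Because multiple keys in one filter document are already combined with AND, the filter { age: { $gt: 25 }, grade: "A" } selects the same documents here as the explicit $and form.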
- Cursor Methods:
MongoDB provides cursor methods to interact with the result set returned by queries. Common cursor methods include:
Method | Description |
---|---|
limit(n) | Limits the number of documents returned. |
skip(n) | Skips the first n documents. |
sort({ field: 1 }) | Sorts the result set in ascending order based on the specified field. Use -1 for descending order. |
count() | Returns the count of documents in the result set. |
forEach(fn) | Applies a JavaScript function for each document in the result set. |
Example:
// Sort by age in descending order, skip the first 5 documents, and return the next 10.
// MongoDB applies sort, then skip, then limit, regardless of the order in which the methods are chained.
db.students.find().limit(10).skip(5).sort({ age: -1 });
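MongoDB applies a cursor's sort first, then skip, then limit, whatever order the calls are chained in. A minimal sketch of that evaluation order over an in-memory array follows. The page helper and the sample data are hypothetical, used only to illustrate the semantics.

```javascript
// Hypothetical sample data: seven students with different ages.
const students = [21, 35, 28, 19, 40, 25, 31].map((age, i) => ({ name: `s${i}`, age }));

// Sort first, then skip, then limit.
function page(docs, { sortBy, direction = 1, skip = 0, limit = Infinity }) {
  const sorted = [...docs].sort((a, b) =>
    a[sortBy] < b[sortBy] ? -direction : a[sortBy] > b[sortBy] ? direction : 0
  );
  return sorted.slice(skip, skip + limit);
}

// Oldest first, skip the two oldest, return the next three.
const secondPage = page(students, { sortBy: "age", direction: -1, skip: 2, limit: 3 });
console.log(secondPage.map((doc) => doc.age));
```

This is the usual basis for pagination: page k of size n uses skip = k * n and limit = n, with a sort so that pages are stable.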
- Projections
Projections in MongoDB are used to control which fields are returned in the query result. In the shell, a projection is passed as the second argument of the find method; some drivers, such as the Node.js driver, also provide a project method on the cursor.
- Include Fields:
// Only include the name and age fields
db.students.find({}, { name: 1, age: 1, _id: 0 });
- Exclude Fields:
// Exclude the grade field
db.students.find({}, { grade: 0 });
Projections help in reducing the amount of data sent over the network and can improve query performance.
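The include and exclude rules can be sketched in a few lines of plain JavaScript. The project function below is a hypothetical helper, not the MongoDB implementation. It covers only top-level fields and the special handling of _id, which is returned unless explicitly set to 0.

```javascript
// Apply an inclusion ({ field: 1 }) or exclusion ({ field: 0 }) spec to one document.
function project(doc, spec) {
  const { _id: idFlag = 1, ...fields } = spec;
  const flags = Object.values(fields);
  const isInclusion = flags.some((flag) => flag === 1);
  if (isInclusion && flags.some((flag) => flag === 0)) {
    throw new Error("Cannot mix inclusion and exclusion (except for _id)");
  }
  const result = {};
  for (const [key, value] of Object.entries(doc)) {
    if (key === "_id") {
      if (idFlag === 1) result._id = value;
    } else if (isInclusion ? fields[key] === 1 : fields[key] !== 0) {
      result[key] = value;
    }
  }
  return result;
}

const student = { _id: 1, name: "John Doe", age: 30, grade: "A" };
console.log(project(student, { name: 1, age: 1, _id: 0 }));
console.log(project(student, { grade: 0 }));
```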
CRUD operations
In MongoDB, CRUD stands for Create, Read, Update, and Delete, the basic operations that can be performed on documents in a collection. Here's a brief overview of each CRUD operation in the context of MongoDB:
- Create (C): To insert a new document into a collection, you use the insertOne() or insertMany() method.
db.collection.insertOne({ name: "John Doe", age: 30, city: "Example City" });
- Read (R): To retrieve documents from a collection, you use the find() method. You can specify criteria for filtering results.
db.collection.find({ age: { $gt: 25 } });
- Update (U): To modify existing documents, you use the updateOne() or updateMany() method. You can specify criteria for matching and define the changes using update operators.
db.collection.updateOne({ name: "John Doe" }, { $set: { age: 31 } });
- Delete (D): To remove documents from a collection, you use the deleteOne() or deleteMany() method. You specify criteria for matching documents to be deleted.
db.collection.deleteOne({ name: "John Doe" });
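To tie the four operations together without a running server, here is a toy in-memory collection. The InMemoryCollection class is hypothetical: it borrows the MongoDB method names, but it takes predicate functions instead of MQL filter documents, uses a counter instead of ObjectId values, and treats an update as a plain field overwrite (roughly what $set does for top-level fields).

```javascript
class InMemoryCollection {
  constructor() {
    this.docs = [];
    this.nextId = 1; // simple counter standing in for generated ids
  }

  // Create: store a copy of the document with an _id.
  insertOne(doc) {
    const stored = { _id: this.nextId++, ...doc };
    this.docs.push(stored);
    return { insertedId: stored._id };
  }

  // Read: return every document the predicate accepts.
  find(predicate = () => true) {
    return this.docs.filter(predicate);
  }

  // Update: overwrite fields on the first matching document.
  updateOne(predicate, changes) {
    const doc = this.docs.find(predicate);
    if (!doc) return { matchedCount: 0 };
    Object.assign(doc, changes);
    return { matchedCount: 1 };
  }

  // Delete: remove the first matching document.
  deleteOne(predicate) {
    const position = this.docs.findIndex(predicate);
    if (position === -1) return { deletedCount: 0 };
    this.docs.splice(position, 1);
    return { deletedCount: 1 };
  }
}

const people = new InMemoryCollection();
const isJohn = (doc) => doc.name === "John Doe";

const inserted = people.insertOne({ name: "John Doe", age: 30, city: "Example City" });
const adults = people.find((doc) => doc.age > 25);
const updated = people.updateOne(isJohn, { age: 31 });
const ageAfterUpdate = people.find(isJohn)[0].age;
const deleted = people.deleteOne(isJohn);
console.log(inserted, adults.length, updated, ageAfterUpdate, deleted);
```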