NoSQL & MongoDB

Introduction

NoSQL, which stands for "Not Only SQL," is a broad term for a variety of database technologies that are not based on the traditional relational model used in SQL databases. NoSQL databases are designed to store and manage large amounts of unstructured or semi-structured data, which is not well-suited for relational databases.
Characteristics:
Some of the key characteristics of NoSQL databases include:
  • Schemaless: NoSQL databases do not require a predefined schema, which means that the data structure can evolve over time. This is in contrast to relational databases, which require a rigid schema that is defined in advance.
  • Flexible data structures: NoSQL databases support a variety of data structures, such as key-value pairs, documents, and graphs. This flexibility allows for more efficient storage and retrieval of different types of data.
  • Horizontal scalability: NoSQL databases are designed to be horizontally scalable, which means that they can be easily scaled by adding more nodes to the cluster. This is in contrast to relational databases, which are typically vertically scaled by adding more resources to a single server.
NoSQL databases are often used for applications that store and manage large amounts of unstructured or semi-structured data, such as social media data, e-commerce data, and IoT data. Some popular NoSQL databases include MongoDB, Cassandra, and Couchbase.
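As a rough illustration of the schemaless point above, the sketch below models a collection as a plain JavaScript array whose documents do not all share the same fields. The `products` data and the `fieldsInUse` helper are hypothetical names made up for this example, and no database is involved.

```javascript
// A "collection" modeled as a plain array; each document may have its own shape.
const products = [
  { _id: 1, name: "Laptop", price: 999 },
  { _id: 2, name: "E-book", price: 12, format: "PDF" },           // extra field
  { _id: 3, name: "T-shirt", price: 20, sizes: ["S", "M", "L"] }  // array field
];

// Hypothetical helper: list every field name used anywhere in the collection.
function fieldsInUse(collection) {
  const fields = new Set();
  for (const doc of collection) {
    Object.keys(doc).forEach((key) => fields.add(key));
  }
  return [...fields].sort();
}

console.log(fieldsInUse(products));
```

New fields such as `format` or `sizes` appear on individual documents without any change to the other documents, which is what lets the data structure evolve over time.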

Key differences between SQL databases and NoSQL databases:
Feature        | SQL Databases                             | NoSQL Databases
Schema         | Rigid, predefined                         | Schemaless, flexible
Data structure | Tables with rows and columns              | Key-value pairs, documents, graphs
Scalability    | Vertical, limited by a single server      | Horizontal, scalable by adding nodes
Use cases      | Structured data, traditional applications | Unstructured or semi-structured data, modern applications

Types of NoSQL Databases:

  • Document-Oriented: Stores data in flexible, JSON-like documents. Examples include MongoDB and CouchDB. Each document contains key-value pairs, and collections group related documents.
                        Collection 1
                        Document 1
                          {
                            "field1": "value1",
                            "field2": "value2",
                            "field3": "value3"
                          }
                        Document 2
                          {
                            "field1": "value4",
                            "field2": "value5",
                            "field3": "value6"
                          }
                      
                      Collection 2
                        Document 1
                          {
                            "field1": "value7",
                            "field2": "value8",
                            "field3": "value9"
                          }
                      
                    
    Most common data structures:
    • JSON or BSON Documents: Stores data in a flexible, hierarchical format. Example:
                                  // JSON document
                                  {
                                      "_id": 1,
                                      "name": "John Doe",
                                      "age": 30,
                                      "address": {
                                        "street": "123 Main St",
                                        "city": "Exampleville",
                                        "country": "Exampleland"
                                      }
                                    }                            
                              
    • Collections: Groups related documents together for easier organization. Example:
                                  // Users Collection
                                  {
                                    "_id": 1,
                                    "name": "John Doe",
                                    "age": 30
                                  }
                                  
                                  // Orders Collection
                                  {
                                    "_id": 101,
                                    "user_id": 1,
                                    "total_amount": 50.00
                                  }                                            
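The Users and Orders example above links two collections through `user_id`. The sketch below, in plain JavaScript with no database, contrasts that reference style with embedding the user inside the order. The `orderWithUser` helper and the `embeddedOrder` shape are assumptions made for illustration only.

```javascript
// Two "collections" modeled as arrays, mirroring the Users and Orders example.
const users = [{ _id: 1, name: "John Doe", age: 30 }];
const orders = [{ _id: 101, user_id: 1, total_amount: 50.0 }];

// Hypothetical helper: resolve the reference by hand, since the order
// only stores the user's _id.
function orderWithUser(order, userCollection) {
  const user = userCollection.find((u) => u._id === order.user_id) || null;
  return { ...order, user };
}

// Alternative design: embed the user data inside the order document itself,
// so one read returns everything.
const embeddedOrder = {
  _id: 101,
  total_amount: 50.0,
  user: { _id: 1, name: "John Doe" }
};

console.log(orderWithUser(orders[0], users).user.name);
console.log(embeddedOrder.user.name);
```

Referencing keeps user data in one place at the cost of a second lookup, while embedding avoids the lookup but duplicates user data across orders.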
                              
  • Key-Value Stores: Store data as simple key-value pairs that are looked up by key. Redis and DynamoDB are examples.
                        Key: "user:1"
                        Value: "John Doe"
                        
                        Key: "product:42"
                        Value: "{ name: 'Product A', price: 29.99 }"                    
                    
    Most common data structures:
    • Hashtable: Uses keys to directly access data, providing fast retrieval. Example:
                                  {
                                      "key1": "value1",
                                      "key2": "value2",
                                      "key3": "value3"
                                    }
                              
    • B-trees: Keep key-value pairs in sorted key order, which makes range queries and sequential access efficient. Example (keys held in sorted order):
                                  {
                                      "key1": "value1",
                                      "key2": "value2",
                                      "key3": "value3"
                                    }
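The difference between the two structures above is easier to see in code. The sketch below uses a JavaScript `Map` for hash-style lookup and a sorted key array as a stand-in for the ordering a B-tree maintains. It is not a B-tree implementation, and `rangeScan` is a hypothetical helper.

```javascript
// Hash-table style access: a Map gives direct lookup by key.
const store = new Map([
  ["user:1", "John Doe"],
  ["user:2", "Jane Roe"],
  ["product:42", "Product A"]
]);
console.log(store.get("user:1"));

// Ordered access: keep the keys sorted so a key range can be read in sequence.
// A real B-tree keeps keys ordered inside tree nodes; a sorted array is only
// a stand-in for that ordering here.
const sortedKeys = [...store.keys()].sort();

// Hypothetical helper: return all entries whose key lies in [from, to).
function rangeScan(from, to) {
  return sortedKeys
    .filter((key) => key >= from && key < to)
    .map((key) => [key, store.get(key)]);
}

// ";" is the character right after ":", so this range covers every key
// that starts with "user:".
console.log(rangeScan("user:", "user;"));
```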
                              
  • Column-Family Stores: Organizes data into columns rather than rows. Apache Cassandra is a popular choice.
  • Graph Databases: Designed for storing and querying graph-like data structures. Neo4j is a well-known graph database. In this case data is represented as nodes and relationships between nodes.
                        (Node) User1
                        - Knows
                          (Node) User2
                        - Likes
                          (Node) Product1
                      
                      (Node) User3
                        - Follows
                          (Node) User1                  
                    
    Most common data structures:
    • Adjacency List: Represents graph as a collection of vertices and their adjacent vertices. Example:
                                  {
                                      "A": ["B", "C"],
                                      "B": ["A", "D"],
                                      "C": ["A"],
                                      "D": ["B"]
                                    }
                              
    • Adjacency Matrix: Stores relationships between vertices in a matrix format. Example:
                                  A  B  C  D
                                  A  0  1  1  0
                                  B  1  0  0  1
                                  C  1  0  0  0
                                  D  0  1  0  0
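The two graph representations above describe the same graph. As a minimal sketch in plain JavaScript, the code below converts the adjacency list into the matrix and runs a breadth-first search over it. The function names `toMatrix` and `hops` are chosen for this example.

```javascript
// Adjacency list from the example above.
const graph = { A: ["B", "C"], B: ["A", "D"], C: ["A"], D: ["B"] };

// Build the equivalent adjacency matrix (rows and columns in key order).
function toMatrix(adjacency) {
  const vertices = Object.keys(adjacency);
  return vertices.map((from) =>
    vertices.map((to) => (adjacency[from].includes(to) ? 1 : 0))
  );
}

// Breadth-first search: smallest number of hops between two vertices, or -1.
function hops(adjacency, start, goal) {
  const distance = { [start]: 0 };
  const queue = [start];
  while (queue.length > 0) {
    const current = queue.shift();
    if (current === goal) return distance[current];
    for (const next of adjacency[current]) {
      if (!(next in distance)) {
        distance[next] = distance[current] + 1;
        queue.push(next);
      }
    }
  }
  return -1;
}

console.log(toMatrix(graph));
console.log(hops(graph, "C", "D"));
```

The list form stores only existing edges, while the matrix form answers "is X connected to Y?" with a single cell lookup at the cost of storing every possible pair.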
                              

Feature        | Key-value stores      | Graph databases       | Document-oriented databases
Data structure | Key-value pairs       | Nodes, edges          | Documents
Flexibility    | Moderate              | Limited               | Flexible
Indexing       | Simple indexing       | Complex indexing      | Efficient indexing
Search         | Simple keyword search | Advanced graph search | Flexible search

Use cases:

  • NoSQL databases are often used in scenarios where there's a need for quick and agile development.
  • They excel in handling large volumes of data with varying structures.
  • Commonly applied in real-time big data applications and analytics.

Popular NoSQL Databases:

  • MongoDB: Document-oriented and widely used for its scalability and flexibility.
  • Cassandra: Column-family store known for its high availability and fault tolerance.
  • Redis: Key-value store with in-memory data storage.
  • Neo4j: A graph database for handling interconnected data.

How does MongoDB work?

A MongoDB server hosts one or more databases. Each database contains collections, and each collection contains documents. Documents are the basic unit of data: each consists of field-value pairs and is stored in Binary JSON (BSON), a binary representation of JSON-like documents that supports more data types than plain JSON. Fields are similar to columns in a relational database, and collections are akin to tables in that they group sets of documents within a database.

The mongo shell, an interactive JavaScript interface, supports querying, updating, and administrative operations. Sharding enables horizontal scalability by distributing data across multiple systems. For data consistency, MongoDB uses a single-master (single-primary) architecture: secondary members maintain copies of the primary's data through replication, which allows automatic failover.
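To make the idea of distributing data across multiple systems concrete, here is a minimal sketch of routing documents to shards by hashing a shard key. It is plain JavaScript with no database. The three-shard layout, `toyHash`, `shardFor`, and `insert` are all assumptions for illustration, and the hash is not the one MongoDB uses.

```javascript
// Hypothetical cluster of three shards, each holding its own documents.
const shards = [[], [], []];

// Toy string hash (an assumption for this sketch, not MongoDB's hash function).
function toyHash(text) {
  let hash = 0;
  for (const char of String(text)) {
    hash = (hash * 31 + char.charCodeAt(0)) >>> 0; // keep an unsigned 32-bit value
  }
  return hash;
}

// Route a document to a shard based on the value of its shard key.
function shardFor(doc, shardKey) {
  return toyHash(doc[shardKey]) % shards.length;
}

function insert(doc, shardKey) {
  const index = shardFor(doc, shardKey);
  shards[index].push(doc);
  return index;
}

for (let id = 1; id <= 9; id++) {
  insert({ _id: id, name: "user" + id }, "_id");
}
console.log(shards.map((shard) => shard.length));
```

Because the shard is derived from the key alone, a later read for the same key can be sent to the one shard that holds it instead of to every machine.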

Why is MongoDB used?

An organization might want to use MongoDB for the following:
  • Storage: MongoDB can store large volumes of structured and unstructured data and scales both vertically and horizontally. Indexes improve search performance, and queries can filter by field, by range, and by expression.
  • Data integration: MongoDB integrates data for applications, including hybrid and multi-cloud applications.
  • Complex data structure descriptions: Document databases allow documents to be embedded in other documents to describe nested structures (a structure within a structure) and can tolerate variations in data.
  • Load balancing: MongoDB can run across multiple servers.
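The nested structures mentioned above can be pictured as objects inside objects. The sketch below shows one such document and a small helper that follows a dotted path such as "address.city", the style of path MongoDB queries use to reach embedded fields. The `getPath` helper is hypothetical, written in plain JavaScript, and is not part of any MongoDB API.

```javascript
// A document with an embedded sub-document and an array of sub-documents.
const user = {
  _id: 1,
  name: "John Doe",
  address: { street: "123 Main St", city: "Exampleville" },
  orders: [{ total: 50 }, { total: 20 }]
};

// Hypothetical helper: resolve a dotted path such as "address.city".
// Returns undefined when any step of the path is missing.
function getPath(doc, path) {
  return path
    .split(".")
    .reduce((value, key) => (value == null ? undefined : value[key]), doc);
}

console.log(getPath(user, "address.city"));
console.log(getPath(user, "orders.1.total"));
console.log(getPath(user, "address.zip"));
```

Missing fields resolve to `undefined` rather than raising an error, which mirrors how a document store can tolerate variations between documents.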

Features of MongoDB

Features of MongoDB include the following:
  • Replication: A replica set is a group of MongoDB instances that hold the same data to provide high availability. A replica set has one primary and one or more secondary servers. The primary receives all write operations and, by default, serves reads as well, while each secondary keeps a copy of the primary's data. If the primary fails, a secondary is elected to take its place.
  • Scalability: MongoDB supports vertical and horizontal scaling. Vertical scaling works by adding more power to an existing machine, while horizontal scaling works by adding more machines to a user's resources.
  • Load balancing: MongoDB handles load balancing without the need for a separate, dedicated load balancer, through either vertical or horizontal scaling.
  • Schema-less: MongoDB is a schema-less database, which means the database can manage data without the need for a blueprint.
  • Document: Data in MongoDB is stored in documents with key-value pairs instead of rows and columns, which makes the data more flexible when compared to SQL databases.
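The replication behavior described in the list above can be sketched as a toy model: writes land on the primary, are copied to the secondaries, and a secondary takes over when the primary goes down. The `ReplicaSet` class below is a hypothetical in-memory illustration in plain JavaScript. It does not reproduce MongoDB's actual election protocol or replication mechanism.

```javascript
// Hypothetical in-memory model of a replica set (not MongoDB's election protocol).
class ReplicaSet {
  constructor(names) {
    this.members = names.map((name) => ({ name, data: [], up: true }));
    this.primary = this.members[0];
  }

  // Writes go to the primary and are then copied to every reachable secondary.
  write(doc) {
    if (!this.primary) throw new Error("no primary available");
    this.primary.data.push(doc);
    for (const member of this.members) {
      if (member !== this.primary && member.up) member.data.push(doc);
    }
  }

  // Simulate a primary failure: the first healthy member takes over.
  failPrimary() {
    if (!this.primary) return;
    this.primary.up = false;
    this.primary = this.members.find((member) => member.up) || null;
  }
}

const rs = new ReplicaSet(["node1", "node2", "node3"]);
rs.write({ _id: 1 });
rs.failPrimary();
rs.write({ _id: 2 });
console.log(rs.primary.name, rs.primary.data.length);
```

The new primary already holds the first document because it was replicated before the failure, so the second write continues from a complete copy of the data.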

Advantages of MongoDB

MongoDB offers several potential benefits:
  • Schema-less: Like other NoSQL databases, MongoDB doesn't require predefined schemas. It stores any type of data. This gives users the flexibility to create any number of fields in a document, making it easier to scale MongoDB databases compared to relational databases.
  • Document-oriented: One of the advantages of using documents is that these objects map to native data types in several programming languages. Embedded documents also reduce the need for database joins, which can lower costs.
  • Scalability: A core function of MongoDB is its horizontal scalability, which makes it a useful database for companies running big data applications. In addition, sharding lets the database distribute data across a cluster of machines. MongoDB also supports the creation of zones of data based on a shard key.
  • Third-party support: MongoDB supports several storage engines and provides pluggable storage engine APIs that let third parties develop their own storage engines for MongoDB.
  • Aggregation: The DBMS also has built-in aggregation capabilities, which let users run MapReduce code directly on the database rather than running MapReduce on Hadoop. MongoDB also includes its own file system, GridFS, akin to the Hadoop Distributed File System; it is used primarily for storing files larger than BSON's size limit of 16 MB per document. These similarities let MongoDB be used instead of Hadoop in some cases, though the database software also integrates with Hadoop, Spark and other data processing frameworks.

Disadvantages of MongoDB

Though MongoDB offers valuable benefits, it has some downsides as well.
  • Continuity: With MongoDB's automatic failover strategy, a cluster has just one master node. If the master fails, another node automatically becomes the new master. This switch promises continuity, but it isn't instantaneous; it can take up to a minute. By comparison, the Cassandra NoSQL database supports multiple master nodes. If one master goes down, another is standing by, creating a highly available database infrastructure.
  • Write limits: MongoDB's single master node also limits how fast data can be written to the database. Data writes must be recorded on the master, and writing new information to the database is limited by the capacity of that master node.
  • Data consistency: MongoDB doesn't provide full referential integrity through the use of foreign-key constraints, which could affect data consistency.
  • Security: User authentication isn't enabled by default in MongoDB databases. Attackers have targeted large numbers of unsecured MongoDB systems, which led to the addition of a default setting that blocks networked connections to databases that haven't been configured by a database administrator.
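Because the database does not enforce foreign-key constraints, as the data consistency point above notes, checks of this kind fall to the application. The sketch below shows one possible application-level check in plain JavaScript. The sample data and the `danglingOrders` helper are hypothetical.

```javascript
const users = [{ _id: 1, name: "John Doe" }];
const orders = [
  { _id: 101, user_id: 1, total_amount: 50.0 },
  { _id: 102, user_id: 7, total_amount: 15.0 } // refers to a user that does not exist
];

// Hypothetical application-level check: find orders whose user_id does not
// match any existing user. A relational foreign key would reject such rows
// at write time; here nothing stops them from being stored.
function danglingOrders(orderDocs, userDocs) {
  const knownIds = new Set(userDocs.map((u) => u._id));
  return orderDocs.filter((order) => !knownIds.has(order.user_id));
}

console.log(danglingOrders(orders, users).map((o) => o._id));
```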

MongoDB query language (MQL):

MongoDB uses a query language called MongoDB Query Language (MQL) for interacting with its databases. MQL provides a set of operators and methods for querying, updating, and manipulating data within MongoDB collections. Here are some detailed MQL queries for common operations (assume that our collection name is: collection_db):
  1. Basics of Querying:
    • Find Documents:
      db.collection_db.find({ key: value });
    • Example:
      db.students.find({ grade: "A" });
    • Example (text search; requires a text index on the collection):
      // Search for documents containing the word "MongoDB"
      db.articles.find({ $text: { $search: "MongoDB" } });
    • Example (array query):
      // Find documents where the array "scores" has at least one element greater than 90
      db.students.find({ scores: { $elemMatch: { $gt: 90 } } });
    • Specify Multiple Conditions:
      db.collection_db.find({ key1: value1, key2: value2 });
    • Query with Operators:
      db.collection_db.find({ age: { $gt: 25 } });
  2. Projection:
    • Select Fields to Return:
      db.collection_db.find({}, { key1: 1, key2: 1 });
    • Exclude Fields:
      db.collection_db.find({}, { keyToExclude: 0 });
  3. Sorting:
    • Sort in Ascending Order:
      db.collection_db.find({}).sort({ key: 1 });
    • Sort in Descending Order:
      db.collection_db.find({}).sort({ key: -1 });
  4. Limit and skip:
    • Limit Results:
      db.collection_db.find({}).limit(10);
    • Skip Results:
      db.collection_db.find({}).skip(5);
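Sorting, skipping, and limiting are often combined to page through results. The sketch below reproduces that combination on a plain JavaScript array so the effect of each step is visible. The `page` helper and the sample `students` data are hypothetical, and only numeric sort fields are handled.

```javascript
const students = [
  { name: "Ann", age: 22 }, { name: "Bob", age: 27 }, { name: "Cy", age: 19 },
  { name: "Dee", age: 31 }, { name: "Eli", age: 25 }
];

// Hypothetical helper mirroring find().sort({ [field]: direction }).skip(n).limit(m).
function page(docs, field, direction, skip, limit) {
  return [...docs]                                      // copy, so the input stays untouched
    .sort((a, b) => (a[field] - b[field]) * direction)  // 1 = ascending, -1 = descending
    .slice(skip, skip + limit);                         // skip first, then limit
}

// Oldest first, skip the first result, return the next two.
console.log(page(students, "age", -1, 1, 2).map((s) => s.name));
```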
  5. Update Documents: To update a document in a collection:
    • Update a Single Document:
      db.collection_db.updateOne({ filter }, { $set: { key: value } });
    • Example:
      db.students.updateOne(
        { name: "John Doe" },
        { $set: { age: 28 } }
      );
    • Update Multiple Documents:
      db.collection_db.updateMany(
        { filter },
        { $set: { key: value } }
      );
  6. Delete Documents:
    • Delete a Single Document:
      db.collection_db.deleteOne({ filter });
    • Example:
      db.students.deleteOne({ name: "John Doe" });
    • Delete Multiple Documents:
      db.collection_db.deleteMany({ filter });
  7. Aggregation framework:
    • Aggregate Data:
      db.collection_db.aggregate([
        { $match: { key: value } },
        { $group: { _id: "$key", count: { $sum: 1 } } }
      ]);
    • Example:
      // Calculate the average age of students in each grade
      db.students.aggregate([
        { $group: { _id: "$grade", avgAge: { $avg: "$age" } } },
        { $sort: { avgAge: -1 } }
      ]);
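To show what the grouping and sorting stages compute, the sketch below performs the same calculation by hand on a plain JavaScript array. The sample data and the `averageAgeByGrade` helper are hypothetical. The helper imitates the result shape of the pipeline above but is not how the server executes it.

```javascript
const students = [
  { name: "Ann", grade: "A", age: 20 },
  { name: "Bob", grade: "B", age: 25 },
  { name: "Cy", grade: "A", age: 22 },
  { name: "Dee", grade: "B", age: 27 }
];

// Hypothetical helper: group by grade, average the ages, sort by average descending.
function averageAgeByGrade(docs) {
  const groups = new Map(); // grade -> { sum, count }
  for (const doc of docs) {
    const entry = groups.get(doc.grade) || { sum: 0, count: 0 };
    entry.sum += doc.age;
    entry.count += 1;
    groups.set(doc.grade, entry);
  }
  return [...groups]
    .map(([grade, { sum, count }]) => ({ _id: grade, avgAge: sum / count }))
    .sort((a, b) => b.avgAge - a.avgAge);
}

console.log(averageAgeByGrade(students));
```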
                  
  8. Indexes:
    • Create an Index:
      db.collection_db.createIndex({ key: 1 });
    • Explain Query Execution Plan:
      db.collection_db.find({ key: value }).explain("executionStats");
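An index helps because it keeps keys in sorted order, so a lookup can jump to the matching entries instead of examining every document. The sketch below contrasts a full scan with a binary search over a sorted list of [key, position] pairs. It is a simplified plain-JavaScript illustration with hypothetical names (`scan`, `ageIndex`, `indexLookup`), not MongoDB's index implementation.

```javascript
const docs = [
  { _id: 1, age: 30 }, { _id: 2, age: 22 }, { _id: 3, age: 41 },
  { _id: 4, age: 22 }, { _id: 5, age: 35 }
];

// Without an index: examine every document (a collection scan).
function scan(collection, age) {
  let examined = 0;
  const found = collection.filter((doc) => { examined++; return doc.age === age; });
  return { found, examined };
}

// Hypothetical ascending index on "age": sorted [key, position] entries.
const ageIndex = docs
  .map((doc, position) => [doc.age, position])
  .sort((a, b) => a[0] - b[0]);

// Binary search for the first entry with key >= age, then walk the matches.
function indexLookup(index, collection, age) {
  let low = 0;
  let high = index.length;
  while (low < high) {
    const mid = (low + high) >> 1;
    if (index[mid][0] < age) low = mid + 1; else high = mid;
  }
  const found = [];
  for (let i = low; i < index.length && index[i][0] === age; i++) {
    found.push(collection[index[i][1]]);
  }
  return found;
}

console.log(scan(docs, 22).examined);
console.log(indexLookup(ageIndex, docs, 22).map((d) => d._id));
```

The scan touches all five documents, while the index lookup only reads the entries around the matching key; the `explain("executionStats")` call shown above reports this kind of difference for real queries.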
  9. Insert Document: To insert a document into a collection:
    • Insert a Single Document:
      db.collection_db.insertOne({ key1: value1, key2: value2 });
    • Example:
      db.students.insertOne({ name: "John Doe", age: 25, grade: "A" });

Advanced MQL queries:

Advanced MongoDB Query Language (MQL) queries can involve more complex operations, aggregations, and various stages in the aggregation pipeline. Below are some examples of advanced MQL queries:
  1. Operators: MongoDB operators are used in queries to perform various operations on documents. Some common operators include:
    • Comparison Operators:
      Operator  | Description
      $eq       | Matches values that are equal to a specified value.
      $ne       | Matches values that are not equal to a specified value.
      $gt, $gte | Matches values that are greater than (or equal to) a specified value.
      $lt, $lte | Matches values that are less than (or equal to) a specified value.
    • Logical Operators:
      Operator        | Description
      $and, $or, $not | Perform logical AND, OR, and NOT operations, respectively.
    • Element Operators:
      Operator | Description
      $exists  | Matches documents that have the specified field.
      $type    | Selects documents if a field is of the specified type.
    • Array Operators:
      Operator | Description
      $in      | Matches any of the values specified in an array.
      $all     | Matches arrays that contain all elements specified in the query.
    Example:
                // Find documents where age is greater than 25 and grade is "A"
                db.students.find({ $and: [{ age: { $gt: 25 } }, { grade: "A" }] });          
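To make the operator semantics concrete, the sketch below evaluates a small subset of them against plain JavaScript objects. The `operators` table and the `matches` function are hypothetical and deliberately simplified: they handle only scalar fields and the operators listed in the table, and do not reproduce MongoDB's full matching rules for arrays, types, or nested paths.

```javascript
// Hypothetical evaluators for a few of the operators listed above.
const operators = {
  $eq: (value, arg) => value === arg,
  $ne: (value, arg) => value !== arg,
  $gt: (value, arg) => value > arg,
  $gte: (value, arg) => value >= arg,
  $lt: (value, arg) => value < arg,
  $lte: (value, arg) => value <= arg,
  $in: (value, arg) => arg.includes(value),
  $exists: (value, arg) => (value !== undefined) === arg
};

// Returns true when the document satisfies the filter.
function matches(doc, filter) {
  return Object.entries(filter).every(([key, condition]) => {
    if (key === "$and") return condition.every((sub) => matches(doc, sub));
    if (key === "$or") return condition.some((sub) => matches(doc, sub));
    const value = doc[key];
    if (condition !== null && typeof condition === "object" && !Array.isArray(condition)) {
      // Operator expression such as { $gt: 25 }: every operator must hold.
      return Object.entries(condition).every(([op, arg]) => operators[op](value, arg));
    }
    return value === condition; // plain equality, e.g. { grade: "A" }
  });
}

const students = [
  { name: "Ann", age: 30, grade: "A" },
  { name: "Bob", age: 22, grade: "A" },
  { name: "Cy", age: 28, grade: "B" }
];
const filter = { $and: [{ age: { $gt: 25 } }, { grade: "A" }] };
console.log(students.filter((s) => matches(s, filter)).map((s) => s.name));
```

Listing several fields in one filter object already means AND, so `{ age: { $gt: 25 }, grade: "A" }` selects the same documents as the explicit `$and` form in this sketch.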
              
  2. Cursor Methods: MongoDB provides cursor methods to interact with the result set returned by queries. Common cursor methods include:
    Method             | Description
    limit(n)           | Limits the number of documents returned.
    skip(n)            | Skips the first n documents.
    sort({ field: 1 }) | Sorts the result set in ascending order based on the specified field. Use -1 for descending order.
    count()            | Returns the count of documents in the result set.
    forEach(fn)        | Applies a JavaScript function to each document in the result set.
    Example:
    
                // Sort by age in descending order, skip the first 5 documents, then return at most 10.
                // The sort is applied first, then skip, then limit, regardless of the order in which the methods are chained.
                db.students.find().limit(10).skip(5).sort({ age: -1 });
              
  3. Projections: Projections in MongoDB control which fields are returned in the query result. A projection is passed as the second argument of the find method or, where the shell or driver provides one, through a projection method on the cursor.
    • Include Fields:
      // Only include the name and age fields
      db.students.find({}, { name: 1, age: 1, _id: 0 });
    • Exclude Fields:
      // Exclude the grade field
      db.students.find({}, { grade: 0 });
      Projections help in reducing the amount of data sent over the network and can improve query performance.
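The include and exclude forms of a projection can be mimicked on a single plain JavaScript object, as in the sketch below. The `project` function is a hypothetical helper with simplified rules: a specification containing any 1 is treated as an inclusion (with `_id` kept unless it is set to 0), and anything else is treated as an exclusion. Nested fields and projection operators are not handled.

```javascript
// Hypothetical helper that applies a projection spec to one document.
function project(doc, spec) {
  const isInclusion = Object.values(spec).some((flag) => flag === 1);
  const result = {};
  if (isInclusion) {
    // _id is returned by default unless it is explicitly excluded.
    if (spec._id !== 0 && "_id" in doc) result._id = doc._id;
    for (const key of Object.keys(spec)) {
      if (key !== "_id" && spec[key] === 1 && key in doc) result[key] = doc[key];
    }
  } else {
    // Exclusion: copy everything except the fields marked with 0.
    for (const key of Object.keys(doc)) {
      if (spec[key] !== 0) result[key] = doc[key];
    }
  }
  return result;
}

const student = { _id: 1, name: "John Doe", age: 25, grade: "A" };
console.log(project(student, { name: 1, age: 1, _id: 0 }));
console.log(project(student, { grade: 0 }));
```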

CRUD operations

In MongoDB, CRUD stands for Create, Read, Update, and Delete – the basic operations that can be performed on documents in a collection. Here's a brief overview of each CRUD operation in the context of MongoDB:
  1. Create (C): To insert a new document into a collection, you use the insertOne() or insertMany() method.
    
                db.collection.insertOne({ name: "John Doe", age: 30, city: "Example City" });
              
  2. Read (R): To retrieve documents from a collection, you use the find() method. You can specify criteria for filtering results.
    
                db.collection.find({ age: { $gt: 25 } });
              
  3. Update (U): To modify existing documents, you use the updateOne() or updateMany() method. You can specify criteria for matching and define the changes using update operators.
    
                db.collection.updateOne({ name: "John Doe" }, { $set: { age: 31 } });
              
  4. Delete (D): To remove documents from a collection, you use the deleteOne() or deleteMany() method. You specify criteria for matching documents to be deleted.
    
                db.collection.deleteOne({ name: "John Doe" });
              
