Answer To: Step 1 Please explain in a paragraph what is a No SQL database specifically Mongo DB and Neo4J u...
Neha answered on Jun 25 2021
Step 1
NoSQL stands for "not only SQL": these databases store data in a non-tabular format. They are classified by data model, and the major types are wide-column, document, key-value, and graph databases. They provide flexible schemas, scale out easily to large amounts of data, and handle high user loads. When people use the term NoSQL database, they typically mean a non-relational database (Macak, M., Stovcik, M., Buhnova, B., & Merjavy, M.). MongoDB and Neo4j are two types of NoSQL databases.
MongoDB is an open-source document database and a leading NoSQL database. It is written in C++, is cross-platform and document-oriented, and is known for high performance, easy scalability, and high availability. It works on the concepts of documents and collections: a single collection holds many documents, and each document can differ from the others in its size, number of fields, and content.
A document has a clear structure for a single object and does not involve complex joins, yet MongoDB still provides deep query ability. It supports dynamic queries over documents through a document-based query language that is nearly as powerful as SQL. It is easy to scale, and it does not need mapping or conversion between application objects and database objects. It uses internal memory for storing the working set, which enables faster access to the data.
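The flexible-schema and dynamic-query ideas above can be sketched without a running server. The following is a minimal, illustrative sketch using plain Python dicts in place of MongoDB documents; `find_matching` is a hypothetical helper that mimics the exact-match behaviour of a MongoDB `find({...})` filter, not part of any real driver.

```python
# One "collection" can hold documents with different fields and sizes.
collection = [
    {"_id": 1, "name": "Alice", "age": 30, "tags": ["admin"]},
    {"_id": 2, "name": "Bob", "city": "Pune"},   # no "age" field at all
    {"_id": 3, "name": "Carol", "age": 25},
]

def find_matching(docs, query):
    """Return documents whose fields equal every key/value in `query`,
    mimicking a MongoDB exact-match filter (illustrative only)."""
    return [d for d in docs if all(d.get(k) == v for k, v in query.items())]

print(find_matching(collection, {"age": 25}))
# -> [{'_id': 3, 'name': 'Carol', 'age': 25}]
```

Note that the query touches only one collection and needs no join; in real MongoDB the same shape of filter document is passed to `find()`.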
Another type of NoSQL database is Neo4j, a popular graph database. It uses the Cypher query language and is written in Java. A graph is a pictorial representation of a set of objects in which pairs of objects are connected by links. It is composed of two major elements: nodes (vertices) and edges (relationships).
A graph database models data in the form of a graph: the nodes represent entities, and the relationships represent the associations between those nodes. It offers a flexible yet powerful data model that can be modified to suit the industry or application. It returns results based on real-time data and remains highly available for large enterprise real-time applications, with proper transactional guarantees.
With this database we can represent semi-structured, connected data, and we can also retrieve connected data faster than with other databases. It provides a declarative query language for expressing graphs visually; its commands are in a human-readable format and easy to learn. It needs no complex joins to retrieve data: adjacent nodes and relationship details can be fetched by following edges directly, without indexes or joins.
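The point about reaching connected data without joins can be sketched as follows. This is a toy, pure-Python model of the graph idea (an adjacency list standing in for Neo4j's stored nodes and relationships); `neighbours` is a hypothetical helper, not Neo4j's API.

```python
# Adjacency list: node -> list of (relationship type, target node).
graph = {
    "Alice": [("KNOWS", "Bob"), ("WORKS_AT", "Acme")],
    "Bob":   [("KNOWS", "Carol")],
    "Carol": [],
    "Acme":  [],
}

def neighbours(node, rel=None):
    """Follow the outgoing relationships of `node`, optionally filtered
    by relationship type. The edge is followed directly, with no join."""
    return [t for (r, t) in graph.get(node, []) if rel is None or r == rel]

print(neighbours("Alice", "KNOWS"))   # -> ['Bob']
```

In Cypher the equivalent pattern would be written declaratively, e.g. `MATCH (a {name: 'Alice'})-[:KNOWS]->(b) RETURN b`.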
Step 2
Hadoop was started in 2006 at Yahoo and became a top-level Apache open-source project. Its general purpose is distributed processing, and it has several components: the Hadoop Distributed File System (HDFS), which stores files in Hadoop's native format and parallelizes them across the cluster; YARN, the scheduler that coordinates application runtimes; and MapReduce, the algorithm for processing the data in parallel.
It is built in Java and is accessible through different programming languages: we can use almost any language to write MapReduce code. It is available as open source or through vendors.
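The MapReduce idea named above can be sketched in a few lines. This is a single-process, illustrative word count, not Hadoop's actual Java API: the map step emits (key, 1) pairs, and the reduce step (with the shuffle folded in for brevity) sums each key's pairs.

```python
from collections import defaultdict

def map_step(line):
    """Map: emit a (word, 1) pair for every word in the input line."""
    return [(word, 1) for word in line.split()]

def reduce_step(pairs):
    """Shuffle + reduce: group pairs by key and sum the counts."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["big data big cluster", "data data"]
pairs = [p for line in lines for p in map_step(line)]
print(reduce_step(pairs))   # -> {'big': 2, 'data': 3, 'cluster': 1}
```

In a real cluster, the map calls run on different machines near their data blocks, and the framework handles the shuffle between map and reduce.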
Spark is a newer project, started in 2012. It is also a top-level Apache project focused on processing data in parallel across a cluster, but the major difference is that it works in memory. Hadoop reads and writes files to HDFS, whereas Spark processes data in RAM using a concept known as the resilient distributed dataset (RDD).
It can run in stand-alone mode, in conjunction with Mesos, or with a Hadoop cluster serving as the data source. Spark is built around Spark Core, the engine that drives optimization, scheduling, and abstraction; Spark does not include its own distributed file system and is typically paired with one such as HDFS. Different libraries operate on top of Spark Core and allow us to run SQL-like commands on distributed datasets (Samadi, Y., Zbakh, M., & Tadonki, C.).
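The in-memory RDD idea can be sketched with a toy class. `ToyRDD` below is a hypothetical, single-process illustration, not Spark's API: transformations such as `map` and `filter` are chained lazily over in-memory data, and the chain only executes when an action such as `collect()` is called.

```python
class ToyRDD:
    def __init__(self, data):
        self._data = data          # held in memory, like an RDD partition

    def map(self, fn):
        """Transformation: lazy, returns a new ToyRDD without computing."""
        return ToyRDD(fn(x) for x in self._data)

    def filter(self, pred):
        """Transformation: lazy, keeps elements where pred(x) is true."""
        return ToyRDD(x for x in self._data if pred(x))

    def collect(self):
        """Action: actually drives the whole chain and returns a list."""
        return list(self._data)

result = ToyRDD(range(5)).map(lambda x: x * x).filter(lambda x: x > 4).collect()
print(result)   # -> [9, 16]
```

In real Spark, each partition of the RDD lives on a different worker, and the same lazy transformation/action split lets the engine optimize the whole chain before running it.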
Hadoop can be used when we want linear processing over a huge data set. Hadoop MapReduce helps us perform parallel processing on huge amounts of data, as it will break the large...