Wednesday, June 3, 2020

Evaluation of NoSQL Databases for Data Analysis - 275 Words

Evaluation of NoSQL Databases for Data Analysis (Dissertation Sample)

EVALUATION OF NoSQL DATABASES FOR DATA ANALYSIS
(Name) (Course) (Tutor) School (University) Date

DECLARATION
By submitting this work, I declare that this work is entirely my own except those parts duly identified and referenced in my submission. It complies with any specified word limits and the requirements and regulations detailed in the assessment instructions and any other relevant programme and module documentation. In submitting this work I acknowledge that I have read and understood the regulations and code regarding academic misconduct, including that relating to plagiarism, as specified in the Programme Handbook. I also acknowledge that this work will be subject to a variety of checks for academic misconduct.

ABSTRACT
The research sought to evaluate the different types of NoSQL databases for data analysis. SQL databases have long been used for data storage and retrieval, yet they are encumbered by the problems of unstructured data and scalability. The research paper deals with the scalability issue of relational databases and how non-relational databases solved it. It shows how non-relational databases developed a new set of data management features that support data analytics and overcome the challenges of relational databases. It also sets out the advantages of adopting various database systems. Concentrating on the four classifications of NoSQL databases, with one example of each, the research paper compares the databases on the grounds of data model, handling of relational information, aggregation tasks and querying, in order to determine which database is best suited to data analytics and to analytical tasks on unstructured data.
TABLE OF CONTENTS
DECLARATION
ABSTRACT
1.0 INTRODUCTION
1.1 Problem Statement
1.2 Research Aims and Objectives
1.3 Structure of Report
2.0 LITERATURE REVIEW
2.1 Development and Evolution of NoSQL Databases
2.2 Data Analysis using NoSQL Databases
2.3 Models of Distribution
2.4 CAP Theorem
2.5 Relational (SQL) v Non-relational Databases (NoSQL)
2.6 Types of NoSQL databases
2.7 Preview of Some NoSQL Solutions
3.0 RESEARCH METHODOLOGY
3.1 Introduction
3.2 Research Methodology
3.3 Research Design
3.4 Data Collection Techniques
3.5 Sampling Techniques
4.0 RESULT PRESENTATION, ANALYSIS AND DISCUSSION
4.1 Introduction
4.2 Result Presentation
4.2.1 Key-value Stores (REDIS as the Case Study)
4.2.2 Column-based Database (Cassandra as the Case Study)
4.2.3 Document-based Database (MongoDB as the Case Study)
4.2.4 Graph-based Database (Neo4J as the Case Study)
4.3.5 Performance, Availability and Scalability of Key-value Store Databases (REDIS)
4.3.6 Performance, Availability and Scalability of Column Family Databases (Cassandra)
4.3.7 Performance, Availability and Scalability of Document Store Databases (MongoDB)
4.3.8 Performance, Availability and Scalability of Graph-Based Databases (Neo4J)
5.0 CONCLUSION AND RECOMMENDATIONS
5.1 General Overview
5.2 Limitations of the Study
5.3 Recommendations for Action
5.4 Recommendations for Further Research
REFERENCES
APPENDIX A
APPENDIX B
APPENDIX C
APPENDIX D
APPENDIX E

1.0 INTRODUCTION
NoSQL databases are a wide variety of
non-relational database management systems that were developed to cater for issues such as the volume of data stored for users, the frequency of data access, performance, and the needs that arise when analysing and processing such data to arrive at logical conclusions. NoSQL databases are normally preferred when the volume of data is large and cannot be handled using relational databases. They are normally distributed and process data in parallel across a large number of servers. NoSQL databases were invented by companies that encountered problems dealing with large amounts of data while performing data analytics, either predictively or for deriving conclusions. Some of the companies that came up with the idea are industry leaders such as Google and Facebook.

SQL queries are often used to retrieve data in a fast and efficient manner that embraces well-defined standards. Because standard SQL requires little substantial coding, SQL databases are easier to manage. According to Bhatnagar (2008), however, it is hard to create an interface for SQL databases and to expand them to accommodate large volumes of incoming data. NoSQL databases provide high performance and low latency. The major difference between relational databases and NoSQL databases, however, is the lack of an explicit data schema. NoSQL databases have a non-relational, dynamic, schema-less design that supports examining and analysing raw data or datasets for the purposes of drawing conclusions. NoSQL databases rarely require schemas and, in case one is needed, they deduce the schema from already stored data. Consequently, there has been more development and increased use of NoSQL databases that offer increased replication and scalability. These were developed to address the shortcomings of relational databases and to support better data management.
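The idea of deducing a schema from already stored data can be illustrated with a minimal sketch. It assumes records are plain Python dictionaries; the `infer_schema` helper and the sample records are hypothetical and do not reflect how any particular NoSQL product works internally.

```python
from collections import defaultdict


def infer_schema(documents):
    """Map each field name to the sorted list of value type names seen for it."""
    seen = defaultdict(set)
    for doc in documents:
        for field, value in doc.items():
            seen[field].add(type(value).__name__)
    return {field: sorted(types) for field, types in seen.items()}


# Hypothetical schema-less records: fields and value types vary per record.
records = [
    {"user": "alice", "age": 31},
    {"user": "bob", "email": "bob@example.com"},
    {"user": "carol", "age": "unknown"},
]

schema = infer_schema(records)
print(schema)
```

No schema is declared before the records are written; the structure is recovered afterwards, and a field such as `age` may legitimately carry more than one type.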
The features of NoSQL databases allow for effective data analytics since, unlike relational databases, NoSQL databases are not affected by the big data problem that arises when standard SQL operations do not achieve acceptable transaction performance (Edlich, 2015). Given the large volumes of data flowing in on a daily basis, data analytics helps determine what is actually important to business growth and what is not. It unearths patterns that cannot be deciphered easily. It also shows correlations used to make very important decisions (Croll and Yoskovitz, 2013). Data scientists are in a position to analyse large amounts of data depending on the databases they are stored in and the accessibility of those databases. The outcome of the analytics shows which data are essential to the organisation's progress and which are not.

There are a large number of NoSQL databases (around 170), and they are classified into different classes or models, including key-value stores, document stores, column family stores and graph-based databases. NoSQL databases at times incorporate several models at once. The key-value store, which is the simplest data model, is a distributed, persistent associative array. Also known as a K-V store, it has a key that is an identifier for a value. It may be used to share data across applications, for example when storing session data for a specific user (Edlich, 2015). The document store normally stores semi-structured object data, with the JSON format representing the metadata and objects. The column-oriented store is an advanced key-value store that organises data in their own individual column...
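The key-value model and its session-data use case can be sketched as follows. This is a toy, in-memory illustration only: the `KeyValueStore` class, the `session:42` key and the session contents are hypothetical, and the sketch is neither distributed nor persistent, nor the API of REDIS or any other product.

```python
import json


class KeyValueStore:
    """Toy associative array: each key identifies one opaque string value."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        # True if the key existed and was removed, False otherwise.
        return self._data.pop(key, None) is not None


store = KeyValueStore()

# Hypothetical session data for one user, serialised as a JSON document.
session = {"user_id": 42, "cart": ["sku-1", "sku-2"]}
store.put("session:42", json.dumps(session))

# Another application sharing the store only needs the key to read it back.
restored = json.loads(store.get("session:42"))
print(restored)
```

The store never inspects the value; it only maps a key to it. Serialising the session as JSON also hints at the document model, where the stored object carries its own structure.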
