Wednesday, June 3, 2020
Evaluation of NoSQL Databases for Data Analysis - 275 Words
Evaluation of NoSQL Databases for Data Analysis (Dissertation Sample)

EVALUATION OF NoSQL DATABASES FOR DATA ANALYSIS
(Name) (Course) (Tutor) School (University) Date

DECLARATION
By submitting this work, I declare that this work is entirely my own except those parts duly identified and referenced in my submission. It complies with any specified word limits and the requirements and regulations detailed in the assessment instructions and any other relevant programme and module documentation. In submitting this work I acknowledge that I have read and understood the regulations and code regarding academic misconduct, including that relating to plagiarism, as specified in the Programme Handbook. I also acknowledge that this work will be subject to a variety of checks for academic misconduct.

ABSTRACT
The research sought to evaluate the different types of NoSQL databases for data analysis. SQL databases have long been used for data storage and retrieval, yet they are encumbered by problems with unstructured data and scalability. The paper addresses the scalability issue of relational databases and how non-relational databases solved it. It shows how non-relational databases developed a new set of data management features that support data analytics and overcome the challenges of relational databases. It also sets out the advantages of adopting the various database systems. Concentrating on the four classes of NoSQL databases, with one example of each, the paper compares the databases on data model, handling of relational information, aggregation and querying, in order to determine which database is best suited to analytical tasks on unstructured data.
TABLE OF CONTENTS
DECLARATION
ABSTRACT
1.0 INTRODUCTION
1.1 Problem Statement
1.2 Research Aims and Objectives
1.3 Structure of Report
2.0 LITERATURE REVIEW
2.1 Development and Evolution of NoSQL Databases
2.2 Data Analysis using NoSQL Databases
2.3 Models of Distribution
2.4 CAP Theorem
2.5 Relational (SQL) v Non-relational Databases (NoSQL)
2.6 Types of NoSQL databases
2.7 Preview of Some NoSQL Solutions
3.0 RESEARCH METHODOLOGY
3.1 Introduction
3.2 Research Methodology
3.3 Research Design
3.4 Data Collection Techniques
3.5 Sampling Techniques
4.0 RESULT PRESENTATION, ANALYSIS AND DISCUSSION
4.1 Introduction
4.2 Result Presentation
4.2.1 Key-value Stores (REDIS as the Case Study)
4.2.2 Column-based Database (Cassandra as the Case Study)
4.2.3 Document-based Database (MongoDB as the Case Study)
4.2.4 Graph-based Database (Neo4J as the Case Study)
4.3.5 Performance, Availability and Scalability of Key-value Store Databases (REDIS)
4.3.6 Performance, Availability and Scalability of Column Family Databases (Cassandra)
4.3.7 Performance, Availability and Scalability of Document Store Databases (MongoDB)
4.3.8 Performance, Availability and Scalability of Graph-Based Databases (Neo4J)
5.0 CONCLUSION AND RECOMMENDATIONS
5.1 General Overview
5.2 Limitations of the Study
5.3 Recommendations for Action
5.4 Recommendations for Further Research
REFERENCES
APPENDIX A
APPENDIX B
APPENDIX C
APPENDIX D
APPENDIX E

1.0 INTRODUCTION
NoSQL databases are a wide variety of non-relational database
management systems that were developed to cope with the volume of data stored for users, the frequency of data access, performance, and the need to analyse and process such data to arrive at logical conclusions. NoSQL databases are normally preferred when the volume of data is too large to be handled by relational databases. They are normally distributed and process data in parallel across a large number of servers. NoSQL databases were developed by companies that encountered problems handling large amounts of data while performing data analytics, whether predictive or for deriving conclusions. Some of the companies that came up with the idea are industry leaders such as Google and Facebook.

SQL queries are often used to retrieve data quickly and efficiently in a manner that follows well-defined standards. Because standard SQL requires little substantial coding, SQL databases are easier to manage. According to Bhatnagar (2008), however, it is hard to create an interface for SQL databases and to expand them to accommodate large volumes of incoming data. NoSQL databases provide high performance and low latency. The major difference between relational databases and NoSQL databases, however, is the lack of an explicit data schema. NoSQL databases have a non-relational, dynamic, schema-less design that supports examining and analysing raw data or datasets for the purpose of drawing conclusions. NoSQL databases rarely require schemas, and where one is needed, they deduce it from the data already stored. Consequently, the development and use of NoSQL databases, which offer increased replication and scalability, have grown. They were developed to address the shortcomings of relational databases and to support a more effective data management system.
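The idea of deducing a schema from data that is already stored can be illustrated with a minimal sketch. It assumes documents are plain Python dictionaries; the function name `infer_schema` and the sample records are hypothetical and do not reflect how any particular NoSQL product implements this.

```python
from collections import defaultdict


def infer_schema(documents):
    """Collect, for every field name, the set of value types seen so far.

    Hypothetical helper for illustration only: real systems use their
    own, more elaborate mechanisms.
    """
    seen = defaultdict(set)
    for doc in documents:
        for field, value in doc.items():
            seen[field].add(type(value).__name__)
    # Sort the type names so the result is deterministic.
    return {field: sorted(types) for field, types in seen.items()}


# Two records with different shapes, as a schema-less store would allow.
documents = [
    {"user": "alice", "age": 30},
    {"user": "bob", "age": "unknown", "email": "bob@example.com"},
]

schema = infer_schema(documents)
print(schema)
```

No schema is declared before the records are written; the field list and the mixed types for `age` are discovered only by reading what was stored.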
The features of NoSQL databases allow for effective data analytics since, unlike relational databases, NoSQL databases are not affected by the big data problem that arises when standard SQL operations do not deliver acceptable transaction performance (Edlich, 2015). Because large volumes of data flow in daily, data analytics is useful for determining what actually matters to business growth and what does not. It unearths patterns that cannot be deciphered easily. It also shows correlations used to make very important decisions (Croll and Yoskovitz, 2013). Data scientists are in a position to analyse large amounts of data depending on the databases in which the data are stored and the accessibility of those databases. The outcome of the analytics shows which data are essential to the organisation's progress and which are not.

There are a large number of NoSQL databases (around 170), and they are classified into different classes or models, including key-value stores, document stores, column family stores and graph-based databases. NoSQL databases at times incorporate several models at once. The key-value store, which is the simplest data model, is a persistent, distributed associative array. Also known as a K-V store, it has a key that is an identifier for a value. It may be used to share data across applications, for example when storing session data for a specific user (Edlich, 2015). The document store normally stores semi-structured object data, with the JSON format representing the metadata and objects. The column-oriented store is an advanced key-value store that organises data in their own individual column...
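The key-value and document models described above can be sketched in a few lines. This is an in-memory illustration only, assuming a single process and no persistence or distribution; the class `KeyValueStore`, the key `session:42` and the session contents are hypothetical.

```python
import json


class KeyValueStore:
    """Minimal associative array: each key identifies exactly one value."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        # Removing a missing key is not treated as an error.
        self._data.pop(key, None)


sessions = KeyValueStore()

# The value is a semi-structured JSON document; the store treats it as opaque.
sessions.put("session:42", json.dumps({"user": "alice", "cart": ["book", "pen"]}))

# Any application that knows the key can read the same session back.
session = json.loads(sessions.get("session:42"))
print(session["user"], session["cart"])
```

The store itself only maps keys to values; the structure of the session lives inside the JSON document, which is the distinction between the key-value and document models drawn in the text.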