This exercise is for PhD coursework. Please review the Word document and let me know if you will be able to do it.

Thanks






This week, you will begin the planning phase and conduct online research to investigate the requirements for setting up a data science technology stack and create a PowerPoint presentation that highlights your findings. Specifically, make sure the following areas are discussed in your presentation:

· Available data warehousing and storage technologies
· Tools used for Extraction, Transformation, and Loading (ETL)
· Technologies that support Business Intelligence
· Visualization tools
· Machine Learning and Analytics Implementation Frameworks
· Deployment Stack (architectures)
· Common data science technology use case scenarios

Discuss the implications of a poorly designed and managed data science technology stack on data science democratization initiatives and planning.

Length: 12-15 PowerPoint slides with notes (200-350 words per slide), not including title and references slides.

References: Include a minimum of 5 scholarly references (be sure that at least two of the five are peer-reviewed research studies involving data science technology stack planning and data democratization from the school library to support your ideas).

NB: "…two of the five are peer-reviewed research studies involving data science technology stack planning and data democratization from the school library…." I have downloaded 2 peer-reviewed research articles from the school library; they are the attached PDF files (school_library_post_1.pdf and school_library_post_2.pdf). Please use them for citation in place of the 2 required peer-reviewed research articles.

International Journal of Sport Communication, 2019, 12, 313–335
https://doi.org/10.1123/ijsc.2019-0051
© 2019 Human Kinetics, Inc.

SCHOLARLY COMMENTARY

The Blockchain Phenomenon: Conceptualizing Decentralized Networks and the Value Proposition to the Sport Industry

Michael L.
Naraine, Deakin University, Australia

Abstract: The sport industry has experienced significant technological change in its environment with the recent rise of Bitcoin and its underlying foundation, blockchain. Accordingly, the purpose of this paper is to introduce and conceptually ground blockchain in sport and discuss the implications and value proposition of blockchain to the sport industry. After a brief overview of blockchain and the technology stack, the mechanism is conceptually rooted in the network paradigm, a framework already known to the academic sport community. This treatment argues that the decentralized, closed, and dense mesh network produced by blockchain technology is beneficial to the sport industry. Notably, the article identifies blockchain's capacity to facilitate new sources of revenue and improve data management and suggests that sport management and communication consider the value of blockchain and the technology stack as the digital footprint in the industry intensifies and becomes increasingly complex.

Keywords: Bitcoin, cryptocurrency, network theory, technology

On December 17, 2017, the price of Bitcoin (BTC) reached an all-time high, with 1 unit valued at roughly $19,783 U.S. (Morris, 2017). The digital medium of exchange (also known as a cryptocurrency) had been originally conceived by a person or group working under the pseudonym Satoshi Nakamoto in 2008 and experienced a high degree of volatility since its inception (Maurer, Nelms, & Swartz, 2013). In fact, during its infancy, large amounts of BTC were exchanged for "low-value" goods and services including pizzas and music albums and even facilitated the online transactions of illegal drugs, weapons, and pornographic material, much to the dismay of authorities in several jurisdictions.
Despite these seedy beginnings, BTC matured and has experienced an increase in its value, drawing the notice of larger businesses that began accepting the currency as a form of payment, including casinos, e-commerce sites, and even NCAA Division I bowl games (Casey, 2014). Although the 2017 holiday shopping season might have helped BTC reach its highest point, it is difficult to challenge its growth trajectory, with year-to-year increases upward of 2,000%. BTC is just one of several cryptocurrencies in the marketplace, however. Names such as Ethereum, Litecoin, Dash, and Ripple represent a small sample of virtual currencies offering a digital alternative to traditional forms of value-exchange mediums (e.g., cash, credit). The expanse and popularity of these currencies is partly attributable to the notion that these "moneys" are not regulated by any government or central authority and thus cannot be manipulated by political will (Middlebrook & Hughes, 2014). Nonetheless, the infatuation with the value of cryptocurrencies overshadows key elements of the overall challenge to the traditional value-exchange paradigm, especially the notion of blockchain and decentralized networks. While Nakamoto's (2008) advancement of BTC as a cryptocurrency is certainly novel, the underlying support mechanism known as blockchain is incredibly nuanced and has led to other decentralized movements including ride-sharing, microblogging, file-storage, and crowdfunding sites, to name a few (Swan, 2015).

[Author note: The author is with the Dept. of Management, Faculty of Business and Law, Deakin University, Burwood, VIC, Australia. Address correspondence to [email protected].]
To wit, more business domains have changed their models to incorporate blockchain (Tapscott & Tapscott, 2016, 2017), signaling that while BTC and cryptocurrencies may be "fadlike," there is a greater understanding required of the underlying technology. Specifically, conceptualizing blockchain technology and understanding its impact on the sport industry has not yet occurred. This omission can be explained on two fronts. First, sport organizations tend to maintain an inert state and often resist technological changes due to a knowledge gap and a lack of understanding of the new technology's impact (Slack & Parent, 2006). This sentiment is exemplified by the recent surge of social media and sport literature in the field (Filo, Lock, & Karg, 2015); although social media has experienced significant growth over the past decade, sport management via sport communication is only now realizing the impact of this medium on sport stakeholders such as professional teams (e.g., Achen, Kaczorowski, Horsmann, & Ketzler, 2018), brands (e.g., Geurin & Burch, 2017), and governing bodies (e.g., Naraine & Parent, 2016a, 2016b). Sport organizations do make incremental, evolutionary changes such as the virtual assistant referee in global football and vehicle innovations in Formula One racing, but these are not radical changes to the operational landscape that Slack and Parent have described. As such, the sport management and communication academe is generally slow to react and understand new, radical change innovations.

Second, the lack of discussion about blockchain in sport management and communication can be attributed to its computational complexity. Maintaining the social media analogy, the premise behind social networking sites like Facebook and Twitter has been simplified in sport: Users can interact and share with other users asynchronously and synchronously (Naraine & Karg, 2019).
In this sense, conceptualizing social media does not require an in-depth understanding of its algorithms and various formulae. Conversely, blockchain has yet to be tailored for a sport management and communication audience and, given its association with BTC (Maurer et al., 2013), there is the potential to be overwhelmed with its mathematical association. Although this technology has yet to be examined, its presence in the sport industry is growing. Heitner (2018) reported that more sport enterprises are looking to blockchain technology to expand their customer base globally in a safe, protected environment, while Martínez (2017) pointed to highly visible professional athletes like Stephen Curry and Jeremy Lin as investors in the blockchain sector. In addition, associated industries such as tourism have also turned to blockchain for growth and development (Kwok & Koh, 2018). With these trends, blockchain technology is poised to become more pervasive and join social media as a disruptive force in our industry, altering the present understanding of how sport organizations generate revenue, store data, and generally digitize the business management environment (Pegoraro, 2014). Moreover, Ratten and Ferreira (2016) argued that maintaining the sport industry's upward growth trajectory is predicated on seeking out and adopting innovative approaches. Thus, while sport managers might hesitate to embrace this new technology, understanding its disruptive, innovative nature can help advance the industry further. As such, the purpose of this treatment is twofold: to introduce and conceptually ground blockchain in sport and to discuss the implications and value proposition of blockchain to the sport industry.
In order for scholars and practitioners to assess the importance and impact of blockchain technology, it is imperative that they develop an understanding of what blockchain entails and how it differs from the existing, accepted paradigm (Charitou & Markides, 2003). To help with this aim, I begin this treatment with an examination of the blockchain system, its characteristics, and, ultimately, its relationship to cryptocurrencies like BTC. After this initial discussion, the merits of blockchain's decentralized network system are revealed, juxtaposed to the extant network paradigm literature in sport, which champions centralization. Finally, the possible applications of blockchain in the industry are discussed, with a notable emphasis on alternative revenue-generation schemes (e.g., sport-based tokens) and data-storage repositories.

Blockchain and Technology Stacks

Blockchain Definition

To understand how blockchain could be useful and applied in the sport industry, it is prudent to define the concept and explain its functions. Blockchain is a decentralized and transparent recording system; simply put, it is a set of blocks of information strung together by various transactions over a peer-to-peer network (Zhao, Fan, & Yan, 2016). This definition might still be too abstract for the sport management and communication audience, so further unpacking is required. The blockchain process is enacted when a transaction is requested (see Figure 1). That transaction is generally a financial activity—as it is in the case of BTC and other cryptocurrencies—but could consist of some other action (e.g., a product moving through the supply chain, data being stored to a database), although the value-exchange paradigm is a useful, straightforward example of the process. Once a transaction is requested, a message is broadcast over a network of peers, a network of users connected through their phones, tablets, and computers.
These peers begin to validate the transaction request based on specified criteria (i.e., a series of algorithms)—the technical, mathematical computing that is more relevant to computer science and less relevant to this discussion. Once the request meets those criteria and is validated, a piece of a block is created (i.e., known in BTC terminology as a hash).

[Figure 1 — The blockchain process.]

This piece is combined with others to form a whole block. A whole block is then placed alongside others to form a string of blocks known as a ledger (Tapscott & Tapscott, 2016). Ledgers are transparent but unalterable, removing the risk of manipulation by a user. Once a ledger has been created, the entire process ceases, with the initial transaction request completed, too. To a casual observer, this process may seem convoluted and unnecessary, but its dynamics are what makes it most valuable. In the traditional (but modern) value-exchange paradigm, one party could transmit money to another party for a product or service. This requires a centralized actor, such as a central bank, operating as the conduit between the two parties to ensure the validity of the transaction (i.e., that the purchasing party is using legal means). However, modern transactions also have multiple actors between two parties, including retail banks, credit unions, electronic-transfer facilitators (e.g., Visa, Mastercard), and payment-infrastructure merchants both traditional (e.g., Interac, Moneris) and emergent (e.g., PayTM, Alipay). As trusted intermediaries, these actors help determine whether the purchaser has the funds and ability to expend those funds for goods and services.
Despite their trustworthiness, however, there remain three important considerations with these actors: time, cost, and security. When a buyer purchases a good or service, that transaction has several layers to confirm the authenticity and transfer of funds. Most purchases conducted using a credit card require several days to settle and confirm. There is also a cost factor; as for-profit institutions, these actors require revenue, so various fees are applied at various stages of the transaction. Finally, modern transactions can be susceptible to what is known as the double-spend problem (Maurer et al., 2013). This problem arises if a buyer attempts to make a fraudulent digital exchange and spend the same funds twice. For instance, while shopping online, a user decides to use a digital currency to pay for a good from two distinctive retailers. Because the currency is digital, there is the potential to use the same amount for both transactions, without either retailer knowing about the fraudulent behavior. While most online retailers have systems to detect fraudulent transactions on their backends (one of the reasons they are deemed trustworthy) and only accept payment from electronic-transfer facilitators, it is still plausible that a user could "hack" the system and attempt two transactions with the same set of funds simultaneously. The need for a secure backend also speaks to the costs associated with handling transactions in the modern environment. Thus, these three considerations underscore the value of blockchain—its intricacies facilitate quick, cost-effective, and safer transactions. To emphasize this point, let's consider BTC. With blockchain technologies, the time required to initiate and complete a transaction is
Answered 4 days after Mar 16, 2023


Shubham answered on Mar 21 2023
SETTING UP A DATA SCIENCE TECHNOLOGY STACK
Available data warehousing and storage technologies
A data warehouse is a key component of a data science technology stack: a large, centralized repository of structured, historical data drawn from various sources and specifically designed to support business intelligence, reporting, and analytics activities. It gives data scientists, analysts, and other stakeholders easy access to organized data and transforms that data into a format useful for analysis and decision-making.
Amazon Redshift: a fully managed, petabyte-scale data warehouse service that offers high performance and scalability. It integrates well with other Amazon Web Services (AWS) products and offers a range of security features (Naraine, 2019).
Google BigQuery: a serverless, cloud-based data warehouse service that can process large amounts of data quickly and efficiently. It integrates well with other Google Cloud Platform services and offers a range of security features.
Snowflake: a cloud-based data warehousing platform that offers high performance and scalability, as well as a range of security features. It integrates well with other cloud-based services and supports multiple data sources.
Microsoft Azure Synapse Analytics: a cloud-based analytics service that offers both data warehousing and big data analytics capabilities. It integrates well with other Microsoft Azure services and offers a range of security features.
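To make the query side of these platforms concrete, the sketch below builds a tiny star schema in SQLite, standing in for a cloud warehouse such as Redshift, BigQuery, or Snowflake (the table and column names are invented for the example), and runs the kind of join-and-aggregate query an analyst would issue against any of them:

```python
import sqlite3

# In-memory SQLite stands in for a cloud warehouse; the schema mirrors a
# minimal star layout: one fact table plus one dimension table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales  (product_id INTEGER, amount REAL);
INSERT INTO dim_product VALUES (1, 'hardware'), (2, 'software');
INSERT INTO fact_sales  VALUES (1, 100.0), (1, 50.0), (2, 200.0);
""")

# A typical warehouse workload: join fact to dimension, aggregate, sort.
rows = conn.execute("""
    SELECT p.category, SUM(f.amount) AS total
    FROM fact_sales f JOIN dim_product p USING (product_id)
    GROUP BY p.category ORDER BY total DESC
""").fetchall()
# rows -> [('software', 200.0), ('hardware', 150.0)]
```

The warehouse products above differ in scale, pricing, and management model, but the star-schema query pattern shown here is common to all of them.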
Available data warehousing and storage technologies (Contd..)
When setting up a data science technology stack, choosing the right storage technology is crucial to ensure that data is easily accessible, secure, and scalable. There are several storage technologies available for storing data in a data science technology stack.
Relational databases: Relational databases are the most common type of database used in data science technology stacks. They are designed to store structured data in tables, with each table having a predefined schema. Examples include MySQL, PostgreSQL, and Microsoft SQL Server.
NoSQL databases: NoSQL databases are designed to store unstructured or semi-structured data, making them ideal for storing large volumes of data that don't fit neatly into tables. Examples include MongoDB, Cassandra, and Couchbase (Hird, Kariyeva & McDermid, 2021).
Object storage: Object storage is a type of data storage that is used to store unstructured data, such as images, videos, and documents. Object storage systems use a flat address space to store data, making it easier to scale and access data quickly. Examples include Amazon S3, Microsoft Azure Blob Storage, and Google Cloud Storage.
Data lakes: Data lakes are large-scale storage repositories that allow organizations to store all types of data in their original format, without having to convert it into a structured format first. Data lakes are typically built on top of object storage systems and are used to store large volumes of data that can be used for data science and analytics.
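The flat, key-addressed access pattern that object storage and data lakes rely on can be sketched in pure Python. Here an in-memory dict stands in for a bucket in a store such as S3 or Cloud Storage; the key names and record shapes are invented for the illustration:

```python
import json

# A dict stands in for an object-store bucket: a flat address space
# mapping keys to opaque byte blobs, with no schema imposed up front.
bucket = {}

def put_object(key, obj):
    """Serialize a JSON-compatible record and store it under a flat key."""
    bucket[key] = json.dumps(obj).encode("utf-8")

def get_object(key):
    """Fetch and decode an object; the consumer applies schema on read."""
    return json.loads(bucket[key].decode("utf-8"))

# Objects of different shapes coexist, as in a data lake's raw zone.
put_object("raw/events/2023/03/click.json", {"user": 7, "page": "/home"})
put_object("raw/images/logo.meta.json", {"width": 64, "height": 64})

record = get_object("raw/events/2023/03/click.json")
# record -> {"user": 7, "page": "/home"}
```

Note the contrast with the relational case: nothing validates the records on write, which is exactly why data lakes pair cheap schemaless storage with schema-on-read tooling downstream.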
Tools used for Extraction, Transformation, and Loading
Extract, Transform, Load (ETL) is a critical process in setting up a data science technology stack. It involves extracting data from various sources, transforming it into a format suitable for analysis, and loading it into a data warehouse or data lake. This process is vital to ensure that the data used for analysis is accurate, consistent, and reliable. There are several tools available for ETL, each with its own strengths and weaknesses. 
Apache Spark: Apache Spark is a popular open-source distributed computing system that is commonly used for large-scale data processing. It provides various modules for ETL, such as Spark SQL, Spark Streaming, and Spark MLlib. Spark's built-in support for distributed data processing and machine learning makes it an ideal choice for data science projects (Raschka, Patterson & Nolet, 2020).
Apache NiFi: Apache NiFi is an open-source data integration tool that is used for data routing, transformation, and system mediation. It provides a user-friendly web interface that makes it easy to create, monitor, and manage ETL pipelines. NiFi is designed to handle data in real-time, making it an excellent choice for streaming data processing.
Talend: Talend is a powerful open-source data integration tool that offers a comprehensive set of ETL features. It supports various data sources, including databases, flat files, and cloud storage. Talend provides a user-friendly drag-and-drop interface for building ETL pipelines, making it easy for non-technical users to create complex data integration workflows.
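Whichever tool is chosen, the three ETL steps themselves are simple to state. The sketch below shows them in plain Python, with a CSV string standing in for a source file, a cents-to-dollars conversion as the transform, and SQLite as the load target (the field names are invented for the example):

```python
import csv
import io
import sqlite3

# Extract: read raw rows from a CSV source (a string stands in for a file).
raw = "order_id,amount_cents\n1,1250\n2,300\n"
rows = list(csv.DictReader(io.StringIO(raw)))

# Transform: convert cents to dollars, dropping malformed rows.
clean = [(int(r["order_id"]), int(r["amount_cents"]) / 100)
         for r in rows if r["amount_cents"].isdigit()]

# Load: write the conformed rows into a warehouse-style table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (order_id INTEGER, amount_usd REAL)")
db.executemany("INSERT INTO orders VALUES (?, ?)", clean)
total = db.execute("SELECT SUM(amount_usd) FROM orders").fetchone()[0]
# total -> 15.5
```

Tools like Spark, NiFi, and Talend add distribution, scheduling, monitoring, and connectors on top of this same extract-transform-load skeleton.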
Tools used for Extraction, Transformation, and Loading (Contd..)
Apache Airflow: Apache Airflow is an open-source platform used to programmatically author, schedule, and monitor workflows. It provides a flexible and extensible architecture that makes it easy to integrate with various data sources and tools. Airflow's support for dynamic workflows and DAGs (Directed Acyclic Graphs) makes it an ideal choice for complex ETL pipelines.
AWS Glue: AWS Glue is a fully-managed ETL service provided by Amazon Web Services (AWS). It supports various data sources and provides a serverless architecture, making it easy to scale up or down based on demand. Glue provides a visual interface for creating ETL workflows and integrates seamlessly with other AWS services such as S3, Redshift, and EMR (Schatz et al. 2022).
Selecting the right ETL tool is crucial for setting up a data science technology stack. Each tool has its strengths and weaknesses, and the choice ultimately depends on the specific requirements of the project. It is recommended to evaluate each tool's features and functionality before making a decision to ensure that it meets the project's needs.
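The DAG idea behind Airflow can be illustrated without Airflow itself: Python's standard-library graphlib resolves the same kind of task-dependency ordering that a scheduler enforces. The task names below are invented for the sketch, and a real Airflow DAG would also carry schedules, retries, and operators:

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on, as in an Airflow DAG:
# extract must finish before transform, transform before load, and the
# data-quality check runs only after load completes.
dag = {
    "transform": {"extract"},
    "load": {"transform"},
    "quality_check": {"load"},
}

# A scheduler dispatches tasks in a dependency-respecting order; for this
# linear chain there is exactly one valid ordering.
order = list(TopologicalSorter(dag).static_order())
# order -> ['extract', 'transform', 'load', 'quality_check']
```

Because the graph is acyclic, the sorter can always produce such an ordering; a cycle (e.g., load depending on quality_check) would raise an error, which is the same guarantee Airflow enforces when it validates a DAG.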
Technologies that support Business Intelligence
Business Intelligence (BI) is an essential component of a data science technology stack, providing critical insights into an organization's data to enable better business decision-making. BI tools and technologies help organizations collect, process, analyze, and visualize data, yielding actionable insights into business performance, customer behavior, market trends, and other key indicators. They provide the infrastructure for data analysis and a user-friendly approach to presenting information.
Tableau: Tableau is a popular data visualization tool that provides a user-friendly interface for creating interactive dashboards, reports, and charts. It supports various data sources, including databases, cloud storage, and spreadsheets. Tableau's drag-and-drop interface and interactive visualization capabilities make it easy for non-technical users to...
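At bottom, the summaries a BI dashboard surfaces are grouped aggregations over warehouse data. A minimal sketch in plain Python (the sales records and field names are invented for the example):

```python
from collections import defaultdict

# Invented transactions, such as a BI tool would pull from the warehouse.
sales = [
    {"region": "east", "amount": 120.0},
    {"region": "west", "amount": 80.0},
    {"region": "east", "amount": 40.0},
]

# Group-and-aggregate: the core operation behind most dashboard widgets.
revenue_by_region = defaultdict(float)
for sale in sales:
    revenue_by_region[sale["region"]] += sale["amount"]

# Rank regions by revenue, as a bar chart or KPI table would display them.
ranked = sorted(revenue_by_region.items(), key=lambda kv: -kv[1])
# ranked -> [('east', 160.0), ('west', 80.0)]
```

Tools like Tableau wrap this grouping, ranking, and filtering in an interactive drag-and-drop interface so that non-technical users never write the aggregation logic themselves.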