Introduction

Importance of the chosen area: Telecommunication companies are currently huge and dynamic, and the information extracted from their data sets has multiple uses, producing results and conclusions on many topics. The field is highly dynamic, generating a wide variety of data, so its analysis is of great importance for anyone studying this area of business. The most interesting fact behind choosing this data set is its volume. We selected a telecommunication data set for this data mining project because the industry is highly competitive and expanding rapidly. Other business organizations that depend on telecommunication can also consult such data sets before starting their own business. Because this field has multiple dimensions, it let us learn about data mining and its tools across a large area. The main sources of telecommunication data are call detail, network and customer data (Samarkin and Tarasov, 2016). The information extracted by data mining can help a telecommunication company achieve its goals: at present such analyses help organizations understand the business involved, catch fraudulent activities, recognize telecommunication patterns, improve the quality of service to users, and so on. This experiment analyzes telecommunication data to track the reasons why people switch to other telecommunication providers.

Data Preparation and Feature Extraction

Select data: After we decided on the field of the data set, the next step was collecting and selecting data. The data relevant to the analysis was collected from different sources.

Clean data: Data cleaning refers to removing and fixing incorrect, inconsistent, corrupted and duplicate data in the data sets. Manual cleaning is impractical for data sets with thousands of records, so we opted for the application OpenRefine, a powerful, cross-platform tool for cleaning large and messy data. It allowed us to choose files, sort records, check for redundancies and remove inconsistent entries, so the cleaning process was more automated and less complex. (A rough pandas equivalent of these checks is sketched at the end of this subsection.)

Construct data/feature extraction: Feature extraction is the process of reducing unimportant and redundant data in order to describe large data sets compactly (U, 2016). It uses only the resources required for generating data, which makes classification easier and faster. It is also referred to as the projection of a higher-dimensional feature space onto a lower-dimensional one. The reduced features must be uncorrelated, must not be reducible further, and must retain a large variance.

Construct data: We updated the data sets in a number of steps before starting the data mining process. Constructing data means creating new features from combinations of existing ones. We are using the telecommunication data sets to study the quality of service delivered to users.

Output derived attributes: Derived attributes are completely new records constructed from existing ones. As the PaperlessBilling and PaymentMethod fields carried similar information, we combined them into a single DigitalPayment (bool) field in place of those two fields.

Add new attributes to accessed data: We also added a new column, entertainment interest, which gives a quick overview of the users' interests. (Both attribute constructions are sketched below, after the cleaning sketch.)
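The cleaning itself was done in OpenRefine's GUI, so there is no script to show; as an assumed equivalent only, the following pandas sketch performs the same kinds of checks. The file name and key column names are illustrative.

import pandas as pd

# Load the raw telecom data set (file name is illustrative)
data = pd.read_csv('telco_churn.csv')

# Remove exact duplicate records
data = data.drop_duplicates()

# Normalize inconsistent text entries, e.g. stray whitespace
for col in data.select_dtypes(include='object').columns:
    data[col] = data[col].str.strip()

# Drop rows missing the identifier or the target
data = data.dropna(subset=['customerID', 'Churn'])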
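The two attribute constructions described above can likewise be sketched in pandas. The payment-method categories and the streaming columns are assumptions based on the common Telco churn data set:

# DigitalPayment: paperless billing combined with an electronic payment method
data['DigitalPayment'] = (
    (data['PaperlessBilling'] == 'Yes')
    & data['PaymentMethod'].isin(['Electronic check',
                                  'Bank transfer (automatic)',
                                  'Credit card (automatic)'])
)
# The two source columns could then be dropped in favour of the new flag.

# Entertainment interest: does the customer stream movies or TV?
data['EntertainmentInterest'] = (
    (data['StreamingMovies'] == 'Yes') | (data['StreamingTV'] == 'Yes')
)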
We also grouped together raw attributes that are highly correlated, using Principal Component Analysis (PCA), the most standard technique for this purpose. (A minimal PCA sketch appears at the end of this subsection.)

Single attribute transformations: Attribute transformation alters the data by replacing a selected attribute with one or more new attributes. This facilitates analysis, and the new attributes are functionally dependent on the old ones.

Modeling

Selecting the actual modeling technique: In data mining, data modeling refers to formulating every step and choosing the techniques required to reach a solution; the modeling tools should provide expertise from a methodological point of view. Modeling plays a crucial role in the growth of any business. It is the process of creating a model for the data stored in a database. There are several data modeling techniques: flat, star schema, hierarchical, relational and object-relational. We chose the object-oriented database model for our data mining. An object-oriented database is based on OOP concepts and stores data in the form of objects, which helped with persistent storage of objects.

Record the actual modeling technique used: Using the object-oriented database concept, we organized the elements of data and standardized the relationships among them and with real-world entities. The composition relationship captures how one object is composed of others. Object-oriented attributes carry more semantics and can be reused many times. We also used encapsulation to restrict direct access to some data components, and polymorphism helped us implement one interface at different levels.

Output modeling assumptions: After completing the data modeling phase, we compared our assumptions against the data description report. Our major assumptions are:
· There is only a small gap between the genders of clients.
· Most clients have few dependencies.
· 90% of clients are connected to a telephone service.
· Half of the clients do not opt for additional services.
· More than 50% of clients use paperless billing and payment.
· One third of clients pay by electronic check.
· About half of the clients have a month-to-month contract.

Generate test designs: Testing ensures that our data mining model works well against real data. We separated the data into training and testing sets to test the accuracy of the system, and then analyzed the results to discover patterns that are meaningful for the targeted business ratio. The following sample shows the technique we used to define the training data and the test target:

drop_elements = ['Unnamed: 0', 'customerID', 'Churn']
train = data.drop(drop_elements, axis=1)  # feature columns for training
test = data['Churn']                      # target labels for testing
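Strictly, the snippet above separates the feature columns from the Churn target rather than partitioning records into training and testing sets. A minimal sketch of that partition (the 80/20 ratio, stratification and random seed are our assumptions) could be:

from sklearn.model_selection import train_test_split

# Hold out 20% of records for testing; stratify to preserve the churn ratio
X_train, X_test, y_train, y_test = train_test_split(
    train, test, test_size=0.2, stratify=test, random_state=42)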
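Returning to the feature-extraction step described earlier, the PCA grouping of correlated attributes might look like the following sketch. The column list and the 95% variance threshold are assumptions, not the report's actual settings:

from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# PCA expects standardized numeric input; the column list is illustrative
numeric = X_train[['tenure', 'MonthlyCharges', 'TotalCharges']]
scaled = StandardScaler().fit_transform(numeric)

# Keep enough components to explain 95% of the variance
pca = PCA(n_components=0.95)
components = pca.fit_transform(scaled)
print(pca.explained_variance_ratio_)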
Next, we found the pairwise correlation of all columns in the data frame:

training_correlation = data.corr()
churn_corr = pd.DataFrame(training_correlation['Churn'].sort_values(ascending=True))
churn_corr

Attribute            Correlation with Churn
tenure               -0.348
TechSupport          -0.28
OnlineBackup         -0.19
DeviceProtection     -0.18
Dependents           -0.16
InternetService      -0.05
StreamingMovies      -0.04
StreamingTV          -0.04
customerID           -0.01
gender               -0.009
PhoneService          0.009
PaymentMethod         0.104
SeniorCitizen         0.15
MonthlyCharges        0.18
PaperlessBilling      0.19
Churn                 1.00

Build models: Building a model involves combining and analyzing multiple sets of data to uncover relationships and patterns between them. We used KNIME for generating the data models: we created data flows (pipelines), executed some or all of the analysis steps, and later studied the results and models with the help of interactive widgets and views (Lee, 2021). KNIME is a free, open-source data mining tool written in Java and based on Eclipse. Using KNIME, we skipped the coding part and presented the results with visualization tools. Our first prediction is whether a user will churn. We went through the following steps (a rough Python sketch of the equivalent model follows this list):
· Locate the CSV file, i.e. the data set for mining, from the KNIME application, then execute the node.
· Choose the Decision Tree Learner node for demonstration and fix the settings in the node configuration.
· Execute the node and open the view to see the decision tree; from it we read off the probability that a customer will pay digitally.
· View the results using data visualization tools such as a ROC curve, pie chart, etc.
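KNIME builds this model graphically, so no code was written in the project itself. As an assumed equivalent, the sketch below trains the same kind of decision tree with scikit-learn and scores it with the ROC curve's summary statistic. It presumes the train/test split above, numerically encoded features, and a 0/1-encoded Churn target; the tree depth is illustrative:

from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

# Train a decision tree on the training partition
tree = DecisionTreeClassifier(max_depth=5, random_state=42)
tree.fit(X_train, y_train)

# Predicted probability that each held-out customer churns
# (assumes y_train/y_test were encoded as 0/1 beforehand)
churn_prob = tree.predict_proba(X_test)[:, 1]
print('ROC AUC:', roc_auc_score(y_test, churn_prob))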
Output model description: With KNIME we created a workflow to predict the growth of customers' preference for digital payment. We built a valuable and efficient workflow in record time without writing any code. The graphical user interface is very comfortable, and its main advantage over other tools is the workflow coach, which guided us through moments of confusion by anticipating the next step (Abualkibash, 2019). We also had a number of options for better visualization, giving the viewer a deeper level of understanding, and it was relatively easy to find and reuse nodes to build the complete pipeline. Our final data model revealed the probability of a user churning. The model can be improved later by using more advanced machine learning algorithms, and further still by applying an optimization loop when it is reused.

Evaluations and Conclusions

We generated a prediction of which users will churn by processing the telecommunication data, following the step-by-step data mining process to evaluate the result. We used various tools and algorithms to visualize the predicted churn rate and the likelihood of churn by customer location. 27% of customers were prone to churn from their current telecommunication service, and customers in remote geographic locations tended to churn more than customers elsewhere. This means the telecom service in these regions needs to be improved, and the operator must focus on this issue to keep its service running in the long term.

Bibliography

Samarkin, M. and Tarasov, V., 2016. Telecommunication company big data classification by data mining technique. Infokommunikacionnye tehnologii, 14(3), pp.258-263.
U, S., 2016. A survey on feature extraction techniques for image retrieval using data mining & image processing techniques. International Journal of Engineering and Computer Science.
Lee, E., 2021. How do we build trust in machine learning models?. SSRN Electronic Journal.
Abualkibash, M., 2019. Machine learning in network security using KNIME Analytics. International Journal of Network Security & Its Applications, 11(5), pp.1-14.