Assignment 1 Bank Marketing Case Study: loading and merging data Learning outcomes Load data using input Files in Various Formats to combine information from many data domains and sources Rename...


Assignment 1


Bank Marketing Case Study: loading and merging data


Learning outcomes



  1. Load data using input Files in Various Formats to combine information from many data domains and sources

  2. Rename columns and convert column types from character to numeric to prepare for merging

  3. Merge sas datasets to obtain a datawarehouse ready for analysis


Introduction


The head of Marketing wants to know which customers have the highest propensity for buying a Certificate of Deposit (CD) from the institution. The goal of this assignment is to create part of an analytical data mart by combining information from many data domains and sources.


Q1. Load data from customer_banking_info_promo.xslx



  • define the library name "mylib" and specify its location using libname

  • Use proc import DATAFILE to import customer_banking_info_promo.xlsx into a sas dataset named customer_banking_info_promo under mylib

  • Print the first five rows of the dataset add (obs=5) at the end of proc print.

In[]:
libname mylib '/folders/myfolders/Assignments';

Q2. Examine the variable Customer ID. Check the type and format.


In[]:

Q3. Column deletion/renaming


Look at the description of the different columns here:https://archive.ics.uci.edu/ml/datasets/bank+marketing



duration: last contact duration, in seconds (numeric). Important note: this attribute highly affects the output target (e.g., if duration=0 then y='no'). Yet, the duration is not known before a call is performed. Also, after the end of the call y is obviously known. Thus, this input should only be included for benchmark purposes and should be discarded if the intention is to have a realistic predictive model.



  • Within a data step, perform the following:

    • keep the output dataset name same as input (customer_banking_info_promo)

    • Rename "customer_id2" to customer_id

    • drop the column "duration" from the dataset.



  • print the first 5 observations in the dataset

In[]:

Q4. Load data from customer_banking_info.csv


load the data and print the first five rows.

In[]:

Q5. Renaming columns



  • use proc contents to examine the list of variables as before. You will see that customer_id1 is numerical with len=8. This is important to check as this column will be used to merge the datasets.

  • Within a data step, perform the following:

    • keep the output dataset name same as input (customer_banking_info)

    • Rename "customer_id1" as customer_id



  • print the first 5 observations in the dataset

In[]:
* use proc contents here;
In[]:
* code to rename columns here and print;

Q6. SAS data from customer_demographics.sas7bdat



  • print the first 5 rows of customer_demographics.sas7bdat

  • use proc contents and examine the list of variables. What is the type of customer_id

In[]:
* code to print here;
In[]:
* use proc contents here;

Q7. Convert from character to numeric type


Before merging multiple datasets, the common column between the datasets should be of the same type.
In customer_banking_info_promo, customer_id is defined as character. you are given a sample data step code to run:



  • the output dataset name customer_banking_info_promocv

  • to convert customer_id to numeric variable, we use the input function. reference:http://support.sas.com/kb/24/590.html

  • sample statement: numeric_var = input(char_var, 8.);

  • rename the numerical column as customer_id: rename old_name= customer_id;

In[]:

check the customer_id variable type again by using proc contents or proc means to see the list of numerical variables

In[]:
*check type again;

Q8. Data Merging



  • Join the three sources of data into a single SAS data set.

    • sort each of the datasets by customer_id

    • merge the three datasets using the merge function within a data step. name the new dataset as "customer_all"

    • print the first five observations.




Refer to



section 10.9 Controlling Observations in a Merged Data Set Chapter 10 - Subsetting and Combining SAS Data Sets Learning SAS by Example: A Programmer's Guide, Second Edition
In[]:
* code for merging goes here;
In[]:
* print data;

Q9. Reorder the columns


Reorder the column names in customer_all so customer_id appears first.


In[]:
*reorder;   /* to rename dataset if needed;  proc datasets library=mylib;    change customer_all2=customer_all; run; */  * print code here;
Oct 06, 2021
SOLUTION.PDF

Get Answer To This Question

Related Questions & Answers

More Questions »

Submit New Assignment

Copy and Paste Your Assignment Here