The assignment question is attached as a Word file. Dear Team, I have uploaded the assignment handout once again, with the complete requirements and the attached data set. Please note that I need the full assignment completed, meeting every requirement in the Word document and using all the relevant software (Hadoop, SAS, Hive), in an organised format before 17th March 2019. The document should cover the following topics:


Abstract
Introduction
Related work
Phase 1: [using SAS Studio]
· Data observation
· Data cleaning
Phase 2: [using SAS and Tableau]
· EDA
· Data transformation
· Feature engineering
Phase 3: [SAS and Cloudera Hadoop & Hive]
· Prediction & Classification
· 8 Hypothesis
Discussion
· Initial data exploration
· Data pre-processing/cleaning
· Prediction/classification
· Loading on HDFS/Hive
· Hypothesis with Hive-SQL
Conclusion
References




CS-IT019-2-A-Data Management Individual Assignment

DATA MANAGEMENT LEARNING OUTCOMES
1. Evaluate the various data types, data storage systems and associated techniques for indexing and retrieving data.
2. Design feature engineering techniques to transform transactional data into meaningful inputs in order to create a predictive model.
3. Propose a suitable approach to designing a data warehouse to store and process large datasets.

DATA MANAGEMENT
The machine learning pipeline involves several tasks before the development of predictive/descriptive models. Preparing and understanding the data is an inevitable and vital part of that process. Moreover, the performance of the predictive/descriptive model depends on the choice of pre-processing techniques. For the assignment, you are required to prepare and explore the given dataset. It is imperative to explain and justify the pre-processing, transformation, and feature engineering techniques that have been chosen. Your analysis should be deep and detailed, and it must go further than what has already been covered in this course. The assignment should involve a number of experiments, and a detailed exploration and analysis of the results using SAS Studio, an Apache Hadoop distribution, and visual analytics tools (Tableau).

You need to do the following tasks:

1. Related Works
In this section, you are supposed to research and present other works related to the application domain.

2. Initial Data Exploration
This section should contain the following tasks.
· Indicate the type of each attribute (nominal, ordinal, interval or ratio).
· Identify the values of the summarising properties for each attribute, including frequency and spread, e.g. value ranges of the attributes, frequency of values, distributions, medians, means, variances, and percentiles. Wherever necessary, use proper visualisations for the corresponding statistics.
· Using SAS, explore your dataset and identify any outliers, missing values, "interesting" attributes and specific values of those attributes.

3. Data Pre-processing
Investigate the required method(s) to handle incomplete, noisy and inconsistent data. Report each of the applied techniques with detailed explanations. Show your results and justify your approach.
NOTE: The easiest way to handle dirty data is to remove the feature(s) / instance(s). Choosing this method will be awarded ZERO for pre-processing.

4. Feature Engineering
Several data mining/machine learning algorithms are designed to work with either qualitative or quantitative data, and very few algorithms support mixed data. Hence, this task requires you to develop two datasets. The first dataset should represent all variables in qualitative form and the second dataset in quantitative form. Individual attributes need to be discretized/transformed with appropriate method(s), and proper justification must be provided. In addition, metadata should be created for each dataset.

5. Exploratory Data Analysis (EDA)
This task requires you to perform an analysis on the two datasets generated during your feature engineering. You are evaluated based on the approaches undertaken to get familiar with the dataset.

6. Apache Hadoop
Load the dataset (cleaned dataset or transformed dataset) into Hive, configured with optimized read performance on the tables. You are free to choose your own Apache Hadoop distribution (Hortonworks, Cloudera, MapR etc.).

7. Hypothesis
Formulate a minimum of FIVE (5) hypotheses based on the dataset (cleaned dataset or transformed dataset) with the required analytical variable(s). Interpret the hypotheses using the query results from HiveQL and/or visualization.

Deliverables
The deliverables include:
· A report, whose structure should follow the tasks of the assignment.
· SAS program (Initial Data Exploration, Data Pre-processing, and Dataset Transformation) and Hive queries, with an individual file for each task.

Your report should include the following:

Abstract - A self-contained, short, and powerful statement/brief that describes your work. It may contain the scope, purpose, results, and contents of the work. [180 to 250 words]

Introduction - The purpose of your report and background information about the topic. You also have to give brief details of the methods applied in the study. Include an outline of the structure of the report. [800 to 1000 words]

Related Work - Carefully structure your findings. It may be useful to use a chronological format where you discuss from the earliest to the latest research, placing your research appropriately in the chronology. Alternatively, you could write in a thematic way, outlining the various themes that you discovered in the research regarding the topic. [1000 to 1500 words]

Method - This section should contain a detailed exploration of the dataset, pre-processing, feature engineering, EDA, Hive and Hypothesis. [No limit]

Discussion - For each of the tasks, include a section title in your report. Finally, you need to summarize your findings; this summary section should NOT be a narrative of your tasks, but a summarized, informative section on what you found in the data. This section should provide a detailed interpretation of the work along with the supporting related works. [500 to 1000 words] For example, it should include details like specific characteristics (or values) of some attributes, important information about the distributions, relationships or associations found between variables that should be investigated more rigorously, etc.

Conclusion - In this section, you need to state your position about what you gained in this assignment that can contribute to other readers.

Documentation Format:
· Typeface: Times New Roman. Boldface, italic & lines can be used for emphasis and to enhance readability.
· Font size: 12 (except titles and headings).
· Margins: 1” from the left, right, top & bottom edges of the A4 paper.
· Spacing: 1.5 lines between texts of a paragraph.
· Alignment: Justify.
· Headers and footers can be used; all pages must be numbered accordingly.
· Standard cover page as available in the learning management system.

Level Masters COMSATS2019
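The exploration and pre-processing tasks in the handout can be illustrated with a small sketch. The assignment itself calls for SAS Studio, so the Python below is only a language-neutral illustration. The `ages` column, the 1.5 x IQR outlier rule, and median imputation are assumptions chosen for the example, not requirements taken from the handout.

```python
import statistics

def summarise(values):
    """Summary statistics over the non-missing (non-None) values."""
    present = [v for v in values if v is not None]
    q1, median, q3 = statistics.quantiles(present, n=4)
    return {
        "n": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": median,
        "variance": statistics.variance(present),
        "q1": q1,
        "q3": q3,
    }

def iqr_outliers(values):
    """Values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] (assumed rule)."""
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in present if v < low or v > high]

def impute_median(values):
    """Replace missing values with the median instead of dropping rows."""
    median = statistics.median(v for v in values if v is not None)
    return [median if v is None else v for v in values]

# Hypothetical column with two missing values and one extreme value.
ages = [23, 25, None, 31, 29, 27, 95, None, 30]
print(summarise(ages))
print(iqr_outliers(ages))
print(impute_median(ages))
```

Imputing rather than deleting keeps every instance, which matters here because the handout awards zero for pre-processing that simply removes features or instances.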
Answered Same Day, Mar 12, 2021

Vidhi answered on Mar 17 2021
data_manage/attribute.ctl
{"version":2,"type":"import","id":"52015a6a-3e41-43bd-8d7e-1fe2b1090424","name":"dataset","label":"dataset","description":"","created":1552754001473,"modified":1552754002381,"notes":"","parameters":{"server":"","target":"com.sas.ep.sascoder.execution.producers.VPP","action":"runSASCode","priority":"Reserved","code":"/* Generated Code (IMPORT) */\r\n/* Source File: dataset.csv */\r\n/* Source Path: /home/shaziyaislam260/new one */\r\n/* Code generated on: 3/16/19, 10:03 PM */\r\n\r\n%web_drop_table(WORK.IMPORT);\r\n\r\n\r\nFILENAME REFFILE '/home/shaziyaislam260/new one/dataset.csv';\r\n\r\nPROC IMPORT DATAFILE=REFFILE\r\n\tDBMS=CSV\r\n\tOUT=WORK.IMPORT;\r\n\tGETNAMES=YES;\r\nRUN;\r\n\r\nPROC CONTENTS DATA=WORK.IMPORT; RUN;\r\n\r\n\r\n%web_open_table(WORK.IMPORT);","resource":false,"outputType":"TABLE","outputName":"IMPORT","outputLocation":"WORK","fileName":"dataset.csv","filePath":"/home/shaziyaislam260/new one","fileType":"","fileSheet":"","fileTable":"","delimiterOption":"","dataRowOption":-1,"guessingRowsOption":-1,"getnamesOption":true,"quoteDelimiterOption":true,"eolDelimiterOption":""},"properties":{"left":"20","top":"20","width":"100","height":"60","region":"output","fillcolor":"#E0E6F1","linecolor":"#6882a3","tooltip":"","portsonly":false,"key":"control","visible":true}}
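The generated SAS step in attribute.ctl imports dataset.csv with the header row as variable names (GETNAMES=YES) and then lists the table's contents with PROC CONTENTS. As a rough, non-SAS illustration of those two steps, the sketch below reads CSV text with a header row and reports a simple inferred type per column. The sample rows and the `describe_columns` helper are hypothetical, and the type inference is far cruder than what PROC IMPORT does.

```python
import csv
import io

def _is_number(text):
    """True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False

def describe_columns(csv_text):
    """Return (row_count, {column: 'Numeric' | 'Char'}) for CSV text with a header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    summary = {}
    for name in reader.fieldnames:
        # Empty cells are treated as missing and ignored for type inference.
        values = [row[name] for row in rows if row[name] != ""]
        numeric = bool(values) and all(_is_number(v) for v in values)
        summary[name] = "Numeric" if numeric else "Char"
    return len(rows), summary

# Hypothetical sample; the real dataset.csv is not shown in this post.
sample = "Age,Gender\n34,M\n,F\n29,M\n"
print(describe_columns(sample))
```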
data_manage/Bar Chart-results (1).html
Results: Bar Chart
The SGPlot Procedure
data_manage/Bar Chart-results (2).html
Results: Bar Chart
The SGPlot Procedure
data_manage/Bar Chart.ctk


Bar Chart
Bar charts compare numeric values or statistics between different values of a chart variable. Bar charts show the relative magnitude of data by displaying bars of varying height. Each bar represents a category of data.
A0EEED19-14E9-4AA8-9E3E-4F1B22B078F2
SGPLOT
3.7

SAS Studio Task Reference Guide


The SGPLOT Procedure


SAS Tutorials







Category:
Variable:
Subcategory:
URL variable:
Group analysis by:
Weight:




DATA
DATA
ROLES
ADDITIONAL ROLES
APPEARANCE

Include filter clause as footnote

Where clause:
TITLE AND FOOTNOTE
Title:
Font size (default: 14 pt):
Footnote:
Font size (default: 12 pt):
CHART ORIENTATION
Vertical
Horizontal

Options:
Display grouped bars:
Clustered side by side
Stacked on one another
Legend location:
Outside (default)
Inside

Measure:
Frequency count (default)
Frequency percent
Variable
Statistic:

Percent of the sum
Sum (default)
Mean
Error bars:
(none)
Both
Upper
Lower
Type:
Confidence limit (default)
Standard deviation
Standard error
Specify a statistic multiplier
Multiplier:
Confidence level:
95% (default)
99%
90%
Use custom value
Custom confidence level (%):

BARS
Show labels
Set color
Color
Color transparency:
0% (default)
25%
50%
75%
100%
Details
Apply color gradient
Effect:
Gloss
Sheen
Crisp
Matte
Pressed

CATEGORY AXIS
Reverse tick values
Show tick values in data order
Display label:
Label:
Rotate labels in case of tick collisions
Rotate by 45 degrees
Rotate by 90 degrees
Create a reference line
Reference value:
Reference value as label
Custom label
Label:
Line offset:
0 (default)
-0.5
-0.25
0.25
0.5
MEASURE AXIS
Show grid lines
Display label:
Display label:
Default label
No label
Statistics name in label
No statistics name in label
Custom label
Label:
Use logarithmic scale
Base value:
2
10 (default)
e
Axis baseline:
Rotate labels in case of tick collisions
Rotate by 45 degrees
Rotate by 90 degrees
Create a reference line
Reference value:
Reference value as label
Custom label
Label:
Specify minimum value
Minimum value:
Specify maximum value
Maximum value:
GRAPH SIZE
Units:
Inches (default)
Cms
Pixels
Width:
Height:
Width:
Height:
Width:
Height:

The maximum value of measure axis should be greater than or equal to the minimum value of measure axis.




##--Extract first item from list--##
#if($category.size()>0) #set($CAT=$category.get(0)) #end
#if($measure.size()>0) #set($RESP=$measure.get(0)) #end
#if($group.size()>0) #set($GRP=$group.get(0)) #end
#if($urlVar.size()>0) #set($URL=$urlVar.get(0)) #end
#if($byVar.size()>0) #set($BY=$byVar.get(0)) #end
#if($weight.size()>0) #set($WGT=$weight.get(0)) #end
##--Set Graph Size--##
#if($dimType=="in")
#set($WIDTH="$inchWidth$dimType")
#set($HEIGHT="$inchHeight$dimType")
#end
#if($dimType=="cm")
#set($WIDTH="$cmWidth$dimType")
#set($HEIGHT="$cmHeight$dimType")
#end
#if($dimType=="px")
#set($WIDTH="$pixelWidth$dimType")
#set($HEIGHT="$pixelHeight$dimType")
#end
##--Set output size--##
ods graphics / reset width=$WIDTH height=$HEIGHT imagemap;
#if($BY)
##--Sort data by BY variable--##
proc sort data=$dataSource out=_BarChartTaskData;
by $BY;
run;
#end
##--SGPLOT proc statement--##
proc sgplot data= #if($BY) _BarChartTaskData #else $dataSource #end#if($dataSource.getWhereClause()!="")(where=($dataSource.getWhereClause())) #end;
#if($BY)
##--BY Variable--##
by $BY;
#end
##--Build quoted strings for title and footnote to clean up any quotes provided by user--##
#if($titleString!="") #set($qTitleString=$CTMUtil.doubleQuoteString($titleString)) #end
#if($footnoteString!="") #set($qFootnoteString=$CTMUtil.doubleQuoteString($footnoteString)) #end
##--TITLE and FOOTNOTE--##
#set($pt="pt")
#if($titleString!="")
#set($titleSizePt="$titleSize$pt")
title height=$titleSizePt $qTitleString;
#end
#if($footnoteString!="" || ($dataSource.getWhereClause()!="" && $includeAsFootnote=='1'))
#set($footSizePt="$footnoteSize$pt")
#if($dataSource.getWhereClause()!="" && $includeAsFootnote=='1') footnote justify=left height=$footSizePt
#set($whereStr="$whereClause $dataSource.getWhereClause()") $CTMUtil.doubleQuoteString($whereStr);
#end
#if($footnoteString!="")
footnote2 justify=left height=$footSizePt $qFootnoteString;
#end
#end
#if(($useColor && $useColor=='1') || $barTrans!='0') #set ($SETATTRS='Yes')
#else #set($SETATTRS='No')
#end
##--Bar chart settings--##
#if($barOrient=='vertical') vbar $CAT
#else hbar $CAT
#end /
#if($RESP) response=$RESP #end
#if($URL) url=$URL #end
#if($GRP)
group=$GRP groupdisplay=$groupDisplay
#end
#if($SETATTRS=='Yes') fillattrs=(
#if($useColor=='1') color=$barColor #end
    #if($barTrans!='0') transparency=$barTrans #end)
#end
#if($barStatLabels=='1') datalabel #end
#if($barLimits)
#if($barLimits!='none')
limits=$barLimits limitstat=$barLimitStat
#if($barNumStdChkbox && $barNumStdChkbox=='1') numstd=$barNumStd #end
#if($barLimitStat=='clm')
## Calculate alpha from confidence level
#if($confLevelCombo=="confLevel99Choice") alpha=0.01 #elseif($confLevelCombo=="confLevel90Choice") alpha=0.10
#elseif($confLevelCombo=="confLevelCustomChoice") #if($confLevel!=95) alpha=%sysevalf((100-$confLevel)/100) #end
#end
#end
#end
#end
#if($fillGradient=='1') fillType=gradient #end
#if($RESP)
#if($statChoiceResponse!='sum') stat=$statChoiceResponse #end
#else
#if($measureCombo!='freq') stat=$measureCombo #end
#end
#if($barSkin!='none') dataskin=$barSkin #end
#if($RESP)
#if($displayRespLabel1=='statsLabel') statlabel
#elseif($displayRespLabel1=='noStatsLabel') nostatlabel
#end
#end
#if($logAxis=='1') baseline=$logBaseline #end
#if($WGT) weight=$WGT #end
;
#set($catAxis=0)
#if($displayCatLabel!='defaultLabel' || $sortByData=='1' || $catReverse=='1' || $catLabelRotation=='1') #set($catAxis=1) #end
#if($catAxis=='1')
##--Category Axis--##
#if($barOrient=='vertical') xaxis
#else yaxis
#end
#if($sortByData=='1') discreteorder=data #end
#if($catReverse=='1') reverse #end
#if($displayCatLabel=='noLabel') display=(nolabel) #end
#if($displayCatLabel=='customLabel') label=$CTMUtil.doubleQuoteString($catLabel) #end
#if($catLabelRotation=='1') valuesrotate=#if($catTickRotate=='catDiagonalRotation') diagonal #else vertical #end #end
;
#end

#set($respAxis=0)
#if(($measureCombo=='measureChoiceVar' && $measure.size()>0 && ($displayRespLabel1=='noLabel' || $displayRespLabel1=='customLabel'))
|| ($measureCombo!='measureChoiceVar' && ($displayRespLabel2=='noLabel' || $displayRespLabel2=='customLabel'))
|| $showRespGrid=='1' || $logAxis=='1' || $respLabelRotation=='1' || $respAxisMin=='1' || $respAxisMax=='1') #set($respAxis=1) #end
#if($respAxis=='1')
##--Response Axis--##
#if($barOrient=='vertical') yaxis
#else xaxis
#end
#if($respAxisMin=='1') min=$respAxisMinValue #end
#if($respAxisMax=='1') max=$respAxisMaxValue #end
#if($showRespGrid=='1') grid #end
#if(($measureCombo=='measureChoiceVar' && $measure.size()>0 && $displayRespLabel1=='noLabel') || ($measureCombo!='measureChoiceVar' && $displayRespLabel2=='noLabel')) display=(nolabel) #end
#if(($measureCombo=='measureChoiceVar' && $measure.size()>0 && $displayRespLabel1=='customLabel') || ($measureCombo!='measureChoiceVar' && $displayRespLabel2=='customLabel')) label=$CTMUtil.doubleQuoteString($respLabel) #end
#if($logAxis=='1')
type=log #if($logBaseCombo!='10') logbase=$logBaseCombo #end
#end
#if($respLabelRotation=='1') valuesrotate=#if($respTickRotate=='respDiagonalRotation') diagonal #else vertical #end #end
;
#end
#if($catRefLine=='1')
##--Category Reference Line--##
refline #if($CAT.get('type')=='Char') $CTMUtil.doubleQuoteString($catRefLineValue) #else $catRefLineValue #end /
#if($barOrient=='vertical') axis=x
#else axis=y
#end
lineattrs=(thickness=2 color=blue)
    #if($catRefLineOffset!='0') discreteoffset=$catRefLineOffset #end
    label#if($catRefLabel=='catRefCustom')=$CTMUtil.doubleQuoteString($catRefCustomLabel) #end
labelattrs=(color=blue)
;
#end
#if($respRefLine=='1')
##--Response Reference Line--##
refline $respRefLineValue /
#if($barOrient=='vertical') axis=y
#else axis=x
#end
lineattrs=(thickness=2 color=green)
    label#if($respRefLabel=='respRefCustom')=$CTMUtil.doubleQuoteString($respRefCustomLabel) #end
labelattrs=(color=green)
;
#end
#if($GRP && $legendLoc=='inside')
##--Legend Settings--##
keylegend / location=$legendLoc;
#end
run;
##--Clean up--##
ods graphics / reset;
#if($titleString!="") title; #end
#if($footnoteString!="" || ($dataSource.getWhereClause()!="" && $includeAsFootnote=='1'))
#if($dataSource.getWhereClause()!="" && $includeAsFootnote=='1') footnote; #end
#if($footnoteString!="") footnote2; #end
#end
#if($BY)
proc datasets library=WORK noprint;
delete _BarChartTaskData;
run;
#end

{"customLabel":"Custom label","catLabelRotation":"0","lower":"Lower","statsLabel":"Statistics name in label","respAxisMinValue":"","confLevelCombo":"confLevelDefaultChoice","inchWidth":"6.4","appearanceTab":"APPEARANCE","barTrans50Choice":"50%","catRefLineOffset50Choice":"0.5","barsGroup":"BARS","footnoteString":"","barTrans100Choice":"100%","catRefLineOffset":"0","confLevelCustomChoice":"Use custom value","graphSize":"GRAPH SIZE","logBaseCombo":"10","statChoiceResponse":"sum","orientation":"CHART ORIENTATION","barOrient":"vertical","respRefCustomLabel":"","catDiagonalRotation":"1","respAxisMin":"0","respAxisMax":"0","measureChoicePercent":"Frequency percent","category":"json:[{\"value\":\"Age\",\"type\":\"Numeric\",\"length\":8,\"format\":\"BEST12.\",\"informat\":\"BEST32.\",\"className\":\"RoleObject\"}]","horizontal":"0","titleString":"","includeAsFootnote":"0","cmWidth":"16","outside":"Outside (default)","sortByData":"0","respDiagonalRotation":"1","respVerticalRotation":"0","pressed":"Pressed","catRefValue":"1","groupedBarStyle":"Display grouped bars:","catRefLabel":"catRefValue","displayRespLabel1":"defaultLabel","CLM":"Confidence limit (default)","displayRespLabel2":"defaultLabel","title":"TITLE AND FOOTNOTE","cmHeight":"12","statChoicePercent":"Percent of the sum","catRefCustomLabel":"","sheen":"Sheen","sasOS":"Linux LIN X64 3.10.0-693.21.1.el7.x86_64","barLimits":"none","barTrans25Choice":"25%","catRefLineOffsetNegative50Choice":"-0.5","barTrans75Choice":"75%","statChoiceSum":"Sum (default)","dataGroup":"DATA","groupDisplay":"cluster","urlVar":"","barTransDefaultChoice":"0% (default)","barDetails":"Details","crisp":"Crisp","catRefLineOffsetNegative25Choice":"-0.25","gloss":"Gloss","catRefLine":"0","catTickRotate":"catDiagonalRotation","displayCatLabel":"defaultLabel","stdErr":"Standard error","respRefCustom":"0","logAxis":"0","stdDev":"Standard deviation","rolesGroup":"ROLES","whereClause":"Where 
clause:","vertical":"1","respTickRotate":"respDiagonalRotation","both":"Both","statChoiceMean":"Mean","respRefValue":"1","matte":"Matte","catRefLineOffset25Choice":"0.25","pixelWidth":"640","logBase2":"2","measureCombo":"freq","measureAxis":"MEASURE AXIS","logBase10":"10 (default)","pixelHeight":"480","barSkin":"none","catReverse":"0","none":"(none)","measureChoiceFreq":"Frequency count (default)","barNumStd":"1","byVar":"","sasVersion":"9.450000000000001","logBaseE":"e","logBaseline":"","respAxisMaxValue":"","dataTab":"DATA","defaultLabel":"Default label","confLevel99Choice":"99%","catRefLineValue":"","catLabel":"","subcatGroup":"Options:","confLevelDefaultChoice":"95% (default)","cm":"Cms","barColor":"#CAD5E5","respRefLineValue":"","catRefCustom":"0","stack":"0","dataSource":{"librarytable":"WORK.IMPORT"},"weight":"","measure":"","respRefLine":"0","upper":"Upper","inchHeight":"4.8","catRefLineOffsetDefaultChoice":"0 (default)","noStatsLabel":"No statistics name in label","barLimitStat":"clm","measureChoiceVar":"Variable","titleSize":"14","footnoteSize":"12","confLevel90Choice":"90%","useColor":"0","showRespGrid":"1","fillGradient":"0","noLabel":"No label","barTrans":"0","respLabel":"","dimType":"in","in":"Inches (default)","barStatLabels":"0","respRefLabel":"respRefValue","respLabelRotation":"0","px":"Pixels","catVerticalRotation":"0","legendLoc":"outside","inside":"Inside","barNumStdChkbox":"0","cluster":"1","confLevel":"95","addRolesGroup":"ADDITIONAL ROLES","categoryAxis":"CATEGORY AXIS","group":"","statisticGroup":"Statistic:"}
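In the Bar Chart template above, error bars based on confidence limits (`limitstat=clm`) receive an `alpha=` option derived from the chosen confidence level: 0.01 for 99%, 0.10 for 90%, `(100 - confLevel)/100` for a custom level other than 95, and no option at all for the 95% default. A minimal Python sketch of that mapping follows. The function name `alpha_option` is hypothetical, and the code only mirrors the branching visible in the template.

```python
def alpha_option(choice, custom_level=95.0):
    """Return the alpha value the template would emit, or None if it emits nothing."""
    if choice == "confLevel99Choice":
        return 0.01
    if choice == "confLevel90Choice":
        return 0.10
    if choice == "confLevelCustomChoice" and custom_level != 95:
        # Same arithmetic as %sysevalf((100-$confLevel)/100) in the template.
        return (100 - custom_level) / 100
    # Default 95% choice (or a custom level of exactly 95): SGPLOT's default applies.
    return None

for choice, level in [
    ("confLevelDefaultChoice", 95),
    ("confLevel99Choice", 95),
    ("confLevelCustomChoice", 80),
]:
    print(choice, level, alpha_option(choice, level))
```

The saved task state in this file has `"barLimits":"none"`, so none of these branches is reached for the chart that was actually produced.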
data_manage/Box Plot-results (1).html
Results: Box Plot
The SGPlot Procedure
data_manage/Data Exploration-results.html
Results: Data Exploration
The SGScatter Procedure
data_manage/Data Exploration.ctk


Data Exploration
The Data Exploration task provides graphs that can be used to explore relationships among selected variables.
e5929f47-9122-4e2d-9a87-f490dbd02db2
SGSCATTER UNIVARIATE FREQ BOXPLOT
3.7

SAS Studio Task Reference Guide


The SGSCATTER Procedure


The UNIVARIATE Procedure


The FREQ Procedure


The BOXPLOT Procedure


SAS Tutorials






Continuous variables:
Classification variables:
Group analysis by:




DATA
DATA
ROLES
ADDITIONAL ROLES
PLOTS
NOTE
Plots are available when you specify at least one continuous variable or two classification variables if no continuous variable is selected.
SCATTER PLOT MATRIX
Scatter plot matrix
Add histograms
Add normal density curve
Add kernel density estimate
Add prediction ellipses
Probability level:
Scatter plot matrix grouped by {1}
Scatter plot matrix
PAIRWISE SCATTER PLOTS
Pairwise scatter plots
Add a prediction ellipse
Probability level:
{1} vs {2}
{1} vs {2} grouped by {3}
REGRESSION SCATTER PLOTS
Regression scatter plots
Select response variables
Add a fitted line
Add a loess fit
Add a fitted, penalized B-spline curve
MOSAIC PLOT
Mosaic plot
Square mosaic plot
Specify colors of the mosaic plot tiles:
Row variable levels
Pearson residuals
Standardized residuals
HISTOGRAM
Histogram
Add normal density curve
Add kernel density estimate
Add inset statistics
BOX PLOT
Comparative box plot
Available when classification variable is specified.
HISTOGRAM AND BOX PLOT
Histogram and box plot
Available when no classification variable is specified.
Add inset statistics

Scatter plot matrix macro
Pairwise scatter plot macro
Regression scatter plot macro
Mosaic plot macros
Histogram and box plot template
Histogram (one-way or two-way)
By group histogram (one-way or two-way)
One-way histogram
One-way histogram of each class variable
Two-way histogram
Get group variable values.
Build proc sql where clause.
Subsetting the data set.
Build plot group info.
Create by group Histogram.


Select at least one continuous variable or two classification variables if no continuous variable is selected.


Click to select at least one plot.


Select at least one response variable.



options validvarname=any;
#if($showScatterplotMatrix=="1" || $showPairwiseScatterplot=="1" || ($showRegressionScatterplot=="1" && $responseVar.size()>0) || $showMosaicPlot=="1" || $showComboPlot=="1" || $showHistogram=="1" || $showBoxPlot=="1")
ods noproctitle;
#end
#if($showScatterplotMatrix=="1" || $showPairwiseScatterplot=="1" || ($showRegressionScatterplot=="1" && $responseVar.size()>0) || $showComboPlot=="1" || $showHistogram=="1" || $showBoxPlot=="1")
ods graphics / imagemap=on;
#end
#if($showScatterplotMatrix=="1" || $showPairwiseScatterplot=="1" || $showRegressionScatterplot=="1" || $showMosaicPlot=="1" || $showComboPlot=="1" || $showHistogram=="1")
#if($continuousVariable.size()>=1 || $classVariable.size()>=1)
#if($byVariable.size()>0)
proc sort data=$dataset out=WORK.TempSorted5215;
by #foreach($item in $byVariable) $item #end;
run;
#end
#end
#end
######## Scatter plot matrix macro ########
#if($showScatterplotMatrix=="1")
/* $scatterPlotMatrixMsg */
%macro scatterPlotMatrix(xVars=, title=, groupVar=);
## Scatterplot matrix is used to explore relationships among numerical variables ##
#if($byVariable.size()>0)
proc sgscatter data=WORK.TempSorted5215;
#else
proc sgscatter data=$dataset;
#end
matrix &xVars /
%if(&groupVar ne %str()) %then %do;
group=&groupVar legend=(sortorder=ascending)
%end;
#if($diagonalHistogram=="1" || $diagonalKernel=="1" || $diagonalNormal=="1")
diagonal=(#if($diagonalHistogram=="1") histogram #end #if($diagonalKernel=="1") kernel #end #if($diagonalNormal=="1") normal #end)
#end
#if($scatterplotMatrixEllipse=="1")
ellipse=(type=predicted #if($scatterplotMatrixProbLevel && $scatterplotMatrixProbLevel!="") alpha=$MathTool.div($MathTool.sub(1000,$MathTool.mul(1000, $MathTool.toDouble($scatterplotMatrixProbLevel))),1000) #end)
#end;
title &title;
#if($byVariable.size() > 0)
by #foreach($item in $byVariable) $item #end;
#end
run;
title;
%mend scatterPlotMatrix;
#end
######## Scatter plot matrix macro ########
######## Pairwise scatter plot macro ########
#if($showPairwiseScatterplot=="1")
/* $pairwiseScatterPlotMsg */
%macro pairwiseScatterplot(xVar=, yVar=, title=, groupVar=);
#if($byVariable.size()>0)
proc sgscatter data=WORK.TempSorted5215;
#else
proc sgscatter data=$dataset;
#end
plot (&yVar)*(&xVar) /
%if(&groupVar ne %str()) %then %do;
group=&groupVar legend=(sortorder=ascending)
%end;
#if($pairwiseScatterplotEllipse=="1")
ellipse=(type=predicted #if($pairwiseScatterplotProbLevel && $pairwiseScatterplotProbLevel!="") alpha=$MathTool.div($MathTool.sub(1000,$MathTool.mul(1000, $MathTool.toDouble($pairwiseScatterplotProbLevel))),1000) #end)
#end;
title &title;
#if($byVariable.size() > 0)
by #foreach($item in $byVariable) $item #end;
#end
run;
title;
%mend pairwiseScatterplot;
#end
######## Pairwise scatter plot macro ########
######## Regression scatter plot macro ########
#if($showRegressionScatterplot=="1" && $responseVar.size()>0)
/* $regScatterPlotMsg */
%macro regressionScatterplot(xVar=, yVar=, title=, groupVar=);
#if($byVariable.size()>0)
proc sgscatter data=WORK.TempSorted5215;
#else
proc sgscatter data=$dataset;
#end
plot (&yVar)*(&xVar) /
%if(&groupVar ne %str()) %then %do;
group=&groupVar legend=(sortorder=ascending)
%end;
#if($regFitScatter=="1") reg #end #if($loessFitScatter=="1") loess=(lineattrs=(pattern=ShortDash)) #end #if($splineFitScatter=="1") pbspline=(lineattrs=(pattern=MediumDash)) #end;
title &title;
#if($byVariable.size() > 0)
by #foreach($item in $byVariable) $item #end;
#end
run;
title;
%mend regressionScatterplot;
#end
######## Regression scatter plot macro ########
######## Mosaic plot macros ########
#if($showMosaicPlot=="1" && $classVariable.size()>2)
/* $mosaicPlotMsg */
%macro byGroupMosaic(rowVar=, colVar=);
ods graphics / imagemap=off;
#if($byVariable.size()>0)
proc freq data=WORK.TempSorted5215;
#else
proc freq data=$dataset;
#end
ods select MosaicPlot;
tables (&rowVar)*(&colVar) / plots=mosaicplot(
#if($mosaicTileColor=="pearsonResidu")
colorstat=pearsonres
#elseif($mosaicTileColor=="stdResidu")
colorstat=stdres
#end
#if($sqaureMosaic=="1") square #end
);
#if($byVariable.size() > 0)
by #foreach($item in $byVariable) $item #end;
#end
run;
ods graphics / imagemap=on;
%mend byGroupMosaic;
#end
#if($showMosaicPlot=="1" && $classVariable.size()>2)
%macro mosaicPlot(numClassVars=, classVarList=);
%local i j colVar rowVar;
## Mosaic plot for class variables ##
%if(&numClassVars>=2) %then %do;
%do i=1 %to %eval(&numClassVars-1);
%do j=%eval(&i+1) %to %eval(&numClassVars);
%let colVar=%scan(%str(&classVarList), &i, %str( ));
%let rowVar=%scan(%str(&classVarList), &j, %str( ));
%byGroupMosaic(rowVar=&rowVar, colVar=&colVar);
%end;
%end;
%end;
%mend mosaicPlot;
#end
######## Mosaic plot macro ########
######## Histogram and Box plot template ########
#if($showComboPlot=="1")
## GTL to construct combination histogram/boxplot ##
/* $histBoxTemplateMsg */
proc template;
define statgraph histobox;
dynamic AVAR;
begingraph;
entrytitle "Distribution of " eval(catq('q',colname(AVAR)));
layout lattice / rows=2 columndatarange=union
rowgutter=0 rowweights=(0.75 0.25);
layout overlay / yaxisopts=(offsetmax=0.1) xaxisopts=(display=none);
layout gridded / columns=2 border=on autoalign=(topright topleft);
#if($comboPlotInset=="1")
%let _lft = halign=left;
%let _rgt = halign=right;
entry &_lft "Mean"; entry &_rgt eval(strip(put(mean(AVAR), best.)));
entry &_lft "Std Dev"; entry &_rgt eval(strip(put(stddev(AVAR), best.)));
entry &_lft "N"; entry &_rgt eval(strip(put(n(AVAR), best.)));
#end
endlayout;
histogram AVAR /;
endlayout;
layout overlay /;
BoxPlot Y=AVAR / orient=horizontal;
endlayout;
endlayout;
endgraph;
end;
run;
#end
######## Histogram and Box plot macro ########
######## Histogram macros ########
#if($showHistogram=="1")
/* $histogramMsg */
%macro DEHisto(data=, avar=, classVar=);
%local i numAVars numCVars cVar cVar1 cVar2;
%let numAVars=%Sysfunc(countw(%str(&avar), %str( ), %str(q)));
%let numCVars=%Sysfunc(countw(%str(&classVar), %str( ), %str(q)));
%if(&numAVars>0 & &numCVars>0) %then %do;
%if(&numCVars=1) %then %do; ## one class variable ##
%let cVar=%scan(%str(&classVar), 1, %str( ), %str(q));
proc sql noprint;
select count(distinct &cVar) into :nrows from &data;
quit;
/* $onewayHistMsg */
## One-way histogram ##
proc univariate data=&data noprint;
var &avar;
class &cVar;
histogram &avar / nrows=&nrows
#if($normalDensityPlot == "1") normal(noprint)#end
#if($kernelDensityPlot == "1") kernel#end
;
#if($histogramInset == "1") inset mean std n / position=ne; #end
run;
%end;
%else %do; ## two class variables ##
/* $onewayHistClassMsg */
## One-way histogram of each class variable ##
%do i=1 %to %eval(&numCVars);
%let cVar=%scan(%str(&classVar), &i, %str( ), %str(q));
proc sql noprint;
select count(distinct &cVar) into :nrows from &data;
quit;
proc univariate data=&data noprint;
var &avar;
class &cVar;
histogram &avar / nrows=&nrows
#if($normalDensityPlot == "1") normal(noprint)#end
#if($kernelDensityPlot == "1") kernel#end
;
#if($histogramInset == "1") inset mean std n / position=ne; #end
...
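Two pieces of logic in the Data Exploration template above are easy to check outside SAS. The `mosaicPlot` macro visits every unordered pair of classification variables, using the earlier variable as the column and the later one as the row. The prediction-ellipse options convert a probability level p into `alpha` as (1000 - 1000p)/1000, which equals 1 - p. The Python sketch below reproduces both; the function names and the example variable names are hypothetical.

```python
from itertools import combinations

def mosaic_pairs(class_vars):
    """(row_var, col_var) pairs in the order the nested %do loops visit them."""
    # combinations() yields (earlier, later); the macro maps earlier -> column, later -> row.
    return [(row, col) for col, row in combinations(class_vars, 2)]

def ellipse_alpha(prob_level):
    """alpha for a prediction ellipse, using the template's arithmetic."""
    return (1000 - 1000 * float(prob_level)) / 1000

# Hypothetical classification variables.
print(mosaic_pairs(["Gender", "Married", "Education"]))
print(ellipse_alpha("0.95"))
```

With n classification variables the loop produces n(n-1)/2 mosaic plots, so the output grows quickly as more variables are selected.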