i need it before 13 may
Microsoft Word - proj2.docx Project2PLpgSQL Pleasemakesureyouhavesubmittedallthequestionsbefore: Fri18May,5:00pm 1. Aims Thisprojectaimstogiveyoupracticein • readingandunderstandingamoderatelylargerelationalschema(MyMyUNSW) • implementingSQLqueriesandviewstosatisfyrequestsforinformation • implementingPLpgSQLfunctionstoaidinsatisfyingrequestsforinformation Thegoal is tobuildsomeusefuldataaccessoperationsontheMyMyUNSWdatabase.Athemeof thisprojectis"dirtydata".AsIwasbuildingthedatabase,usingacollectionofreportsfromUNSW's informationsystemsandthedatabasefortheacademicproposalsystem(MAPPS),Idiscoveredthat thereweresome inconsistencies inpartsof thedata (e.g.duplicateentries in the table forUNSW buildings,orstudentswhowerementionedinthestudentdata,buthadnoenrolmentrecords,and, worse,enrolmentrecordswithmarksandgradesforstudentswhodidnotexistinthestudentdata). I removedmostof theseproblemsas Idiscovered them,butnodoubtmissedsome.Someof the exercisesbelowaimtouncoversuchanomalies;pleaseexplorethedatabaseandletmeknowifyou findotheranomalies. 2. Howtodothisproject: • readthisspecificationcarefullyandcompletely • familiarizeyourselfwiththedatabaseschema(description, SQLschema, summary) • makeaprivatedirectoryforthisproject,andputacopyoftheproj2.sqltemplatethere • youmustusethecreatestatementsinproj2.sqlwhendefiningyoursolutions • lookattheexpectedoutputsintheexpected_qXtablesloadedaspartofthecheck.sqlfile • solveeachoftheproblemsbelow,andputyourcompletedsolutionsintoproj2.sql • checkthatyoursolutioniscorrectbyverifyingagainsttheexampleoutputsandbyusing thecheck_qX()functions • testthatyourproj2.sqlfilewillloadwithouterrorintoadatabasecontainingjusttheoriginal MyMyUNSWdata • double-checkthatyourproj2.sqlfileloadsinasinglepassintoadatabasecontainingjustthe originalMyMyUNSWdata 3. Introduction AllUniversitiesrequireasignificantinformationinfrastructureinordertomanagetheiraffairs.This typically involves a large commercial DBMS installation. UNSW's student information system sits behind the MyUNSW web site. MyUNSW provides an interface to a PeopleSoft enterprise managementsystemwithanunderlyingOracledatabase.Thisback-endsystem(Peoplesoft/Oracle) isoftencalledNSS. UNSW has spent a considerable amount ofmoney ($80M+) on theMyUNSW/NSS system, and it handlesmuchoftheeducationaladministrationplausiblywell.Mostpeoplegripeaboutthequality oftheMyUNSWinterface,butthesystemdoesallowyoutocarryoutmostbasicenrolmenttasks online. Despiteitssuccesses,however,MyUNSW/NSSstillhasanumberofdeficiencies,including: • nowaitinglistsforcourseorclassenrolment • norepresentationfordegreeprogramstructures • poorintegrationwiththeUNSWOnlineHandbook Thefirstpoint is inconvenient,since itmeansthatenrolment intoafullcourseorclassbecomesa sequence of trial-and-error attempts, hoping that somebody has dropped out just before you attempttoenrolandthatno-oneelsehasgrabbedtheavailablespot. The second point prevents MyUNSW/NSS from being used for three important operations that wouldbeextremelyhelpfultostudentsinmanagingtheirenrolment: • findingouthowfartheyhaveprogressedthroughtheirdegreeprogram,andwhatremains tobecompleted • checkingwhataretheirenrolmentoptionsfornextsemester(e.g.getalist ofsuggestedcourses) • determiningwhentheyhavecompletedalloftherequirementsoftheirdegreeprogramand areeligibletograduate NSScontainsdataaboutstudent,courses,classes,pre-requisites,quotas,etc.butdoesnotcontain any representation of UNSW's degree program structures. Without such information in the NSS database,itisnotpossibletodoanyoftheabovethree.So,in2007theCOMP9311classdeviseda datamodelthatcouldrepresentprogramrequirementsandrulesforUNSWdegrees.Thiswasbuilt on top of an existing schema that represented all of the core NSS data (students, staff, courses, classes,etc.).TheenhanceddatamodelwasnamedtheMyMyUNSWschema. The MyMyUNSW database includes information that encompasses the functionality of NSS, the UNSWOnlineHandbook, and theCATS (roomallocation) database. TheMyMyUNSWdatamodel, schemaanddatabasearedescribedinaseparatedocument. 4. SettingUp ToinstalltheMyMyUNSWdatabaseunderyourGriegserver,simplyrunthefollowingtwo commands: $createdbproj2 $psqlproj2-f/home/cs9311/web/18s1/proj/proj2/mymyunsw.dump Ifeverythingproceedscorrectly,theloadoutputshouldlooksomethinglike: SET ...awholebunchofthese SET CREATEEXTENSION COMMENT SET CREATEDOMAIN CREATEDOMAIN CREATEDOMAIN CREATEDOMAIN CREATETYPE CREATEDOMAIN ...awholebunchofthese CREATEDOMAIN CREATEFUNCTION CREATEFUNCTION SET SET CREATETABLE ...awholebunchofthese CREATETABLE ALTERTABLE ALTERTABLE ...awholebunchofthese ALTERTABLE Pleasecarefully investigateanyoccurrenceofERRORduringtheload.Thedatabaseloadingshould takelessthan60secondsonGrieg,assumingthatGriegisnotunderheavyload.(Ifyouleaveyour project until the last minute, loading the database on Grieg will be considerably slower, thus delayingyourworkevenmore.Thesolution:atleastloadthedatabaseRightNow,evenifyoudon't startusingitforawhile.)(Notethatthemymyunsw.dumpfileis50MBinsize;copyingitunderyour homedirectoryoryour/srvrdirectoryisnotagoodidea). If you have other large databases under your PostgreSQL server on Grieg or you have large files under your/srvr/YOU/directory, it is possible that you will exhaust your Grieg disk quota. In particular,youwillnotbeabletostoretwocopiesof theMyMyUNSWdatabaseunderyourGrieg server.Thesolution:removeanyexistingdatabasesbeforeloadingyourMyMyUNSWdatabase. If you are running PostgreSQL at home, the fileproj2.tar.gzcontains copies of the files:mymyunsw.dump,proj2.sqlto get you started. You can grab thecheck.sqlseparately, once it becomes available. If you copyproj2.tar.gzto your home computer, extract it, and perform commandsanalogoustotheabove,youshouldhaveacopyoftheMyMyUNSWdatabasethatyou canuseathometodothisproject. A useful thing to do initially is to get a feeling forwhat data is actually there. Thismay help you understandtheschemabetter,andwillmakethedescriptionsoftheexerciseseasiertounderstand. Lookattheschema.Asksomequeries.Doitnow. Examples... $psqlproj2 ...PostgreSQLwelcomestuff... proj2=#\d ...lookattheschema... proj2=#select*fromStudents; ...lookattheStudentstable... proj2=#selectp.unswid,p.namefromPeoplepjoinStudentsson(p.id=s.id); ...lookatthenamesandUNSWidsofallstudents... proj2=#selectp.unswid,p.name,s.phonefromPeoplepjoinStaffson(p.id=s.id); ...lookatthenames,staffids,andphone#sofallstaff... proj2=#selectcount(*)fromCourse_Enrolments; ...howmanycourseenrolmentrecords... proj2=#select*fromdbpop(); ...howmanyrecordsinalltables... proj2=#...etc.etc.etc. proj2=#\q You will find that some tables (e.g.Books,Requirements, etc.) are currently unpopulated; their contents arenotneeded for this project. Youwill also find that there are anumberof viewsand functionsdefinedinthedatabase(e.g.dbpop()andtranscript()fromabove),whichmayormaynot beusefulinthisproject. SummaryonGettingStarted Tosetupyourdatabaseforthisproject,runthefollowingcommandsintheordersupplied: $createdbproj2 $psqlproj2-f/home/cs9311/web/18s1/proj/proj2/mymyunsw.dump $psqlproj2 ...runsomecheckstomakesurethedatabaseisok $mkdirProject2Directory ...makeaworkingdirectoryforProject2 $cp/home/cs9311/web/18s1/proj/proj2/proj2.sqlProject2Directory Theonlyerrormessagesproducedbythesecommandsshouldbethosenotedabove.Ifyouomitany ofthesteps,thenthingswillnotworkasplanned. Notes Readthesebeforeyoustartontheexercises: • themarksreflecttherelativedifficulty/lengthofeachquestion • usethesuppliedproj2.sqltemplatefileforyourwork • youmaydefineasmanyadditionalfunctionsandviewsasyouneed,providedthat(a)the definitionsinproj2.sqlarepreserved,(b)youfollowtherequirementsineachquestionon whatyouareallowedtodefine • makesurethatyourquerieswouldworkonanyinstanceoftheMyMyUNSWschema; don'tcustomizethemtoworkjustonthisdatabase;wemaytestthemonadifferent databaseinstance • donotassumethatanyquerywillreturnjustasingleresult;evenifitphrasedas"most"or "biggest",theremaybetwoormoreequally"big"instancesinthedatabase • youarenotallowedtouselimitinansweringanyofthequeriesinthisproject • whenqueriesaskforpeople'snames,usethePeople.namefield;it'stherepreciselyto producedisplayablenames • whenqueriesaskforstudentID,usethePeople.unswidfield;thePeople.idfieldisan internalnumerickeyandofnointeresttoanyoneoutsidethedatabase • unlessspecificallymentionedintheexercise,theorderoftuplesintheresultdoesnot matter;itcanalwaysbeadjustedusingorder by. Infact,ourcheck.sqlwillorderyour resultsautomaticallyforcomparison. • thepreciseformattingoffieldswithinaresulttupledoesmatter;e.g.ifyouconverta numbertoastringusingto_charitmaynolongermatchanumericfieldcontainingthesame value,eventhoughthetwofieldsmaylooksimilar • developqueriesinstages;makesurethatanysub-queriesorsub-joinsthatyou'reusing actuallyworkcorrectlybeforeusingtheminthequeryforthefinalview/function Eachquestionispresentedwithabriefdescriptionofwhat'srequired.Ifyouwantthefulldetailsof theexpectedoutput,takealookattheexpected_qXtablessuppliedinthecheckingscript. 5. Tasks Note that the mymyunsw.dump used in project 2 is different from that used in project 1, please confirm that you load the correct database when you start your work. Q1 (8 marks) You may use any combination of views, SQL functions and PLpgSQL functions in this question. However, you must define at least a PLpgSQL function called Q1. Please write a PLpgSQL function Q1(course_id integer) that takes a course_id as parameter and outputs two numbers: (1) among all the rooms in UNSW, the number of rooms which can carry all the students enrolled in this course (i.e., rooms.capacity >= the total number of students enrolled in this course); (2) among all the rooms in UNSW, the number of rooms which can carry all the students enrolled in this course and also carry all the students in the enrolment waitlist of this course (i.e., rooms.capacity >= the total number of students enrolled in this course + the total number of students in the enrolment waitlist of this course). You should use the following type definition and function header: create type RoomRecord as (valid_room_number integer, bigger_room_number integer); create or replace function Q1(course_id integer) returns setof RoomRecord... Note: (1). If a given course_id is not a valid UNSW course id, please throw an exception ‘INVALID COURSEID’. (2). All the students enrolled (in a given course) should be counted even though his/her mark is null. Sample results (details can be found in check.sql): proj2=#select * from q1(52491); Q2 (8 marks) You may use any combination of views, SQL functions and PLpgSQL functions in this question. However, you must define at least a PLpgSQL function called Q2. Please write a PLpgSQL function Q2(staff_id integer) that takes a staff’s id as parameter and returns all teaching records of the given staff. Each transcript tuple should contain the following information: cid, term, code, name, uoc, avg_mark, highest_mark, median_mark, totalEnrols. You should use the following type definition for the transcript tuples: create type TeachingRecord as ( cid integer, -- course ID term char(4), -- semester code (e.g. 98s1) code char(8), -- UNSW-style course code (e.g. COMP1021) name text, -- corresponding subject name of the course uoc integer, -- units of credit of this course average_mark integer, -- the average of marks of this course highest_mark integer, -- the highest mark of this course median_mark integer, -- the median of marks of this course totalEnrols integer, -- the total number of students enrolled in this course with non-null mark ); Note: (1). If a given staff_id is not a valid UNSW staff id, please throw an exception ‘INVALID STAFFID’. (2). The average_mark and median_mark should be rounded to the nearest integer. (3). A student whose mark is null should not be counted when you calculate the average_mark, highest_mark, median_mark and totalEnrols. (4). Ignore the teaching record where totalEnrols = 0. (5). Return null value if the uoc is null. You should use the following and function header: create or replace function Q2(staff_id integer) returns setof TeachingRecord... Sample results (details can be found in check.sql): proj2=#select * from q2(50413833); Q3 (9 marks) You may use any combination of views, SQL functions and PLpgSQL functions in this question. However, you must define at least a PLpgSQL function called Q3. Given the id of an organizational unit, please write a PLpgSQL function Q3 (org_id integer, num_courses integer, min_score integer) to help the UNSW administrative officers to find out all the students satisfying the following: (1) he/she has taken more than num_courses courses offered by the given organization (note that a student may fail a course and take it again, thus we need to count duplicate courses); (2) he/she has got score no less than min_score for at least one course offered by the given organization. Each tuple should include: • His/Her unswid (should be taken from People.unswid) • His/Her student_name (should be taken from People.name) • His/Her course_records course_records of a student is a concatenation of several records with ‘,’. Each record is about a course he/she has taken and is offered by the given organization. Each record should include the code of the course (Subjects.code), the name of the course (Subjects.name), the semester that he/she has learned this course (Semesters.name), the name of the organization that offers the course (OrgUnits.name), and the score he/she got (Course_enrolments.mark). Note: (1). If a given OrgUnits.id is not a valid UNSW organization id, please throw an exception ‘INVALID ORGID’. (2). A given organization may have lots of sub-organizations. For example, the faculty of engineering has 9 schools, such as biomedical engineering and CSE, and each school may have several divisions or departments. You need to include all the sub-organizations recursively. (3). Course records of a student should be sorted in descending order of score (null score value should always be sorted to the end and displayed as ‘null’). If two or more course records have the same score, they should be sorted in ascending order of Courses.id. Records should be concatenated by ‘\n’, so that each record will be displayed in a separate line. (4). Please note that, for course_records, there is a space after each ‘,’. All text fields are verified with exact text matching. (5). Only the first five course records sorted in above order should be displayed. If the course records of a student are less than five, all the course records should be displayed. You should use the following type definition and function header: create type CourseRecord as (unswid integer, student_name text, course_records text); create or replace function Q3(org_id integer, num_courses integer, min_score integer) returns setof CourseRecord... Sample results (details can be found in check.sql): proj2=# select * from q3(52, 35, 100); Note that PostgreSQL uses a ‘+’ character to indicate an end-of-line in its