TheVLDBJournalcSpringer-Verlag1996TheGMAP:aversatiletoolforphysicaldataindependence
OdysseasG.Tsatalos1,MarvinH.Solomon2,YannisE.Ioannidis2
12
IBMAlmadenResearchCenter,650HarryRd,SanJose,CA95120-6099,USA;e-mail:odysseas@almaden.ibm.com
ComputerSciencesDepartment,UniversityofWisconsin-Madison,1210W.DaytonStreet,Madison,WI53706-1685,USA;e-mail:{solomon,yannis}@cs.wisc.edu
EditedbyMatthiasJarke,JorgeBocca,CarloZaniolo.ReceivedSeptember15,1994/AcceptedSeptember1,1995
Abstract.Physicaldataindependenceistoutedasacentralfeatureofmoderndatabasesystems.Itallowsuserstoframequeriesintermsofthelogicalstructureofthedata,lettingaqueryprocessorautomaticallytranslatethemintooptimalplansthataccessphysicalstoragestructures.Bothrelationalandobject-orientedsystems,however,forceuserstoframetheirqueriesintermsofalogicalschemathatisdirectlytiedtophysicalstructures.Wepresentanapproachthatelim-inatesthisdependence.Allstoragestructuresaredefinedinadeclarativelanguagebasedonrelationalalgebraasfunctionsofalogicalschema.Wepresentanalgorithm,integratedwithaconventionalqueryoptimizer,thattranslatesqueriesoverthislogicalschemaintoplansthataccessthestoragestruc-tures.Wealsoshowhowtocompileupdaterequestsintoplansthatupdateallrelevantstoragestructuresconsistentlyandoptimally.Finally,wereportonexperimentswithapro-totypeimplementationofourapproachthatdemonstratehowitallowsstoragestructurestobetunedtotheexpectedorob-servedworkloadtoachievesignificantlybetterperformancethanispossiblewithconventionaltechniques.
Keywords:Indexing–Physicaldatabasedesign–Materi-alizedviews–Physicaldataindependence
1Introduction
Physicaldataindependenceisusuallydescribedastheabil-itytowritequerieswithoutbeingconcernedwithhowthedataareactuallystructuredondisk.Incurrentdatabasesys-tems(DBMSs),however,queriesaretiedtologicalcon-structs,suchasrelations,classextents,orobjectsets,thatcloselytrackthephysicalorganizationofdata.Inarelationaldatabase,forexample,eachrelationisusuallystoredasafile,perhapswithaprimaryindex.Thedatabaseadministra-torcanimproveperformancebyaddingsecondaryindicesorbyspecifyingtheclusteringoffiles,butmoreextensive
Correspondenceto:OdysseasG.Tsatalos
improvementsrequiremodifyingthelogicalschema,forex-amplebyde-normalizingtables.Suchmodificationsnecessi-taterewritingqueriesand,thus,physicaldataindependenceislost.
Ourgoalistoimprovephysicaldataindependencebydecouplingphysicaldecisionssuchasclusteringandrepli-cationfromthelogicaldatamodel,sothatthephysicalorga-nizationcanbealteredwithoutchangingthelogicalschemaorquerieswrittenagainstit.Amoresubtlebenefitisthatitplacesawiderrangeofpossibilitiesfordataorganizationatthedisposalofthedatabaseadministrator.Forexample,thefactthatthedataaredescribedinatraditionalnormalizedrelationalschemashouldnotprecludeareplicated,nestedphysicalorganization,ifthatorganizationwouldachievebetterperformancefortheanticipatedmixofqueriesandupdates.
Assumethatdataarestoredinfilesofrecords,possiblyimplementedbyanindexstructuresuchasaB+-tree.Insteadofrequiringaone-to-onecorrespondencebetweenlogicaldataconstructsandphysicalstoragestructures(e.g.,relation↔file),weallowthecontentsofeachfiletobedefinedasafunctionofthelogicalschema,specifiedbyarestrictedrelational-algebraexpression.Wecallthecombinationofafileanditsdefinitionagmap(pronounced“gee-map”andanacronymforgeneralizedmultilevelaccesspath).Inthesim-plestcases,gmapscorrespondtotraditionalstoragestruc-turessuchasanunorderedfileofthetuplesinarelationoranindexonthatfile.gmaps,however,canalsobeusedtopartitionthedatabaseverticallyandhorizontallyandaddmultipleaccesspaths,generalizingpathindices.Sincegmapsareallowedtocontainoverlappingdata,theycanalsocap-tureredundantstoragestructures.Gmapsareinvisibleatthelogicallayer,sotheirdefinitionsaffectonlytheperformanceofqueriesandnottheirsemantics.
Inthispaper,werestrictbothgmapdefinitionsandqueriestoproject-select-join(psj)expressionsoverasim-plesemanticdatamodel.Wedemonstratethatsuchexpres-sionsarepowerfulenoughtoexpressmostconventionalstor-agestructures,aswellasmore“exotic”techniquessuchaspathindices(BertinoandKim19;MaierandStein1986),fieldreplication(KatoandMasuda1992;ShekitaandCarey19),andmore.Wepresentanalgorithmtotranslateuser
102
Dnamenameworks-inFadvisesenrolledareanameteachesattendsSyearnameIsAlevelCattendedlevelassistsTAFig.1.Thelogicalschema
queries,expressedaspsj-queriesoverstructuresinthelogi-calschema,intorelationalexpressionsoverthegmaps.Wealsoshowhowthistranslationcanbeintegratedintoacon-ventionalqueryoptimizer.
Oneofthebenefitsofourapproachisthatgmapsmaystoreredundantdatatoimprovetheperformanceofqueries.Thus,updatesmayneedtochangemultiplegmapsinacon-sistentmanner.Weshowhowasimplemodificationofthequerytranslationalgorithmcanproduceplanstoperformtheseupdates.Wealsodemonstratehowthisflexibilitycanbeusedinseveralotherareas,e.g.,accelerationofbulkload-ingofthedatabaseandaccelerationofupdatesofcomplexobjects.
Allofthealgorithmspresentedinthispaperhavebeenimplementedinaprototypesystem.Wereportonexperi-mentswithatestdatabasethatillustratesthatforaplausiblemixofqueriesandupdates,ourtechniquesallowthephysi-calrepresentationtobetunedtoprovidebetterperformancethanwhatcouldbeachievedthroughstandardrelationalorobject-orientedmethods.Weconcludethispaperbyiden-tifyingtheareasthatneedfurtherdevelopmentbeforeourtechniquecanbewidelyusedinapracticalsetting.2Thegmapdefinitionlanguage
Inthissection,weintroduceourdatamodelandthecorre-spondingdatadefinitionlanguage(DLL).TheDDLhastwoparts,thelogicalDDL,whichdefinesthelogicalschemacapturingtheconceptualorganizationofthedata,andthephysicalDDL,whichdefinesthestoragestructurescontain-ingthedatathatinstantiatethelogicalschema.Wepresentthemodelintwonotations,asemanticone(resemblingtheERmodel)andaformalrelationalone.Thetwonotationsareequivalent;thesemanticnotationismoreintuitiveasauserinterface,butallofouralgorithmsmanipulatetherelationalformsofschemas.
2.1Thelogicaldatadefinitionlanguage
Inthesemanticnotation,schemasaredepictedasgraphs.Throughoutthispaperweillustrateourapproachwithanexampledatabasedescribingauniversityanditspersonnel(seeFig.1).ThetextualformoftheschemaisgiveninAp-pendixAusingODL(Catel1993).
NodesinFig.1representdomains,andsolidedgesrepre-sentrelationshipsbetweenthem.Leavesrepresentprimitive
domainssuchasintegers,characterstrings,orrealnum-bers.Internalnodesrepresentdomainspopulatedwithiden-titysurrogates(tupleorobjectidentifiers).Inourexampleschema,thesedomainsareDept(department),Faculty,Student,Course,andTA(teachingassistant).Toreduceclutterinthefigures,thesedomainnamesareabbreviatedtotheirinitials.Functionaldependenciesareindicatedbyarrowheads.Inclusiondependencies(formallydefinedinSect.3)canalsobeexpressedbutarenotshownforsimplicity.IsAassociationsaredenotedbydashedarcspointingtothesu-pertype.Forourpurposes,theyaresimplyrelationshipswithcertainfunctionalandinclusiondependenciesimpliedbyde-fault.AnameoftheformD.disusedtodenotebothaprim-itivedomainanditsrelationshiptoaninternaldomain.Forexample,Course.levelnamesbothaprimitivedomainofintegersanditsrelationshiptotheCoursedomain.
Intherelationalformofthedatamodel,eachedgeofaschemagraphfromdomainAtodomainBisrepresentedasabinaryrelationwithattributesAandB.Becauseofthiscorrespondence,weoftenusetheterm“attribute”asasynonymfor“domain”(nodeinthegraph)andtheterm“baserelation”asasynonymfor“relationship”(edge).Ouralgorithmsoperateonthe(binary)relationalformoftheschema,sotheyapplytoanysemanticmodelthatcanberepresentedbybinaryrelationswithfunctionalandinclusiondependencies.
2.2Thephysicaldatadefinitionlanguage
Inoursystem,allphysicalstoragestructuresaredefinedasgmaps.Agmapconsistsofasetofrecords(thegmapdata),aquerythatindicatesthesemanticrelationshipsamongtheattributesoftheserecords(thegmapquery),andadescrip-tionofthedatastructureusedtostoretherecords(thegmapstructure).Althoughtheactualdatabasestoresgmapdataratherthanthebaserelations,thegmapdatamaybethoughtofastheresultofrunningthegmapqueryonthebaserela-tions.
GmapqueriesareexpressedinasimpleSQL-likelan-guage.Forexample,thegmap
def_gmapcs_faculty_by_areaasbtreebygivenFaculty.areaselectFaculty
whereFacultyworks_inDeptand
Dept=cs_oidstoresasetofpairscontainingFacultyidentifiersandthecorrespondingareanames.Onlyfacultymembersinthecomputersciencedepartment(identifiedbytheconstantcsoid)areincluded.ThegmapstructureisaB+-treein-dexedbyFaculty.area.Theentirebyclausedefinesthegmapquery.Attributesfollowingthegivenandselectkeywordsarecalledinputandoutputattributes,respectively,andthepredicateDept=csoidiscalledaselection.In-putattributesformthesearchkeyforgmapstructuresthatallowassociativeaccess.
Thegmapquerycanalsobeexpressedgraphicallyasasubgraphoftheschemagraphcalledthequerygraph(seeFigs.2–9).ShadededgescorrespondtorelationshipsorIsAassociationsexplicitlymentionedinthewhereclauseor
implicitlymentionedaspartofprimitiveattributenames.Inputattributesareindicatedbysmallarrows,andoutputattributesareindicatedbydoublecirclesaroundnodes.Re-strictionsaredescribedasannotationsonthecorrespondingnodes.
Eachqueryexpressibleinthislanguageisequivalenttoarestrictedpsjqueryontherelationalformofthelogicalschema:
Q=πAσS(R1onR2on···onRn)
Intheaboveexample,Q=πF,F.areaσD=cs
oid(F.area
onworksin).
Expressiblequeriesobeythefollowingrestrictions:1.theyarerange-restricted,i.e.,allattributesinSandA
areattributesoftherelationsR2.eachattributeoftherelationsRi
intheprojectionlistA
iappearsatmostonce3.selectionsareconjunctionsofcomparisons(=,>,≥,<,≤)betweenattributesandconstants
4.joinsarenatural,i.e.,onlyattributeswiththesamenamearejoinedandallattributesmaintaintheirnameintheresult.Inparticular,self-joinsarenotallowed.Intheremainderofthepaper,weusethetermpsj-querytorefertoaquerythatconformstotheserestrictions.2.3Thequerylanguage
Weoftenusetheterm“logicalquery”torefertoqueriesposedonthelogicalschema.Inthispaper,weconsideronlylogicalquerieswritteninthesamelanguagethatisusedforgmapqueries.Thatis,theymustberestrictedpsj-queries.Inaddition,eachquerymustbetranslatableintoapsj-queryovergmapsorprojectionsofthem.Thus,wedonothan-dlecaseswherelogicalqueriesneedtobetranslatedintounionsorarbitrarysequencesofpsj-queries.Notethattrans-latedlogicalqueriesinvolverelationswitharbitraryarity(thegmapdata),whilegmapqueriesinvolvebinaryrelationsonly(correspondingtorelationships).2.4Examples
gmapscanbeusedtodefinearbitraryphysicalrepresenta-tions,includingthoseofaconventionalnormalizedrelationaldatabase,anobject-orienteddatabase,oranycombinationofthetwo.
Toillustratetheobject-orientedapproach,supposewewanttoclustertogetherallinformationabouteachfacultymember.GiventheobjectidentifierofaFacultyobject,weshouldbeabletoretrievepersonalinformationaswellastheobjectidentifiersofthefacultymember’sdepartment,advisees,andcoursestaught.
Agmapthatmeetsthesespecificationsmaybedefinedasfollows(Fig.2):
def_gmapfaculty_dataasheapby
givenFacultyselectStudent,Dept,Course,Faculty.area,Faculty.name
103
Dnameworks-inFareaadvisesSteachesCTAFig.2.Thefacultyclassextent
DFareaSCTAFig.3.Afacultyindexon”area”
whereFacultyworks_inDeptandFacultyadvisesStudentandFacultyteachesCourseAsecondaryindexinarelationalsystemcanalsobedefinedeasilyinourlanguage.Forexample,anindexonthefacultyareaisdefinedasfollows(Fig.3):
def_gmapfaculty_index_on_areaasbtreebygivenFaculty.areaselectFacultyNotethattheindexisnotdefinedintermsofthepreviousgmap,aswouldbethecaseinarelationaldatabase,butintermsofthelogicalschema.
Inthefacultydataexample,itmightbedesirabletoincludeinafacultymember’srecordthedepartmentnameinadditiontothedepartmentid,becauseforexample,thede-partmentnameisfrequentlyprintedalongwiththenameofthefacultymember.Thedepartmentnameis,inthiscase,anestedattributeoftheFacultydomain.Thisessen-tiallyimplementsfieldreplication(KatoandMasuda1992;ShekitaandCarey19),whichhasbeenshowntoofferseveraladvantages.Theonlychangenecessaryistoadd“Dept.name”totheselectclause.Onlyaslightmodi-ficationoftheearliergmapdefinitionisrequired(Fig.4):def_gmapfaculty_field_replasheapbygivenFaculty
selectFaculty.name,Faculty.area,Dept,Student,Course,Dept.namewhereFacultyworks_inDeptandFacultyadvisesStudentandFacultyteachesCourseSimilarly,supposeapplicationsfrequentlyaskforthelistingofthefacultyofaspecificdepartment.Inthiscaseweneedafastaccesspathfromthedepartmentnameto
104Dnamenameworks-inFareaadvisesSteachesCTA Fig.4.ReplicationofanestedattributeDworks-innameFSCTA Fig.5.Anestedindexthefacultydomain.Astructureforacceleratingsuchpathexpressionsiscalledanestedindex(MaierandStein1986;BertinoandKim19),whichallowsindexingonanestedattributeofadomain.Suchanindexiseasilyspecifiedasagmap(Fig.5):
def_gmapfaculty_nested_indexasbtreebygivenDept.nameselectFacultywhereFacultyworks_inDept.Inthepreviousexamples,thegmapdataincludedallFacultyinstances.However,therearecaseswherewefrequentlyaccessonlysomeinstancesofadomain.Object-orientedsystemsthatstoreinstancesinexplicitcollectionsratherthanclassextents(Careyetal.1988;MaierandStein1986;Orensteinetal.1992)allowthecreationofcollectionindices,whichprovidefastaccesspathsonlytothesubsetsofthedomainsthatareincludedinthecollection.Ourgmapdefinitionlanguageispowerfulenoughtoexpresssuchin-dicesbyusingrestrictions.Forexample,ifwewouldliketomodifythepreviouslydefinedindexonfacultyareasothatitkeepsonlydataforfacultyinthecomputersciencedepartment,thedefinitionwouldbeasfollows(Fig.6):def_gmapfaculty_index_on_areaasbtreebygivenFaculty.areaselectFacultywhereFacultyworks_inDeptand
Dept=cs_oidwherecsoidistheobjectidentifierofthecomputersci-encedepartment.
Fieldreplicationandpathindexingtechniquestypicallyimposerestrictionsrelatedtothelogicalschema.Forexam-ple,infieldreplication,thenestedattributeisrequiredtobeanestedpartoftheobjectinwhichitisreplicated.Fur-thermore,theobjectmustfunctionallydeterminethenested
D = csFworks-inareaSCTAFig.6.Acollectionbasedindex
Dnameworks-inFnameareaadvisesSteachesCTA Fig.7.Replicationofanon-functionalnestedattribute
attribute.Similarrestrictionsholdforpathindexing.Ourno-tionofqueriesenablesonetodescribeanyschemasubgraph,independentlyoftheedgeproperties.Theextrafreedomal-lowsspecificationofseveralusefulnovelstructures:–Astoragestructurethatreplicatesnestedattributesthatarenon-functionallydetermined.Forexample,wecanextendthefacultydatastructure(Fig.2)byrepli-catingthenamesoftheadvisedstudents(Fig.7):def_gmapfaculty_field_with_snamesasheapbygivenFaculty
selectFaculty.name,Faculty.area,Dept,Student,Course,Student.namewhereFacultyworks_inDeptand
FacultyadvisesStudentandFacultyteachesCourse–Acrossbetweenapathindexandanindexonacom-positekey.Itallowseachcomponentofthekeytobesuppliedbyaseparatepath.Forexample,thefollowinggmapbuildsanindexthatmapsarea/course-levelpairstofacultyinthatareathatteachessuchcourses(Fig.8):def_gmapfaculty_multi_fieldasbtreebygivenFaculty.area,Course.levelselectFaculty
whereFacultyteachesCourse.–Anarbitrarydecompositionofdataintheinheritancehierarchy.Forexample,thefollowinggmapclustersateachingassistant’snametogetherwithotherteachingassistantattributesthatdonotpertaintoarbitrarystu-dents(Fig.9):
def_gmapta_with_snameasheapbygivenTA
DFareaSteachesClevelTA Fig.8.AcompositekeypathindexDFSnameIsAClevelassistsTA Fig.9.AverticallydecomposedsubclassextentselectTA.level,Course,Student,Student.namewhereTAisaStudentand
TAassistsCourse
Acompletetaxonomyofexistingindexingschemesandotheradvancedstoragestructuresthatcanbedefinedbyus-inggmapsispresentedelsewhere(TsatalosandIoannidis1994).
3Querytranslation
Inthissectionwepresentthetranslationalgorithmofalog-icalqueryintoqueriesovergmaps.Wefirstintroducesomeadditionalnotationanddefinitions,andalsodiscusstwoaux-iliaryproblemsthatariseaspartofquerytranslation.3.1Notation
Forconvenience,πQσn···onweRrepresentalogicalpsj-queryQ=
k)asatripleQQisQ(Rthe1seto
r,Qs,Qpofsets.isrofjoinedbinaryrelations{Rtheselectionpredicateσa1,...,Rsetofsimplerk},Qspredicates(comparisonsbetweenQrepresentedasattributesandconstants),andQpisthesetofattributesretainedbytheprojectionπWecallthesetQ,anditsmemberstar-Q.getattributes.ConsiderpthequerytargetforexamplethefollowingqueryQ(Fig.6):
givenFaculty.areaselectFacultywhereFacultyworks_inDeptand
Dept=cs_oidQisapsj-query:ifwecallRabandRbcthebaserela-tionscorrespondingtotheFaculty.areaandworksin
105
relationships,respectively,anda,b,ctheattributescorre-spondingtothedomainsFaculty.area,FacultyandDept,respectively,thenwecanrewritethequeryasQ=
πresentedabσc=csasoid(Rtheabtriple:no
Rbc).Usingournotation,Qcanberep-{Rab,Rbc},{c=csoid},{a,b}.(RecallthatFaculty.areaisoverloadedtoreferbothtoanattributeandtoarelationship.)
GivenaqueryQ,wefrequentlydealwithitspartthatincludesonlyrelationsinasetR,anditspartthatincludesonlyrelationsnotinR.ThesearedenotedQ[rel∈R]andQ[rel∈R]respectively.Forexample,Q[rel∈{Rb},{c=csoid},{Rσbc}]={subsetofQbc}=πbc=csoidRbc.Similarly,thesthatmentionsonlyattributesinasetAisdenotedQs[attr∈A].Finally,thesetofattributesinasetofrelationsRisdenotedA(R).3.2Definitions
Thenaturaljoinoftwopsj-queriesPandQ,denotedPonQ,isthenaturaljoinoftheirresultrelations.Theadd-joinoftwopsj-queriesPandQ,denotedP⊕Q,isthepsj-queryP⊕Q=Pr∪Qr,Ps∪Qs,Pp∪Qp.Theadd-joindiffersfromthenaturaljoininthatitnaturallyjoinsthetwoqueryresultsbeforeanyprojectionhastakenplace.Theprojectionstepisappliedafterthejoinandretainsallcolumnsthatwouldhavebeenretainedbyeitherquery.Thus,thesamesetofcolumnsasinthenaturaljoinareproduced.
Notethat,ingeneral,ifweonlyhavethequeryresultsinhand,theadd-joincannotbecalculatedsinceitmayrequireknowledgeofsomeofthemissingcolumns.Considerfor
examplethequeriesQThenaturaljoinofQ1=RabandQ2=πac(Rabno
Rbc).whiletheiradd-joinis1andQsimplyR2isexpressionswillproducedifferentabno
RabresultsRonπac(RabonRbc),
bc.Ingeneralthetwoandtheadd-joinmaynotevenbeproduciblefromtheresultsofQ(sincethebattributeismissingfromtheresult1andQaloneofQ2LetR,Rrelationswithacommonattribute2).b.AninclusionabbcbetwodependencyfromRRabtoRbcexists,denotedRab.b⊆bc.b,ifeveryvalueofbinRabappearsalsoinRbc.
3.3Queryequivalence
Whentranslatingalogicalqueryintoaqueryovergmaps,weoftenneedtotesttheequivalenceofpsj-queries.Twopsj-queriesQ1andQ2areequivalent,denotedQ1≡Qproducethesameresultforanyvalidinstanceof2,iftheythedatabaseschema.Equivalencetestingofarbitraryconjunc-tivequeries,evenwithouttakingintoaccountanydependen-cies,isNP-complete(Ahoetal.1979;ChandraandMerlin1977).Ontheotherhand,wecanefficientlycomparetwopsj-queriessyntacticallytoseeiftheyareidentical(uptotrivialdifferencessuchastheorderingofthejointerms).Thisisasufficientconditionforequivalence,whichweuseinourtranslationalgorithm.Wearealsointerestedintwospecialcasesofequivalencetesting,wherepsj-queriesofspecificformsareinvolvedandvarioustypesofdependen-ciesaretakenintoaccount.Thesearediscussedinthenexttwosubsections,wheresufficientconditionsforequivalenceareprovided.
106
3.3.1Coverage
AqueryQcoversasetofrelationsRifQ[rel∈R]≡πA(R)(Q)
(1)
i.e.,whenthesubqueryinvolvingonlytherelationsinRisnotaffectedbyRab≡πab(RIngeneral,theableft-handno
theRrestofthequery.sideof(1)nisRForexample,if
bc),thenRaboabccovers{Rsupersetofab}.theright-handside.When(1)holds,thepartofthequerythatinvolvesrelationsnotinR(relationRbcinourexample)hasnoeffectonπA(R)(Q),inthesensethatitdoesnotfilteroutanytuplesproducedbytherestofthequery.Anal-gorithmthatimplementsnecessaryandsufficientconditionsfortestingcoverageispresentedelsewhere(Tsatalos1994).Thealgorithmmakesuseoftheinclusiondependenciesoftheschema.
3.3.2Naturaljoinversusadd-join
Ingeneral,ifQ.However,1andQinthe2aretwopsj-queries,Q1⊕Q2⊆Qstraints,1onQ2Q1⊕QqueriesQ=R2≡Q1no
presenceQofcertainintegritycon-2.Forexample,considerthe1abandQ2=πac(RabonRbc).Ingeneral,QQ1onQ2=Rab1⊕Q2=RnonRπac(RabonRbc)isnotequivalentto
bc.If,however,bisfunctionallyde-terminedbyaabino
Rab(thatis,aisakeyforRab)thetwojoinsareequal.Intuitively,theinformation“lost”bypro-jectingawaythebattributeinπac(RabonRbc)onQcanbecompletelyrecoveredfromtheremainingattributes(ainthiscase).
Detectingwhenthenaturaljoinoftwopsj-queriesisequivalenttoapsj-queryisveryimportantinourquerytrans-lationalgorithm,sinceitallowsustorewritethejoinoftwogmaps(whicharepsj-queries)asapsj-query.Thealgorithmiterativelyperformssuchrewritingsinordertoexpressthejoinofseveralgmapsasapsj-query,whichisthencheckedsyntacticallyforequivalencewiththeuserquery.Asasuf-ficientconditionforguaranteeingthatthenaturaljoinoftwoqueriesisapsj-query,wetestifitisequivalenttotheadd-joinofthequeriesinquestion.Analgorithmthatim-plementsasufficientconditionfortestingthisequivalenceispresentedelsewhere(Tsatalos1994).3.4Querytranslationalgorithm
Belowwepresentanalgorithmtotranslatealogicalpsj-queryintoqueriesovergmaps.ForthesakeofclaritywedefermostefficiencyconsiderationstoSect.4.
Algorithm1.Givenapsj-queryQandasetofpsj-queriesG,findsubsets{G1,...Q≡πQpσQs(πA(Qr)G1no
,G·n·}·on⊆πG,suchthat
A(Qr)Gn)1.letH={G∈GsuchthatGGp∩A(Qr∩Gr)=/∅and
2.Qs[attr∈A(Gr)]=s[attr∈A(Qr)]∪Qs[attr∈G3.p]and
GcoversQr}
4.foreachsubset{G1,...,Gn}ofHdo
5.letS={πA(Qr)σQsG1,...,πA(Qr)σQsGn}6.whilethereisG,H∈SsuchthatGonH≡G⊕H7.replaceGandHinSby8.
ifS={Q′
GonH
}whereQ≡πQp(Q′)thenaccept{G}asasolution
1,...,GnThealgorithmfirstnarrowsdownitssearchspacetogmapsthathavesomethingtodowiththequery(lines1–3).Morespecifically,agmapmusthaveatleastonerelationincommonwiththequery,withatleastoneattributeofthatrelationincludedinthegmapresult(line1);thequeryselectionsonattributesofthegmaprelationsmustbeeitheronthetargetattributesofthegmap(sothattheycanbeappliedonthem)oridenticaltoselectionsthatthegmapitselfhas(line2);thegmapmustcovertherelationsthatithasincommonwiththequery(otherwise,thegmapwillnothavealltheinformationneededbythequery;line3).
Eachpossiblesubsetoftherelevantgmaps(line4)givesrisetoasinglecandidatetranslation(assumingthatselec-tionsarealwayspushedthroughthejoins):πQp(πA(Qr)σQsG1no
···onπA(Qr)σQsGn)(2)
Therestofthealgorithmtestswhetherornotthisquery
expressionisequivalenttothegivenlogicalquery.Eachjoinoperandin(2)isaprojectionandaselectiononagmap.Sinceweverifiedearlier(line2)thatthequeryselectionscanbepushedthroughthegmapprojections,thejoinoperandsarepsj-queries.Thealgorithmtriestoexpresstheirjoinasapsj-queryaswell.Thejoinoperandsarescanned(line5)andanypairwhosenaturaljoinisequivalenttoitsadd-join(line6)isreplacedbyasinglejoinoperandwhichisagainapsj-query(line7).Thesetofjoinoperandsthuskeepsreducing.Atsomepoint,thesetcannolongerbereduced,eitherbecausethereisjustonepsj-queryleftorbecausethereisnopairthatsatisfiestheequivalencetest(line6).Intheformercase,theremainingpsj-queryisequivalenttotheinitialjoinexpression(2)afterperformingonefinalprojectionstep(πQp)andcanbesyntacticallycheckedforequivalence(line8)withthelogicalquery.Inthelattercase,thesubsetchoseninline4isrejected.
Proposition1.Givenapsj-queryQandasetofpsj-queriesG,eachsubset{G1,...,Gn}⊆GgeneratedbyAlgorithm1,satisfiesthecondition:
Q≡πQpσQs(πA(Qr)G1no
···onπA(Qr)Gn)BecausethereareexponentiallymanysubsetsofH,Al-gorithm1requiresexponentialtime.However,checkingif
agivensubsetofgmapscanformasolution(lines5–8)takespolynomialtime.Inthenextsection,weshowhowwecanrunthealgorithminconjunctionwithaconventionaloptimizertoavoidenumeratingallsubsets.
Anexamplemayhelpillustratethealgorithm.ConsideraqueryQthatasksforthenamesanddepartmentid’sofstudentsattendingall500-levelcourses:
def_queryQbyselectStudent.name,DeptwhereStudentattendsCourse
andStudentenrolledDeptandCourse.level=500
Inourformalrepresentation,
Qr={attends,enrolled,Course.level},Qs={Course.level=500}and
Qp={Student.name,Dept}.
Supposethedatabaseconsistsofthreegmaps:anindexGG1fromthenamesofstudentstotheirdepartments,anindex2fromthenamesofstudentstocoursesthattheyattendandthelevelsofthosecourses,andanindexGleveltocoursesatthatlevel,togetherwith3fromacourse-thedepartmentsthatsupplystudentstothosecourses.Theirdefinitionsfollow:def_gmapG1asbtreeby
givenStudent.nameselectDeptwhereStudentenrolledDeptdef_gmapG2asbtreeby
givenStudent.nameselectCourse,Course.level
whereStudentattendsCoursedef_gmapG3asbtreeby
givenCourse.levelselectDept,Course
whereStudentattendsCourseandStudentenrolledDept
Allthreegmapsarerelevanttothequery(theypassthetestsoflines1–3).Consider,forexample,G=Gasatriple,G={Student.name,Course,Course.2.Writtenlevel},∅,{Student.name,attends}.Line1issat-isfiedbytheattributeCoursewhichappearsbothinGandintherelationshipattendssharedbyGQpline2,notethatGattr∈Grandr.ForCourse.levels=∅,andQ=500s[}.Tocheckp]=Qthats[attr∈A(Gr)]={Gcov-ersQr(line3),notethatG[Qr]isGmodifiedbythere-movalofStudent.name(bothasanattributeinGarelationshipinGpandaname,G[Qr).IfweassumethateveryStudenthasr]isequivalenttoπA(Qr)(G),whichistheresultofprojectingouttheStudent.nameattributefromG.
ThealgorithmconsiderssubsetsoftherelevantgmapsG1,G2,G3.Consider,forexample,thesubset{Gcandidatesolutioncorrespondingtothiscombination1,G2}.TheisthenaturaljoinofG1andG=500followed2followedbyaselec-tionforCourse.levelbyaprojection.Theloopoflines6and7willbeexecutedoncewhetherGStudent.name1no
Gtocheck
2≡Gfunctionally1⊕G2.ThistestwillfailunlessdeterminesStudent;oth-erwise,twotuplesthatjoinontheStudent.nameneednotjoinontheirStudentidaswell.IfStudent.namefunctionallydeterminesStudent,thenthejoinontheStudentidisirrelevant:wecanprojectoutthatattributebeforeperformingthejoin,whichimpliesthattheadd-joinisequivalenttothenaturaljoin(line6).Assumingthatthisisthecase,line8ofAlgorithm1eventuallyconcludesthatthecandidatesolutionisacorrectone.
Followingthesameprocess,thealgorithmrejectsthesubsets{G1,G3}and{G2,Gadd-joinandnaturaljoinaredifferent.3},becausethecorrespondingHowever,thecombi-nation{G1,G2,G3}isacorrectsolution.Duringthecourse
107
oftheloopoflines6and7,thealgorithmtestsallpairsofgmapsinthissubsettocheckwhetherornottheiradd-joinisequivalenttotheirnaturaljoin.Aswesaw,allthepairsfail,exceptforG(G1⊕G2.Inthenextiteration,thepair(G1⊕G2,G3)isconsideredandthealgorithmverifiesthat
1⊕G2)no
GInterestingly,3≡(Gthesolution1⊕G2)⊕Gusing3.allthreegmapsislikelytobemoreefficientthantheonethatusesonlyGbecausetheindexonCourse.levelinGaccelerate1andG2,theselectioninthequery.Thenextsection3willshowshowagmap-equippedoptimizeridentifiesandprunestheinferiorplan.
4Integrationwithaqueryoptimizer
ThepresentationofAlgorithm1emphasizesclarityattheexpenseofefficiency.Itimpliesthatallsubsetsofthegmapsareenumeratedinrandomorderandeachistestedtoseeifitprovidesasolutiontotheequation.Allsubsetsthatpassthetestarefeasibleplans.Theversionofthealgorithmthatisactuallyimplementedbyoursystemisconsiderablymoresophisticated.Itisintegratedwithaconventionaldynamic-programmingqueryoptimizer(Selingeretal.1979),whichcontrolstheorderinwhichsubsetsareevaluatedandusescostinformationandintermediateresultstoprunethesearchspace.
Aconventionaldynamic-programmingoptimizeritera-tivelyfindsoptimalaccessplansforincreasinglylargerpartsofaquery.Wefollowthesestepsinmoredetail,showingateachstepwhatneedstobechangedforagmap-equippeddatabase(Table1).WethenidentifythepiecesofAlgo-rithm1thatcorrespondtothesechanges.Inwhatfollows,forsimplicity,weavoidanydiscussionof“interestingor-ders”(Selingeretal.1979).Wealsousethetermcompletesolutiontorefertoagmapaccessplan(i.e.,aspecificse-quenceofjoins,togetherwiththemethodusedforeachjoin)thatisequivalenttothelogicalquery,andpartialsolutionforagmapaccessplanthatcouldpotentiallybeenhancedtobecomeacompletesolution.Apartialsolutiondoesnotnec-essarilyhavetobeapsj-query;itmaybethatnoreorderingofitsjoinsmakesthemequivalenttoadd-joins.
Likeaconventionaloptimizer,thegmapoptimizeronlyattemptstojoinapartialsolutionwithgmapsthatsharepro-jectedattributeswithit,thusavoidingCartesianproducts.EachstepinthegmapoptimizercorrespondstoapartofAlgorithm1.Step(a1)ofthefirstiterationcorrespondstolines1–3ofAlgorithm1;itfindsallgmapsthatarerelevanttothequery.Theremainingstepsofalliterationsrepresenttherestofthealgorithm.Movingfromiterationtoiteration,step(c)ofeachiterationcorrespondstoaspecificimple-mentationofline4,whereallsubsetsofrelevantgmapsthatarenotprunedonthewayareexploredinincreasingsize.Afterthefirstiteration,step(a1)formsthejoinsofthesesubsets(solutions)andstep(a2)correspondstolines6–8,wherethesesolutionsareexaminedforcompleteness.Step(a2)canbeimplementedincrementallytakingintoaccounttheresultsofearlieriterationsonsmallerpartialsolutions.Notethatstep(b)ofeachiterationhasnocounterpartinAlgorithm1becauseitdealsonlywithpruningthesearchspaceandnotwithtranslation.Implementingthisstepis
108
Table1.Stepbystepcomparisonofaconventionaloptimizerversusonedesignedforagmap-equippeddatabaseConventionaloptimizerIteration1
Foreachqueryrelation:
a)Findallpossibleaccesspaths.
b)Comparetheircostandkeeptheleastexpensive.
c)Ifthequeryinvolvesonlyonerelation,stop.c)Iftherearenopartialsolutions,stop.Iteration2
Foreachqueryjoin:
a)Considerjoiningtherelevantaccesspathsfoundinthepreviousiterationusingallpossiblejoinmethods.
b)Comparethecostoftheresultingjoinplansandkeeptheleastexpensive.
c)Ifthequeryinvolvesonlytworelations,stop.Iteration3···
Iteration2
a1)Considerjoiningallpartialsolutionsfoundinthepreviousiterationwithanothergmapusingallpossiblejoinmethods.
a2)Distinguishpartialandcompletesolutionsamongresultingjoins.b)Compareeachnewlygeneratedsolutiontoallothersolutions.Ifanysinglegmaporgmapcombinationhasneitheragreatercontributiontothequerynoralowercostthananother,pruneit.c)Iftherearenopartialsolutions,stop.Iteration3···
gmapoptimizerIteration1
a1)Findallgmapsthatarerelevanttothequery.
a2)Distinguishbetweenpartialandcompletesolutionsamongthem.b)Compareallpairsofgmaps.Ifonehasneitheragreatercontributiontothequerynoralowercostthantheother,pruneit.
notstraightforwardbecauseitinvolvesnotonlythecostbutalsothecontributionofsolutionstothequery.Contributionsofpartialsolutionscanbecomparedonthebasisoftheirpiecesthatcorrespondtopsj-queriesandthesetofattributesintheirresult.Wheneachpieceofapartialsolutionhassubsetsoftherelationsandprojectedattributesofapieceofanotherpartialsolution,thentheformercontributeslessandcan,therefore,beremovedfromfurtherconsiderationifitalsohasahighercost.Querysignatures,anencodingofthenamesofalltherelationsusedbythequery,canbeusedtoperformthesecomparisonsefficiently(Finkelstein1982).Itisinterestingtoseehowthenewalgorithmbehaveswhenitisgivenasetofgmapsthatrepresentsatraditionalrelationalphysicalschema.Assume,forexample,thatonegmapisafilecontainingtheextentoftheFacultyrelationwithallassociatedattributes,
def_gmapfaculty_relationasheapbygivenFacultyselectFaculty.name,Faculty.area,Dept
whereFacultyworks_inDeptwhileanothergmapisasecondaryindexontheFaculty.areafield,
def_gmapfaculty_index_on_areaasbtreebygivenFaculty.areaselectFacultyAssumethatthelogicalqueryrequeststhenamesofallfacultyinthedatabasearea.Duringthefirstiterationbothgmapsareconsidered.Scanningtherelationextentwouldbefarmoreexpensivethanaccessingtheindex,butthetwosolutionsarenotcomparable.Sincetheindexsimplyre-turnsFacultyids,itisnotadequatetoanswerthequery,whiletheextentis.Duringtheseconditeration,theindex(theonlypartialsolutionleft)isconsideredforajoinwiththeFacultyextent.ThejoinwouldbelessexpensivethanscanningtheFacultyextent,whilebothplansareequiv-alenttothelogicalquery.Thus,thesolutionfoundduringthefirstiterationiseliminatedinthesecond.Atthispoint,
thereisnopartialsolutionleftandthealgorithmends.Thisexampledemonstratesthataccessplansthatareprunedinaconventionaloptimizerarealsoprunedinitsenhancedver-sion.However,sinceanaccessplanconsideredatiterationnintheconventionalversionmaycombinemorethanngmaps,itmaybeconsideredatalateriterationintheen-hancedversion,thusdelayingpotentialprunings.Ingeneral,weexpecttheperformanceofthemodifiedoptimizertobesimilartotheperformanceoftheconventionalone.Ourex-perienceobtainedbyusingtheoptimizerfortheexamplesshowninSect.8supportstheprediction.5Updatepropagation
Relationalsystemsmitigatedependenciesbetweenthelogi-calandthephysicalschemathroughtheuseofstoredqueriesdefining“virtual”relationscalledviews,andusersexpresstheirqueriesintermsoftheseviews.Withthisapproach,thelogicalschemabecomesa(relational)functionofthephys-icalschema.Viewupdates,however,aredifficultorimpos-sibletosupport.Theusualsolutionistorequireupdatestobeexpressedintermsoftheunderlyingschema.
Ourapproachistheinverseoftheabove.Wedefinethephysicalstructuresasfunctionsofthelogicalschema.Althoughquerytranslationbecomesmorecomplicated,wehaveshownabovethatitisstillfeasibleandcanbeinte-gratedwiththeoptimizationstageofaconventionalsystem,addinglittleoverheadtothepreparationofqueryplansandnooverheadtotheexecutionofthoseplans.Inaddition,up-datesbecomemuchsimpler.Translatingthemintothephys-icalschematurnsintothematerializedviewmaintenanceproblem,whichadmitssimplesolutions.
Asdiscussedelsewhere(Blakeleyetal.19),propagat-ingupdatesintomaterializedviewsrequirestheexecutionofqueriesoverthebaserelationsandtheinsertedordeletedtuples.However,herewedonotnecessarilyhavethebaserelationsstored,andtheactualdatamaybereplicatedinmanyplaces.
Inthis,sectionwefirstdescribehowalogicalupdatecanbespecified,andtherestrictionsthatitmustobey.Thenweshowhowthelogicalupdatecanbetranslatedintoaphysicalupdateovergmaps,andhowthistranslationcanbeintegratedwiththequeryoptimizertoofferageneralpur-poseupdatepropagationmechanism.Finally,weillustratetheupdatepropagationprocessusinganexample.5.1Specifyingupdates
Insertionsarespecifiedbysupplyingaquery(theupdatequery)andasetoftuplestobeinserted(theupdatedata),correspondingtothetargetattributesofthequery.Thedatabasemustbeupdatedinsuchawaythatthechangeintheresultsofthequerybetweentheoriginalandupdateddatabaseispreciselythesetoftuplesintheupdatedata.Deletionsaredefinedsimilarly,withtherolesof“original”and“updated”databasereversed.NotethedifferencefromthequeryusedwhenspecifyingupdatesinSQL-likelan-guages,inwhichthequeryisusedtogeneratetheupdatetuples.Thequeryheredescribesonlythe“schema”ofthetuples.Sincetheupdatedatacanbetheresultofanotherquery,nogeneralityislost.
Forexample,studentscanbecomeenrolledincoursesbysupplyingasetof(StudentId,CourseId)pairsandtheupdatequery
def_queryenroll_studentasselectStudent,Course
whereStudentattendsCourse
Describingtheschemaoftheupdateusingaquery,al-lowsustosupportupdatesthatdonotdependonaspecificphysicalschema.Forexample,relationaldatabasesallowtheinsertionofentityinstancesbecausetheyassumetheexis-tenceofanentityextent.Object-orientedsystemsalsohavesimilarrestrictions.Inourcase,bydescribingtheupdateasalogicalqueryweavoidanyphysicaldatadependence.How-ever,itisisnotourintentiontosupportupdatesdefinedbyarbitraryqueries,sincethatcouldleadintotheviewupdateproblem.Weonlywanttousethequerytodefineapatternonthelogicalschemawhichrepresentsthe“schema”oftheupdate.Thusweneedtoimposethefollowingrestrictions:Noselections.Ifselectionswereallowed,theycouldonlysupplyinformationthatwouldberedundanttoorin-consistentwiththeupdatedata.Forexample,theupdatequerydef_querydefine_undergraduate_coursesasselectCourse,Course.name,Course.level
whereCourse.level<700merelystatesthatalltuplesintheupdatedatahavealevelfieldcontainingavaluelessthan700.
Noprojections.Allattributesofthequeryrelationsmustbeincludedinitstarget.Forexample,aninsertionrequestdefinedbythequery
def_queryadd_graduate_studentasselectStudent.year,CoursewhereStudentattendsCourse
109
andtheupdatedata{[5,cs7]}requeststhataftertheupdate,thereshouldbeanadditional5th-yearstudenttakingthecoursewiththeindicatedid,butgivesinsuf-ficientinformationtoconstructsuchastudent.Suchanupdatehasambiguoussemanticsandthusshouldnotbeallowed.
Notuple-generatingdependencies.Therelationdefinedby
theupdatequeryshouldnotsatisfyanytuple-generatingdependenciesUllman1988).Forexample,therelationdefinedbytheupdatequery
def_queryassign_work_to_facultyasselectFaculty,Student,CoursewhereFacultyadvisesStudent
FacultyteachesCoursesatisfiesthemultivalueddependencyFaculty→→Student.IfthefacultymemberFwasteachingcourseC1beforetheupdate,thetuple[F,S,CS,C2]cannotbeaddedwithoutalsoadding[F,includedonlythefirstofthesetuples,1].Iftheupdatedatatherewouldbenowaytoperformtheupdateaccordingtothesemanticsdefinedabove.
Evenwiththeserestrictions,wecannotcompletelyguar-anteethevalidityofanupdate.Forexample,ifwehavenotdefinedanygmaptostoreFacultydata,aninsertionofaFacultytuplewouldbeinvalid.IfweuseasinglegmaptostoretheFacultyandtheDeptdata,thenwewillnotbeabletoinsertafacultymemberwhodoesnotworkinanydepartment.Althoughthesemanticsoftheupdatequerydependsonlyonthelogicalschema,itsvaliditymaydependonthechoiceofgmapsusedtodefinethephysicalschema.Inparticular,thephysicalschemamusthavesufficient“in-formationcapacity”toholdtheinserteddata(Milleretal.1993).Theseissuesarenotaddressedinthispaperandareasubjectoftheauthors’ongoingresearch.Furthermore,wedonotaddressthequestionofwhethertheupdateconformswiththeintegrityconstraintsofthedatabase.Specificgmapstructuresorgmapcombinationsmayhelpacceleratetestingtheintegrityconstraints.However,here,weassumethatanexternalmechanismverifiesthattheupdatesdonotconflictwithexistingfunctionalorinclusiondependencies.5.2Updatetranslation
ConsideranupdatequeryUandagmapqueryGthathave
somerelationsincommon,i.e.,U/∅.Ourgoaliscalculatetheeffectoftheupdater∩Gonther=
todatastoredinG.Giventherestrictionsthatweimposedontheupdatequery,notablythefactthattheupdatequeryUmaynotcontainanyprojectionsorselections,wecaneasilyfactorGasG=πGpσGs(Uno
I)(3)
whereIisthepartofthegmapthatisirrelevanttothe
update:
I=Gr−Ur,Gs[A(Gr−Ur)],A(Gr−Ur)∩(Gp∪A(Ur))
Whilecomplicated,thisformulaisessentiallythesameasG[rel∈Ur];theonlydifferenceisthattheprojectionstep
110
ofIretainsadditionalattributesinthetarget,tomakesurethatallneededcolumnsjoinwithU.
Equation(3)allowsustocalculatethetuplesneededtobeinsertedordeletedfromthegmapGusingonlythedatainthedatabasebeforetheupdateandtheupdateitself.IfweviewGandUasthemultisetsoftuplesthatareproducedbythequeriesGandU(assumethatmultisetprojection,selectionandjoinoperatorsareusedinthequeries),respec-tively,andweusethesubscriptsoldandnewtorefertosetsproducedbeforeandaftertheupdate,thenwecanrewriteEq.(3)as
Gnew=πGpσGs(Unewno
Inew)=πGpσGs((Uold+∆U)on(Iold+∆I))
where“+”representsunionofmultisetsiftheupdateisaninsertion,andmultisetdifferenceiftheupdateisadeletion.SinceIisirrelevanttotheupdate,∆I=∅.Furthermore,multisetprojection,selectionandjoin,distributesovermul-tisetunionanddifferenceyieldingthefollowing:Gnew=πGpσGs((Uold+∆U)onIold)
=Gold+πGpσGs(∆UonIold)
Finally,sinceIisapsj-query,itcanbetranslatedusingAlgorithm1intoaqueryexpressionovergmaps.Thus,thechange∆Gofthegmapcontentscanbeproducedbythequery
∆G=πGpσGs
(∆UonπIpσIs(πA(Ir)G1on···onπA(Ir)Gm)),
(4)
whichonlyusesexistinggmapsandtheupdatedata.
Equation(4)canbeusedfordeletionsaswellasinser-tionsifthegmapswhosetargetdoesnotfunctionallydeter-minetheirotherattributesmaintaintheirdataasmultisets(thatis,theyrecordduplicatesormultiplicities).5.3Updatepropagationalgorithm
Belowwepresenttheupdatepropagationalgorithmwhichusesthetranslationmechanismdescribedabove.Forthesakeofclaritywedefermostefficiencyconsiderationstotheendofthissection.
Algorithm2.GivenalogicalupdatespecifiedbytheupdatequeryUandasetofgmapsG,findtheupdatethatneedstobeperformedoneachofthegmapsinG.
1.foreachG∈Gdo
2.letI=Gr−Ur,Gs[A(Gr−Ur)],A(G))
r−Ur)∩(G∪A(Up
3.ifI≡Grthencontinue//noupdateisneeded4.translateIintoaqueryQovergmapsinGusing
Algorithm1
5.executethequeryπnQ)6.insert/deletealltuplesGpσgeneratedGs(Uo
bythequeryinto/from
GForeachgmapG,thealgorithmfirstidentifiesthepartIthatisirrelevanttotheupdate(line2).Ifthewholegmapisirrel-evant,noupdatepropagationisneeded(line3).Otherwise,
weuseAlgorithm1totranslateIintoaqueryexpression
overgmaps(line4).Theupdatedataarejoinedwiththeresultofthisquery,producingamultisetoftuples(line5).Iftheupdateisaninsertion,thetuplesareinsertedintoG;otherwisetheyaredeletedfromG.
Thereareafewperformanceissuesthatweneedtoad-dress.First,thetranslationofIisperformedthroughtheenhancedqueryoptimizer.Theoptimizerguaranteesthattheproducedplanwillbeoptimal:anyavailablegmapthatcanacceleratequeriescanalsobeusedtoacceleratetheper-formanceofupdates.Second,inmanyapplications,thereisafixedsetoflogicalupdatesthatareusedregularly(suchascoursesbeingcreatedorstudentsenrolling).Thesecom-monupdatequeriescanbetranslatedinadvanceandstoredasqueryplans.Theyonlyneedtoberecompiledwhentheupdatequeryorthephysicalschemachanges.
Finally,thealgorithmpresentedaboverequiresthattheupdategmapholdstheleftmostpositioninthejointree,whichmaynotbetheoptimaljoinorder.Forexample,ifweareinsertingalargenumberoftuples,itwouldnotbeadvisabletoloadtheupdategmapfirst.Since,however,wehavemodeledtheupdateasagmap,wecanletthequeryoptimizerselecttheappropriatejoinorderforthatgmapaswell.Forthis,thequeryoptimizerneedstobeextendedwiththenotionofarequiredgmapwhichshouldbeusedinanyplanproducedbytheoptimizer.Then,fortheupdatetranslation,wecanflagtheupdategmapasrequiredanduseittogetherwiththeremainingdatabasegmapstocompilethegmapqueryG.5.4Updateexample
Weillustratethealgorithmabovewithanexample.Considertheupdatequeryenrollstudentpresentedearlier:
def_queryenroll_studentbyselectStudent,Course
whereStudentattendsCourse
Assumethatourdatabaseconsistsoftwogmaps,onethatmapsfacultymemberstocoursesthattheyteach,
def_gmapFCasbtreeby
givenFacultyselectCoursewhereFacultyteachesCourse
andonethatrecordsthestudentsandteacherofeachcourse,def_gmapCFSasbtreeby
givenCourseselectFaculty,StudentwhereFacultyteachesCourseand
StudentattendsCourseTopropagatetheupdatetothedatabase,weconsidereachdatabasegmapseparately.ThegmapFCisnotaffectedbytheupdatesinceithasnocommonrelationswiththeupdatequery.TheupdatestogmapCFSdependbothontheupdatedataandontheexistingcontentsofFC.Weneedtoappendtoeach(CourseId,StudentId)pairintheupdatedatathefacultymemberwhoteachesthecoursebeforeitisaddedtoCFS.Thealgorithmconstructsthetuplestobeinsertedbyconsideringthepartofthegmapthatisnotaffectedbytheupdate,i.e.,thequery
def_queryQbyselectCourse,
FacultywhereFacultyteachesCourseandfindingatranslationforit.QueryQcandefinitelybeansweredbyusinggmapFC,andthusthetuplestobeinsertedintoCFSarefoundbyjoiningtheupdategmapenrollstudentandthegmapFC.gmapCFScannotbeusedasthesourceoftheneededinformationbecauseCFSwillnotcontain(CourseId,FacultyId)pairsforcoursesthatdonotyethaveanystudents.Line3ofAlgorithm1testswhetherCFScoverstherelationteaches.Aslongasthereisnoinclusiondependencyfromteachestoattends,theCFSgmapwillberejected.6Applications
InSect.2,wedemonstratedhowgmapssubsumethefacili-tiesofprimaryandsecondarystoragestructuresinconven-tionaldatabasesystems.Inthissection,weoutlineavarietyofotherapplicationsthatusetheintegratedquerytranslation-optimizationengineforqueriesandupdates.6.1Databaseloading
Existingdatabasesoftenprovidespecialfeaturestosupportbulkloadingofdata.Thesefacilitiestendtobeadhocandimposerestrictionsontheformatoftheimporteddata.Theusermustmanuallytranslateallimporteddataintofilesthatmatchtheprimarystoragestructuresandthenloadeachoneindividually.
Iftheimporteddatacanbedescribedasapsj-queryonthelogicalschema,initialloadingcanbeviewedasaspe-cialcaseofphysicalschemaevolution.Theimporteddatafilescanbeviewedasinefficientgmapstructures.The“real”gmapsusedtostorethedatapermanentlyareloadedbyrunningtheirqueriesagainsttheimportedfiles.Forexam-ple,toruntheexperimentdescribedinSect.8,wepopu-latedourdepartmentdatabasewithdatafromavarietyofsources.CourseenrollmentswereretrievedfromanIMSdatabasemaintainedbytheregistrar.FacultyadvisingdatawerefoundinanIngresdatabase.TAassignmentswerekeptinahand-maintainedASCIIfile,andsoon.AllthesedataweredumpedintoUnixfilesinasimpletextualformat.WethenwroteasetofinterfacefunctionsthatallowedustoviewtheseUnixfilesasanewgmapstructure.Thefunc-tionsallowediterationthroughthetuplesandstatisticsin-quiries(suchasacountoftuples)forthebenefitofthequeryoptimizer.Duringloading,alldatawereautomaticallycom-bined(e.g.,joinedandprojected)tomatchtheirfinalformatsandthenbulk-loadedintothestoragestructures.Thequery-processingengineguaranteedthatalldatawereprocessedoptimally.
6.2Databaseinteroperability
Inthepreviousparagraph,weshowedhowtocreateathin“veneer”overUnixtextfilestomakethembehavelikegmaps.Forpurposesofbulkloading,asimpleread-only
111
interfacesufficed,butwecouldhaveaddedfunctionsforin-serting,deletingandassociativelyaccessingtuplesinthesefiles.Inshort,withverylittlework,atextfilecanbemadetoemulateagmap.Thisideacanbeextendedtosupporthet-erogeneousstorageorganizationsbyhidingtheirdifferencesunderthegmapabstraction.Aslongasthedatacontentsofallthesedistinctsourcescanbedescribedbypsj-queriesoverasinglelogicalschema,thequeryoptimizercantrans-latelogicalqueriesintoaccessplansoverthem.ThisstrategyhasmanysimilaritieswiththeADMSsystem(Roussopou-los1993)whichusestheconceptofviewindexing(Rous-sopoulos1982)toefficientlyrepresentmaterializedviews,togetherwithasemanticqueryoptimizertosupportdatabaseinteroperability.
6.3Mainmemorycaching
Anotherapplicationofgmapsistosupportcacheddataintransientmain-memorydatastructuressuchasarraysorhashtables.Ifthecontentsofthestructurecanbedescribedbyapsj-queryonthelogicalschema,itcanbetreatedlikeagmap.Thefactthatitisnotpersistentsimplymeansthatitsdatamustbereplicatedinother(persistent)gmaps.Manydatabaseapplicationscachepartsofthedatabase,butbe-causeofrestrictionsindatareplication,thesedatacannotbepartofthephysicalschema,andthushavetobeaccessedandmaintainedmanually.
6.4Acceleratingcomplexstructureupdates
Althoughpathindicesareusefulforacceleratingcommonqueries,theyareexpensivetomaintain.Previousproposalsforbothnestedindicesandfieldreplicationsuggestincludinglinksalongtheindexedpathtobeusedduringupdateprop-agationtoacceleratethejoinsrequired.Inbothcases,thedatastructuresandthealgorithmsformaintainingthemarecustom-madefortheapplication.Wecanachievethesameeffectbyusingsimplegmapsalongthepathsthatneedtobetraversedduringtheupdatepropagation.Gmapsplacedatpointswhereexpensivejoinsareperformedactlikejoinacceleratorsinthesamewaythatinternallinksacceleratejoins.Inaddition,theyaremuchmoreversatileinthattheycanbeusedbyarbitraryqueriesthatinvolvethesejoinssincetheyarenottiedtoanyspecificlargerstructure.
Suppose,forexample,thatweuseaheaptostorecoursedataandalsoreplicatethenameanddepartmentnameofthefacultymemberwhoteacheseachcourse:
def_gmapcourse_dataasheapbygivenCourse
selectCourse.name,Course.level,Faculty.name,Dept.name
whereFacultyteachesCourseand
Facultyworks_inDeptWhenfacultymembersmovetoadifferentdepartment,weneedtoupdatethecoursedataforalltheircourses.ShekitaandCarey(19)suggestaddingbackwardlinksfromfacultytotheirrespectivecourses;ineffect,advocat-ingapointer-basedjoin.
112
Wepropose,instead,toaddagmapfromfacultytocourses:
def_gmapcourses_indexashash_tablegivenFacultyselectCoursewhereFacultyteachesCourseAlthoughtheperformanceofthisapproachmaynotmatchthatofapointerjoin[ithasbeenshownthatpointerjoinsaregenerallyfasterthanjoinsacceleratedthroughanexternalindex(Careyetal.1990)],itofferssignificantadvantages.First,thegmapscanbeusednotonlyforupdatingthestruc-turebutbyanyotherqueryaswell.Second,ifusagepatternschangesothatmaintenanceoftheacceleratorisnotbene-ficial,wecansimplyremoveit.Inthehardwiredapproach,wewillalwaysbestuckwiththesamemachinery.Finally,ourapproachallowstheusertoplacejoinacceleratorsatindividuallychosenpoints.Itismuchhardertoachievethisflexibilitywiththehardwiredapproach.6.5Supportforcomplexobjects
Usingupdatequeriestodescribetheschemaofthemodifieddatafacilitatesthehandlingofcomplexobjects.Acom-plexobjectinsertioncanbedescribedasanupdategmapwhosequerydescribesthecomplexobjectschema.Iftheinsertedobjectconsistsofseveraltuplesofdata,thetuplesarenotinsertedoneatatime.Instead,thequeryplantreatstheupdategmaplikeanyotherphysicaldatacontainerandperformsbulkoperations.Theresultofthequeryplanex-ecutionisasetoftuplestobeinsertedintoeachphysicalstoragestructure.Anyavailablebulk-loadinginterfacestothesestructurescanbeexploited.7Implementation
Toverifytheapplicabilityandpracticalityofouralgorithmsandobtainafeelingfortheirperformance,webuiltaproto-typeimplementationofoursystemontopofSHORE(Careyetal.1994).SHOREisanobject-orienteddatabasesystemunderconstructionattheUniversityofWisconsin-Madison.WeextendedSHOREwithfacilitiestosupportgmapsandtranslateandexecutequeriesandupdates.Logicalschemadefinitionsareparsedandstoredinalogical-schemacatalog.Physicalstoragestructuresarecreatedfromgmapdefinitions.Parsedgmapqueriesarestoredpersistentlyinasecondphysical-schemacatalog.Thedataorganization,keys,andrecordformatarealsodeterminedbythegmapdefinitions.Gmapsarecreatedandpopulatedbyprocessingupdaterequests.
WehavebuiltaqueryprocessorusingthealgorithmsinSects.3and4.Foralltheexamplespresentedinthispa-per,querytranslationaddsonlyanegligibleoverheadtotheoverallquerycost.Thequeryprocessoralsocontainshookstosupporttheupdateprocessor,asoutlinedinSect.5.Theupdateprocessoracceptsthreelistsofgmaps:updategmaps,targetgmapstobeupdated,anddatabasegmapsthatmaybeusedtosupplydatafortheupdates.Forsimpleupdates,thefirstlistcontainsjustonegmap,andthetargetanddatabase
Table2.ParametersofthedatabaseFacultyStudentsCoursesTAsDeptsInstances500050000100002000100KB/inst
1
0.8
1
0.85
3
listseachcontainallthegmapsinthephysical-schemacat-alog.Othercombinationsofargumentssupportotherappli-cationsdescribedinSect.6.
Wehavedesignedasimplecommoninterfacetoallowavarietyofstoragestructurestoemulategmaps.Thisinter-faceincludesoperationstostoreandretrievedataandmakecostinquiries.WehavecreatedimplementationforexistingSHOREfacilities(B+-treesandheaps)aswellasUnixfiles(forimportingandexportingdata)andmain-memorystruc-tures(forupdategmaps).Tosupporttheuseofthealgorithmin+Sect.5fordeletions,wehavealsomodifiedtheSHOREB-treefacilitytomaintainacountofduplicateinsertions.8Aperformancedemonstration
Inthissection,wedescribeexperimentswithatestdatabasethatillustratethatforaplausiblemixofqueriesandupdates,ourtechniquescanprovidebetterperformancethaneitherrelationalorobject-orienteddatabases.AsweobservedinSect.2,gmapscanbeusedtodescribetherelationsandtheprimaryandsecondaryindicesofrelationaldatabases,aswellastheclassextents,objectsets,andpathindicesofobject-orienteddatabases.Thus,weareabletouseoursystemtosimulatetwo“conventional”configurations;onebasedonanormalizedrelationaldesignandonefollowingatypicalobject-orienteddatabasedesign.AllofourresultsarereportedintermsofcountsofI/Ooperations,sinceabsoluteperformancein“real”databaseswouldbeaffectedbyava-rietyofimplementation-dependentfeaturesthatarebeyondourcontrol.
Intheexperiment,weusedanextendedversionoftheuniversitydatabasepresentedearlier.Wepopulatedonede-partmentwithactualdatadescribingthecomputersciencedepartmentattheUniversityofWisconsin-Madisonandgen-eratedsyntheticdatafor99moredepartmentstocreateadatabaseofreasonablesize.Whiletheactualdatabaseusedfortheexperimentincludesadditionalfields,forsimplic-ityweonlydiscussthelogicalschemaaspresentedinAp-pendixAandFig.1.AfewinterestingparametersofthedataarepresentedinTable2.8.1Theworkload
Theworkloadcontainsmultiplerunsofeightqueriesandfiveupdates.Theactualnumberofrunswasdeterminedbyvariousfactors.Wetriedtomaintainabalancebetweenexpensivequeriesandsimpleones,holdtheupdateloadataboutathirdofthetotalloadformostoftheconfigurations,andmaketherelativefrequenciesasrealisticforauniversityenvironmentaspossible.Tables3and4brieflydescribeeachqueryandupdate.Forexample,theworkloadcontainsthreerunsofqueryQ6.ThelasttwocolumnscontainthetotalcostinpageI/OsattributedtoallrunsofqueryQ6
Table3.QueriesintheworkloadNo.MixGivenFindRelOOQ110FacultynameField,department3030Q210FacultynameCoursestaught
3030Q310StudentnameYear,department,advisor5050Q410TAnameSupportlevel,course4040Q53FacultyareaStudentsadvised5757Q63StudentnameCourses,teachers3942Q72CoursenameStudentsattending8480Q81Departmentname
Coursestaught
59
Table4.UpdatesintheworkloadNo.MixUpdatedescriptionRelOOU11Facultyteachesanewcourse57U22Facultystartsadvisingastudent1014U320Studentaddsacourse160120U44TAisassignedtoacourse2020U56Studentenrollsinadepartment
24
24
whenitisprocessedontwodifferentconfigurationsofthedatabase:arelationalone(Sect.8.2)andanobject-orientedone(Sect.8.3).Notethateachupdateinsertsasinglepairofvalues.Forexample,whenastudentaddsacourse(U3),theinsertedpairwouldbe(StudentId,CourseId).
ToobtainI/Ocounts,weinstantiatedqueriesandup-datesusingrandomvaluesfortheinputparametersandtheinsertedpairs,usedtheactualplansobtainedbyoursystemtoperformthequeryorupdate,andrecordedthesizeoftheresult,aswellasthesizesofallintermediateresultsusedinjoins.Fromthesedata,+wecalculatedI/OcountsusingasimplemodeloftheB-treeandtheheapstoragestructures,basedontheassumptionsthatthepagesizeis8KBandthat4MBareusedforbuffers.Theanalyticalapproachwaschoseninfavorofmeasuringactualresponsetimes,sincetheunderlyingSHOREsystemwasstillunderdevelopmentatthetimeofexperimentation.
8.2Relationaldesign
Therelationalconfigurationfollowsatextbooktranslationofthelogicalschemaintorelations.Wecreatedarelationforeachinternalnodeoftheschemaandoneforthemany-to-manyrelationshipattends.Weassumedthattheat-tributenameisaprimarykeyforeachentityintheschema,thusplayingtheroleoftheidentitysurrogatefortherela-tionalconfiguration.Wedecomposedtheinheritanceasso-ciationvertically,i.e.,theTArelationincludesTA-specificdata(supportlevel),aswellasStudent.nameandCourse.namefortheassistedcourse.Tosimplifythedis-cussion,weassumedthatnestedloopsistheonlyavailablejoinmethod.Secondaryindiceswereaddedwhenneces-sarytoacceleratejoinsandselections,andweclusteredtherecordsfavorablyintheheaps.
Therewasonecasewheregmapscouldnotsimulatetheneededstructure.QueriesQ6andQ7needasecondaryindexontheattendsrelationonbothcoursenameandstudentname.Theseindiceswouldreturnaphysicalpointerintheattendsrelation.Theycannotbeexpressedasagmapbecausegmapdatacanonlybevaluesofnodesintheschemagraph,andattendsisanedgeandnotanode.
113
Note,however,thattheneedfortheseindicesisanarti-factoftherelationaldesignrestrictions:wewouldbebetteroffwithtwoindicesthatmapcoursesdirectlytostudentsandviceversa.Thisconstruct,calledajoinindex,hasbeenshowntoofferadvantagesovertheconventionalrelationalapproach(Valduriez1987).Thus,wereplacedtheattendsrelationwithtwogmapsthatsimulatethedefinitionofajoinindexovertheattendsrelationship.Theresultsshowedthatthejoinindexperformedbetterthanacombinationofarelationclusteredononeofthetwonamesandapairofsecondaryindices.ThecompletephysicalschemafortherelationalconfigurationispresentedinAppendixB.1.
Thetotaldatabasesizeforthisconfigurationis74MB.Runningthefullworkloadonthatdatabasecosts613pageI/Os,%causedbyqueriesand36%byupdates.TheexactcontributionofallrunsofeachqueryandupdateinthetotalcostisshowninTables3and4.8.3Object-orienteddesign
Thephysicalschemafortheobject-orientedconfigurationincludesoneextentfileforeachinternaldomaininthelog-icalschema.Therelationshipattendsisstoredbothaspartofstudentobjectsandaspartofthecourseobjects,i.e.,studentscontainpointerstoallcoursestheyattend,andcoursescontainpointerstothestudentsattendingthem.ThisduplicationallowsefficientexecutionofqueriesQ6andQ7.Theotherrelationshipsarestoredaspartofthedomainclos-esttotheedgelabelinFig.1.Forexample,theadvisesrelationshipisrepresentedbyanattributeoftheFacultyclass.Wealsoencounteredacaseforwhichwehadtoal-tertheplanneddesigntoconformwiththegmapdefinitionlanguage.Initially,wethoughtthatthestudentextentshouldcontainbothstudentsandTAs.However,creatingagmapthatcontainsmembersofbothdomains,i.e.,membersofaclassanditssubclass,requiresaquerythatallowssomeformofunion,andourrestrictednotionofpsj-querydoesnotsup-portsuchqueries.Instead,wefollowedthesamedesignasintherelationalconfigurationanddecomposedTAvertically.Forthegivenworkload,thisalterationhasnoperformanceeffect.ThegmapdefinitionsthatcreatethedesiredphysicalschemaarepresentedinAppendix10.Thetotaldatabasesizeforthisconfigurationis75MB.Runningthefullwork-loadonthedatabasecosts583pageI/Os,68%causedbyqueriesand32%byupdates.ThelastcolumnofTables3and4showthecostsforallrunsofeachqueryandupdate.8.4Object-orienteddesignwithcomplexaccesspathsTheprevioustwoconfigurationsdonotuseanycomplexaccesspathsuchaspathindices1orfieldreplication.Itisworthwhiletoestimatetheimprovementthatcanbeachievedbyequippingtheobject-orientedversionofourdatabase,withsuchaccesspaths.Forthegivenworkload,fieldrepli-cationcanbeappliedinthreecases,asshowninTable5.Eachrowofthetabledescribeswhatisreplicated,theaddi-tionalspacerequiredinmegabytes,theaffectedqueries(no
1
WeusethetermpathindicescollectivelyforwhatBertinoandKim(19)callpathindices,nestedindices,andmulti-indices
114
Table5.ConventionalcomplexpathtechniquesReplicateInSpaceNoSaved“Faculty.workin.name”Faculty0.2Q110“Student.enrolled.name”Student2Q310“TA.assists.name”
TA0.1Q4
20Totalsaved
40
Table6.ApplicationsofgmapsReplicateInSpaceNoSaved“Student.advises−1.name”Student2Q3,U218“Course.teaches−1.name”Course0.4Q6,U19“Faculty.advises.name”Faculty0.6Q551“Course.attends−1.name”Course12Q776“Faculty.teaches.name”
Faculty
0.4Q850Totalsaved
204
updateisaffectedbytheseadditions),andthesavingsthatcanbeattributedtothereplicationforallrunsoftheaffectedquery.
InAppendixB.3weshowthealtereddefinitionsofthefaculty,student,andTAgmaps,withthenewcomplexpathsadded.Therestofthephysicalschemaisidenticaltotheearlierobject-orientedschema.Thedatareplicationadds2MBofadditionalspace,bringingthetotaldatabasesizeto77MB.Thenewdatabaseoffersa7%performancegainoverthepreviousone.Itisinterestingtonotethatthetech-niqueoffieldreplicationasoriginallydescribed(KatoandMasuda1992;ShekitaandCarey19)cannotbeappliedtoanyotherfieldinourdatabasebecauseitisrestrictedtoedgesfromclassestosingle-valuedattributes.Similarre-strictionsalsoprohibitanyusefulapplicationofpathindicesinourdatabase.Therestrictionswereimposedbyprevi-ousauthorsforareason:theimplementationoftheseaccesspathsandthetaskofpropagatingupdateswouldbefarmorecomplexwithoutthem.8.5Aconfigurationwithgmaps
Next,weconsidertheresultsoffullyequippingthedatabasewithgmaps.Insteadofdesigningaphysicalorganizationfromscratch,weshowhowwecanmakeincrementalchangesontheobject-orientedphysicalschematoimproveitsperformance.
First,wereplicateattributesoverpathsthattraversere-lationshipsbothintheinversedirection(fromattributetoclass)andintheforwarddirection(fromclasstoattribute).SuchareplicationbringscostsavingsinqueriesQ3andQ7.Second,weconsiderreplicatingattributesoverpathsthatarenotfunctionalbyassociatingmultiplevalueswitheachinstanceoftherootofthepath.SuchreplicationisverybeneficialforourworkloadsinceitcaneliminatethemostexpensivestepofthemediumandlargequeriesQ5,Q7,andQ8.ThealteredgmapdefinitionsareshowninAp-pendixB.4.TheexactcostsavingsandoverheadsthatarecausedbyeachadditionalaccesspathareshowninTable6.Thesemodificationsofferasignificanttotalgainof204I/Os,or35%overthecostoftheinitialobject-orientedap-proach.Ifweaddtothesesavingsthoseoftheprevioussec-tion(withgmapssimulatingfieldreplication)thetotalgainclimbsto42%.Itisinterestingtonotethelowupdateover-
Table7.Summarycosts
SpaceQueryUpdateTotal(MB)costcostcostRelational74394219613Object-oriented75398185583OO+extensions77358185543gmaps(1)80.4227188415gmaps(2)
92.4
151
188
339
headofallthesereplications.Thereasonisthatwereplicatedattributes,likestudent.name,thatarestatic,i.e.,donotgetupdated.Asaresult,theonlyupdatesthatareaffectedarethoseontheintermediatelinksofthepath.Theattendsrelationship,forexample,isthemostvolatilerelationshipintheschema,soweexpectthattheslightestupdateoverheadwouldoverwhelmanysavingsgained.However,sincetheattendsrelationshipisbi-directional(storedasanattributeofboththestudentandthecourse),bothextentsneedtobeupdatedanyway.Readingandwritingonemoreattribute,student.name,doesnotaddanyoverhead.
Thespaceoverheadisalsolowinallbutonecase:repli-catingthestudentnamesineachcoursemorethandoublesthesizeofthecourseextent.Whetherornotthespace-timetrade-offisworthwhiledependsontheapplication.Theto-talincreaseinspaceusageis15.4MB,bringingthetotaldatabasesizeto92.4MB.Table7comparesthequerycost,updatecost,anddiskusageofallfourapproaches.Forthegmapconfiguration,weshowtheresultsbothwithandwith-outreplicationofstudentnames,inbothcasesincludingtheenhancementsoftheconfigurationwithcomplexpaths.8.6Achangeintheworkload
Finally,weconsidertheproblemofadaptingthedatabasetoachangingworkload.Assumethatourworkloadisaug-mentedbyasinglerunofoneadditionalqueryQ9thatasksforthenamesofallTAssupportedbyadepartment.Moreprecisely,Q9asksforallTAswhoassistacoursetaughtbyafacultymemberthatworksinagivendepartment.Thesimplestwaytohandlethenewqueryintheinitialobject-orientedconfigurationwouldbetoreclustertheTAextentsothatitisorderedbythesupportingdepartmentoftheTAs,andaddanindexfromcourseidentificationstoTAs.ThecostofthequeryQ9,however,remainshigh–1I/Os–andthetotalcontributionofthequerytotheload,takingintoaccounttheadditionalupdateoverheads,climbsto172I/Os.Weusethisconfigurationasthebaselineforthenewworkload.
Theproblemwiththenewqueryisthatitrequiresun-clusteredaccesstothelargestudentfileinordertoobtainthestudentname.ReclusteringthefiledoesnothelpsincequeryQ5dependsonthatspecificclusteringforitsperfor-mance.Keepingtwocopiesofthestudentfile,clusteredforQ5andQ9,respectively,isaunwisebecauseofthehighupdateratesoftheattendsrelationship.Suchasolutionwouldalsoresultinasignificantincreaseinthedatabasesizewithoutmuchhelpinprocessingcost.
gmapsofferadditionaloptions.Wecanselectivelyrepli-catethestudent.nameattributerequestedbythenewqueryfromthestudentextenttotheTAextent.Thislowers
thequerycostby95I/Os.Thenetgainafteraccountingfortheadditionalcostofupdatesis79I/Os.Table8comparesthethreenewconfigurations.Savingsandlossesaregivenrelativetothebaselineconfiguration.Noticethatwhilealloptionsareavailabletothegmap-equippeddatabase,mostconventionaldatabaseswouldonlyhavethefirstoption,andafewmaysupportthesecond.
Underthenewworkload,thegmap-equippedconfigura-tionhasaloadof422I/Oscomparedto755I/Osintheobject-orientedsystem,fortotalsavingsof44%.
9Relatedwork
Mostexistingdatabasesystemsdonotprovidetruephysi-caldataindependence,sinceeveryconstructofthelogicalschemacorrespondsdirectlytoaprimaryphysicalstructure.Forexample,everyrelationinmostrelationalsystems,everyclassextentinextent-basedOOsystems[e.g.,Orion(Kimetal.1990),andZoo(Ioannidisetal.1993)]andeverycollec-tionincollection-basedOOsystems[e.g.,GemStone(MaierandStein1986),Extra/Excess(Careyetal.1988),andOb-jectStore(Orensteinetal.1992)]isstoredinaseparatefile.Themainflexibilityatthephysicallevelcomesfromsec-ondaryaccesspathstothesefiles.
Severalextensionsofboththeprimaryphysicalstruc-tureandthesecondaryaccesspathshavebeenrecentlypro-posedintheliteratureandallowstoringtogetherdatafrommorethanonelogicalconstruct.InSects.2and8,wedis-cussedpathindices(BertinoandKim19;Kimetal.19;MaierandStein1986),joinindices(Valduriez1987),andfieldreplication(KatoandMasuda1992;ShekitaandCarey19),notingtheirrestrictionsandcomparingtheirperfor-mancetoourscheme.Anotherapproachtodecomposingthedatabaseishierarchicaljoinindices(Valduriezetal.1986),ageneralizationofjoinindicesthatallowsonetobuildanindexoveridentitysurrogatesthatpopulatetreesofthelog-icalschemagraph.Accesssupportrelations(ASR)offeradifferentgeneralizationofjoinindices(KemperandMo-erkotte1990a),whichallowsthedefinitionofindicesovertheinstancesofarbitrarychainsoflogicalschemanodes.Thisschemeoffersahigherdegreeofflexibilityandal-lowsthedefinitionofindicesthatstorebothcompleteandpartialinstancesofeachchain.Exceptforthelastfeature,thecontentsofbothhierarchicaljoinindicesandASRscanberepresentedaspsj-queries,andcanthusbedefinedasagmap.However,sincegmapqueriesdonotsupportunions,theycannotrepresentouterjoins,andthereforecannotstoreincompleteinstancesofchains.
Withrespecttothetranslationalgorithm,ourworkmostcloselyresemblesresearchattheUniversityofWaterlooonmaterializedviews(Blakeleyetal.19;YangandLarson1987).Ouralgorithmsupportsamorerestrictedquerylan-guage,butusesinformationaboutinclusionandfunctionaldependenciesaswellas“topological”informationimplicitinagraph-basedlogicalschema.Thisinformationallowsustoidentifysolutionsthatwouldbemissedbythemoregeneralalgorithm.Section5containsanexampleofasolu-tionthatcanonlybefoundwheninclusiondependenciesare
115
takenintoaccount.2Similarly,ourhandlingoffunctionaldependenciesismoregeneralthanthatofthealgorithmofYangandLarson(1987),whichsimplyusestheprimarykeyinformationforeachrelation.Unlessallnon-trivialdepen-denciesaregeneratedbysuperkeys(i.e.,unlessallrelationsareinatleast4thNormalForm),ourschemewillfindmoresolutions.
Withrespecttotheintegrationwiththerestofthequeryoptimizer,mostearliereffortsuseatwo-stageapproach,wherethequeriesarefirsttranslatedintoqueriesoverphys-icalstructures,andtheresultingqueriesarethenoptimizedone-by-onebyaconventionaloptimizer.Inadditiontotheworkonmaterializedviews(Blakeleyetal.19;YangandLarson1987),sucheffortsincluderesearchwhosegoalwasnotphysicaldataindependencebutsimplyprocessingefficiency.Examplesincluderesearchonreusingcommonsubexpressionswithinaquery(Hall1976)orbetweenmul-tiplequeries(Sellis1986),reusingresultsofpreviousqueries(Finkelstein1982),andusingintegrityconstraintsforseman-ticqueryoptimization(Chakravarthy1990).KemperandMoerkotte(1990b)optforaunifiedapproachoftranslationandoptimizationfortheASRsbyextendingarule-basedop-timizertoincludeappropriaterewritingrules.Ourapproachofenhancingaconventionaloptimizerwiththenecessarytranslationstepstakesadvantageoffullcostinformationavailabletotheoptimizertoperformearlypruningofin-feriorsolutions,whilekeepingtheoveralloptimizationcostlow.
10Conclusion
Wehavepresentedanewapproachtophysicalschemade-signthatusesadeclarativelanguagetodescribethecontentsofstoragestructures.Carefullyrestrictingthelanguageal-lowsefficientalgorithmstotranslatequeriesoverthelogicalschemaintoaccessplansusingthephysicaldatastructures.Wehaveshownhowtointegratethequerytranslationalgo-rithmintoaconventionalqueryoptimizer.Asimplemod-ificationofthequerytranslationalgorithmsupportspropa-gationofupdatestothedatabase.Aprototypesystemthatincorporatesthemajoraspectsofthisapproachiscurrentlyoperational.Wehaveusedittodemonstrateinarealisticenvironmenthowourapproachcanachievesignificantper-formancegainsovermoreconventionalschemes.
Furtherdevelopmentisnecessaryforourtechniquetoreachitsfullpotentialinapracticalsetting:
–Thispaperassumesthattheuserqueryhasalreadybeendecomposedinpsj-querycomponentsthatobeythesamerestrictionsasthegmapdefinitionlanguage(Sect.2.2).Itshouldbepossibletoextendourtechniquetotranslateuserqueriescontainingconstructsnotallowedingmapqueries,suchasarbitraryjoinsofgmaps,duplicateoc-currencesofgmapsinthewhereclauseandduplicateoccurrencesofattributesinthequerytarget.
2
IntheexampleofSect.5.4,thetranslationoftheupdatequeryforthegmapCFShasonesolutionifthereisnoinclusiondependencyfromteachestoattends,andtwosolutionsotherwise.Analgorithmthatdoesnotcheckforinclusiondependenciescannotdiscoverthesecondsolution
116
Table8.AlternativedatabasereconfigurationsDescriptionoftechniqueReclusterstudentextentKeeptwocopiesofstudentextentReplicate“Student.name”inTAextent
Spaceoverhead(MB)0400.1
ActivitiesaffectedQ5,Q9Q9,U3Q9,U4
Saved(lost)(5)(35)79
–AswediscussedinSect.5,wedonotguaranteethatthedatadescribedinthelogicalschemacanbestoredinthephysicalschema.Itshouldbepossibletodevisesufficientconditionstoensurethataproposedphysicalschema“covers”alogicalschema,inthesensethatithassufficient“informationcapacity”tostoreanydatabaseinstanceconsistentwiththelogicalschema.Whentheseconditionsarenotmet,thesystemshouldbeabletowarnaboutupdatesthatresultininformationloss.
–Thetranslationalgorithmshouldbeextendedtotakeintoaccountadditionalintegrityconstraintsandpermitthetranslationofqueriesintounionsofotherqueries.Itshouldalsobeabletoincorporatestoragestructuresthatofferahierarchicalinterface.
–Finally,theincreaseofavailablechoicesinthephysicalschemadesignputstheburdenonthedatabaseadmin-istratortomakethecorrectchoices.Weneedtodesigntoolsthatguidetheadministratorinchoosingtheappro-priatecombinationofstoragestructures.
Acknowledgements.OdysseasG.TsatalosandMarvinH.SolomonhavebeenpartiallysupportedbytheAdvancedResearchProjectAgency,ARPAordernumber018(formerly8230),monitoredbytheU.S.ArmyResearchLaboratoryundercontractDAAB07-91-C-Q518.YannisE.IoannidishasbeenpartiallysupportedbytheNationalScienceFoundationunderGrantsIRI-9113736,IRI-9224741,andIRI-9157368(PYIAward)andbygrantsfromDEC,IBM,HP,AT&T,andInformix.
Appendices
A.LogicalschemainODL
interfaceDept{public:
attributestringname;};
interfaceFaculty{attributestringname;attributestringareaattributeref interfaceStudent{attributestringname;attributeshortyear;attributeref interfaceTA:publicStudent{attributefloatsupport_level;attributeref }; interfaceCourse{attributestringname;attributeshortlevel; };} BPhysicalschema B.1.Relationalphysicalschema //Relations def_gmapDeptRelationasheapbygivenDeptselectDept.namedef_gmapFacultyRelationasheapbygivenFacultyselectFaculty.name,Faculty.area,Dept.name whereFacultyworks_inDeptdef_gmapCourseRelationasheapbygivenCourseselectCourse.name,Course.level,Faculty.namewhereFacultyteachesCoursedef_gmapStudentRelationasheapbygivenStudent selectStudent.name,Student.year,Dept.name,Faculty.name whereStudentenrolledDeptandFacultyadvisesStudentdef_gmapTARelationasheapbygivenTAselectStudent.name,TA.support_level,Course.namewhereTAisaStudentandTAassistsCourse//NameIndices def_gmapDept_name_indexas btreebygivenDept.nameselectDeptdef_gmapFaculty_name_indexas btreebygivenFaculty.nameselectFaculty def_gmapCourse_name_indexas btreebygivenCourse.nameselectCourse def_gmapStudent_name_indexasbtreebygivenStudent.nameselectStudent //OtherIndices def_gmapFaculty_area_indexasbtreebygivenFaculty.areaselectFacultydef_gmapCourse_teaches_indexasbtreebygivenFaculty.nameselectCoursewhereFacultyteachesCoursedef_gmapStudent_advisor_indexasbtreebygivenFaculty.nameselectStudentwhereFacultyadvisesStudentdef_gmapTA_assists_indexasbtreebygiveCourse.nameselectTAwhereTAassistsCoursedef_gmapCourse2Student_join_indexasbtreeby givenCourseselectStudentwhereStudentattendsCoursedef_gmapStudent2Course_join_indexasbtreeby givenStudentselectCoursewhereStudentattendsCourse B.2.Object-orientedphysicalschema//ClassExtents def_gmapDeptExtentasheapbygivenDeptselectDept.name def_gmapFacultyExtentasheapbygivenFacultyselectFaculty.name,Faculty.area,Dept,Student.CoursewhereFacultyworks_inDept andFacultyadvisesStudentandFacultyteachesCoursedef_gmapCourseExtentasheapbygivenCourseselectCourse.name,Course.level,Student whereCourseattendedStudentdef_gmapStudentExtentasheapbygivenStudentselectStudent.name,Student.year,Dept,Course whereStudentenrolledDeptandStudentattendsCoursedef_gmapTAExtentasheapbygivenTAselectStudent,TA.support_level,CoursewhereTAisaStudentandTAassistsCourse//NameIndices def_gmapDept_name_indexas 117 btreebygivenDept.nameselectDeptdef_gmapFaculty_name_indexas btreebygivenFaculty.nameselectFaculty def_gmapCourse_name_indexas btreebygivenCourse.nameselectCourse def_gmapStudent_name_indexas btreebygivenStudent.nameselectStudent//OtherIndices def_gmapFaculty_area_indexasbtreebygivenFaculty.areaselectFacultydef_gmapFaculty_advises_indexasbtreebygivenStudentselectFacultywhereFacultyadvisesStudentdef_gmapFaculty_teaches_indexasbtreebygivenCourseselectFacultywhereFacultyteachesCoursedef_gmapFaculty_works_indexasbtreebygivenDeptselectFacultywhereFacultyworks_inDept B.3.Object-orientedschemawithcomplexpathsdef_gmapFacultyExtentasheapbygivenFacultyselectFaculty.name,Faculty.area,Dept,Dept.name,Student,Course whereFacultyworks_inDeptandFacultyadvisesStudentandFacultyteachesCoursedef_gmapStudentExtentasheapbygivenStudentselectStudent.name,Student.year, Dept,Dept.name,Course whereStudentenrolledDeptandStudentattendsCoursedef_gmapTAExtentasheapbygivenTAselectStudent,TA.support_level,Course,Course.name whereTAisaStudentandTAassistsCourseB.4.gmapSchema def_gmapFacultyExtentasheapbygivenFacultyselectFaculty.name,Faculty.area,Dept,Dept.name,Student,Student.name,Course,Course.name whereFacultyworks_inDeptand 118 FacultyadvisesStudentandFacultyteachesCourse def_gmapCourseExtentasheapbygivenCourseselectCourse.name,Course.level, Student,Student.name,Faculty.namewhereCourseattendedStudentandFacultyteachesCoursedef_gmapStudentExtentasheapbygivenStudentselectStudent.name,Student.year, Dept,Course,Faculty.name whereStudentenrolledDeptandStudentattendsCourseandFacultyadvisesStudentReferences 1.AhoA,SagivA,UllmanJ(1979)Equivalencesamongrelationalex-pressions.SIAMJComput8:218–247 2.BertinoE,KimW(19)Indexingtechniquesforqueriesonnestedobjects.IEEETransKnowlDataEng1:196–214 3.BlakeleyJ,CoburnN,LarsonP(19)Updatingderivedrelations.ACMTransDatabaseSyst14:369–400 4.CareyM,DeWittD,VandenbergS(1988)Adatamodelandquerylanguageforexodus.In:ProcACMSIGMODConf,June,pp413–423,Chicago,Ill 5.CareyM,ShekitaE,LapisG,LindsayB,McPhersonJ(1990)Anincrementaljoinattachmentforstarburst.In:ProcIntVLDBConf,pp662–673,Brisbane,Australia 6.CareyM,DeWittD,FranklinM,HallN,McAuliffeM,NaughtonJ,SchuhD,SolomonM,TanT,TsatalosO,WhiteS,ZwillingM(1994)Shoringuppersistentapplications.In:Proc.ACMSIGMODConf,May,pp383–394,Minneapolis,Minn 7.CatelRGG(1993)Theobjectdatabasestandard:ODMG-93.Morgan-Kaufman,SanMateo,Calif 8.ChakravarthyU(1990)Logic-basedapproachtosemanticqueryopti-mization.ACMTransDatabaseSyst15:163–2071990. 9.ChandraA,MerlinP(1977)Optimalimplementationofconjunctivequeriesinrelationaldatabases.In:ProceedingsoftheAnnualACMSymposiumonTheoryofComputing,May,pp77–90 10.FinkelsteinS(1982)Commonexpressionanalysisindatabaseappli-cations.In:ProcACMSIGMODConf,pp3–374,Orlando,Fla11.HallPV(1976)Optimizationofasinglerelationalexpressionina RDBMS.IBMJResDev20 12.IoannidisY,LivnyM,HaberE,MillerR,TsatalosO,WienerJ(1993) Desktopexperimentmanagement.IEEEDataEng16:19–23 13.KatoK,MasudaT(1992)Persistentcaching.IEEETransSoftware Eng18:631–5 14.KemperA,MoerkotteG(1990a)Accesssupportinobjectbases.In: ProcACMSIGMODConf,pp290–301,AtlanticCity,NJ 15.KemperA,MoerkotteG(1990b)Advancedqueryprocessinginobject basesusingaccesssupportrelations.In:ProcIntVLDBConf,pp290–301,Brisbane,Australia 16.KimW,BertinoE,GarzaJ(19)Compositeobjectsrevisited.In: ProcACMSIGMODConf,pp337–347,Portland,Ore 17.KimW,GarzaJF,BallouN,WoelkD(1990)Architectureofthe ORIONNext-generationdatabasesystem.IEEETransKnowlDataEng2:109–124 18.MaierD,SteinJ(1986)Indexinginanobject-orientedDBMS.In:2nd InternationalWorkshoponObject-OrientedDatabaseSystems,Septem-ber,pp171–182,PacificGrove,Calif 19.MillerR,IoannidisY,RamakrishnanR(1993)Theuseofinformation capacityinschemaintegrationandtranslation.In:ProcIntVLDBConf,pp120–133,Dublin,Ireland 20.OrensteinJ,HaradhvalaS,MarguilesB,SakaharaD(1992)Querypro-cessingintheObjectStoredatabasesystem.In:ProcACMSIGMODConf,pp403–412,SanDiego,Calif 21.RoussopoulosN(1982)Viewindexinginrelationaldatabase.ACM TransDatabaseSyst7:259–290 22.Roussopoulos,EconomouN,andStamenasA(1993)ADMS:atestbed forincrementalaccessmethods.IEEETransKnowlDataEng5:762–773 23.SelingerPG,AstrahanMM,ChamberlinDD,LorieRA,PriceTG (1979)Accesspathselectioninarelationaldatabasemanagementsys-tem.In:ProcACMSIGMODConf,pp23–34,Boston,Mass 24.SellisT(1986)Globalqueryoptimization.In:ProcACMSIGMOD Conf,pp191–205,Washington 25.ShekitaE,CareyM(19)Performanceenhancementthroughrepli-cationinanobject-orientedDBMS.In:ProcACMSIGMODConf,pp325–336,Portland,Ore 26.TsatalosO(1994)Thegmap:aversatiletoolforphysicaldatainde-pendence.PhDthesis,UniversityofWisconsin-Madison,December27.TsatalosO,IoannidisY(1994)Aunifiedframeworkforindexingin databasesystems.In:ProceedingsoftheInternationalConferenceonDatabaseandExpertSystemApplications,September,pp183–192,Athens,Greece 28.UllmanJ(1988)Principlesofdatabaseandknowledge-basesystems, vol1.ComputerSciencePress,NewYork,pp423–425,Rockville,Md29.ValduriezP(1987)Joinindices.ACMTransDatabaseSyst12:218–24630.ValduriezP,KhoshafianS,CopelandG(1986)ImplementationTech-niquesofComplexObjects.In:ProcIntVLDBConf,pp101–109,Kyoto,Japan 31.YangHZ,LarsonPA(1987)QuerytransformationforPSJ-queries.In: ProcIntVLDBConf,pp245–254,Brighton,UK
因篇幅问题不能全部显示,请点此查看更多更全内容
Copyright © 2019- 7swz.com 版权所有 赣ICP备2024042798号-8
违法及侵权请联系:TEL:199 18 7713 E-MAIL:2724546146@qq.com
本站由北京市万商天勤律师事务所王兴未律师提供法律服务