您好,欢迎来到微智科技网。
搜索
您的当前位置:首页The GMAP A versatile tool for physical data independence

The GMAP A versatile tool for physical data independence

来源:微智科技网
TheVLDBJournal(1996)5:101–118

TheVLDBJournalcSpringer-Verlag1996󰀄TheGMAP:aversatiletoolforphysicaldataindependence

OdysseasG.Tsatalos1,MarvinH.Solomon2,YannisE.Ioannidis2

12

IBMAlmadenResearchCenter,650HarryRd,SanJose,CA95120-6099,USA;e-mail:odysseas@almaden.ibm.com

ComputerSciencesDepartment,UniversityofWisconsin-Madison,1210W.DaytonStreet,Madison,WI53706-1685,USA;e-mail:{solomon,yannis}@cs.wisc.edu

EditedbyMatthiasJarke,JorgeBocca,CarloZaniolo.ReceivedSeptember15,1994/AcceptedSeptember1,1995

Abstract.Physicaldataindependenceistoutedasacentralfeatureofmoderndatabasesystems.Itallowsuserstoframequeriesintermsofthelogicalstructureofthedata,lettingaqueryprocessorautomaticallytranslatethemintooptimalplansthataccessphysicalstoragestructures.Bothrelationalandobject-orientedsystems,however,forceuserstoframetheirqueriesintermsofalogicalschemathatisdirectlytiedtophysicalstructures.Wepresentanapproachthatelim-inatesthisdependence.Allstoragestructuresaredefinedinadeclarativelanguagebasedonrelationalalgebraasfunctionsofalogicalschema.Wepresentanalgorithm,integratedwithaconventionalqueryoptimizer,thattranslatesqueriesoverthislogicalschemaintoplansthataccessthestoragestruc-tures.Wealsoshowhowtocompileupdaterequestsintoplansthatupdateallrelevantstoragestructuresconsistentlyandoptimally.Finally,wereportonexperimentswithapro-totypeimplementationofourapproachthatdemonstratehowitallowsstoragestructurestobetunedtotheexpectedorob-servedworkloadtoachievesignificantlybetterperformancethanispossiblewithconventionaltechniques.

Keywords:Indexing–Physicaldatabasedesign–Materi-alizedviews–Physicaldataindependence

1Introduction

Physicaldataindependenceisusuallydescribedastheabil-itytowritequerieswithoutbeingconcernedwithhowthedataareactuallystructuredondisk.Incurrentdatabasesys-tems(DBMSs),however,queriesaretiedtologicalcon-structs,suchasrelations,classextents,orobjectsets,thatcloselytrackthephysicalorganizationofdata.Inarelationaldatabase,forexample,eachrelationisusuallystoredasafile,perhapswithaprimaryindex.Thedatabaseadministra-torcanimproveperformancebyaddingsecondaryindicesorbyspecifyingtheclusteringoffiles,butmoreextensive

Correspondenceto:OdysseasG.Tsatalos

improvementsrequiremodifyingthelogicalschema,forex-amplebyde-normalizingtables.Suchmodificationsnecessi-taterewritingqueriesand,thus,physicaldataindependenceislost.

Ourgoalistoimprovephysicaldataindependencebydecouplingphysicaldecisionssuchasclusteringandrepli-cationfromthelogicaldatamodel,sothatthephysicalorga-nizationcanbealteredwithoutchangingthelogicalschemaorquerieswrittenagainstit.Amoresubtlebenefitisthatitplacesawiderrangeofpossibilitiesfordataorganizationatthedisposalofthedatabaseadministrator.Forexample,thefactthatthedataaredescribedinatraditionalnormalizedrelationalschemashouldnotprecludeareplicated,nestedphysicalorganization,ifthatorganizationwouldachievebetterperformancefortheanticipatedmixofqueriesandupdates.

Assumethatdataarestoredinfilesofrecords,possiblyimplementedbyanindexstructuresuchasaB+-tree.Insteadofrequiringaone-to-onecorrespondencebetweenlogicaldataconstructsandphysicalstoragestructures(e.g.,relation↔file),weallowthecontentsofeachfiletobedefinedasafunctionofthelogicalschema,specifiedbyarestrictedrelational-algebraexpression.Wecallthecombinationofafileanditsdefinitionagmap(pronounced“gee-map”andanacronymforgeneralizedmultilevelaccesspath).Inthesim-plestcases,gmapscorrespondtotraditionalstoragestruc-turessuchasanunorderedfileofthetuplesinarelationoranindexonthatfile.gmaps,however,canalsobeusedtopartitionthedatabaseverticallyandhorizontallyandaddmultipleaccesspaths,generalizingpathindices.Sincegmapsareallowedtocontainoverlappingdata,theycanalsocap-tureredundantstoragestructures.Gmapsareinvisibleatthelogicallayer,sotheirdefinitionsaffectonlytheperformanceofqueriesandnottheirsemantics.

Inthispaper,werestrictbothgmapdefinitionsandqueriestoproject-select-join(psj)expressionsoverasim-plesemanticdatamodel.Wedemonstratethatsuchexpres-sionsarepowerfulenoughtoexpressmostconventionalstor-agestructures,aswellasmore“exotic”techniquessuchaspathindices(BertinoandKim19;MaierandStein1986),fieldreplication(KatoandMasuda1992;ShekitaandCarey19),andmore.Wepresentanalgorithmtotranslateuser

102

Dnamenameworks-inFadvisesenrolledareanameteachesattendsSyearnameIsAlevelCattendedlevelassistsTAFig.1.Thelogicalschema

queries,expressedaspsj-queriesoverstructuresinthelogi-calschema,intorelationalexpressionsoverthegmaps.Wealsoshowhowthistranslationcanbeintegratedintoacon-ventionalqueryoptimizer.

Oneofthebenefitsofourapproachisthatgmapsmaystoreredundantdatatoimprovetheperformanceofqueries.Thus,updatesmayneedtochangemultiplegmapsinacon-sistentmanner.Weshowhowasimplemodificationofthequerytranslationalgorithmcanproduceplanstoperformtheseupdates.Wealsodemonstratehowthisflexibilitycanbeusedinseveralotherareas,e.g.,accelerationofbulkload-ingofthedatabaseandaccelerationofupdatesofcomplexobjects.

Allofthealgorithmspresentedinthispaperhavebeenimplementedinaprototypesystem.Wereportonexperi-mentswithatestdatabasethatillustratesthatforaplausiblemixofqueriesandupdates,ourtechniquesallowthephysi-calrepresentationtobetunedtoprovidebetterperformancethanwhatcouldbeachievedthroughstandardrelationalorobject-orientedmethods.Weconcludethispaperbyiden-tifyingtheareasthatneedfurtherdevelopmentbeforeourtechniquecanbewidelyusedinapracticalsetting.2Thegmapdefinitionlanguage

Inthissection,weintroduceourdatamodelandthecorre-spondingdatadefinitionlanguage(DLL).TheDDLhastwoparts,thelogicalDDL,whichdefinesthelogicalschemacapturingtheconceptualorganizationofthedata,andthephysicalDDL,whichdefinesthestoragestructurescontain-ingthedatathatinstantiatethelogicalschema.Wepresentthemodelintwonotations,asemanticone(resemblingtheERmodel)andaformalrelationalone.Thetwonotationsareequivalent;thesemanticnotationismoreintuitiveasauserinterface,butallofouralgorithmsmanipulatetherelationalformsofschemas.

2.1Thelogicaldatadefinitionlanguage

Inthesemanticnotation,schemasaredepictedasgraphs.Throughoutthispaperweillustrateourapproachwithanexampledatabasedescribingauniversityanditspersonnel(seeFig.1).ThetextualformoftheschemaisgiveninAp-pendixAusingODL(Catel1993).

NodesinFig.1representdomains,andsolidedgesrepre-sentrelationshipsbetweenthem.Leavesrepresentprimitive

domainssuchasintegers,characterstrings,orrealnum-bers.Internalnodesrepresentdomainspopulatedwithiden-titysurrogates(tupleorobjectidentifiers).Inourexampleschema,thesedomainsareDept(department),Faculty,Student,Course,andTA(teachingassistant).Toreduceclutterinthefigures,thesedomainnamesareabbreviatedtotheirinitials.Functionaldependenciesareindicatedbyarrowheads.Inclusiondependencies(formallydefinedinSect.3)canalsobeexpressedbutarenotshownforsimplicity.IsAassociationsaredenotedbydashedarcspointingtothesu-pertype.Forourpurposes,theyaresimplyrelationshipswithcertainfunctionalandinclusiondependenciesimpliedbyde-fault.AnameoftheformD.disusedtodenotebothaprim-itivedomainanditsrelationshiptoaninternaldomain.Forexample,Course.levelnamesbothaprimitivedomainofintegersanditsrelationshiptotheCoursedomain.

Intherelationalformofthedatamodel,eachedgeofaschemagraphfromdomainAtodomainBisrepresentedasabinaryrelationwithattributesAandB.Becauseofthiscorrespondence,weoftenusetheterm“attribute”asasynonymfor“domain”(nodeinthegraph)andtheterm“baserelation”asasynonymfor“relationship”(edge).Ouralgorithmsoperateonthe(binary)relationalformoftheschema,sotheyapplytoanysemanticmodelthatcanberepresentedbybinaryrelationswithfunctionalandinclusiondependencies.

2.2Thephysicaldatadefinitionlanguage

Inoursystem,allphysicalstoragestructuresaredefinedasgmaps.Agmapconsistsofasetofrecords(thegmapdata),aquerythatindicatesthesemanticrelationshipsamongtheattributesoftheserecords(thegmapquery),andadescrip-tionofthedatastructureusedtostoretherecords(thegmapstructure).Althoughtheactualdatabasestoresgmapdataratherthanthebaserelations,thegmapdatamaybethoughtofastheresultofrunningthegmapqueryonthebaserela-tions.

GmapqueriesareexpressedinasimpleSQL-likelan-guage.Forexample,thegmap

def_gmapcs_faculty_by_areaasbtreebygivenFaculty.areaselectFaculty

whereFacultyworks_inDeptand

Dept=cs_oidstoresasetofpairscontainingFacultyidentifiersandthecorrespondingareanames.Onlyfacultymembersinthecomputersciencedepartment(identifiedbytheconstantcsoid)areincluded.ThegmapstructureisaB+-treein-dexedbyFaculty.area.Theentirebyclausedefinesthegmapquery.Attributesfollowingthegivenandselectkeywordsarecalledinputandoutputattributes,respectively,andthepredicateDept=csoidiscalledaselection.In-putattributesformthesearchkeyforgmapstructuresthatallowassociativeaccess.

Thegmapquerycanalsobeexpressedgraphicallyasasubgraphoftheschemagraphcalledthequerygraph(seeFigs.2–9).ShadededgescorrespondtorelationshipsorIsAassociationsexplicitlymentionedinthewhereclauseor

implicitlymentionedaspartofprimitiveattributenames.Inputattributesareindicatedbysmallarrows,andoutputattributesareindicatedbydoublecirclesaroundnodes.Re-strictionsaredescribedasannotationsonthecorrespondingnodes.

Eachqueryexpressibleinthislanguageisequivalenttoarestrictedpsjqueryontherelationalformofthelogicalschema:

Q=πAσS(R1onR2on···onRn)

Intheaboveexample,Q=πF,F.areaσD=cs

oid(F.area

onworksin).

Expressiblequeriesobeythefollowingrestrictions:1.theyarerange-restricted,i.e.,allattributesinSandA

areattributesoftherelationsR2.eachattributeoftherelationsRi

intheprojectionlistA

iappearsatmostonce3.selectionsareconjunctionsofcomparisons(=,>,≥,<,≤)betweenattributesandconstants

4.joinsarenatural,i.e.,onlyattributeswiththesamenamearejoinedandallattributesmaintaintheirnameintheresult.Inparticular,self-joinsarenotallowed.Intheremainderofthepaper,weusethetermpsj-querytorefertoaquerythatconformstotheserestrictions.2.3Thequerylanguage

Weoftenusetheterm“logicalquery”torefertoqueriesposedonthelogicalschema.Inthispaper,weconsideronlylogicalquerieswritteninthesamelanguagethatisusedforgmapqueries.Thatis,theymustberestrictedpsj-queries.Inaddition,eachquerymustbetranslatableintoapsj-queryovergmapsorprojectionsofthem.Thus,wedonothan-dlecaseswherelogicalqueriesneedtobetranslatedintounionsorarbitrarysequencesofpsj-queries.Notethattrans-latedlogicalqueriesinvolverelationswitharbitraryarity(thegmapdata),whilegmapqueriesinvolvebinaryrelationsonly(correspondingtorelationships).2.4Examples

gmapscanbeusedtodefinearbitraryphysicalrepresenta-tions,includingthoseofaconventionalnormalizedrelationaldatabase,anobject-orienteddatabase,oranycombinationofthetwo.

Toillustratetheobject-orientedapproach,supposewewanttoclustertogetherallinformationabouteachfacultymember.GiventheobjectidentifierofaFacultyobject,weshouldbeabletoretrievepersonalinformationaswellastheobjectidentifiersofthefacultymember’sdepartment,advisees,andcoursestaught.

Agmapthatmeetsthesespecificationsmaybedefinedasfollows(Fig.2):

def_gmapfaculty_dataasheapby

givenFacultyselectStudent,Dept,Course,Faculty.area,Faculty.name

103

Dnameworks-inFareaadvisesSteachesCTAFig.2.Thefacultyclassextent

DFareaSCTAFig.3.Afacultyindexon”area”

whereFacultyworks_inDeptandFacultyadvisesStudentandFacultyteachesCourseAsecondaryindexinarelationalsystemcanalsobedefinedeasilyinourlanguage.Forexample,anindexonthefacultyareaisdefinedasfollows(Fig.3):

def_gmapfaculty_index_on_areaasbtreebygivenFaculty.areaselectFacultyNotethattheindexisnotdefinedintermsofthepreviousgmap,aswouldbethecaseinarelationaldatabase,butintermsofthelogicalschema.

Inthefacultydataexample,itmightbedesirabletoincludeinafacultymember’srecordthedepartmentnameinadditiontothedepartmentid,becauseforexample,thede-partmentnameisfrequentlyprintedalongwiththenameofthefacultymember.Thedepartmentnameis,inthiscase,anestedattributeoftheFacultydomain.Thisessen-tiallyimplementsfieldreplication(KatoandMasuda1992;ShekitaandCarey19),whichhasbeenshowntoofferseveraladvantages.Theonlychangenecessaryistoadd“Dept.name”totheselectclause.Onlyaslightmodi-ficationoftheearliergmapdefinitionisrequired(Fig.4):def_gmapfaculty_field_replasheapbygivenFaculty

selectFaculty.name,Faculty.area,Dept,Student,Course,Dept.namewhereFacultyworks_inDeptandFacultyadvisesStudentandFacultyteachesCourseSimilarly,supposeapplicationsfrequentlyaskforthelistingofthefacultyofaspecificdepartment.Inthiscaseweneedafastaccesspathfromthedepartmentnameto

104Dnamenameworks-inFareaadvisesSteachesCTA Fig.4.ReplicationofanestedattributeDworks-innameFSCTA Fig.5.Anestedindexthefacultydomain.Astructureforacceleratingsuchpathexpressionsiscalledanestedindex(MaierandStein1986;BertinoandKim19),whichallowsindexingonanestedattributeofadomain.Suchanindexiseasilyspecifiedasagmap(Fig.5):

def_gmapfaculty_nested_indexasbtreebygivenDept.nameselectFacultywhereFacultyworks_inDept.Inthepreviousexamples,thegmapdataincludedallFacultyinstances.However,therearecaseswherewefrequentlyaccessonlysomeinstancesofadomain.Object-orientedsystemsthatstoreinstancesinexplicitcollectionsratherthanclassextents(Careyetal.1988;MaierandStein1986;Orensteinetal.1992)allowthecreationofcollectionindices,whichprovidefastaccesspathsonlytothesubsetsofthedomainsthatareincludedinthecollection.Ourgmapdefinitionlanguageispowerfulenoughtoexpresssuchin-dicesbyusingrestrictions.Forexample,ifwewouldliketomodifythepreviouslydefinedindexonfacultyareasothatitkeepsonlydataforfacultyinthecomputersciencedepartment,thedefinitionwouldbeasfollows(Fig.6):def_gmapfaculty_index_on_areaasbtreebygivenFaculty.areaselectFacultywhereFacultyworks_inDeptand

Dept=cs_oidwherecsoidistheobjectidentifierofthecomputersci-encedepartment.

Fieldreplicationandpathindexingtechniquestypicallyimposerestrictionsrelatedtothelogicalschema.Forexam-ple,infieldreplication,thenestedattributeisrequiredtobeanestedpartoftheobjectinwhichitisreplicated.Fur-thermore,theobjectmustfunctionallydeterminethenested

D = csFworks-inareaSCTAFig.6.Acollectionbasedindex

Dnameworks-inFnameareaadvisesSteachesCTA Fig.7.Replicationofanon-functionalnestedattribute

attribute.Similarrestrictionsholdforpathindexing.Ourno-tionofqueriesenablesonetodescribeanyschemasubgraph,independentlyoftheedgeproperties.Theextrafreedomal-lowsspecificationofseveralusefulnovelstructures:–Astoragestructurethatreplicatesnestedattributesthatarenon-functionallydetermined.Forexample,wecanextendthefacultydatastructure(Fig.2)byrepli-catingthenamesoftheadvisedstudents(Fig.7):def_gmapfaculty_field_with_snamesasheapbygivenFaculty

selectFaculty.name,Faculty.area,Dept,Student,Course,Student.namewhereFacultyworks_inDeptand

FacultyadvisesStudentandFacultyteachesCourse–Acrossbetweenapathindexandanindexonacom-positekey.Itallowseachcomponentofthekeytobesuppliedbyaseparatepath.Forexample,thefollowinggmapbuildsanindexthatmapsarea/course-levelpairstofacultyinthatareathatteachessuchcourses(Fig.8):def_gmapfaculty_multi_fieldasbtreebygivenFaculty.area,Course.levelselectFaculty

whereFacultyteachesCourse.–Anarbitrarydecompositionofdataintheinheritancehierarchy.Forexample,thefollowinggmapclustersateachingassistant’snametogetherwithotherteachingassistantattributesthatdonotpertaintoarbitrarystu-dents(Fig.9):

def_gmapta_with_snameasheapbygivenTA

DFareaSteachesClevelTA Fig.8.AcompositekeypathindexDFSnameIsAClevelassistsTA Fig.9.AverticallydecomposedsubclassextentselectTA.level,Course,Student,Student.namewhereTAisaStudentand

TAassistsCourse

Acompletetaxonomyofexistingindexingschemesandotheradvancedstoragestructuresthatcanbedefinedbyus-inggmapsispresentedelsewhere(TsatalosandIoannidis1994).

3Querytranslation

Inthissectionwepresentthetranslationalgorithmofalog-icalqueryintoqueriesovergmaps.Wefirstintroducesomeadditionalnotationanddefinitions,andalsodiscusstwoaux-iliaryproblemsthatariseaspartofquerytranslation.3.1Notation

Forconvenience,πQσn···onweRrepresentalogicalpsj-queryQ=

k)asatriple󰀑QQisQ(Rthe1seto

r,Qs,Qp󰀒ofsets.isrofjoinedbinaryrelations{Rtheselectionpredicateσa1,...,Rsetofsimplerk},Qspredicates(comparisonsbetweenQrepresentedasattributesandconstants),andQpisthesetofattributesretainedbytheprojectionπWecallthesetQ,anditsmemberstar-Q.getattributes.ConsiderpthequerytargetforexamplethefollowingqueryQ(Fig.6):

givenFaculty.areaselectFacultywhereFacultyworks_inDeptand

Dept=cs_oidQisapsj-query:ifwecallRabandRbcthebaserela-tionscorrespondingtotheFaculty.areaandworksin

105

relationships,respectively,anda,b,ctheattributescorre-spondingtothedomainsFaculty.area,FacultyandDept,respectively,thenwecanrewritethequeryasQ=

πresentedabσc=csasoid(Rtheabtriple:no

Rbc).Usingournotation,Qcanberep-󰀑{Rab,Rbc},{c=csoid},{a,b}󰀒.(RecallthatFaculty.areaisoverloadedtoreferbothtoanattributeandtoarelationship.)

GivenaqueryQ,wefrequentlydealwithitspartthatincludesonlyrelationsinasetR,anditspartthatincludesonlyrelationsnotinR.ThesearedenotedQ[rel∈R]andQ[rel∈R]respectively.Forexample,Q[rel∈{Rb},{c=csoid},{Rσbc}]=󰀑{subsetofQbc}󰀒=πbc=csoidRbc.Similarly,thesthatmentionsonlyattributesinasetAisdenotedQs[attr∈A].Finally,thesetofattributesinasetofrelationsRisdenotedA(R).3.2Definitions

Thenaturaljoinoftwopsj-queriesPandQ,denotedPonQ,isthenaturaljoinoftheirresultrelations.Theadd-joinoftwopsj-queriesPandQ,denotedP⊕Q,isthepsj-queryP⊕Q=󰀑Pr∪Qr,Ps∪Qs,Pp∪Qp󰀒.Theadd-joindiffersfromthenaturaljoininthatitnaturallyjoinsthetwoqueryresultsbeforeanyprojectionhastakenplace.Theprojectionstepisappliedafterthejoinandretainsallcolumnsthatwouldhavebeenretainedbyeitherquery.Thus,thesamesetofcolumnsasinthenaturaljoinareproduced.

Notethat,ingeneral,ifweonlyhavethequeryresultsinhand,theadd-joincannotbecalculatedsinceitmayrequireknowledgeofsomeofthemissingcolumns.Considerfor

examplethequeriesQThenaturaljoinofQ1=RabandQ2=πac(Rabno

Rbc).whiletheiradd-joinis1andQsimplyR2isexpressionswillproducedifferentabno

RabresultsRonπac(RabonRbc),

bc.Ingeneralthetwoandtheadd-joinmaynotevenbeproduciblefromtheresultsofQ(sincethebattributeismissingfromtheresult1andQaloneofQ2LetR,Rrelationswithacommonattribute2).b.AninclusionabbcbetwodependencyfromRRabtoRbcexists,denotedRab.b⊆bc.b,ifeveryvalueofbinRabappearsalsoinRbc.

3.3Queryequivalence

Whentranslatingalogicalqueryintoaqueryovergmaps,weoftenneedtotesttheequivalenceofpsj-queries.Twopsj-queriesQ1andQ2areequivalent,denotedQ1≡Qproducethesameresultforanyvalidinstanceof2,iftheythedatabaseschema.Equivalencetestingofarbitraryconjunc-tivequeries,evenwithouttakingintoaccountanydependen-cies,isNP-complete(Ahoetal.1979;ChandraandMerlin1977).Ontheotherhand,wecanefficientlycomparetwopsj-queriessyntacticallytoseeiftheyareidentical(uptotrivialdifferencessuchastheorderingofthejointerms).Thisisasufficientconditionforequivalence,whichweuseinourtranslationalgorithm.Wearealsointerestedintwospecialcasesofequivalencetesting,wherepsj-queriesofspecificformsareinvolvedandvarioustypesofdependen-ciesaretakenintoaccount.Thesearediscussedinthenexttwosubsections,wheresufficientconditionsforequivalenceareprovided.

106

3.3.1Coverage

AqueryQcoversasetofrelationsRifQ[rel∈R]≡πA(R)(Q)

(1)

i.e.,whenthesubqueryinvolvingonlytherelationsinRisnotaffectedbyRab≡πab(RIngeneral,theableft-handno

theRrestofthequery.sideof(1)nisRForexample,if

bc),thenRaboabccovers{Rsupersetofab}.theright-handside.When(1)holds,thepartofthequerythatinvolvesrelationsnotinR(relationRbcinourexample)hasnoeffectonπA(R)(Q),inthesensethatitdoesnotfilteroutanytuplesproducedbytherestofthequery.Anal-gorithmthatimplementsnecessaryandsufficientconditionsfortestingcoverageispresentedelsewhere(Tsatalos1994).Thealgorithmmakesuseoftheinclusiondependenciesoftheschema.

3.3.2Naturaljoinversusadd-join

Ingeneral,ifQ.However,1andQinthe2aretwopsj-queries,Q1⊕Q2⊆Qstraints,1onQ2Q1⊕QqueriesQ=R2≡Q1no

presenceQofcertainintegritycon-2.Forexample,considerthe1abandQ2=πac(RabonRbc).Ingeneral,QQ1onQ2=Rab1⊕Q2=RnonRπac(RabonRbc)isnotequivalentto

bc.If,however,bisfunctionallyde-terminedbyaabino

Rab(thatis,aisakeyforRab)thetwojoinsareequal.Intuitively,theinformation“lost”bypro-jectingawaythebattributeinπac(RabonRbc)onQcanbecompletelyrecoveredfromtheremainingattributes(ainthiscase).

Detectingwhenthenaturaljoinoftwopsj-queriesisequivalenttoapsj-queryisveryimportantinourquerytrans-lationalgorithm,sinceitallowsustorewritethejoinoftwogmaps(whicharepsj-queries)asapsj-query.Thealgorithmiterativelyperformssuchrewritingsinordertoexpressthejoinofseveralgmapsasapsj-query,whichisthencheckedsyntacticallyforequivalencewiththeuserquery.Asasuf-ficientconditionforguaranteeingthatthenaturaljoinoftwoqueriesisapsj-query,wetestifitisequivalenttotheadd-joinofthequeriesinquestion.Analgorithmthatim-plementsasufficientconditionfortestingthisequivalenceispresentedelsewhere(Tsatalos1994).3.4Querytranslationalgorithm

Belowwepresentanalgorithmtotranslatealogicalpsj-queryintoqueriesovergmaps.ForthesakeofclaritywedefermostefficiencyconsiderationstoSect.4.

Algorithm1.Givenapsj-queryQandasetofpsj-queriesG,findsubsets{G1,...Q≡πQpσQs(πA(Qr)G1no

,G·n·}·on⊆πG,suchthat

A(Qr)Gn)1.letH={G∈GsuchthatGGp∩A(Qr∩Gr)=/∅and

2.Qs[attr∈A(Gr)]=s[attr∈A(Qr)]∪Qs[attr∈G3.p]and

GcoversQr}

4.foreachsubset{G1,...,Gn}ofHdo

5.letS={πA(Qr)σQsG1,...,πA(Qr)σQsGn}6.whilethereisG,H∈SsuchthatGonH≡G⊕H7.replaceGandHinSby8.

ifS={Q′

GonH

}whereQ≡πQp(Q′)thenaccept{G}asasolution

1,...,GnThealgorithmfirstnarrowsdownitssearchspacetogmapsthathavesomethingtodowiththequery(lines1–3).Morespecifically,agmapmusthaveatleastonerelationincommonwiththequery,withatleastoneattributeofthatrelationincludedinthegmapresult(line1);thequeryselectionsonattributesofthegmaprelationsmustbeeitheronthetargetattributesofthegmap(sothattheycanbeappliedonthem)oridenticaltoselectionsthatthegmapitselfhas(line2);thegmapmustcovertherelationsthatithasincommonwiththequery(otherwise,thegmapwillnothavealltheinformationneededbythequery;line3).

Eachpossiblesubsetoftherelevantgmaps(line4)givesrisetoasinglecandidatetranslation(assumingthatselec-tionsarealwayspushedthroughthejoins):πQp(πA(Qr)σQsG1no

···onπA(Qr)σQsGn)(2)

Therestofthealgorithmtestswhetherornotthisquery

expressionisequivalenttothegivenlogicalquery.Eachjoinoperandin(2)isaprojectionandaselectiononagmap.Sinceweverifiedearlier(line2)thatthequeryselectionscanbepushedthroughthegmapprojections,thejoinoperandsarepsj-queries.Thealgorithmtriestoexpresstheirjoinasapsj-queryaswell.Thejoinoperandsarescanned(line5)andanypairwhosenaturaljoinisequivalenttoitsadd-join(line6)isreplacedbyasinglejoinoperandwhichisagainapsj-query(line7).Thesetofjoinoperandsthuskeepsreducing.Atsomepoint,thesetcannolongerbereduced,eitherbecausethereisjustonepsj-queryleftorbecausethereisnopairthatsatisfiestheequivalencetest(line6).Intheformercase,theremainingpsj-queryisequivalenttotheinitialjoinexpression(2)afterperformingonefinalprojectionstep(πQp)andcanbesyntacticallycheckedforequivalence(line8)withthelogicalquery.Inthelattercase,thesubsetchoseninline4isrejected.

Proposition1.Givenapsj-queryQandasetofpsj-queriesG,eachsubset{G1,...,Gn}⊆GgeneratedbyAlgorithm1,satisfiesthecondition:

Q≡πQpσQs(πA(Qr)G1no

···onπA(Qr)Gn)BecausethereareexponentiallymanysubsetsofH,Al-gorithm1requiresexponentialtime.However,checkingif

agivensubsetofgmapscanformasolution(lines5–8)takespolynomialtime.Inthenextsection,weshowhowwecanrunthealgorithminconjunctionwithaconventionaloptimizertoavoidenumeratingallsubsets.

Anexamplemayhelpillustratethealgorithm.ConsideraqueryQthatasksforthenamesanddepartmentid’sofstudentsattendingall500-levelcourses:

def_queryQbyselectStudent.name,DeptwhereStudentattendsCourse

andStudentenrolledDeptandCourse.level=500

Inourformalrepresentation,

Qr={attends,enrolled,Course.level},Qs={Course.level=500}and

Qp={Student.name,Dept}.

Supposethedatabaseconsistsofthreegmaps:anindexGG1fromthenamesofstudentstotheirdepartments,anindex2fromthenamesofstudentstocoursesthattheyattendandthelevelsofthosecourses,andanindexGleveltocoursesatthatlevel,togetherwith3fromacourse-thedepartmentsthatsupplystudentstothosecourses.Theirdefinitionsfollow:def_gmapG1asbtreeby

givenStudent.nameselectDeptwhereStudentenrolledDeptdef_gmapG2asbtreeby

givenStudent.nameselectCourse,Course.level

whereStudentattendsCoursedef_gmapG3asbtreeby

givenCourse.levelselectDept,Course

whereStudentattendsCourseandStudentenrolledDept

Allthreegmapsarerelevanttothequery(theypassthetestsoflines1–3).Consider,forexample,G=Gasatriple,G=󰀑{Student.name,Course,Course.2.Writtenlevel},∅,{Student.name,attends}󰀒.Line1issat-isfiedbytheattributeCoursewhichappearsbothinGandintherelationshipattendssharedbyGQpline2,notethatGattr∈Grandr.ForCourse.levels=∅,andQ=500s[}.Tocheckp]=Qthats[attr∈A(Gr)]={Gcov-ersQr(line3),notethatG[Qr]isGmodifiedbythere-movalofStudent.name(bothasanattributeinGarelationshipinGpandaname,G[Qr).IfweassumethateveryStudenthasr]isequivalenttoπA(Qr)(G),whichistheresultofprojectingouttheStudent.nameattributefromG.

ThealgorithmconsiderssubsetsoftherelevantgmapsG1,G2,G3.Consider,forexample,thesubset{Gcandidatesolutioncorrespondingtothiscombination1,G2}.TheisthenaturaljoinofG1andG=500followed2followedbyaselec-tionforCourse.levelbyaprojection.Theloopoflines6and7willbeexecutedoncewhetherGStudent.name1no

Gtocheck

2≡Gfunctionally1⊕G2.ThistestwillfailunlessdeterminesStudent;oth-erwise,twotuplesthatjoinontheStudent.nameneednotjoinontheirStudentidaswell.IfStudent.namefunctionallydeterminesStudent,thenthejoinontheStudentidisirrelevant:wecanprojectoutthatattributebeforeperformingthejoin,whichimpliesthattheadd-joinisequivalenttothenaturaljoin(line6).Assumingthatthisisthecase,line8ofAlgorithm1eventuallyconcludesthatthecandidatesolutionisacorrectone.

Followingthesameprocess,thealgorithmrejectsthesubsets{G1,G3}and{G2,Gadd-joinandnaturaljoinaredifferent.3},becausethecorrespondingHowever,thecombi-nation{G1,G2,G3}isacorrectsolution.Duringthecourse

107

oftheloopoflines6and7,thealgorithmtestsallpairsofgmapsinthissubsettocheckwhetherornottheiradd-joinisequivalenttotheirnaturaljoin.Aswesaw,allthepairsfail,exceptforG(G1⊕G2.Inthenextiteration,thepair(G1⊕G2,G3)isconsideredandthealgorithmverifiesthat

1⊕G2)no

GInterestingly,3≡(Gthesolution1⊕G2)⊕Gusing3.allthreegmapsislikelytobemoreefficientthantheonethatusesonlyGbecausetheindexonCourse.levelinGaccelerate1andG2,theselectioninthequery.Thenextsection3willshowshowagmap-equippedoptimizeridentifiesandprunestheinferiorplan.

4Integrationwithaqueryoptimizer

ThepresentationofAlgorithm1emphasizesclarityattheexpenseofefficiency.Itimpliesthatallsubsetsofthegmapsareenumeratedinrandomorderandeachistestedtoseeifitprovidesasolutiontotheequation.Allsubsetsthatpassthetestarefeasibleplans.Theversionofthealgorithmthatisactuallyimplementedbyoursystemisconsiderablymoresophisticated.Itisintegratedwithaconventionaldynamic-programmingqueryoptimizer(Selingeretal.1979),whichcontrolstheorderinwhichsubsetsareevaluatedandusescostinformationandintermediateresultstoprunethesearchspace.

Aconventionaldynamic-programmingoptimizeritera-tivelyfindsoptimalaccessplansforincreasinglylargerpartsofaquery.Wefollowthesestepsinmoredetail,showingateachstepwhatneedstobechangedforagmap-equippeddatabase(Table1).WethenidentifythepiecesofAlgo-rithm1thatcorrespondtothesechanges.Inwhatfollows,forsimplicity,weavoidanydiscussionof“interestingor-ders”(Selingeretal.1979).Wealsousethetermcompletesolutiontorefertoagmapaccessplan(i.e.,aspecificse-quenceofjoins,togetherwiththemethodusedforeachjoin)thatisequivalenttothelogicalquery,andpartialsolutionforagmapaccessplanthatcouldpotentiallybeenhancedtobecomeacompletesolution.Apartialsolutiondoesnotnec-essarilyhavetobeapsj-query;itmaybethatnoreorderingofitsjoinsmakesthemequivalenttoadd-joins.

Likeaconventionaloptimizer,thegmapoptimizeronlyattemptstojoinapartialsolutionwithgmapsthatsharepro-jectedattributeswithit,thusavoidingCartesianproducts.EachstepinthegmapoptimizercorrespondstoapartofAlgorithm1.Step(a1)ofthefirstiterationcorrespondstolines1–3ofAlgorithm1;itfindsallgmapsthatarerelevanttothequery.Theremainingstepsofalliterationsrepresenttherestofthealgorithm.Movingfromiterationtoiteration,step(c)ofeachiterationcorrespondstoaspecificimple-mentationofline4,whereallsubsetsofrelevantgmapsthatarenotprunedonthewayareexploredinincreasingsize.Afterthefirstiteration,step(a1)formsthejoinsofthesesubsets(solutions)andstep(a2)correspondstolines6–8,wherethesesolutionsareexaminedforcompleteness.Step(a2)canbeimplementedincrementallytakingintoaccounttheresultsofearlieriterationsonsmallerpartialsolutions.Notethatstep(b)ofeachiterationhasnocounterpartinAlgorithm1becauseitdealsonlywithpruningthesearchspaceandnotwithtranslation.Implementingthisstepis

108

Table1.Stepbystepcomparisonofaconventionaloptimizerversusonedesignedforagmap-equippeddatabaseConventionaloptimizerIteration1

Foreachqueryrelation:

a)Findallpossibleaccesspaths.

b)Comparetheircostandkeeptheleastexpensive.

c)Ifthequeryinvolvesonlyonerelation,stop.c)Iftherearenopartialsolutions,stop.Iteration2

Foreachqueryjoin:

a)Considerjoiningtherelevantaccesspathsfoundinthepreviousiterationusingallpossiblejoinmethods.

b)Comparethecostoftheresultingjoinplansandkeeptheleastexpensive.

c)Ifthequeryinvolvesonlytworelations,stop.Iteration3···

Iteration2

a1)Considerjoiningallpartialsolutionsfoundinthepreviousiterationwithanothergmapusingallpossiblejoinmethods.

a2)Distinguishpartialandcompletesolutionsamongresultingjoins.b)Compareeachnewlygeneratedsolutiontoallothersolutions.Ifanysinglegmaporgmapcombinationhasneitheragreatercontributiontothequerynoralowercostthananother,pruneit.c)Iftherearenopartialsolutions,stop.Iteration3···

gmapoptimizerIteration1

a1)Findallgmapsthatarerelevanttothequery.

a2)Distinguishbetweenpartialandcompletesolutionsamongthem.b)Compareallpairsofgmaps.Ifonehasneitheragreatercontributiontothequerynoralowercostthantheother,pruneit.

notstraightforwardbecauseitinvolvesnotonlythecostbutalsothecontributionofsolutionstothequery.Contributionsofpartialsolutionscanbecomparedonthebasisoftheirpiecesthatcorrespondtopsj-queriesandthesetofattributesintheirresult.Wheneachpieceofapartialsolutionhassubsetsoftherelationsandprojectedattributesofapieceofanotherpartialsolution,thentheformercontributeslessandcan,therefore,beremovedfromfurtherconsiderationifitalsohasahighercost.Querysignatures,anencodingofthenamesofalltherelationsusedbythequery,canbeusedtoperformthesecomparisonsefficiently(Finkelstein1982).Itisinterestingtoseehowthenewalgorithmbehaveswhenitisgivenasetofgmapsthatrepresentsatraditionalrelationalphysicalschema.Assume,forexample,thatonegmapisafilecontainingtheextentoftheFacultyrelationwithallassociatedattributes,

def_gmapfaculty_relationasheapbygivenFacultyselectFaculty.name,Faculty.area,Dept

whereFacultyworks_inDeptwhileanothergmapisasecondaryindexontheFaculty.areafield,

def_gmapfaculty_index_on_areaasbtreebygivenFaculty.areaselectFacultyAssumethatthelogicalqueryrequeststhenamesofallfacultyinthedatabasearea.Duringthefirstiterationbothgmapsareconsidered.Scanningtherelationextentwouldbefarmoreexpensivethanaccessingtheindex,butthetwosolutionsarenotcomparable.Sincetheindexsimplyre-turnsFacultyids,itisnotadequatetoanswerthequery,whiletheextentis.Duringtheseconditeration,theindex(theonlypartialsolutionleft)isconsideredforajoinwiththeFacultyextent.ThejoinwouldbelessexpensivethanscanningtheFacultyextent,whilebothplansareequiv-alenttothelogicalquery.Thus,thesolutionfoundduringthefirstiterationiseliminatedinthesecond.Atthispoint,

thereisnopartialsolutionleftandthealgorithmends.Thisexampledemonstratesthataccessplansthatareprunedinaconventionaloptimizerarealsoprunedinitsenhancedver-sion.However,sinceanaccessplanconsideredatiterationnintheconventionalversionmaycombinemorethanngmaps,itmaybeconsideredatalateriterationintheen-hancedversion,thusdelayingpotentialprunings.Ingeneral,weexpecttheperformanceofthemodifiedoptimizertobesimilartotheperformanceoftheconventionalone.Ourex-perienceobtainedbyusingtheoptimizerfortheexamplesshowninSect.8supportstheprediction.5Updatepropagation

Relationalsystemsmitigatedependenciesbetweenthelogi-calandthephysicalschemathroughtheuseofstoredqueriesdefining“virtual”relationscalledviews,andusersexpresstheirqueriesintermsoftheseviews.Withthisapproach,thelogicalschemabecomesa(relational)functionofthephys-icalschema.Viewupdates,however,aredifficultorimpos-sibletosupport.Theusualsolutionistorequireupdatestobeexpressedintermsoftheunderlyingschema.

Ourapproachistheinverseoftheabove.Wedefinethephysicalstructuresasfunctionsofthelogicalschema.Althoughquerytranslationbecomesmorecomplicated,wehaveshownabovethatitisstillfeasibleandcanbeinte-gratedwiththeoptimizationstageofaconventionalsystem,addinglittleoverheadtothepreparationofqueryplansandnooverheadtotheexecutionofthoseplans.Inaddition,up-datesbecomemuchsimpler.Translatingthemintothephys-icalschematurnsintothematerializedviewmaintenanceproblem,whichadmitssimplesolutions.

Asdiscussedelsewhere(Blakeleyetal.19),propagat-ingupdatesintomaterializedviewsrequirestheexecutionofqueriesoverthebaserelationsandtheinsertedordeletedtuples.However,herewedonotnecessarilyhavethebaserelationsstored,andtheactualdatamaybereplicatedinmanyplaces.

Inthis,sectionwefirstdescribehowalogicalupdatecanbespecified,andtherestrictionsthatitmustobey.Thenweshowhowthelogicalupdatecanbetranslatedintoaphysicalupdateovergmaps,andhowthistranslationcanbeintegratedwiththequeryoptimizertoofferageneralpur-poseupdatepropagationmechanism.Finally,weillustratetheupdatepropagationprocessusinganexample.5.1Specifyingupdates

Insertionsarespecifiedbysupplyingaquery(theupdatequery)andasetoftuplestobeinserted(theupdatedata),correspondingtothetargetattributesofthequery.Thedatabasemustbeupdatedinsuchawaythatthechangeintheresultsofthequerybetweentheoriginalandupdateddatabaseispreciselythesetoftuplesintheupdatedata.Deletionsaredefinedsimilarly,withtherolesof“original”and“updated”databasereversed.NotethedifferencefromthequeryusedwhenspecifyingupdatesinSQL-likelan-guages,inwhichthequeryisusedtogeneratetheupdatetuples.Thequeryheredescribesonlythe“schema”ofthetuples.Sincetheupdatedatacanbetheresultofanotherquery,nogeneralityislost.

Forexample,studentscanbecomeenrolledincoursesbysupplyingasetof(StudentId,CourseId)pairsandtheupdatequery

def_queryenroll_studentasselectStudent,Course

whereStudentattendsCourse

Describingtheschemaoftheupdateusingaquery,al-lowsustosupportupdatesthatdonotdependonaspecificphysicalschema.Forexample,relationaldatabasesallowtheinsertionofentityinstancesbecausetheyassumetheexis-tenceofanentityextent.Object-orientedsystemsalsohavesimilarrestrictions.Inourcase,bydescribingtheupdateasalogicalqueryweavoidanyphysicaldatadependence.How-ever,itisisnotourintentiontosupportupdatesdefinedbyarbitraryqueries,sincethatcouldleadintotheviewupdateproblem.Weonlywanttousethequerytodefineapatternonthelogicalschemawhichrepresentsthe“schema”oftheupdate.Thusweneedtoimposethefollowingrestrictions:Noselections.Ifselectionswereallowed,theycouldonlysupplyinformationthatwouldberedundanttoorin-consistentwiththeupdatedata.Forexample,theupdatequerydef_querydefine_undergraduate_coursesasselectCourse,Course.name,Course.level

whereCourse.level<700merelystatesthatalltuplesintheupdatedatahavealevelfieldcontainingavaluelessthan700.

Noprojections.Allattributesofthequeryrelationsmustbeincludedinitstarget.Forexample,aninsertionrequestdefinedbythequery

def_queryadd_graduate_studentasselectStudent.year,CoursewhereStudentattendsCourse

109

andtheupdatedata{[5,cs7]}requeststhataftertheupdate,thereshouldbeanadditional5th-yearstudenttakingthecoursewiththeindicatedid,butgivesinsuf-ficientinformationtoconstructsuchastudent.Suchanupdatehasambiguoussemanticsandthusshouldnotbeallowed.

Notuple-generatingdependencies.Therelationdefinedby

theupdatequeryshouldnotsatisfyanytuple-generatingdependenciesUllman1988).Forexample,therelationdefinedbytheupdatequery

def_queryassign_work_to_facultyasselectFaculty,Student,CoursewhereFacultyadvisesStudent

FacultyteachesCoursesatisfiesthemultivalueddependencyFaculty→→Student.IfthefacultymemberFwasteachingcourseC1beforetheupdate,thetuple[F,S,CS,C2]cannotbeaddedwithoutalsoadding[F,includedonlythefirstofthesetuples,1].Iftheupdatedatatherewouldbenowaytoperformtheupdateaccordingtothesemanticsdefinedabove.

Evenwiththeserestrictions,wecannotcompletelyguar-anteethevalidityofanupdate.Forexample,ifwehavenotdefinedanygmaptostoreFacultydata,aninsertionofaFacultytuplewouldbeinvalid.IfweuseasinglegmaptostoretheFacultyandtheDeptdata,thenwewillnotbeabletoinsertafacultymemberwhodoesnotworkinanydepartment.Althoughthesemanticsoftheupdatequerydependsonlyonthelogicalschema,itsvaliditymaydependonthechoiceofgmapsusedtodefinethephysicalschema.Inparticular,thephysicalschemamusthavesufficient“in-formationcapacity”toholdtheinserteddata(Milleretal.1993).Theseissuesarenotaddressedinthispaperandareasubjectoftheauthors’ongoingresearch.Furthermore,wedonotaddressthequestionofwhethertheupdateconformswiththeintegrityconstraintsofthedatabase.Specificgmapstructuresorgmapcombinationsmayhelpacceleratetestingtheintegrityconstraints.However,here,weassumethatanexternalmechanismverifiesthattheupdatesdonotconflictwithexistingfunctionalorinclusiondependencies.5.2Updatetranslation

ConsideranupdatequeryUandagmapqueryGthathave

somerelationsincommon,i.e.,U/∅.Ourgoaliscalculatetheeffectoftheupdater∩Gonther=

todatastoredinG.Giventherestrictionsthatweimposedontheupdatequery,notablythefactthattheupdatequeryUmaynotcontainanyprojectionsorselections,wecaneasilyfactorGasG=πGpσGs(Uno

I)(3)

whereIisthepartofthegmapthatisirrelevanttothe

update:

I=󰀑Gr−Ur,Gs[A(Gr−Ur)],A(Gr−Ur)∩(Gp∪A(Ur))󰀒

Whilecomplicated,thisformulaisessentiallythesameasG[rel∈Ur];theonlydifferenceisthattheprojectionstep

110

ofIretainsadditionalattributesinthetarget,tomakesurethatallneededcolumnsjoinwithU.

Equation(3)allowsustocalculatethetuplesneededtobeinsertedordeletedfromthegmapGusingonlythedatainthedatabasebeforetheupdateandtheupdateitself.IfweviewGandUasthemultisetsoftuplesthatareproducedbythequeriesGandU(assumethatmultisetprojection,selectionandjoinoperatorsareusedinthequeries),respec-tively,andweusethesubscriptsoldandnewtorefertosetsproducedbeforeandaftertheupdate,thenwecanrewriteEq.(3)as

Gnew=πGpσGs(Unewno

Inew)=πGpσGs((Uold+∆U)on(Iold+∆I))

where“+”representsunionofmultisetsiftheupdateisaninsertion,andmultisetdifferenceiftheupdateisadeletion.SinceIisirrelevanttotheupdate,∆I=∅.Furthermore,multisetprojection,selectionandjoin,distributesovermul-tisetunionanddifferenceyieldingthefollowing:Gnew=πGpσGs((Uold+∆U)onIold)

=Gold+πGpσGs(∆UonIold)

Finally,sinceIisapsj-query,itcanbetranslatedusingAlgorithm1intoaqueryexpressionovergmaps.Thus,thechange∆Gofthegmapcontentscanbeproducedbythequery

∆G=πGpσGs

(∆UonπIpσIs(πA(Ir)G1on···onπA(Ir)Gm)),

(4)

whichonlyusesexistinggmapsandtheupdatedata.

Equation(4)canbeusedfordeletionsaswellasinser-tionsifthegmapswhosetargetdoesnotfunctionallydeter-minetheirotherattributesmaintaintheirdataasmultisets(thatis,theyrecordduplicatesormultiplicities).5.3Updatepropagationalgorithm

Belowwepresenttheupdatepropagationalgorithmwhichusesthetranslationmechanismdescribedabove.Forthesakeofclaritywedefermostefficiencyconsiderationstotheendofthissection.

Algorithm2.GivenalogicalupdatespecifiedbytheupdatequeryUandasetofgmapsG,findtheupdatethatneedstobeperformedoneachofthegmapsinG.

1.foreachG∈Gdo

2.letI=󰀑Gr−Ur,Gs[A(Gr−Ur)],A(G))󰀒

r−Ur)∩(G∪A(Up

3.ifI≡Grthencontinue//noupdateisneeded4.translateIintoaqueryQovergmapsinGusing

Algorithm1

5.executethequeryπnQ)6.insert/deletealltuplesGpσgeneratedGs(Uo

bythequeryinto/from

GForeachgmapG,thealgorithmfirstidentifiesthepartIthatisirrelevanttotheupdate(line2).Ifthewholegmapisirrel-evant,noupdatepropagationisneeded(line3).Otherwise,

weuseAlgorithm1totranslateIintoaqueryexpression

overgmaps(line4).Theupdatedataarejoinedwiththeresultofthisquery,producingamultisetoftuples(line5).Iftheupdateisaninsertion,thetuplesareinsertedintoG;otherwisetheyaredeletedfromG.

Thereareafewperformanceissuesthatweneedtoad-dress.First,thetranslationofIisperformedthroughtheenhancedqueryoptimizer.Theoptimizerguaranteesthattheproducedplanwillbeoptimal:anyavailablegmapthatcanacceleratequeriescanalsobeusedtoacceleratetheper-formanceofupdates.Second,inmanyapplications,thereisafixedsetoflogicalupdatesthatareusedregularly(suchascoursesbeingcreatedorstudentsenrolling).Thesecom-monupdatequeriescanbetranslatedinadvanceandstoredasqueryplans.Theyonlyneedtoberecompiledwhentheupdatequeryorthephysicalschemachanges.

Finally,thealgorithmpresentedaboverequiresthattheupdategmapholdstheleftmostpositioninthejointree,whichmaynotbetheoptimaljoinorder.Forexample,ifweareinsertingalargenumberoftuples,itwouldnotbeadvisabletoloadtheupdategmapfirst.Since,however,wehavemodeledtheupdateasagmap,wecanletthequeryoptimizerselecttheappropriatejoinorderforthatgmapaswell.Forthis,thequeryoptimizerneedstobeextendedwiththenotionofarequiredgmapwhichshouldbeusedinanyplanproducedbytheoptimizer.Then,fortheupdatetranslation,wecanflagtheupdategmapasrequiredanduseittogetherwiththeremainingdatabasegmapstocompilethegmapqueryG.5.4Updateexample

Weillustratethealgorithmabovewithanexample.Considertheupdatequeryenrollstudentpresentedearlier:

def_queryenroll_studentbyselectStudent,Course

whereStudentattendsCourse

Assumethatourdatabaseconsistsoftwogmaps,onethatmapsfacultymemberstocoursesthattheyteach,

def_gmapFCasbtreeby

givenFacultyselectCoursewhereFacultyteachesCourse

andonethatrecordsthestudentsandteacherofeachcourse,def_gmapCFSasbtreeby

givenCourseselectFaculty,StudentwhereFacultyteachesCourseand

StudentattendsCourseTopropagatetheupdatetothedatabase,weconsidereachdatabasegmapseparately.ThegmapFCisnotaffectedbytheupdatesinceithasnocommonrelationswiththeupdatequery.TheupdatestogmapCFSdependbothontheupdatedataandontheexistingcontentsofFC.Weneedtoappendtoeach(CourseId,StudentId)pairintheupdatedatathefacultymemberwhoteachesthecoursebeforeitisaddedtoCFS.Thealgorithmconstructsthetuplestobeinsertedbyconsideringthepartofthegmapthatisnotaffectedbytheupdate,i.e.,thequery

def_queryQbyselectCourse,

FacultywhereFacultyteachesCourseandfindingatranslationforit.QueryQcandefinitelybeansweredbyusinggmapFC,andthusthetuplestobeinsertedintoCFSarefoundbyjoiningtheupdategmapenrollstudentandthegmapFC.gmapCFScannotbeusedasthesourceoftheneededinformationbecauseCFSwillnotcontain(CourseId,FacultyId)pairsforcoursesthatdonotyethaveanystudents.Line3ofAlgorithm1testswhetherCFScoverstherelationteaches.Aslongasthereisnoinclusiondependencyfromteachestoattends,theCFSgmapwillberejected.6Applications

InSect.2,wedemonstratedhowgmapssubsumethefacili-tiesofprimaryandsecondarystoragestructuresinconven-tionaldatabasesystems.Inthissection,weoutlineavarietyofotherapplicationsthatusetheintegratedquerytranslation-optimizationengineforqueriesandupdates.6.1Databaseloading

Existingdatabasesoftenprovidespecialfeaturestosupportbulkloadingofdata.Thesefacilitiestendtobeadhocandimposerestrictionsontheformatoftheimporteddata.Theusermustmanuallytranslateallimporteddataintofilesthatmatchtheprimarystoragestructuresandthenloadeachoneindividually.

Iftheimporteddatacanbedescribedasapsj-queryonthelogicalschema,initialloadingcanbeviewedasaspe-cialcaseofphysicalschemaevolution.Theimporteddatafilescanbeviewedasinefficientgmapstructures.The“real”gmapsusedtostorethedatapermanentlyareloadedbyrunningtheirqueriesagainsttheimportedfiles.Forexam-ple,toruntheexperimentdescribedinSect.8,wepopu-latedourdepartmentdatabasewithdatafromavarietyofsources.CourseenrollmentswereretrievedfromanIMSdatabasemaintainedbytheregistrar.FacultyadvisingdatawerefoundinanIngresdatabase.TAassignmentswerekeptinahand-maintainedASCIIfile,andsoon.AllthesedataweredumpedintoUnixfilesinasimpletextualformat.WethenwroteasetofinterfacefunctionsthatallowedustoviewtheseUnixfilesasanewgmapstructure.Thefunc-tionsallowediterationthroughthetuplesandstatisticsin-quiries(suchasacountoftuples)forthebenefitofthequeryoptimizer.Duringloading,alldatawereautomaticallycom-bined(e.g.,joinedandprojected)tomatchtheirfinalformatsandthenbulk-loadedintothestoragestructures.Thequery-processingengineguaranteedthatalldatawereprocessedoptimally.

6.2Databaseinteroperability

Inthepreviousparagraph,weshowedhowtocreateathin“veneer”overUnixtextfilestomakethembehavelikegmaps.Forpurposesofbulkloading,asimpleread-only

111

interfacesufficed,butwecouldhaveaddedfunctionsforin-serting,deletingandassociativelyaccessingtuplesinthesefiles.Inshort,withverylittlework,atextfilecanbemadetoemulateagmap.Thisideacanbeextendedtosupporthet-erogeneousstorageorganizationsbyhidingtheirdifferencesunderthegmapabstraction.Aslongasthedatacontentsofallthesedistinctsourcescanbedescribedbypsj-queriesoverasinglelogicalschema,thequeryoptimizercantrans-latelogicalqueriesintoaccessplansoverthem.ThisstrategyhasmanysimilaritieswiththeADMSsystem(Roussopou-los1993)whichusestheconceptofviewindexing(Rous-sopoulos1982)toefficientlyrepresentmaterializedviews,togetherwithasemanticqueryoptimizertosupportdatabaseinteroperability.

6.3Mainmemorycaching

Anotherapplicationofgmapsistosupportcacheddataintransientmain-memorydatastructuressuchasarraysorhashtables.Ifthecontentsofthestructurecanbedescribedbyapsj-queryonthelogicalschema,itcanbetreatedlikeagmap.Thefactthatitisnotpersistentsimplymeansthatitsdatamustbereplicatedinother(persistent)gmaps.Manydatabaseapplicationscachepartsofthedatabase,butbe-causeofrestrictionsindatareplication,thesedatacannotbepartofthephysicalschema,andthushavetobeaccessedandmaintainedmanually.

6.4Acceleratingcomplexstructureupdates

Althoughpathindicesareusefulforacceleratingcommonqueries,theyareexpensivetomaintain.Previousproposalsforbothnestedindicesandfieldreplicationsuggestincludinglinksalongtheindexedpathtobeusedduringupdateprop-agationtoacceleratethejoinsrequired.Inbothcases,thedatastructuresandthealgorithmsformaintainingthemarecustom-madefortheapplication.Wecanachievethesameeffectbyusingsimplegmapsalongthepathsthatneedtobetraversedduringtheupdatepropagation.Gmapsplacedatpointswhereexpensivejoinsareperformedactlikejoinacceleratorsinthesamewaythatinternallinksacceleratejoins.Inaddition,theyaremuchmoreversatileinthattheycanbeusedbyarbitraryqueriesthatinvolvethesejoinssincetheyarenottiedtoanyspecificlargerstructure.

Suppose,forexample,thatweuseaheaptostorecoursedataandalsoreplicatethenameanddepartmentnameofthefacultymemberwhoteacheseachcourse:

def_gmapcourse_dataasheapbygivenCourse

selectCourse.name,Course.level,Faculty.name,Dept.name

whereFacultyteachesCourseand

Facultyworks_inDeptWhenfacultymembersmovetoadifferentdepartment,weneedtoupdatethecoursedataforalltheircourses.ShekitaandCarey(19)suggestaddingbackwardlinksfromfacultytotheirrespectivecourses;ineffect,advocat-ingapointer-basedjoin.

112

Wepropose,instead,toaddagmapfromfacultytocourses:

def_gmapcourses_indexashash_tablegivenFacultyselectCoursewhereFacultyteachesCourseAlthoughtheperformanceofthisapproachmaynotmatchthatofapointerjoin[ithasbeenshownthatpointerjoinsaregenerallyfasterthanjoinsacceleratedthroughanexternalindex(Careyetal.1990)],itofferssignificantadvantages.First,thegmapscanbeusednotonlyforupdatingthestruc-turebutbyanyotherqueryaswell.Second,ifusagepatternschangesothatmaintenanceoftheacceleratorisnotbene-ficial,wecansimplyremoveit.Inthehardwiredapproach,wewillalwaysbestuckwiththesamemachinery.Finally,ourapproachallowstheusertoplacejoinacceleratorsatindividuallychosenpoints.Itismuchhardertoachievethisflexibilitywiththehardwiredapproach.6.5Supportforcomplexobjects

Usingupdatequeriestodescribetheschemaofthemodifieddatafacilitatesthehandlingofcomplexobjects.Acom-plexobjectinsertioncanbedescribedasanupdategmapwhosequerydescribesthecomplexobjectschema.Iftheinsertedobjectconsistsofseveraltuplesofdata,thetuplesarenotinsertedoneatatime.Instead,thequeryplantreatstheupdategmaplikeanyotherphysicaldatacontainerandperformsbulkoperations.Theresultofthequeryplanex-ecutionisasetoftuplestobeinsertedintoeachphysicalstoragestructure.Anyavailablebulk-loadinginterfacestothesestructurescanbeexploited.7Implementation

Toverifytheapplicabilityandpracticalityofouralgorithmsandobtainafeelingfortheirperformance,webuiltaproto-typeimplementationofoursystemontopofSHORE(Careyetal.1994).SHOREisanobject-orienteddatabasesystemunderconstructionattheUniversityofWisconsin-Madison.WeextendedSHOREwithfacilitiestosupportgmapsandtranslateandexecutequeriesandupdates.Logicalschemadefinitionsareparsedandstoredinalogical-schemacatalog.Physicalstoragestructuresarecreatedfromgmapdefinitions.Parsedgmapqueriesarestoredpersistentlyinasecondphysical-schemacatalog.Thedataorganization,keys,andrecordformatarealsodeterminedbythegmapdefinitions.Gmapsarecreatedandpopulatedbyprocessingupdaterequests.

WehavebuiltaqueryprocessorusingthealgorithmsinSects.3and4.Foralltheexamplespresentedinthispa-per,querytranslationaddsonlyanegligibleoverheadtotheoverallquerycost.Thequeryprocessoralsocontainshookstosupporttheupdateprocessor,asoutlinedinSect.5.Theupdateprocessoracceptsthreelistsofgmaps:updategmaps,targetgmapstobeupdated,anddatabasegmapsthatmaybeusedtosupplydatafortheupdates.Forsimpleupdates,thefirstlistcontainsjustonegmap,andthetargetanddatabase

Table2.ParametersofthedatabaseFacultyStudentsCoursesTAsDeptsInstances500050000100002000100KB/inst

1

0.8

1

0.85

3

listseachcontainallthegmapsinthephysical-schemacat-alog.Othercombinationsofargumentssupportotherappli-cationsdescribedinSect.6.

Wehavedesignedasimplecommoninterfacetoallowavarietyofstoragestructurestoemulategmaps.Thisinter-faceincludesoperationstostoreandretrievedataandmakecostinquiries.WehavecreatedimplementationforexistingSHOREfacilities(B+-treesandheaps)aswellasUnixfiles(forimportingandexportingdata)andmain-memorystruc-tures(forupdategmaps).Tosupporttheuseofthealgorithmin+Sect.5fordeletions,wehavealsomodifiedtheSHOREB-treefacilitytomaintainacountofduplicateinsertions.8Aperformancedemonstration

Inthissection,wedescribeexperimentswithatestdatabasethatillustratethatforaplausiblemixofqueriesandupdates,ourtechniquescanprovidebetterperformancethaneitherrelationalorobject-orienteddatabases.AsweobservedinSect.2,gmapscanbeusedtodescribetherelationsandtheprimaryandsecondaryindicesofrelationaldatabases,aswellastheclassextents,objectsets,andpathindicesofobject-orienteddatabases.Thus,weareabletouseoursystemtosimulatetwo“conventional”configurations;onebasedonanormalizedrelationaldesignandonefollowingatypicalobject-orienteddatabasedesign.AllofourresultsarereportedintermsofcountsofI/Ooperations,sinceabsoluteperformancein“real”databaseswouldbeaffectedbyava-rietyofimplementation-dependentfeaturesthatarebeyondourcontrol.

Intheexperiment,weusedanextendedversionoftheuniversitydatabasepresentedearlier.Wepopulatedonede-partmentwithactualdatadescribingthecomputersciencedepartmentattheUniversityofWisconsin-Madisonandgen-eratedsyntheticdatafor99moredepartmentstocreateadatabaseofreasonablesize.Whiletheactualdatabaseusedfortheexperimentincludesadditionalfields,forsimplic-ityweonlydiscussthelogicalschemaaspresentedinAp-pendixAandFig.1.AfewinterestingparametersofthedataarepresentedinTable2.8.1Theworkload

Theworkloadcontainsmultiplerunsofeightqueriesandfiveupdates.Theactualnumberofrunswasdeterminedbyvariousfactors.Wetriedtomaintainabalancebetweenexpensivequeriesandsimpleones,holdtheupdateloadataboutathirdofthetotalloadformostoftheconfigurations,andmaketherelativefrequenciesasrealisticforauniversityenvironmentaspossible.Tables3and4brieflydescribeeachqueryandupdate.Forexample,theworkloadcontainsthreerunsofqueryQ6.ThelasttwocolumnscontainthetotalcostinpageI/OsattributedtoallrunsofqueryQ6

Table3.QueriesintheworkloadNo.MixGivenFindRelOOQ110FacultynameField,department3030Q210FacultynameCoursestaught

3030Q310StudentnameYear,department,advisor5050Q410TAnameSupportlevel,course4040Q53FacultyareaStudentsadvised5757Q63StudentnameCourses,teachers3942Q72CoursenameStudentsattending8480Q81Departmentname

Coursestaught

59

Table4.UpdatesintheworkloadNo.MixUpdatedescriptionRelOOU11Facultyteachesanewcourse57U22Facultystartsadvisingastudent1014U320Studentaddsacourse160120U44TAisassignedtoacourse2020U56Studentenrollsinadepartment

24

24

whenitisprocessedontwodifferentconfigurationsofthedatabase:arelationalone(Sect.8.2)andanobject-orientedone(Sect.8.3).Notethateachupdateinsertsasinglepairofvalues.Forexample,whenastudentaddsacourse(U3),theinsertedpairwouldbe(StudentId,CourseId).

ToobtainI/Ocounts,weinstantiatedqueriesandup-datesusingrandomvaluesfortheinputparametersandtheinsertedpairs,usedtheactualplansobtainedbyoursystemtoperformthequeryorupdate,andrecordedthesizeoftheresult,aswellasthesizesofallintermediateresultsusedinjoins.Fromthesedata,+wecalculatedI/OcountsusingasimplemodeloftheB-treeandtheheapstoragestructures,basedontheassumptionsthatthepagesizeis8KBandthat4MBareusedforbuffers.Theanalyticalapproachwaschoseninfavorofmeasuringactualresponsetimes,sincetheunderlyingSHOREsystemwasstillunderdevelopmentatthetimeofexperimentation.

8.2Relationaldesign

Therelationalconfigurationfollowsatextbooktranslationofthelogicalschemaintorelations.Wecreatedarelationforeachinternalnodeoftheschemaandoneforthemany-to-manyrelationshipattends.Weassumedthattheat-tributenameisaprimarykeyforeachentityintheschema,thusplayingtheroleoftheidentitysurrogatefortherela-tionalconfiguration.Wedecomposedtheinheritanceasso-ciationvertically,i.e.,theTArelationincludesTA-specificdata(supportlevel),aswellasStudent.nameandCourse.namefortheassistedcourse.Tosimplifythedis-cussion,weassumedthatnestedloopsistheonlyavailablejoinmethod.Secondaryindiceswereaddedwhenneces-sarytoacceleratejoinsandselections,andweclusteredtherecordsfavorablyintheheaps.

Therewasonecasewheregmapscouldnotsimulatetheneededstructure.QueriesQ6andQ7needasecondaryindexontheattendsrelationonbothcoursenameandstudentname.Theseindiceswouldreturnaphysicalpointerintheattendsrelation.Theycannotbeexpressedasagmapbecausegmapdatacanonlybevaluesofnodesintheschemagraph,andattendsisanedgeandnotanode.

113

Note,however,thattheneedfortheseindicesisanarti-factoftherelationaldesignrestrictions:wewouldbebetteroffwithtwoindicesthatmapcoursesdirectlytostudentsandviceversa.Thisconstruct,calledajoinindex,hasbeenshowntoofferadvantagesovertheconventionalrelationalapproach(Valduriez1987).Thus,wereplacedtheattendsrelationwithtwogmapsthatsimulatethedefinitionofajoinindexovertheattendsrelationship.Theresultsshowedthatthejoinindexperformedbetterthanacombinationofarelationclusteredononeofthetwonamesandapairofsecondaryindices.ThecompletephysicalschemafortherelationalconfigurationispresentedinAppendixB.1.

Thetotaldatabasesizeforthisconfigurationis74MB.Runningthefullworkloadonthatdatabasecosts613pageI/Os,%causedbyqueriesand36%byupdates.TheexactcontributionofallrunsofeachqueryandupdateinthetotalcostisshowninTables3and4.8.3Object-orienteddesign

Thephysicalschemafortheobject-orientedconfigurationincludesoneextentfileforeachinternaldomaininthelog-icalschema.Therelationshipattendsisstoredbothaspartofstudentobjectsandaspartofthecourseobjects,i.e.,studentscontainpointerstoallcoursestheyattend,andcoursescontainpointerstothestudentsattendingthem.ThisduplicationallowsefficientexecutionofqueriesQ6andQ7.Theotherrelationshipsarestoredaspartofthedomainclos-esttotheedgelabelinFig.1.Forexample,theadvisesrelationshipisrepresentedbyanattributeoftheFacultyclass.Wealsoencounteredacaseforwhichwehadtoal-tertheplanneddesigntoconformwiththegmapdefinitionlanguage.Initially,wethoughtthatthestudentextentshouldcontainbothstudentsandTAs.However,creatingagmapthatcontainsmembersofbothdomains,i.e.,membersofaclassanditssubclass,requiresaquerythatallowssomeformofunion,andourrestrictednotionofpsj-querydoesnotsup-portsuchqueries.Instead,wefollowedthesamedesignasintherelationalconfigurationanddecomposedTAvertically.Forthegivenworkload,thisalterationhasnoperformanceeffect.ThegmapdefinitionsthatcreatethedesiredphysicalschemaarepresentedinAppendix10.Thetotaldatabasesizeforthisconfigurationis75MB.Runningthefullwork-loadonthedatabasecosts583pageI/Os,68%causedbyqueriesand32%byupdates.ThelastcolumnofTables3and4showthecostsforallrunsofeachqueryandupdate.8.4Object-orienteddesignwithcomplexaccesspathsTheprevioustwoconfigurationsdonotuseanycomplexaccesspathsuchaspathindices1orfieldreplication.Itisworthwhiletoestimatetheimprovementthatcanbeachievedbyequippingtheobject-orientedversionofourdatabase,withsuchaccesspaths.Forthegivenworkload,fieldrepli-cationcanbeappliedinthreecases,asshowninTable5.Eachrowofthetabledescribeswhatisreplicated,theaddi-tionalspacerequiredinmegabytes,theaffectedqueries(no

1

WeusethetermpathindicescollectivelyforwhatBertinoandKim(19)callpathindices,nestedindices,andmulti-indices

114

Table5.ConventionalcomplexpathtechniquesReplicateInSpaceNoSaved“Faculty.workin.name”Faculty0.2Q110“Student.enrolled.name”Student2Q310“TA.assists.name”

TA0.1Q4

20Totalsaved

40

Table6.ApplicationsofgmapsReplicateInSpaceNoSaved“Student.advises−1.name”Student2Q3,U218“Course.teaches−1.name”Course0.4Q6,U19“Faculty.advises.name”Faculty0.6Q551“Course.attends−1.name”Course12Q776“Faculty.teaches.name”

Faculty

0.4Q850Totalsaved

204

updateisaffectedbytheseadditions),andthesavingsthatcanbeattributedtothereplicationforallrunsoftheaffectedquery.

InAppendixB.3weshowthealtereddefinitionsofthefaculty,student,andTAgmaps,withthenewcomplexpathsadded.Therestofthephysicalschemaisidenticaltotheearlierobject-orientedschema.Thedatareplicationadds2MBofadditionalspace,bringingthetotaldatabasesizeto77MB.Thenewdatabaseoffersa7%performancegainoverthepreviousone.Itisinterestingtonotethatthetech-niqueoffieldreplicationasoriginallydescribed(KatoandMasuda1992;ShekitaandCarey19)cannotbeappliedtoanyotherfieldinourdatabasebecauseitisrestrictedtoedgesfromclassestosingle-valuedattributes.Similarre-strictionsalsoprohibitanyusefulapplicationofpathindicesinourdatabase.Therestrictionswereimposedbyprevi-ousauthorsforareason:theimplementationoftheseaccesspathsandthetaskofpropagatingupdateswouldbefarmorecomplexwithoutthem.8.5Aconfigurationwithgmaps

Next,weconsidertheresultsoffullyequippingthedatabasewithgmaps.Insteadofdesigningaphysicalorganizationfromscratch,weshowhowwecanmakeincrementalchangesontheobject-orientedphysicalschematoimproveitsperformance.

First,wereplicateattributesoverpathsthattraversere-lationshipsbothintheinversedirection(fromattributetoclass)andintheforwarddirection(fromclasstoattribute).SuchareplicationbringscostsavingsinqueriesQ3andQ7.Second,weconsiderreplicatingattributesoverpathsthatarenotfunctionalbyassociatingmultiplevalueswitheachinstanceoftherootofthepath.SuchreplicationisverybeneficialforourworkloadsinceitcaneliminatethemostexpensivestepofthemediumandlargequeriesQ5,Q7,andQ8.ThealteredgmapdefinitionsareshowninAp-pendixB.4.TheexactcostsavingsandoverheadsthatarecausedbyeachadditionalaccesspathareshowninTable6.Thesemodificationsofferasignificanttotalgainof204I/Os,or35%overthecostoftheinitialobject-orientedap-proach.Ifweaddtothesesavingsthoseoftheprevioussec-tion(withgmapssimulatingfieldreplication)thetotalgainclimbsto42%.Itisinterestingtonotethelowupdateover-

Table7.Summarycosts

SpaceQueryUpdateTotal(MB)costcostcostRelational74394219613Object-oriented75398185583OO+extensions77358185543gmaps(1)80.4227188415gmaps(2)

92.4

151

188

339

headofallthesereplications.Thereasonisthatwereplicatedattributes,likestudent.name,thatarestatic,i.e.,donotgetupdated.Asaresult,theonlyupdatesthatareaffectedarethoseontheintermediatelinksofthepath.Theattendsrelationship,forexample,isthemostvolatilerelationshipintheschema,soweexpectthattheslightestupdateoverheadwouldoverwhelmanysavingsgained.However,sincetheattendsrelationshipisbi-directional(storedasanattributeofboththestudentandthecourse),bothextentsneedtobeupdatedanyway.Readingandwritingonemoreattribute,student.name,doesnotaddanyoverhead.

Thespaceoverheadisalsolowinallbutonecase:repli-catingthestudentnamesineachcoursemorethandoublesthesizeofthecourseextent.Whetherornotthespace-timetrade-offisworthwhiledependsontheapplication.Theto-talincreaseinspaceusageis15.4MB,bringingthetotaldatabasesizeto92.4MB.Table7comparesthequerycost,updatecost,anddiskusageofallfourapproaches.Forthegmapconfiguration,weshowtheresultsbothwithandwith-outreplicationofstudentnames,inbothcasesincludingtheenhancementsoftheconfigurationwithcomplexpaths.8.6Achangeintheworkload

Finally,weconsidertheproblemofadaptingthedatabasetoachangingworkload.Assumethatourworkloadisaug-mentedbyasinglerunofoneadditionalqueryQ9thatasksforthenamesofallTAssupportedbyadepartment.Moreprecisely,Q9asksforallTAswhoassistacoursetaughtbyafacultymemberthatworksinagivendepartment.Thesimplestwaytohandlethenewqueryintheinitialobject-orientedconfigurationwouldbetoreclustertheTAextentsothatitisorderedbythesupportingdepartmentoftheTAs,andaddanindexfromcourseidentificationstoTAs.ThecostofthequeryQ9,however,remainshigh–1I/Os–andthetotalcontributionofthequerytotheload,takingintoaccounttheadditionalupdateoverheads,climbsto172I/Os.Weusethisconfigurationasthebaselineforthenewworkload.

Theproblemwiththenewqueryisthatitrequiresun-clusteredaccesstothelargestudentfileinordertoobtainthestudentname.ReclusteringthefiledoesnothelpsincequeryQ5dependsonthatspecificclusteringforitsperfor-mance.Keepingtwocopiesofthestudentfile,clusteredforQ5andQ9,respectively,isaunwisebecauseofthehighupdateratesoftheattendsrelationship.Suchasolutionwouldalsoresultinasignificantincreaseinthedatabasesizewithoutmuchhelpinprocessingcost.

gmapsofferadditionaloptions.Wecanselectivelyrepli-catethestudent.nameattributerequestedbythenewqueryfromthestudentextenttotheTAextent.Thislowers

thequerycostby95I/Os.Thenetgainafteraccountingfortheadditionalcostofupdatesis79I/Os.Table8comparesthethreenewconfigurations.Savingsandlossesaregivenrelativetothebaselineconfiguration.Noticethatwhilealloptionsareavailabletothegmap-equippeddatabase,mostconventionaldatabaseswouldonlyhavethefirstoption,andafewmaysupportthesecond.

Underthenewworkload,thegmap-equippedconfigura-tionhasaloadof422I/Oscomparedto755I/Osintheobject-orientedsystem,fortotalsavingsof44%.

9Relatedwork

Mostexistingdatabasesystemsdonotprovidetruephysi-caldataindependence,sinceeveryconstructofthelogicalschemacorrespondsdirectlytoaprimaryphysicalstructure.Forexample,everyrelationinmostrelationalsystems,everyclassextentinextent-basedOOsystems[e.g.,Orion(Kimetal.1990),andZoo(Ioannidisetal.1993)]andeverycollec-tionincollection-basedOOsystems[e.g.,GemStone(MaierandStein1986),Extra/Excess(Careyetal.1988),andOb-jectStore(Orensteinetal.1992)]isstoredinaseparatefile.Themainflexibilityatthephysicallevelcomesfromsec-ondaryaccesspathstothesefiles.

Severalextensionsofboththeprimaryphysicalstruc-tureandthesecondaryaccesspathshavebeenrecentlypro-posedintheliteratureandallowstoringtogetherdatafrommorethanonelogicalconstruct.InSects.2and8,wedis-cussedpathindices(BertinoandKim19;Kimetal.19;MaierandStein1986),joinindices(Valduriez1987),andfieldreplication(KatoandMasuda1992;ShekitaandCarey19),notingtheirrestrictionsandcomparingtheirperfor-mancetoourscheme.Anotherapproachtodecomposingthedatabaseishierarchicaljoinindices(Valduriezetal.1986),ageneralizationofjoinindicesthatallowsonetobuildanindexoveridentitysurrogatesthatpopulatetreesofthelog-icalschemagraph.Accesssupportrelations(ASR)offeradifferentgeneralizationofjoinindices(KemperandMo-erkotte1990a),whichallowsthedefinitionofindicesovertheinstancesofarbitrarychainsoflogicalschemanodes.Thisschemeoffersahigherdegreeofflexibilityandal-lowsthedefinitionofindicesthatstorebothcompleteandpartialinstancesofeachchain.Exceptforthelastfeature,thecontentsofbothhierarchicaljoinindicesandASRscanberepresentedaspsj-queries,andcanthusbedefinedasagmap.However,sincegmapqueriesdonotsupportunions,theycannotrepresentouterjoins,andthereforecannotstoreincompleteinstancesofchains.

Withrespecttothetranslationalgorithm,ourworkmostcloselyresemblesresearchattheUniversityofWaterlooonmaterializedviews(Blakeleyetal.19;YangandLarson1987).Ouralgorithmsupportsamorerestrictedquerylan-guage,butusesinformationaboutinclusionandfunctionaldependenciesaswellas“topological”informationimplicitinagraph-basedlogicalschema.Thisinformationallowsustoidentifysolutionsthatwouldbemissedbythemoregeneralalgorithm.Section5containsanexampleofasolu-tionthatcanonlybefoundwheninclusiondependenciesare

115

takenintoaccount.2Similarly,ourhandlingoffunctionaldependenciesismoregeneralthanthatofthealgorithmofYangandLarson(1987),whichsimplyusestheprimarykeyinformationforeachrelation.Unlessallnon-trivialdepen-denciesaregeneratedbysuperkeys(i.e.,unlessallrelationsareinatleast4thNormalForm),ourschemewillfindmoresolutions.

Withrespecttotheintegrationwiththerestofthequeryoptimizer,mostearliereffortsuseatwo-stageapproach,wherethequeriesarefirsttranslatedintoqueriesoverphys-icalstructures,andtheresultingqueriesarethenoptimizedone-by-onebyaconventionaloptimizer.Inadditiontotheworkonmaterializedviews(Blakeleyetal.19;YangandLarson1987),sucheffortsincluderesearchwhosegoalwasnotphysicaldataindependencebutsimplyprocessingefficiency.Examplesincluderesearchonreusingcommonsubexpressionswithinaquery(Hall1976)orbetweenmul-tiplequeries(Sellis1986),reusingresultsofpreviousqueries(Finkelstein1982),andusingintegrityconstraintsforseman-ticqueryoptimization(Chakravarthy1990).KemperandMoerkotte(1990b)optforaunifiedapproachoftranslationandoptimizationfortheASRsbyextendingarule-basedop-timizertoincludeappropriaterewritingrules.Ourapproachofenhancingaconventionaloptimizerwiththenecessarytranslationstepstakesadvantageoffullcostinformationavailabletotheoptimizertoperformearlypruningofin-feriorsolutions,whilekeepingtheoveralloptimizationcostlow.

10Conclusion

Wehavepresentedanewapproachtophysicalschemade-signthatusesadeclarativelanguagetodescribethecontentsofstoragestructures.Carefullyrestrictingthelanguageal-lowsefficientalgorithmstotranslatequeriesoverthelogicalschemaintoaccessplansusingthephysicaldatastructures.Wehaveshownhowtointegratethequerytranslationalgo-rithmintoaconventionalqueryoptimizer.Asimplemod-ificationofthequerytranslationalgorithmsupportspropa-gationofupdatestothedatabase.Aprototypesystemthatincorporatesthemajoraspectsofthisapproachiscurrentlyoperational.Wehaveusedittodemonstrateinarealisticenvironmenthowourapproachcanachievesignificantper-formancegainsovermoreconventionalschemes.

Furtherdevelopmentisnecessaryforourtechniquetoreachitsfullpotentialinapracticalsetting:

–Thispaperassumesthattheuserqueryhasalreadybeendecomposedinpsj-querycomponentsthatobeythesamerestrictionsasthegmapdefinitionlanguage(Sect.2.2).Itshouldbepossibletoextendourtechniquetotranslateuserqueriescontainingconstructsnotallowedingmapqueries,suchasarbitraryjoinsofgmaps,duplicateoc-currencesofgmapsinthewhereclauseandduplicateoccurrencesofattributesinthequerytarget.

2

IntheexampleofSect.5.4,thetranslationoftheupdatequeryforthegmapCFShasonesolutionifthereisnoinclusiondependencyfromteachestoattends,andtwosolutionsotherwise.Analgorithmthatdoesnotcheckforinclusiondependenciescannotdiscoverthesecondsolution

116

Table8.AlternativedatabasereconfigurationsDescriptionoftechniqueReclusterstudentextentKeeptwocopiesofstudentextentReplicate“Student.name”inTAextent

Spaceoverhead(MB)0400.1

ActivitiesaffectedQ5,Q9Q9,U3Q9,U4

Saved(lost)(5)(35)79

–AswediscussedinSect.5,wedonotguaranteethatthedatadescribedinthelogicalschemacanbestoredinthephysicalschema.Itshouldbepossibletodevisesufficientconditionstoensurethataproposedphysicalschema“covers”alogicalschema,inthesensethatithassufficient“informationcapacity”tostoreanydatabaseinstanceconsistentwiththelogicalschema.Whentheseconditionsarenotmet,thesystemshouldbeabletowarnaboutupdatesthatresultininformationloss.

–Thetranslationalgorithmshouldbeextendedtotakeintoaccountadditionalintegrityconstraintsandpermitthetranslationofqueriesintounionsofotherqueries.Itshouldalsobeabletoincorporatestoragestructuresthatofferahierarchicalinterface.

–Finally,theincreaseofavailablechoicesinthephysicalschemadesignputstheburdenonthedatabaseadmin-istratortomakethecorrectchoices.Weneedtodesigntoolsthatguidetheadministratorinchoosingtheappro-priatecombinationofstoragestructures.

Acknowledgements.OdysseasG.TsatalosandMarvinH.SolomonhavebeenpartiallysupportedbytheAdvancedResearchProjectAgency,ARPAordernumber018(formerly8230),monitoredbytheU.S.ArmyResearchLaboratoryundercontractDAAB07-91-C-Q518.YannisE.IoannidishasbeenpartiallysupportedbytheNationalScienceFoundationunderGrantsIRI-9113736,IRI-9224741,andIRI-9157368(PYIAward)andbygrantsfromDEC,IBM,HP,AT&T,andInformix.

Appendices

A.LogicalschemainODL

interfaceDept{public:

attributestringname;};

interfaceFaculty{attributestringname;attributestringareaattributerefworks_inattributesetadvises;attributesetteaches;;

interfaceStudent{attributestringname;attributeshortyear;attributerefenrolled;attributesetattends;};

interfaceTA:publicStudent{attributefloatsupport_level;attributerefassists;

};

interfaceCourse{attributestringname;attributeshortlevel;

};}

BPhysicalschema

B.1.Relationalphysicalschema

//Relations

def_gmapDeptRelationasheapbygivenDeptselectDept.namedef_gmapFacultyRelationasheapbygivenFacultyselectFaculty.name,Faculty.area,Dept.name

whereFacultyworks_inDeptdef_gmapCourseRelationasheapbygivenCourseselectCourse.name,Course.level,Faculty.namewhereFacultyteachesCoursedef_gmapStudentRelationasheapbygivenStudent

selectStudent.name,Student.year,Dept.name,Faculty.name

whereStudentenrolledDeptandFacultyadvisesStudentdef_gmapTARelationasheapbygivenTAselectStudent.name,TA.support_level,Course.namewhereTAisaStudentandTAassistsCourse//NameIndices

def_gmapDept_name_indexas

btreebygivenDept.nameselectDeptdef_gmapFaculty_name_indexas

btreebygivenFaculty.nameselectFaculty

def_gmapCourse_name_indexas

btreebygivenCourse.nameselectCourse

def_gmapStudent_name_indexasbtreebygivenStudent.nameselectStudent

//OtherIndices

def_gmapFaculty_area_indexasbtreebygivenFaculty.areaselectFacultydef_gmapCourse_teaches_indexasbtreebygivenFaculty.nameselectCoursewhereFacultyteachesCoursedef_gmapStudent_advisor_indexasbtreebygivenFaculty.nameselectStudentwhereFacultyadvisesStudentdef_gmapTA_assists_indexasbtreebygiveCourse.nameselectTAwhereTAassistsCoursedef_gmapCourse2Student_join_indexasbtreeby

givenCourseselectStudentwhereStudentattendsCoursedef_gmapStudent2Course_join_indexasbtreeby

givenStudentselectCoursewhereStudentattendsCourse

B.2.Object-orientedphysicalschema//ClassExtents

def_gmapDeptExtentasheapbygivenDeptselectDept.name

def_gmapFacultyExtentasheapbygivenFacultyselectFaculty.name,Faculty.area,Dept,Student.CoursewhereFacultyworks_inDept

andFacultyadvisesStudentandFacultyteachesCoursedef_gmapCourseExtentasheapbygivenCourseselectCourse.name,Course.level,Student

whereCourseattendedStudentdef_gmapStudentExtentasheapbygivenStudentselectStudent.name,Student.year,Dept,Course

whereStudentenrolledDeptandStudentattendsCoursedef_gmapTAExtentasheapbygivenTAselectStudent,TA.support_level,CoursewhereTAisaStudentandTAassistsCourse//NameIndices

def_gmapDept_name_indexas

117

btreebygivenDept.nameselectDeptdef_gmapFaculty_name_indexas

btreebygivenFaculty.nameselectFaculty

def_gmapCourse_name_indexas

btreebygivenCourse.nameselectCourse

def_gmapStudent_name_indexas

btreebygivenStudent.nameselectStudent//OtherIndices

def_gmapFaculty_area_indexasbtreebygivenFaculty.areaselectFacultydef_gmapFaculty_advises_indexasbtreebygivenStudentselectFacultywhereFacultyadvisesStudentdef_gmapFaculty_teaches_indexasbtreebygivenCourseselectFacultywhereFacultyteachesCoursedef_gmapFaculty_works_indexasbtreebygivenDeptselectFacultywhereFacultyworks_inDept

B.3.Object-orientedschemawithcomplexpathsdef_gmapFacultyExtentasheapbygivenFacultyselectFaculty.name,Faculty.area,Dept,Dept.name,Student,Course

whereFacultyworks_inDeptandFacultyadvisesStudentandFacultyteachesCoursedef_gmapStudentExtentasheapbygivenStudentselectStudent.name,Student.year,

Dept,Dept.name,Course

whereStudentenrolledDeptandStudentattendsCoursedef_gmapTAExtentasheapbygivenTAselectStudent,TA.support_level,Course,Course.name

whereTAisaStudentandTAassistsCourseB.4.gmapSchema

def_gmapFacultyExtentasheapbygivenFacultyselectFaculty.name,Faculty.area,Dept,Dept.name,Student,Student.name,Course,Course.name

whereFacultyworks_inDeptand

118

FacultyadvisesStudentandFacultyteachesCourse

def_gmapCourseExtentasheapbygivenCourseselectCourse.name,Course.level,

Student,Student.name,Faculty.namewhereCourseattendedStudentandFacultyteachesCoursedef_gmapStudentExtentasheapbygivenStudentselectStudent.name,Student.year,

Dept,Course,Faculty.name

whereStudentenrolledDeptandStudentattendsCourseandFacultyadvisesStudentReferences

1.AhoA,SagivA,UllmanJ(1979)Equivalencesamongrelationalex-pressions.SIAMJComput8:218–247

2.BertinoE,KimW(19)Indexingtechniquesforqueriesonnestedobjects.IEEETransKnowlDataEng1:196–214

3.BlakeleyJ,CoburnN,LarsonP(19)Updatingderivedrelations.ACMTransDatabaseSyst14:369–400

4.CareyM,DeWittD,VandenbergS(1988)Adatamodelandquerylanguageforexodus.In:ProcACMSIGMODConf,June,pp413–423,Chicago,Ill

5.CareyM,ShekitaE,LapisG,LindsayB,McPhersonJ(1990)Anincrementaljoinattachmentforstarburst.In:ProcIntVLDBConf,pp662–673,Brisbane,Australia

6.CareyM,DeWittD,FranklinM,HallN,McAuliffeM,NaughtonJ,SchuhD,SolomonM,TanT,TsatalosO,WhiteS,ZwillingM(1994)Shoringuppersistentapplications.In:Proc.ACMSIGMODConf,May,pp383–394,Minneapolis,Minn

7.CatelRGG(1993)Theobjectdatabasestandard:ODMG-93.Morgan-Kaufman,SanMateo,Calif

8.ChakravarthyU(1990)Logic-basedapproachtosemanticqueryopti-mization.ACMTransDatabaseSyst15:163–2071990.

9.ChandraA,MerlinP(1977)Optimalimplementationofconjunctivequeriesinrelationaldatabases.In:ProceedingsoftheAnnualACMSymposiumonTheoryofComputing,May,pp77–90

10.FinkelsteinS(1982)Commonexpressionanalysisindatabaseappli-cations.In:ProcACMSIGMODConf,pp3–374,Orlando,Fla11.HallPV(1976)Optimizationofasinglerelationalexpressionina

RDBMS.IBMJResDev20

12.IoannidisY,LivnyM,HaberE,MillerR,TsatalosO,WienerJ(1993)

Desktopexperimentmanagement.IEEEDataEng16:19–23

13.KatoK,MasudaT(1992)Persistentcaching.IEEETransSoftware

Eng18:631–5

14.KemperA,MoerkotteG(1990a)Accesssupportinobjectbases.In:

ProcACMSIGMODConf,pp290–301,AtlanticCity,NJ

15.KemperA,MoerkotteG(1990b)Advancedqueryprocessinginobject

basesusingaccesssupportrelations.In:ProcIntVLDBConf,pp290–301,Brisbane,Australia

16.KimW,BertinoE,GarzaJ(19)Compositeobjectsrevisited.In:

ProcACMSIGMODConf,pp337–347,Portland,Ore

17.KimW,GarzaJF,BallouN,WoelkD(1990)Architectureofthe

ORIONNext-generationdatabasesystem.IEEETransKnowlDataEng2:109–124

18.MaierD,SteinJ(1986)Indexinginanobject-orientedDBMS.In:2nd

InternationalWorkshoponObject-OrientedDatabaseSystems,Septem-ber,pp171–182,PacificGrove,Calif

19.MillerR,IoannidisY,RamakrishnanR(1993)Theuseofinformation

capacityinschemaintegrationandtranslation.In:ProcIntVLDBConf,pp120–133,Dublin,Ireland

20.OrensteinJ,HaradhvalaS,MarguilesB,SakaharaD(1992)Querypro-cessingintheObjectStoredatabasesystem.In:ProcACMSIGMODConf,pp403–412,SanDiego,Calif

21.RoussopoulosN(1982)Viewindexinginrelationaldatabase.ACM

TransDatabaseSyst7:259–290

22.Roussopoulos,EconomouN,andStamenasA(1993)ADMS:atestbed

forincrementalaccessmethods.IEEETransKnowlDataEng5:762–773

23.SelingerPG,AstrahanMM,ChamberlinDD,LorieRA,PriceTG

(1979)Accesspathselectioninarelationaldatabasemanagementsys-tem.In:ProcACMSIGMODConf,pp23–34,Boston,Mass

24.SellisT(1986)Globalqueryoptimization.In:ProcACMSIGMOD

Conf,pp191–205,Washington

25.ShekitaE,CareyM(19)Performanceenhancementthroughrepli-cationinanobject-orientedDBMS.In:ProcACMSIGMODConf,pp325–336,Portland,Ore

26.TsatalosO(1994)Thegmap:aversatiletoolforphysicaldatainde-pendence.PhDthesis,UniversityofWisconsin-Madison,December27.TsatalosO,IoannidisY(1994)Aunifiedframeworkforindexingin

databasesystems.In:ProceedingsoftheInternationalConferenceonDatabaseandExpertSystemApplications,September,pp183–192,Athens,Greece

28.UllmanJ(1988)Principlesofdatabaseandknowledge-basesystems,

vol1.ComputerSciencePress,NewYork,pp423–425,Rockville,Md29.ValduriezP(1987)Joinindices.ACMTransDatabaseSyst12:218–24630.ValduriezP,KhoshafianS,CopelandG(1986)ImplementationTech-niquesofComplexObjects.In:ProcIntVLDBConf,pp101–109,Kyoto,Japan

31.YangHZ,LarsonPA(1987)QuerytransformationforPSJ-queries.In:

ProcIntVLDBConf,pp245–254,Brighton,UK

因篇幅问题不能全部显示,请点此查看更多更全内容

Copyright © 2019- 7swz.com 版权所有 赣ICP备2024042798号-8

违法及侵权请联系:TEL:199 18 7713 E-MAIL:2724546146@qq.com

本站由北京市万商天勤律师事务所王兴未律师提供法律服务