
Proving the Correctness of a Complete Microprocessor


Christian Jacobi, Daniel Kroening

Dept. 14: Computer Science, University of Saarland, Post Box 151150, D-66041 Saarbruecken, Germany

email: {cj,kroening}@cs.uni-sb.de

Abstract. This paper presents the current status of a microprocessor verification project. The authors verify a complete 32-bit RISC microprocessor, including the floating point unit and the control logic of the pipeline. The paper describes a formal definition of a "correct" microprocessor. This correctness criterion is proven for an implementation using formal methods. All proofs are verified mechanically by means of the theorem proving system PVS.

1 Introduction

Microprocessor design is an error-prone process. With the increasing complexity of current microprocessor designs, formal verification has become crucial. In order to achieve completely verified designs, adjusting the design process itself plays an important role: the more high-level information on the design is available, the faster the verification can be done.

The authors re-designed a simple RISC processor, the DLX [1], with respect to verifiability. The design includes the complete pipe control and forwarding logic. The function units are fully featured, including a floating point unit. They are not abstracted by means of uninterpreted functions. The proofs for the glue logic, the ALU, and the floating point unit are verified using the theorem proving system PVS [2].

Related Work. Recent papers show the correctness of complex designs or schedulers in theorem proving systems such as PVS. Hosabettu et al. [3] prove both safety and liveness of Tomasulo's algorithm using PVS. Sawada and Hunt [4] provide an ACL2 [5] proof of a complete design implementing a Tomasulo scheduler with a reorder buffer.

Henzinger et al. [6] verify a simple pipelined processor using a model checker. McMillan [7] partly automates the proof by refinement of Tomasulo's algorithm presented in [8] with the help of compositional model checking. This technique is improved in [9] by theorem proving methods to support an arbitrary register size and number of function units.

There are many publications on the verification of (parts of) floating point units. Bryant and his group verified different function units using model checking [10–12]. Aagaard and Seger verified a multiplier using model checking combined with theorem proving [13]. Claesen et al. and O'Leary et al. have used theorem provers to verify an SRT integer divider [14] and an SRT integer square root circuit [15], respectively. Russinoff has proven the correctness of the multiplication, division and square root algorithms of the AMD K7 processor [16]. Most of the publications cited do not cover denormal numbers.

Project Status. The verification of the pipeline and forwarding logic has reached a high level of automation. However, the process of verifying the function units is not automated at all. The fundamentals of the floating point mathematics are already verified. The verification of the individual floating point circuits is work in progress.

2 The Specification Machine

2.1 Hardware Model

Both the specification design and the hardware are modeled as mathematical machines. Mathematical machines are a common method to model the behavior of arbitrary microprocessor systems. For this paper, the definition of the mathematical machine from [17] is used: a mathematical machine $M$ is a triple $(K, c^0, \delta)$ which consists of the following components:

– $K$ is the set of all possible configurations of $M$. An element of $K$ is called configuration or state of the machine.
– The initial configuration $c^0 \in K$ is a configuration of $M$.
– The transition function $\delta: K \to K$ maps a configuration to its successor.

A sequence $c^0, c^1, c^2, \ldots$ of configurations is called computation of $M$ iff $c^{i+1} = \delta(c^i)$ holds for all $i \in \mathbb{N}$.

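Notation. Registers are used in both the specification and the implementation of a microprocessor. Let $R$ be a finite set of registers. Each register $r \in R$ can have a value within a finite domain $W(r)$.

The configuration set consists of the domains of the registers: $K = \prod_{r \in R} W(r)$.

The projection function $\pi_r: K \to W(r)$ extracts the value of a register $r \in R$ from a configuration.

Let $c^i$ for $i \in \mathbb{N}$ be part of a computation of a mathematical machine. Let $r^i$ be a shorthand for the following projection on $c^i$: $r^i := \pi_r(c^i)$.

In analogy to that, let $r'$ be a shorthand for the following projection of the transition function applied to a state $c$: $r' := \pi_r(\delta(c))$.

A signal is defined as a mapping from the set of configurations into an arbitrary domain $D$: $s: K \to D$.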
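The model above (a configuration set given by register domains, an initial configuration, a transition function, and projections onto single registers) can be mirrored in a few lines of Python. This is a minimal sketch, assuming configurations are dictionaries from register names to values and register domains are Python ranges; the class `MathMachine`, the helper `proj`, and the toy two-register machine are hypothetical illustrations, not part of the authors' PVS development.

```python
from itertools import islice
from typing import Callable, Dict, Iterator

Config = Dict[str, int]  # a configuration: register name -> register value


class MathMachine:
    """Mathematical machine: configuration set (implicit in the register
    domains), initial configuration, and transition function."""

    def __init__(self, domains: Dict[str, range], init: Config,
                 delta: Callable[[Config], Config]) -> None:
        assert self.in_K(init, domains), "initial configuration not in K"
        self.domains, self.init, self.delta = domains, init, delta

    @staticmethod
    def in_K(c: Config, domains: Dict[str, range]) -> bool:
        # c is a configuration iff every register r holds a value from W(r)
        return c.keys() == domains.keys() and all(c[r] in domains[r] for r in domains)

    def computation(self) -> Iterator[Config]:
        """Yield c^0, c^1, ... where c^(i+1) = delta(c^i)."""
        c = dict(self.init)
        while True:
            yield c
            c = self.delta(c)
            assert self.in_K(c, self.domains), "delta left the configuration set"


def proj(r: str, c: Config) -> int:
    """Projection pi_r: the value of register r in configuration c."""
    return c[r]


# Hypothetical toy machine: 4-bit program counter and 8-bit accumulator.
def toy_delta(c: Config) -> Config:
    return {"pc": (c["pc"] + 1) % 16, "acc": (c["acc"] + c["pc"]) % 256}


toy = MathMachine({"pc": range(16), "acc": range(256)},
                  {"pc": 0, "acc": 0}, toy_delta)

if __name__ == "__main__":
    trace = list(islice(toy.computation(), 4))
    print([proj("pc", c) for c in trace])   # [0, 1, 2, 3]
    print([proj("acc", c) for c in trace])  # [0, 0, 1, 3]
```

Generating the computation lazily mirrors the definition: each configuration depends only on its predecessor and the transition function.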

2.2 DLX Architecture

Our design implements the DLX architecture. The DLX architecture [1] features a RISC instruction set including both integer and floating point instructions. The integer core is taken from [17] and extended by a floating point register file (FPR) and floating point instructions as described in [18].

2.3 Correct IEEE Floating Point Arithmetic

Our primary goal is the verification of a complete processor. Thus, we also formally verify the correctness of a floating point unit (FPU). In the processor framework, the FPU is a multi-cycle function unit and can (almost) be seen as a black box. The FPU supports the operations addition, subtraction, multiplication, division and square root. The FPU handles normal and denormal numbers, special values, traps, and interrupts. This is in contrast to most previous results, where denormal numbers, traps and interrupts are disregarded.

The correctness criteria for the FPU are given by the IEEE standard 754 [19]. The standard is informal, which makes it unusable for the formal verification of the FPU. One therefore has to formalize the IEEE standard; this formalization has to preserve the meaning of the standard. Inherently, one cannot prove the equivalence of the formal and the informal specification, so the formal specification itself has to convince any reader of its correctness. We will give the specification of the IEEE rounding mode to nearest; the modes round up, round down, and round to zero are not as complicated as the mode to nearest.

We call an IEEE-factoring semi-representable if its significand is representable, i.e., only its exponent may be out of range. We call a real (semi-)representable if its IEEE-factoring is (semi-)representable.

Representable numbers in this sense exactly correspond to the representable numbers as defined in the standard. In the following, we will only investigate semi-representable factorings. In order to "round" semi-representable factorings to representable ones, one just has to decide whether to round to infinity or not. This can basically be done by a comparison of the exponent with the maximum exponent $e_{max}$.

We proceed with the definition of the rounding function. The standard defines the rounding mode to nearest as follows:

Now we have a definition relatively close to the hardware but far away from the specification in the standard. On the one hand, this enables a simpler implementation and verification of the rounder, as we will see in section 3.5. On the other hand, it is not obvious that these definitions conform to the IEEE standard. We give three theorems which justify this claim.

Theorem 1. For any real $x$, the result of rounding $x$ to nearest is semi-representable.

The next theorem states that the result of the rounding function indeed is a nearest representable number.

Theorem 2. For any real $x$ and any semi-representable IEEE-factoring $y$, it holds that $x$ is at least as close to its rounded value as to the value of $y$.

The third theorem states that a number with least significant bit zero is chosen in case of a tie between the two nearest representable numbers. To show this, we first bound the distance between $x$ and its rounded value. We then show that the significand is even if the maximum distance is reached.

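Theorem 3. For any real $x$, the distance between $x$ and its rounded value is at most half a unit in the last place. If this maximum distance is reached, then the significand of the rounded value is even.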

We will give a theorem in section 3.5 which simplifies the verification of the rounding unit by decomposing it into smaller parts. This theorem will seem fairly obvious precisely because we invested reasonable effort in the definition of the rounding function. In contrast to our definition, the rounding result in [18] is defined as "a representable number closest to $x$. If there are two such numbers, one chooses the number with even significand". This obviously coincides with the IEEE standard. Nevertheless, it is impractical to verify the rounder with this informal definition. The effort we have spent on the definition of the rounding function pays off when verifying the hardware implementation.
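To illustrate the two styles of definition, the sketch below compares a hardware-style rounding definition (scale to an integer significand, round half to even, scale back) with the "closest representable number, ties to even significand" definition of [18], by brute force on a tiny hypothetical format: a 4-bit significand, minimum exponent -3, and no overflow handling, so results are only semi-representable. The format parameters and function names are assumptions for illustration; agreement on random samples is a sanity check in the spirit of Theorems 2 and 3, not a proof.

```python
import random
from fractions import Fraction

P = 4       # hypothetical toy precision (significand bits incl. leading bit)
EMIN = -3   # hypothetical minimum exponent
EMAX = 6    # only bounds the brute-force grid; overflow is ignored


def flog2(x: Fraction) -> int:
    """floor(log2 x) for x > 0, computed exactly."""
    t = x.numerator.bit_length() - x.denominator.bit_length()
    return t - 1 if Fraction(2) ** t > x else t


def ulp(x: Fraction) -> Fraction:
    """Grid spacing at x >= 0; denormals share the spacing of EMIN."""
    e = EMIN if x == 0 else max(flog2(x), EMIN)
    return Fraction(2) ** (e - (P - 1))


def round_hw(x: Fraction) -> Fraction:
    """Hardware style: scale to an integer significand, round, rescale."""
    u = ulp(x)
    return round(x / u) * u  # round() on a Fraction breaks ties towards even


# All non-negative values of the toy format with exponents EMIN..EMAX
GRID = sorted({Fraction(m, 2 ** (P - 1)) * Fraction(2) ** e
               for e in range(EMIN, EMAX + 1) for m in range(2 ** P)})


def round_spec(x: Fraction) -> Fraction:
    """Style of [18]: a closest grid value; on a tie, the one with even significand."""
    d = min(abs(v - x) for v in GRID)
    ties = [v for v in GRID if abs(v - x) == d]
    if len(ties) == 1:
        return ties[0]
    return next(v for v in ties if (v / ulp(v)) % 2 == 0)


def agree(samples: int = 300, seed: int = 0) -> bool:
    """Compare both definitions on random non-negative reals below 100."""
    rng = random.Random(seed)
    xs = [Fraction(rng.randint(0, 100 * 256), 256) for _ in range(samples)]
    return all(round_hw(x) == round_spec(x) for x in xs)


if __name__ == "__main__":
    print(round_hw(Fraction(21, 16)), round_hw(Fraction(23, 16)))  # 5/4 3/2
    print(agree())
```

Both sample values are exact ties (10.5 and 11.5 units of 1/8), and both resolve to the neighbor with an even integer significand.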

3 Implementing the Processor

3.1 Forwarding and Stalling Logic

The design uses a common five-stage pipeline as presented in [1, 18]. The pipelined machine is generated by an automatic transformation from a sequential prepared machine as described in [17].

Fig. 1. The registers of an $n$-stage pipeline, including the full bits. The functions between the register stages represent the data paths.

Our design features a complete stall engine [21, 18]. In contrast to the stall engine presented in [18], it allows stalling all stages individually. The stall engine is taken from [17] with small changes: a clock enable signal is no longer used for the full bits; instead, the full bits are updated in every cycle (figure 1). The transition function for the full bits is changed accordingly: the full bit of each stage is set iff the stage is updated or stalled.

The calculation of the stall signals is not changed and is taken from [17]. The signal $ue_k$ is the clock enable signal of the output registers of stage $k$: the registers are updated iff the stage is full and not stalled:

$ue_k = full_k \land \lnot stall_k$

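Lemma 3. Given a cycle, the values of the scheduling functions of two adjoining stages are either equal, or the value of the scheduling function of the later stage is greater by one.

Lemma 4. The values are equal iff the full bit of the later stage is not set.

Negating both sides of the last equivalence and applying lemma 3 results in: the value of the scheduling function of the later stage is greater by one iff its full bit is set.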

Proof. The proofs of the lemmas above depend on the stall engine. They form an invariant proof by induction on the cycle $T$. Lemma 2 for cycle $T+1$ is shown using lemma 4 for cycle $T$. Lemma 3 for cycle $T+1$ is shown using lemma 2 in cycle $T+1$ and lemma 4 in cycle $T$. Lemma 4 is shown using lemmas 2 and 3 in cycle $T+1$.

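Due to lack of space, only the induction step for lemma 2 is shown here. For $T = 0$, the claim holds by definition. Let the claim hold for cycle $T$. In the first case, the claim follows from the definition of the scheduling function. In the other case, the claim is shown using lemma 4 for cycle $T$, which states that the claim is equivalent to a condition on the full bit. This condition is true because of the definition of the $ue$ signals.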

Theorem 4 is then shown by induction on $T$: the claim is obvious for stages which are not updated in a given cycle. If the stage is updated (i.e., $ue_k = 1$), the correctness of these values is argued by showing the correctness of the input values of the stage. An example proof using the lemmas above for the instruction fetch stage is given in [17].
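A small simulation can make Lemmas 3 and 4 concrete. The sketch below models a hypothetical n-stage stall engine in Python under assumptions of our own: a stage stalls when it is full and either has a hazard or the next stage stalls; its output registers are clock-enabled (`ue`, update enable) when it is full and not stalled; and its full bit is set when its input registers are written or it is stalled. The counter `sched[k]` is a stand-in for the scheduling function and counts the instructions that have left stage k. With this counting convention the earlier stage is the one that runs ahead, so the direction of the difference depends on how the scheduling function is defined. In every cycle the simulation checks that adjacent counters differ by at most one and are equal exactly when the later stage's full bit is clear.

```python
import random
from typing import List, Tuple


def stall_engine_step(full: List[bool], haz: List[bool]) -> Tuple[List[bool], List[bool], List[bool]]:
    """One clock cycle of a hypothetical n-stage stall engine.
    Stage 0 (fetch) is always full. Returns (stall, ue, full_next)."""
    n = len(full)
    stall = [False] * n
    for k in reversed(range(n)):  # stalls propagate from later to earlier stages
        later = stall[k + 1] if k + 1 < n else False
        stall[k] = full[k] and (haz[k] or later)
    ue = [full[k] and not stall[k] for k in range(n)]  # update enable
    # a stage is full next cycle iff its inputs are written or it is stalled
    full_next = [True] + [ue[k - 1] or stall[k] for k in range(1, n)]
    return stall, ue, full_next


def simulate(n: int = 5, cycles: int = 1000, p_haz: float = 0.3, seed: int = 1) -> List[int]:
    """Run the stall engine with random hazards, checking the invariants."""
    rng = random.Random(seed)
    full = [True] + [False] * (n - 1)
    sched = [0] * n  # instructions that have left each stage so far
    for _ in range(cycles):
        haz = [rng.random() < p_haz for _ in range(n)]
        _, ue, full = stall_engine_step(full, haz)
        sched = [s + int(u) for s, u in zip(sched, ue)]
        for k in range(n - 1):
            d = sched[k] - sched[k + 1]
            assert d in (0, 1), "adjacent stages drifted apart"
            assert (d == 0) == (not full[k + 1]), "full bit disagrees with schedule"
    return sched


if __name__ == "__main__":
    print(simulate(p_haz=0.0, cycles=10))  # [10, 9, 8, 7, 6]
    print(simulate())
```

Without hazards the pipeline fills in n-1 cycles and then retires one instruction per cycle, which is what the first call shows.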

3.3 Liveness

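The liveness criterion is formalized as follows: for any given configuration $c^i$ of the specification machine, we prove that the implementation machine calculates these values within a finite amount of time, i.e., there is a finite $T$ such that the registers of the implementation hold these values in cycle $T$.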

The proof is made by showing that any active stall signal becomes inactive within a finite amount of time. This is a proof by induction on the number of stages, beginning with the last stage.

3.4 Integer Unit

Our design features an integer unit (ALU). It supports addition, subtraction, shift and compare operations, and bit-wise operations (AND, OR, XOR). The ALU is verified completely with the theorem proving system PVS. This includes an arbitrary-sized carry lookahead adder. However, the implementation and the proof for the carry lookahead adder are included only in order to achieve completeness. To create the actual hardware, a pre-defined adder from the vendor library is used.
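As an illustration of an arbitrary-width carry lookahead adder, here is a Python sketch that computes all carries with a parallel-prefix network over generate/propagate pairs, in a Kogge-Stone style layout with about log2(n) levels. This particular prefix structure is an assumption made for the example; the paper does not say which lookahead scheme its PVS model uses.

```python
import random
from typing import Tuple


def cla_add(a: int, b: int, n: int, cin: int = 0) -> Tuple[int, int]:
    """n-bit addition whose carries come from a parallel-prefix (carry
    lookahead) network. Returns (sum mod 2**n, carry_out)."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]  # generate bits
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]  # propagate bits
    G, P = g[:], p[:]
    G[0] = g[0] | (p[0] & cin)                          # fold in the carry-in
    d = 1
    while d < n:
        # combine each group with the group d positions below it
        G, P = ([G[i] | (P[i] & G[i - d]) if i >= d else G[i] for i in range(n)],
                [P[i] & P[i - d] if i >= d else P[i] for i in range(n)])
        d *= 2
    # G[i] is now the carry out of bit position i
    carry_in = [cin] + G[:-1]
    s = sum((p[i] ^ carry_in[i]) << i for i in range(n))
    return s, G[n - 1]


if __name__ == "__main__":
    print(cla_add(0b1011, 0b0110, 4))  # (1, 1): 11 + 6 = 17 = 16 + 1
```

Both right-hand lists are built from the previous level's values before `G` and `P` are rebound, so each level reads a consistent snapshot, just as one row of prefix cells would in hardware.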

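Fig. 2. Top-level schematics of the FPU (operands A and B from the operand bus, unpackers producing A' and B', function units Add/Sub, Mul/Div and Sqrt, the rounder, and the result bus).
Fig. 3. Top-level schematics of the rounder (sub-circuits NormShift, SigRd, PostNorm and ExpRd, driving the result bus).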

3.5 Floating Point Unit

Figure 2 shows the top-level schematic of the FPU. The processor feeds packed IEEE numbers [19] A and B into the FPU. The unpacker circuit converts these numbers into the factoring format described in section 2.3. Depending on the operation, the operands A' and B' are fed into one of the function units. The last stage rounds the result of the operation to a representable and packed IEEE number, and places the result on the result bus of the processor.
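The unpacking step can be sketched in Python for IEEE double precision, assuming the factoring format represents a number by a sign s, an exponent e and a significand f with value (-1)^s · 2^e · f, where f lies in [1, 2) for normal numbers and in [0, 1) with the minimum exponent for denormal numbers and zero. The function names are hypothetical, only double precision is modeled, and infinities and NaNs are rejected rather than flagged as special values, as the real unpacker would have to do.

```python
import struct
from fractions import Fraction

# IEEE double precision parameters
EXP_BITS, FRAC_BITS, BIAS = 11, 52, 1023
EMIN = 1 - BIAS  # minimum exponent, also used for denormals


def unpack(x: float):
    """Convert a finite double into a factoring (s, e, f) with
    value (-1)**s * 2**e * f (hypothetical helper)."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    s = bits >> 63
    ebits = (bits >> FRAC_BITS) & ((1 << EXP_BITS) - 1)
    frac = Fraction(bits & ((1 << FRAC_BITS) - 1), 1 << FRAC_BITS)
    if ebits == (1 << EXP_BITS) - 1:
        raise ValueError("infinities and NaNs are not modelled in this sketch")
    if ebits == 0:                    # zero or denormal: hidden bit 0
        return s, EMIN, frac
    return s, ebits - BIAS, 1 + frac  # normal: hidden bit 1


def value(s: int, e: int, f: Fraction) -> Fraction:
    """Exact value of a factoring."""
    return (-1) ** s * Fraction(2) ** e * f


if __name__ == "__main__":
    print(unpack(-2.5))   # (1, 1, Fraction(5, 4))
    print(unpack(5e-324)) # smallest denormal: (0, -1022, Fraction(1, 4503599627370496))
```

Exact rationals keep the sketch free of rounding, so the factoring's value can be compared bit-exactly with the original double.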

The design is pipelined, i.e., the design includes registers which store intermediate results. The division is carried out using the Newton-Raphson method. Thus, the function unit for multiplication and division contains loops to feed back intermediate results for the next Newton-Raphson iteration.
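The Newton-Raphson iteration used for division can be illustrated with exact rational arithmetic. The sketch below iterates x_{i+1} = x_i (2 - b x_i) towards 1/b and multiplies by the dividend at the end. Since 1 - b x_{i+1} = (1 - b x_i)^2, the error squares in every pass, which is why only a few trips around the feedback loop are needed. The starting value and iteration count are assumptions: a hardware divider typically starts from a table lookup and truncates intermediate results, which this sketch does not model.

```python
from fractions import Fraction
from typing import List


def nr_reciprocal(b: Fraction, x0: Fraction, iterations: int) -> List[Fraction]:
    """Newton-Raphson iteration x_{i+1} = x_i * (2 - b * x_i) towards 1/b.
    Returns all approximations x_0 .. x_iterations."""
    xs = [x0]
    for _ in range(iterations):
        x = xs[-1]
        xs.append(x * (2 - b * x))  # one pass through the multiplier loop
    return xs


def nr_divide(a: Fraction, b: Fraction, x0: Fraction, iterations: int) -> Fraction:
    """Approximate a / b as a times the approximate reciprocal of b."""
    return a * nr_reciprocal(b, x0, iterations)[-1]


if __name__ == "__main__":
    b = Fraction(3, 2)
    for x in nr_reciprocal(b, Fraction(1, 2), 3):
        print(float(1 - b * x))  # 0.25, 0.0625, 0.00390625, ...
```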

Complete hardware schematics at the gate level can be found in [18]. We will focus on the rounder and demonstrate a part of its verification as an example. We give a theorem which decomposes the rounding function into three simpler functions, which then serve as a basis for the implementation of the rounder. The three functions are the normalization shift, the significand round, and the post-normalization. Figure 3 shows a decomposition of the rounding hardware into corresponding sub-circuits. The sub-circuit "ExpRd" rounds to infinity if an overflow occurs. This part is not yet formalized.

For reals $x$, the IEEE-factoring of $x$ was defined in section 2.3 as the unique IEEE-factoring with value $x$.

Lemma 5. For any real $x$ and …, it holds:
1. …,
2. … iff …, and
3. ….

Lemma 6. For any factoring … (not necessarily an IEEE-factoring) with …, it holds:
1. …,
2. …, and
3. ….

We assume that the input to the rounder is encoded as a factoring, but not necessarily as an IEEE-factoring. The normalization shift can then be computed in hardware by a leading zero counter to compute the logarithm of the significand, a left/right shifter to compute the normalized significand, and an adder to adjust the exponent.
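The three hardware components named above (leading zero counter, shifter, exponent adder) can be sketched as follows, assuming the significand arrives as a W-bit integer F with FR fractional bits (so f = F / 2^FR) and a single-precision-like minimum exponent. `W`, `FR`, `EMIN` and the function names are hypothetical. Shifts are done exactly with fractions, so no bits are lost in the denormal (right-shift) case; in hardware those bits would go into the sticky computation.

```python
import random
from fractions import Fraction

W = 16        # hypothetical width of the incoming significand
FR = 12       # hypothetical number of fractional bits: f = F / 2**FR
EMIN = -126   # hypothetical minimum exponent (single-precision-like)


def lzc(F: int) -> int:
    """Leading zero count of the W-bit value F (a priority encoder in hardware)."""
    assert 0 <= F < 2 ** W
    return W - F.bit_length()


def norm_shift(s: int, e: int, F: int):
    """Normalize the factoring (s, e, F / 2**FR): returns (s, e2, f2) with the
    same value and f2 in [1, 2), or e2 == EMIN and f2 < 1 for tiny values."""
    if F == 0:
        return s, EMIN, Fraction(0)
    log_f = (W - 1 - lzc(F)) - FR     # floor(log2 f) from the leading zero count
    shift = max(log_f, EMIN - e)      # never push the exponent below EMIN
    f2 = Fraction(F, 2 ** FR) / Fraction(2) ** shift  # left/right shifter
    return s, e + shift, f2           # exponent adder


if __name__ == "__main__":
    print(norm_shift(0, 3, 0x0600))   # (0, 1, Fraction(3, 2)): 2**3 * 0.375 = 2**1 * 1.5
```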

For IEEE-factorings $(s, e, f)$, we define the significand round as the significand rounded to the number of fractional binary digits of the target format. The significand is multiplied with the sign before rounding, since the rounding decision depends on the sign (in the directed rounding modes). In hardware, the significand round is computed by examining the low-order bits of the significand. This technique is called sticky bit computation [18].
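For round to nearest, the significand round needs only three bits of information, which is what the sticky bit computation provides: the last bit that is kept (L), the first bit that is dropped (R), and the OR of all remaining dropped bits (S). A minimal Python sketch, assuming the significand is given as an integer F whose lowest k bits are dropped; only round to nearest (ties to even) is shown, the one mode in which the sign does not influence the decision.

```python
from fractions import Fraction


def round_nearest_even(F: int, k: int) -> int:
    """Drop the k low-order bits of F, rounding to nearest, ties to even.
    Only L (last kept bit), R (round bit) and S (sticky bit) are inspected."""
    if k == 0:
        return F
    q = F >> k                                    # truncated significand
    L = q & 1                                     # last kept bit
    R = (F >> (k - 1)) & 1                        # first dropped bit
    S = int((F & ((1 << (k - 1)) - 1)) != 0)      # OR of the remaining dropped bits
    return q + (R & (S | L))                      # round up if above half, or tie and odd


if __name__ == "__main__":
    print(round_nearest_even(0b1011_1000, 4))     # 12: 11.5 is a tie, 11 is odd
    print(round_nearest_even(0b1010_1000, 4))     # 10: 10.5 is a tie, 10 is even
```

The hardware never needs the full dropped tail, only whether it is zero, which is why the sticky bit can be computed by a single OR tree.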

Lemma 7. For any IEEE-factoring $(s, e, f)$ and any real $x$ with …, it holds:
1. … if … is denormal,
2. … if … is normal.

The lemma is proven by unfolding the definitions and applying the following lemma:

Lemma 8. For any integers …

In PVS, this lemma is proven automatically by the powerful proof strategy grind.

Let $(s, e, f)$ be an IEEE-factoring. If the significand round of $f$ yields a significand of 2, the result has to be post-normalized: the significand is set to 1, and the exponent is incremented. This is accomplished by the following function:

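$\mathit{post}(s, e, f) = (s, e+1, 1)$ if $f = 2$, and $\mathit{post}(s, e, f) = (s, e, f)$ if $f < 2$.

The value of the factoring is obviously preserved by the function, since $2^{e+1} \cdot 1 = 2^{e} \cdot 2$. The function is implemented by an incrementer for the exponent and a multiplexer for the significand.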

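Assume that the sub-circuits in figure 3 indeed compute the corresponding functions. Then the correctness of the whole rounder follows from the next theorem.

Theorem 5. For any factoring $(s, e, f)$ (not necessarily an IEEE-factoring), rounding its value to nearest yields the same result as applying the normalization shift, then the significand round, and then the post-normalization.

This theorem is proven by definition unfolding, the use of the lemmas above, and some rules on exponentiation.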

Theorem 5 decomposes the verification problem into smaller sub-problems such that the sub-circuits from figure 3 can be verified separately. These sub-circuits are further decomposed in [18].
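The decomposition of Theorem 5 can be checked numerically in a small model. The sketch below composes exact versions of the normalization shift, the significand round, and the post-normalization for IEEE double precision (53-bit significand, minimum exponent -1022). It compares the result with Python's conversion of an exact rational to `float`, which is assumed here to round to nearest with ties to even (CPython's integer true division is correctly rounded). Overflow handling (the ExpRd sub-circuit) is deliberately left out, and all names are hypothetical; this is a test harness for the decomposition, not the authors' PVS proof.

```python
import random
from fractions import Fraction

P, EMIN = 53, -1022   # IEEE double: 53-bit significand, minimum exponent


def flog2(x: Fraction) -> int:
    """floor(log2 x) for x > 0 (what a leading zero counter delivers)."""
    t = x.numerator.bit_length() - x.denominator.bit_length()
    return t - 1 if Fraction(2) ** t > x else t


def norm_shift(e: int, f: Fraction):
    """Normalization shift: same value, f in [1, 2) or e == EMIN."""
    if f == 0:
        return EMIN, f
    e2 = max(e + flog2(f), EMIN)
    return e2, f * Fraction(2) ** (e - e2)


def sig_round(f: Fraction) -> Fraction:
    """Significand round to P-1 fractional bits, ties to even."""
    scale = 2 ** (P - 1)
    return Fraction(round(f * scale), scale)


def post_norm(e: int, f: Fraction):
    """Post-normalization: a significand of 2 becomes 1 with exponent + 1."""
    return (e + 1, Fraction(1)) if f == 2 else (e, f)


def rounder(s: int, e: int, f: Fraction):
    """Composition in the spirit of Theorem 5 (overflow / ExpRd not modelled)."""
    e1, f1 = norm_shift(e, f)
    e2, f2 = post_norm(e1, sig_round(f1))
    return s, e2, f2


def value(s: int, e: int, f: Fraction) -> Fraction:
    return (-1) ** s * Fraction(2) ** e * f


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(1000):
        s, e = rng.randint(0, 1), rng.randint(-1100, 900)
        f = Fraction(rng.randrange(2 ** 70), 2 ** rng.randint(0, 80))
        assert value(*rounder(s, e, f)) == Fraction(float(value(s, e, f)))
    print("decomposition agrees with float rounding")
```

The exponent range is chosen so that no sample overflows; samples far below the smallest denormal exercise the path where the significand rounds to zero.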

4 Converting Mathematical Machines to Verilog HDL

The implementation above is specified as a mathematical machine in the PVS language. All proofs rely on this specification. This specification is converted into a synthesizable subset of Verilog HDL [22]. This is done automatically by a program. A similar approach is taken in [23].

The program is limited to converting mathematical machines, i.e., it takes a configuration set, an initial configuration, and a transition function as inputs. The tool is not limited to in-order designs.
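The conversion can be pictured with a toy generator. The sketch below assumes a mathematical machine described as a dictionary mapping each register name to its width, initial value, and next-state expression (already written in Verilog syntax). It emits one module whose registers are loaded with the initial configuration on reset and are otherwise all updated with non-blocking assignments on the clock edge, so every register takes its transition-function value simultaneously. This input format and the `reset` port are assumptions; the authors' tool reads PVS specifications.

```python
from typing import Dict, Tuple

# Hypothetical input format: register -> (width, initial value, next-state expression)
Machine = Dict[str, Tuple[int, int, str]]


def to_verilog(name: str, regs: Machine) -> str:
    """Emit a Verilog-2001 module implementing the machine's transition function."""
    ports = ["input clk", "input reset"] + [
        f"output reg [{w - 1}:0] {r}" for r, (w, _, _) in regs.items()]
    lines = [f"module {name}({', '.join(ports)});",
             "  always @(posedge clk) begin",
             "    if (reset) begin"]
    # Reset: load the initial configuration
    lines += [f"      {r} <= {w}'d{init};" for r, (w, init, _) in regs.items()]
    lines.append("    end else begin")
    # Normal cycle: non-blocking assignments all read the old configuration,
    # so every register takes its next-state value at the same clock edge
    lines += [f"      {r} <= {nxt};" for r, (_, _, nxt) in regs.items()]
    lines += ["    end", "  end", "endmodule"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_verilog("toy", {"pc": (4, 0, "pc + 4'd1"),
                             "acc": (8, 0, "acc + {4'd0, pc}")}))
```

Because the transition function is applied to the whole configuration at once, non-blocking assignments are the natural target: the generated code does not depend on the order in which registers are listed.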

5 Future Work

We are in the process of extending the design with a mechanism for speculative execution and precise interrupts. Furthermore, out-of-order execution capabilities are being added by means of a Tomasulo scheduler.

The mathematics of the floating point arithmetic has been verified completely. Our future work is to verify the corresponding circuits.

Acknowledgment

The authors would like to thank Michael Bosch, Michael Klein, and Jochen Preiss for valuable discussions.

References

1. J. L. Hennessy and D. A. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann Publishers, Inc., San Mateo, CA, 2nd edition, 1996.

2. D. Cyrluk, S. Rajan, N. Shankar, and M. K. Srivas. Effective theorem proving for hardware verification. In 2nd International Conference on Theorem Provers in Circuit Design, 1994.

3. Ravi Hosabettu, Ganesh Gopalakrishnan, and Mandayam Srivas. A proof of correctness of a processor implementing Tomasulo's algorithm without a reorder buffer. In Correct Hardware Design and Verification Methods: IFIP WG10.5 International Conference on Correct Hardware Design and Verification Methods (CHARME), pages 8–22. Springer, 1999.

4. Jun Sawada and Warren A. Hunt. Results of the verification of a complex pipelined machine model. In Correct Hardware Design and Verification Methods: IFIP WG10.5 International Conference on Correct Hardware Design and Verification Methods (CHARME), pages 313–316. Springer, 1999.

5. Matt Kaufmann and J S. Moore. ACL2: An industrial strength version of Nqthm. In Proc. of the Eleventh Annual Conference on Computer Assurance, pages 23–34. IEEE Computer Society Press, 1996.

6. Thomas A. Henzinger, Shaz Qadeer, and Sriram K. Rajamani. You assume, we guarantee: Methodology and case studies. In Proc. 10th International Conference on Computer-aided Verification (CAV), 1998.

7. K. L. McMillan. Verification of an implementation of Tomasulo's algorithm by compositional model checking. In Proc. 10th International Conference on Computer Aided Verification, pages 110–121, 1998.

8. W. Damm and A. Pnueli. Verifying out-of-order executions. In H. F. Li and D. K. Probst, editors, Advances in Hardware Design and Verification: IFIP WG10.5 International Conference on Correct Hardware Design and Verification Methods (CHARME), pages 23–47. Chapman & Hall, 1997.

9. K. L. McMillan. Verification of infinite state systems by compositional model checking. In Correct Hardware Design and Verification Methods: IFIP WG10.5 International Conference on Correct Hardware Design and Verification Methods (CHARME), pages 219–233. Springer, 1999.

10. Y.-A. Chen and R. E. Bryant. Verification of floating-point adders. Lecture Notes in Computer Science, 1427, 1998.

11. Y.-A. Chen and R. E. Bryant. PHDD: An efficient graph representation for floating point circuit verification. In IEEE/ACM International Conference on Computer Aided Design; Digest of Technical Papers (ICCAD'97), pages 2–7, Washington-Brussels-Tokyo, November 1997. IEEE Computer Society Press.

12. Y.-A. Chen, E. Clarke, P.-H. Ho, and Y. Hoskote. Verification of all circuits in a floating-point unit using word-level model checking. Lecture Notes in Computer Science, 1166, 1996.

13. M. D. Aagaard and C.-J. H. Seger. The formal verification of a pipelined double-precision IEEE floating-point multiplier. In International Conference on Computer Aided Design, pages 7–10, Los Alamitos, Ca., USA, November 1995. IEEE Computer Society Press.

14. L. Claesen, D. Verkest, and H. De Man. A proof of the non-restoring division algorithm and its implementation on an ALU. In Formal Methods in System Design, vol. 5, pages 5–31, 1994.

15. J. O'Leary, M. Leeser, J. Hickey, and M. Aagaard. Non-restoring integer square root: A case study in design by principled optimization. Lecture Notes in Computer Science, 901, 1995.

16. David M. Russinoff. A mechanically checked proof of IEEE compliance of the floating point multiplication, division and square root algorithms of the AMD-K7 processor. LMS Journal of Computation and Mathematics, 1:148–200, 1998.

17. Daniel Kröning, Wolfgang Paul, and Silvia M. Müller. Proving the correctness of pipelined micro-architectures. In Klaus Waldschmidt and Christoph Grimm, editors, Proc. of ITG/GI/GMM-Workshop "Methoden und Beschreibungssprachen zur Modellierung und Verifikation von Schaltungen und Systemen", pages –98. VDE Verlag, 2000.

18. Silvia M. Müller and Wolfgang Paul. Computer Architecture: Complexity and Correctness. Springer, 2000.

19. Institute of Electrical and Electronics Engineers. ANSI/IEEE standard 754–1985, IEEE Standard for Binary Floating-Point Arithmetic, 1985. For a readable account see the article by W. J. Cody et al. in the IEEE MICRO Journal, Aug. 1984, 84–100.

20. Paul S. Miner. Defining the IEEE-854 floating-point standard in PVS. Technical report, NASA Langley Research Center, 1995.

21. Wolfgang Paul. Rechnerarchitektur II, SS 98. Lecture Notes, 1998.

22. Donald E. Thomas and Philip Moorby. The Verilog Hardware Description Language. Kluwer, Boston; Dordrecht; London, 1991.

23. James C. Hoe and Arvind. Hardware synthesis from term rewriting systems. In Proc. of VLSI'99, Lisbon, Portugal, 1999.
