ChristianJacobi,DanielKroening
Dept.14:ComputerScience,UniversityofSaarlandPostBox151150,D-66041Saarbruecken,Germany
email:cj,kroening@cs.uni-sb.de
Abstract.Thispaperpresentsstatusresultsofamicroprocessorverificationproject.Theauthorsverifyacomplete32-bitRISCmicroprocessorincludingthefloatingpointunitandthecontrollogicofthepipeline.Thepaperdescribesaformaldef-initionofa”correct”microprocessor.Thiscorrectnesscriterionisprovenforanimplementationusingformalmethods.AllproofsareverifiedmechanicallybymeansofthetheoremprovingsystemPVS.
1Introduction
Microprocessordesignisanerror-proneprocess.Withincreasingcomplexityofcurrentmicroprocessordesigns,formalverificationhasbecomecrucial.Inordertoachievecompletelyverifieddesigns,adjustingthedesignprocessitselfplaysanimportantrole:themorehigh-levelinformationonthedesignisavailable,thefastertheverificationcanbedone.
Theauthorsre-designedasimpleRISCprocessor,theDLX[1],withrespecttoverifiability.Thedesignincludesthecompletepipecontrolandforwardinglogic.Thefunctionunitsarefullyfeaturedincludingafloatingpointunit.Theyarenotabstractedbymeansofuninterpretedfunctions.Theproofsforthegluelogic,theALU,andfloat-ingpointunitareverifiedusingthetheoremprovingsystemPVS[2].
RelatedWorkRecentpapersshowthecorrectnessofcomplexdesignsorschedulersintheoremprovingsystemssuchasPVS.Hosabettuetal.[3]provebothsafetyandlivenessofTomasulo’salgorithmusingPVS.SwadaandHunt[4]provideanACL2[5]proofofacompletedesignimplementingaTomasuloschedulerwithreorderbuffer.
Henzingeretal.[6]verifyasimplepipelinedprocessorusingamodelchecker.McMillan[7]partlyautomatestheproofbyrefinementofTomasulo’salgorithmpre-sentedin[8]withthehelpofcompositionalmodelchecking.Thistechniqueisimprovedin[9]bytheoremprovingmethodstosupportanarbitraryregistersizeandnumberoffunctionunits.
Therearemanypublicationsontheverificationof(partsof)floatingpointunits.Bryantandhisgroupverifieddifferentfunctionunitsusingmodel-checking[10–12].AagaardandSegerverifiedamultiplierusingmodel-checkingcombinedwiththeoremproving[13].Claesenet.al.andO’Learyet.al.haveusedtheoremproverstoverify
anSRTintegerdivider[14],andanSRTintegersquarerootcircuit[15],respectively.Russinoffhasproventhecorrectnessofthemultiplication,divisionandsquarerootalgorithmsoftheAMDK7processor[16].Mostofthepublicationsciteddonotcoverdenormalnumbers.
ProjectStatusTheverificationofthepipelineandforwardinglogichasreachedahighlevelofautomation.However,theprocessofverifyingthefunctionunitsisnotauto-matedatall.Thefundamentalsofthefloatingpointmathematicsareverifiedalready.Theverificationoftheindividualfloatingpointcircuitsiswork-in-progress.
2TheSpecificationMachine
2.1HardwareModel
Boththespecificationdesignandthehardwarearemodeledasmathematicalmachine.Mathematicalmachinesareacommonmethodtomodelthebehaviorofarbitrarymi-croprocessorsystems.Forthispaper,thedefinitionofthemathematicalmachinefrom
whichconsistsofthe[17]isused:amathematicalmachineisatriple
followingcomponents:
isthesetofallpossibleconfigurationsof.Anelementofiscalledcon-figurationorstateofthemachine.
isaconfigurationof.–Theinitialconfiguration
mapsaconfigurationtoitssuccessor.–Thetransitionfunction–
Asequence
holds.
ofconfigurationsiscalledcomputationof
iff
NotationRegistersareusedinboththespecificationandtheimplementationofami-beafinitesetofregisters.Eachregistercancroprocessor.Let
.haveavaluewithinafinitedomain
Theconfigurationsetconsistsofthedomainsoftheregisters:
Theprojectionfunction
.Letbe
extractsthevalueofaregister
fromaconfiguration.
Letfor
bepartofacomputationofamathematicalmachine.
.Letbeashorthandforthefollowingprojectionon:
isashorthand
Inanalogytothat,let
transitionfunction:
beashorthandforthefollowingprojectiononastate
Asignalisdefinedasamappingfromthesetofconfigurationsintoanarbitrary
:domain
2.2DLXArchitecture
OurdesignimplementstheDLXarchitecture.TheDLXarchitecture[1]featuresaRISCinstructionsetincludedbothintegerandfloatingpointinstructions.Theintegercoreistakenfrom[17]andextendedbyafloatingpointregisterfile(FPR)andfloatingpointinstructionsasdescribedin[18].
2.3CorrectIEEEFloatingPointArithmetic
Ourprimarygoalistheverificationofacompleteprocessor.Thus,weformallyverifythecorrectnessofafloatingpointunit(FPU).Intheprocessorframework,theFPUisamulti-cyclefunctionunit,andcan(almost)beseenasablackbox.TheFPUsupportstheoperationsaddition,subtraction,multiplication,divisionandsquareroot.TheFPUhandlesnormalanddenormalnumbers,specialvalues,traps,andinterrupts.Thisisincontrasttomostpreviousresults,wheredenormalnumbers,trapsandinterruptsaredisregarded.
ThecorrectnesscriterionsfortheFPUaregivenbytheIEEEstandard754[19].ThestandardisinformalwhichmakesitunusablefortheformalverificationoftheFPU.OnethereforehastoformalizetheIEEEstandard;thisformalizationhastopreservethenotionofthestandard.Inherently,onecannotprovetheequivalenceoftheformalandtheinformalspecification.Theformalspecificationshavetoconvinceanybodyoftheircorrectness.WewillgivethespecificationoftheIEEEroundingmodeto
up,roundzeroarenot
ascomplicatedasthemodeto
WecallanIEEE-factoringsemi-representable,ifisrepresentable.Wecallareal(semi-)representable,ifis(semi-)representable.
Representablenumbersexactlycorrespondtotherepresentablenumbersasdefinedinthestandard.Inthefollowing,wewillonlyinvestigatesemi-representablefactorings.Inorderto“round”semi-representablefactoringstorepresentableones,onejusthastodecidewhetheronehastoroundtoinfinityornot.Thiscanbasicallybedonebya
.comparisonofwith
Weproceedwiththedefinitionoftheroundingfunction.Thestandarddefinesthe
roundingmodeto
nearestasfollows:
Nowwehaveadefinitionrelativelyclosetothehardwarebutfarawayfromthespecificationinthestandard.Ononehand,thisenablessimplerimplementationandverificationoftherounder,aswewillseeinsection3.5.Ontheotherhand,itisnotobviousthatthesedefinitionsconformtothetheIEEEstandard.Wegivethreetheoremswhichjustifythisclaim.Theorem1.Foranyreal,
issemi-representable.
Thenexttheoremstatesthattheresultoftheroundingfunctionindeedisanearestrepresentablenumber.
Theorem2.Foranyreal,andanysemi-representableIEEE-factoring
.holds
,it
Thethirdtheoremstatesthatanumberwithleastsignificantbitzeroischosenin
caseofatiebetweenthetwonearestrepresentablenumbers.Thus,wefirstboundthe
.Wethenshowthatthesignificantisevenifthemaxi-distancebetweenandmumdistanceisreached.
Theorem3.Foranyreal,itholds
and,then
.If
iseven.
Wewillgiveatheoreminsection3.5whichsimplifiestheverificationoftheround-ingunitbydecomposingitintosmallerparts.Thistheoremwillseemfairlyobvious
justbecauseweinvestedreasonableeffortinthedefinitionoftheroundingfunction.Incontrasttoourdefinition,theroundingresultin[18]isdefinedas“arepresentablenumberclosestto.Iftherearetwosuchnumbers,onechoosesthenumberwithevensignificant”.ThiscoincidesobviouslywiththeIEEEstandard.Nevertheless,itisimpracticaltoverifytherounderwiththisinformaldefinition.Theeffortwehavespentonthedefinitionoftheroundingfunctionpaysoffwhenverifyingthehardwareimplementation.
3ImplementingtheProcessor
3.1ForwardingandStallingLogic
Thedesignusesacommonfivestagepipelineaspresentedin[1,18].Thepipelinedma-chineisgeneratedbyanautomatictransformationfromasequentialpreparedmachineasdescribedin[17].
1
fullfull
fullfull
fullfull
Fig.1.Theregistersofa-stagepipeline.Thefunctions
to
representthedatapaths.
Ourdesignfeaturesacompletestallengine[21,18].Incontrasttothestallenginepresentedin[18],itallowsstallingallstagesindividually.Thestallengineistakenfrom[17]withsmallchanges:aclockenablesignalisnolongerused.Thefullbitsareupdatedineverycycleinstead(figure1).
Thetransitionfunctionforthefullbitsischangedaccordingly;thefullbitofeachstageissetiffthestageisupdatedorstalled.
full
Thecalculationofthesignalsandisnotchangedandtakenfrom[17].The
istheclockenablesignaloftheoutputregistersofstage:theregistersaresignal
updatediffthestageisfullandnotstalled:
full
Lemma3.Givenacycle,thevaluesoftheschedulingfunctionsoftwoadjoiningstagesareeitherequalorthevalueoftheschedulingfunctionofthelaterstageisgreaterbyone.
Lemma4.Thevaluesareequaliffthefullbitofthelaterstageisnotset.
Negatingbothsidesofthelastequationandapplyinglemma3resultsin:
ProofTheproofofthelemmasabovedependsonthestallengine.Itisaninvariant
.proofbyinduction.Lemma2forcycleisshownusinglemma4forcycle
.Lemma3forcycleisshownusinglemma2incycleandlemma4incycle
Lemma4isshownusinglemma2and3incycle.
Duetolackofspace,onlytheinductionstepforlemma2isshownhere:Theclaim
holdsbydefinition.Lethold.Forthecase,forthecase
andtheclaimisshowntheclaimfollowsfromthedefinitionof.For
,whichstatesthattheclaimisequivalentto.usinglemma4forcycle
signals.Thisistruebecauseofthedefinitionofthe
Theorem4isthenshownbyinductionon:theclaimisobviousforstageswhich
),thecorrectnessarenotupdatedinagivencycle.Ifthestageisupdated(i.e.,
ofthesevaluesisarguedbyshowingthecorrectnessoftheinputvaluesofthestage.Anexampleproofusingthelemmasabovefortheinstructionfetchstageisin[17].
3.3Liveness
oftheThelivenesscriterionisformalizedasfollows:foranygivenconfiguration
specificationmachine,weprovethattheimplementationmachinecalculatestheseval-holds.ueswithinafiniteamountoftime,i.e.,thereisafinitesuchthat
Theproofismadebyshowingthatanyactivestallsignalbecomesde-activewithinafiniteamountoftime.Thisisaproofbyinductiononthenumberofstagesbeginningwiththelaststage.
3.4IntegerUnit
Ourdesignfeaturesanintegerunit(ALU).Itsupportsaddition,subtraction,shiftandcompareoperations,andbit-wiseoperations(AND,OR,XOR).TheALUisverifiedcompletelywiththetheoremprovingsystemPVS.Thisincludesanarbitrary-sizedcarrylookaheadadder.However,theimplementationandtheproofforthecarrylookaheadadderisincludedonlyinordertoachievecompleteness.Inordertocreatehardware,apre-definedadderfromthevendorlibraryisused.
AUnpackA’BUnpack
Operand BuseNormShift
fs
B’SigRd
Mul / Div
Sqrt
PostNorm
Rounder
Result Bus
Add / Sub
ExpRd
Result Bus
Fig.2.ToplevelschematicsoftheFPUFig.3.Toplevelschematicsoftherounder
3.5FloatingPointUnit
Figure2showsthetop-levelschematicoftheFPU.TheprocessorfeedspackedIEEEnumbers[19]andintotheFPU.Theunpackercircuitconvertsthesenumbersintothefactoringformatdescribedinsection2.3.Dependingontheoperation,theoperandsandarefedintooneofthefunctionunits.ThelaststageroundstheresultoftheoperationtoarepresentableandpackedIEEEnumber,andplacestheresultontheresult-busoftheprocessor.
Thedesignispipelined,i.e.,thedesignincludesregisterswhichstoreintermediateresults.ThedivisioniscarriedoutusingtheNewton-Raphsonmethod.Thus,thefunc-tionunitformultiplicationanddivisioncontainsloopstofeedbackintermediateresultsforthenextNewton-Raphsoniteration.
Completehardwareschematicsatthegatelevelcanbefoundin[18].Wewillfocusontherounder.Wedemonstrateapartoftheverificationoftheroundingunitexemplary.Wegiveatheoremwhichdecomposestheroundingfunctionintothreesimplerfunc-tionswhichthenserveasabasisfortheimplementationoftherounder.Thethreefunc-andthepost-normalizationtionsarethenormalizationshift,thesignificantround
.Figure3showsadecompositionoftheroundinghardwareincorrespondingsub-circuits.Thesub-circuit“ExpRd”roundstoinfinity,ifanoverflowoccurs.Thispartisnotyetformalized.
wasdefinedastheuniqueIEEE-factoringwithForreals,
insection2.3.
Lemma5.Foranyreal1.2.3.
and
,itholds
iff
,
,and
.
Lemma6.Foranyfactoring(notnecessarilyIEEE-factoring)
,itholds
with
1.2.3.
,
.
,and
Weassumethattheinputtotherounderisencodedasafactoring,butnotnecessarily
asanIEEE-factoring.Thenormalizationshiftcanthenbecomputedinhardwarebyaleadingzerocountertocomputethelogarithmof,aleft/rightshiftertocompute,andanaddertoadjusttheexponent.
ForIEEE-factorings
,wedefinethesignificantround
asthesignificantroundedtofractionalbinarydigits.Themultiplicationwiththesignisnecessarysincetheroundingdecisiondependsonthesign.Inhardware,thesignificantroundiscomputedbytheexaminationofthelow-orderbitsofthesignificant.Thistechniqueiscalledstickybitcomputation[18].
Lemma7.ForanyIEEE-factoring1.2.
,itholds
,if,if
isdenormal,isnormal.andanyreal
with
,itholds
Thelemmaisprovenbyunfoldingthedefinitions,andapplyingthefollowinglemma:Lemma8.Foranyintegers
.
InPVS,thislemmaisprovenautomaticallybythepowerfulproof-strategygrind.beanIEEE-factoring,andlet.IfthesignificantroundLet
,theresulthastobepost-normalized;thesignificantissettoyieldsasignificant
1,andtheexponentisincremented.Thisisaccomplishedbythefunction:
ifif
Thevalueofthefactoringsisobviouslypreservedbythefunction.Thefunctionisimplementedbyanincrementerfortheexponentandanmultiplexerforthesignifi-cant.
Assumethatthesub-circuitsinfigure3indeedcomputethecorrespondingfunc-tions.Thenthecorrectnessofthewholerounderfollowsfromthenexttheorem:Theorem5.Foranyfactoring
(notnecessarilyanIEEE-factoring),itholds
Thistheoremisprovenbydefinitionunfolding,theuseofthelemmasabove,andsomerulesonexponentiation.
Theorem5decomposestheverificationproblemintosmallersub-problemssuchthatthesub-circuitsfromfigure3canbeverifiedseparately.Thesesub-circuitsarefurtherdecomposedin[18].
4ConvertingMathematicalMachinestoVerilogHDL
TheimplementationaboveisspecifiedasmathematicalmachineinthePVSlanguage.Allproofsrelyonthisspecification.Thisspecificationisconvertedintoasynthesiz-ablesubsetofVerilogHDL[22].Thisisdoneautomaticallybyaprogram.Asimilarapproachismadein[23].
Theprogramislimitedtoconvertmathematicalmachines,i.e.,ittakesaconfigu-rationset,aninitialconfiguration,andatransitionfunctionasinputs.Thistoolisnotlimitedtoin-orderdesigns.
5FutureWork
Weareinprogressofextendingthedesignwithamechanismforspeculativeexecutionandpreciseinterrupts.Furthermore,out-of-orderexecutioncapabilitiesareaddedbymeansofaTomasuloscheduler.
Themathematicsofthefloatingpointarithmetichavebeenverifiedcompletely.Ourfutureworkistoverifythecorrespondingcircuits.
Acknowledgment
TheauthorswouldliketothankMichaelBosch,MichaelKlein,andJochenPreissforvaluablediscussions.
References
1.J.L.HennessyandD.A.Patterson.ComputerArchitecture:AQuantitativeApproach.Mor-ganKaufmannPublishers,INC.,SanMateo,CA,2ndedition,1996.
2.D.Cyrluk,S.Rajan,N.Shankar,andM.K.Srivas.Effectivetheoremprovingforhardwareverification.In2ndInternationalConferenceonTheoremProversinCircuitDesign,1994.3.RaviHosabettu,GaneshGopalakrishnan,andMandayamSrivas.AproofofcorrectnessofaprocessorimplementingTomasulo’salgorithmwithoutareorderbuffer.InCorrectHard-wareDesignandVerificationMethods:IFIPWG10.5InternatinalConferenceonCorrectHardwareDesignandVerificationMethods(CHARME),pages8–22.Springer,1999.
4.JunSawadaandWarrenA.Hunt.Resultsoftheverificationofacomplexpipelinedmachinemodel.InCorrectHardwareDesignandVerificationMethods:IFIPWG10.5InternatinalConferenceonCorrectHardwareDesignandVerificationMethods(CHARME),pages313–316.Springer,1999.
5.MattKaufmannandJ.S.Moore.ACL2:Anindustrialstrengthversionofnqthm.InProc.oftheEleventhAnnualConferenceonComputerAssurance,pages23–34.IEEEComputerSocietyPress,1996.
6.ThomasA.Henzinger,ShazQadeer,andSriramK.Rajamani.Youassume,weguarantee:Methodologyandcasestudies.InProc.10thInternationalConferenceonComputer-aidedVerification(CAV),1998.
7.K.L.McMillan.VerificationofanimplementationofTomasulo’salgorithmbycompositionmodelchecking.InProc.10thInternationalConferenceonComputerAidedVerification,pages110–121,1998.
8.W.DammandA.Pnueli.Verifyingout-of-orderexecutions.InH.F.LiandD.K.Probst,editors,AdvancesinHardwareDesignandVerification:IFIPWG10.5InternatinalCon-ferenceonCorrectHardwareDesignandVerificationMethods(CHARME),pages23–47.Chapmann&Hall,1997.
9.M.L.McMillan.Verificationofinfinitestatesystemsbycompositionalmodelchecking.InCorrectHardwareDesignandVerificationMethods:IFIPWG10.5InternatinalConfer-enceonCorrectHardwareDesignandVerificationMethods(CHARME),pages219–233.Springer,1999.
10.Y.-A.ChenandR.E.Bryant.Verificationoffloating-pointadders.LectureNotesinCom-puterScience,1427,1998.
11.Y.-A.ChenandR.E.Bryant.PHDD:Anefficientgraphrepresentationforfloatingpoint
circuitverification.InIEEE/ACMInternationalConferenceonComputerAidedDesign;Di-gestofTechnicalPapers(ICCAD’97),pages2–7,Washington-Brussels-Tokyo,November1997.IEEEComputerSocietyPress.
12.Y.-A.Chen,E.Clarke,P.-H.Ho,andY.Hoskote.Verificationofallcircuitsinafloating-point
unitusingword-levelmodelchecking.LectureNotesinComputerScience,1166,1996.13.M.D.AagaardandC.-J.H.Seger.Theformalverificationofapipelineddouble-precision
IEEEfloating-pointmultiplier.InInternationalConferenceonComputerAidedDesign,pages7–10,LosAlamitos,Ca.,USA,November1995.IEEEComputerSocietyPress.
14.L.Claesen,D.Verkest,andH.DeMan.Aproofofthenon-restoringdivisionalgorithmand
itsimplementationonanALU.InFormalMethodsinSystemDesign,vol.5,pages5–31,1994.
15.J.O’Leary,M.Leeser,J.Hickey,andM.Aagaard.Non-restoringintegersquareroot:Acase
studyindesignbyprincipledoptimization.LectureNotesinComputerScience,901,1995.16.DavidM.Russinoff.AmechanicallycheckedproofofIEEEcomplianceofthefloatingpoint
multiplication,divisionandsquarerootalgorithmsoftheAMD-K7processor.LMSJournalofComputationandMathematics,1:148–200,1998.17.DanielKr¨oning,WolfgangPaul,andSilviaM.M¨uller.Provingthecorrectnessofpipelined
micro-architectures.InKlausWaldschmidtandChristophGrimm,editors,Proc.ofITG/GI/GMM-Workshop”MethodenundBeschreibungssprachenzurModellierungundVer-ifikationvonSchaltungenundSystemen”,pages–98.VDEVerlag,2000.18.SilviaM.M¨ullerandWolfgangPaul.ComputerArchitecture:ComplexityandCorrectness.
Springer,2000.
19.InstituteofElectricalandElectronicsEngineers.ANSI/IEEEstandard754–1985,IEEEStan-dardforBinaryFloating-PointArithmetic,1985.forareadableaccountseethearticlebyW.J.Codyetal.intheIEEEMICROJournal,Aug.1984,84–100.
20.PaulS.Miner.DefiningtheIEEE-854floating-pointstandardinPVS.Technicalreport,
NASA,LangleyResearchCenter,1995.
21.WolfgangPaul.RecherarchitekturIISS98,1998.LectureNotes.
22.DonaldE.ThomasandPhilipMoorby.TheVerilogHardwareDescriptionLanguage.
Kluwer,Boston;Dordrecht;London,1991.
23.JamesC.HoeandArvind.Hardwaresynthesisfromtermrewritingsystems.InProc.of
VLSI’99,Lisbon,Portugal,1999.
因篇幅问题不能全部显示,请点此查看更多更全内容
Copyright © 2019- 7swz.com 版权所有 赣ICP备2024042798号-8
违法及侵权请联系:TEL:199 18 7713 E-MAIL:2724546146@qq.com
本站由北京市万商天勤律师事务所王兴未律师提供法律服务