Saturday, January 8, 2011

Supporting Information Access in Next Generation Digital Library Architectures*
Ingo Frommholz, Predrag Knežević, Bhaskar Mehta, Claudia Niederée, Thomas Risse, and Ulrich Thiel
Fraunhofer IPSI
Integrated Publication and Information Systems Institute
Dolivostrasse 15, 64293 Darmstadt, Germany
{frommholz|knezevic|mehta|niederee|risse|thiel}@ipsi.fhg.de
Abstract. Current developments in Service-oriented Architectures, Peer-to-Peer and Grid computing promise more open and flexible architectures for digital libraries. They will open DL technology to a wider clientele, allow faster adaptability, and enable the usage of federative models of content and service provision. These technologies raise new challenges for the realization of DL functionalities, which are rooted in the increased heterogeneity of content, services and metadata, in the higher degree of distribution and dynamics, as well as in the omission of a central control instance. This paper discusses these opportunities and challenges for three central types of DL functionality revolving around information access: metadata management, retrieval functionality, and personalization services.
1 Introduction
Currently, there is a considerable amount of R&D activity in developing viable strategies to use innovative technologies and paradigms like Peer-to-Peer Networking, Grid, and Service-oriented Architectures in digital libraries (see e.g. the European Integrated Projects BRICKS [1] and DILIGENT). The promise is that these efforts will lead to more open and flexible digital library architectures that:
– open up digital library (DL) technology to a wider clientele by enabling more cost-effective and better tailored digital libraries,
– allow faster adaptability to developments in DL services and IT technologies, and
– enable usage of dynamic federative models of content and service provision involving a wide range of distributed content and service providers.
The use of Service-oriented Architectures, Grid infrastructures, and the Peer-to-Peer approach for content and service provision has implications for the realization of enhanced DL functionality. These implications are mainly rooted in increased heterogeneity of content, services and metadata, in the higher degree of distribution and dynamics, as well as in the omission of a central control instance. On the one hand, these are opportunities for better and more multifarious DL services; on the other hand, these are new challenges to ensuring long-term, reliable, and quality-ensured DL service provision that also exploits the technology promises. This paper discusses these opportunities and challenges for three central types of DL functionality revolving around information access: metadata management, retrieval functionality, and personalization services.

* This work is partly funded by the European Commission under BRICKS (IST 507457), COLLATE (IST-1999-20882), DILIGENT and VIKEF (IST-507173).
The rest of this paper is structured as follows: Section 2 presents the key ideas of next generation DL architectures based on exemplary RTD projects. Section 3 discusses how these new ideas influence information access in the areas of metadata management, information retrieval, and personalization support. Related work in these areas is considered in Section 4. The paper concludes with a summary of its key issues.
2 Next Generation Digital Library Architectures
Current plans for next generation DL architectures aim for a transition from the DL as an integrated, centrally controlled system to a dynamically configurable federation of DL services and information collections. This transition is inspired by new technology trends and developments. These include technologies like Web services and the Grid as well as the success of new paradigms like Peer-to-Peer Networking and Service-oriented Architectures. The transition is also driven by the needs of the "DL market":
– better and adaptive tailoring of the content and service offer of a DL to the needs of the respective community as well as to the current service and content offer;
– more systematic exploitation of existing resources like information collections, metadata collections, services, and computational resources;
– opening up of DL technology to a wider clientele by enabling more cost-effective digital libraries.
To make these ideas more tangible, we discuss three RTD projects in the field and their relationship to upcoming e-Science activities.
2.1 Virtual Digital Libraries in a Grid-based DL Infrastructure
DILIGENT¹ is an Integrated Project within the IST 6th Framework Programme. Its objective is "to create an advanced test-bed that will allow members of dynamic virtual e-Science organizations to access shared knowledge and to collaborate in a secure, coordinated, dynamic and cost-effective way."
The DILIGENT testbed will enable the dynamic creation and management of Virtual Digital Libraries (VDLs) on top of a shared Grid-enabled DL infrastructure, the DILIGENT infrastructure. VDLs are DLs tailored to the support of specific e-Science communities and work groups. For creating a VDL, DL services, content collections, and metadata collections are considered as Grid resources and are selected, configured, and integrated into processes using the services of the DILIGENT infrastructure. This infrastructure builds upon an advanced underlying Grid infrastructure as it is currently evolving, e.g., in the EGEE project². Such a Grid infrastructure will already provide parts of the functionality required for DILIGENT. This includes the dynamic allocation of resources, support for cross-organizational resource sharing, and a basic security infrastructure. For effectively supporting DLs, additional services like support for redundant storage and automatic data distribution, metadata brokers, metadata and content management, advanced resource brokers, approaches for ensuring content security in distributed environments, and the management of content and community workflows are required in addition to services that support the creation and management of VDLs. A further project challenge is to develop systematic methods for making the treasure of existing DL services and collections utilizable as Grid resources in the DILIGENT infrastructure.
The DILIGENT project will result in a Grid-enabled DL testbed that will be validated by two complementary real-life application scenarios: one from the Cultural Heritage domain and one from the environmental e-Science domain.

¹ DILIGENT - A DIgital Library Infrastructure on Grid ENabled Technology
² http://public.eu-egee.org
2.2 Service-oriented and Decentralized DL Infrastructure
The aim of the BRICKS³ Integrated Project [1] is to design, develop and maintain a user- and service-oriented space to share knowledge and resources in the Cultural Heritage domain. The target audience is very broad and heterogeneous and involves cultural heritage and educational institutions, the research community, industry, and citizens.
Such a high level of heterogeneity cannot be handled with the existing centralized DL architectures. The BRICKS architecture will reduce the cost of joining the system, i.e., the system will reuse existing communication channels and content of already installed DLs. Also, BRICKS membership will be flexible, such that parties can join or leave the system at any point in time without administrative overhead. The BRICKS project will define a decentralized, service-oriented infrastructure that uses the Internet as a backbone and fulfills the requirements of expandability, scalability and interoperability.
With respect to access functionality, BRICKS provides appropriate task-based functionality for indexing/annotation and collaborative activities, e.g., for preparing a joint multimedia publication. An automatic annotation service will enable users to request background information, even if items have not been annotated by other users yet. By selecting appropriate items, such as definitions of concepts, survey articles or maps of relevant geographical areas, the service exploits the currently focussed items and the user's goals expressed in the user profile. In addition, the linking information, which is generated dynamically, must be integrated into the documents. The design of the access functionality is influenced by our experiences in the 5th Framework project COLLATE.
2.3 COLLATE: A Web-based environment for document-centered collaboration
Designed as a content- and context-based knowledge working environment for distributed user groups, the COLLATE system supports both individual work and collaboration of domain experts with material in the data repository. The example application focuses on historic film documentation, but the developed tools are designed to be generic and as such adaptable to other content domains and application types. This is achieved by model-based modules.
³ BRICKS - Building Resources for Integrated Cultural Knowledge Services


The system supports collaborative activities such as creating a joint publication or assembling and creating material for a (virtual) exhibition, and contributing unpublished parts of work in the form of extended annotations and commentaries. Automatic indexing of textual and pictorial parts of a document can be invoked. Automatic layout analysis for scanned documents can be used to link an annotation to individual segments. As a multifunctional means of in-depth analysis, annotations can be made individually but also collaboratively, for example in the form of annotation of annotations, collaborative evaluation, and comparison of documents. Through interrelated annotations users can enter into a discourse on the interpretation of documents and document passages.
The COLLATE collaboratory is a multifunctional software package integrating a large variety of functionalities that are provided by cooperating software modules residing on different servers. It can be regarded as a prototypical implementation of a decentralized, Service-oriented DL architecture which serves as a testbed for the collaborative use of documents and collections in the Humanities. The collaborative creation of annotation contexts for documents offers new opportunities for improving the access functionality, as we will illustrate later on.
2.4 Next Generation DL Architectures and e-Science
Scientific practice is increasingly reliant on data-intensive research and international collaboration enabled by computer networks. The technology deployed in such scenarios allows for high bandwidth communication networks and, by linking computers in "Grids", places considerably more powerful computing resources at researchers' disposal than a single institution could afford. If we view e-Science as being primarily motivated up to now by notions of resource sharing for computationally intensive processes (e.g. simulations, visualisation, data mining), a need is emerging for new approaches, brought up by ever more complex procedures, which, on the one hand, assume the reuse of workflows, data and information and, on the other hand, should be able to support collaboration in virtual teams. Future concepts of e-Science will be less focussed on data and computing resources, but will include services on the knowledge and organizational levels as well. Embedding future DL architectures in an emerging e-Science infrastructure will meet these requirements by providing access to information and knowledge sources, and appropriate collaboration support on top of the Grid-based infrastructure.
3 Information Access in Next Generation DL Architectures
A decentralized, service-oriented architecture poses new challenges to the technologies employed for information access. DLs based on such an architecture should, for example, not only provide access and retrieval functionality for the documents residing on the local peer, but should also consider other peers which might host documents relevant to a query. In the following, we will outline possible approaches for enhanced services for information access. Such services will utilize the functions of a decentralized metadata management ensuring the availability of all documents (and their parts) while reducing overhead costs. Retrieval functions can be improved by taking into account the annotational contexts of documents emerging from the collaborative process of interpreting and discussing items of interest by a group of users. In addition, individual users' contexts can be used to personalize the access services.

[Figure: layer stack with the components Applications, Query Engine, P2P-DOM, Index Manager, DHT Abstraction Layer, DHT, and Network Layers]
Fig. 1. Decentralized XML Storage Architecture
3.1 Decentralized Metadata Management
DLs usually like to keep content under control in their local repositories. Metadata, in contrast, should be available to all parties, which suggests storing it in some central place accessible to everybody. Decentralized architectures by definition avoid central points, because they are candidate single points of failure and performance bottlenecks. Therefore, metadata must be spread across the community. A naïve approach to metadata searching would be to distribute queries to all members, but this solution obviously does not scale. Hence, efficient metadata access and querying are very important challenges within the new decentralized settings.
Our proposal to address these challenges is a decentralized Peer-to-Peer datastore that will be used for managing XML-encoded metadata. It balances resource usage within the community, has high data availability (i.e., data are accessible even if their creator disappears from the system, e.g., due to a system fault, network partitioning, or going offline), is updateable (i.e., stored data can be modified during the system lifetime), and supports a powerful query language (e.g. XPath/XQuery).
XML documents are split into finer pieces that are spread within the community. The documents are created and modified by the community members, and can be accessed from any peer in a uniform way, i.e., a peer does not have to know anything about the data allocation. Uniform access and balanced storage usage are achieved by using a DHT (Distributed Hash Table) overlay [2] and having unique IDs for different document parts.
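To illustrate the idea of unique part IDs and uniform access, the following is a minimal Python sketch under stated assumptions. It is not the datastore described in [3]: the splitting granularity (one part per XML element), the path-based ID scheme, and the toy consistent-hashing `Ring` that stands in for the DHT overlay are all hypothetical choices made for illustration.

```python
import hashlib
import xml.etree.ElementTree as ET
from bisect import bisect_right


def part_id(doc_name: str, path: str) -> int:
    """Derive a stable ID for a document part from its document name and path."""
    digest = hashlib.sha1(f"{doc_name}#{path}".encode("utf-8")).hexdigest()
    return int(digest, 16)


def split_document(doc_name: str, xml_text: str) -> dict[int, str]:
    """Split an XML document into one serialized piece per element (assumed granularity)."""
    root = ET.fromstring(xml_text)
    parts: dict[int, str] = {}

    def walk(elem: ET.Element, path: str) -> None:
        # Store only the element's own tag, attributes and text; children
        # become separate parts addressed by their own path.
        shallow = ET.Element(elem.tag, elem.attrib)
        shallow.text = elem.text
        parts[part_id(doc_name, path)] = ET.tostring(shallow, encoding="unicode")
        for index, child in enumerate(elem):
            walk(child, f"{path}/{child.tag}[{index}]")

    walk(root, f"/{root.tag}")
    return parts


class Ring:
    """Toy consistent-hashing ring standing in for the DHT overlay."""

    def __init__(self, peer_names: list[str]) -> None:
        self._ring = sorted((part_id("peer", name), name) for name in peer_names)
        self._keys = [key for key, _ in self._ring]

    def responsible_peer(self, key: int) -> str:
        # The first peer clockwise from the key owns it; wrap around at the end.
        position = bisect_right(self._keys, key) % len(self._ring)
        return self._ring[position][1]
```

Because the responsible peer is computed from the part ID alone, any peer that knows the current membership can locate a part without knowing who created it, which is the sense of "uniform access" used above.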
Figure 1 shows the proposed storage architecture, where all layers exist on every peer. The datastore is accessed through the P2P-DOM component or by using the query engine, which can be supported by an optional index manager. A more detailed discussion of the proposed approach, challenges and open issues can be found in [3].
In the rest of this subsection, we give more details on how the proposed datastore could be used for managing service metadata, which are an additional type of DL metadata introduced by Service-oriented Architectures.


Service metadata describe service functionalities, interfaces and other properties. This meta-information is usually encoded using WSDL (Web Service Description Language [4]) and published to a UDDI (Universal Description, Discovery and Integration [5]) service directory. Service discovery queries are usually more complex than simple name matching, i.e., they contain qualified, range and/or boolean predicates.
In order to realize a decentralized service directory with advanced query mechanisms, the community of service providers will create and maintain a pool of service descriptions in the decentralized P2P datastore. Every service will be able to modify its description during its lifetime and to search for needed services. Query execution will be spread over many peers; the query originator will only get the final result back.
At the same time, due to uniform data access, new community members can start using the service directory immediately after joining the system without additional setup and administration. A member's decision to leave the community will not affect the rest of the system, because data are replicated. Even if network partitioning happens, the service directory would provide access to the service metadata available in the partition, allowing some parties to continue working without interruption.
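The kind of discovery query meant here can be sketched as follows. This is a simplified illustration, not the BRICKS query engine: service descriptions are modelled as plain dictionaries rather than WSDL documents, the field names (`name`, `category`, `max_response_ms`) are hypothetical, and the per-peer evaluation is simulated in a single process.

```python
from typing import Callable

Description = dict[str, object]
Predicate = Callable[[Description], bool]


def equals(field: str, value: object) -> Predicate:
    """Qualified predicate: the named field must equal the given value."""
    return lambda desc: desc.get(field) == value


def in_range(field: str, low: float, high: float) -> Predicate:
    """Range predicate: the named numeric field must lie in [low, high]."""
    def check(desc: Description) -> bool:
        value = desc.get(field)
        return isinstance(value, (int, float)) and low <= value <= high
    return check


def all_of(*predicates: Predicate) -> Predicate:
    """Boolean AND over predicates."""
    return lambda desc: all(p(desc) for p in predicates)


def any_of(*predicates: Predicate) -> Predicate:
    """Boolean OR over predicates."""
    return lambda desc: any(p(desc) for p in predicates)


def run_query(peers: dict[str, list[Description]], predicate: Predicate) -> list[Description]:
    """Evaluate the predicate on every peer's local share and merge the matches."""
    matches: list[Description] = []
    for local_share in peers.values():
        # In a real deployment this filter would run remotely on each peer,
        # and only the matches would travel back to the query originator.
        matches.extend(desc for desc in local_share if predicate(desc))
    return sorted(matches, key=lambda desc: str(desc.get("name")))
```

A query such as `all_of(equals("category", "indexing"), in_range("max_response_ms", 0, 500))` combines a qualified and a range predicate with a boolean connective, which simple name matching cannot express.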
For details about the use of the decentralized datastore in other scenarios see [6].
3.2 Decentralized Context-based Information Retrieval
DLs based on a decentralised architecture should not only provide access and retrieval functionality for the documents residing on the local peer, but should also consider other peers which might host documents relevant to a query. Such a scenario clearly requires appropriate search functionality to be defined. In the following, we will outline possible approaches for enhanced retrieval services.
Services. In order to be able to abstract from the underlying infrastructure, retrieval functionality should be implemented as a service with a predefined API and behaviour. This has the advantage that other peers are able to query the local repository, which is an important feature for enabling P2PIR. An example Web Service specification for search and retrieval is SRW⁴. It considers content-based retrieval functionality, but lacks context-based features as proposed above. When performing retrieval based on the annotation context (see below), such context information should be contained in the result set in order to elucidate why an item was retrieved. So a common API for queries, results and indexing requests has to be identified which is capable of taking advanced queries and context information into account.
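One possible shape for such an API is sketched below. It is an assumption-laden illustration, not SRW and not an interface defined by the projects discussed here: the type names (`Query`, `ContextHit`, `ResultItem`, `RetrievalService`) and the substring matching in `LocalRepository` are hypothetical. The point it shows is that each result item carries the annotation context that caused it to be retrieved.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass(frozen=True)
class Query:
    text: str
    use_annotations: bool = True  # whether the annotation context is searched


@dataclass(frozen=True)
class ContextHit:
    annotation_id: str
    excerpt: str


@dataclass
class ResultItem:
    doc_id: str
    matched_content: bool
    # Annotation hits explain why the item was retrieved.
    context: list[ContextHit] = field(default_factory=list)


class RetrievalService(Protocol):
    """Interface every peer would expose, local or remote (hypothetical)."""

    def search(self, query: Query) -> list[ResultItem]: ...


class LocalRepository:
    """Toy implementation using case-insensitive substring matching."""

    def __init__(self, documents: dict[str, str], annotations: dict[str, dict[str, str]]) -> None:
        self.documents = documents      # doc_id -> text
        self.annotations = annotations  # doc_id -> {annotation_id: text}

    def search(self, query: Query) -> list[ResultItem]:
        term = query.text.lower()
        results: list[ResultItem] = []
        for doc_id, text in self.documents.items():
            in_content = term in text.lower()
            hits: list[ContextHit] = []
            if query.use_annotations:
                for ann_id, ann_text in self.annotations.get(doc_id, {}).items():
                    if term in ann_text.lower():
                        hits.append(ContextHit(ann_id, ann_text))
            if in_content or hits:
                results.append(ResultItem(doc_id, in_content, hits))
        return results
```

Since `LocalRepository` satisfies the `RetrievalService` protocol structurally, a remote peer proxy with the same `search` method could be substituted without changing the calling code.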
Annotation Context. Annotations are a certain kind of metadata providing some information about the annotated document. They can contain content about content (e.g., interpretations, comments), other information like judgements, or references to other documents [7]. Annotations can be created either manually or automatically.
Manual annotations range from personal to shared to public ones. They can include personal notes, e.g., for comprehension, and whole discussions about documents [8,9]. Annotations are building blocks for collaboration. In a distributed, decentralized environment, especially shared and public annotations pose a challenge to the underlying services. Users can create shared and public annotations residing on their peers, but this data has to be spread to other peers as well.

⁴ http://www.loc.gov/z3950/agency/zing/srw/
By automatic annotations, we mean the automatic creation and maintenance of annotations consisting of links to and summaries of documents on other peers which are similar to documents residing on the local peer. Such annotations constitute a context in which documents on a peer are embedded. For each document, agents could be triggered to periodically update the information at hand, similar to the internal linking methods like similarity search, enrichment and query generation proposed in [10]. P2PIR methods can possibly be applied for this. The underlying assumption is that a user stores potentially interesting documents on her peer and is interested in similar publications. Automatic annotations can be created w.r.t. several aspects. For instance, topically similar documents can be sought. Another interesting kind of automatic annotation can be extracted from the surroundings of a citation. If a document residing on another peer cites a document on the local peer, the surroundings of this citation usually contain some comments about the cited document (similar to what is reported in [11]). Since only annotations to documents residing on the peer are created, storage costs can be kept low. Regular updates performed by agents keep the user informed.
Annotations, either manual or automatic ones, constitute a certain kind of document context. Annotation-based retrieval methods [8] can employ the annotation context without the need to actually access other peers. Since annotations, whether manually or automatically created, contain additional information about the document, we expect annotation-based retrieval functions to boost retrieval effectiveness. Future work will show if this assumption holds. Using annotations for information retrieval in a decentralized environment has the advantage that annotations are locally available, but reflect information lying on other peers. In this way, annotations create new access structures which help address problems arising when performing information retrieval on an underlying P2P infrastructure.
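As a rough illustration of how the annotation context can enter a ranking, the sketch below adds weighted term matches from a document's annotations to the matches in its own content. This is a deliberately simple stand-in, not the retrieval function of [7] or [8]; the weighting scheme and the `annotation_weight` parameter are assumptions.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def term_matches(query_terms: list[str], text: str) -> int:
    """Count how often the query terms occur in the text."""
    tokens = tokenize(text)
    return sum(tokens.count(term) for term in query_terms)


def score(query: str, content: str, annotations: list[str],
          annotation_weight: float = 0.5) -> float:
    """Combine content evidence with down-weighted annotation evidence."""
    terms = tokenize(query)
    content_evidence = term_matches(terms, content)
    context_evidence = sum(term_matches(terms, note) for note in annotations)
    return content_evidence + annotation_weight * context_evidence


def rank(query: str, corpus: dict[str, tuple[str, list[str]]],
         annotation_weight: float = 0.5) -> list[tuple[str, float]]:
    """Rank documents by score, breaking ties by document ID."""
    scored = [
        (doc_id, score(query, content, notes, annotation_weight))
        for doc_id, (content, notes) in corpus.items()
    ]
    return sorted(scored, key=lambda item: (-item[1], item[0]))
```

All inputs are local to the peer: the annotations may summarize documents held elsewhere, but scoring never contacts another peer.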
3.3 Cross-Service Personalization
Personalization approaches in DLs dynamically adapt the community-oriented service and content offerings of a DL to the preferences and requirements of individuals [12]. They enable more targeted information access by collecting information about users and by using these user models (also called user profiles) in information mediation.
Personalization typically comes as an integral part of a larger system. User profiles are collected based on a good knowledge about the meaning of user behavior, and personalization activities are tailored to the functionality of the respective system. Within a next-generation distributed DL environment, which is a dynamic federation of library services rather than a uniform system, there are at least two ways to introduce personalization. In the simple case, each service component takes care of its own personalization, independently collecting information about users. A more fruitful approach, however, is to achieve personalization across the boundaries of individual services, i.e., cross-system or, more precisely, cross-service personalization. In this case, personalization relies on a more comprehensive picture of the user collected from his interaction with different library services.


Cross-service Personalization Challenges. Cross-service personalization raises the following challenges: How to bring together the information about a user and his interactions collected by the different services in a comprehensive way and make up-to-date information about the user available? How to manage, update, and disseminate user models to make them accessible to the different services? How to support (at least partial) interpretation of the user model in a heterogeneous and dynamically changing DL service environment? This requires a shared underlying understanding of the user model. Furthermore, it raises issues of privacy and security, since personal data is moved around in a distributed system.
Approaches to Cross-Service Personalization. We identified two principal approaches which differ from each other in their architecture. A flexible and extensible user model that can capture various characteristics of the user and his/her context is at the core of both approaches. We call the operationalization of such a model a context passport [13] in what follows, implying that it accompanies the user and is "presented" to services to enable personalized support. The idea of the context passport is discussed in more detail after presenting the two approaches:
Adaptor approach: The adaptor approach relies on the ideas of wrapper architectures. A kind of wrapper is used to translate information access operations into personalized operations based on the information collected in the context passport. The advantage of this approach is that personalization can also be applied to services that themselves do not support personalization. The disadvantage is that every service will need its own wrapper. Unless there is a high degree of standardization in service interfaces, creating wrappers for every individual service may not be practical and does not scale well in dynamic service environments.
Connector approach: In contrast to the adaptor approach, the connector approach relies on the personalization capabilities of the individual services. It enables the bidirectional exchange of data collected about the user between the context passport and the personalization component of the respective service. The context passport is synchronized with individual user models/profiles maintained by services. The advantage here is that personalization of one service can benefit from the personalization efforts of another.
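The adaptor approach can be pictured as a thin wrapper around a search call that itself knows nothing about the user. The sketch below is hypothetical: the `SearchFn` signature and the re-ranking by interest keywords are illustrative assumptions, and a real adaptor would have to be written against each service's actual interface, which is exactly the scaling concern raised above.

```python
from typing import Callable

SearchFn = Callable[[str], list[str]]


def personalize(search: SearchFn, interests: list[str]) -> SearchFn:
    """Wrap a non-personalized search so results matching the interests come first."""
    lowered = [interest.lower() for interest in interests]

    def wrapped(query: str) -> list[str]:
        results = search(query)

        def matches(title: str) -> int:
            return sum(1 for interest in lowered if interest in title.lower())

        # sorted() is stable, so the service's own order is kept among ties.
        return sorted(results, key=lambda title: -matches(title))

    return wrapped
```

Here the interests would be read from the context passport, while the wrapped service remains unchanged and unaware of the personalization.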
The context passport [13] is positioned as a temporal memory for information about the user. It covers an extensible set of facets modeling different user model dimensions, including the cognitive pattern, task, relationship, and environment dimensions. The context passport acts as an aggregated, service-independent user profile, with services receiving personalization data from the context passport. Services also report to the context passport based on relevant user interaction, which adds up-to-date information to the user's context. The context passport is maintained by an active user agent which communicates with the services via a specific protocol.
A flexible protocol is required for this communication between the context passport and the service-specific personalization component. Such a protocol has to support the negotiation of the user model information to be exchanged and the bidirectional exchange of user information. As the services require different metadata about a user, there has to be a negotiation and an agreement between the service and the context passport about what information is required. In order to keep the context passport up-to-date, the services need to inform the context passport about new knowledge gained about the user. There is thus a requirement for bidirectional information exchange so that other services may benefit from up-to-date information about the user.
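The two protocol requirements, negotiation and bidirectional exchange, can be made concrete with a small sketch. It is not the protocol of [13]: the class layout, the method names (`negotiate`, `export`, `report`) and the notion of a `shareable` set of facets are assumptions introduced for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class ContextPassport:
    # facet name -> facet data, e.g. "cognitive_pattern", "task", "environment"
    facets: dict[str, dict[str, object]] = field(default_factory=dict)
    # facets the user agent is willing to disclose (hypothetical privacy control)
    shareable: set[str] = field(default_factory=set)

    def negotiate(self, requested: set[str]) -> set[str]:
        """Agree on the facets that are requested, present, and cleared for sharing."""
        return requested & self.shareable & set(self.facets)

    def export(self, agreed: set[str]) -> dict[str, dict[str, object]]:
        """Passport-to-service direction: hand out copies of the agreed facets."""
        return {name: dict(self.facets[name]) for name in agreed}

    def report(self, facet: str, observations: dict[str, object]) -> None:
        """Service-to-passport direction: merge new knowledge about the user."""
        self.facets.setdefault(facet, {}).update(observations)
```

A service would first call `negotiate` with the facets it can interpret, receive the agreed subset via `export`, and later call `report` so that other services see the updated context.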
4 Related Work
Metadata Management. Decentralized and peer-to-peer systems can be considered a further generalization of distributed systems. Therefore, decentralized data management has much in common with distributed databases, which are already well explored [14,15]. However, some important differences exist. Distributed databases are made to work in stable, well connected environments (e.g. LANs) with a global system overview, where every crashed node is eventually replaced by a new working one. Also, they need some sort of administration and maintenance.
P2P systems, in contrast, are deployed mostly on the highly unreliable Internet. Some links can be down, and network bandwidths are not guaranteed. P2P systems allow disconnection of any peer at any time, without a need for replacement, and none of the peers is aware of the complete system architecture. Therefore, the system must self-organize in order to survive such situations.
Many distributed databases like Teradata, Tandem's NonStop SQL, Informix Online XPS, Oracle Parallel Server and IBM DB2 Parallel Edition [16] are available on the market. The first successful distributed file system was the Network File System (NFS), succeeded by the Andrew File System (AFS), Coda, xFS, etc.
Current popular P2P file-sharing systems (e.g. KaZaA, Gnutella, eDonkey, Past [2]) might be a good starting point for enabling decentralized data management. However, these systems have some important drawbacks: file-level granularity and write-once access, i.e., files are non-updateable after storing. Storing a new version requires a new file name. Usually, a file contains many objects. As a consequence, retrieving a specific object would require getting the whole file first. If an object must be updated, then a whole new file version must be created and stored. In current systems it is not possible to search for a particular object inside the files. The query results contain whole files, not only the requested objects. Advanced search mechanisms like qualified, range or boolean predicate search are not supported. Metadata usually have a rich and complex structure, and queries on them are more than simple keyword matches. Also, metadata should be updateable. Thus, the presented P2P systems are not suitable for decentralized metadata management.
There are some attempts [17] to extend Gnutella protocols to support other types of queries. It would be quite possible to create a Gnutella implementation that understands some variant of SQL, XPath or XQuery. However, such networks would have problems with system load, scalability and data consistency, e.g., only locally stored data could be updated and mechanisms for updating other replicas do not exist.
Information Retrieval. Typical Peer-to-Peer information retrieval (P2PIR) methods work in a decentralized way, as proposed by the P2P paradigm [2]. No server is involved, as it would be in a hybrid or client-server architecture. Common P2PIR approaches let the requesting peer contact other peers in the network for the desired documents. In the worst case, the query is broadcast to the whole network, resulting in a lot of communication overhead. Another approach would be to store all index information on every peer and search for relevant documents locally. Peers would request the required information during the initial introduction phase, and updates would be spread from time to time. However, this approach is not feasible since the expected storage costs would be quite high. Intermediate approaches which try to balance communication and storage costs work with peer content representations like the clustering approach discussed in [18]. Such a peer content representation does not need the amount of data a snapshot of the whole distributed index would need, but conveys enough information to estimate the probability that documents relevant to the query can be found on a certain peer.
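The trade-off can be illustrated with a very small peer selection sketch. It does not reproduce the cluster-based summaries of [18]; instead it assumes a hypothetical summary per peer consisting of per-term document counts and the total number of documents, and it estimates the chance of finding a matching document under a naive term-independence assumption.

```python
Summary = tuple[dict[str, int], int]  # (documents containing each term, total documents)


def peer_score(query_terms: list[str], term_doc_counts: dict[str, int],
               doc_count: int) -> float:
    """Estimate the fraction of a peer's documents containing all query terms."""
    if doc_count == 0:
        return 0.0
    score = 1.0
    for term in query_terms:
        # Independence assumption: multiply the per-term document fractions.
        score *= term_doc_counts.get(term, 0) / doc_count
    return score


def select_peers(query_terms: list[str], summaries: dict[str, Summary],
                 k: int) -> list[str]:
    """Return up to k peers with a non-zero estimate, best first."""
    scored = [
        (peer_score(query_terms, counts, total), name)
        for name, (counts, total) in summaries.items()
    ]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in ranked if score > 0][:k]
```

The requesting peer then contacts only the selected peers instead of broadcasting the query, while storing only the compact summaries rather than the whole distributed index.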
Some annotation systems [19] provide simple full-text search mechanisms on annotations. The Yawas system [20] offers some means to use annotations for document search, e.g., by enabling users to search for a specific document type considering annotations. Golovchinsky et al. [21] use annotations as markings given by users who judge certain parts of a document as being important when emphasizing them. In their experiments, this approach achieved better results than classic relevance feedback. Agosti et al. [7] discuss facets of annotations and propose an annotation-based retrieval function based on probabilistic inference. The idea of automatic annotations is motivated by the internal linking methods described in [10] by Thiel et al.
Personalization Support. The most popular personalization approaches in digital libraries, or more generally in information and content management systems, are recommender systems and methods that can be summarized under the term personalized information access. Recommender systems (see e.g. [22]) give individual recommendations for information objects following an information push approach, whereas personalized information access (personalized newspapers, etc.) is realized as part of the information pull process, e.g., by filtering retrieval results or refining the queries themselves.
Personalization methods are based on modeling user characteristics, mainly cognitive patterns like user interests, skills and preferences [23]. More advanced user models also take into account user tasks [24], based on the assumption that the goals of users influence their needs. Such extended models are also referred to as user context models [25]. A flexible user context model that is able to capture an extensible set of user model facets, as required for cross-service personalization, can be found in [13]. Information for the user models (also called user profiles) is collected explicitly or implicitly [26], typically by tracking user behavior. These user profiles are used for personalized filtering in information dissemination (push) as well as in information access (pull) services. An important application area is personalized information retrieval. The information about the user is used for query rewriting [27], for the filtering of query results [28], as well as for a personalized ranking of query results [29].
5 Conclusions and Future Work
In this paper, we discussed opportunities and challenges for information access support resulting from the transition from more traditional, centrally controlled DL architectures to DLs as dynamic federations of content collections and DL services. The discussion focussed on metadata management, information retrieval, and personalization support. In addition to discussing the central challenges, an advanced approach has been discussed for each of the three aspects: For metadata management, a decentralized P2P datastore solves the problem of systematic and efficient decentralized metadata management. The application of annotations and annotation-based retrieval in the P2P context is considered as a way to improve information retrieval support in a decentralized environment. Finally, cross-service personalization is discussed as an adequate way to handle personalization in a dynamic service-oriented environment.
The list of information access issues discussed is not meant to be exhaustive. Further challenges arise within next-generation DL architectures, like effective metadata brokering and advanced methods for ensuring content security and quality. The envisaged support for information access needs to combine the approaches mentioned above in a balanced way to ensure that users will benefit from decentralized architectures while, at the same time, maintaining the high level of organization and reachability that users of DL systems are used to. Such issues are addressed in the BRICKS and DILIGENT projects, in which our institute is involved together with partners from other European countries.
References
1. BRICKS Consortium: BRICKS - Building Resources for Integrated Cultural Knowledge Services (IST 507457). (2004) http://www.brickscommunity.org/
2. Milojičić, D., Kalogeraki, V., Lukose, R., Nagaraja, K., Pruyne, J., Richard, B., Rollins, S., Xu, Z.: Peer-to-peer computing. Technical report (2002) http://www.hpl.hp.com/techreports/2002/HPL-2002-57.pdf
3. Knežević, P.: Towards a reliable peer-to-peer XML database. In Lindner, W., Perego, A., eds.: Proceedings ICDE/EDBT Joint PhD Workshop 2004, P.O. Box 1527, 71110 Heraklion, Crete, Greece, Crete University Press (2004) 41–50
4. W3C: Web Services Description Language (WSDL) 1.1. (2001) http://www.w3.org/TR/wsdl
5. OASIS: Universal Description, Discovery and Integration (UDDI). (2001) http://www.uddi.org/
6. Risse, T., Knežević, P.: Data storage requirements for the service oriented computing. In: SAINT 2003 - Workshop on Service Oriented Computing. (2003) 67–72
7. Agosti, M., Ferro, N., Frommholz, I., Thiel, U.: Annotations in digital libraries and collaboratories – facets, models and usage. In: Proc. 8th European Conference on Research and Advanced Technology for Digital Libraries (ECDL). (2004) To appear.
8. Frommholz, I., Brocks, H., Thiel, U., Neuhold, E., Iannone, L., Semeraro, G., Berardi, M., Ceci, M.: Document-centered collaboration for scholars in the humanities - the COLLATE system. [30] 434–445
9. Agosti, M., Ferro, N.: Annotations: Enriching a Digital Library. [30] 88–100
10. Thiel, U., Everts, A., Lutes, B., Nicolaides, M., Tzeras, K.: Convergent software technologies: The challenge of digital libraries. In: Proceedings of the 1st Conference on Digital Libraries: The Present and Future in Digital Libraries, Seoul, Korea (1998) 13–30
11. Attardi, G., Gullí, A., Sebastiani, F.: Automatic Web page categorization by link and context analysis. In Hutchison, C., Lanzarone, G., eds.: Proceedings of THAI-99, 1st European Symposium on Telematics, Hypermedia and Artificial Intelligence, (Varese, IT)


12. Neuhold, E.J., Niederée, C., Stewart, A.: Personalization in digital libraries: An extended view. In: Proceedings of ICADL 2003. (2003) 1–16
13. Niederée, C., Stewart, A., Mehta, B., Hemmje, M.: A multi-dimensional, unified user model for cross-system personalization. In: Proceedings of Advanced Visual Interfaces International Working Conference (AVI 2004) - Workshop on Environments for Personalized Information Access, Gallipoli (Lecce), Italy, May 2004. (2004)
14. Özsu, M.T., Valduriez, P.: Principles of Distributed Database Systems. Prentice Hall (1999)
15. Bernstein, P.A., Hadzilacos, V., Goodman, N.: Concurrency Control and Recovery in Database Systems. Addison-Wesley (1997)
16. Brunie, L., Kosch, H.: A communications-oriented methodology for load balancing in parallel relational query processing. In: Advances in Parallel Computing, ParCo Conferences, Gent, Belgium. (1995)
17. GPU: A gnutella processing unit (2004) http://gpu.sf.net
18. Müller, W., Henrich, A.: Fast retrieval of high-dimensional feature vectors in P2P networks using compact peer data summaries. In: Proceedings of the 5th ACM SIGMM international workshop on Multimedia information retrieval, ACM Press (2003) 79–86
19. Ovsiannikov, I.A., Arbib, M.A., McNeill, T.H.: Annotation technology. Int. J. Hum.-Comput. Stud. 50 (1999) 329–362
20. Denoue, L., Vignollet, L.: An annotation tool for web browsers and its applications to information retrieval. In: Proceedings of RIAO 2000, Paris, April 2000. (2000)
21. Golovchinsky, G., Price, M.N., Schilit, B.N.: From reading to retrieval: Freeform ink annotations as queries. In Gey, F., Hearst, M., Tong, R., eds.: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, New York, ACM Press (1999) 19–25
22. Bouthors, V., Dedieu, O.: Pharos, a collaborative infrastructure for web knowledge sharing. In Abiteboul, S., Vercoustre, A.M., eds.: Research and Advanced Technology for Digital Libraries, Proceedings of the Third European Conference, ECDL'99, Paris, France, September 1999. Volume LNCS 1696 of Lecture Notes in Computer Science, Springer-Verlag (1999) 215 ff.
23. McTear, M.: User modeling for adaptive computer systems: A survey of recent developments. In: Artificial Intelligence Review. Volume 7. (1993) 157–184
24. Kaplan, C., Fenwick, J., Chen, J.: Adaptive hypertext navigation based on user goals and context. In: User Modeling and User-Adapted Interaction 3. Kluwer Academic Publishers, The Netherlands (1993) 193–220
25. Goker, A., Myrhaug, H.: User context and personalization. In: Proceedings of the European Conference on Case Based Reasoning (ECCBR 2002) - Workshop on Personalized Case-Based Reasoning, Aberdeen, Scotland, 4-7 September 2002. Volume LNCS 2416 of Lecture Notes in Artificial Intelligence, Springer-Verlag (2002)
26. Pretschner, A., Gauch, S.: Personalization on the web. Technical Report ITTC-FY2000-TR-13591-01, Information and Telecommunication Technology Center (ITTC), The University of Kansas, Lawrence, KS (1999)
27. Gulla, J.A., van der Vos, B., Thiel, U.: An abductive, linguistic approach to model retrieval. Data & Knowledge Engineering 23 (1997) 17–31
28. Casasola, E.: Profusion personal assistant: An agent for personalized information filtering on the www. Master's thesis, The University of Kansas, Lawrence, KS (1998)
29. Meng, X., Chen, Z.: Personalize web search using information on client's side. In: Proceedings of the Fifth International Conference of Young Computer Scientists, August 17-20, 1999, Nanjing, P.R. China, International Academic Publishers (1999) 985–992
30. Koch, T., Sølvberg, I.T., eds.: Proc. 7th European Conference on Research and Advanced Technology for Digital Libraries (ECDL), Lecture Notes in Computer Science (LNCS) 2769, Springer, Heidelberg, Germany (2003)