68 Hits in 2.8 sec

The SHARC framework for data quality in Web archiving

Dimitar Denev, Arturas Mazeika, Marc Spaniol, Gerhard Weikum
2011 The VLDB journal  
This paper presents the SHARC framework for assessing the data quality in Web archives and for tuning capturing strategies toward better quality with given resources.  ...  Data quality is crucial for these purposes.  ...  Acknowledgments: This work is supported by the 7th Framework IST programme of the European Union through the Living Web Archives (LiWA) project.  ... 
doi:10.1007/s00778-011-0219-9 fatcat:dr5dqzb455glthgspensughgbi

SHARC

Dimitar Denev, Arturas Mazeika, Marc Spaniol, Gerhard Weikum
2009 Proceedings of the VLDB Endowment  
This paper presents the SHARC framework for assessing the data quality in Web archives and for tuning capturing strategies towards better quality with given resources.  ...  Data quality is crucial for these purposes.  ...  Acknowledgements: This work is supported by the 7th Framework IST programme of the European Union through the small or medium-scale focused research project (STREP) on Living Web Archives (LiWA).  ... 
doi:10.14778/1687627.1687694 fatcat:ro5dt5eemnanlbusmjyzaxqp7e

The SHARC framework

Trien V. Do, Keith Cheverst
2015 Proceedings of the 7th ACM SIGCHI Symposium on Engineering Interactive Computing Systems - EICS '15  
In this paper we present the design of the SHARC framework and in particular focus on the utilization of personal Dropbox accounts to provide a scalable solution to the storage and sharing of community  ...  The emergence of personal cloud storage services provides a new paradigm for storing and sharing data.  ...  The cloud storage solution is suitable for the SHARC framework because all data and files are submitted in order to be shared with others.  ... 
doi:10.1145/2774225.2774841 dblp:conf/eics/DoC15 fatcat:soijyphmsjf3xdjnbtj5p4k7fu

Quality Matters: A New Approach for Detecting Quality Problems in Web Archives

Brenda Reyes Ayala, Jennifer McDevitt, James Sun, Xiaohui Liu
2020 Proceedings of the Annual Conference of CAIS / Actes du congrès annuel de l'ACSI  
Since the practice of web archiving, or the act of preserving websites as historical, legal, and informational records, became more commonplace in the 2000s, web archives have become valuable sources for  ...  If applied to the Quality Assurance process of an institution, this similarity metric could help web archivists quickly detect quality problems in their web archives, and fix them in order to create high-quality  ...  In their paper, Denev, Mazeika, Spaniol, and Weikum (2011) introduced the Sharp Archiving of Website Captures (SHARC) framework for data quality in web archiving. examined the importance of missing elements  ... 
doi:10.29173/cais1145 fatcat:toyxoh4ydrfnzkt54pisutwm3m

Enhanced memento's aggregator framework to browse the past web

Ahmed Alsum
2012 Bulletin of IEEE Technical Committee on Digital Libraries  
Browsing the past Web in an easy, complete, and consistent way has become an essential need in recent years.  ...  In this research, we propose the "Enhanced Memento Aggregator" framework which is capable of collecting, filtering, ranking the archived copies in a distributed environment.  ...  Nelson for his efforts and guidance in this proposal. Also, I would like to thank Prof.  ... 
dblp:journals/tcdl/Alsum12 fatcat:tps5qgijfbfejhm576ybprj2by

A Grounded Theory of Information Quality in Web Archives

Brenda Reyes Ayala
2018 Bulletin of IEEE Technical Committee on Digital Libraries  
I would also like to thank Lori Donovan and Jefferson Bailey of the Internet Archive.  ...  Jiangping Chen and the members of my committee, Dr. Oksana Zavalina, Dr. Cornelia Caragea, Dr. Shawne Miksa, and Dr. Kathryn Masten-Cain.  ...  In a later paper, Denev, Mazeika, Spaniol, and Weikum [10] introduced the SHARC framework for data quality in web archiving.  ... 
dblp:journals/tcdl/Ayala18 fatcat:ulnx463e75cz3g2xat4czcryte

Improving the Quality of Web Archives through the Importance of Changes [chapter]

Myriam Ben Saad, Stéphane Gançarski
2011 Lecture Notes in Computer Science  
A major issue encountered by archivists is to preserve the quality of web archives.  ...  Due to the growing importance of the Web, several archiving institutes (national libraries, Internet Archive, etc.) are harvesting sites to preserve (a part of) the Web for future generations.  ...  Hence, archive systems will avoid wasting time and space for indexing/storing unimportant page versions.  ... 
doi:10.1007/978-3-642-23088-2_29 fatcat:yquufdecqrfxjasyy4pramrozq

Archiving the web using page changes patterns

Myriam Ben Saad, Stéphane Gançarski
2011 Proceeding of the 11th annual international ACM/IEEE joint conference on Digital libraries - JCDL '11  
Web archiving is the process of continuously collecting and preserving portions of the World Wide Web for future generations.  ...  However, to the best of our knowledge, patterns have never been used in the context of web archiving.  ...  In other studies [15], a framework (SHARC) was implemented to maximize the sharpness of web archives.  ... 
doi:10.1145/1998076.1998098 dblp:conf/jcdl/SaadG11 fatcat:mspoptzdujhkxkcd26eo6wx6cq

Archiving the web using page changes patterns: a case study

Myriam Ben Saad, Stéphane Gançarski
2012 International Journal on Digital Libraries  
Web archiving is the process of continuously collecting and preserving portions of the World Wide Web for future generations.  ...  However, to the best of our knowledge, patterns have never been used in the context of web archiving.  ...  In other studies [15], a framework (SHARC) was implemented to maximize the sharpness of web archives.  ... 
doi:10.1007/s00799-012-0094-z fatcat:xp57rz32prgc5iu7mstknq6dvm

Coherence-Oriented Crawling and Navigation Using Patterns for Web Archives [chapter]

Myriam Ben Saad, Zeynep Pehlivan, Stéphane Gançarski
2011 Lecture Notes in Computer Science  
We point out, in this paper, the issue of improving the coherence of web archives under limited resources (e.g. bandwidth, storage space, etc.).  ...  Coherence measures how much a collection of archived page versions reflects the real state (or the snapshot) of a set of related web pages at different points in time.  ...  In another study, they define two quality measures (blur and sharp) and propose a framework, coined SHARC, to optimize page captures.  ... 
doi:10.1007/978-3-642-24469-8_42 fatcat:5upkbj4thrfhjih2hyesgyvvpy

FAIRness Literacy: The Achilles' Heel of Applying FAIR Principles

Romain David, Laurence Mabile, Alison Specht, Sarah Stryeck, Mogens Thomsen, Mohamed Yahia, Clement Jonquet, Laurent Dollé, Daniel Jacob, Daniele Bailo, Elena Bravo, Sophie Gachet (+6 others)
2020 Data Science Journal  
The SHARC Interest Group of the Research Data Alliance was established to improve research crediting and rewarding mechanisms for scientists who wish to organise their data (and material resources) for  ...  This requires that data are findable and accessible on the Web, and comply with shared standards making them interoperable and reusable in alignment with the FAIR principles.  ...  Acknowledgements: This work was partly supported by the Research Data Alliance, especially the "RDA Europe 4.0" project (H2020 grant Nº777388), the EPPN2020 project (H2020 grant Nº731013), the 'Infrastructure  ... 
doi:10.5334/dsj-2020-032 fatcat:atsasrisljfyxotvjxnpjj6fj4

Browsing and Recomposition Policies to Minimize Temporal Error When Utilizing Web Archives

Scott G. Ainsworth
2013 Bulletin of IEEE Technical Committee on Digital Libraries  
Public web archiving on a large scale began in the late 1990s with archives such as Australia's Pandora and the Internet Archive.  ...  Thus the archives are incomplete, which leads to temporal discrepancies when browsing the archives and recomposing web pages. When browsing, the user-selected target datetime drifts without notice.  ...  We are grateful to the Internet Archive for their continued support of Memento access to their archive.  ... 
dblp:journals/tcdl/Ainsworth13 fatcat:55txdc2s45gerkp44ynfqhod3i

Evaluating sliding and sticky target policies by measuring temporal drift in acyclic walks through a web archive

Scott G. Ainsworth, Michael L. Nelson
2014 International Journal on Digital Libraries  
This examination of web archive content is concerned with the latter. For web archives, most quality issues stem from the difficulties inherent in obtaining content using HTTP [16].  ...  Spaniol's work, while presenting an a posteriori measure, concerns the quality of entire crawls. Denev et al. present the SHARC framework [10], which introduces a stochastic notion of sharpness.  ... 
doi:10.1007/s00799-014-0120-4 fatcat:pm5a62rbazbmho56cilunr2stq

Only One Out of Five Archived Web Pages Existed as Presented

Scott G. Ainsworth, Michael L. Nelson, Herbert Van de Sompel
2015 Proceedings of the 26th ACM Conference on Hypertext & Social Media - HT '15  
When a user retrieves a page from a web archive, the page is marked with the acquisition datetime of the root resource, which effectively asserts "this is how the page looked at that datetime."  ...  The completeness and temporal coherence achieved using a single archive was compared to the results achieved using multiple archives.  ...  We are grateful to the Internet Archive for their continued support of Memento access to their archive.  ... 
doi:10.1145/2700171.2791044 dblp:conf/ht/AinsworthNS15 fatcat:4awso4agprbghlfoi7ueh7b5jm

Evaluating sliding and sticky target policies by measuring temporal drift in acyclic walks through a web archive

Scott G. Ainsworth, Michael L. Nelson
2013 Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries - JCDL '13  
From this display, the web archive UI attempts to simulate the web browsing experience by smoothly transitioning between archived pages.  ...  The Sliding Target policy allows the target datetime to change as it does in archive UIs such as the Internet Archive's Wayback Machine.  ...  This examination of web archive content is concerned with the latter. For web archives, most quality issues stem from the difficulties inherent in obtaining content using HTTP [16].  ... 
doi:10.1145/2467696.2467718 dblp:conf/jcdl/AinsworthN13 fatcat:cue4dkpghrcplmaldhkzco2gmi
Showing results 1 to 15 out of 68