Banos, V.; Kim, Y.; Ross, S.; Manolopoulos, Y.
Languages: English
Types: Other
Subjects: Z665, QA76
Web archiving is crucial to ensure that cultural, scientific and social heritage on the web remains accessible and usable over time. A key aspect of the web archiving process is optimal data extraction from target websites. This procedure is difficult for reasons such as website complexity, the plethora of underlying technologies and, ultimately, the open-ended nature of the web. The purpose of this work is to establish the notion of Website Archivability (WA) and to introduce the Credible Live Evaluation of Archive Readiness (CLEAR) method to measure WA for any website. Website Archivability captures the core aspects of a website that are crucial in diagnosing whether it has the potential to be archived with completeness and accuracy. An appreciation of the archivability of a website should provide archivists with a valuable tool when assessing the possibilities of archiving material, and should influence web design professionals to consider the implications of their design decisions on the likelihood that a website can be archived. A prototype application, archiveready.com, has been established to demonstrate the viability of the proposed method for assessing Website Archivability.