William Y. Arms,
Professor of Computer Science, Cornell University
Title: "Humanities and Social Science Research Using Vast Amounts of Web Data"
Much research in the humanities and social sciences is built around information that has been extracted from document collections. Analyzing even a moderate-sized collection by hand can be a slow and tedious task. With vast collections of electronic documents, analysis has to be automatic.
This talk will describe a large-scale project to apply the techniques of supercomputing and artificial intelligence in such contexts. The corpus is the Internet Archive's Web collection of complete snapshots of the Web, captured every second month since 1996: more than 40 billion different pages. Methods of natural language processing and machine learning are being used to analyze this data. Perhaps the greatest challenge, however, is cultural. Computer scientists must understand the research methodologies of other disciplines in depth.
Biography
William Y. Arms is professor of computer science at Cornell University. He has degrees from Oxford University, the London School of Economics, and the University of Sussex. His career includes appointments at the British Open University, Dartmouth College, and Carnegie Mellon University. At Cornell, he was the first director of the Information Science program. He has more than thirty years' experience applying computing to academic activities, notably educational computing, computer networks, and digital libraries. His book "Digital Libraries" was published by MIT Press in 2000.