Tékhne - Revista de Estudos Politécnicos
Print version ISSN 1645-9911
Tékhne no. 9, Barcelos, June 2008
Satisfying Information Needs on the Web: a Survey of Web Information Retrieval*
Nuno Filipe Escudeiro[1], Alípio Mário Jorge[2]
nfe@isep.ipp.pt, amjorge@fep.up.pt
(received 20 March 2008; accepted 22 April 2008)
Resumo. Desde muito cedo que a espécie Humana sentiu a necessidade de manter registos da sua actividade, para que possam ser facilmente consultados futuramente. A nossa própria evolução depende, em larga medida, deste processo iterativo em que cada iteração se baseia nestes registos. O aparecimento da web e o seu sucesso incrementaram significativamente a disponibilidade da informação que rapidamente se tornou ubíqua. No entanto, a ausência de controlo editorial origina uma grande heterogeneidade sob vários aspectos. As técnicas tradicionais em recuperação de informação provam ser insuficientes para este novo meio. A recuperação de informação na web é a evolução natural da área de recuperação de informação para o meio web. Neste artigo apresentamos uma análise retrospectiva e, esperamos, abrangente desta área do conhecimento Humano.
Palavras-chave: Recuperação de informação na web, motores de pesquisa.
Abstract. Since early ages, humankind has felt the need to keep records of its achievements that persist through time and can be easily retrieved for later reference. Our own evolution depends largely on this iterative process, in which each iteration builds on these records. The advent of the web and its attractiveness greatly increased the availability of information, which rapidly became ubiquitous. However, the lack of editorial control gives rise to high heterogeneity in several respects. Traditional information retrieval techniques face new, challenging problems and prove insufficient to deal with the characteristics of the web. In this paper we present a comprehensive and retrospective overview of web information retrieval.
Keywords: Web information retrieval, search engines.
Full text only available in PDF format.
* Supported by the POSC/EIA/58367/2004/Site-o-Matic Project (Fundação Ciência e Tecnologia), FEDER, and the Programa de Financiamento Plurianual de Unidades de I&D.
[1] DEI-ISEP, Departamento de Engenharia Informática, Instituto Superior de Engenharia do Porto; http://www.dei.isep.ipp.pt
[2] FEP-UP, Faculdade de Economia, Universidade do Porto; http://www.fep.up.pt
LIAAD, INESC Porto LA, Laboratório de Inteligência Artificial e Análise de Dados; http://www.liaad.up.pt