<?xml version="1.0" encoding="ISO-8859-1"?><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id>1645-9911</journal-id>
<journal-title><![CDATA[Tékhne - Revista de Estudos Politécnicos]]></journal-title>
<abbrev-journal-title><![CDATA[Tékhne]]></abbrev-journal-title>
<issn>1645-9911</issn>
<publisher>
<publisher-name><![CDATA[Instituto Politécnico do Cávado e do Ave]]></publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id>S1645-99112008000100018</article-id>
<title-group>
<article-title xml:lang="en"><![CDATA[Satisfying Information Needs on the Web: a Survey of Web Information Retrieval]]></article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Escudeiro]]></surname>
<given-names><![CDATA[Nuno Filipe]]></given-names>
</name>
<xref ref-type="aff" rid="A01"/>
<xref ref-type="aff" rid="A03"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Jorge]]></surname>
<given-names><![CDATA[Alípio Mário]]></given-names>
</name>
<xref ref-type="aff" rid="A02"/>
<xref ref-type="aff" rid="A03"/>
</contrib>
</contrib-group>
<aff id="A01">
<institution><![CDATA[Instituto Superior de Engenharia do Porto, Departamento de Engenharia Informática]]></institution>
<addr-line><![CDATA[ ]]></addr-line>
</aff>
<aff id="A02">
<institution><![CDATA[Universidade do Porto, Faculdade de Economia]]></institution>
<addr-line><![CDATA[ ]]></addr-line>
</aff>
<aff id="A03">
<institution><![CDATA[INESC Porto Laboratório Associado, LIAAD - Laboratório de Inteligência Artificial e Apoio à Decisão; Laboratório de Inteligência Artificial e Análise de Dados]]></institution>
<addr-line><![CDATA[ ]]></addr-line>
</aff>
<pub-date pub-type="pub">
<day>00</day>
<month>06</month>
<year>2008</year>
</pub-date>
<pub-date pub-type="epub">
<day>00</day>
<month>06</month>
<year>2008</year>
</pub-date>
<numero>9</numero>
<fpage>337</fpage>
<lpage>369</lpage>
<copyright-statement/>
<copyright-year/>
<self-uri xlink:href="http://scielo.pt/scielo.php?script=sci_arttext&amp;pid=S1645-99112008000100018&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://scielo.pt/scielo.php?script=sci_abstract&amp;pid=S1645-99112008000100018&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://scielo.pt/scielo.php?script=sci_pdf&amp;pid=S1645-99112008000100018&amp;lng=en&amp;nrm=iso"></self-uri>
<abstract abstract-type="short" xml:lang="en"><p><![CDATA[Since early times, humankind has felt the need to keep records of its achievements that persist through time and can easily be retrieved for later reference. Our own evolution depends largely on this iterative process, in which each iteration builds on these records. The advent and success of the web greatly increased the availability of information, which rapidly became ubiquitous. However, the lack of editorial control gives rise to considerable heterogeneity in several respects. Traditional information retrieval techniques face new, challenging problems and prove insufficient to deal with the characteristics of the web. Web information retrieval is the natural extension of information retrieval to the web. In this paper we present a retrospective and, we hope, comprehensive overview of web information retrieval.]]></p></abstract>
<kwd-group>
<kwd lng="en"><![CDATA[Web information retrieval]]></kwd>
<kwd lng="en"><![CDATA[search engines]]></kwd>
</kwd-group>
</article-meta>
</front><body><![CDATA[ <p align="center"><b>Satisfying Information Needs on the Web: a Survey of Web Information Retrieval<sup>*</sup></b></p>     <p align="center">Nuno Filipe Escudeiro<sup>[<a href="#1">1</a>]<a name="top1"></a><a href="#3">•</a><a name="top3"></a></sup>, Alípio Mário Jorge<sup>[<a href="#2">2</a>]<a name="top2"></a><a href="#3">•</a><a name="top3"></a></sup></p>     <p align="center"><a href="mailto:nfe@isep.ipp.pt">nfe@isep.ipp.pt</a>, <a href="mailto:amjorge@fep.up.pt">amjorge@fep.up.pt</a></p>     <p align="center">(received 20 March 2008; accepted 22 April 2008)</p>      ]]></body>
<body><![CDATA[<p><b>Abstract.</b> Since early times, humankind has felt the need to keep records of its achievements that persist through time and can easily be retrieved for later reference. Our own evolution depends largely on this iterative process, in which each iteration builds on these records. The advent and success of the web greatly increased the availability of information, which rapidly became ubiquitous. However, the lack of editorial control gives rise to considerable heterogeneity in several respects. Traditional information retrieval techniques face new, challenging problems and prove insufficient to deal with the characteristics of the web. Web information retrieval is the natural extension of information retrieval to the web. In this paper we present a retrospective and, we hope, comprehensive overview of web information retrieval.</p>     <p><b>Keywords:</b> Web information retrieval, search engines.</p>     <p>Full text available only in PDF format.</p>     <p><b>References</b></p>      <!-- ref --><p>Aas, K., Eikvil, L. (1999), <i>Text Categorization: A Survey</i>, Norwegian Computing Center</p><!-- end-ref -->      <p>Aggarwal, C.C., Al-Garawi, F., Yu, P. (2001), <i>Intelligent crawling on the World Wide Web with arbitrary predicates</i>, Proceedings of the 10<sup>th</sup> World Wide Web Conference</p>      <p>Aggarwal, C.C. (2004), <i>On Leveraging User Access Patterns for Topic Specific Crawling</i>, Data Mining and Knowledge Discovery, 9, pp. 123-145, Kluwer Academic Publishers</p>      <p>Apostolico, A., Baeza-Yates, R., Melucci, M. (2006), <i>Advances in information retrieval: an introduction to the special issue</i>, Information Systems, 31(7), pp. 569-572, Elsevier</p>      <p>Arcot, H.G.A. (2004), <i>Perception-based fuzzy information retrieval</i>, San Jose State University, California, USA</p>      <p>Baeza-Yates, R. (2003), Information Retrieval in the Web: beyond current search engines, <i>International Journal of Approximate Reasoning</i>, 34, pp. 97-104, Elsevier</p>      <p>Baeza-Yates, R., Ribeiro-Neto, B. (1999), <i>Modern Information Retrieval</i>, ACM Press</p>      <p>Baldi, P., Frasconi, P., Smyth, P. (2003), <i>Modeling the Internet and the Web: Probabilistic Methods and Algorithms</i>, Wiley</p>      <p>Beitzel, S.M. (2006), <i>On understanding and classifying web queries</i>, PhD dissertation, Illinois Institute of Technology, Illinois, USA</p>      <p>Bennett, K.P., Demiriz, A. (1998), Semi-Supervised Support Vector Machines, <i>Proceedings of Neural Information Processing Systems</i></p>      <p>Berners-Lee, T. (1989), <i>Information Management: A Proposal</i>, CERN</p>      ]]></body>
<body><![CDATA[<p>Berners-Lee, T., Hendler, J., Lassila, O. (2001), The Semantic Web, <i>Scientific American</i></p>      <p>Blum, A., Mitchell, T. (1998), Combining labeled and unlabeled data with co-training, <i>Proceedings of the 11<sup>th</sup> Annual Conference on Computational Learning Theory</i>, pp. 92-100</p>      <p>Borges, J.L.C.M. (2000), <i>A Data Mining Model to Capture User Web Navigation Patterns</i>, PhD dissertation, University of London</p>      <p>Brin, S., Page, L. (1998), “The anatomy of a large-scale hypertextual web search engine”, <i>Proceedings of the 7<sup>th</sup> World Wide Web Conference</i>, pp. 107-117</p>      <p>Broder, A. (2002), A taxonomy of web search, <i>SIGIR Forum</i>, 36(2), pp. 3-10</p>      <p>Broder, A., Kumar, R., Maghoul, F., Raghavan, P., Rajagopalan, S., Stata, R., Tomkins, A., Wiener, J. (2000), <i>Graph structure in the web</i>, World Wide Web Conference, Amsterdam, The Netherlands</p>      <p>Broder, A., Maarek, Y., Bharat, K., Dumais, S., Papa, S., Pedersen, J., Raghavan, P. (2005), Current Trends in the Integration of Searching and Browsing, Special interest tracks and posters of the 14<sup>th</sup> World Wide Web Conference, Chiba, Japan, p. 793</p>      <p>Bruza, P., McArthur, R., Dennis, S. (2000), <i>Interactive Internet search: keyword, directory and query reformulation mechanisms compared</i>, Research and Development in Information Retrieval</p>      <p>Bush, V. (1945), <i>As We May Think</i>, The Atlantic Monthly, July</p>      <p>Carey, M., Kriwaczek, F., Ruger, S.M. (2000), A Visualization Interface for Document Searching and Browsing, <i>Proceedings of the NPIVM 2000</i></p>      ]]></body>
<body><![CDATA[<p>Chakrabarti, S. (2003), <i>Mining the Web: Discovering Knowledge from Hypertext Data</i>, Morgan Kaufmann Publishers</p>      <p>Chakrabarti, S., Dom, B., Agrawal, R., Raghavan, P. (1998a), Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies, <i>The VLDB Journal</i>, 7, pp. 163-178</p>      <p>Chakrabarti, S., Dom, B., Indyk, P. (1998b), Enhanced hypertext categorization using hyperlinks, <i>Proceedings of ACM SIGMOD International Conference on Management of Data</i>, pp. 307-318</p>      <p>Chakrabarti, S., Dom, B.E., Kumar, S.R., Raghavan, P., Rajagopalan, S., Tomkins, A., Gibson, D., Kleinberg, J. (1999a), Mining the Web's Link Structure, <i>IEEE Computer</i>, 32(8), pp. 60-67</p>      <p>Chakrabarti, S., van den Berg, M., Dom, B. (1999b), <i>Focused crawling: a new approach to topic-specific resource discovery</i>, Proceedings of the 8<sup>th</sup> World Wide Web Conference</p>      <p>Chewar, C.M., Krowne, A., O'Laughlen, M. (2001), <i>User Object Collections: Visualization Concepts by collection-Insight Need</i>, CITIDEL project</p>      <p>Cho, J., Garcia-Molina, H. (2000), <i>Estimating Frequency of Change</i>, Technical report, Stanford University</p>      <p>Cleverdon, C.W. (1962), <i>Comparative Efficiency of Indexing Systems</i>, Cranfield</p>      <p>Cleverdon, C.W. (1991), <i>The significance of the Cranfield tests on index languages</i>, Proceedings of the ACM SIGIR Conference, pp. 3-12</p>      <p>Cleverdon, C.W., Aitchison, J. (1963), <i>Test of the Index of Metallurgical Literature</i>, Cranfield</p>      ]]></body>
<body><![CDATA[<p>Cleverdon, C.W., Thorne, R.G. (1954), <i>An Experiment with the Uniterm System</i>, R.A.E. Cranfield, 7</p>      <p>Codd, E.F. (1970), A Relational Model of Data for Large Shared Data Banks, <i>Communications of the ACM</i>, 13(6), pp. 377-387</p>      <p>Cooley, R., Mobasher, B., Srivastava, J. (1997), Web Mining: Information and Pattern Discovery on the World Wide Web, <i>Proceedings of the 9<sup>th</sup> IEEE International Conference on Tools with Artificial Intelligence</i>, pp. 558-567</p>      <p>Cormack, G.V., Palmer, C.R., Clarke, C.L.A. (1998), <i>Efficient Construction of Large Test Collections</i>, Proceedings of the ACM SIGIR 1998 Conference</p>      <p>Crestani, F., Wu, S. (2006), Testing the cluster hypothesis in distributed information retrieval, <i>Information Processing and Management</i>, 42, pp. 1137-1150</p>      <p>Crestani, F., Ruthven, I. (2007), Introduction to special issue on contextual information retrieval systems, <i>Information Retrieval</i>, 10, pp. 111-113</p>      <p>Croft, W.B. (2003), <i>Information retrieval and computer science: an evolving relationship</i>, ACM SIGIR Conference, Toronto, Canada, pp. 2-3</p>      <p>Cugini, J., Piatko, C., Laskowski, S. (1996), Interactive 3D Visualization for Document Retrieval, <i>Proceedings of the ACM Conference on Information and Knowledge Management</i></p>      <p>Dao, T. (1998), <i>An Indexing Model for Structured Documents to Support Queries on Content, Structure and Attributes</i>, Proceedings of IEEE ADL Conference, Santa Barbara, California, USA</p>      <p>Dewey, M. (2004), <i>A Classification and Subject Index for Cataloguing and Arranging the Books and Pamphlets of a Library</i>, Project Gutenberg EBook</p>      ]]></body>
<body><![CDATA[<p>Domingos, P. (2007), What's missing in AI: The Interface Layer, University of Washington, Washington, USA</p>      <p>Domingos, P., Kok, S., Poon, H., Richardson, M., Singla, P. (2006), Unifying Logical and Statistical AI, The Twenty-First National Conference on Artificial Intelligence and the Eighteenth Innovative Applications of Artificial Intelligence Conference, Boston, Massachusetts, USA</p>      <p>Donato, D., Laura, L., Millozi, S. (2000), A beginner's guide to the Webgraph: Properties, Models and Algorithms, <i>Proceedings of the 41<sup>st</sup> FOCS</i>, pp. 57-65</p>      <p>Escudeiro, N., Jorge, A. (2006), Semi-automatic Creation and Maintenance of Web Resources with webTopic, <i>Semantics, Web and Mining</i>, LNCS, vol. 4289, pp. 82-102, Springer, Heidelberg</p>      <p>Glover, E.J., Flake, G.W., Lawrence, S., Birmingham, W.P., Kruger, A., Giles, C.L., Pennock, D.M. (2001), Improving Category Specific Web Search by Learning Query Modifications, <i>Symposium on Applications and the Internet, IEEE Computer Society</i>, pp. 23-31</p>      <p>Gulli, A., Signorini, A. (2005), <i>The Indexable Web is More than 11.5 billion pages</i>, WWW 2005, Chiba, Japan</p>      <p>Halkidi, M., Nguyen, B., Varlamis, I., Vazirgiannis, M. (2003), “Thesus: Organizing Web document collections based on link semantics”, <i>The VLDB Journal</i>, 12, pp. 320-332</p>      <p>Hammouda, K.M., Kamel, M.S. (2004), Efficient Phrase-Based Document Indexing for Web Document Clustering, <i>IEEE Transactions on Knowledge and Data Engineering</i>, 16(10), pp. 1279-1296</p>      <p>Haveliwala, T.H. (2005), <i>Context-sensitive Web search</i>, PhD dissertation, Stanford University, California, USA</p>      <p>Henzinger, M., Motwani, R., Silverstein, C. (2003), <i>Challenges in Web Search Engines</i>, 18<sup>th</sup> International Joint Conference on Artificial Intelligence</p>      ]]></body>
<body><![CDATA[<p>Hersovici, M., Jacovi, M., Maarek, Y.S., Pelleg, D., Shtalaim, M., Ur, S. (1998), The Shark-search algorithm. An application: tailored web site mapping, <i>Computer Networks</i>, 30(1-7), pp. 317-326</p>      <p>Hu, W. (2002), World Wide Web Search Technologies, <i>Architectural Issues of Web-Enabled Electronic Business</i>, edited by Shi Nansi, Idea Group Publishing</p>      <p>Ifrim, G., Theobald, M., Weikum, G. (2005), Learning Word-to-Concept Mappings for Automatic Text Classification, International Conference on Machine Learning</p>      <p>Jardine, N., Rijsbergen, C.J. (1971), The use of hierarchic clustering in information retrieval, <i>Information Storage and Retrieval</i>, 7(5), pp. 217-240</p>      <p>Joachims, T. (1998), <i>Text Categorization with Support Vector Machines: Learning with Many Relevant Features</i>, Research Report of the unit no. VIII(AI), Computer Science Department of the University of Dortmund</p>      <p>Kahle, B. (1997), Preserving the internet, <i>Scientific American</i>, 276(3), pp. 82-83</p>      <p>Kandogan, E. (2001), Visualizing Multi-dimensional Clusters, Trends, and Outliers using Star Coordinates, <i>Proceedings of the KDD Conference</i>, San Francisco, California, USA</p>      <p>Kleinberg, J. (1998), Authoritative sources in a hyperlinked environment, <i>Proceedings of the 9<sup>th</sup> ACM-SIAM Symposium on Discrete Algorithms</i>, pp. 668-677</p>      <p>Koller, D., Sahami, M. (1996), Toward Optimal Feature Selection, <i>Proceedings of the 13<sup>th</sup> International Conference on Machine Learning</i>, pp. 284-292, Morgan Kaufmann</p>      <p>Kosala, R., Blockeel, H. (2000), Web Mining Research: A Survey, <i>SIGKDD Explorations</i>, 2(1), pp. 1-13</p>      ]]></body>
<body><![CDATA[<p>Kumar, R., Raghavan, P., Rajagopalan, S., Sivakumar, D., Tomkins, A., Upfal, E. (2000), The Web as a graph, <i>Proceedings of the 19<sup>th</sup> ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems</i></p>      <p>Lafferty, J., McCallum, A., Pereira, F. (2001), <i>Conditional random fields: Probabilistic models for segmenting and labeling sequence data</i>, 18<sup>th</sup> International Conference on Machine Learning</p>      <p>Lawrence, S., Bollacker, K., Giles, C.L. (1999), Indexing and Retrieval of Scientific Literature, <i>Proceedings of the 8<sup>th</sup> International Conference on Information and Knowledge Management</i>, pp. 139-146</p>      <p>Lewandowski, D. (2005), Web searching, search engines and Information Retrieval, <i>Information Services and Use</i>, 25(3-4), pp. 137-147</p>      <p>Li, X., Liu, B. (2003), Learning to classify texts using positive and unlabeled data, <i>Proceedings of IJCAI 2003</i></p>      <p>Lim, L., Wang, M., Padmanabhan, S., Vitter, J.S., Agarwal, R. (2001), Characterizing Web Document Change, <i>Lecture Notes in Computer Science</i></p>      <p>Liu, R.L., Lin, W.J. (2005), Incremental mining of information interest for personalized web scanning, Information Systems, 30(8), pp. 630-648</p>      <p>Lu, S., Dong, M., Fotouhi, F. (2002), The semantic web: opportunities and challenges for next generation web applications, <i>Information Research</i>, 7(4)</p>      <p>Mitra, M., Singhal, A., Buckley, C. (1998), Improving automatic query expansion, <i>Proceedings of the 21<sup>st</sup> ACM SIGIR Conference</i></p>      <p>Nelson, T. (1965), <i>A file structure for the complex, the changing, and the indeterminate</i>, ACM National Conference, pp. 84-100</p>      ]]></body>
<body><![CDATA[<p>Nicola, C., Gaussier, E., Goutte, C., Renders, J.M. (2003), “Word-Sequence Kernels”, <i>Journal of Machine Learning Research</i>, 3, pp. 1053-1082</p>      <p>Nigam, K., McCallum, A.K., Thrun, S., Mitchell, T.M. (2000), Text classification from labeled and unlabeled documents using EM, <i>Machine Learning</i>, 39, pp. 103-134</p>      <p>Olsen, K.A., Korfhage, R.R., Sochats, K.M., Spring, M.B., Williams, J.G. (1992), Visualization of a Document Collection: The VIBE System, <i>Information Processing &amp; Management</i>, 29(1), pp. 69-81</p>      <p>Orengo, V., Huyck, C. (2001), A Stemming Algorithm for the Portuguese Language, <i>Proceedings of the 8<sup>th</sup> SPIRE</i></p>      <p>O'Reilly (2004), <i>Web 2.0</i></p>      <p>Porter, M.F. (1980), “An algorithm for suffix stripping”, <i>Program</i>, 14(3), pp. 130-137</p>      <p>Richardson, M., Domingos, P. (2004), Combining Link and Content Information in Web Search, University of Washington, Washington, USA</p>      <p>Rijsbergen, K. (1979), <i>Information Retrieval</i>, Butterworths</p>      <p>Sahami, M. (2004), <i>The happy searcher: Challenges in web information retrieval</i>, Pacific Rim International Conference on Artificial Intelligence, LNCS 3157, pp. 3-12</p>      <p>Salton, G., Lesk, M.E. (1965), The SMART automatic document retrieval systems: an illustration, <i>Communications of the ACM</i>, 8(6), pp. 391-398</p>      ]]></body>
<body><![CDATA[<p>Salton, G., McGill, M. (1983), <i>Introduction to Modern Information Retrieval</i>, McGraw-Hill</p>      <p>Salton, G., Wong, A., Yang, C.S. (1975), A vector space model for automatic indexing, <i>Communications of the ACM</i>, 18(11), pp. 613-620</p>      <p>Schwarzkopf, E. (2003), <i>Personalized Interaction with Semantic Information Portals</i>, German Research Center for Artificial Intelligence</p>      <p>Shen, G. (2005), Formal concepts and applications, PhD dissertation, Case Western Reserve University, Ohio, USA</p>      <p>Siddiqui, T.J. (2006), <i>Intelligent Techniques for Effective Information Retrieval (A Conceptual Graph Based Approach)</i>, ACM SIGIR Forum, 40(2)</p>      <p>Spangler, S., Kreulen, J.T., Lessler, J. (2003), <i>Generating and Browsing Multiple Taxonomies Over a Document Collection</i>, Journal of Management Information Systems, 19(4), pp. 191-212</p>      <p>Viji, S. (2002), <i>Term and Document Correlation and Visualization for a set of Documents</i>, Technical report, Stanford University</p>      <p>Voorhees, E.M. (1998), <i>Variations in Relevance Judgements and the Measurement of Retrieval Effectiveness</i>, Proceedings of the ACM SIGIR 1998 Conference</p>      <p>Wang, J., Lochovsky, F. (2003), “Web Search Engines”, ACM Computing Surveys (accepted for revision)</p>      <p>Wolff, K.E. (1993), <i>A First Course in Formal Concept Analysis</i>, Advances in Statistical Software, 4, pp. 429-438</p>      ]]></body>
<body><![CDATA[<p>Yang, Y. (1999), An Evaluation of Statistical Approaches to Text Categorization, <i>Information Retrieval</i>, 1(1-2), pp. 67-88</p>      <p>Yang, Y., Pedersen, J. (1997), “A Comparative Study of Feature Selection in Text Categorization”, <i>International Conference on Machine Learning</i></p>      <p>Yang, Y., Slattery, S., Ghani, R. (2002), <i>A Study of Approaches to Hypertext Categorization</i>, Kluwer Academic Publishers, pp. 1-25</p>      <p>Zakos, J., Verma, B. (2006), A Novel Context-based Technique for Web Information Retrieval, World Wide Web, 9(4), pp. 485-503</p>      <p>Zamir, O., Etzioni, O. (1999), Grouper: A Dynamic clustering Interface to Web Search Results, <i>Proceedings of the 1999 World Wide Web Conference</i></p>       <p><sup>*</sup>&nbsp;Supported by the POSC/EIA/58367/2004/Site-o-Matic Project (Fundação para a Ciência e a Tecnologia), FEDER and the Programa de Financiamento Plurianual de Unidades de I&amp;D (multi-year funding programme for R&amp;D units).</p>          <p><sup><a name="1"></a>[<a href="#top1">1</a>]</sup>&nbsp;DEI-ISEP, Departamento de Engenharia Informática, Instituto Superior de Engenharia do Porto; <a href="http://www.dei.isep.ipp.pt" target="_blank">http://www.dei.isep.ipp.pt</a></p>          <p><sup><a name="2"></a>[<a href="#top2">2</a>]</sup>&nbsp;FEP-UP, Faculdade de Economia, Universidade do Porto; <a href="http://www.fep.up.pt" target="_blank">http://www.fep.up.pt</a></p>      ]]></body>
<body><![CDATA[<p><a name="3"></a><a href="#top3">•</a> LIAAD, INESC Porto LA, Laboratório de Inteligência Artificial e Análise de Dados; <a href="http://www.liaad.up.pt" target="_blank">http://www.liaad.up.pt</a></p>      ]]></body><back>
<ref-list>
<ref id="B1">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Aas]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
<name>
<surname><![CDATA[Eikvil]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
</person-group>
<source><![CDATA[Text Categorization: A Survey]]></source>
<year>1999</year>
<publisher-name><![CDATA[Norwegian Computing Center]]></publisher-name>
</nlm-citation>
</ref>
</ref-list>
</back>
</article>
