Recognizing Text Similarity

Ozlem Uzuner, Randall Davis & Boris Katz
Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
http://www.ai.mit.edu


The Problem: There are many circumstances in which it would be useful to determine that two documents contain similar text, including detecting plagiarism and copyright infringement, and filtering and organizing the documents a search engine returns as matches to a query. The vast amount of digital information available on the Web makes it necessary to address all of these issues. The ease of copying facilitates both plagiarism and copyright infringement, while the sheer volume of available information makes it harder to find the right information quickly.

Motivation: Automatic text similarity detectors can help identify plagiarism and copyright infringement, and thereby help reduce the abuse and misuse of electronic content. In addition, they can make information discovery more intuitive and less time-consuming.

Related Work in Text Similarity Recognition: Existing text similarity detection systems recognize verbatim similarities between documents but do not pay attention to similarity in expression. SCAM [4, 5], developed in the Stanford Digital Library project, looks for verbatim copies of text documents by fingerprinting documents and checking these fingerprints against a repository of previously known fingerprints. SCAM identifies partial similarity by looking for overlaps between verbatim text strings. We want to detect non-verbatim similarity by measuring similarity of expression. We are particularly interested in identifying documents that are paraphrases of each other and that express the same content in the same way.

Related Work in Rhetorical Structure Theory: The main idea of Rhetorical Structure Theory (RST) [2] is to model the discourse structure of a text with a hierarchical tree diagram that uses rhetorical relations such as sequence,
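The fingerprint-based verbatim matching attributed to SCAM in the related-work discussion can be illustrated with a small sketch. This is a generic shingling approach, not SCAM's published algorithm: each run of n consecutive words is hashed into an integer fingerprint, registered documents have their fingerprints stored in an index, and a new text is checked by counting how many of its fingerprints already appear there. The chunk size `n = 3`, the lowercase/word-character normalization, and the names `fingerprints` and `FingerprintRepository` are hypothetical choices made for this example.

```python
import hashlib
import re
from collections import defaultdict


def fingerprints(text: str, n: int = 3) -> set[int]:
    """Hash every run of n consecutive words (a "shingle") to an integer.

    Assumption: text is lowercased and reduced to word characters, so matching
    ignores case and punctuation but is still verbatim in word choice and order.
    """
    words = re.findall(r"\w+", text.lower())
    shingles = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    # A stable hash (first 8 bytes of MD5) instead of Python's salted hash(),
    # so fingerprints are reproducible across runs.
    return {int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big") for s in shingles}


class FingerprintRepository:
    """Hypothetical store of fingerprints for previously registered documents."""

    def __init__(self, n: int = 3) -> None:
        self.n = n
        self.index: dict[int, set[str]] = defaultdict(set)  # fingerprint -> doc ids
        self.sizes: dict[str, int] = {}  # doc id -> number of distinct fingerprints

    def register(self, doc_id: str, text: str) -> None:
        """Add a known document's fingerprints to the repository."""
        fps = fingerprints(text, self.n)
        self.sizes[doc_id] = len(fps)
        for fp in fps:
            self.index[fp].add(doc_id)

    def overlaps(self, text: str) -> dict[str, float]:
        """For each registered document sharing at least one fingerprint with
        text, return the fraction of the query's fingerprints it contains."""
        query = fingerprints(text, self.n)
        shared: dict[str, int] = defaultdict(int)
        for fp in query:
            for doc_id in self.index.get(fp, ()):
                shared[doc_id] += 1
        # If the query has no fingerprints, shared is empty and no division occurs.
        return {doc_id: count / len(query) for doc_id, count in shared.items()}


repo = FingerprintRepository(n=3)
repo.register("original", "The quick brown fox jumps over the lazy dog.")
repo.register("other", "An entirely different sentence about text similarity.")
# 4 of the query's 7 word trigrams also occur in "original"; none occur in "other".
print(repo.overlaps("Yesterday the quick brown fox jumps over a fence."))
```

Because every fingerprint is a hash of an exact word sequence, a paraphrase that changes word choice or word order yields few or no shared fingerprints. This is the limitation of verbatim matching that motivates measuring similarity of expression instead.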
