OFICINA: Escrita de Artigos (Workshop: Writing Papers) - http://www.ic.uff.br/~vanessa/papers/wtdbd-2014.pdf
WTDBD 2014 Vanessa Braganholo 2
http://www.ic.uff.br/~vanessa
Agenda • The Importance of Publishing • The Publication Workflow: Conference vs. Journal • The Response Letter to Reviewers • Writing Papers • Final Remarks
Why write papers? 1) To disseminate knowledge
Science advances through research.
Research is disseminated through publications.
What good is it...
...to discover the cure for AIDS if nobody ever hears about it?
Why write papers? 2) Money
Without money,
there is no research.
Money for what? • Scholarships • Professors • Printer paper • Lab equipment • Room cleaning • Soap for the restrooms • Library books • etc.
[Diagram] Funding distribution: the Federal Government, through Capes and CNPq, distributes funding to public and private universities U1, U2, U3, U4, ..., Un.
Funding distribution
• It is not uniform... – each graduate program receives funds according to its grade in the Capes evaluation.
The Capes evaluation • Assesses every graduate program in the country • Carried out every 3 years • Grades range from 1 to 7
Evaluation criteria • Defined in a document called the Area Document (http://qualis.capes.gov.br/webqualis/, menu "documentos de área")
• The Computer Science Area Document defines how the grades for Computer Science programs are computed
• A considerable share (35%) of the grade comes from the graduate program's intellectual production (PUBLICATIONS!!)
Quality of publication venues
• The Capes Computer Science committee measures quality through several criteria – Conferences: h-index computed with the help of Google Scholar
– Journals: h-index computed with the help of Google Scholar (HG) and the JCR
What is the h-index? • h-index = N if the N most-cited papers each have at least N citations – It applies to journals, conferences, and researchers
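The definition above can be sketched in a few lines of code (an illustrative sketch, not from the original slides; the function name `h_index` is ours):

```python
def h_index(citations):
    """Return the h-index of a list of per-paper citation counts:
    the largest N such that the N most-cited papers each have
    at least N citations."""
    ranked = sorted(citations, reverse=True)  # most-cited first
    h = 0
    for n, c in enumerate(ranked, start=1):
        if c >= n:
            h = n  # the n-th most-cited paper still has >= n citations
        else:
            break
    return h

# A venue whose five most-cited papers have these citation counts has
# h-index 4: the 4th paper has 4 citations, but the 5th has only 3.
print(h_index([10, 8, 5, 4, 3]))
```

Sorting in decreasing order turns the check "the N most-cited papers have at least N citations" into a single pass.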
http://shine.icomp.ufam.edu.br/
http://scholar.google.com.br/
Quality of publication venues
Conferences and journals:
• receive a grade among A1, A2, B1, B2, B3, B4, B5, and C
• Computer Science is the only area that has a Qualis ranking for conferences
http://qualis.capes.gov.br/webqualis/
Qualis • Should be used to evaluate programs, not people
• Always consult your advisor about where to publish – they know the area and which venues are the best
But... • How do you find out which conferences currently have open submission deadlines?
• Calls for Papers list – the event's topics of interest – the required paper format – submission deadlines – the program committee's composition – etc.
Mailing lists • SBC list
– https://grupos.ufrgs.br/mailman/listinfo/Sbc-l/
Mailing lists • International, area-specific lists
– DBWORLD (Databases) • http://www.cs.wisc.edu/dbworld/
– SEWORLD (Software Engineering) • http://serl.cs.colorado.edu/~serl/seworld/
– ...
Agenda • The Importance of Publishing • The Publication Workflow: Conference vs. Journal • The Response Letter to Reviewers • Writing Papers • Final Remarks
Conference
[Diagram built up across slides] Actors: author, chair, program committee (the reviewers).
1. Submission: author → chair, BEFORE THE DEADLINE.
2. Distribution: chair → 3 reviewers.
3. 3 reviews: reviewers → chair.
4. Result: chair → author: ACCEPTED, REJECTED, or REBUTTAL*.
Journal
[Diagram built up across slides] Actors: author, editor, board of reviewers.
1. Submission: author → editor, AT ANY TIME (continuous flow).
2. Distribution: editor → 3 reviewers.
3. 3 reviews: reviewers → editor.
4. Result: editor → author: ACCEPTED, REJECTED, MAJOR REVIEW, or MINOR REVIEW.
Journal (continued)
5. Revised submission + response letter: author → editor, WITHIN THE DEADLINE SET BY THE EDITOR.
The cycle then repeats: Submission → Review → Response → Revision.
Agenda • The Importance of Publishing • The Publication Workflow: Conference vs. Journal • The Response Letter to Reviewers • Writing Papers • Final Remarks
Goal • Respond to the reviewers' comments
– Emphasize what was done to address each comment
GOAL: keep the reviewer from having to re-read the entire paper
Response to Reviewers
Dear editors and reviewers, we would like to thank you for your valuable suggestions. We have taken them into account to improve the paper in several ways. In the following, we explain the actions we took to address each of your comments.
An opening paragraph thanking the reviewers
– For better readability and clarity, the organization of the paper should be changed to break long, wordy sections into shorter subsections.
We have followed your suggestion and divided Section 3, which is now Section 4, into several subparagraphs that act as subsections. Section 4.1 now has two subparagraphs, and Section 4.2 has four subparagraphs.
Repeat what the reviewer said. Explain what was done to address each comment.
– The paper compares fragmentation in relational databases and XML databases in various parts of the paper. I have found this a bit confusing. Is it really necessary? The paper could instead focus directly on XML fragmentation. The work in relational data fragmentation is probably more mature and well established. So, maybe the contrast between the two could be brought up when the authors are discussing the open research challenges in XML fragmentation.
Because it is well established, we believe that keeping the analogy with the relational model is beneficial. We are not comparing but rather using it as a reference model. It helps the readers to establish a parallel between the concepts in the two models (from relational to XML). Because of this, we kept Section 2 as an informal grasp of XML fragmentation, and then in Section 3 we introduce formal definitions for XML fragmentation.
Disagree when you find it appropriate, but explain your reasons.
In section 5 (discussion), a timeline shows the appearance of the different approaches. However, I think that what is important here is to understand the evolution of the approaches by analyzing how each approach has improved the previous one. Therefore, instead of just mentioning the works one could mention the involved features within the timeline. I also understand that what I am suggesting is not easy to implement in a latex table.
In fact, this is a great suggestion; we have followed it and we really think it improved the paper a lot. We have changed the figure to include lanes that group approaches that follow the same principles. Besides, we added arrows from derived approaches to the approach that originated them. Features are discussed in Tables 1 and 2.
Praise the good suggestions.
Response to Reviewers
The authors would like to thank the reviewers for their suggestions, which we have addressed as follows.
REVIEWER 1
C1: Summary of the article
C2: Justified in the introduction
C3: Explained in Section 3
C4: Future work
A grumpy reviewer in 3, 2, 1...
Agenda • The Importance of Publishing • The Publication Workflow: Conference vs. Journal • The Response Letter to Reviewers • Writing Papers • Final Remarks
Format • Defined by the venue the paper will be submitted to
• See the rules in the CALL FOR PAPERS
For the 2014 edition, SBBD will accept two kinds of submissions:
JIDM articles. Submissions in this category should present interesting results or novel thought-provoking ideas in any of the topics of interest. They must be written in English and not exceed 12 pages, according to the format of the Journal of Information and Data Management (JIDM). The templates for submission are available at http://bit.ly/1eil5co. Articles must be submitted by the deadline through EasyChair at https://www.easychair.org/conferences/?conf=jidmvol5
SBBD proceedings. Submissions in this category should present initial results or on-going work in any of the topics of interest. They must be written in Portuguese or English and not exceed 10 pages, according to the format of the SBC. The templates for submission are available at http://bit.ly/1coeoWk. Papers must be submitted by the deadline through the EasyChair website at https://www.easychair.org/conferences/?conf=sbbd2014
A matter of style... • Conferences and journals provide ready-made styles for writing the paper
• Don't try to do everything from scratch • Download and use the template!
[Templates: SBC, ACM, IEEE, LNCS]
Essential (!!!)
Steps for writing a paper
[Diagram built up across slides] Within a CONTEXT, the authors understand and describe a PROBLEM; the PROBLEM leads to OBJECTIVES; the OBJECTIVES define AN APPROACH (the solution), which provides RESULTS to obtain the solution of the PROBLEM.
Where to sell the fish? (i.e., where to pitch your work)
Title, Abstract, Introduction, Conclusion
Title • The shortest summary of a paper (by Leonardo Murta) • It MUST sell the work
BIRCH: An Efficient Data Clustering Method for Very Large Databases
Of Snowstorms and Bushy Trees
Rafi Ahmed, Rajkumar Sen, Meikel Poess, Sunil Chakkappen. Of Snowstorms and Bushy Trees. PVLDB 7(13), 2014: 1452-1461
Zhang, Ramakrishnan, Livny. BIRCH: An Efficient Data Clustering Method for Very Large Databases. SIGMOD 1996: 103-114
Where to sell the fish?
Title, Abstract, Introduction, Conclusion
Abstract
• A single paragraph (50 to 200 words) • Purpose: tell readers whether it is worth reading the
rest of the paper • Stick to the essentials
Zobel, 2001
The abstract should NOT contain...
• Mathematical formulas • A description of the paper's organization • Acronyms • Abbreviations • References (only in very specific cases)
Zobel, 2001
Abstract
[The same diagram applies: CONTEXT, PROBLEM, OBJECTIVES, AN APPROACH (the solution), RESULTS.]
THE PERFECT ABSTRACT = one or two sentences for each of these items
Deliver the gold...
Prefer: "The required memory space can be reduced by 60%"
over: "The required memory space can be reduced significantly"
Zobel, 2001
Deliver the gold...
Prefer: "We define a new inversion algorithm based on move-to-front lists"
over: "We define a new inversion algorithm"
Zobel, 2001
Many applications rely on Web data and extraction systems to accomplish knowledge-driven tasks. Web information is not curated, so many sources provide inaccurate, or conflicting information. Moreover, extraction systems introduce additional noise to the data. We wish to automatically distinguish correct data and erroneous data for creating a cleaner set of integrated data. Previous work has shown that a naïve voting that trusts data provided by the majority or at least a certain number of sources may not work well in the presence of copying between the sources. However, correlation between sources can be much broader than copying: sources may provide data from complementary domains (negative correlation), extractors may focus on different types of information (negative correlation), and extractors may apply common rules in extraction (positive correlation, without copying). In this paper we present novel techniques modeling correlations between sources and applying it in truth finding. We provide a comprehensive evaluation of our approach on three real-world datasets with different characteristics, as well as on synthetic data, showing that our algorithms outperform the existing state-of-the-art techniques.
Pochampally, Das Sarma, Luna Dong, Meliou, Srivastava. Fusing data with correlations. SIGMOD 2014: 433-444
Finding useful patterns in large datasets has attracted considerable interest recently, and one of the most widely studied problems in this area is the identification of clusters, or densely populated regions, in a multi-dimension dataset. Prior work does not adequately address the problem of large datasets and minimization of I/O costs. This paper presents a data clustering method named BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies), and demonstrates that it is especially suitable for very large databases. BIRCH incrementally and dynamically clusters incoming multi-dimensional metric data points to try to produce the best quality clustering with the available resources (i.e., available memory and time constraints). BIRCH can typically find a good clustering with a single scan of the data, and improve the quality further with a few additional scans. BIRCH is also the first clustering algorithm proposed in the database area to handle "noise" (data points that are not part of the underlying pattern) effectively. We evaluate BIRCH's time/space efficiency, data input order sensitivity, and clustering quality through several experiments. We also present a performance comparison of BIRCH versus CLARANS, a clustering method proposed recently for large datasets, and show that BIRCH is consistently superior.
Zhang, Ramakrishnan, Livny. BIRCH: An Efficient Data Clustering Method for Very Large Databases. SIGMOD 1996: 103-114
Where to sell the fish?
Title, Abstract, Introduction, Conclusion
Introduction • Same structure as the abstract • Turn each part of the abstract into one or more paragraphs
• Include a final paragraph describing the structure of the paper
The introduction must answer • What problem does your work aim to solve?
• Why don't existing works solve this problem?
• How does your work tackle the problem? • What experimental results did your work obtain?
• How is your work organized?
Source: Leonardo Murta
Analyzing paper introductions • Pochampally, Das Sarma, Luna Dong, Meliou, Srivastava. Fusing Data with Correlations. SIGMOD 2014: 433-444
• Zhang, Ramakrishnan, Livny. BIRCH: An Efficient Data Clustering Method for Very Large Databases. SIGMOD 1996: 103-114
Organization of the paper • Now is the time to plan the rest of the writing!
Structure 1. Introduction 2. Background (*)
3. Related Work 4. Approach
5. Evaluation (*)
6. Conclusion Acknowledgments (*)
References
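The structure above maps directly onto a LaTeX skeleton. A minimal sketch only, assuming a generic `article` class; as these slides stress, a real submission should use the venue's own template (SBC, ACM, IEEE, LNCS):

```latex
\documentclass{article} % replace with the venue's document class
\begin{document}
\title{Paper Title}
\author{Author Names}
\maketitle
\begin{abstract}
One or two sentences each: context, problem, objectives, approach, results.
\end{abstract}
\section{Introduction}
\section{Background}        % (*)
\section{Related Work}
\section{Approach}
\section{Evaluation}        % (*)
\section{Conclusion}
\section*{Acknowledgments}  % (*)
\bibliographystyle{plain}
\bibliography{references}   % entries kept in references.bib
\end{document}
```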
Analyzing paper structure • Pochampally, Das Sarma, Luna Dong, Meliou, Srivastava. Fusing Data with Correlations. SIGMOD 2014: 433-444
• Zhang, Ramakrishnan, Livny. BIRCH: An Efficient Data Clustering Method for Very Large Databases. SIGMOD 1996: 103-114
Where to sell the fish?
Title, Abstract, Introduction, Conclusion
Conclusion • Opening paragraph: a summary of what the paper presented
• Emphasize the contributions • Discuss limitations • Discuss future work
Analyzing paper conclusions • Pochampally, Das Sarma, Luna Dong, Meliou, Srivastava. Fusing Data with Correlations. SIGMOD 2014: 433-444
• Zhang, Ramakrishnan, Livny. BIRCH: An Efficient Data Clustering Method for Very Large Databases. SIGMOD 1996: 103-114
Reinforcing the motivation...
Related Work
Related Work • Homework that must be done BEFORE designing the approach – Who are your competitors? – Which problems do they leave open? – How do they relate to each other?
How to find them? • Use tools such as digital libraries (ACM, IEEE)
• Use citation-tracking tools – Google Scholar – CiteSeer
Goals of the literature review • Show that you know what exists in the area • Position your work relative to what already exists • Highlight the gaps your work fills
Types of literature review to avoid
• Summa – Presents "all" the scientific production of Western and Eastern culture – An attempt to exhaust the subject
• Patchwork – A collage of concepts and studies from several authors with no connecting thread – No perceivable planning or systematization – The author gets lost
Types of literature review to avoid
• Suspense – There is a script, but one cannot tell where the author is going, nor how the facts relate to the topic of the study – A smokescreen
• "Caderno B" (the lifestyle section) – Light text that treats the most complex subjects superficially, with no tiring depth – A "Primeiros Passos" primer
Types of literature review to avoid
• Social columnist – Cites fashionable authors, regardless of their importance to the context of the work
• Off the record – The author guarantees the sources' anonymity and never reveals them:
• it is known that... • many authors... • several studies... • it has been observed that...
Types of literature review to avoid
• Ventriloquist – The author only speaks through the mouths of others:
• For So-and-so... • According to So-and-so... • So-and-so states... • So-and-so observes...
– There is no criticism or positioning of one's own
While writing • Make sure that every cited work appears in the list of references
• ...and that every listed reference is cited in the text
Be careful when constructing sentences: "Qi et al. [22] constructed a graphical model that clusters dependent sources into groups and measures the quality of each group as a whole (instead of each individual source)."
The author's name is part of the sentence.
Be careful when constructing sentences: "Correlation between sources are studied in two bodies of works. First, copy detection has been surveyed in [10] for various types of data and studied in [3, 5, 6, 7, 16] for structured data. […] Second, […]"
Rewriting: "Correlation between sources are studied in two bodies of works. First, copy detection has been surveyed for various types of data [10] and for structured data [3, 5, 6, 7, 16]. […] Second, […]"
Using support software • LaTeX
– JabRef/BibTeX • Word
– Zotero – Reference Manager – EndNote
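As an illustration of what these reference managers maintain for you, here is a sketch of a BibTeX entry; it is reconstructed from the BIRCH citation shown earlier, and the key `zhang1996birch` is our own naming, not from the slides:

```bibtex
% references.bib -- an entry a tool like JabRef or Zotero keeps up to date
@inproceedings{zhang1996birch,
  author    = {Tian Zhang and Raghu Ramakrishnan and Miron Livny},
  title     = {{BIRCH}: An Efficient Data Clustering Method for Very Large Databases},
  booktitle = {SIGMOD},
  year      = {1996},
  pages     = {103--114}
}
```

In the paper, `\cite{zhang1996birch}` then produces the bracketed reference, and BibTeX keeps text and bibliography consistent, which is exactly the "every cited work appears in the references, every reference is cited" check described above.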
Demo • Zotero
Secondary concerns • Grammatical correctness • Figures • Tables • Formatting
Planning • There are several ways to write a paper • Choose the one that works best for you
– the sequence suggested here; or – start from the paper's structure and build the whole from there (script-based writing technique); or
– …
Agenda • The Importance of Publishing • The Publication Workflow: Conference vs. Journal • The Response Letter to Reviewers • Writing Papers • Final Remarks
A good paper [figure: Beginning → End]
A bad paper [figure: Beginning → End]
Reviewer comments • Take them into account when preparing the final version of the paper – Most of the time, the comments are pertinent, even when the reviewer did not express themselves well
– There is no point in getting angry at the comments. – Get to work!
Don't miss it!
How to Present Research Results
TOMORROW, 4 pm
https://www.facebook.com/gpbduff
Come do your Master's and PhD at UFF!!
Vanessa Braganholo [email protected]
Universidade Federal Fluminense