Page 1: What is Hydra?

WHAT IS HYDRA? Findability Day 2012

Hydra is technology

Hydra brings structure

What is unstructured data?

•  A linguistic excuse?

News articles

Plain text that contains invaluable metadata for search, such as:

•  Title

•  Author byline

•  Lead paragraph
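As a rough illustration of how such fields might be recovered from plain text, the Python sketch below splits an article into title, author and lead paragraph. The function name `extract_news_fields`, the "By ..." byline convention and the sample article are assumptions made for this example; they are not taken from Hydra.

```python
import re

# Hypothetical byline convention: a paragraph of the form "By <name>".
BYLINE_RE = re.compile(r"^By\s+(?P<author>.+)$", re.IGNORECASE)


def extract_news_fields(text):
    """Split a plain-text article into title, author and lead paragraph.

    Assumes the first paragraph is the title, an optional "By ..."
    paragraph follows it, and paragraphs are separated by blank lines.
    """
    # Paragraphs are blocks of text separated by one or more blank lines.
    blocks = [b.strip() for b in re.split(r"\n\s*\n", text.strip()) if b.strip()]
    fields = {"title": None, "author": None, "lead": None}
    if not blocks:
        return fields

    fields["title"] = blocks[0]
    rest = blocks[1:]

    # Treat the next block as a byline only if it matches the convention.
    if rest:
        match = BYLINE_RE.match(rest[0])
        if match:
            fields["author"] = match.group("author").strip()
            rest = rest[1:]

    # The lead paragraph is the first remaining block of body text,
    # with internal line breaks collapsed to single spaces.
    if rest:
        fields["lead"] = " ".join(rest[0].split())
    return fields


article = """Search engines need structure

By Jane Doe

Plain text hides fields that are
useful for search.

The rest of the article follows here.
"""

fields = extract_news_fields(article)
```

Real news text is messier than this sample, so a production extractor would need more rules or a trained model; the point is only that "unstructured" text often carries recoverable structure.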

Hydra is about your data

•  Enrich your documents with metadata, to power your search

•  Language detection

•  Sentiment analysis

•  Headline extraction

•  Regular expression matching and extraction

•  Filter out unwanted documents

•  Collect statistics

•  Export to Staging environments
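The steps listed above can be pictured as small functions that each take a document and return it with extra fields, or drop it. The following Python sketch is only an illustration of that idea and not Hydra's actual stage API: the dict-based document, the three stage functions and the `run_pipeline` helper are all hypothetical.

```python
import re
from collections import Counter

# Example pattern for the "regular expression extraction" step: ISO-style dates.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def extract_headline(doc):
    """Use the first non-empty line of the body as the headline."""
    lines = [line.strip() for line in doc["body"].splitlines() if line.strip()]
    doc["headline"] = lines[0] if lines else None
    return doc


def extract_dates(doc):
    """Regular expression matching and extraction of dates in the body."""
    doc["dates"] = DATE_RE.findall(doc["body"])
    return doc


def drop_short_documents(doc, min_chars=20):
    """Filter stage: return None to signal that the document is unwanted."""
    return doc if len(doc["body"]) >= min_chars else None


def run_pipeline(docs, stages):
    """Run every document through the stages, collecting simple statistics."""
    stats = Counter()
    kept = []
    for doc in docs:
        stats["seen"] += 1
        for stage in stages:
            doc = stage(doc)
            if doc is None:
                # A stage filtered the document out; skip remaining stages.
                stats["filtered"] += 1
                break
        else:
            # No stage filtered the document, so it would be sent on to the index.
            stats["kept"] += 1
            kept.append(doc)
    return kept, stats


docs = [
    {"id": 1, "body": "Quarterly report\nPublished 2012-05-10 by the finance team."},
    {"id": 2, "body": "too short"},
]
kept, stats = run_pipeline(
    docs, [extract_headline, extract_dates, drop_short_documents]
)
```

Language detection and sentiment analysis would slot in as further stages of the same shape; they are left out here because they need external models.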

Before Hydra

Hydra scales

Hydra Design Objectives

Scalability

•  Possible to connect any number of processing machines

Fault tolerance

•  Failure of a stage affects only a single document

•  Failures can be automatically detected

Robustness

•  Stages and nodes are completely independent (no domino effect)

Development ease

•  Allow test driven pipeline development
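To make the fault tolerance and robustness objectives concrete, here is a minimal Python sketch in which a failing stage marks only the document it was processing, while the other documents pass through untouched. The `process` helper, the `errors` field and the two example stages are hypothetical; the slides do not show how Hydra itself isolates or detects failures.

```python
def uppercase_title(doc):
    """Example stage: normalise the title field."""
    doc["title"] = doc["title"].upper()
    return doc


def word_count(doc):
    """Example stage: count the words in the body."""
    doc["words"] = len(doc["body"].split())
    return doc


def process(docs, stages):
    """Apply each stage to each document, isolating failures per document."""
    done, failed = [], []
    for doc in docs:
        try:
            for stage in stages:
                doc = stage(doc)
        except Exception as exc:
            # Record the failure on this document only, then move on
            # to the next document instead of stopping the whole run.
            doc["errors"] = [f"{stage.__name__}: {type(exc).__name__}"]
            failed.append(doc)
        else:
            done.append(doc)
    return done, failed


docs = [
    {"id": "a", "title": "first", "body": "one two three"},
    {"id": "b", "body": "missing title field"},  # uppercase_title raises KeyError
    {"id": "c", "title": "third", "body": "four five"},
]
done, failed = process(docs, [uppercase_title, word_count])
```

Because each stage in this sketch is a plain function with no shared state, it can also be tested on its own, for example `assert word_count({"body": "a b"})["words"] == 2`, which is one way to read the "test driven pipeline development" objective.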

What about Hadoop and Big Data?

Use cases for document enrichment

•  PageRank

•  Analytics

Hadoop & Map/Reduce advantages

•  Huge scalability

•  Ability to work on the entire document set at once

Hadoop & Map/Reduce drawbacks

•  Batch processing

•  Time-to-index

Hydra integrated with Hadoop

Diagram legend:

Blue – First round of indexing only

Red – Second round of indexing

Purple – All documents
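The diagram itself is not reproduced in this transcript, so the following Python sketch is only one possible reading of the two indexing rounds: documents are indexed immediately after per-document enrichment, and a later batch job over the whole collection supplies values that are written back in a second round. All names here are hypothetical, and the inbound-link count is a deliberately simple stand-in for a batch computation such as PageRank.

```python
from collections import Counter


def first_round(doc):
    """Per-document enrichment that can run as soon as a document arrives."""
    doc["length"] = len(doc["body"].split())
    return doc


def batch_inlink_counts(docs):
    """Whole-collection job: count how many documents link to each id.

    Like PageRank, this needs the entire document set before it can
    produce a value for any single document.
    """
    counts = Counter()
    for doc in docs:
        for target in set(doc["links"]):
            counts[target] += 1
    return counts


def second_round(doc, inlinks):
    """Re-enrich a document with the batch result before re-indexing."""
    doc["inlinks"] = inlinks.get(doc["id"], 0)
    return doc


docs = [
    {"id": "a", "body": "links to b and c", "links": ["b", "c"]},
    {"id": "b", "body": "links to c", "links": ["c"]},
    {"id": "c", "body": "no outgoing links", "links": []},
]

# A plain dict stands in for the search index.
index = {}

# Round one: index each document immediately, keeping time-to-index short.
for doc in docs:
    index[doc["id"]] = first_round(doc)

# Later: a batch job runs over the whole set, then round two updates the index.
inlinks = batch_inlink_counts(docs)
for doc in docs:
    index[doc["id"]] = second_round(doc, inlinks)
```

The split reflects the trade-off from the previous slide: per-document stages give a short time-to-index, while the batch job contributes values that only make sense across the entire document set.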

Hydra in summary

Hydra

•  can chew through almost anything

•  has many heads

•  regenerates

•  scales

Hydra is Open Source

•  Other committers

•  The role of Findwise

For more information:

•  http://www.findwise.com/hydra

•  http://findwise.github.com/Hydra

•  Email: [email protected]

Joel Westberg [email protected]

@joelwes
