Welcome to IEPY’s documentation!
IEPY is an open source tool for Information Extraction focused on Relation Extraction.
To give an example of Relation Extraction, if we are trying to find a birth date in:
“John von Neumann (December 28, 1903 – February 8, 1957) was a Hungarian and American pure and applied mathematician, physicist, inventor and polymath.”
then IEPY’s task is to identify “John von Neumann” and “December 28, 1903” as the subject and object entities of the “was born in” relation.
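To make the expected output concrete, here is a minimal sketch of the (subject, relation, object) triple for the sentence above. It uses a hand-written regular expression rather than IEPY itself; the names `Relation`, `BIRTH_PATTERN` and `extract_birth` are hypothetical and are not part of IEPY’s API.

```python
import re
from typing import NamedTuple, Optional


class Relation(NamedTuple):
    subject: str  # entity the relation is about
    name: str     # name of the relation
    obj: str      # entity the subject is related to


# Toy pattern (an assumption for illustration only): a capitalized name,
# optionally containing "von", followed by "(<Month> <day>, <year>".
BIRTH_PATTERN = re.compile(
    r"(?P<person>[A-Z][a-z]+(?: (?:von )?[A-Z][a-z]+)+) "
    r"\((?P<date>[A-Z][a-z]+ \d{1,2}, \d{4})"
)

SENTENCE = (
    "John von Neumann (December 28, 1903 – February 8, 1957) was a "
    "Hungarian and American pure and applied mathematician, physicist, "
    "inventor and polymath."
)


def extract_birth(sentence: str) -> Optional[Relation]:
    """Return a 'was born in' relation if the toy pattern matches, else None."""
    match = BIRTH_PATTERN.search(sentence)
    if match is None:
        return None
    return Relation(match.group("person"), "was born in", match.group("date"))


result = extract_birth(SENTENCE)
print(result)
```

A single hand-written pattern like this only covers one phrasing; it is meant to show the shape of the result, not how the extraction is actually performed.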
IEPY is aimed at:
- users needing to perform Information Extraction on a large dataset.
- scientists wanting to experiment with new IE algorithms.
You can follow the development of this project and report issues at http://github.com/machinalis/iepy, or join the project’s mailing list.
Features
- A corpus annotation tool with a web-based UI.
- An active learning relation extraction tool pre-configured with convenient defaults.
- A rule-based relation extraction tool for cases where the documents are semi-structured or high precision is required.
- A web-based user interface that:
  - Allows non-expert users to control some aspects of IEPY.
  - Allows decentralization of human input.
- A shallow entity ontology with coreference resolution via Stanford CoreNLP.
- An easily hackable active learning core, ideal for scientists wanting to experiment with new algorithms.
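As a rough illustration of what an active learning core does, the sketch below ranks candidate pieces of evidence so that a human is asked first about the ones a classifier is least sure of (uncertainty sampling). This is one common strategy and is an assumption here, not a description of IEPY’s actual implementation; `rank_questions`, `predict_proba` and the scores are all made up for the example.

```python
from typing import Callable, List, Sequence


def rank_questions(candidates: Sequence[str],
                   predict_proba: Callable[[str], float]) -> List[str]:
    """Order candidates from least to most certain prediction.

    predict_proba is a hypothetical callable returning the model's estimated
    probability (0.0 to 1.0) that a candidate expresses the relation.
    Predictions closest to 0.5 are the most uncertain, so they come first.
    """
    return sorted(candidates, key=lambda c: abs(predict_proba(c) - 0.5))


# Made-up scores standing in for a trained classifier's output.
toy_scores = {
    "evidence A": 0.95,
    "evidence B": 0.52,
    "evidence C": 0.10,
    "evidence D": 0.40,
}

ranked = rank_questions(list(toy_scores), toy_scores.__getitem__)
print(ranked)
```

In a full loop, the human’s answers for the top-ranked items would be added to the training data and the classifier retrained before ranking again.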
Authors
IEPY is © 2014 Machinalis in collaboration with the NLP Group at UNC-FaMAF. Its primary authors are:
- Rafael Carrascosa <rcarrascosa@machinalis.com> (rafacarrascosa at github)
- Javier Mansilla <jmansilla@machinalis.com> (jmansilla at github)
- Gonzalo García Berrotarán <ggarcia@machinalis.com> (j0hn at github)
- Franco M. Luque <francolq@famaf.unc.edu.ar> (francolq at github)
- Daniel Moisset <dmoisset@machinalis.com> (dmoisset at github)
Changelog
- 0.9.6
  - Fixed some dependency declarations to support Python 3.5
  - Fixed a bug in active learning predictions
  - Added support for German preprocessing (thanks @sweh)
- 0.9.5
  - Fixed a bug in TokenizerSentencerRunner (thanks ezesalta)
  - Fixed installation dependencies
  - Tokenization options can now be set in the instance settings file
- 0.9.4
  - Added multicore preprocessing
  - Added support for Stanford 3.5.2 preprocessing models
- 0.9.3
  - Added grammatical parsing to the document preprocessing flow
  - Added support for Spanish preprocessing
  - Restricted each IEPY instance to a single language
  - Gazetteer support
  - Labeling UI improvements
  - Performance and memory usage improvements
  - Model simplifications (labels, metadata)
  - Storage and viewing of predictions
- 0.9.2
  - Add ability to use custom features (http://iepy.rtfd.org/en/latest/how_to_hack.html#implementing-your-own-features)
  - Add ability to use rules as features (http://iepy.rtfd.org/en/latest/how_to_hack.html#using-rules-as-features)
  - Add rules verifier (http://iepy.rtfd.org/en/latest/rules_tutorial.html#verifying-your-rules)
  - Fix Firefox compatibility bugs (thanks dchaplinsky for the bug report)
  - Skip instead of crashing when a document cannot be loaded via the CSV importer (thanks dchaplinsky for the report and suggestion)
  - Improve performance of the rules runner
  - Change the instance files schema: an instance is now a Python package, and the settings were renamed
  - Add lemmatization to the preprocessing (http://iepy.rtfd.org/en/latest/preprocess.html#lemmatization)
  - Fix critical bug when loading rules
  - Fix critical bug in question ranking in the active learning extraction runner
- 0.9.1
  - Add entity kind to the modal dialog
  - Change arrows display to be more understandable
  - Join the “skip” and “don’t know” label options
  - Replace the options dropdown with radio buttons
  - Show help for shortcuts and change the order of the options
  - Rich view of documents (without needing to be labeling the document for some relation)
  - Instance upgrader