Score
Score is intended as a document retrieval system that applies Formal Concept Analysis (FCA) in
a web context. Its basic parts are:
- Indexer
- An indexer scans a number of documents and finds attributes to describe them, typically keywords.
- Database
- A relational database stores the references to the documents, the attributes and their relation.
- FCA Module
- An FCA module creates a formal concept lattice from the database and offers a query API.
- Frontend
- A Java Servlet will offer the query features to the user.
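To illustrate what the FCA module computes, here is a minimal sketch in Python. It is not the project's code: the toy context, the names `CONTEXT`, `extent`, `intent` and `concepts`, and the brute-force enumeration are all assumptions made for illustration. A real module would use a proper lattice construction algorithm instead of trying every attribute subset.

```python
from itertools import combinations

# Hypothetical toy context: document reference -> set of keyword attributes.
CONTEXT = {
    "doc1": {"fca", "lattice"},
    "doc2": {"fca", "retrieval"},
    "doc3": {"fca", "lattice", "retrieval"},
    "doc4": {"corba"},
}


def extent(attrs, context):
    """Return the documents that carry every attribute in attrs."""
    return frozenset(d for d, a in context.items() if attrs <= a)


def intent(docs, context):
    """Return the attributes shared by every document in docs."""
    shared = set().union(*context.values())
    for d in docs:
        shared &= context[d]
    return frozenset(shared)


def concepts(context):
    """Enumerate all formal concepts as (extent, intent) pairs.

    Brute force over all attribute subsets: exponential, so only
    suitable for tiny examples like this one.
    """
    all_attrs = sorted(set().union(*context.values()))
    found = set()
    for size in range(len(all_attrs) + 1):
        for combo in combinations(all_attrs, size):
            ext = extent(frozenset(combo), context)
            found.add((ext, intent(ext, context)))
    return found


if __name__ == "__main__":
    for ext, itt in sorted(concepts(CONTEXT), key=lambda c: -len(c[0])):
        print(sorted(ext), sorted(itt))
```

Each pair is one node of the concept lattice: a set of documents together with exactly the attributes they have in common.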
The subsystems are to communicate via ODBC and CORBA, which ensures interoperability between
platforms, programming languages and networks. The database structure will be simple but
extensible, so the code we create should be easily reusable.
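A database structure of the kind described could look roughly like the following sketch. The table and column names are hypothetical, and Python's built-in `sqlite3` module stands in for the ODBC connection purely so the example is self-contained; the actual schema of the project is not documented here.

```python
import sqlite3

# Hypothetical schema: documents, attributes, and the relation between them.
SCHEMA = """
CREATE TABLE document (
    id INTEGER PRIMARY KEY,
    reference TEXT NOT NULL UNIQUE
);
CREATE TABLE attribute (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE document_attribute (
    document_id INTEGER NOT NULL REFERENCES document(id),
    attribute_id INTEGER NOT NULL REFERENCES attribute(id),
    PRIMARY KEY (document_id, attribute_id)
);
"""


def open_store():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_document(conn, reference, keywords):
    """Store a document reference and link it to its keyword attributes."""
    conn.execute(
        "INSERT OR IGNORE INTO document (reference) VALUES (?)", (reference,)
    )
    doc_id = conn.execute(
        "SELECT id FROM document WHERE reference = ?", (reference,)
    ).fetchone()[0]
    for keyword in keywords:
        conn.execute(
            "INSERT OR IGNORE INTO attribute (name) VALUES (?)", (keyword,)
        )
        attr_id = conn.execute(
            "SELECT id FROM attribute WHERE name = ?", (keyword,)
        ).fetchone()[0]
        conn.execute(
            "INSERT OR IGNORE INTO document_attribute VALUES (?, ?)",
            (doc_id, attr_id),
        )


def load_context(conn):
    """Read the relation back as a mapping: reference -> set of attributes.

    Documents without any attribute are omitted by the inner joins.
    """
    rows = conn.execute(
        """
        SELECT d.reference, a.name
        FROM document d
        JOIN document_attribute da ON da.document_id = d.id
        JOIN attribute a ON a.id = da.attribute_id
        """
    )
    context = {}
    for reference, name in rows:
        context.setdefault(reference, set()).add(name)
    return context
```

The mapping returned by `load_context` is the formal context (documents, attributes, incidence relation) from which a concept lattice can be built.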
Targeted applications are:
- Ontology guided document retrieval
- A set of documents with a specific topic can be indexed using an ontology system and specific documents can be retrieved.
- Mailing list archives
- Mails in a mailing list archive can be queried for keywords that are generated from the header information and by using idf
(inverse document frequency).
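As a rough sketch of how idf could drive keyword selection for mails: the text does not say how idf is combined with other signals, so the weighting by term frequency times idf, the function names and the sample data below are all assumptions for illustration.

```python
import math
from collections import Counter

# Hypothetical sample: mail id -> tokens taken from its header fields.
MAILS = {
    "m1": ["fca", "lattice", "question"],
    "m2": ["fca", "corba", "question"],
    "m3": ["fca", "servlet"],
}


def idf_scores(documents):
    """Return log(N / df) for every term, where df counts documents containing it."""
    total = len(documents)
    doc_freq = Counter()
    for tokens in documents.values():
        doc_freq.update(set(tokens))
    return {term: math.log(total / count) for term, count in doc_freq.items()}


def keywords(documents, top_k=2):
    """Pick the top_k terms per document by term frequency times idf.

    Ties are broken alphabetically so the result is deterministic.
    """
    idf = idf_scores(documents)
    selected = {}
    for doc, tokens in documents.items():
        freq = Counter(tokens)
        ranked = sorted(freq, key=lambda t: (-freq[t] * idf[t], t))
        selected[doc] = ranked[:top_k]
    return selected


if __name__ == "__main__":
    print(keywords(MAILS))
```

A term that occurs in every mail gets an idf of zero and therefore ranks last, which is the intended effect: it does not help to tell mails apart.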
There are a number of advantages of this approach compared to classic retrieval systems:
- We can avoid empty result sets
- We can offer refinements that are always true refinements (including a ranking for their efficiency)
- We have implicit document clusters (the formal concepts) with a natural distance relation, so we can return documents that
match the query nearly rather than exactly, including a ranking on them.
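The advantages above can be sketched on a toy context. Everything here is an assumption for illustration: the sample data, the function names, and in particular the distance measure (the number of query attributes a document lacks), which is a simple stand-in for a distance defined on the concept lattice.

```python
# Hypothetical toy context: document reference -> set of keyword attributes.
CONTEXT = {
    "doc1": {"fca", "lattice"},
    "doc2": {"fca", "retrieval"},
    "doc3": {"fca", "lattice", "retrieval"},
    "doc4": {"corba"},
}


def matching(query, context):
    """Return the documents that carry every attribute in the query."""
    return {d for d, attrs in context.items() if query <= attrs}


def refinements(query, context):
    """Return attributes that strictly narrow the result without emptying it.

    Maps each such attribute to the size of the narrowed result, which
    can serve as a simple ranking of the refinements.
    """
    current = matching(query, context)
    options = {}
    for attr in set().union(*context.values()) - query:
        narrowed = matching(query | {attr}, context)
        if narrowed and narrowed != current:
            options[attr] = len(narrowed)
    return options


def near_matches(query, context):
    """Rank documents by how many query attributes they lack.

    Documents that share no attribute with the query are left out.
    """
    ranked = sorted((len(query - attrs), d) for d, attrs in context.items())
    return [(d, missing) for missing, d in ranked if missing < len(query)]


if __name__ == "__main__":
    print(refinements({"fca"}, CONTEXT))
    print(near_matches({"lattice", "corba"}, CONTEXT))
```

In this sketch, the query `{"lattice", "corba"}` has no exact match, yet `near_matches` still returns the documents that satisfy part of it, and `refinements` never offers an attribute that would lead to an empty or unchanged result.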
Status: This project is currently on hold. We might get back to it one day, but
at the moment other things are more important for our research interests. Check out
our personal document management tool Docco, which pursues similar ideas in
a smaller context.