
Although the implementation details differ from one OBIE system to another, a reference architecture for such systems has been proposed by Wimalasuriya and Dou~\cite{first}. This architecture is called the \emph{Ontology-based Components for Information Extraction (OBCIE)} architecture. It consists of the union of the components found in existing OBIE systems and aims to promote its adoption by encouraging reusability through modularity. Figure~\ref{arch} represents a schema of the reference architecture. Note that this schema serves as an ‘idealized view’, meaning that some OBIE systems may deviate slightly from this architecture. It should also be noted that in some implementations, the OBIE system is part of a larger query answering system that makes use of the information extracted by the OBIE system. Figure~\ref{arch} shows these external components as well. However, these components should not be regarded as parts of an OBIE system.

\paragraph{\textbf{OBCIE components}}
As illustrated, the textual input of an OBIE system first needs to be converted by a \emph{preprocessor} into a format that can be handled by the \emph{information extraction module}. For instance, a preprocessor might remove the tags from an HTML file and convert it into a plain text file. The information extraction module is the component that performs the actual extraction. It can be implemented using the techniques described in Section 4. The extraction process may be performed in a semi-automatic manner, meaning that humans are involved in the process to correct erroneous extractions. No matter what method is used, the extraction process is always guided by an ontology.

The ontology that is used by the system may be defined by others, or it may be constructed internally by an \emph{ontology generator component}. Most OBIE systems that construct their own ontology combine that process with a \emph{semantic lexicon} for the language of the ontology. A semantic lexicon is a digital dictionary of words labeled with semantic classes~\cite{lexicon}. This way, associations can be made between unknown words. An example is WordNet, a lexical database that is widely used for the English language~\cite{wordnet}. It groups English words into sets of synonyms and provides semantic relationships between these sets. Additionally, humans may assist in the ontology construction process by manually defining the ontology to be used or by making manual changes through an \emph{ontology editor component}. A common example of such a component is Prot\'eg\'e~\cite{protege}.

The output of an OBIE system consists of the information extracted from the input text. It can be represented using an ontology definition language, such as OWL.
The output might also link to the documents from which the information was extracted, providing justification for the query answering system. If the OBIE system is part of a query answering system, the output might be stored in a database or knowledge base, enabling the query answering system to answer user queries based on the stored information.
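
To make the data flow between the components described above more concrete, the following minimal Python sketch chains a preprocessor, an ontology-guided extraction step, and an output that links back to the source document. It is an illustration under stated assumptions rather than part of the OBCIE proposal: the toy ontology, the \texttt{Extraction} record, and the function names \texttt{preprocess}, \texttt{extract}, and \texttt{run\_pipeline} are all hypothetical, and a simple dictionary lookup stands in for a real extraction technique.

```python
import re
from dataclasses import dataclass
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the character data of an HTML document and drops all tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def preprocess(html):
    """Preprocessor: remove HTML tags and normalise whitespace."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return " ".join("".join(collector.chunks).split())


@dataclass(frozen=True)
class Extraction:
    concept: str  # ontology class assigned to the mention
    mention: str  # surface text found in the input
    source: str   # document the mention was extracted from


# Hypothetical toy ontology (assumption): class name -> known instance labels.
TOY_ONTOLOGY = {
    "Country": ["France", "Japan"],
    "City": ["Paris", "Tokyo"],
}


def extract(text, ontology, source):
    """Information extraction module: a whole-word lookup guided by the ontology.

    Only classes that exist in the ontology can appear in the output.
    """
    found = []
    for concept, labels in ontology.items():
        for label in labels:
            if re.search(r"\b" + re.escape(label) + r"\b", text):
                found.append(Extraction(concept, label, source))
    return found


def run_pipeline(html, ontology, source):
    """Chain the preprocessor and the information extraction module."""
    return extract(preprocess(html), ontology, source)


if __name__ == "__main__":
    page = "<html><body><p>Paris is the capital of <b>France</b>.</p></body></html>"
    for item in run_pipeline(page, TOY_ONTOLOGY, "doc1.html"):
        print(item)
```

In this sketch the ontology is passed in as an argument, which mirrors the modularity that the architecture aims for: an ontology defined by others, or one produced by an ontology generator component, could be supplied without changing the extraction code. The \texttt{source} field corresponds to the link between the output and the documents from which the information was extracted.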