From a technological point of view, building an ontology for the semantic web (meaning putting all available data into one ontology) is nothing special. But there are a few critical challenges in building a really good, useful ontology. I gathered some experience in a small project at my university, where we (the students) tried to build an ontology from scratch by ourselves to gain some insight into the problems and challenges on the business side. The conclusion is that there are mainly two interlinked "obstacles" on the way to a successful ontology.
Define a domain
As already mentioned in the abstract, an ontology has to be concrete enough to provide some value for the user. Essentially, this means it allows users to gain a rough overview of a topic on the one hand and to dig deeper into special aspects of the domain on the other.
This is clearly an optimisation problem: the designer of an ontology has to decide for each individual case how wide and how deep the domain should be outlined. Thank God, this problem was considered when the OWL standard was created. A designer can link to already existing ontologies (which can be found, for example, with Swoogle).
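To make the linking idea a bit more concrete, here is a minimal sketch that writes the header of an ontology in Turtle syntax and pulls in other ontologies with owl:imports, which is the construct OWL provides for this. The function name and all example.org IRIs are made-up placeholders, not ontologies from our project.

```python
def ontology_header(ontology_iri, imported_iris):
    """Return a Turtle header declaring an ontology and its owl:imports."""
    lines = [
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "",
        f"<{ontology_iri}> a owl:Ontology",
    ]
    for iri in imported_iris:
        lines[-1] += " ;"  # continue the statement about the same subject
        lines.append(f"    owl:imports <{iri}>")
    lines[-1] += " ."  # terminate the Turtle statement
    return "\n".join(lines)


# Hypothetical IRIs, for illustration only.
print(ontology_header(
    "http://example.org/travel",
    ["http://example.org/transport", "http://example.org/accommodation"],
))
```

The own ontology then only has to describe what is specific to its domain, while the imported ones contribute the concepts they already cover.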
I would like to illustrate this with a simple example: think about the domain of "Travel / Travelling". A lot of things are involved in this domain. When somebody plans a journey, he or she first searches for a destination. To get to the desired destination, this person needs transportation, which can be by train, plane, car, bus and so on. Then the traveller needs accommodation, wants to do some sightseeing, go out to a restaurant for dinner, and many other things.
This rather incomplete example shows how many concepts can be found in one domain. And all these concepts, if they already exist as ontologies, can be used from your own ontology. That is great, isn't it?
The second piece of good news is that most information about a domain is already available on the internet. Some may bring up the Deep Web here and distinguish between the "Surface Web" and the underlying databases / data stores. But to put it short: what can be found on the internet can be crawled (Google is maybe the best example of this). So there is already a way to gather the data automatically, which saves time and thus costs.
But what you should never forget during your project is the focus of your domain. How wide should it be and how deep should it go? To answer this question correctly, you should always think about the future customer/user of the ontology. And you can maybe test this yourself.
Getting the data
As I have already mentioned in the section above, getting the data is another challenge, if not the main one, in the process of building an ontology. We now know that there is data out there on the internet and that we can use it to build our purpose-oriented ontology.
Getting data into the ontology is not a cheap task, as we experienced when building our own file. The problem is that OWL relies on some object-relational constructs, which have to be built up first. The simplest such relation is the super-/subclass relation. Every class in an OWL file is a subclass of "Thing" (owl:Thing). You can then go further and create more classes. Many other relationships can be defined, as well as properties on a class. Defining the most basic classes is, in most cases, easy once the domain is clear.
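The super-/subclass relation can be sketched in a few lines of Python. This is only an illustration of the idea, not OWL itself: the class names are taken from the travel example, and the sketch assumes every class has exactly one direct superclass, whereas OWL allows several.

```python
# Each class maps to its direct superclass; "Thing" is the root.
# Class names are illustrative, borrowed from the travel example.
SUPERCLASS = {
    "Transportation": "Thing",
    "Train": "Transportation",
    "Plane": "Transportation",
    "Accommodation": "Thing",
    "Hotel": "Accommodation",
}


def ancestors(cls):
    """Return the chain of superclasses from cls up to and including Thing."""
    chain = []
    while cls != "Thing":
        cls = SUPERCLASS[cls]  # raises KeyError for an undeclared class
        chain.append(cls)
    return chain


def is_subclass_of(cls, other):
    """True if cls equals other or other is one of its superclasses."""
    return cls == other or other in ancestors(cls)


print(ancestors("Train"))
print(is_subclass_of("Hotel", "Transportation"))
```

Because every chain ends at "Thing", any declared class is a subclass of it, which mirrors the role of owl:Thing.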
The second step is then to fill the ontology with individuals, which is much more time-consuming. From a business perspective, you need to automate this task as much as you can. This may become really easy in the future, when there are enough ontologies out there to use. But for now there are not many ontologies available, and many of the existing ones try to cover too many aspects of a domain and so are not specific enough to provide real value. This leads to a vicious circle: poor-quality ontologies are not used because they do not provide any value, and they cannot be used to build new ontologies either.
We therefore decided on a two-step implementation. In the first step, we used well-known crawler techniques to crawl existing sources on the internet for the individuals we needed. The results can then be mapped against existing classes to build a simple ontology consisting only of super- and subclasses. This makes it much easier to classify bigger data volumes in the later automated analysis. Through iterative crawling, multiple sources can be covered, and the ontology can thus be tested against various existing models of the domain out on the web. This enhances the quality and helps ensure that the ontology provides value.
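A minimal sketch of the mapping in this first step could look as follows. It is not our actual project code: the crawler itself is left out, the crawl results are assumed to arrive as (source, individual, class label) tuples, and all source names and individuals are invented sample data.

```python
from collections import defaultdict


def build_dictionary(crawl_results, known_classes):
    """Map crawled individuals to known classes.

    crawl_results: iterable of (source, individual, class_label) tuples.
    Returns (dictionary, unmapped): dictionary maps each class to the set
    of individuals found for it; unmapped lists the entries whose label
    matches no known class.
    """
    by_label = {c.lower(): c for c in known_classes}
    dictionary = defaultdict(set)
    unmapped = []
    for source, individual, label in crawl_results:
        cls = by_label.get(label.strip().lower())
        if cls is None:
            unmapped.append((source, individual, label))
        else:
            dictionary[cls].add(individual.strip())
    return dict(dictionary), unmapped


# Invented sample data standing in for the output of two crawled sources.
results = [
    ("site-a", "Alpine Express", "train"),
    ("site-b", "Alpine Express", "Train "),
    ("site-b", "Hotel Seeblick", "hotel"),
    ("site-a", "Lake Ferry", "boat"),
]
dictionary, unmapped = build_dictionary(results, ["Train", "Hotel"])
print(dictionary)
print(unmapped)
```

The unmapped entries are the interesting part when crawling iteratively: they show where a source uses a concept that the current class list does not cover yet.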
The next step in building a successful special-purpose ontology is the automatic gathering of vast amounts of data from the web. This data can be classified programmatically with the help of the simple ontology from the first step, which serves as a "dictionary" file containing the most basic relations.
Thus, this two-step implementation saves time, since the manual work is reduced to a minimum. The more dictionaries one has, the more the manual work can be reduced. And the really big task is automated, which saves time and thus money. From my point of view, this is at the moment the only way to provide proven quality within ontologies at affordable cost and to break the vicious circle mentioned above.
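How such a dictionary can drive the classification is sketched below. The matching is a plain case-insensitive substring search, which is an assumption made for brevity; a real pipeline would need something more robust. The dictionary content and the sample sentence are invented.

```python
def classify(text, dictionary):
    """Return {class: [individuals mentioned in text]}.

    dictionary maps a class name to a collection of known individuals.
    Matching is a case-insensitive substring search (a simplification).
    """
    lowered = text.lower()
    hits = {}
    for cls, individuals in dictionary.items():
        found = sorted(i for i in individuals if i.lower() in lowered)
        if found:
            hits[cls] = found
    return hits


# Invented dictionary and text, for illustration only.
dictionary = {
    "Train": {"Alpine Express"},
    "Hotel": {"Hotel Seeblick"},
}
page = "We took the Alpine Express and stayed at hotel seeblick."
print(classify(page, dictionary))
```

Every crawled page that mentions a known individual is assigned to the matching classes without manual work, and pages with no hits can be set aside for review.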
