Extracting Personalised Ontology from Data-Intensive Web Application: an HTML Forms-Based Reverse Engineering Approach

Sidi Mohamed Benslimane, Mimoun Malki, Mustapha Kamal Rahmouni, Djamal Benslimane
2007 Informatica  
The advance of the Web has significantly and rapidly changed the way information is organized, shared and distributed. The next generation of the web, the semantic web, seeks to make information more usable by machines by introducing a more rigorous structure based on ontologies. In this context, we propose a novel and integrated approach for the semi-automated extraction of an ontology-based semantic web from data-intensive web applications, and thus make the web content ... able. Our approach is based on the idea that semantics can be extracted by applying a reverse engineering technique to the structures and instances of HTML-forms, which are the most convenient interface for communicating with relational databases in current data-intensive web applications. These semantics are exploited to produce, over several steps, a personalised ontology.

Key words: semantic web, reverse engineering, ontology, HTML-forms, data-intensive web application.

Fig. 2. HTML pages along with HTML-form and HTML-table.

Form type: a structured collection of empty fields that are formatted in a way that permits communication with the database. A particular representation of a form type is called a form template, which has three basic components: title, captions, and entries.

Structural units (SUs): objects that closely group related fields in a form.

Form instance: an occurrence of a form type. This is the extensional part, obtained when a form template is filled in with data. Fig. 2 shows two instances of the Booking and flight itinerary form types.

Form field: consists of a caption and its associated entry. Each entry is generally linked to a table's name as per the table names in the underlying database. The values that a form field displays or receives are provided by (or stored in) the linked attribute. Some form fields are computed; others may simply not be linked to the relational database. We distinguish three types of fields: filling fields (e.g., TEXT, CHECKBOX, RADIO, TEXTAREA attributes); selection fields (e.g., SELECT attribute); and link fields (HREF attribute).

Underlying source: the structure of the relational database (i.e., a relational schema) in terms of relations and attributes along with their data types.

Relationship: a connection between SUs. There are two kinds of relationship: Membership (belongs to) and Reference (refers to).
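
The three field types distinguished above can be illustrated with a short sketch. This is not the authors' extraction procedure; it is a minimal illustration, assuming Python's standard html.parser module, of how the fields of an HTML form might be sorted into filling, selection, and link fields. The class name FormFieldClassifier, the function classify_fields, and the sample booking form are hypothetical.

```python
from html.parser import HTMLParser

# Assumption: these <input> types count as "filling" fields, following the
# examples given in the text (TEXT, CHECKBOX, RADIO); TEXTAREA is handled below.
FILLING_INPUT_TYPES = {"text", "checkbox", "radio"}


class FormFieldClassifier(HTMLParser):
    """Collects (kind, identifier) pairs for fields found inside <form> elements."""

    def __init__(self):
        super().__init__()
        self.in_form = False
        self.fields = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.in_form = True
        elif not self.in_form:
            return
        elif tag == "input":
            # A missing or valueless type attribute defaults to "text".
            input_type = (attrs.get("type") or "text").lower()
            if input_type in FILLING_INPUT_TYPES:
                self.fields.append(("filling", attrs.get("name")))
        elif tag == "textarea":
            self.fields.append(("filling", attrs.get("name")))
        elif tag == "select":
            self.fields.append(("selection", attrs.get("name")))
        elif tag == "a" and attrs.get("href"):
            self.fields.append(("link", attrs["href"]))

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def classify_fields(html):
    """Return the classified fields of every form in the given HTML text."""
    parser = FormFieldClassifier()
    parser.feed(html)
    parser.close()
    return parser.fields


# Hypothetical booking form, used only to exercise the sketch.
SAMPLE_FORM = """
<form action="/booking">
  <input type="text" name="passenger">
  <input type="radio" name="class" value="economy">
  <textarea name="remarks"></textarea>
  <select name="destination"><option>Algiers</option></select>
  <a href="/itinerary?id=1">Itinerary</a>
  <input type="submit" value="Book">
</form>
"""

if __name__ == "__main__":
    for kind, identifier in classify_fields(SAMPLE_FORM):
        print(kind, identifier)
```

The submit button is deliberately ignored here, since the text does not list it among the three field types; a fuller treatment would also need the captions and the link from each entry to the underlying database, which this sketch does not attempt.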
Membership is a one-to-many or one-to-one relationship between two SU types. One of the SUs (always the one-side) is called the parent SU; the other (many-side or sometimes also one-side) is called the child SU. An occurrence of a relationship consists of one occurrence of the parent SU and one or several occurrences of the child SU. Reference is a many-to-many relationship between SU types. An SU can refer to one (possibly itself) or to many other SUs.

Constraint: a rule that defines which data is valid for a given form field. For in-
doi:10.15388/informatica.2007.191 fatcat:5chgeaahznfltfzokgm5fbl5h4