
A tutoring system based on a hierarchical representation of information

30 October 2004 | Roberto Pirrone (Dip. di Ingegneria Informatica, Università di Palermo; Istituto per il Calcolo e le Reti ad alte prestazioni, CNR Palermo), Massimo Cossentino (Istituto per il Calcolo e le Reti ad alte prestazioni, CNR Palermo), Giovanni Pilato (Istituto per il Calcolo e le Reti ad alte prestazioni, CNR Palermo), Riccardo Rizzo (Istituto per il Calcolo e le Reti ad alte prestazioni, CNR Palermo)

1 Introduction

Recent years have seen a growing interest in e-learning technologies, above all in the field of web learning. This work presents a first attempt to implement a web tutor, called TutorJ, that helps students of a course on the Java Programming Language fill their skill gaps when facing a programming task.
The key idea behind our implementation is to follow the constructivist approach in managing all the material presented to the student, in order to make the system as flexible as possible. Moreover, a good web-learning system should leave the user free to select her/his learning style, and this can be accomplished only by dynamically rearranging the structure of each learning object inside the pages served by the system.
These considerations outline some crucial points to keep in mind when implementing a web-learning system:

- The system has to generate personalized learning paths
- The system has to provide a weak structure while creating relations between all the learning objects related to a particular topic (constructivism will do the rest for us)
- The system has to interact strongly with the user for two reasons:
a) understanding the user's intentions (what he/she really wants to know)
b) understanding the user's way of learning, in order to provide a well-suited structure for each learning object.

In a recent work [1] the authors proposed an original model for the representation of course-related knowledge. At the lowest level, information is aggregated according to the SCORM (http://www.adlnet.org/) standard.
The intermediate representation is a concept map, that is, a Self-Organizing Map (SOM) [2] used to cluster documents (or learning objects) according to a measure of similarity between the terms they contain. The concept map is trained in an unsupervised way, and it is labelled with some landmark concepts that are used to bridge the gap with the symbolic level.
Finally, a linguistic representation of the domain ontology is provided, where the landmark concepts play the role of atomic assertions inside the knowledge base. This architecture is able to generate a suitable learning path for the student, combining her/his formative needs and learning preferences in a single representation at the symbolic level.
Effective interaction with the student is needed to isolate the key assertions which, in turn, are used to define the bodies of knowledge related both to the user's way of learning and to filling the skill gap.
The main goal of this work is to demonstrate the feasibility of generating personalized learning paths on the basis of a strong interaction with the user. We focused our attention on the development of a tutoring system rather than a full-fledged e-learning one, because we claim that long-term personalized path generation for an entire course cannot be successful, owing to the great variety of relations among all the topics involved.
TutorJ is a prototype system, and it focuses only on the crucial implementation aspects of the proposed paradigm: adaptive path generation and interaction with the user. The first is addressed by arranging information only at the conceptual and symbolic levels, while chatter bot technology provides natural-language interaction between the system and the user.
The rest of the paper is arranged as follows. Section 2 provides a complete description of the TutorJ architecture. Section 3 deals with the reference scenario and describes some experimental tests. Finally, section 4 discusses possible developments of the system.

2 The TutorJ Architecture
The whole TutorJ architecture is reported in fig. 1.

The user interacts with an A.L.I.C.E. (Artificial Linguistic Internet Computer Entity) chatter bot, using natural language to talk about the Java language. The chatter bot has a dialog repository written in AIML (Artificial Intelligence Markup Language), an XML language designed for creating stimulus-response chat robots. A.L.I.C.E. performs pattern matching on the dialog phrases to isolate the terms that correspond to concepts in the ontology owned by the knowledge management module.
The ontology is implemented in the Cyc knowledge base, and a planning module browses it by means of some predicates between concepts (atomic terms) and arguments (Cyc collections of concepts). These predicates model the existence of prerequisite topics for the one the user wants to know in more detail, and link it to all the related information.
The planning module also acts on the Search module, which is essentially the concept map. Here, the regions corresponding to the arguments selected by the planner are highlighted and presented to the user through a graphical interface. Clicking on a region of the map representation lets the user browse all the documents that have been clustered in the corresponding region of the concept map.

2.1 Chatter bots
In traditional e-learning systems, navigating through huge amounts of data to get the desired information can require a time-consuming series of keyboard entries and mouse clicks. Our goal is to develop a more efficient and flexible e-learning instrument that uses natural language. To make this possible, a conversational interface, also known as a chat-robot, is needed. A conversational interface enables humans to communicate with computers in order to create, access, and manage information and to solve problems.
The A.L.I.C.E. chatter bot implements a dialog that alternates a question and an answer, i.e. a stimulus-response pair, called pattern and template respectively. The patterns are stored in a tree structure managed by an object called the Graphmaster, which implements a pattern storage and matching algorithm. We have trained a chat-robot to manage information about various topics concerning the Java Programming Language. The main concepts of the ontology appear as possible stimuli in the AIML code. These terms are then extracted from the log of the interaction between the user and the chat-robot, and are fed into the inferential engine of the ontological level of the architecture. An example of dialog and a screenshot of the interface are shown in fig. 2.
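
As an illustration of the pattern/template idea, the following is a minimal sketch of a Graphmaster-like word tree, written in Python. It is not the A.L.I.C.E. implementation: the class layout, the `<template>` marker key and the sample patterns are assumptions made for this example, and only the `*` wildcard (matching one or more words) is supported.

```python
import re


class Graphmaster:
    """Toy pattern store: a tree of words in which '*' matches one or more words."""

    TEMPLATE = "<template>"  # hypothetical marker key; cannot clash with upper-cased words

    def __init__(self):
        self.root = {}

    @staticmethod
    def _tokens(text):
        # Upper-case the text and drop punctuation, keeping the '*' wildcard.
        return re.findall(r"[A-Z0-9*]+", text.upper())

    def add(self, pattern, template):
        """Store a stimulus (pattern) together with its response (template)."""
        node = self.root
        for word in self._tokens(pattern):
            node = node.setdefault(word, {})
        node[self.TEMPLATE] = template

    def match(self, sentence):
        """Return the template of the first matching pattern, or None."""
        return self._match(self.root, self._tokens(sentence))

    def _match(self, node, words):
        if not words:
            return node.get(self.TEMPLATE)
        head, rest = words[0], words[1:]
        # An exact word is tried before the wildcard.
        if head in node:
            found = self._match(node[head], rest)
            if found is not None:
                return found
        if "*" in node:
            # Let the wildcard absorb 1..n words, shortest span first.
            for i in range(1, len(words) + 1):
                found = self._match(node["*"], words[i:])
                if found is not None:
                    return found
        return None
```

With the patterns `WHAT IS A THREAD` and `WHAT IS A *` stored, the question "What is a thread?" reaches the specific template, while "What is a layout manager?" falls back to the wildcard one.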

2.2 Concept Maps
The concept map implemented in the Search module is based on a two-dimensional SOM network. With a SOM it is possible to obtain a map of the input space whose units represent document collections, so a vector representation of the documents is needed to obtain the training vectors. The classical Term Frequency-Inverse Document Frequency (TF-IDF) scheme has been used to obtain the Vector Space Representation (VSR), a document encoding based on statistical considerations. In the VSR, each document in a collection is represented by a vector in which each component corresponds to a different word. The component value depends on the frequency of occurrence of the word in the document, weighted by the inverse of its frequency of occurrence in the whole set of documents.
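
The encoding can be made concrete with a small Python sketch. It assumes one common TF-IDF variant (relative term frequency multiplied by the natural logarithm of the inverse document frequency) and whitespace tokenization; the exact weighting used in TutorJ is not stated here, so the formula and the function name `tfidf_vectors` are assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return (vocabulary, vectors): one TF-IDF vector per document."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({word for doc in tokenized for word in doc})
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word occurs.
    doc_freq = Counter(word for doc in tokenized for word in set(doc))

    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid dividing by zero on an empty document
        vectors.append([
            (counts[word] / length) * math.log(n_docs / doc_freq[word])
            for word in vocabulary
        ])
    return vocabulary, vectors
```

Under this weighting, a word that occurs in every document gets weight zero, so only the terms that discriminate between documents contribute to the training vectors.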
The landmarks are not lessons or slides but key concepts that were chosen ad hoc to represent significant landmarks over the concept space. Moreover, these landmarks are not supposed to be directly useful to the student: a single lesson can cluster many concepts, probably because of lesson scheduling constraints. Landmarks allow the ontology to deal with concepts, and to organize the sequencing of concepts needed to obtain the learning path. A possible navigation through the concept map is illustrated in fig. 3.

2.3 Knowledge management and planning
The TutorJ knowledge management module is based on the Cyc knowledge base [3]. In the proposed approach each concept is mapped onto a Cyc collection that expresses the topic of the slides dealing with it. The basic concepts (i.e. the landmarks of the SOM) are organized as mono-thematic collections. Hence the basic element of the e-learning ontology is the concept, which is a collection of slides mapped onto the SOM concept map.
Concepts that are strictly related because they belong to a given topic are grouped into Cyc collections, named arguments, which can themselves be organized into higher-level collections of arguments. All the concepts belonging to a given argument are, of course, conceptually related, but a concept Concept_k belonging to an argument Arg_i can also be conceptually related to a concept Concept_j belonging to another argument Arg_l, or to a whole argument Arg_m.
The Cyc inferential engine has been adopted as the planning module. It browses the KB by means of some ad hoc predicates. In the proposed solution a concept can be related to another concept or to an argument, and an argument can be related to another argument or to a specific concept belonging to a given argument, using the following binary predicates:
1. isPrerequisiteFor
2. conceptuallyRelated

The first one was not present in Cyc, and it is used to generate sequencing between arguments and/or concepts. The second one is already present in the knowledge base, and expresses a relation between the meanings of two concepts.
The ontology inferential engine draws a path between concepts and/or arguments, starting from the terms derived from the dialog between the user and the chat-robot. The sequence of concepts thus obtained is used by the map renderer to highlight the document clusters that match the user's request.
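
The sequencing role of isPrerequisiteFor can be illustrated with a short Python sketch. It is a stand-in for the Cyc inferential engine, not a use of the Cyc API: facts are plain `(before, after)` pairs, the function name `learning_path` is hypothetical, and the path is obtained by a depth-first visit of the transitive prerequisites of the requested concept.

```python
def learning_path(target, prerequisite_facts):
    """Order the target's transitive prerequisites so each precedes its dependents.

    prerequisite_facts is an iterable of (before, after) pairs, read as
    "before isPrerequisiteFor after".
    """
    requires = {}
    for before, after in prerequisite_facts:
        requires.setdefault(after, []).append(before)

    path, visiting, done = [], set(), set()

    def visit(concept):
        if concept in done:
            return
        if concept in visiting:
            raise ValueError(f"cyclic prerequisites at {concept!r}")
        visiting.add(concept)
        # Sorting only makes the output order deterministic.
        for needed in sorted(requires.get(concept, [])):
            visit(needed)
        visiting.discard(concept)
        done.add(concept)
        path.append(concept)

    visit(target)
    return path
```

Given hypothetical facts stating that Classes precedes Interfaces and Swing, and that both precede EventHandling, a request for EventHandling yields Classes, Interfaces, Swing, EventHandling; concepts unrelated to the request are left out of the path.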

3 Experimental Scenario
The reference scenario is a tutoring system for the Java Programming Language. When facing a programming task, a student spends much of her/his time browsing the Java class library, looking for the class best suited to the task.
Another typical problem arises when the student is learning about one topic (e.g. 2D graphics) but any meaningful code example has to include unknown topics that will be covered in subsequent lessons (e.g. GUI event handling for the window used to display drawings). Even in this case the learner has to find out quickly how the classes involved work: at least their default behaviour is needed.
TutorJ has a backbone course about Java, arranged in the form of slides, which is enriched by all the javadoc HTML pages of the class library. Each javadoc page is clustered with a slide lesson on the basis of the landmarks defined in the concept map. A session starts with a chat with the A.L.I.C.E. bot. The bot tries to guide the user in explaining her/his needs and her/his actual level of comprehension of the topic.
Logs from the dialog are used to extract concepts for the ontology, and the inferential engine uses them to produce a possible learning path. The path is not imposed on the user: it is presented as a set of highlighted regions in the pictorial representation of the concept map, and the order in which the arguments should be explored is also suggested graphically. By clicking on a point inside a region (a cluster in the map), the user can access the lesson or a related javadoc file.
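
The first step of a session, going from the dialog log to ontology concepts, might look like the following Python sketch. The log format (`speaker: text` lines), the concept names and the trigger terms are all assumptions made for this example; in TutorJ the terms come from the AIML patterns rather than from a hand-written table.

```python
import re


def extract_concepts(log_lines, concept_terms):
    """Return the known concepts mentioned in the user's turns.

    Concepts are listed in the order of the log lines where they first occur.
    concept_terms maps a concept name to the terms that signal it.
    """
    found = []
    for line in log_lines:
        speaker, _, text = line.partition(":")
        if speaker.strip().lower() != "user":
            continue  # only the student's own words are considered
        lowered = text.lower()
        for concept, terms in concept_terms.items():
            if concept in found:
                continue
            # Whole-word match, so "interface" does not fire on "interfaces".
            if any(re.search(r"\b" + re.escape(term.lower()) + r"\b", lowered)
                   for term in terms):
                found.append(concept)
    return found
```

The resulting list is the kind of input a planner needs: each extracted concept can be used as the starting point for a prerequisite path, and the corresponding map regions can then be highlighted.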

4 Conclusions and Future Work
A simple implementation of a web-based tutoring system has been presented that addresses the main points to keep in mind when developing such a system: personalized path generation and strong interaction with the user. The Java scenario is merely an implementation choice; the approach we presented is general.
We are convinced that the key to accomplishing these tasks lies in natural language processing combined with AI techniques that reason about the outcomes of the dialog. Accordingly, a simple NLP tool (the chatter bot) has been adopted and coupled with a flexible scheme that represents course information at various levels of abstraction and allows planning of the learning path.
Future work will refine the system in several ways. The information representation scheme will be completed by also implementing a SCORM-compliant content level: in this way the concept map will be trained directly on the SCORM tags, thus allowing a finer correspondence between contents and landmark concepts. Multi-resolution approaches can also be devised, where clusters are built for the general categories and for the sub-categories obtained from SCORM annotations.
Another interesting direction of investigation is extending the role of the chatter bot: in this scenario the bot does not perform simple pattern matching on phrases, but is able to instantiate terms directly in the KB and to enable reasoning that selects the most effective response for continuing the dialog.

1. Pirrone, R., Cossentino, M., Pilato, G., Rizzo, R.: Concept maps and course ontology: a multi-level approach to e-learning. In: Proc. of II AI*IA Workshop on AI and E-learning, Pisa, Italy (2003)
2. Kohonen, T.: Self-Organizing Maps. Springer, Berlin, Heidelberg (1995)
3. Reed, S.L., Lenat, D.B.: Mapping Ontologies into Cyc, http://citeseer.nj.nec.com/509238.html
