Appendices
Appendix A. Format in which the most important database tables are written to disk during backup.
Appendix A.1. Backup record format for the table with the final test results.
anonymous_tests
a_testID  s_testID  test_job
2774
3130  323837  3c3f786d6c207665 <…> 2f746573743e  |
3131  323838  3c3f786d6c20766572 <…> 73743e  |
<…>
Appendix A.2. Backup record format for the table describing the tests included in the system.
questionnairies
questionnaireID  s_testID  server  questionnaire_name  ip  Domain  questionnaire_job  Datetime
117
3133  30  656e7472616e74732e616d2e7470752e7275  656e7472616e74735f7175657374696f6e6e61697265  3130392e3132332e3134352e3335  67727970686f6e352e616d2e7470752e7275  3c3f78 <…> 6d6c20  323031332d30352d33312032303a35393a3231  |
3134  30  6d756c7465732e616d2e7470752e7275  656e7472616e74735f7175657374696f6e6e61697265  3130392e3132332e3134352e3335  67727970686f6e352e616d2e7470752e7275  3c3f <…> 786  323031332d30362d31372031343a35353a3231  |
<…>
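The field values in these listings appear to be hexadecimal encodings of text. The sketch below decodes the complete fields of the first questionnairies record shown above. The helper name `decode_field`, the UTF-8 assumption, and the positional pairing of values with field names are assumptions made for illustration, not part of the backup tool; the truncated `questionnaire_job` value is left out rather than guessed.

```python
def decode_field(hex_value):
    """Decode one hex-encoded field into text (UTF-8 is an assumption)."""
    return bytes.fromhex(hex_value).decode("utf-8")


# Complete fields of the first questionnairies record, paired with the
# field names by position (assumed). The elided questionnaire_job value
# is deliberately omitted.
record = {
    "questionnaireID": "3133",
    "s_testID": "30",
    "server": "656e7472616e74732e616d2e7470752e7275",
    "questionnaire_name": "656e7472616e74735f7175657374696f6e6e61697265",
    "ip": "3130392e3132332e3134352e3335",
    "Domain": "67727970686f6e352e616d2e7470752e7275",
    "Datetime": "323031332d30352d33312032303a35393a3231",
}

decoded = {name: decode_field(value) for name, value in record.items()}

for name, value in decoded.items():
    print(f"{name}: {value}")
```

Under these assumptions the `server` field decodes to `entrants.am.tpu.ru` and the `Datetime` field to `2013-05-31 20:59:21`.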
Appendix A.3. Backup record format for the table with test access statistics.
stat_tests
testID  server  test_name  ip  Domain  is_saved  start_time  end_time
3284
38  6d742e616d2e7470752e7275  657973656e636b  3130392e3132332e3134352e3335  67727970686f6e352e616d2e7470752e7275  30  323031332d30342d32332031383a33353a3034  |
39  6d756c7465732e616d2e7470752e7275  686f6c6c616e64  3130392e3132332e3134352e3335  67727970686f6e352e616d2e7470752e7275  30  323031332d30342d32332031383a33353a3237  |
<…>
Appendix B. Fragment of the documentation for the cluster analysis library in Orange (Python)
Basic functionality
-------------------

.. autofunction:: clustering

.. class:: HierarchicalClustering

    .. attribute:: linkage

        Specifies the linkage method, which can be one of :obj:`SINGLE`,
        :obj:`AVERAGE`, :obj:`COMPLETE` or :obj:`WARD`. Default is
        :obj:`SINGLE`.

    .. attribute:: overwrite_matrix

        If True (default is False), the algorithm will save memory
        by working on the original distance matrix, destroying it in
        the process.

    .. attribute:: progress_callback

        A callback function (None by default), which will be called 101 times.
        The function only gets called if the number of objects is at least 1000.

    .. method:: __call__(matrix)

        Return the root of the hierarchy (an instance of
        :class:`HierarchicalCluster`).

        The distance matrix must not contain negative elements, as
        this helps the algorithm to run faster. The elements on the
        diagonal are ignored. The method works in approximately O(n^2)
        time (with the worst case O(n^3)).

        :param matrix: A distance matrix to perform the clustering on.
        :type matrix: :class:`Orange.misc.SymMatrix`

.. rubric:: Linkage methods

.. data:: SINGLE

    Distance between groups is defined as the distance between the closest
    pair of objects, one from each group.

.. data:: AVERAGE

    Distance between two clusters is defined as the average of distances
    between all pairs of objects, where each pair is made up of one
    object from each group.

.. data:: COMPLETE

    Distance between groups is defined as the distance between the most
    distant pair of objects, one from each group. Complete linkage is
    also called farthest neighbor.

.. data:: WARD

    Ward's distance.
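The single, average and complete linkage definitions above can be illustrated with a short, library-independent sketch. The function name `linkage_distance` and its parameters are hypothetical and are not part of the Orange API; Ward's distance is omitted because the text above does not define it.

```python
from itertools import product


def linkage_distance(group_a, group_b, dist, method="single"):
    """Distance between two groups of objects under a given linkage.

    `dist` is any function returning the distance between two objects.
    """
    # Distances for every pair made of one object from each group.
    pair_distances = [dist(a, b) for a, b in product(group_a, group_b)]
    if method == "single":      # closest pair
        return min(pair_distances)
    if method == "complete":    # most distant pair
        return max(pair_distances)
    if method == "average":     # mean over all pairs
        return sum(pair_distances) / len(pair_distances)
    raise ValueError(f"unknown linkage method: {method}")


def abs_diff(a, b):
    """Distance between two points on a line."""
    return abs(a - b)


group_a = [0.0, 1.0]
group_b = [3.0, 5.0]
for method in ("single", "average", "complete"):
    print(method, linkage_distance(group_a, group_b, abs_diff, method))
```

For these two groups the pairwise distances are 3, 5, 2 and 4, so single linkage gives 2, complete linkage gives 5 and average linkage gives 3.5.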
Drawing
-------

.. autofunction:: dendrogram_draw(file, cluster, attr_cluster=None, labels=None, data=None, width=None, height=None, tree_height=None, heatmap_width=None, text_width=None, spacing=2, cluster_colors={}, color_palette=ColorPalette([(255, 0, 0), (0, 255, 0)]), maxv=None, minv=None, gamma=None, format=None)

.. rubric:: Example

The following script clusters a subset of 20 instances from the Iris data set.
The leaves are labelled with the class value.

.. literalinclude:: code/hierarchical-draw.py
    :lines: 1-8

The resulting dendrogram is shown below.

.. image:: files/hclust-dendrogram.png

The following code, which produces the dendrogram below, also colors the
three topmost branches and represents attribute values with a custom color
schema (spanning red - black - green, with custom gamma, minv and maxv).

.. literalinclude:: code/hierarchical-draw.py
    :lines: 10-16

.. image:: files/hclust-colored-dendrogram.png

Cluster analysis
----------------

.. autofunction:: cluster_to_list
.. autofunction:: top_clusters
.. autofunction:: top_cluster_membership
.. autofunction:: order_leaves
.. autofunction:: postorder
.. autofunction:: preorder
.. autofunction:: prune
.. autofunction:: pruned
.. autofunction:: clone
.. autofunction:: cluster_depths
.. autofunction:: cophenetic_distances
.. autofunction:: cophenetic_correlation
.. autofunction:: joining_cluster

HierarchicalCluster hierarchy
-----------------------------

Results of clustering are stored in a hierarchy of
:obj:`HierarchicalCluster` objects.

.. class:: HierarchicalCluster

    A node in the clustering tree, as returned by
    :obj:`HierarchicalClustering`.

    .. attribute:: branches

        A list of sub-clusters (:class:`HierarchicalCluster` instances). If this
        is a leaf node, this attribute is ``None``.

    .. attribute:: left

        The left sub-cluster (defined only if there are only two branches).
        Same as ``branches[0]``.

    .. attribute:: right

        The right sub-cluster (defined only if there are only two branches).
        Same as ``branches[1]``.

    .. attribute:: height

        Height of the cluster (distance between the sub-clusters).

    .. attribute:: mapping

        A list of indices to the original distance matrix. It is the same
        for all clusters in the hierarchy - it simply represents the indices
        ordered according to the clustering.

    .. attribute:: mapping.objects

        A sequence describing objects - an :obj:`Orange.data.Table`, a
        list of instances, a list of features (when clustering features),
        or even a string of the same length as the number of elements.
        If objects are given, the cluster's elements, as obtained by indexing
        or iteration, are not indices but the corresponding objects. If we
        put an :obj:`Orange.data.Table` into objects, ``root.left[-1]``
        is the last instance of the first left cluster.

    .. attribute:: first

    .. attribute:: last

        ``first`` and ``last`` are indices into the elements of ``mapping`` that
        belong to that cluster.

    .. method:: __len__()

        Asking for the length of the cluster gives the number of the objects
        belonging to it. This equals ``last - first``.

    .. method:: __getitem__(index)

        By indexing the cluster we address its elements; these are either
        indices or objects.
        For instance, ``cluster[2]`` gives the third element of the cluster, and
        ``list(cluster)`` will return the cluster elements as a list. The cluster
        elements are read-only.

    .. method:: swap()

        Swaps the ``left`` and the ``right`` subcluster; it will
        report an error when the cluster has more than two subclusters. This
        function changes the mapping and first and last of all clusters below
        this one and thus needs O(len(cluster)) time.

    .. method:: permute(permutation)

        Permutes the subclusters. Permutation gives the order in which the
        subclusters will be arranged. As for swap, this function changes the
        mapping and first and last of all clusters below this one.

    Subclusters are ordered so that ``cluster.left.last`` always equals
    ``cluster.right.first`` or, in general, ``cluster.branches[i].last``
    equals ``cluster.branches[i+1].first``.

    Swapping and permutation change the order of
    elements in ``branches``, permute the corresponding regions in
    :obj:`~HierarchicalCluster.mapping` and adjust the ``first`` and ``last``
    for all the clusters below.
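
The relationship between ``mapping``, ``first`` and ``last`` described above can be sketched with a toy class. This is only an illustration of the documented invariants: the class `Node` and the helper `_shift` are hypothetical and do not reproduce Orange's actual implementation.

```python
class Node:
    """Toy stand-in for a cluster node (not Orange's HierarchicalCluster)."""

    def __init__(self, mapping, first, last, branches=None):
        self.mapping = mapping    # one list shared by every node in the tree
        self.first = first        # start of this node's region in mapping
        self.last = last          # end of the region (exclusive)
        self.branches = branches  # None for a leaf

    def __len__(self):
        return self.last - self.first

    def __getitem__(self, index):
        # Index only within this node's own region of the shared mapping.
        return self.mapping[self.first:self.last][index]

    def swap(self):
        if self.branches is None or len(self.branches) != 2:
            raise ValueError("swap() needs exactly two branches")
        left, right = self.branches
        # Rewrite the shared region in place so the right block comes first.
        self.mapping[self.first:self.last] = (
            self.mapping[right.first:right.last]
            + self.mapping[left.first:left.last]
        )
        # Move both subtrees to their new positions.
        _shift(right, -len(left))
        _shift(left, len(right))
        self.branches = [right, left]


def _shift(node, offset):
    """Add offset to first/last of a node and of every node below it."""
    node.first += offset
    node.last += offset
    for child in node.branches or []:
        _shift(child, offset)


mapping = [3, 0, 2, 1, 4]
left = Node(mapping, 0, 2)
right = Node(mapping, 2, 5)
root = Node(mapping, 0, 5, [left, right])

print(list(root.branches[0]), list(root.branches[1]))
root.swap()
print(list(root), root.branches[0].last == root.branches[1].first)
```

Because every node indexes into the same ``mapping`` list, swapping has to touch each element and each node below the swapped cluster, which is consistent with the O(len(cluster)) cost stated above.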
Appendix C
English-language section
Student
Group: 8Б00
Full name: Романчуков Сергей Викторович
Signature:
Date:
INTRODUCTION
Psychological research and services, including those organized at higher education institutions, play an important role in ensuring the well-being of individuals and society. Universities are interested in a wide range of such studies, from career guidance tests and evaluation of prospective applicants' abilities to quality assurance of learning materials, assessment of teaching staff qualifications, and measurement of students' general level of knowledge and satisfaction with their studies. One means of conducting such tests is network resources, both specialized and multifunctional. In either case, in addition to collecting information, such resources should support the accumulation of data about the participants, the results and the process of taking the tests, as well as the processing of the available information in order to create or adjust strategies for the future conduct of the university or its individual units.
Tomsk Polytechnic University, like most other universities, has a unit that deals with vocational guidance for applicants and decision support in occupational choice. The Laboratory of Information Technologies in Social and Medical Research has designed and maintains the multifunctional test portal «MultiTest», which implements a number of test methods in this field, some of which are available directly from the web site of the TPU preliminary training center. During the operation of these applications, extensive material requiring further processing has been accumulated. The relevant databases store raw data on the testing process as well as the test results. The existing functionality makes it possible to adjust the test applications, to emulate testing in order to debug such modifications, and to extract information about any specific test from the database, but unfortunately it does not provide an easy-to-use set of tools for processing large amounts of accumulated results. There is an obvious need for a subsystem that lets analysts and psychologists work with the information without programming skills (a psychologist or a mathematician is not obliged to have such skills) and that, in addition, extends the possibilities for feedback and interaction between the psychologist and the team of programmers who created and support the resource.
The aim is to develop an analytical module for the multifunctional web portal «MultiTest», which currently hosts a subsystem for organizing online psychological testing of different user groups, including TPU students.
To achieve this aim, a number of problems must be solved:
- to determine the desired functionality of the module and the boundaries of authority for users with different access rights;
- to examine methods of multivariate statistical analysis;
- to choose data mining methods;
- to develop and implement a program and to include the analysis module in the «MultiTest» portal;
- to test the module and evaluate its capabilities for processing the existing data;
- to draw conclusions, on the basis of the test results, about the effectiveness of the subsystem implementation, the possibilities for further development, and ways of addressing the problems identified during the project.
PROBLEM STATEMENT
Any research or practical problem in psychology is first given a psychological interpretation, which allows the transition from theoretical views to operationally defined concepts and empirical procedures. This in turn allows a mathematical interpretation, in which mathematical methods are selected and applied. The obtained result should then be interpreted substantively, i.e., the significance levels, approximated dependencies, etc. receive both a mathematical and a psychological interpretation. At this stage the problem is solved, and one can either move on to another problem or refine and repeat the previous study. Such is the logic of applying mathematics in psychology and other sciences.
Early identification of the abilities and aptitudes of pupils and students in order to provide effective assistance in occupational choice seems not only relevant but also promising. The problem addressed in this study is the need to improve and deepen career guidance for potential students of the university by expanding the presence of psychological services on the Internet and providing them with additional features.
It is necessary to develop analytical methods for implementing a module for statistical processing and in-depth analysis of the collected information on the basis of the software system that already exists at TPU, thereby expanding the functionality of the «MultiTest» web portal, which is devoted to socio-psychological research, including career guidance tests for potential applicants.
The scope of the developed subsystem is the processing of indicators for participants in online testing, including anonymous ones, accumulated in the portal's database. Each participant took psychological tests designed to determine the personality traits and abilities relevant to decision support systems for choosing a career. The module is intended to empower the staff of the psychological service when analyzing test results and forecasting and, more importantly, to strengthen feedback to the developers of the testing software.
Experimental data characterization
As source data, the system holds data about test results, including:
- personal data of the applicant (if the test was not anonymous);
- information about the connection (IP address, region, browser used, etc.);
- all answers to test questions;
- test results (the proposed professions and the degree to which they suit the applicant);
- time spent on the test.
The database tables also contain information about the structure of the tests and some additional information. All of the listed data are stored in the MySQL DBMS, with XML markup used to record heterogeneous data.
CONCLUSION
In the course of this work, methods of multivariate data analysis and data mining, including cluster analysis, were studied, and the methods that can be used to solve the stated problem were identified. A model of the behavior of potential users was developed, and a set of requirements for the analysis module of the multifunctional web portal «MultiTest» was drawn up. Open source software products for data mining were considered, and software suitable for implementing the selected methods in this module was chosen. The final software implementation is still in progress and far from complete, but it has already produced some fairly significant results. For example, a failure in the database holding the experiment results was discovered during development, and some discrepancies were noted between the data in the reports produced by individual test applications and the internal standards common to all parts of the portal.