User Authentication System Based on Keystroke Timing in E-Learning Settings

Biometrics technology has lately been applied to user authentication systems to achieve high precision. The problem is that such systems require special hardware devices, which results in low installation rates. In our research, the interkeystroke timing of keyboards is adopted as one of the biometric measures. The objective of this paper is to verify the validity of identification by measuring the interkeystroke timings of individual computer users. For better reliability, we use the interkeystroke timings of similar letter combinations for authentication and further incorporate bi-directional authentication. The evaluation and feasibility of this method are discussed using the results of the experiments. The results suggest that this system is capable of authenticating not only during login but also in the course of general typing (for instance, during e-Learning).


Introduction
In recent years, e-Learning systems have been adopted in many locations around the world. Consequently, there is a high demand for high-precision user authentication systems that can withstand malicious key-logger software and personal data leakage. A simple pair of login account and password is becoming insufficient. This has led to the emergence of systems that adopt biometrics technology. However, such systems require special hardware devices, and they prevent users from focusing on learning activities if the authentication intervals are made short to block illegitimate users. To solve these problems, research on authentication by keyboard keystrokes has been undertaken. Authentication by keystrokes takes advantage of the typing habits of individual users. No further hardware is needed, and there is no burden on the user because all of the data are obtained while users type during e-Learning sessions.
The strong point of e-Learning is that one can "learn anytime and anywhere," with the flexibility to learn remotely. An additional hardware device is therefore not a good option, because installing such devices on all PCs is not realistic. One of the weak points, in contrast with on-site learning, is the possibility of spoofing, which makes authentication systems necessary. Another perspective is the recent diversification of questions and quizzes in e-Learning systems. They used to be simple multiple-choice or fill-in problems, but new types of problems based on keyboard typing, such as free typing, have emerged in recent years. This might in turn enable constant authentication with keyboard dynamics, not just at the time of login. One good example is Coursera [1], one of the leading MOOC (Massive Open Online Course) providers, which has started to analyze users' keystrokes for authentication. Coursera allows users to log in to sessions after checking that their typing timings are similar enough to those registered for identification (see Fig. 1). It issues a "signature track" certificate, which guarantees that the course was taken under authentication with no spoofing, for users who use such certificates for job-hunting and promotion purposes. Hence, our goal is to build an authentication system that takes advantage of keyboard dynamics not only at login time but also in the course of general typing.
In this article, the following terminology is used for measuring the precision of an authentication system: FRR (False Rejection Rate, the percentage of cases in which the system fails to authenticate authentic users), FAR (False Acceptance Rate, the percentage of cases in which the system authenticates illegitimate users), and EER (Equal Error Rate, computed in this paper as EER = FRR + FAR).
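As a small illustration of the three measures defined above, the sketch below computes FRR, FAR, and EER from attempt counts, following this paper's convention EER = FRR + FAR. The function name `error_rates` and the counts in the usage line are hypothetical and are not taken from the experiments.

```python
def error_rates(genuine_accepted, genuine_total, impostor_accepted, impostor_total):
    """Return (FRR, FAR, EER) in percent.

    FRR: share of attempts by authentic users that were rejected.
    FAR: share of attempts by illegitimate users that were accepted.
    EER: FRR + FAR, following the definition used in this paper.
    """
    frr = 100.0 * (genuine_total - genuine_accepted) / genuine_total
    far = 100.0 * impostor_accepted / impostor_total
    return frr, far, frr + far


# Hypothetical counts: 95 of 100 genuine attempts accepted,
# 2 of 100 impostor attempts accepted.
frr, far, eer = error_rates(95, 100, 2, 100)
print(frr, far, eer)
```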
In previous literature, Kasukawa et al. [2] succeeded in improving the precision of the classic Joyce-Gupta Method [3] by setting a threshold for the relative errors of interkeystroke timings between registration and inspection data. Sugiyama et al. [4] proposed a method that achieves high precision even when typing habits change, by updating the registration data in the case of long-term typing. Monrose et al. [5] took advantage of both latency and duration ("interkeystroke pressure" data), aiming for a composite authentication.
Gunetti and Picardi [6] address the development of an authentication algorithm for free typing, but their work differs in character from ours in that the intended targets are native speakers of English. Boechat et al. [7] investigated feature quantities derived from keystroke timing data and argued that the average of the timing data is an appropriate parameter and that approximately 90% of all the data obtained should be used.
In this research, our goal is to propose an authentication method for practical use in terms of precision and the number of keystrokes necessary to authenticate. The method proposed in the aforementioned research [4] is considered to be one of the more precise among similar methods. Therefore, we attempt to update this algorithm in the following three respects:
1. Modification of elimination criterion of outlier data
2. Bi-directional authentication
3. Groupings of similar keystroke pairs

Method
First, the details of our baseline algorithm in [4] are given. Registration data are data preregistered by authentic users, and the latency of each key-pair is collected key-pair wise. A key-pair is defined as an ordered pair of two keys typed by a user. For instance, when "japan" is typed, the key-pairs "j-a", "a-p", "p-a", and "a-n" are obtained, and for each key-pair the difference between the time of the Keyup event of the former letter and the time of the Keydown event of the latter letter is recorded. Inspection data are stored likewise in the format of key-pairs, and the task is to decide which user they belong to.

JJAP Conf. Proc. (2016) 011605

The procedure is as follows. Firstly, the algorithm compares key-pairs that appear more than 15 times in both data sets. Prior to the authentication, the longest and shortest data of the registration data are eliminated as outliers. Then, pairwise authentication is executed, using the following inequality:

Avg(registration data) - 2 * SD < Avg(inspection data) < Avg(registration data) + 2 * SD,

where Avg is the function computing the average of the data in its argument, and SD represents the standard deviation of the registration data. If, out of 100 such pairwise authentications, more than 83 are successful, we stipulate that one (regular) authentication has succeeded.

Each of the following three updates is explained below:
1. Modification of elimination criterion of outlier data
2. Bi-directional authentication
3. Groupings of similar keystroke pairs
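The baseline steps described above can be sketched roughly as follows. This is a minimal illustration rather than the implementation of [4]: the event format `(key, keydown_ms, keyup_ms)`, the function names, and the use of the sample standard deviation are assumptions made here, and the timestamps are invented.

```python
from collections import defaultdict
from statistics import mean, stdev


def key_pair_latencies(events):
    """Collect latencies per ordered key-pair.

    events: list of (key, keydown_ms, keyup_ms) in typing order (assumed format).
    Latency = Keydown time of the latter key - Keyup time of the former key.
    """
    latencies = defaultdict(list)
    for (k1, _, up1), (k2, down2, _) in zip(events, events[1:]):
        latencies[(k1, k2)].append(down2 - up1)
    return dict(latencies)


def pair_passes(registration, inspection):
    """Avg(reg) - 2*SD < Avg(insp) < Avg(reg) + 2*SD, with SD of the registration data."""
    avg_reg = mean(registration)
    sd = stdev(registration)  # sample SD; population SD would be an equally valid assumption
    return avg_reg - 2 * sd < mean(inspection) < avg_reg + 2 * sd


# Typing "japan" with invented timestamps (milliseconds).
events = [("j", 0, 80), ("a", 120, 190), ("p", 250, 330), ("a", 400, 470), ("n", 520, 600)]
print(key_pair_latencies(events))
print(pair_passes([100, 110, 90, 105, 95], [102, 98]))
```

A full authentication in the baseline would repeat `pair_passes` over 100 key-pair comparisons and accept when more than 83 of them succeed.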

Modification of elimination criterion of outlier data
The first update is a new criterion for eliminating data as outliers. Sugiyama et al. [4], as mentioned, removed the longest and shortest values from the registration data before authentication. This procedure, however, might remove necessary data as well, because the selected values do not necessarily differ greatly from the other data and are therefore not always appropriate as outliers. Our algorithm sets an elimination criterion that removes only statistically deviant values: data outside the range of the key-pair average plus or minus 2 times the SD are regarded as outliers.
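The difference between the two elimination criteria can be illustrated with a short sketch. The function names, the sample SD, the inclusive bounds, and the latency values are assumptions made for illustration only.

```python
from statistics import mean, stdev


def drop_min_max(latencies):
    """Baseline criterion: always remove one longest and one shortest value."""
    trimmed = sorted(latencies)
    return trimmed[1:-1]


def drop_outliers(latencies, k=2.0):
    """Proposed criterion: keep only values within average +/- k * SD."""
    if len(latencies) < 2:
        return list(latencies)
    avg, sd = mean(latencies), stdev(latencies)
    return [x for x in latencies if avg - k * sd <= x <= avg + k * sd]


# Invented latencies (ms) for one key-pair, with a single long pause.
sample = [100, 102, 98, 101, 99, 103, 97, 100, 250]
print(drop_min_max(sample))   # also discards 97, which is not unusual
print(drop_outliers(sample))  # discards only 250
```

In this example the baseline criterion discards the ordinary value 97 together with the pause of 250 ms, whereas the statistical criterion discards only the pause.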

Bi-directional authentication
The second update is bi-directional authentication. Treating registration and inspection data as having equal status, we added one more inequality, obtained by exchanging the registration and inspection data, for stricter authentication.
In our algorithm, authentication is executed bi-directionally (namely, the registration data are also authenticated against the inspection data) under the assumption that both directions should succeed if the data belong to the same user. That is, the following inequality is added as a condition for authentication: Avg(inspection data) - 2 * SD < Avg(registration data) < Avg(inspection data) + 2 * SD. This mechanism filters out those users whose registration and inspection data are not similarly distributed.
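A minimal sketch of the bi-directional check follows. It assumes that the SD in the added inequality is that of the inspection data (the text does not state this explicitly), uses the sample SD, and runs on invented latency values; the function names are hypothetical.

```python
from statistics import mean, stdev


def within_two_sd(reference, candidate):
    """Avg(ref) - 2*SD(ref) < Avg(cand) < Avg(ref) + 2*SD(ref)."""
    avg_ref, sd_ref = mean(reference), stdev(reference)
    return avg_ref - 2 * sd_ref < mean(candidate) < avg_ref + 2 * sd_ref


def bidirectional_pass(registration, inspection):
    """Require the inequality to hold in both directions."""
    return within_two_sd(registration, inspection) and within_two_sd(inspection, registration)


# Invented data: widely spread registration data, tightly clustered inspection data.
registration = [60, 140, 80, 120, 100]
inspection = [150, 152, 148, 151, 149]
print(within_two_sd(registration, inspection))       # one-directional check passes
print(bidirectional_pass(registration, inspection))  # bi-directional check fails
```

The example shows the filtering effect: a broad registration distribution accepts the inspection average, but the narrow inspection distribution does not accept the registration average, so the pair is rejected.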

Groupings of similar keystroke pairs
The last update is the incorporation of authentication by similar key-pairs. Our intention in grouping "similar key-pairs" is to create more frequent opportunities for authentication. Key-pairs that share the same Keyup key or the same Keydown key are regarded as belonging to the same group. For instance, the similar key-pairs of A-B are A-A, A-C, A-D, …, A-Z and B-B, C-B, …, Z-B, because either the former or the latter character is shared with that of A-B. Since the key-pairs are grouped, the frequency of authentication is expected to increase dramatically. At the same time, there is a concern that the precision of authentication will deteriorate because the grouped pairs are not exactly the same keystroke pairs. In addition, computing the relative error rather than the absolute error is expected to help mitigate temporary changes in the typing speed of each user.
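The grouping rule above can be sketched as follows. Representing key-pairs as tuples and pooling the latencies of all similar pairs into one list are illustrative assumptions, not details confirmed by the text.

```python
import string


def is_similar(pair, other):
    """True if `other` shares the former or the latter key with `pair`
    and is not `pair` itself."""
    return other != pair and (other[0] == pair[0] or other[1] == pair[1])


def pooled_similar_latencies(latencies, pair):
    """Pool the latencies of every key-pair similar to `pair`.

    latencies: {(former, latter): [latency_ms, ...]}
    """
    pooled = []
    for other, values in latencies.items():
        if is_similar(pair, other):
            pooled.extend(values)
    return pooled


# Over the 26 lowercase letters, A-B has 25 + 25 = 50 similar key-pairs.
all_pairs = [(x, y) for x in string.ascii_lowercase for y in string.ascii_lowercase]
print(sum(is_similar(("a", "b"), p) for p in all_pairs))

# Invented latencies: only a-c and z-b are similar to a-b.
data = {("a", "b"): [50], ("a", "c"): [55, 60], ("z", "b"): [70], ("c", "d"): [40]}
print(pooled_similar_latencies(data, ("a", "b")))
```

With 50 similar pairs available for each key-pair, far more typed pairs reach the minimum count needed for a comparison, which is the source of the expected increase in authentication frequency.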

Proposed method
With the three changes combined, our proposed method is described below, using A-B as an example key-pair and A-C as its similar key-pair. First, the system compares the latency data of a key-pair and its similar key-pair when both appear more than 15 times. For those data, the system checks whether the following two inequalities hold bi-directionally with the use of similar key-pairs: [inequality missing in the source] and [inequality missing in the source]. Again, authentication is counted as successful if the number of successful pairwise authentications exceeds a certain threshold alpha, which will be determined later.

Experiment 1: setting tentative alpha
Nine university students majoring in science joined the experiment, using their own PCs. The students typed sentences (approximately 300 keystrokes each, consisting only of lowercase letters) ten times in two weeks. The interkeystroke timing data amounted to approximately 27,000 in total. Using the data obtained in the experiment, the precision of the previously proposed methods was investigated. The results are given in Table 1 for FRR, FAR, and EER by the methods of [2] and [4] and by the proposed method. The best alpha for our proposed method was 75 (out of 100). The average number of keystrokes until each authentication dropped from 204.22 (by [4]) to 37.80 with our method (average among the subjects). Likewise, the initial authentication required 1,250.36 keystrokes by [4] and 855.81 by the proposed method (again, average among the subjects). These figures show that the number of keystrokes until the first authentication was reduced to around 2/3, and the average number of keystrokes for each authentication became less than 1/5, after keystrokes reached 1,500.
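One way the tentative alpha could be selected is a simple sweep over candidate thresholds, picking the value that minimizes EER = FRR + FAR. The sketch below is an assumption about the procedure, not the authors' code: each score is the number of successful pairwise authentications out of 100 for one attempt, an attempt is accepted when its score exceeds alpha, and all scores are invented.

```python
def best_alpha(genuine_scores, impostor_scores, candidates=range(50, 101)):
    """Return (alpha, EER) minimizing EER = FRR + FAR over the candidates.

    A score is the number of successful pairwise authentications out of 100.
    An attempt is accepted when its score exceeds alpha.
    Ties are resolved in favor of the smallest alpha.
    """
    best = None
    for alpha in candidates:
        frr = 100.0 * sum(s <= alpha for s in genuine_scores) / len(genuine_scores)
        far = 100.0 * sum(s > alpha for s in impostor_scores) / len(impostor_scores)
        eer = frr + far
        if best is None or eer < best[1]:
            best = (alpha, eer)
    return best


# Invented scores for attempts by authentic users and by other users.
genuine = [90, 85, 80, 78, 95]
impostor = [60, 70, 74, 40, 55]
print(best_alpha(genuine, impostor))
```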

Experiment 2: verification of robustness in different settings
The second experiment was conducted to see whether the alpha (= 75) tentatively determined in the former experiment also works suitably for different types of subjects. Fifteen university students majoring in liberal arts joined the experiment, which was conducted during class in the PC room of the university. The students typed fixed sentences of approximately 1,500 keystrokes twice. The interkeystroke timing data amounted to approximately 45,000 in total. Using the data obtained in the experiment, the precision of the previously proposed methods was investigated again. Also, from the data obtained in Experiment 2, the best alpha was chosen (= 71 out of 100). Table 2 shows the results for FRR, FAR, and EER, again by the methods of [2] and [4] and by the proposed method.
Table 2. FRR, FAR, and EER by [2], [4], the proposed method with alpha = 75, and the proposed method with the best alpha = 71.
Furthermore, it was observed that, between alpha = 65 and 75, the EERs for our proposed method were less than 3%, showing the stability and validity of the algorithm (see Fig. 2).

Personalization
Lastly, recalling that the EER of the proposed method in Experiment 1 was 19.47%, another calibration was attempted. To allow personalization, we set alpha individually for each subject based on the first half of the keystrokes. The resulting alpha ranged from 71 to 84. We also tried several discrete values for the timing of authentication and chose the best one, which was 350. The EER then dropped to 5.18%. See Table 3 for the details.

Conclusion and future plan
In this study, the authors proposed an authentication algorithm that takes advantage of interkeystroke timings, in the belief that the timings reflect the typing habits of each user. As a result, our method worked as expected in two respects: improving precision and decreasing the number of keys typed until authentication is attempted. In our future work, we would like to consider the interkeystroke pressure, the use of all keyboard characters (currently we use only case-insensitive alphabetic characters), and the separation of the interkeystroke timings by word. Our next goal is to create a better authentication system by creating a composite of these algorithms.

Figure 1. Verification process of learners' identity taking advantage of their keystroke timing patterns (Coursera).

Table 3. Results after personalization is attempted.