Why are There Differences between Student Performance on Term Tests and External Achievement Tests? (Course name: What does a test measure?)

Objective: Previous studies suggest that there are students who obtain high scores on term tests but not on external achievement tests (external tests, hereafter) and that differential performances across term and external tests are affected by students’ learning strategies, motivation for learning, and test belief (i.e., the beliefs a test taker has about a test). This study examined (a) to what extent students perform well only on term tests and (b) how differences in scores between term and external tests are related to motivation for learning and test belief. Methods and Materials: We asked 134 first-year medical students at Juntendo University to complete two questionnaires. We inquired about their motivation for learning and test belief and about whether they obtained higher scores on term tests or external tests, in general across all subjects, and specifically on each of five subjects: Japanese, English, math, science, and social studies. Results: The results suggest that 41% of students performed well only on term tests when they gave a general response without regard to any specific subject, and 37–59% did so across the five subjects. Further, we found practically no differences among the groups in motivation for learning or test belief. Conclusions: The results suggest the importance of investigating differences between term and external test scores and the factors behind them.


Introduction
When we were in high school, we observed that many people found studying for tests stressful. Some people were probably able to achieve high scores on both term tests (teiki tests) and external achievement tests (jitsuryoku tests, such as practice exams administered under conditions similar to a real exam administered by external test institutions; external tests, hereafter). However, others would say that they were good at term tests but not good at external tests, and still others would state the opposite. We examine whether this observation holds true across subjects and the factors behind it.

Literature review
Although term tests are important for high school students, they have not been given sufficient critical attention 1) . The same seems to apply to differences in test scores and performance between term and external tests. Matsunuma 2) stated that some students receive relatively high scores on term tests but low scores on external tests in the subject of English because English passages on term tests are usually the same as those in the textbook, and students are likely to obtain high scores even when they do not understand the essence of the English language. Saito 1) further stated that similar phenomena can be observed in other subjects such as math and Japanese. However, he stated that these observations are not based on empirical studies, emphasizing the need for such research.
If academic performance differs between term and external tests, the next question is why. The relevant literature suggests many possible test-related, affective, and cognitive factors. For example, Fukazawa 3) listed three factors: 1) differences in test content (e.g., speaking is included on English term tests, but not on English external tests); 2) areas of test coverage (e.g., term tests tend to cover narrow areas of the textbook such as select units, whereas external tests cover all areas or wide areas, based on what students have learned so far); and 3) students' affective states (e.g., more anxious during external tests). Another concept that seems related is adaptation to the test, in which learners change their studying and test-taking actions 4) and teachers change their teaching behaviors based on their perceptions and beliefs about test purposes or patterns 5–8). Differences between term and external test scores are also affected by students' (a) learning strategies, (b) motivation for learning, and (c) test belief (i.e., the beliefs a test taker has about a test) 1,9), all of which change students' learning and test-taking behaviors 2,10). The following review will cover the three student factors (a) to (c) because of their particular importance and relevance to term tests 1).
Learning strategies have been shown to affect differences between term and external test scores 2) . Matsunuma 2) examined why some students who achieve high scores on English term tests score low on external English tests and reported that learning strategies and learning time had different effects on the two types of tests. Term test scores were affected by the memorization strategy employed and the amount of time spent before term tests. External tests were affected by the time spent on daily learning. Both tests were affected by the strategy of focusing on grammatical points and structures. Matsunuma inferred that different strategy use was observed because term tests, in contrast to external tests, use the same materials for learning and testing and thereby induce superficial learning; high scores do not lead to increases in knowledge, skills or to higher test scores on external tests.
Fujisawa 5–8) classified learning strategies into two types: (a) strategies to achieve the learning goals of each subject, namely orthodox learning (seitoha-no-gakushu); and (b) strategies to achieve high scores temporarily by mechanically memorizing test contents and procedures (i.e., cramming), namely fake learning or fake study (gomakashi-benkyo). He argued that the two strategies are encouraged by teachers' behaviors. Orthodox learning tends to be prompted by teachers who support students' understanding and stimulate their intellectual curiosity by providing interesting, advanced programs and by presenting content in test questions that is not covered in class; further examples of teacher behavior observed in each subject include having students use dictionaries in English and Japanese, think of multiple solutions in math, conduct experiments in science, and use supplementary resources in social studies. On the other hand, fake study tends to be triggered by teachers who focus on a limited range of learning and test content, support study methods that spare students effort (e.g., by providing materials to memorize, instead of having students summarize), and recommend rote memorization. Fujisawa 5,6) examined how teachers at junior and senior high schools encouraged the two types of learning behaviors when students studied and prepared for term tests and argued that fake study may lead to high scores on term tests but not on external tests that require orthodox learning.
Orthodox learning and fake study seem to invoke different types of learning processing: deep processing and surface processing 2,10). Deep processing aims to understand content and involves considering the relations and connections among the content as well as knowledge that learners have already acquired 10). Surface processing aims to memorize content and involves simple and repeated actions 10) without understanding the meaning of the content 1). The depth of processing affects how students learn and their eventual academic performance. Thus, those who practice fake study with surface processing are likely to study to gain high scores on term tests, but their learning does not usually translate into an increase in their knowledge and skills, and they obtain low scores on external tests. In contrast, those who practice orthodox learning with deep processing can gain high scores on external tests (and probably entrance exams as well). In this case, they are likely to perform well on both types of tests 1); however, they sometimes perform poorly on term tests because of a lack of attention during school lessons or to teachers' instructions.
Saito suggested the importance of considering learning motivation and test belief as well as learning strategies to understand students' test performance, calling for empirical studies 1) . Motivation for learning can be classified in various ways 1,2,11) . Asano 11) divided learning motivation into five orientations: orientation related to human interactions, self-improvement orientation, orientation related to occupation or specialty, task orientation related to experience, and special task orientation. Test belief 9) can be categorized into four aspects: test belief related to improvement, guidance, comparison, and compulsion. Improvement and guidance are generally positive test beliefs, whereas comparison and compulsion are negative ones. Suzuki argued that to improve motivation for learning and promote the use of proper learning strategies, it is important to possess strong positive test beliefs and weak negative ones 9) .
Despite interest in the relationships between test types and students' personal variables, relevant studies are clearly lacking. Previous studies have not directly examined whether differences between scores on term tests and external tests are prevalent or to what degree there are students who are good at term tests only (Term Group), external tests only (External Group), and both tests (Both Group). We particularly focus on students in the Term Group because of the possible serious personal and societal consequences detailed in Fujisawa 8): students' failure to gain important knowledge and skills, schools' failure to achieve educational goals, wasted energy spent on fake study, lowered motivation for learning, lowered value of learning, students' failure to become autonomous learners, potential to employ cheating attitudes and behaviors in later life and in the workplace, and passing such attitudes on to the next generation.
We posed the following research questions to examine this under-researched topic: Question 1: To what extent do students perform well on term tests but not on external tests? Question 2: How are differences in scores between term and external tests related to learning motivation and test belief?

Participants and data collection
We asked 134 first-year medical students at Juntendo University to complete two questionnaires using Google Forms. The response rates for Questionnaires 1 and 2 were 64% (86/134) and 58% (78/134), respectively.
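The response rates above follow directly from the reported counts. As a minimal sketch (the function name `response_rate` is a hypothetical helper for illustration, and rounding to a whole percentage is an assumption about how the figures were reported), the arithmetic can be checked as follows:

```python
def response_rate(responded: int, invited: int) -> int:
    """Return the response rate as a percentage rounded to a whole number."""
    return round(100 * responded / invited)


# Counts reported in the text: 86 and 78 respondents out of 134 students.
q1_rate = response_rate(86, 134)
q2_rate = response_rate(78, 134)
print(q1_rate, q2_rate)
```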

Survey
We used two questionnaires. Questionnaire 1 asked on which test the respondent achieved higher scores overall (1 question: referred to below as Question A), and how much each question matched the respondent's style regarding motivation for learning (5 questions) and test belief (6 questions). Questionnaire 2 asked on which test a student achieved higher scores for each of five subjects, Japanese, English, math, science, or social studies; such scores are predicted to be related to learning strategies (5 questions). We asked them which test they performed better on instead of using their actual test scores, as such scores would be difficult to obtain. We asked respondents to complete the two questionnaires during the same period and to provide answers while recalling their times as a third-year high school student. The questionnaire items were presented in Japanese; the following questions shown in this article were translated by the authors.
In Questionnaire 1, we asked respondents to rate the following questions on a 5-point Likert scale (1 being "this never applies to me" and 5 being "this applies to me very much"). We selected five questions out of 25 from Asano 11), because the five questions appeared to be most relevant to our study. We interpreted the results as learning motivations regarding why they studied. In relation to differences across subjects related to our first research question, we asked the following question five times in Questionnaire 2, once for each subject (Japanese, mathematics, English, science, and social studies, in this order): Did you perform better on term tests than on external tests, perform similarly on both types of tests, or perform better on external tests than on term tests?

Analysis
First, we divided respondents into three groups using the responses to Question A: On which test did you achieve higher scores overall when you were a third-year high school student, term tests or external tests (e.g., mogi shiken or practice exams administered under conditions similar to a real exam)? The options were "Term tests," "I got similar scores on both tests," and "External tests." Those who selected the first, second, and third options were categorized as the Term Group, Both Group, and External Group, respectively. Second, we similarly grouped responses to Questionnaire 2. Third, we calculated means and standard deviations (SDs) for the three groups, based on Question A, of each question for learning motivation and test belief. As we did not intend to generalize our results to the populations of students, we did not use statistical significance tests such as one-way analysis of variance (ANOVA).
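The grouping and descriptive statistics described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the response values are invented for the example and are not data from the study, the names `GROUP_BY_OPTION` and `summarize` are hypothetical, and the sample SD (n - 1 denominator) is assumed because the text does not say which SD formula was used.

```python
from collections import defaultdict
from statistics import mean, stdev

# Map each answer option for Question A to its group label.
GROUP_BY_OPTION = {
    "Term tests": "Term Group",
    "I got similar scores on both tests": "Both Group",
    "External tests": "External Group",
}

# Hypothetical (answer to Question A, rating on one 5-point Likert item) pairs.
# Illustrative values only, NOT data from the study.
responses = [
    ("Term tests", 2), ("Term tests", 3), ("Term tests", 1),
    ("I got similar scores on both tests", 3),
    ("I got similar scores on both tests", 2),
    ("External tests", 2), ("External tests", 4),
]


def summarize(pairs):
    """Group ratings by Question A answer; return n, percentage, mean, and SD."""
    ratings = defaultdict(list)
    for option, rating in pairs:
        ratings[GROUP_BY_OPTION[option]].append(rating)
    total = len(pairs)
    return {
        group: {
            "n": len(values),
            "percent": round(100 * len(values) / total),
            "mean": mean(values),
            # Sample SD is an assumption; it needs at least two ratings.
            "sd": stdev(values) if len(values) > 1 else float("nan"),
        }
        for group, values in ratings.items()
    }


summary = summarize(responses)
for group, stats in summary.items():
    print(group, stats)
```

Repeating the same summary for each motivation and test belief question would give one row of means and SDs per question, which can then be compared across the three groups descriptively, without significance testing.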

Percentages of the three groups
We examined to what degree students belong to the three groups: Term Group, Both Group, and External Group. As seen in the pie charts in Figure 1, when students gave a general response without regard to any specific subject (n = 86), the Term Group included 41% of students, the Both Group 35%, and the External Group 24%. Thus, the Term Group included the largest number of students, followed by the Both Group and then the External Group. Results across subjects (n = 78) showed that this order was the same as in the overall result: for example, in Japanese, 44% were in the Term Group, 29% were in the Both Group, and 27% were in the External Group. However, there were slight differences across subjects; these are discussed below in light of previous studies on learning strategies.
Of the five subjects, social studies had the highest percentage of students in the Term Group (59%), followed by science (55%), math (49%), Japanese (44%), and English (37%). These results are partially consistent with those in Fujisawa 7), which asked university students in science majors to select which type of learning they conducted for each subject in junior high schools. He reported that the percentage of those who practiced fake study was highest in social studies (approximately 40%), followed by Japanese (30%), science (25%), math (15%), and English (10%). The two ends of the ranking matched between the current study and Fujisawa 7): the highest was in social studies and the lowest in English, although the order of the middle three subjects differed. Fujisawa 7) suggested that if social studies requires memorization of knowledge in each unit, with such knowledge not used across units or accumulated, fake study is likely to take place, especially when students do not take this subject's entrance exam. In terms of English, he speculated that the subject requires accumulation of knowledge and skills, which may lead students to practice orthodox learning. These reasons seem to accord well with the participants in the current study. Social studies was not always required to enter this medical school; the majority of students passed the general-type entrance exam, which requires candidates to take English, math, and two science subjects out of three (i.e., physics, biology, and chemistry), not social studies or Japanese. Those who passed this type of exam may not have felt the need to study social studies diligently enough to build the ability required for external tests and entrance exams, thus resorting to fake study, using surface processing to achieve adequate scores on term tests but not on external tests. In the case of English, all students needed to obtain high scores in difficult English entrance exams, and those who passed likely studied English seriously, using orthodox learning with deep processing.
The finding that English had the lowest percentage of students in the Term Group was not in line with Matsunuma 2); he argued that scores on term and external English tests tend to diverge because term tests tend to contain the same passages learned in class, and students may achieve high scores without understanding the tested aspects. However, this situation may not always hold true currently (as also acknowledged by Matsunuma 2)), because some schools do not use the same passages in teaching and testing or they provide questions that measure students' understanding even when the same passages are used, based on recommendations from testing literature 12).

Motivation for learning
As shown in Table 1, there were no clear differences in Motivation questions 1-5 across the three groups. For example, in Motivation question 1 (MQ1: I study to enrich human relationships), the Term Group, Both Group, and External Group received similar means (M = 2.2, 2.7, and 2.4, respectively). Thus, regardless of which test score was better, degrees of motivation for learning were found to be similar across the motivation types and the groups.

Test belief
Although we expected relationships between test belief and term/external tests, there were no substantial differences among the groups, as seen in Table 2. For example, for one test belief question (TBQ2), the means were similar: 3.9, 3.8, and 4.3 in the Term Group, Both Group, and External Group, respectively.

Conclusion
We posed two research questions concerning (a) the extent to which students perform well only on term tests and (b) how differences in scores between term and external tests are related to motivation for learning and test belief.
For the first research question, we found that a substantial percentage (41%) of students belonged to the Term Group when students gave a general response without regard to any specific subject. This group was the largest of the three groups (Term Group, Both Group, and External Group) across five subjects, although the percentages ranged from 59% in social studies to 37% in English. Learning strategy theory suggests that the types of learning (fake study and orthodox learning) and depths of processing (surface or deep processing) that students choose when preparing for term and external tests explain a part of the current results. While differences between test scores on term and external tests have not garnered much attention in research, the large percentages of students who perform well on term tests but not on external tests clearly suggest that this is an important topic in research and practice, which corroborates the arguments made by previous studies 1,2,5–8).
For the second research question, we found practically no differences between different groups of students. The literature on motivation for learning suggests that the willingness to continue learning and test belief are linked to academic performance 9,11) and possibly to differences in scores between the term and external tests. However, the lack of differences among the groups from the two perspectives suggests the need to further explore these relationships.
There are four possible reasons why motivation for learning and test belief did not vary much across groups in the current study. First, the characteristics of participants may have affected the results. All the participants were Juntendo medical students who are relatively good at science, math, and English. Many have studied seriously so far, and those who completed the questionnaires may have been even more serious. Furthermore, many of these students study to be good doctors in the future. Therefore, those who responded to our questionnaire may have had similar motivation for learning and test belief, regardless of the type of tests they were good at. Second, the Both Group might have contained two types of students: those who achieved similarly high scores on the two types of tests and those who achieved similarly low scores on both; therefore, we should have asked for more details to classify the Both Group into two smaller groups. These two groups may have different types of motivation and test belief. It should be noted that although some may argue that obtaining low scores for both term and external tests is unlikely as all the participants passed the difficult entrance examination to enter medical school, some participants passed the exam without Japanese and social studies, and others may have had very strong abilities in a few subjects that may have minimized the need to obtain high scores in other subjects. Thus, it seems possible that some students had low scores on both term and external tests. Third, we extracted only part of the questions from the previous studies; however, it may have been better to apply all or other questions. Fourth, because we collected data using questionnaires, the results may show the participants' self-assessment or perception of their academic performance and other personal characteristics, that is, how students presumed they did in the two types of tests and how they judged their motivation and test belief.
The nature of self-assessment 13) may have blurred the results of the relationships in our focus. Therefore, a narrow range of differences among the current respondents and the characteristics of our questionnaires are areas in need of improvement in terms of examining the relationships between motivation for learning, test belief, and academic performance in term and external tests. Given the opportunity to conduct another survey, we will further expand the range of target participants, increase the number of subjects, and consider teachers' viewpoints as well as students' to rigorously test the current study's research questions.