2025, Vol. 1, No. 2, pp. 1-19
[Original Article]
A Quasi-Experimental Study of Parallel Python-R Learning in Data Science Education: Implementation and Assessment in a BYOD Environment
Naruki Shirahama1
2Shimonoseki City University
Abstract
This study aims to verify the effectiveness of a parallel learning approach for Python and R programming languages in data science education and to develop a practical learning environment using the Windows Subsystem for Linux (WSL) and Jupyter Notebook in a Bring Your Own Device (BYOD) setting. Additionally, it measures and analyzes the educational impact of this method on specialized courses in business and healthcare. The research methodology includes implementing a curriculum for simultaneous learning of both languages in the first-year "DS Programming Introduction" course, with effectiveness measured through language comprehension tests, problem-solving ability tests, and student language preference surveys. Furthermore, the study evaluates the efficacy of the BYOD environment using the WSL and Jupyter Notebook and optimizes programming education methods for specific fields. The findings of this study are expected to contribute to the development of highly skilled data science professionals by establishing new best practices in data science education.
Keywords: Data Science Education, Parallel Learning, Python, R Language, BYOD, WSL, Jupyter Notebook, Business, Healthcare, Programming Education
With the advent of Society 5.0, the demand for human resources capable of effectively analyzing large and complex datasets and making precise decisions is rapidly increasing. In response to this societal demand, universities across Japan are establishing new data science faculties and departments. Shimonoseki City University also established its Faculty of Data Science in 2024. Python, which offers particular advantages in data analysis and machine learning, and R, which specializes in statistical analysis, have been adopted as the primary teaching languages at these educational institutions.
However, substantial challenges remain for contemporary data science education. Numerous students tend to rely exclusively on the initial programming language they acquire and experience difficulty adapting to alternative languages and utilizing multiple languages effectively. This phenomenon has impeded the development of a versatile and adaptable workforce in the rapidly evolving field of data science.
Furthermore, in actual work environments, the ability to use multiple languages and tools according to a situation is often required. Therefore, there is an urgent need in the educational field to develop human resources that not only have a deep understanding of a single language, but are also able to flexibly use multiple languages and take advantage of their respective strengths. The purpose of this study was to propose a new approach to this issue and to examine its effectiveness.
1.1 Background of the Study
Data science is an interdisciplinary field that derives meaningful insights from large and complex datasets. The field involves a wide variety of tasks, including data collection, pre-processing, analysis, visualization, and interpretation of results. Different programming languages and tools are known to suit the different characteristics of each task. For example, Python is widely adopted for data processing and machine-learning tasks because of its rich set of libraries (NumPy, Pandas, Scikit-learn, etc.). R, on the other hand, has advantages in statistical analysis and advanced data visualization and is particularly useful in academic research and biostatistics [1].
In recent years, the role of data scientists in industry has evolved, and the required skill set has changed significantly. According to Kumar and Aithal (2020), modern data scientists need flexibility and broad technical knowledge that is not dependent on a specific programming language [2]. This flexibility is a prerequisite for adapting to rapidly evolving technology and business environments.
Against this backdrop, there is an urgent need to develop effective methods for learning multiple programming languages in data science education. In particular, Python and R languages are highly suitable for parallel learning because their strengths are complementary and can cover the entire data science workflow. Donoho (2017) provided a comprehensive analysis of the scope and evolution of data science, arguing that data science encompasses much more than just statistics and machine learning, including data preparation, representation, computation, visualization, and the scientific study of data analysis itself [3].
Furthermore, Bialystok et al. (2009) suggested that the concurrent learning of multiple languages contributes to improved problem-solving skills [4]. This approach has the potential to foster student adaptability and creative thinking in data science education.
While existing research has extensively examined programming language education, several critical gaps remain.
This study addressed these gaps through a systematic investigation of parallel language learning in a controlled educational environment.
1.2 Purpose of the Study
The main objective of this study is to empirically verify the effectiveness of a parallel learning approach for Python and R in data science education. Furthermore, we aim to evaluate the impact of a Bring Your Own Device (BYOD) environment integrating the Windows Subsystem for Linux (WSL) and the Jupyter Notebook on practical data science learning. To achieve this objective, the following three research questions are set:
Primary Research Question
How does the parallel learning of Python and R affect students' programming competency and problem-solving abilities in data science education?
Secondary Research Questions
This investigation aims to examine how the concurrent acquisition of Python and R can be optimized for data science applications across various disciplines. The objective is to formulate an educational framework that addresses the specific requirements and demands of each field of study.
Through these research topics, this study aims to provide a new perspective on conventional data science education methods and contribute to the establishment of more effective and practical educational approaches. Furthermore, the results of this research are expected to serve as the basis for an educational model to foster highly adaptable human resources in the rapidly evolving field of data science.
1.3 Significance of the Study
This study aims to contribute to the development of educational methodology in this field by presenting a new paradigm for data science education and empirically verifying its effectiveness. The significance of this study can be viewed from multiple perspectives.
This apprenticeship model of learning aligns with the established practices in other scientific fields. This approach contributes to improving students' adaptability and creative problem-solving skills and can be positioned as an attempt to apply the cognitive advantages identified in bilingualism research [4] to the context of programming education. This study is expected to contribute to the development of highly adaptive personnel in the rapidly evolving field of data science.
Second, the construction and evaluation of a BYOD environment integrating WSL and the Jupyter Notebook aims to design an educational environment based on practical learning theory [5] and demonstrate its effectiveness. This effort bridges theory and practice and provides students with learning opportunities that closely resemble the actual work environment. This is expected to effectively implement Kolb's (2014) experiential learning cycle [6] and facilitate the acquisition of deep understanding and practical skills.
Third, the development of educational methods specific to the different disciplines of business and healthcare is based on situated learning theory [7], which aims to develop context-dependent applied competencies in data science. This approach bridges the gap between industry needs and academic education, and contributes to the development of a more effective human resource development model.
Furthermore, through the qualitative improvement of data science education, this research will contribute to the realization of an advanced data-utilizing society, which is the goal of Society 5.0. The development of human resources to drive digital transformation will strengthen industrial competitiveness and promote social innovation, giving this research considerable social significance.
The results of this research can be applied not only to data science education but also to education in other fields requiring complex skill sets, and are expected to contribute widely to the development of practical and effective educational models in higher education.
Data science education has developed rapidly in response to the demands of the digital age and has come to occupy a core position in the curricula of higher education institutions. A comprehensive survey by Shimbaru (2023) documented that the Mathematics, Data Science, and AI Education Program Accreditation System (MDASH) had certified 78 educational institutions by August 2021, including 66 universities, 2 junior colleges, and 10 technical colleges [8]. However, the survey revealed marked differences in the curriculum structure and completion requirements of each institution, making the standardization and quality assurance of educational content an urgent issue.
Sasajima et al. (2023) conducted a five-year study implementing problem-based learning (PBL) exercises using real data in undergraduate data science education and reported positive outcomes in practical skill development [1]. Their longitudinal study showed that the systematic introduction of PBL exercises from lower grades had a statistically significant positive impact on students' acquisition of practical skills (p < 0.01). However, the study also pointed out that individual differences in students' basic programming skills may affect the learning effectiveness of PBL, suggesting the need for further research in this regard.
A survey conducted by Nakada and Gherghel (2024) found that only 33% of humanities students in the Faculty of Foreign Studies showed interest in data science, whereas other faculties showed higher levels of interest (e.g., 52% in the Faculty of Business) [9].
Specifically, 54.3% of liberal arts students indicated that they were "not very interested" or "not interested at all" in data science, a statistically significant difference from science students (23.7%) (χ²(1) = 45.62, p < 0.001). This result suggests that current data science education may be biased toward science fields, and highlights the need for an integrated arts and sciences approach.
While these previous studies demonstrate the importance and prevalence of data science education, they raise three main issues: standardization of the curriculum, effective ways to develop practical skills, and the need for a comprehensive approach that spans both the humanities and the sciences. This study aims to develop and validate a new educational model to address these issues.
2.2 Methodology of Programming Language Education
With regard to programming language teaching methodologies, there are two main paradigms: the traditional approach, which emphasizes a deep understanding of a single language, and the emerging approach, which encourages the parallel learning of multiple languages. Each approach is based on a different theoretical foundation and pedagogical goal.
Yusupova et al. (2022) empirically examined the impact of adaptive e-learning environments using machine learning algorithms on programming education [10]. Their study showed that e-learning environments that provide personalized learning pathways significantly improve student learning outcomes compared to traditional uniform teaching methods (t(198) = 3.74, p < 0.001, Cohen's d = 0.53). In particular, they concluded that flexibility in meeting students' individual needs is a key factor in this effect.
Kumar & Aithal (2020), on the other hand, through a qualitative analysis of the demand for human resources in the data science field, revealed the diversity of skill sets required of data scientists [2]. They emphasized that industry demands require data scientists to be proficient in multiple programming languages and tools, highlighting the importance of versatile technical skills in the field. This finding suggests the effectiveness of a concurrent learning approach for multiple languages, and has important implications for the design of educational curricula.
In addition, Bialystok et al. (2009) demonstrated that bilingualism is associated with enhanced executive control functions, particularly in tasks requiring inhibition of distracting information and switching between tasks, although bilingual speakers also show some disadvantages in lexical access and vocabulary size compared with monolinguals [4].
While research on programming education is expanding, Robins (2019) notes that teaching and learning programming continue to face well-documented challenges [11]. In particular, longitudinal studies on the effects of parallel learning and comparative studies that consider different learner characteristics are lacking.
While these previous studies suggest the potential effectiveness of the multiple-language parallel learning approach in programming language education, they also highlight the need to rigorously examine its effectiveness. This study aims to address this research gap and empirically examine the effectiveness of multiple language parallel learning in data science education.
2.3 Educational Use of WSL and Jupyter Notebook
Research on the educational use of the Windows Subsystem for Linux (WSL) and Jupyter Notebook is an emerging area of computer science education. WSL makes it possible to run a Linux environment directly on Windows, greatly simplifying the construction of a cross-platform development environment. Jupyter Notebook, on the other hand, is an open-source web application that allows users to interactively execute code and instantly visualize results.
Perez and Granger (2015), in describing Project Jupyter, emphasized that a key advantage of the Jupyter Notebook is its ability to integrate code and explanatory text into a single interactive document for scientific computing and data analysis [12]. This characteristic is highly valued in both reproducible research and teaching, suggesting that it is particularly effective in developing computational thinking skills. Rule et al. (2019) provide guidelines for using Jupyter Notebooks to enhance reproducibility and transparency in computational research through proper documentation and sharing practices [13].
Ochkov et al. (2022) demonstrated how Jupyter Notebook and JupyterLab can be effectively integrated into STEM education, supporting a wide range of learning activities, from problem formulation to scientific calculations, computational experiments, and the creation of interactive teaching materials [14]. In particular, the ability to integrate multiple programming languages (e.g., Python and R) and markup languages (e.g., Markdown and LaTeX) in a single environment has been pointed out as having the potential to deepen learners' understanding and promote practical skill acquisition. This finding provides theoretical support for the effectiveness of the parallel learning approach for Python and R proposed in this study and suggests that an integrated learning environment may contribute to the acquisition of multiple languages and practical data science skills. By contrast, empirical research on the combined educational use of WSL and the Jupyter Notebook remains limited. A survey of first-year university students conducted by Kambe et al. (2024) (n = 102) reported significant individual differences in students' ICT utilization skills in a 14-item self-assessment of basic computer skills [15].
While these previous studies show that WSL and the Jupyter Notebook have great potential for data science education, they also highlight the lack of empirical studies on their effective implementation and long-term learning effects. This study aims to address this research gap and systematically examine the impact of a learning environment that integrates WSL and Jupyter Notebook on concurrent learning of multiple programming languages.
In this study, we adopted a quasi-experimental approach using naturally formed learning patterns to examine the effectiveness of parallel learning of Python and R in data science education. This approach is suitable for implementation in an educational setting because it exploits naturally occurring differences in conditions without experimental manipulation, and is also appropriate from the perspective of ethical considerations [16].
3.1 Research Subjects
The research subjects will be first-year students enrolled in the 2025 academic year at the Faculty of Data Science at Shimonoseki City University. The same curriculum will be offered to all students, and learning patterns that naturally form during the one-year learning process will be observed.
3.2 Learning Environment
The following integrated learning environments are provided for all students:
1. Ubuntu 22.04 LTS environment using WSL2 (Windows Subsystem for Linux 2) on Windows 11
This allows students to use a Linux-based development environment integrated directly into Windows, providing a modern development platform that bridges the Windows and Linux environments (Leeks, 2020) [17].
2. Jupyter Notebook and RStudio Server
3. Visual Studio Code (VS Code)
A lightweight, cross-platform code editor that supports multiple programming languages, including Python and R, and integrates seamlessly with WSL. Its extensible architecture allows it to function effectively across different development scenarios, including web, desktop, and cloud development (Del Sole, 2022) [20].
Students have autonomy in selecting and using these environments based on their individual preferences. This flexible setup allows students to choose languages according to their personal learning styles and preferences and allows natural learning patterns to be observed. These environments expose students to professional computational tools and practices (Nolan & Lang, 2015) while facilitating the acquisition of practical data science skills [21].
3.3 Curriculum
The curriculum for this study is designed to allow students to learn the basics of Python and R in parallel and is built on the constructive alignment theory proposed by Biggs et al. (2023) [22]. This theory emphasizes the coherence among learning objectives, instructional activities, and assessment methods to promote deep learning. The curriculum consists of six main topics, with an appropriate study period for each topic.
1. Fundamental data types and variables
2. Control structures (conditional branching, loops)
3. Functions and modules
4. Manipulation of data frames
5. Data visualization
6. Basics of statistical analysis
Each topic will be structured around interactive exercises using Jupyter Notebook and RStudio Server, adopting a Problem-Based Learning (PBL) approach (Hmelo-Silver, 2004) [23]. This allows students to learn by comparing the characteristics of both languages through practical problem-solving. The PBL approach was chosen because it promotes flexible knowledge acquisition, effective problem-solving skills, self-directed learning, collaboration, and intrinsic motivation, all of which are essential in data science education.
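As an illustration of what "learning by comparing the characteristics of both languages" might look like for topics 2 and 3, the following minimal sketch defines a small Python function with a loop and a conditional and notes the R counterpart in comments. The function name `count_above` and the sample scores are hypothetical and are not taken from the course materials.

```python
# Sketch for topics 2-3 (control structures, functions).
# Python code is executable; the R counterpart appears in comments.
# R definition header: count_above <- function(values, threshold) { ... }

def count_above(values, threshold):
    """Count how many values exceed the threshold."""
    count = 0                     # R: count <- 0
    for v in values:              # R: for (v in values) {
        if v > threshold:         # R:   if (v > threshold) {
            count += 1            # R:     count <- count + 1
    return count                  # R: count  (the last expression is returned)


# Hypothetical data; in R: scores <- c(55, 72, 64, 90)
scores = [55, 72, 64, 90]
print(count_above(scores, 60))
```

Placing the two syntaxes side by side in one notebook cell is one possible way to highlight differences such as indentation-based blocks in Python versus braces in R.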
The effectiveness of educational approaches that utilize real-world datasets has been demonstrated in multiple studies. Vandergon et al. (2018) examined the impact of computer-supported collaborative science (CSCS), a teaching method that engages students in the collection, analysis, and interpretation of real-world data [24]. Their research showed that working with real-world data significantly increases students' motivation and engagement in learning. Based on these findings, the use of real-world datasets is recommended for assignments conducted at the end of the semester to promote the acquisition of practical data science skills.
The evaluation consists of an assignment at the end of each topic (50%) and a final comprehensive exercise (50%). In these assignments, students may freely choose their language, which allows their natural language preferences and learning patterns to be observed. This curriculum design allows students to choose a language according to their own interests and aptitudes while learning the basics of both languages systematically and in depth.
3.4 Data Collection
This study employs a mixed methods research design to collect quantitative and qualitative data (Guest & Fleming, 2014), as this approach provides more comprehensive evidence and can address research questions that a single-method approach cannot adequately answer [25].
1. Submission of assignment file
2. Self-assessment questionnaire
3. Semi-structured interviews
The collected data will also be classified a posteriori into the following groups.
This study employs a mixed analysis approach that combines quantitative and qualitative analyses.
1. Descriptive statistics
2. Statistical hypothesis testing
3. Multivariate analysis
4. Qualitative analysis
5. Mixed Analysis Methods
All statistical analyses were performed using R, with a significance level set at α = 0.05. Effect sizes were reported using Cohen's d or Hedges' g.
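The two effect sizes named above can be sketched as follows. This is a minimal illustration assuming two independent groups and the commonly used small-sample correction factor 1 - 3/(4·df - 1) for Hedges' g; the study itself states that R is used, so the Python function names here are hypothetical and only show the arithmetic.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


def hedges_g(group_a, group_b):
    """Hedges' g: Cohen's d multiplied by an approximate small-sample correction."""
    df = len(group_a) + len(group_b) - 2
    correction = 1 - 3 / (4 * df - 1)
    return cohens_d(group_a, group_b) * correction


# Hypothetical scores for two groups (not study data).
print(cohens_d([2, 4, 6], [1, 3, 5]))
print(hedges_g([2, 4, 6], [1, 3, 5]))
```

Hedges' g shrinks d slightly, and the difference matters mainly when group sizes are small.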
3.6 Ethical considerations
This research will be conducted in accordance with the ethical principles of the Declaration of Helsinki [29] and the research ethics guidelines of Shimonoseki City University. Specifically, the following measures will be taken:
These ethical considerations will protect the rights and welfare of participants while simultaneously ensuring the conduct of scientifically valuable research.
The parallel learning approach for Python and R languages proposed in this study is expected to have multifaceted effects on data science education. Below, we discuss the anticipated benefits in terms of key aspects.
4.1 Effects of language acquisition
The parallel learning group (PLG) is expected to show more balanced proficiency in both languages compared to the single language focus group (SLFG). Specifically, the following effects are expected:
The concurrent learning approach is expected to contribute to improved problem-solving skills in the following ways
The proposed integrated learning environment (WSL, Jupyter Notebook, RStudio Server, VS Code) is expected to be effective in
This study focuses on two specialties, business data science and healthcare data science, and is expected to have the following effects
These anticipated effects form the hypotheses of this study and provide a basis for future empirical testing. As the study progresses, a comparison of these predicted and actual results will help to clarify the effectiveness and challenges of the concurrent learning approach in data science education.
While these expected effects provide a theoretical framework for our research, it is essential to validate them through an empirical investigation. To this end, we conducted a preliminary experiment in 2024 to test our hypotheses and refine our methodology before its full implementation in 2025. This preliminary study focused on examining students' responses to parallel language learning and testing the effectiveness of our BYOD environment.
In this study, a preliminary experiment was conducted in the required course "Introduction to Data Science Programming" (15 weeks) in 2024 to verify the educational effectiveness and feasibility of Python-R parallel learning, which will be implemented in 2025. The purpose of this preliminary experiment was to confirm the effectiveness of the proposed parallel learning approach and identify practical issues.
5.1 Survey Subjects and Timeframe
The survey targeted the 88 students enrolled in "Introduction to Data Science Programming" (a required course) offered in the second semester (15 weeks) of the 2024 academic year. It was administered after the 11th lecture, and students evaluated what they had learned up to that point. A total of 62 valid responses were received (response rate: 70.5%).
5.2 Curriculum Structure
Students were provided with WSL2 (Windows Subsystem for Linux 2), Ubuntu 22.04, and JupyterLab environments on Windows 11 and were given hands-on training in a Bring Your Own Device (BYOD) format using their own laptop computers. The curriculum was designed to cover basic data science elements, such as functions, collections, data frames, and statistical methods, and to allow students to learn R and Python in parallel.
5.3 Survey Methodology
5.3.1 Quantitative evaluation by Visual Analog Scale (VAS)
A Visual Analog Scale (VAS) was employed for the evaluation. The VAS is a measurement method that enables continuous subjective evaluation: respondents answer by placing a mark at an arbitrary position on a horizontal line.
In this study, we used the VAS to quantitatively analyze learners' subjective evaluations of parallel learning in R and Python from four perspectives: understanding, confidence, motivation, and potential. The VAS has the advantage of being able to detect minute differences that are difficult to capture with a conventional Likert scale. A scale of 0 to 100 was used for each evaluation index to measure learners' perceptions more precisely. In this analysis, the left and right ends of the line were set to 0 and 100, respectively, and responses were quantified as integer values. Figure 1 shows the questionnaire form used.
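The quantification step described above can be sketched as a simple linear mapping. The function name `vas_score` and the pixel-based inputs are hypothetical; the actual online form may record the value directly, so this only illustrates how a mark position maps onto the 0-100 integer scale.

```python
def vas_score(mark_position, line_length):
    """Map a mark on a horizontal line to an integer score from 0 to 100.

    mark_position: distance of the mark from the left end (same unit as line_length).
    line_length:   total length of the line; left end = 0, right end = 100.
    """
    if line_length <= 0:
        raise ValueError("line_length must be positive")
    if not 0 <= mark_position <= line_length:
        raise ValueError("mark_position must lie on the line")
    return round(100 * mark_position / line_length)


# Hypothetical example: a mark 150 px from the left on a 200 px line.
print(vas_score(150, 200))
```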
Figure 1 Online survey form (in part, VAS)
The following four items were evaluated for the R language and Python:
1. “Understanding” (Q1: R, Q2: Python)
2. “Confidence” (Q3: R, Q4: Python)
3. “Motivation” (Q5: R, Q6: Python)
4. “Potential” (Q7: R, Q8: Python)
The following questions were asked to measure language preference (Q9).
Questions were also asked about programming experience (Q10) and the desired major course of study (Q11) as background information. No respondents selected "Extensive programming experience."
Programming experience (Q10)
Preferred Course of Study (Q11)
In the free-description field, responses included mathematics, AI, teaching, sensibility, and undecided.
Free-text Qualitative Evaluation
The question was presented in free-response format.
Q12: "Please describe your preferences for Python and R, including the specific aspects that you like or find challenging in each language."
Additional reasons for adopting the VAS include the following.
Figure 2 Distribution of Four Metric VAS Score
Table 1 Descriptive Statistics for VAS Scores
Metric | Language | n | Mean | SD | Median | Min | Max |
---|---|---|---|---|---|---|---|
Understanding | R | 62 | 61.16 | 13.45 | 63.0 | 23 | 89 |
Understanding | Python | 62 | 62.39 | 14.00 | 63.0 | 25 | 91 |
Confidence | R | 62 | 49.47 | 18.17 | 47.5 | 11 | 91 |
Confidence | Python | 62 | 49.65 | 16.63 | 45.0 | 15 | 88 |
Motivation | R | 62 | 69.73 | 16.82 | 68.5 | 29 | 100 |
Motivation | Python | 62 | 72.39 | 15.84 | 72.5 | 28 | 100 |
Potential | R | 62 | 76.00 | 15.85 | 75.5 | 39 | 100 |
Potential | Python | 62 | 78.27 | 15.83 | 76.5 | 36 | 100 |
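The columns of Table 1 can be reproduced for any list of VAS scores with a few lines of code. The sketch below assumes that SD denotes the sample standard deviation (n - 1 denominator), which the paper does not state explicitly; the function name `describe` and the five example scores are hypothetical and are not study data.

```python
import statistics


def describe(scores):
    """Return the descriptive statistics reported per metric and language."""
    return {
        "n": len(scores),
        "mean": round(statistics.mean(scores), 2),
        "sd": round(statistics.stdev(scores), 2),  # sample SD (assumption)
        "median": statistics.median(scores),
        "min": min(scores),
        "max": max(scores),
    }


# Hypothetical VAS scores (0-100 integers), not taken from the survey.
example_scores = [40, 50, 60, 70, 80]
print(describe(example_scores))
```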
Wilcoxon’s signed-rank test was used to compare the R and Python languages in this study. The test results are listed in Table 2.
Table 2 Results of Wilcoxon Signed-Rank Test between R and Python VAS Scores
Basic Test Statistics
Metric | W-stat | p | p (adj) | Effect size r | Effect Int. |
---|---|---|---|---|---|
Understanding | 512.0 | 0.012 | 0.049 | 0.318 | Medium |
Confidence | 701.5 | 0.904 | 1.000 | 0.015 | Negligible |
Motivation to learn | 369.5 | 0.004 | 0.014 | 0.370 | Medium |
Potential | 382.0 | 0.008 | 0.033 | 0.335 | Medium |
Detailed Effect Analysis
Metric | 95% CI (Lower) | 95% CI (Upper) | H-L est. | Significance |
---|---|---|---|---|
Understanding | -3.000 | -0.500 | -1.500071 | TRUE |
Confidence | -1.500 | 1.000 | -0.000058 | FALSE |
Motivation to learn | -5.000 | -1.000 | -2.999991 | TRUE |
Potential | -3.500 | -0.500 | -2.000036 | TRUE |
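Table 2 does not state how the effect size r was obtained. A common convention is r = |Z| / √N, where Z is the standard normal statistic of the test and N is the number of pairs. The following sketch, which is an assumption rather than the author's documented procedure, recovers |Z| from the rounded two-sided p-values in Table 2 and checks whether that convention is approximately consistent with the reported r values for N = 62.

```python
import math
from statistics import NormalDist


def effect_size_r_from_p(p_two_sided, n):
    """Approximate r = |Z| / sqrt(n), recovering |Z| from a two-sided p-value.

    Requires 0 < p_two_sided <= 1.
    """
    z = NormalDist().inv_cdf(1 - p_two_sided / 2)
    return z / math.sqrt(n)


# Rounded p-values and reported r values from Table 2 (N = 62 valid responses).
table2 = {
    "Understanding": (0.012, 0.318),
    "Confidence": (0.904, 0.015),
    "Motivation to learn": (0.004, 0.370),
    "Potential": (0.008, 0.335),
}
for metric, (p, reported_r) in table2.items():
    approx_r = effect_size_r_from_p(p, 62)
    print(f"{metric}: approx r = {approx_r:.3f}, reported r = {reported_r}")
```

Because the p-values in the table are rounded to three decimals, the recovered values agree with the reported r only to about two decimal places.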
Using this data, quantitative (descriptive and inferential statistics) and qualitative (text mining) analyses were conducted to examine the educational effects of learning R and Python in parallel from multiple perspectives.
First, the VAS score analysis is described. Figure 2 visualizes the distribution of the four metrics (Understanding, Confidence, Motivation, and Potential) by superimposing a box-and-whisker plot on a violin plot and a bee swarm plot. Outliers are indicated in red. Table 1 summarizes the descriptive statistics of the VAS scores.
To analyze the four metrics (“Understanding,” “Confidence,” “Motivation,” and “Potential”) simultaneously, this study used p-values adjusted by the Bonferroni method. This ensures appropriate control of the Type I error rate and the statistical reliability of the conclusions. Note that the adjusted p-value is calculated as four times the original p-value and is capped at one if it exceeds one. The results of the analysis of these four metrics are described below.
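The adjustment rule described above (multiply by the number of tests, cap at one) can be written in a few lines. The function name `bonferroni` is hypothetical. Applying it to the rounded p-values in Table 2 gives 0.048, 1.000, 0.016, and 0.032, which differ slightly from the tabulated 0.049, 1.000, 0.014, and 0.033, presumably because the table was computed from unrounded p-values.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Rounded p-values from Table 2: Understanding, Confidence, Motivation, Potential.
raw_p = [0.012, 0.904, 0.004, 0.008]
for p, p_adj in zip(raw_p, bonferroni(raw_p)):
    print(f"p = {p:.3f} -> adjusted p = {p_adj:.3f}")
```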
1. Comparative analysis of “Understanding”
The results of the Wilcoxon signed-rank test for “Understanding” confirmed a statistically significant difference between R and Python (W = 512.00, p = 0.012, r = 0.318). The effect size was medium, with a Hodges-Lehmann estimate of -1.50 (95% CI: [-3.00, -0.50]). Descriptive statistics showed that Python (mean = 62.39, SD = 14.00, median = 63.00) was rated slightly higher in understanding than R (mean = 61.16, SD = 13.45, median = 63.00). In the distribution plots in Figure 2, a higher concentration of scores was observed for Python, although the variation in understanding was similar in both languages.
2. Comparative analysis of “Confidence”
No statistically significant difference was detected for “Confidence” (W = 701.50, p = 0.904, r = 0.015). The effect size was negligible, with a Hodges-Lehmann estimate of 0.00 (95% CI: [-1.50, 1.00]). In the descriptive statistics, R (mean = 49.47, SD = 18.17, median = 47.50) and Python (mean = 49.65, SD = 16.63, median = 45.00) were nearly equivalent. The distribution plots in Figure 2 show that the distribution pattern of confidence is similar in both languages and tends to be concentrated around the median value.
3. Comparative analysis of “Motivation”
A significant difference was detected for “Motivation” (W = 369.50, p = 0.004, r = 0.370). The effect size was medium, with a Hodges-Lehmann estimate of -3.00 (95% CI: [-5.00, -1.00]); Python (mean = 72.39, SD = 15.84, median = 72.50) showed significantly higher learning motivation than R (mean = 69.73, SD = 16.82, median = 68.50). The distribution plots in Figure 2 confirm the bias toward the higher-score range in Python.
4. Comparative analysis of “Potential”
Significant differences were also identified in perceptions of “Potential” (W = 382.00, p = 0.008, r = 0.335). The effect size was medium, with a Hodges-Lehmann estimate of -2.00 (95% CI: [-3.50, -0.50]); Python (mean = 78.27, SD = 15.83, median = 76.50) received slightly higher ratings of future potential than R (mean = 76.00, SD = 15.85, median = 75.50). The distribution in Figure 2 shows a bias toward higher scores in both languages, which is more pronounced in Python.
The results of these analyses showed significantly more positive evaluations of Python, particularly in terms of motivation and perceived future potential. However, the differences in understanding were small and no differences were observed in confidence, suggesting that parallel learning does not cause a significant bias in basic language acquisition.
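As a rough illustration of the quantities reported in Table 2, the following Python sketch computes a Wilcoxon signed-rank statistic, a normal-approximation p-value, an effect size r = |z| / sqrt(n), and a Hodges-Lehmann estimate for paired scores. It is not the analysis code used in this study: the function name `signed_rank_summary`, the normal approximation without tie or continuity correction, the use of the number of pairs as n, the R minus Python direction of the differences, and the six-student data are all assumptions made for illustration.

```python
import math
from statistics import median


def signed_rank_summary(x, y):
    """Summarize paired samples x and y with signed-rank based statistics."""
    diffs = [a - b for a, b in zip(x, y)]
    nonzero = [d for d in diffs if d != 0]  # zero differences are dropped
    n = len(nonzero)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank |d| in ascending order; tied values share their average rank.
    order = sorted(range(n), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(nonzero[order[j + 1]]) == abs(nonzero[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, nonzero) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, nonzero) if d < 0)
    w = min(w_plus, w_minus)

    # Normal approximation (assumption: no tie or continuity correction).
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value

    # Effect size r, using the total number of pairs (assumption).
    r = abs(z) / math.sqrt(len(diffs))

    # Hodges-Lehmann estimate: median of Walsh averages (d_i + d_j) / 2, i <= j.
    walsh = [(diffs[a] + diffs[b]) / 2
             for a in range(len(diffs)) for b in range(a, len(diffs))]
    return {"W": w, "z": z, "p": p, "r": r, "hl": median(walsh)}


# Hypothetical VAS scores for six students (not data from this study).
r_scores = [60, 55, 70, 62, 48, 66]
py_scores = [63, 54, 75, 66, 52, 66]
result = signed_rank_summary(r_scores, py_scores)
```

Under the assumed R minus Python direction, a negative Hodges-Lehmann estimate corresponds to higher Python scores, which matches the sign pattern in Table 2. For real analyses, an established statistical package would normally be preferred over a hand-written routine.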
5.3.2 Sentiment analysis of free-text comments
For the sentiment analysis of free-text comments, we employed a method combining morphological analysis and a sentiment word dictionary. First, morphological analysis was performed using RMeCab (Version 1.0) to extract nouns, verbs, and adjectives. To extract vocabulary containing sentimental expressions, vocabulary was classified according to the following criteria.
Sentiment scores were calculated using the following formula:
$$\text{Sentiment score} = \frac{p - n}{p + n + 1}$$
where p is the total number of positive expressions and n is the total number of negative expressions. Adding 1 to the denominator prevents division by zero and keeps the score within the range [-1, 1]. The closer the score is to -1, the more negative the sentiment expressions are, and the closer it is to 1, the more positive they are.
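The score described above can be sketched in Python as follows, assuming it takes the form (p - n) / (p + n + 1), which is consistent with the stated role of the added 1 in the denominator and with the stated range. The function name `sentiment_score` and the example counts are hypothetical and are not taken from the study's comments.

```python
def sentiment_score(positive_count, negative_count):
    """Return (p - n) / (p + n + 1) for non-negative counts p and n."""
    if positive_count < 0 or negative_count < 0:
        raise ValueError("counts must be non-negative")
    return (positive_count - negative_count) / (positive_count + negative_count + 1)


# Hypothetical counts of positive and negative expressions per comment.
examples = {
    "no sentiment words": sentiment_score(0, 0),
    "one positive": sentiment_score(1, 0),
    "two positive, one negative": sentiment_score(2, 1),
    "three negative": sentiment_score(0, 3),
}
```

Under this assumed form, a comment with no sentiment words scores exactly 0, and the score approaches but never reaches -1 or 1 as the counts grow.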
Table 3 shows the descriptive statistics of the sentiment scores by language-preference group. The Python weak preference group had a neutral mean sentiment score of 0.000 and the lowest standard deviation (0.243). In contrast, the R weak preference group had a mean sentiment score of 0.271 and the Python strong preference group had a mean sentiment score of 0.389, indicating relatively positive sentiment expressions.
Table 3 Sentiment Scores by Language Preference Group
Language Preference Group | n | Mean | SD | Median | Positive Ratio |
---|---|---|---|---|---|
Strong R Preference | 20 | 0.000 | 0.243 | 0.000 | 11.1 % |
Weak R Preference | 12 | 0.271 | 0.295 | 0.250 | 50.0 % |
Weak Python Preference | 18 | 0.000 | 0.243 | 0.000 | 61.5 % |
Strong Python Preference | 12 | 0.389 | 0.347 | 0.500 | 66.7 % |
As shown in Figure 3, the distribution of sentiment scores was visualized using a combination of violin, box-and-whisker, and bee swarm plots. The analysis showed that the sentiment scores of the R strong preference group and the Python weak preference group tended to concentrate around 0. In contrast, the R weak preference group and the Python strong preference group showed a relatively wide distribution of sentiment scores and a tendency to use more positive expressions. In particular, the Python strong preference group showed the highest sentiment scores among the groups.
Figure 3. Distribution of sentimental expression by language preference group
To analyze the differences in sentiment expressions between the language preference groups in more detail, we focused on the ratio of positive expressions. As shown in Figure 4, the positive ratio for each group was visualized as a bar graph.
Figure 4. Positive ratios by language preference group
The analysis showed that the Python strong preference group had the highest positive ratio (66.7 %), followed by the Python weak preference group (61.5 %). In contrast, the R weak preference group showed a neutral value of 50.0 %, whereas the R strong preference group had the lowest positive ratio (11.1 %). These results suggest that Python-preferring learners tended to use more positive expressions.
The results of this analysis indicate that students' affective responses to parallel learning reflect complex evaluations based on their understanding of language characteristics and learning experiences rather than simple likes and dislikes. In particular, the tendency of students who express a strong preference to make more objective evaluations suggests that parallel learning may promote a deeper understanding of language characteristics.
Finally, we analyzed the characteristic patterns of sentiment expressions in the learners' free descriptions. In positive contexts, expressions such as "like," "easy," and "easy to understand" were commonly used, with positive evaluations of the language's "ease of understanding" being especially frequent. In negative contexts, on the other hand, expressions such as "difficult," "complicated," and "confusing" were observed, with particular reference to the difficulties in dealing with errors.
Furthermore, distinctive expression patterns were identified for each language preference group: the R preference group often referred to the intuitive understandability of statistical processing, while the Python preference group was characterized by references to the abundance of libraries and practicality. In the neutral group, many objective expressions that compared and evaluated the features of both languages were observed. These expression patterns may reflect differences in how each group perceives the two languages.
It should be noted that the results described above come from preliminary experiments only, and that the findings were obtained under the limited conditions of educational practice for first-term students. It is expected that more effective educational methods will be established in the future through further practice and accumulation of research.
These preliminary results largely support our theoretical expectations outlined in Section 4, particularly regarding the development of metalinguistic competence and effectiveness of the BYOD environment. However, they also highlight several areas requiring attention in full implementation, such as the need for more structured support for students who show strong language preferences. These insights will be invaluable in refining our approach to the 2025 implementation.
This study was designed to test the effectiveness of a parallel learning approach for Python and R languages in data science education. Below, we discuss key aspects of this study and provide future perspectives.
6.1 Effectiveness of the concurrent learning approach
In developing our theoretical framework, we carefully considered parallels from natural language acquisition research, while acknowledging the fundamental differences between natural and programming languages. Programming languages, which are formal systems with precise syntax and semantics, are likely to engage different cognitive processes than natural languages. However, insights from multilingual education research provide useful metaphors and hypotheses for investigating how learners might effectively acquire multiple programming languages. The following analysis explores these potential parallels, while maintaining appropriate theoretical caution.
The concurrent learning approach proposed in this study has the potential to provide a new perspective on data science education. While acknowledging the fundamental differences between natural and programming languages, we draw inspiration from theories of multilingual acquisition in natural language learning (cf. Hufeisen and Marx, 2007) [40] to explore potential parallels in how learners might develop competencies across multiple programming languages. The following benefits are expected.
However, the possibility of increased cognitive load and decreased learning efficiency owing to parallel learning must also be considered, and these points must be carefully examined.
6.2 Advantages and Challenges of a BYOD Environment
The BYOD (Bring Your Own Device) environment employed in this study is expected to have the following advantages and challenges.
Advantages:
Challenges:
To address these issues, it is necessary to develop appropriate guidelines and establish a technical support system, particularly to address the diverse learning spaces and technical challenges identified by Sølvberg and Rismark (2012) [41] in their study of mobile learning environments.
6.3 Optimization of field-specific educational methods
This study focuses on two specialized areas: business data science and healthcare data science. The following points are important for the optimization of educational methods specific to these disciplines.
Through these optimizations, more effective professional education based on Kolb's (1984) [42] experiential learning cycle, which emphasizes concrete experience, reflective observation, abstract conceptualization, and active experimentation, is expected to be realized.
6.4 Limitations of the study and future challenges
The following limitations and issues exist in this study and require further investigation in future studies.
It is hoped that addressing these issues will more clearly demonstrate the effectiveness of the concurrent learning approach in data science education, and lead to the establishment of a practical educational model. In the future, the findings of this study can be used as the basis for planning larger-scale, longer-term research projects.
Finally, this study proposes a new approach to data science education, and the results may have important implications for educational practices and policymaking. However, we must not forget that the essence of education lies in the growth of individual learners. We conclude this paper by emphasizing the importance of flexible educational approaches that respond to the needs and potential of each individual learner as well as technological and methodological advances.
∗ Department of Economics, Shimonoseki City University (email: notsu-ta@shimonoseki-cu.ac.jp). We are grateful to two anonymous referees for helpful comments.