I. INTRODUCTION
Nowadays, Data Science is affecting almost every sector of industry. Work across nearly all domains is becoming more data-driven, which affects both the jobs available and the skills required. Accordingly, the demand for Data Science talent is growing rapidly, and academic institutions are actively offering relevant degree programs.
In particular, Data Science is an emerging discipline that was born in the United States, so U.S. universities are leading the way in providing educational services. ‘Discover Data Science [1]’ reported that five years ago a bachelor’s degree in Data Science was nearly nonexistent, but now colleges and universities are adding new Data Science programs (DS programs) each academic year. In fact, over 50 schools across the U.S. offer a Data Science major. Commonly, four-year undergraduate DS programs are offered by a Computer Science, Mathematics, Statistics, or newly established Data Science department. They are also provided as interdisciplinary programs in collaboration with multiple departments [2].
According to the Korea Council of University Educations [3], as of 2022, there are 18 Data Science-related recruitment units at four-year universities in Korea, including the AI·Data Department at Korea National University of Transportation. Unlike those in the U.S., Data Science-related undergraduate programs in Korea are still in their infancy. Newly established Data Science departments face difficulties in designing an essential curriculum and course objectives.
This paper presents the results of a survey and analysis of the curricula of Data Science undergraduate programs in the United States. In particular, we investigated the characteristics of the curriculum according to the hosting department of the DS program. We also analyzed how much emphasis is placed on courses belonging to the Data Science area, such as data management, data visualization, and data modeling. The results are intended to help readers understand Data Science undergraduate programs and develop them for better education in this attractive new discipline.
II. RELATED WORKS
Data Science programs are multidisciplinary, and a single subject domain is not enough to cover the breadth of content and skills needed for DS programs [2]. In the United States, most of today’s DS program courses cover three foundation areas: Mathematics, Statistics, and Computer Science [4]. In addition to the foundation courses, several Data Science curriculum guidelines emphasize Data Science-specific courses such as data curation, data visualization, data modeling, and data mining [4-6]. It is also strongly recommended to include courses outside of Mathematics, Statistics, and Computer Science to accommodate a wide range of Data Science applications [4-5]. Furthermore, the report of the ‘National Academies of Sciences, Engineering, and Medicine [4]’ focuses on developing data insights through a curriculum that includes important Data Science concepts, applications to real-world problems with an understanding of their limitations, and ethical issues related to Data Science. This report is widely adopted as a guideline for the curricula of Data Science undergraduate programs [7].
According to the ACM Taskforce on Data Science Education [8], since industry employers require more computing skills than statistical or mathematical skills, undergraduate programs should strengthen students’ computing skills.
The authors in [2] reported 107 Data Science academic programs in the United States. Although DS programs can be offered as graduate, undergraduate, certificate, and minor programs, they considered only the regular programs at the B.S., M.S., and Ph.D. levels, which numbered 49, 45, and 13, respectively. The 49 B.S. programs consist of 25 programs from Doctoral Universities, 16 programs from Master’s Colleges, and 9 programs from Baccalaureate Colleges. They also reported the distribution of the hosting departments of the 49 B.S. programs. The standout departments were Computer Science and Data Science, with 20 and 8 programs, or 40% and 16%, respectively.
In research [6], 101 B.S. and B.A. degrees in Data Science were reviewed. The authors showed that almost all of the DS programs included introductory statistics, and more than 90% of the programs included introductory computer programming.
In addition to reviews of the distribution of the hosting departments or subject domains of DS programs, research [7] quantified the relative amount of coursework in three categories: computer science, statistics/mathematics, and domain knowledge. The authors investigated 16 Data Science undergraduate programs from Doctoral Universities in the U.S. For the quantitative evaluation of DS programs, they adopted an explicit framework describing the components of Data Science education: the framework presented in the National Academies of Sciences, Engineering, and Medicine (NASEM) report [4]. Based on this framework, they developed a rubric and coded the undergraduate Data Science curricula on a four-point scale, using a direct survey methodology, to indicate the familiarity with a topic that could be expected from a student graduating from the program. As a result, training in Computational Foundations was one of the highest-scoring areas. Most programs also scored well in Statistics and Mathematics training.
Because DS programs are multidisciplinary, many curricula have a balanced composition across the disciplines of Computer Science, Mathematics, and Statistics. In recent years, however, independent Data Science-related departments have rapidly emerged to offer Data Science degree programs. This paper analyzes how the curriculum differs for independent Data Science departments.
III. ANALYSIS OF CURRICULA IN DATA SCIENCE PROGRAMS
For the curricula analysis, we focused on the 40 campus-based B.S. programs listed in ‘Data Science Colleges and Universities [9]’. Like research [2], we chose the regular B.S. programs in Data Science. Unlike research [7], we investigated Data Science undergraduate programs from Doctoral Universities, Master’s Colleges, and Baccalaureate Colleges in the U.S. See the Appendix for details.
The distribution of the hosting departments is shown in Fig. 1. Among the 40 B.S. programs, 11 were offered by Computer Science, 8 by Data Science, 6 by Mathematics, and 5 by Statistics departments. Similar to research [2], the prominent departments were Computer Science and Data Science, accounting for 27.5% and 20%, respectively.
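The reported shares can be checked with a short calculation. The following is a minimal sketch that uses only the counts stated above; the variable and function names are illustrative. It also shows that the four named departments account for 30 of the 40 programs, so the hosting units of the remaining programs are not itemized in the text.

```python
# Counts of hosting departments as stated in the text (40 programs in total).
TOTAL_PROGRAMS = 40
named_departments = {
    "Computer Science": 11,
    "Data Science": 8,
    "Mathematics": 6,
    "Statistics": 5,
}


def share_percent(count: int, total: int) -> float:
    """Return count as a percentage of total."""
    return 100.0 * count / total


shares = {
    dept: share_percent(n, TOTAL_PROGRAMS)
    for dept, n in named_departments.items()
}

# Programs whose hosting department is not itemized in the text.
not_itemized = TOTAL_PROGRAMS - sum(named_departments.values())

for dept, pct in shares.items():
    print(f"{dept}: {pct:.1f}%")
print(f"Not itemized in the text: {not_itemized} programs")
```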
For the analysis of courses in DS programs, 870 course titles were collected from the B.S. degree course requirements of the 40 programs. In many cases, course titles vary from program to program. For example, courses related to ‘Programming’ have different titles such as ‘Introduction to Programing’, ‘Applied Programming’, ‘Programming Methods’, etc. These courses were renamed to ‘Programming’. In order to preserve the original meaning of the courses, only minimal changes were made.
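The renaming step described above could be sketched as follows. This is an illustrative sketch and not the procedure actually used in this study: the rule table `CANONICAL_RULES` and the function `normalize_title` are hypothetical names, and only the ‘Programming’ rule is taken from the text.

```python
import re

# Hypothetical rule table: each canonical title is paired with a pattern that
# matches its variants. Only the 'Programming' rule comes from the text.
# The pattern accepts both 'Programing' and 'Programming'.
CANONICAL_RULES = [
    ("Programming", re.compile(r"\bprogram+ing\b", re.IGNORECASE)),
]


def normalize_title(title: str) -> str:
    """Map a raw course title to a canonical title, or return it unchanged."""
    cleaned = " ".join(title.split())  # collapse stray whitespace
    for canonical, pattern in CANONICAL_RULES:
        if pattern.search(cleaned):
            return canonical
    return cleaned  # minimal change: unmatched titles keep their wording


for raw in ["Introduction to Programing", "Applied Programming", "Programming Methods"]:
    print(raw, "->", normalize_title(raw))
```

A purely pattern-based rule would also rename a title such as ‘Linear Programming’, which is an optimization course, so in practice each rule would need a manual check to preserve the original meaning of the course.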
Table 1 shows the 17 popular courses, each of which appears more than 10 times among the 870 course titles.
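Selecting the popular courses amounts to a frequency count over the normalized titles with a threshold of more than 10 occurrences. A minimal sketch is given below; the function name `popular_courses` is hypothetical, and the sample titles are made up for illustration and are not the survey data.

```python
from collections import Counter


def popular_courses(titles, min_count=10):
    """Return (title, count) pairs occurring more than min_count times,
    ordered from most to least frequent."""
    counts = Counter(titles)
    return [(title, n) for title, n in counts.most_common() if n > min_count]


# Made-up titles for illustration only; not the survey data.
sample = ["Programming"] * 12 + ["Linear Algebra"] * 11 + ["Data Ethics"] * 10
print(popular_courses(sample))
```

In this sample, the title that appears exactly 10 times is excluded, because the criterion is "more than 10 times".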
In order to define the subject areas for the analysis of DS curricula, we used the guidelines presented in the NASEM report [4], as in research [7]. The NASEM guidelines list 10 key concept areas for describing the components of Data Science education. As shown in Table 2, we derived 5 subject areas (or disciplines) from the 10 concept areas.
To find the distribution, all courses were grouped into the five subject areas in Table 2. Then, a cross-tabulation analysis was performed using two variables, ‘Department’ and ‘Area’. Fig. 2 shows that all departments have the highest number of courses in the CS area, and that the STAT area ranks second in every department except ‘Mathematics’.
The ACM Taskforce on Data Science Education [8] states that undergraduate programs should strengthen students’ computing skills because industry employers require more computing skills than statistical or mathematical skills. The results of the analysis show that the surveyed programs are meeting this need.
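The cross-tabulation of ‘Department’ and ‘Area’ described above can be sketched with the Python standard library. This is a minimal illustration and not the tool used in this study: the function names `cross_tab` and `ranked_areas` are hypothetical, and the sample records are made up and do not reproduce Fig. 2.

```python
from collections import Counter, defaultdict


def cross_tab(records):
    """Count courses for every (department, area) pair.

    records: iterable of (department, area) tuples, one per course.
    Returns a dict mapping each department to a Counter of areas.
    """
    table = defaultdict(Counter)
    for department, area in records:
        table[department][area] += 1
    return dict(table)


def ranked_areas(table, department):
    """Areas of one department ordered from most to fewest courses."""
    return [area for area, _ in table[department].most_common()]


# Made-up records for illustration only; not the survey data.
sample = (
    [("Data Science", "CS")] * 5
    + [("Data Science", "STAT")] * 3
    + [("Data Science", "DS")] * 2
    + [("Mathematics", "CS")] * 4
    + [("Mathematics", "MATH")] * 3
    + [("Mathematics", "STAT")] * 1
)
table = cross_tab(sample)
for dept in table:
    print(dept, ranked_areas(table, dept))
```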
Courses in DS programs vary in importance depending on whether they are ‘required’ or ‘elective’. Moreover, among ‘elective’ courses, some are chosen from a group of two courses and others from a group of three or more. For example, if three or more courses are listed after a sentence such as “Take one of the following courses:” in the required courses of a DS program, each of these courses is an elective chosen from three or more. The necessity of a course can indicate its importance in DS programs, so we assigned a ‘weight’ to each course according to its necessity: ‘3’ means ‘required’, ‘2’ means ‘elective’ chosen from two courses, and ‘1’ means ‘elective’ chosen from three or more courses.
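The weighting rule can be written as a small function. The following is a minimal sketch of the 3/2/1 scheme described above; the function name `course_weight` and its parameters are hypothetical, and treating a group with fewer than two options as an error is an assumption of this sketch.

```python
def course_weight(required: bool, options_in_group: int = 1) -> int:
    """Weight of a course under the 3/2/1 scheme.

    required: True if every student must take the course.
    options_in_group: size of the "Take one of the following courses:"
        group the course belongs to (ignored when required is True).
    """
    if required:
        return 3
    if options_in_group < 2:
        # Assumption of this sketch: an elective always has an alternative.
        raise ValueError("an elective group needs at least two options")
    return 2 if options_in_group == 2 else 1


print(course_weight(True))       # required course
print(course_weight(False, 2))   # elective chosen from two courses
print(course_weight(False, 4))   # elective chosen from three or more courses
```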
After assigning a weight to each course, we compared how the weight average varies by hosting department. As shown in Fig. 3, courses in Data Science departments have the highest weight average. This shows that Data Science departments have a higher proportion of compulsory courses than other departments. This suggests that earning a B.S. degree in Data Science may demand more effort in a new Data Science department than in a traditional department.
To determine the importance of the subject areas, i.e., the disciplines in Table 2, we compared how weight averages varied across the five disciplines. As shown in Fig. 4, courses in the DS area have the highest weight average. This means that, across all departments, courses in the DS area are more often required than courses in traditional areas such as CS, MATH, or STAT. In contrast to research [7], which found that Computational Foundations was one of the highest-scoring areas, we found that Data Science was the highest-scoring area when course weight is used to indicate necessity.
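Both comparisons above reduce to averaging course weights within groups, once by hosting department and once by subject area. A minimal sketch follows; the function name `mean_weight_by` and the record layout are hypothetical, and the sample records are made up and do not reproduce Fig. 3 or Fig. 4.

```python
from collections import defaultdict


def mean_weight_by(courses, key):
    """Average course weight for each value of key ('department' or 'area')."""
    totals = defaultdict(lambda: [0, 0])  # group -> [sum of weights, course count]
    for course in courses:
        bucket = totals[course[key]]
        bucket[0] += course["weight"]
        bucket[1] += 1
    return {group: total / n for group, (total, n) in totals.items()}


# Made-up records for illustration only; not the survey data.
sample = [
    {"department": "Data Science", "area": "DS", "weight": 3},
    {"department": "Data Science", "area": "CS", "weight": 3},
    {"department": "Computer Science", "area": "DS", "weight": 3},
    {"department": "Computer Science", "area": "CS", "weight": 1},
]
by_department = mean_weight_by(sample, "department")
by_area = mean_weight_by(sample, "area")
print(by_department)
print(by_area)
```

Because the weights lie between 1 and 3, a group average close to 3 indicates that most of its courses are required.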
IV. CONCLUSION
Due to expanding demand, a growing number of U.S. universities are offering undergraduate programs in Data Science. In this paper, we analyzed the characteristics of the curricula of Data Science B.S. programs in the U.S. We investigated these characteristics according to the hosting department of the DS program. We also analyzed how much emphasis is placed on courses belonging to the Data Science area. For quantitative evaluation, we used a course weight to indicate the necessity of each course.
First, as shown in Fig. 2, the CS area accounts for the highest percentage of courses in all departments, which means that CS courses are the most numerous across the 40 programs. This result shows that the surveyed programs are meeting the need identified by the ACM Taskforce on Data Science Education [8], namely that undergraduate programs should strengthen students’ computing skills.
Second, when weights were assigned according to the necessity of each course and the averages were compared by department, the weight average of the Data Science department was found to be the highest, as shown in Fig. 3. Although Data Science departments are fewer in number than Computer Science departments, as shown in Fig. 1, they are newly established departments that fit the Data Science discipline and are growing fast. Compared to the existing departments offering DS programs, these new departments operate a larger proportion of required courses to nurture talent suited to their purpose.
Third, when course weights were compared by discipline, the weight average of the DS area was found to be the highest, as shown in Fig. 4. This shows that although many MATH-related courses are placed among the foundation courses, required courses are more concentrated in the DS area. Also, unlike in research [7], Data Science was the highest-scoring area when course weight is used to indicate course necessity. This result suggests that courses belonging to the Data Science area, such as data management, data visualization, and data modeling, should be treated with great importance in the Data Science B.S. curriculum.
The limitation of this curriculum analysis is that it did not include all courses of Data Science programs. Capstone courses required in most programs were excluded, and courses related to Data Science application fields recommended as electives were also excluded from the analysis. Case analysis studies that include a wider range of courses are needed in the future.