Emphasis Area Overview
Graduates of the Master of Science in Data Science and Analytics who pursue the High Performance Computing (HPC) emphasis area will achieve the following educational objectives in addition to the core program objectives, while becoming immersed in Big Data computational ecosystems.
Courses
This course introduces the main concepts and techniques of data mining and information retrieval. It covers a variety of data mining topics and methods for extracting hidden and predictive patterns from large data collections. It also covers the theory and techniques for modeling, indexing, and retrieving data from relational, non-relational, text-based, and multimedia databases. Topics include an introduction to the data mining process, frequent pattern mining, and pattern analysis, as well as information retrieval models and their evaluation, query languages and operations, and indexing and searching methods.
3 Credit Hours
This course introduces students to cluster and cloud computing big data ecosystems. Topics include a survey of cloud computing platforms, architectures, and use cases. Students will examine how to scale data science techniques and algorithms using a variety of cluster and cloud paradigms, such as those built on Hadoop (MapReduce) concepts, cloud services, and others.
3 Credit Hours
This course provides an in-depth treatment of the evolution of high performance, parallel computing architectures and of how these architectures and computational ecosystems support data science. Topics include parallel algorithms for numerical processing, parallel data search, and other parallel computing algorithms that facilitate advanced analytics. To reinforce lecture topics, learning activities will be completed using parallel computing techniques for modern multicore and multi-node systems. Students will investigate, select, and then develop parallel algorithms for various scientific data analytics problems. Programming projects will be completed in Python and R, leveraging parallel and distributed computing infrastructure such as AWS Elastic MapReduce and Google BigQuery, as well as other parallel computing architectures. Students will also research emerging parallel and scalable architectures for data analytics.
3 Credit Hours