| SDSC3001 - Big Data: The Arts and Science of Scaling | ||||||||||
| * The offering term is subject to change without prior notice | ||||||||||
| Course Aims | ||||||||||
| This course teaches students how to tame the massive data sets that are used intensively in high-impact industrial applications. Students will learn two mainstream categories of technical solutions for big data: algorithmic approaches and systems approaches. For the algorithmic approaches, the course introduces popular streaming algorithms, such as heavy hitters and sketching algorithms, that are used when memory is limited. To deal with huge amounts of data, the instructor will also teach sampling-based algorithms, such as approximate counting, that tame big data by sampling a small, representative collection of the data. For the systems approaches, the instructor will introduce Spark, one of the most popular big data computing frameworks today. Topics in Spark include the MapReduce model, Spark RDDs, DataFrames, Datasets, Spark SQL and Spark ML. | ||||||||||
| Assessment (Indicative only, please check the detailed course information) | ||||||||||
| Continuous Assessment: 70% | ||||||||||
| Examination: 30% | ||||||||||
| Examination Duration: 2 hours | ||||||||||
| Note: To pass the course, a student must obtain a minimum of 40% in the overall mark and a minimum mark of 30% in each of the continuous assessment and examination components. | ||||||||||
| Detailed Course Information | ||||||||||
| SDSC3001.pdf | ||||||||||