Course Syllabus

CSE 4334/5334: Data Mining

- Fall 2020 -

 

Instructor Information

Instructor: Chengkai Li 

Office Number: ERB 628 (will not be used for this class)

Office Telephone Number: 817-272-0162 (will not be used for this class)

Email Address: cli@uta.edu

Faculty Profile: https://idir.uta.edu/cli.html

Office Hours: Mon/Wed 4-5pm, held through Microsoft Teams

Teaching Assistant (TA) Information

TA: Mohammed Samiul Saeef

Email Address: mohammedsamiul.saeef@mavs.uta.edu

Office Hours: Tue/Fri 12-1pm, held through Microsoft Teams

 

Course Information

Section Information

CSE 4334-003, CSE 5334-002, CSE 5334-902

Time and Place of Class Meetings

Mon/Wed 2:30-3:50pm. Synchronous online lectures through Microsoft Teams

The course modality is Hybrid for 4334-003 and 5334-002 and Online for 5334-902. The instructor intends to hold class online as much as possible, and it is possible that all learning activities, quizzes, and exams will be online. However, if the situation changes, the instructor may conduct exams, quizzes, and even lectures in person.

Description of Course Content

This is an introductory course on data mining. Data mining refers to the process of automatically discovering patterns and knowledge from large data repositories, including databases, data warehouses, the Web, document collections, and data streams. We will study the basic topics of data mining, including data preprocessing, data warehousing and OLAP, data cubes, frequent pattern and association rule mining, correlation analysis, classification and prediction, and clustering, as well as advanced topics covering the techniques and applications of data mining in the Web, text, big data, social networks, and computational journalism.

Student Learning Outcomes

  • A solid understanding of the basic concepts, principles, and techniques in data mining
  • The ability to analyze real-world applications, to model data mining problems, and to assess different solutions
  • The ability to design, implement, and evaluate data mining software

Required Textbooks and Other Course Materials

  • (Required) [TSKK] Pang-Ning Tan, Michael Steinbach, Anuj Karpatne, and Vipin Kumar. Introduction to Data Mining, 2nd ed., Pearson, 2019. (Sample chapters at http://www-users.cs.umn.edu/~kumar/dmbook/index.php)
  • (Required for relevant chapters) [MRS] Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze. Introduction to Information Retrieval, Cambridge University Press. 2008. (Free book at http://nlp.stanford.edu/IR-book/)
  • (Reference) [LRU] Jure Leskovec, Anand Rajaraman and Jeff Ullman. Mining of Massive Datasets, 3rd ed., Cambridge University Press, 2020. (Free book at http://www.mmds.org/)
  • (Reference) [HKP] Jiawei Han, Micheline Kamber and Jian Pei. Data Mining: Concepts and Techniques, 3rd ed. (2nd edition is also fine), Morgan Kaufmann Publishers, June 2011.
  • (Reference) Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani. An Introduction to Statistical Learning with Applications in R, 1st ed., Springer, 2013. (Free book at http://faculty.marshall.usc.edu/gareth-james/ISL/)
  • (Reference) I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann, 4th ed. 2016.

Descriptions of major assignments and examinations

  • Pop quizzes (30%): best 6 out of 8 quizzes
  • Programming Assignments (30%): must be done independently
  • Midterm Exam (15%): October 14th, Wednesday, 2:30pm-3:50pm
  • Final Exam (25%): December 14th, Monday, 2pm-4:30pm

Technology Requirements

Lectures and office hours are held through Microsoft Teams, and all assignments are submitted through Canvas. See Time and Place of Class Meetings for the course modality of each section.

Prerequisites

  • For CSE 4334: CSE 3330 Database Systems I and IE 3301 Engineering Probability (or MATH 3313 Introduction to Probability), or consent of the instructor.
  • For CSE 5334: There are no official prerequisites. You should have a sound CSE background from your Bachelor's program (e.g., programming, data structures and algorithms, discrete mathematics, and basic probability and statistics). If you have not taken a database course anywhere, you may still take this course, but please get the consent of the instructor. You must also get the consent of the instructor if you still have CSE deficiency courses to take.

 

Grading Information

Grading


The final letter grades will be based on students' performance. There are no pre-defined cutoffs or grade distributions. Undergraduate and graduate students are evaluated in separate groups.
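The numeric part of this grading scheme can be sketched in Python. This is a hypothetical helper, not an official grading tool, and it rests on assumptions not stated in the syllabus: every quiz, assignment, and exam is scored out of 100, and the programming assignments are averaged with equal weight. It applies the weights listed in this syllabus (quizzes 30% using the best 6 of 8, programming assignments 30%, midterm 15%, final 25%) and stops at the weighted score, since there are no pre-defined letter-grade cutoffs.

```python
def weighted_score(quizzes, assignments, midterm, final):
    """Weighted course score on a 0-100 scale (hypothetical helper).

    Assumes every quiz, assignment, and exam is scored out of 100 and that
    the programming assignments are averaged with equal weight.
    """
    if len(quizzes) != 8:
        raise ValueError("expected 8 quiz scores")
    best_six = sorted(quizzes, reverse=True)[:6]  # drop the two lowest quizzes
    quiz_avg = sum(best_six) / 6
    assignment_avg = sum(assignments) / len(assignments)
    # Weights in percent: quizzes 30, assignments 30, midterm 15, final 25.
    return (30 * quiz_avg + 30 * assignment_avg + 15 * midterm + 25 * final) / 100


print(weighted_score([100, 90, 80, 70, 60, 50, 0, 0], [80, 90, 100], 80, 70))  # 79.0
```

In this example the two zero quiz scores are dropped, so the quiz average is 75 rather than 56.25.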

Assignments and Deadlines

  • All assignments must be submitted through Canvas. We will NOT accept hardcopy or email submissions unless the university verifies that Canvas was malfunctioning or unavailable. If you cannot submit through Canvas because of a technical failure, email your assignment to us together with a screenshot showing the failure, and we will verify it with the university.
  • Everything is due by 11:59pm on the due date, and the deadline is enforced automatically by Canvas. You can still turn in an assignment after the deadline; however, you automatically lose 5 points for each hour after the due time, until your score reaches 0. (Each assignment is worth 100 raw points.) We cannot waive this penalty except in cases of illness or another substantial impediment beyond your control, supported by documentation.
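The late-penalty arithmetic can be made concrete with a minimal Python sketch. It assumes, as our own interpretation rather than stated policy, that the 5-point deduction is subtracted from the earned raw score and that a partially elapsed hour counts as a full hour; `late_score` is a hypothetical helper, not part of Canvas.

```python
import math


def late_score(raw_score: float, hours_late: float, penalty_per_hour: float = 5.0) -> float:
    """Apply the late penalty to a raw score out of 100 (hypothetical helper).

    Assumptions: the penalty is subtracted from the earned raw score, and
    every started hour after the deadline counts as a full hour.
    """
    if hours_late <= 0:
        return raw_score  # on time: no penalty
    hours_charged = math.ceil(hours_late)  # assumption: partial hours round up
    return max(0.0, raw_score - penalty_per_hour * hours_charged)  # floor at 0


print(late_score(90, 0))    # 90
print(late_score(90, 2.5))  # 75.0 (charged 3 hours: 90 - 15)
print(late_score(90, 30))   # 0.0 (penalty exceeds the score)
```

Under these assumptions, 20 charged hours (20 × 5 = 100 points) wipe out even a perfect score.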

Regrading

Regrade requests must be made within 7 days after scores are posted on Canvas. The TA will handle regrade requests. If you are not satisfied with the regrading results, you have 7 days to request again; the instructor will then regrade, and that decision is final.

Grade Grievances

Any appeal of a grade in this course must follow the procedures and deadlines for grade-related grievances as published in the current University Catalog.

Announcements

Stay tuned and make sure to check Canvas on a daily basis. Important announcements will be posted there.

 

Course Schedule

As the instructor for this course, I reserve the right to adjust this schedule in any way that serves the educational needs of the students enrolled in this course. –Chengkai Li

 

Date | # | Lecture | Lecture Notes | Required Reading | Lecture Video

08/26 | 1 | Course Overview | [PDF] | | [Stream]

Multi-Dimensional Data Analytics

08/31 | 2 | Multi-Dimensional Data Analytics: OLAP, data cube | [PDF] | OLAP paper and data cube paper (or HKP Chapter 4 and Chapter 5) | [Stream]
09/02 | 3 | Multi-Dimensional Data Analytics: OLAP, data cube | | | [Stream]
09/07 | | Labor Day Holiday | | |
09/09 | 4 | Multi-Dimensional Data Analytics: skyline | [PDF] [Colab] [Dataset] | | [Stream: part1] [Stream: part2]
09/14 | 5 | Multi-Dimensional Data Analytics: skyline | | | [Stream]
09/16 | 6 | Multi-Dimensional Data Analytics: skyline | | | [Stream]

Overview of IDIR Research

09/21 | 7 | Fact Finding | [PDF] | | [Stream]
09/23 | 8 | Fact Finding | | | [Stream]
09/28 | 9 | Fact Finding | | | [Stream]
09/30 | 10 | Fact Finding | | | [Stream]
10/05 | 11 | Fact Finding | | | [Stream]

Overview of Data Mining

10/07 | 12 | Data and Data Preprocessing | [PDF] | TSKK ch2 | [Stream]
10/12 | 13 | Data and Data Preprocessing | | | [Stream]
10/14 | 14 | In-class Midterm Exam (October 14th, Wednesday, 2:30pm-3:50pm) | | |
10/19 | 15 | Overview of Data Mining | [PDF] | TSKK ch1 | [Stream]

Classification and Prediction

10/21 | 16 | Decision Tree | [PDF] [Colab] [P2 dataset] [P2 test dataset] | TSKK ch3 | [Stream]
10/26 | 17 | Decision Tree | | | [Stream]
10/28 | 18 | Decision Tree | | | [Stream]
11/02 | 19 | Bayesian Classifiers | [PDF] | TSKK ch4 | [Stream]
11/04 | 20 | Bayesian Classifiers | | | [Stream]
11/06 | | Last day to drop class | | |
11/09 | 21 | Nearest Neighbor Classifiers | [PDF] | TSKK ch4 | [Stream]
11/11 | 22 | Evaluating Classification Models | [PDF] | TSKK ch3 | [Stream]
11/16 | 23 | Classification Accuracy Measures | [PDF] | TSKK ch4 | [Stream]
11/18 | 24 | Proximity Measures | [PDF] | TSKK ch2 | [Stream]

Clustering

11/23 | 25 | Proximity Measures | | | [Stream]
11/25 | | No Classes Scheduled | | |
11/30 | 26 | Overview of Clustering | [PDF] [Colab] [Dataset] [P3 test dataset] | TSKK ch7 | [Stream]
12/02 | 27 | K-means | | | [Stream]
12/07 | 28 | Hierarchical clustering | | | [Stream]
12/14 | | Final Exam (December 14th, Monday, 2pm-4:30pm) | | |

 

Institution Information

UTA students are encouraged to review the institutional policies and informational sections below and to reach out to the specific office with any questions. To view this institutional information, please visit the Institutional Information page (https://resources.uta.edu/provost/course-related-info/institutional-policies.php), which includes the following policies, among others:

  • Drop Policy
  • Disability Accommodations
  • Title IX Policy
  • Academic Integrity
  • Student Feedback Survey
  • Final Exam Schedule

 

Additional Information

Mandatory Face Covering Policy

All students and instructional staff are required to wear face coverings while they are on campus, inside buildings and classrooms. Students who fail to comply with the face covering requirement will be asked by the instructor to leave the class session, and students who refuse to leave may be reported to UTA’s Office of Student Conduct. Students who need masks may obtain them at the Central Library, the E.H. Hereford University Center’s front desk, or their department.

Attendance

At The University of Texas at Arlington, taking attendance is not required, but attendance is a critical indicator of student success. Each faculty member is free to develop his or her own methods of evaluating students’ academic performance, which includes establishing course-specific policies on attendance. As the instructor of this section, I will take attendance sporadically. However, while UT Arlington does not require instructors to take attendance in their courses, the U.S. Department of Education requires that the University have a mechanism in place to mark when Federal Student Aid recipients “begin attendance in a course.” UT Arlington instructors will report when students begin attendance in a course as part of the final grading process. Specifically, when assigning a student a grade of F, faculty must report the last date a student attended their class based on evidence such as a test, participation in a class project or presentation, or an engagement online via Canvas. This date is reported to the Department of Education for federal financial aid recipients.

Class meets every Monday and Wednesday, and students are expected to attend all sessions.

Emergency Exit Procedures

Should we experience an emergency event that requires evacuation of the building, students should exit the room and move toward the nearest exit. When exiting the building during an emergency, do not take an elevator but use the stairwells instead. Faculty members and instructional staff will assist students in selecting the safest route for evacuation and will make arrangements to assist individuals with disabilities.  

Student Success Programs

UT Arlington provides a variety of resources and programs designed to help students develop academic skills, deal with personal situations, and better understand concepts and information related to their courses. Resources include tutoring by appointment, drop-in tutoring, etutoring, supplemental instruction, mentoring (time management, study skills, etc.), success coaching, TRIO Student Support Services, and student success workshops. For additional information, please email resources@uta.edu, or view the Maverick Resources website.

The IDEAS Center (https://www.uta.edu/ideas/) (2nd Floor of Central Library) offers FREE tutoring and mentoring to all students with a focus on transfer students, sophomores, veterans and others undergoing a transition to UT Arlington. Students can drop in or check the schedule of available peer tutors at www.uta.edu/IDEAS, or call (817) 272-6593.

The English Writing Center (411LIBR)

The Writing Center offers FREE tutoring in 15-, 30-, 45-, and 60-minute face-to-face and online sessions to all UTA students on any phase of their UTA coursework. Register and make appointments online at the Writing Center (https://uta.mywconline.com). Classroom visits, workshops, and specialized services for graduate students and faculty are also available. Please see Writing Center: OWL for detailed information on all our programs and services.

The Library’s 2nd floor Academic Plaza (http://library.uta.edu/academic-plaza) offers students a central hub of support services, including IDEAS Center, University Advising Services, Transfer UTA and various college/school advising hours. Services are available during the library’s hours of operation.

Librarian to Contact

Each academic unit has access to Librarians by Academic Subject that can assist students with research projects, tutorials on plagiarism and citation references as well as support with databases and course reserves.

 

Emergency Phone Numbers

In case of an on-campus emergency, call the UT Arlington Police Department at 817-272-3003 (non-campus phone) or 2-3003 (campus phone). You may also dial 911. The non-emergency number is 817-272-3381.

 

Library Information

Research or General Library Help

Ask for Help

Resources
