Course Syllabus
CSE 4334/5334: Data Mining
- Fall 2020 -
Instructor Information
Instructor: Chengkai Li
Office Number: ERB 628 (will not be used for this class)
Office Telephone Number: 817-272-0162 (will not be used for this class)
Email Address: cli@uta.edu
Faculty Profile: https://idir.uta.edu/cli.html
Office Hours: Mon/Wed 4-5pm, through Microsoft Teams
Teaching Assistant (TA) Information
TA: Mohammed Samiul Saeef
Email Address: mohammedsamiul.saeef@mavs.uta.edu
Office Hours: Tue/Fri 12-1pm, through Microsoft Teams
Course Information
Section Information
CSE 4334-003, CSE 5334-002, CSE 5334-902
Time and Place of Class Meetings
Mon/Wed 2:30-3:50pm, online synchronous lectures through Microsoft Teams
The course modality is Hybrid for 4334-003 and 5334-002 and Online for 5334-902. The instructor intends to hold activities online as much as possible, and it is possible that all learning activities, quizzes, and exams will be online. However, if the situation changes, the instructor may conduct exams, quizzes, and even lectures in person.
Description of Course Content
This is an introductory course on data mining. Data mining refers to the process of automatically discovering patterns and knowledge from large data repositories, including databases, data warehouses, the Web, document collections, and data streams. We will study the basic topics of data mining, including data preprocessing, data warehousing and OLAP, data cubes, frequent pattern and association rule mining, correlation analysis, classification and prediction, and clustering, as well as advanced topics covering the techniques and applications of data mining in the Web, text, big data, social networks, and computational journalism.
Student Learning Outcomes
A solid understanding of the basic concepts, principles, and techniques in data mining; an ability to analyze real-world applications, to model data mining problems, and to assess different solutions; an ability to design, implement, and evaluate data mining software.
Required Textbooks and Other Course Materials
- (Required) [TSKK] Pang-Ning Tan, Michael Steinbach, Anuj Karpatne, and Vipin Kumar. Introduction to Data Mining, 2nd ed., Pearson, 2019. (Sample chapters at http://www-users.cs.umn.edu/~kumar/dmbook/index.php)
- (Required for relevant chapters) [MRS] Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze. Introduction to Information Retrieval, Cambridge University Press, 2008. (Free book at http://nlp.stanford.edu/IR-book/)
- (Reference) [LRU] Jure Leskovec, Anand Rajaraman and Jeff Ullman. Mining of Massive Datasets, 3rd ed., Cambridge University Press, 2020. (Free book at http://www.mmds.org/)
- (Reference) [HKP] Jiawei Han, Micheline Kamber and Jian Pei. Data Mining: Concepts and Techniques, 3rd ed. (2nd edition is also fine), Morgan Kaufmann Publishers, June 2011.
- (Reference) Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani. An Introduction to Statistical Learning with Applications in R, 1st ed., Springer, 2013. (Free book at http://faculty.marshall.usc.edu/gareth-james/ISL/)
- (Reference) I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, 4th ed., Morgan Kaufmann, 2016.
Descriptions of major assignments and examinations
- Pop quizzes (30%): best 6 out of 8 quizzes
- Programming Assignments (30%): must be done independently
- Midterm Exam (15%): October 14th, Wednesday, 2:30pm-3:50pm
- Final Exam (25%): December 14th, Monday, 2pm-4:30pm
Technology Requirements
- Class team on Microsoft Teams: for lectures, office hours, and instant messaging; use code 46aqhj8 to join if you are not already a member
- Online synchronous lecturing through Microsoft Teams: videos will be automatically made available to enrolled students shortly after each lecture
- Instructor’s office hours through Microsoft Teams
- TA’s office hours through Microsoft Teams
- Canvas: for video recordings of lectures, pop quizzes and exams, releasing and submitting programming assignments, the questions/discussion forum, and releasing grades
- We will use Respondus Monitor and LockDown Browser, together with a webcam, to administer quizzes and exams. Students will need a webcam, a microphone, and Internet access.
- Students can access tutorials on these tools by clicking on the “Get Started” Box on their Canvas Homepage.
Prerequisites
- For CSE 4334: CSE 3330 Database Systems I and IE 3301 Engineering Probability (or MATH 3313 Introduction to Probability), or consent of the instructor.
- For CSE 5334: There are no official prerequisites. You should have a sound CSE background from your bachelor's program (e.g., programming, data structures and algorithms, discrete mathematics, and basic probability and statistics). If you have not taken a database course anywhere, you are allowed to take this course, but please get the consent of the instructor. You must also get the consent of the instructor if you have CSE deficiency courses to take.
Grading Information
Grading
The final letter grades will be based on students' performance. There are no predefined cutoffs or grade distributions. Undergraduate and graduate students are compared in separate groups.
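As an illustration of how the weighted components (quizzes 30%, programming assignments 30%, midterm 15%, final 25%) could combine into one numeric score, here is a minimal Python sketch. It assumes, beyond what the syllabus states, that every quiz, assignment, and exam is scored on a 0-100 scale, that the best 6 of 8 quizzes count equally, and that programming assignments are averaged with equal weight; the function name `course_score` and the sample scores are hypothetical. It yields a numeric score only, since letter grades have no predefined cutoffs.

```python
# Unofficial sketch of the weighted course score.
# Assumptions (not stated in the syllabus): all scores are on a 0-100 scale,
# the best 6 of 8 quizzes count equally, and assignments are averaged equally.

def course_score(quizzes, assignments, midterm, final):
    """Return a weighted numeric score on a 0-100 scale (hypothetical helper)."""
    if len(quizzes) != 8:
        raise ValueError("expected 8 quiz scores")
    best_six = sorted(quizzes, reverse=True)[:6]  # drop the two lowest quizzes
    quiz_avg = sum(best_six) / len(best_six)
    assignment_avg = sum(assignments) / len(assignments)
    return (0.30 * quiz_avg
            + 0.30 * assignment_avg
            + 0.15 * midterm
            + 0.25 * final)


score = course_score(
    quizzes=[90, 80, 0, 100, 70, 85, 95, 60],  # hypothetical: the 0 and 60 are dropped
    assignments=[100, 90, 80],
    midterm=75,
    final=88,
)
print(round(score, 2))  # prints 86.25
```

Because only the best six quizzes count, a single missed quiz (scored 0 here) does not lower the quiz average.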
Assignments and Deadlines
- All assignments must be submitted through Canvas. We will NOT accept hardcopy or email submissions unless the university verifies that Canvas was malfunctioning or unavailable. If you cannot submit through Canvas because of a technical failure, you can email your assignment to us together with a screenshot showing the failure. We will verify it with the university.
- Everything is due by 11:59pm on the due date. The deadline is managed automatically by Canvas. You can still turn in an assignment after the deadline, but you automatically lose 5 points for each hour after the due time, until your score reaches 0. (Each individual assignment is worth 100 raw points.) We cannot waive the penalty unless there is a case of illness or another substantial impediment beyond your control, supported by documentation.
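The late penalty (5 points per hour after the due time, never below 0) can be sketched in Python as follows. This is an unofficial illustration, not how Canvas computes it: it assumes that any started hour after the due time counts as a full hour (the syllabus does not say how partial hours are treated), and the function name `late_adjusted_score` and the sample due date are hypothetical.

```python
import math
from datetime import datetime, timedelta


def late_adjusted_score(raw_score, due, submitted, penalty_per_hour=5):
    """Apply the late penalty to a raw score out of 100 (hypothetical helper).

    Assumption: a partial hour late is rounded up to a full hour.
    """
    late = submitted - due
    if late <= timedelta(0):
        return raw_score  # on time: no penalty
    hours_late = math.ceil(late.total_seconds() / 3600)
    return max(0, raw_score - penalty_per_hour * hours_late)  # floor at 0


due = datetime(2020, 10, 1, 23, 59)  # hypothetical due date at 11:59pm
print(late_adjusted_score(92, due, due + timedelta(minutes=90)))  # 2 started hours: prints 82
print(late_adjusted_score(100, due, due + timedelta(hours=25)))   # penalty exceeds score: prints 0
```

With 100 raw points per assignment, the score reaches 0 after 20 late hours regardless of the raw score.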
Regrading
Regrading requests must be made within 7 days after we post scores on Canvas. The TA will handle regrading requests. If you are not satisfied with the regrading results, you have 7 days to request again. The instructor will then regrade, and that decision is final.
Grade Grievances
Any appeal of a grade in this course must follow the procedures and deadlines for grade-related grievances as published in the current University Catalog.
Announcements
Stay tuned and make sure to check Canvas on a daily basis. Important announcements will be posted there.
Course Schedule
As the instructor for this course, I reserve the right to adjust this schedule in any way that serves the educational needs of the students enrolled in this course. –Chengkai Li
| Date | # | Lecture | Lecture Notes | Required Reading | Lecture Video |
|---|---|---|---|---|---|
| 08/26 | 1 | Course Overview | | | [Stream] |
| | | Topic: Multi-Dimensional Data Analytics | | | |
| 08/31 | 2 | Multi-Dimensional Data Analytics: OLAP, data cube | [PDF] | OLAP paper and data cube paper (or HKP Chapters 4 and 5) | [Stream] |
| 09/02 | 3 | Multi-Dimensional Data Analytics: OLAP, data cube | | | [Stream] |
| 09/07 | | Labor Day Holiday | | | |
| 09/09 | 4 | Multi-Dimensional Data Analytics: skyline | [PDF] [Colab] [Dataset] | | |
| 09/14 | 5 | Multi-Dimensional Data Analytics: skyline | | | [Stream] |
| 09/16 | 6 | Multi-Dimensional Data Analytics: skyline | | | [Stream] |
| | | Topic: Overview of IDIR Research | | | |
| 09/21 | 7 | Fact Finding | [PDF] | | [Stream] |
| 09/23 | 8 | Fact Finding | | | [Stream] |
| 09/28 | 9 | Fact Finding | | | [Stream] |
| 09/30 | 10 | Fact Finding | | | [Stream] |
| 10/05 | 11 | Fact Finding | | | [Stream] |
| | | Topic: Overview of Data Mining | | | |
| 10/07 | 12 | Data and Data Preprocessing | [PDF] | TSKK ch. 2 | [Stream] |
| 10/12 | 13 | Data and Data Preprocessing | | | [Stream] |
| 10/14 | 14 | In-class midterm exam (Wednesday, October 14, 2:30pm-3:50pm) | | | |
| 10/19 | 15 | Overview of Data Mining | [PDF] | TSKK ch. 1 | [Stream] |
| | | Topic: Classification and Prediction | | | |
| 10/21 | 16 | Decision Tree | [PDF] [Colab] | | [Stream] |
| 10/26 | 17 | Decision Tree | | | [Stream] |
| 10/28 | 18 | Decision Tree | | | [Stream] |
| 11/02 | 19 | Bayesian Classifiers | [PDF] | TSKK ch. 4 | [Stream] |
| 11/04 | 20 | Bayesian Classifiers | | | [Stream] |
| 11/06 | | Last day to drop class | | | |
| 11/09 | 21 | Nearest Neighbor Classifiers | [PDF] | TSKK ch. 4 | [Stream] |
| 11/11 | 22 | Evaluating Classification Models | [PDF] | TSKK ch. 3 | [Stream] |
| 11/16 | 23 | Classification Accuracy Measures | [PDF] | TSKK ch. 4 | [Stream] |
| 11/18 | 24 | Proximity Measures | [PDF] | TSKK ch. 2 | [Stream] |
| | | Topic: Clustering | | | |
| 11/23 | 25 | Proximity Measures | | | [Stream] |
| 11/25 | | No classes scheduled | | | |
| 11/30 | 26 | Overview of Clustering | [PDF] [Colab] [Dataset] | | [Stream] |
| 12/02 | 27 | K-means | | | [Stream] |
| 12/07 | 28 | Hierarchical Clustering | | | [Stream] |
| 12/14 | | Final Exam (Monday, December 14, 2pm-4:30pm) | | | |
Institution Information
UTA students are encouraged to review the institutional policies and informational sections below and to reach out to the specific office with any questions. To view this institutional information, please visit the Institutional Information page (https://resources.uta.edu/provost/course-related-info/institutional-policies.php), which includes the following policies among others:
- Drop Policy
- Disability Accommodations
- Title IX Policy
- Academic Integrity
- Student Feedback Survey
- Final Exam Schedule
Additional Information
Mandatory Face Covering Policy
All students and instructional staff are required to wear facial coverings while they are on campus, inside buildings and classrooms. If students need masks, they may obtain them at the Central Library, the E.H. Hereford University Center’s front desk, or in their department. Students who fail to comply with the facial covering requirement will be asked by the instructor to leave the class session, and, if the student refuses to leave, they may be reported to UTA’s Office of Student Conduct.
Attendance
At The University of Texas at Arlington, taking attendance is not required, but attendance is a critical indicator of student success. Each faculty member is free to develop his or her own methods of evaluating students’ academic performance, which includes establishing course-specific policies on attendance. As the instructor of this section, I will take attendance sporadically. Although UT Arlington does not require instructors to take attendance in their courses, the U.S. Department of Education requires that the University have a mechanism in place to mark when Federal Student Aid recipients “begin attendance in a course.” UT Arlington instructors will report when students begin attendance in a course as part of the final grading process. Specifically, when assigning a student a grade of F, faculty must report the last date the student attended their class based on evidence such as a test, participation in a class project or presentation, or an engagement online via Canvas. This date is reported to the Department of Education for federal financial aid recipients.
This course meets every Monday and Wednesday. Students are expected to attend all sessions. See Time and Place of Class Meetings for the modality of each section.
Emergency Exit Procedures
Should we experience an emergency event that requires evacuation of the building, students should exit the room and move toward the nearest exit. When exiting the building during an emergency, do not take an elevator but use the stairwells instead. Faculty members and instructional staff will assist students in selecting the safest route for evacuation and will make arrangements to assist individuals with disabilities.
Student Success Programs
UT Arlington provides a variety of resources and programs designed to help students develop academic skills, deal with personal situations, and better understand concepts and information related to their courses. Resources include tutoring by appointment, drop-in tutoring, etutoring, supplemental instruction, mentoring (time management, study skills, etc.), success coaching, TRIO Student Support Services, and student success workshops. For additional information, please email resources@uta.edu, or view the Maverick Resources website.
The IDEAS Center (https://www.uta.edu/ideas/) (2nd Floor of Central Library) offers FREE tutoring and mentoring to all students with a focus on transfer students, sophomores, veterans and others undergoing a transition to UT Arlington. Students can drop in or check the schedule of available peer tutors at www.uta.edu/IDEAS, or call (817) 272-6593.
The English Writing Center (411LIBR)
The Writing Center offers FREE tutoring in 15-, 30-, 45-, and 60-minute face-to-face and online sessions to all UTA students on any phase of their UTA coursework. Register and make appointments online at the Writing Center (https://uta.mywconline.com). Classroom visits, workshops, and specialized services for graduate students and faculty are also available. Please see Writing Center: OWL for detailed information on all our programs and services.
The Library’s 2nd floor Academic Plaza (http://library.uta.edu/academic-plaza) offers students a central hub of support services, including IDEAS Center, University Advising Services, Transfer UTA and various college/school advising hours. Services are available during the library’s hours of operation.
Librarian to Contact
Each academic unit has access to Librarians by Academic Subject that can assist students with research projects, tutorials on plagiarism and citation references as well as support with databases and course reserves.
Emergency Phone Numbers
In case of an on-campus emergency, call the UT Arlington Police Department at 817-272-3003 (from a non-campus phone) or 2-3003 (from a campus phone). You may also dial 911. The non-emergency number is 817-272-3381.
Library Information
Research or General Library Help
Ask for Help
- Academic Plaza Consultation Services (uta.edu/academic-plaza)
- Ask Us (uta.edu/)
- Research Coaches (http://libguides.uta.edu/researchcoach)
Resources
- Library Tutorials (uta.edu/how-to)
- Subject and Course Research Guides (uta.edu)
- Librarians by Subject (library.uta.edu/subject-librarians)
- A to Z List of Library Databases (uta.edu/az.php)
- Course Reserves (https://uta.summon.serialssolutions.com/#!/course_reserves)
- Study Room Reservations (uta.edu/)