Data Science

Note: This page may be updated in the future with additional information.

  • Course: CSC 405/605 - Data Science
  • Schedule: Monday and Wednesday 5:30 pm - 6:45 pm
  • Instructor: Dr. Kelechi M. Ikegwu
  • Location:  219 Petty

Class Discussions: https://discord.gg/N2rhnw7h3J

  • The #welcome-and-rules channel contains the rules for the channel
  • Class announcements will be in #announcements
  • #resoucres will contain interesting or useful articles about data science
  • #class-dicussions is the channel for class discussions
  • #group-formation for forming groups
  • Use #off-topic channel for any topics related to data science
  • Office Hrs: Thursday 5:00 pm - 6:00 pm via Zoom only (email for appointment and Zoom link)

  • Email: kmikegwu@uncg.edu

Course Description

In a world where humans and machines generate ever-increasing amounts of data, the field of computer science has seen a transition from computation-intensive solutions to data-intensive ones. In such a scenario, solutions to real-world problems can often be derived or learned by analyzing disparate, complex, and messy datasets using Data Science methods and approaches.

This course is highly interactive, and will explore the theories, techniques, and the tools necessary to gain insights from such datasets. Using a problem-based learning philosophy, students are expected to make use of such technologies to design data solutions that can process and analyze real-world datasets for a variety of scientific, social, and environmental challenges.

The core topics addressed by the course will be:

  • Programming with Data
  • Data Mining, Munging, Wrangling
  • Statistics, Analytics, Representation, Visualization
  • Introduction to Applied Machine-Learning

Prerequisites

CSC 339 (Programming Languages) OR Programming experience (Instructor Permission Required)

Textbooks

There is no required text for the course. Relevant articles and publicly available books will be provided for download. Class slides will also be available for download.

Course Overview

This course is highly interactive and based on the problem-based learning philosophy; students are expected to make use of said technologies to design highly scalable systems that can process and analyze real-world datasets for a variety of scientific, social, and environmental challenges.

Course Topics and Schedule (Tentative)

  1. Introduction to Data Science: (Week 1)

    • Class Syllabus and Introduction
    • Class Project discussion and assignment
  2. Startup Tools and Programming (Weeks 2-3)

    • Programming
      1. Re/Introduction to Python
      2. IPython, IPython-Notebook
    • Data Science Reproducibility
      1. Setting up your Repository – Data, Code, and Documentation
      2. Using Version Control with Git
    • Final Project Discussions - Goals and Requirements
  3. Data Munging, Wrangling, Cleaning (Weeks 4-5)

    • Data Structures for Data Science
    • Data Manipulation
      1. Selection - Indexing
      2. Handling Missing Data
      3. Aggregation
      4. Descriptive Statistics
      5. Merging / Join
      6. Working with Date-Time
    • Project Review - Stage I
  4. Data and Statistics (Weeks 6-9)

    • Distributions
    • Point Estimates
    • Statistical Hypothesis Testing
    • Correlation
    • Distribution Estimators
      1. MoM, MLE, KDE
    • Project Review - Stage II
  5. Introduction to Applied Data Modeling: (Weeks 10-12)

    • Applied Machine Learning
    • Regression and Feature Selection
    • Bias versus Variance
    • Clustering and Dimensionality Reduction
    • Validation and Model Performance
    • Project Review - Stage III
  6. Data Visualization (Weeks 13-14)

    • Graph Generation
      1. Types of Graphs
      2. Customizing Plots
      3. Visualizing Errors
      4. Interactive / Dynamic Graphs
    • Visualization Best Practices
    • Project Review - Stage IV
  • Project Presentations: (Week 15 – Finals Week)

Class slides and IPython notebooks will be available here.

Grading

Grade | Max% to Min%
A | 100% to 94%
A- | < 94% to 90%
B+ | < 90% to 87%
B | < 87% to 84%
B- | < 84% to 80%
C+ | < 80% to 77%
C | < 77% to 74%
C- | < 74% to 70%
D+ | < 70% to 67%
D | < 67% to 64%
D- | < 64% to 60%
F | < 60% to 0%
  1. Class / Homework Assignments (4): 30%

    Four programming-based, in-class homework assignments will be given, covering the tools learned in class. Absolutely no collaboration is allowed on assignments. Students must upload their individual assignments (as notebooks) to GitHub. The homework assignments for the class are listed below:

    1. TBD
    2. TBD
    3. TBD
    4. TBD
  2. Final Project: 70%

    The final project of the class will focus on the end-to-end development of an analytical model. The project will be split into four stages:

    • Stage I Data/Project Understanding,
    • Stage II Modeling,
    • Stage III Basic Machine Learning, and
    • Stage IV Visualization and Dashboard.

    This will be a team-based effort: in the first week of the course, students split into teams of 4-5. After completing each stage, each team gives a short presentation (3-5 minutes) and submits a one-page report on its progress. The projects will be open source, and teams must use GitHub as their code repository. Upon completion of the project, each team presents its software and results in a 20-minute presentation.

    1. Each stage of the final project is worth 17.5 points. The stages are equally weighted in the final project score.
      1. Each stage has the following deliverables:
        1. Report
        2. Code (Jupyter/IPython notebooks)
        3. Presentation
      2. To get full points for a stage, you need to complete all of its deliverables.
    2. Graduate Students Only: For Stage IV, 80 percent of your points come from your project and 20 percent from the project report. Minimum length: 5 pages for a single author, 8 for 2 authors, and 12 for 3 authors (figures and references included). Template:. Example: (Due: 04/20/2022)

Total: 100%
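
The weighting above can be sketched as a small calculation. This is only an illustration, not an official grade calculator: it assumes the four assignments are weighted equally within the 30% (the syllabus does not say), ignores the graduate-student Stage IV split, and uses hypothetical function names.

```python
# Weights from the syllabus: 4 assignments = 30% total, 4 project stages = 17.5% each.
ASSIGNMENT_WEIGHT = 30.0
STAGE_WEIGHT = 17.5

# Inclusive lower bound of each letter grade, taken from the grading table.
GRADE_FLOORS = [
    (94, "A"), (90, "A-"), (87, "B+"), (84, "B"), (80, "B-"), (77, "C+"),
    (74, "C"), (70, "C-"), (67, "D+"), (64, "D"), (60, "D-"),
]


def course_percentage(assignment_scores, stage_scores):
    """Combine scores (fractions in [0, 1]) into a course percentage.

    Assumption: each assignment counts equally toward the 30%.
    """
    if len(assignment_scores) != 4 or len(stage_scores) != 4:
        raise ValueError("expected 4 assignment scores and 4 stage scores")
    assignments = ASSIGNMENT_WEIGHT * sum(assignment_scores) / len(assignment_scores)
    project = STAGE_WEIGHT * sum(stage_scores)
    return assignments + project


def letter_grade(percentage):
    """Map a course percentage to a letter using the table's lower bounds."""
    for floor, letter in GRADE_FLOORS:
        if percentage >= floor:
            return letter
    return "F"
```

For example, a student with half credit on every assignment and full credit on every project stage would have `course_percentage([0.5] * 4, [1.0] * 4)`, which combines 15 points from assignments with 70 points from the project.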

Deadlines

Note: Time of deadline is 11:59 PM

Category | Sub-Category | Deadline
Assignment | GitHub Setup | 01/19/2022
Assignment | Assignment 1 | 01/30/2022
Assignment | Assignment 2 | 02/13/2022
Assignment | Assignment 3 | 03/06/2022
Assignment | Assignment 4 | 04/17/2022
Project | Groups Formed | 01/19/2022
Project | Stage I | 02/20/2022
Project | Stage II | 03/16/2022
Project | Stage III | 04/06/2022
Project | Stage IV | 04/27/2022

Submissions

  • Github Setup:

    • Create a private GitHub repository (under your own account) and name it CSC-405-605_Spring-2022_Assignment.
    • Give me and our TA access to the repository:
      • My username: ikegwukc
      • Our TA is: (TBD)
    • Create a folder within the repository /Assignment_1
      • Create two sub-folders /src and /data
      • Work on your assignment (under /src)
        • IPython notebooks only (.ipynb). Plain Python scripts (.py) will not be accepted.
        • Comment your code appropriately in Markdown.
    • Once you are done with your solution, enter the link to it in the assignment text entry (on Canvas).
    • Your notebook should contain the output of your cells. If no output is rendered, we will not grade it.
    • No collaboration at all on assignments.
  • Project:

    • Your code and documentation will reside in a project repository.
    • The structure of the repository should be maintained as follows:
      • /src - code and notebooks
        • /team
          • /stage_X
        • /member
          • /{member_name}
            • /stage_X
      • /data - data folder for the repository
        • /stage_X
      • /utility - utility or scripts
      • /doc - documentation - project reports and presentations
      • README.md - Description of project, deliverables, team members (see Stage I for details)
      • All src files (notebooks) should use relative paths.
    • Each project stage has separate deliverables. Notebooks need to be pushed to the repository for grading. We will grade the state of the repository at the time of the deadline.
    • Each team records a presentation of its project stage and uploads it to Canvas. Top presentations will be discussed in class.
    • No collaboration on member tasks.
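
The submission rules above can be sketched with a few helpers. This is a hedged illustration rather than an official course tool. The function names, the `stage_1` to `stage_4` folder naming (standing in for `/stage_X`), and the member names passed in are all assumptions. The output check relies only on the fact that an `.ipynb` file is JSON with a `cells` list in which code cells carry an `outputs` list.

```python
import json
from pathlib import Path


def create_project_skeleton(root, members, stages=(1, 2, 3, 4)):
    """Create the folder layout described above under `root` (hypothetical helper)."""
    root = Path(root)
    for stage in stages:
        stage_dir = f"stage_{stage}"  # assumed naming for /stage_X
        (root / "src" / "team" / stage_dir).mkdir(parents=True, exist_ok=True)
        (root / "data" / stage_dir).mkdir(parents=True, exist_ok=True)
        for member in members:
            (root / "src" / "member" / member / stage_dir).mkdir(
                parents=True, exist_ok=True
            )
    for name in ("utility", "doc"):
        (root / name).mkdir(parents=True, exist_ok=True)
    (root / "README.md").touch(exist_ok=True)
    return root


# Relative path to the stage data from a notebook in src/team/stage_1:
# three levels up reaches the repository root.
DATA_DIR = Path("..") / ".." / ".." / "data" / "stage_1"


def code_cells_without_output(notebook):
    """Return indices of non-empty code cells that have no stored output."""
    missing = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        # `source` may be a single string or a list of lines; join handles both.
        source = "".join(cell.get("source", []))
        if source.strip() and not cell.get("outputs"):
            missing.append(index)
    return missing


def check_notebook_file(path):
    """Load an .ipynb file and report code cells without output."""
    with Path(path).open(encoding="utf-8") as handle:
        return code_cells_without_output(json.load(handle))
```

Two caveats: Git does not track empty directories, so each folder needs at least one file before it shows up in the repository, and a cell such as a plain assignment legitimately produces no output, so flagged cells are only a prompt to re-run and double-check the notebook before submitting.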

Communication

Discord channel for class discussions and team creation: https://discord.gg/N2rhnw7h3J. The channel should be used for general questions related to assignments and projects. Use it to ask questions and to find answers to questions that have already been addressed. If a question has already been answered in the channel, I will not respond to emails about it. Emails are one-to-one conversations that take a lot of time, so the channel exists to broadcast information and support more community-oriented discussion. Do not share code or screenshots of code in the channel. Email should be the last step and can be used for student-specific questions.

Presentation Pointers

  • You will be reviewed on the following criteria:
    • Criterion 1 (C1): Organize/Create information/slides in a manner appropriate for the intended audience
    • Criterion 2 (C2): Deliver information in a manner appropriate for the intended audience
    • Criterion 3 (C3): Relate to the intended audience
  • Each criterion is scored on the following scale (higher is better):
    • 4 - Exceeds Criteria: Excellent organization; information is well organized. Clear introduction; main points well stated and argued, with smooth transition to next point. Clear summary and conclusion.
    • 3 - Meets Criteria: Satisfactory organization; clear introduction; main points are well stated; some transitions are somewhat sudden. Clear conclusion.
    • 2 - Progressing to Criteria: Information is somewhat organized. Audience may have difficulty following presentation in areas.
    • 1 - Below Expectations: Presentation is unorganized. Introduction unclear. Audience has difficulty following presentation. Presentation contains abrupt jumps; some of the main points and conclusion are unclear.

Project Teams:

Note: Teams, along with team repositories, will be listed here.

Academic Honesty Policy

The instructor will deal strictly with any violations of academic honesty and integrity in this course. See the Academic Integrity Policy (Link) for more details. Absolutely no discussion, collaboration, copying, or sharing is allowed on assignments. This includes copying from the internet. Any student who violates this policy will receive an “F” in the course, and the instructor will report the case to the university.

Attendance Policy

Attendance is required for all the class meetings. If you will be absent for any class it is your responsibility to catch up on class materials.

Special Needs and/or Disabilities

Students with disabilities should have documentation from the Office of Accessibility Resources & Services (Link). This documentation should be provided to the instructor for review. In the case of major provisions such as a separate testing environment or test readers, the student must make arrangements with the Office of Accessibility Resources & Services so that suitable accommodations can be provided.

COVID Statement

As we return for spring 2022, all students, faculty, and staff are required to uphold UNCG’s culture of care by actively engaging in behaviors that limit the spread of COVID-19. These actions include, but are not limited to:

  • Following face-covering guidelines
  • Engaging in proper hand-washing hygiene
  • Self-monitoring for symptoms of COVID-19
  • Staying home when ill
  • Complying with directions from health care providers or public health officials to quarantine or isolate if ill or exposed to someone who is ill
  • Completing a self-report when experiencing COVID-19 symptoms, testing positive for COVID-19, or being identified as a close contact of someone who has tested positive
  • Staying informed about the University's policies and announcements via the COVID-19 website

Instructors will have seating charts for their classes. These are important for facilitating contact tracing should there be a confirmed case of COVID-19. Students must sit in their assigned seats at every class meeting. Students may move their chairs in class to facilitate group work, as long as instructors keep seating chart records. Students should not eat or drink during class time.

A limited number of disposable masks will be available in classrooms for students who have forgotten theirs. Face coverings are also available for purchase in the UNCG Campus Bookstore. Students who do not follow masking requirements will be asked to put on a face covering or leave the classroom to retrieve one and only return when they follow the basic standards of safety and care for the UNCG community. Once students have a face covering, they are permitted to re-enter a class already in progress. Repeated issues may result in conduct action. The course policies regarding attendance and academics remain in effect for partial or full absence from class due to lack of adherence with face covering and other requirements.

For instances where the Office of Accessibility Resources and Services (OARS) has granted accommodations regarding wearing face coverings, students should contact their instructors to develop appropriate alternatives to class participation and/or activities as needed. Instructors or the student may also contact OARS (336.334.5440) who, in consultation with Student Health services, will review requests for accommodations.

Super Useful Links :)

Will update as needed with useful links.

Jupyter Notebooks

Exploratory Data Analytics:

Feature Engineering:

Missing Value Analysis and Cleaning:

Pandas and Big Data:

Group Assignments

Index | Group Number | Name
0 | 1 | Venkata sai phani raj kondapalli
1 | 1 | Jaya Krishna mundru
2 | 1 | Akhilesh Pathi
3 | 1 | Harinath Sirigiri
4 | 1 | Sardar Karan Singh
5 | 2 | Kavya Manne
6 | 2 | Rajitha Panchumarthi
7 | 2 | Ramya panchumarthi
8 | 2 | Soujanya Vemireddy
9 | 3 | Suqoya Rhodes
10 | 3 | Dillon Halbert
11 | 3 | Japp Galang
12 | 3 | Hayes, Priscilla M.
13 | 3 | Zhu, Pengxu
14 | 4 | Gunakar Reddy Panyala
15 | 4 | Karthik Reddy Kanduri
16 | 4 | Balram Krishna Kantipudi
17 | 4 | Chakradhar Reddy Parne
18 | 4 | Lakshmi Gayathri Kurri
19 | 5 | Vishnu Vardhan Vankayalapati
20 | 5 | Satya Sai Srimannarayana Sarma Bolloju
21 | 5 | Rahul Sathya Gunti
22 | 5 | Mahesh Krishna Reddy Vanga
23 | 5 | Rahul Boga
24 | 6 | Sri Lakshmi Jahnavi Mandalapu
25 | 6 | Sai Manideep Chittiprolu
26 | 6 | Sowmya Tella
27 | 6 | Tejasai
28 | 6 | Sai Venkatesh
29 | 7 | PRANEETH ALURU
30 | 7 | Akash Suresh
31 | 7 | Nikhil Bolisetty
32 | 7 | Apoorva Gnana Saraswathi Tangirala
33 | 7 | Cheedu Venkat Narayan Reddy
34 | 8 | Jagamoni, Sravya
35 | 8 | Vijay Bodapati
36 | 8 | Vineeth Reddy
37 | 8 | Vadapally Ramyasree
38 | 8 | Sai Nikhil kakkireni