Python is a popular scripting language that supports the procedural, functional and object-oriented programming paradigms. It has become a popular language to teach to beginning programmers because of (i) its simple syntax, with block structure denoted by indentation, (ii) its dynamic typing, (iii) its built-in, high-level data structures and their useful pre-defined operations, and (iv) its many "helper" libraries aimed at accomplishing practical tasks. Our small Python project will explore the use of the regular expressions library (re), the use of files for I/O, and the use of dictionaries.
We will be using the Spyder graphical IDE based on the Anaconda Python framework
for Python 3.5:
https://www.continuum.io/why-anaconda
The download page on this website offers versions of Python 3.5 for Windows,
Mac and Linux; please choose the Python 3.5 version. You may choose to develop
your Python code in any environment, but it will be graded in the Anaconda
Python 3.5 Spyder IDE.
The tasks you are asked to perform will involve source code obtained from various webpages including:
You can obtain the HTML source of the department's people page by using your browser's command to show the page source. The source can then be saved to a file on your own machine and accessed in Python through file I/O.
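As a minimal sketch of the file I/O step, the function below reads a saved page into a single string that a regular expression can then search. The function name read_page_source and the UTF-8 default are assumptions made for illustration; use whatever encoding your browser saved the file in.

```python
def read_page_source(path, encoding="utf-8"):
    """Return the whole contents of a saved HTML file as one string."""
    # errors="replace" substitutes a marker for any byte that cannot be
    # decoded, so one stray character does not abort the read.
    with open(path, "r", encoding=encoding, errors="replace") as handle:
        return handle.read()
```

A call such as read_page_source("people.html") would then return the page text, where people.html is a hypothetical name for the file you saved. Reading the whole file at once, rather than line by line, lets a single pattern match across line breaks.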
Using the source code of the Department of Computer Science top-level people webpage as input, you are to find each faculty member's contact information; that is, you need to identify the faculty member by full name, and then find his/her email, office, phone number, and personal website. Your Python code will print these out in alphabetical order of faculty last names, printing the faculty member's full name followed by the contact info mentioned above.
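One possible shape for this task is sketched below: a compiled pattern with named groups, a dictionary keyed by full name, and a sort on the last word of each name. The markup in SAMPLE_HTML is invented for illustration and is not the real people page, so the pattern, the class names, and the helper names (extract_faculty, last_name, print_faculty) are all assumptions that you will need to adapt.

```python
import re

# Invented markup for illustration only; the real page is laid out differently.
SAMPLE_HTML = """
<div class="person"><h3>Alan Turing</h3>
<a href="mailto:alan@example.edu">alan@example.edu</a>
<span class="office">202 Example Hall</span>
<span class="phone">(555) 010-0002</span>
<a class="site" href="http://example.edu/~alan">Website</a></div>
<div class="person"><h3>Ada Lovelace</h3>
<a href="mailto:ada@example.edu">ada@example.edu</a>
<span class="office">101 Example Hall</span>
<span class="phone">(555) 010-0001</span>
<a class="site" href="http://example.edu/~ada">Website</a></div>
"""

# One named group per field; ".*?" skips lazily over the markup in between.
# re.DOTALL lets "." cross line breaks, since one entry spans several lines.
PERSON_RE = re.compile(
    r'<h3>(?P<name>[^<]+)</h3>.*?'
    r'mailto:(?P<email>[^"]+)".*?'
    r'class="office">(?P<office>[^<]+)<.*?'
    r'class="phone">(?P<phone>[^<]+)<.*?'
    r'class="site" href="(?P<site>[^"]+)"',
    re.DOTALL,
)


def extract_faculty(html):
    """Return a dict mapping full name -> dict of contact fields."""
    faculty = {}
    for match in PERSON_RE.finditer(html):
        info = match.groupdict()
        name = info.pop("name").strip()
        faculty[name] = info
    return faculty


def last_name(full_name):
    """Sort key: the final word of the name, lowercased."""
    return full_name.split()[-1].lower()


def print_faculty(faculty):
    """Print each person and their contact info, ordered by last name."""
    for name in sorted(faculty, key=last_name):
        print(name)
        for field in ("email", "office", "phone", "site"):
            print("  {}: {}".format(field, faculty[name][field]))


print_faculty(extract_faculty(SAMPLE_HTML))
```

A single pattern like this assumes every entry has all four fields in the same order. If an entry is missing a field, the lazy ".*?" will run on into the next person's entry, so on a real page it may be safer to isolate each entry first and then search it for each field separately.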
Your task is to find all the dates on the lecture page
of the class website and to count the number of dates found in the months January
through May 2016. Your output for this task should be simple; here is an example:
January: 4
February: 7
and so forth....
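The counting step can be sketched with a dictionary that maps each month to a running total. The date format assumed here (a month name or three-letter abbreviation followed by a day number, as in "Jan 19" or "February 4") is a guess; inspect the lecture page and adjust DATE_RE to the format it actually uses. The names MONTHS, DATE_RE and count_dates_by_month are illustrative.

```python
import re

MONTHS = ["January", "February", "March", "April", "May"]

# Assumed format: full or abbreviated month name, optional period, day number.
DATE_RE = re.compile(
    r'\b(Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May)\.?\s+(\d{1,2})\b'
)


def count_dates_by_month(text):
    """Return a dict mapping each month name to the number of dates found."""
    counts = {month: 0 for month in MONTHS}
    for found, _day in DATE_RE.findall(text):
        # The first three letters identify the month for both spellings.
        for month in MONTHS:
            if month.startswith(found[:3]):
                counts[month] += 1
    return counts


sample = "Lecture 1: Jan 19. Lecture 2: January 21. Exam: Feb. 4, review on May 3."
counts = count_dates_by_month(sample)
for month in MONTHS:
    print("{}: {}".format(month, counts[month]))
```

Iterating over the MONTHS list when printing, rather than over the dictionary itself, keeps the output in calendar order.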
Your task is to find the last time each of the CS5314 class webpages was updated or changed. Your output, in lexicographical order by webpage name, should be a list of pairs: webpage name (i.e., the rightmost filename in the URL) and last updated time and date for that webpage. Notice that all the update sentences appear at the bottom of each webpage. Moreover, the times and dates listed follow a pattern describable by a regular expression.
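A rough sketch of this task follows. The wording and format of the update sentence assumed here ("Last updated: 4/11/2016 14:05") is hypothetical, as are the example URLs and the helper names; check the bottom of each class webpage and rewrite UPDATED_RE to fit what is really there.

```python
import re

# Hypothetical sentence shape: "Last updated: M/D/YYYY H:MM" with optional seconds.
UPDATED_RE = re.compile(
    r'Last\s+updated:?\s*(?P<date>\d{1,2}/\d{1,2}/\d{4})\s+'
    r'(?P<time>\d{1,2}:\d{2}(?::\d{2})?)',
    re.IGNORECASE,
)


def page_name(url):
    """Return the rightmost filename in a URL."""
    return url.rstrip("/").rsplit("/", 1)[-1]


def last_update(html):
    """Return (date, time) from the final update sentence, or None if absent."""
    matches = list(UPDATED_RE.finditer(html))
    if not matches:
        return None
    last = matches[-1]  # the update sentence sits at the bottom of the page
    return (last.group("date"), last.group("time"))


def report(pages):
    """pages maps URL -> page source; return (name, update) pairs sorted by name."""
    results = [(page_name(url), last_update(src)) for url, src in pages.items()]
    return sorted(results, key=lambda pair: pair[0])


# Invented pages for illustration only.
pages = {
    "http://example.edu/cs5314/lectures.html": "<p>Last updated: 4/11/2016 14:05</p>",
    "http://example.edu/cs5314/index.html": "<p>Last updated 1/19/2016 9:30:00</p>",
}
for name, update in report(pages):
    print(name, update)
```

Sorting on the page name alone gives the lexicographical order the task asks for, and it avoids comparing a None result against a tuple if a page has no update sentence.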
Helpful websites for development: Several websites have been listed on piazza.com that can help in developing your Python code. Pythex is a place to test your Python regular expressions. Python Tutor offers a visualization of the heap as you step through Python code.
Grading: The project is due on Wednesday, April 27th. The project will be worth 100 points; it is intended to be smaller in scale and time needed than the previous projects in Prolog or Scheme. Do not waste time on making fancy output for these tasks; however, since we will be grading this project manually, make sure the output is clear and answers the task presented. The usual rules about lateness will apply. Submissions will be accepted for 24 hours after the posted deadline, but they will be worth a maximum of 80 points (i.e., a 20% deduction from full credit for this project).
Rubric: This assignment is worth a total of 100 points. The points will be awarded mostly on the basis of the correctness of your Python code for each task, with 10 points reserved for comments. The comments should include the following: (i) 1-2 sentences about each of your regular expressions and the construct it is meant to match, (ii) a 1-2 line comment that explains the functionality of each Python function you write, and (iii) a comment to explain any parts of the assignment that your submission fails to solve. The correctness points will be divided as follows: Task 1 - 40 points, Task 2 - 20 points, and Task 3 - 30 points.