Seán O'Shea home page header About

My software knowledge

Advanced

I am advanced user of MS Excel, Sage 50 Accounts, Tas Books and CCH Accounts Production.

Intermediate

I am intermediate level user of Microsoft Word, Microsoft Powerpoint, Sage Accounts Production and Sage Micropay.

Learning

As part of the IS&IT degree, I have acheived competency in Python and I am also learning C, HTML and CSS. Independently I am currently learning VBA and powershell scripting.

Python

A question on the continuous assessment for the Programming Paradigms and Data Structures module was:

Write a program that reads in the file 102400.txt and stores it in an array of integers. The program should use this array to create smaller sub-arrays of size 6400, 12800, 25600, 51200, 102400 and find the average time for both quickSort and mergeSort to sort each i.e. each algorithm should run five times on each sub-array (loops should be used).

I set out below my code for this question.

"""
Dublin Institute of Technology
DT249 - stage 1 - 2017/18 - Programming Paradigms & Data Structures
Assignment 1 - Question 4
Code written by Seán O'Shea - March 2018

Brief:
Write a program that reads in the file 102400.txt and stores it in an
array of integers. The program should use this array to create smaller
sub-arrays of size 6400, 12800, 25600, 51200, 102400 and find the
average time for both quickSort and mergeSort to sort each i.e. each
algorithm should run five times on each sub-array (loops should be used).
"""

import time, sys
from num2words import num2words
from time import gmtime, strftime


# Definitions of algorithms functions, code by David Leonard, Lecturer
# See Lecture 4 slides - Sorting Algorithms, p 32, 43, 92 & 102.
def merge (a1, a2, a):
    i = j = 0
    while i + j < len(a):
        if j == len(a2) or (i < len(a1) and a1[i] < a2[j]):
            a[i+j] = a1[i]
            i += 1
        else:
            a[i+j] = a2[j]
            j += 1


def sort(a):
    n = len(a)
    if n < 2:
        return
    mid = n // 2
    a1 = a[0:mid]
    a2 = a[mid:n]
    sort(a1)
    sort(a2)
    merge(a1, a2, a)


def partition(a, lo, hi):
    i = (lo - 1)
    pivot = a[hi]
    for j in range(lo, hi):
        if a[j] <= pivot:
            i = i + 1
            a[i],a[j] = a[j],a[i]
    a[i+1],a[hi] = a[hi],a[i+1]
    return(i+1)


def quickSort(a, lo, hi):
    if lo < hi:
        pi = partition(a, lo, hi)
        quickSort(a, lo, pi-1)
        quickSort(a, pi+1, hi)


# Function to check if a string is a valid representation of an integer.
def reps_Ints(a):
    try:
        int(a)
        return True
    except ValueError:
        return False


# Print the time the program starts running.
print("Program started:", strftime("%A %d %B %Y %I:%M:%S%p", gmtime()))

# Verify that a filename has been passed to the program.
try:
    file_name = sys.argv[1]
except:
    file_name = input("Please specify a file located in the current directory "
                      "to be sorted: ")

# Open the specified file, check if it is a .txt file and composed of
# integers, and create a list of the integers in the file.
if file_name[-4:] == ".txt":
    f = open(file_name, "r")
    f_strings = f.read().splitlines()
    f.close()
    all_data = []
    if all((reps_Ints(string)) for string in f_strings):
        for string in f_strings:
            all_data.append(int(string))
        print(file_name, "is loaded into the program.")
    else:
        print(file_name, "contains non integer strings and will not be loaded "
                         "into the program.")
        print("The program is terminated.")
        sys.exit()
else:
    print(file_name, "is not a txt file and will not be loaded into the "
                     "program.")
    print("The program is terminated.")
    sys.exit()

print()

# Check if the list of integers has been created, if not exit the program.
if not all_data:
    print("No file was loaded. The program is terminated.")
    sys.exit()

# Within a nested list, create slices of the main array which form the
# sub-arrays to be sorted.
base_list = [all_data[0:6400], all_data[0:12800], all_data[0:25600],
            all_data[0:51200], all_data]

# Set up empty dictionaries to record each timing in a "raw" format in
# order to print raw data which can be copied to a spreadsheet.
copy_pasta_merge = {}
copy_pasta_quick = {}

# Loop through each array in 'baselist' and set up variables for total
# duration and count. The range for loop block clones and merge sorts
# each array five times by referring back to the unsorted array in
# 'baselist', records the timings, inserts the timings into the raw
# timings dictionary at the key corresponding to the length of the
# array, prints the time taken on each iteration, and prints the
# average duration over the five iterations. The following loop repeats
# these steps for the quickSort algorithm.
for sub_array in base_list:
    if not len(sub_array) in copy_pasta_merge:
        copy_pasta_merge[len(sub_array)] = []
    total_dur = 0
    count = 0
    print("Merge sort durations for an array of length", len(sub_array),
          "integers:")
    for i in range(5):
        array = sub_array[:]
        start = time.clock()
        sort(array)
        stop = time.clock()
        dur = stop - start
        total_dur = total_dur + dur
        count += 1
        for k, v in copy_pasta_merge.items():
            if len(array) == k:
                v.append(dur)
        print("The", num2words(count, to='ordinal'), "iteration takes",
              dur, "seconds.")
    avge_dur = total_dur / count
    print("The average duration using merge sort", num2words(count),
          "times for", len(array), "integers is", avge_dur, "seconds.")
    print()

print("xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")
print()

for sub_array in base_list:
    if not len(sub_array) in copy_pasta_quick:
        copy_pasta_quick[len(sub_array)] = []
    total_dur = 0
    count = 0
    print("Quick sort durations for an array of length", len(sub_array),
          "integers:")
    for i in range(5):
        array = sub_array[:]
        start = time.clock()
        quickSort(array, 0, len(array) - 1)
        stop = time.clock()
        dur = stop - start
        total_dur = total_dur + dur
        count += 1
        for k, v in copy_pasta_quick.items():
            if len(array) == k:
                v.append(dur)
        print("The", num2words(count, to='ordinal'), "iteration takes",
              dur, "seconds.")
    avge_dur = total_dur / count
    print("The average duration using quick sort", num2words(count),
          "times for", len(array), "integers is", avge_dur, "seconds.")
    print()

# Print statements for the raw timing data and program complete timestamp.
print("Merge sort timings data for use in a spreadsheet:")
for k, v in copy_pasta_merge.items():
    print(k, str(v).strip('[]').replace(',', ''))
print()
print("Quick sort timings data for use in a spreadsheet:")
for k, v in copy_pasta_quick.items():
    print(k, str(v).strip('[]').replace(',', ''))
print()
print("Program complete:", strftime("%A %d %B %Y %I:%M:%S%p", gmtime()))

Code is hidden on low resolution devices. Please visit this page on a desktop or tablet to inspect the code.