Stable Marriage Algorithm using Python

Aim

Write a matching algorithm to assign students to tutors using Python.

Background

I recently watched The Secret Rules of Modern Living: Algorithms which inspired me to think about the Stable Marriage Algorithm. This documentary covers much more complicated matching algorithms than I think about/build here and it is well worth watching. More about the Stable Marriage Algorithm here.

The example below is for a single hour block in which six students have requested help. There are 10 tutors available with varying qualifications that make them suitable for teaching at a certain skill level. In this example all students and tutors are skilled in Biology. I have simplified the approach so that the outcome is obvious. For example, Sam should be matched with Steve, George should be matched with Zed etc.

sma.py

# skill represents the required level of help 
students = [{"id": 1, "name": "Sam", "skill": 1, "topic": "Biology"},
            {"id": 2, "name": "George", "skill": 2, "topic": "Biology"},
            {"id": 3, "name": "Jannet", "skill": 3, "topic": "Biology"},
            {"id": 4, "name": "Jason", "skill": 4, "topic": "Biology"},
            {"id": 5, "name": "Alison", "skill": 5, "topic": "Biology"},
            {"id": 6, "name": "Tom", "skill": 6, "topic": "Biology"}]


# skill represents qualified level to teach
tutors = [{"id": 1, "name": "Trish", "skill": 10, "topic": "Biology"},
            {"id": 2, "name": "Callum", "skill": 9, "topic": "Biology"},
            {"id": 3, "name": "Sally", "skill": 8, "topic": "Biology"},
            {"id": 4, "name": "Emma", "skill": 7, "topic": "Biology"},
            {"id": 5, "name": "Keith", "skill": 6, "topic": "Biology"},
            {"id": 6, "name": "Ian", "skill": 5, "topic": "Biology"},
            {"id": 7, "name": "Kev", "skill": 4, "topic": "Biology"},
            {"id": 8, "name": "Casper", "skill": 3, "topic": "Biology"},
            {"id": 9, "name": "Zed", "skill": 2, "topic": "Biology"},
            {"id": 10, "name": "Steve", "skill": 1, "topic": "Biology"}]


def checkMatch(s, t):
    """
    This function is implemented in main() and 
    provides the primary method of ranking 
    student/tutor preferences.

    Preferences are ranked based on the returned 
    value here. High values closer to 3 are 
    considered better preferences. Improvements 
    to this alogorithm could be made by replacing 
    this function.

    args
    ----
      s: dictionary
         - contains:
           - name (string) 
           - skill (integer)  
           - topic (string)

      t: dictionary
         - contains:
           - name
           - skill
           - topic


    returns
    -------
    float from 0 to 3.0


    example
    -------
    student = {"id": 1, 
               "name": "Sam", 
               "skill": 6, 
               "topic": "Biology"}

    tutor = {"id": 10, 
               "name": "Sara", 
               "skill": 7, 
               "topic": "Biology"}

    checkMatch(student, tutor)
    >> 2.857142857142857


    working
    -------
    value = 0
    topic = 0 + 1 (both in Biology)
    skill = 1 + 1 (student skill less than tutor)
    bonus = 2 + (6/7) 
    total = 2.857

    """
    value = 0
    if s["topic"] == t["topic"]:
        value += 1
        if s["skill"] <= t["skill"]:
            value += 1
            bonus = s["skill"]/t["skill"]
            value += bonus
    else:
       pass
    return value     



def main(studentData, tutorData):
    """
    Returns a list of lists containing preferences.

    args
    ----
      studentData: list of dictionaries
         - Each dictionary contains:
           - id (integer)
           - name (string) 
           - skill (integer)  
           - topic (string)

      tutorData: list of dictionaries
         - Each dictionary contains:
           - id (integer)
           - name (string)
           - skill (integer)
           - topic (string)

    Notes
    -----
      * Dictionary keys are hard wired (e.g. "name")
      * Not widely tested
      * Assumes len(studentData) == len(tutorData)
      * Assumes that students and tutors prefer to 
        have a close match in skill requests (students) 
        and skill qualifications (tutors)


    """
    matches = []
    studentList = [s["name"] for s in studentData]
    while len(studentList) > 0:
        # loop through student list
        # student list is derived from a list of dicts
        for s in studentList:
            logger.info("--- Starting on {} ---".format(s))
            logger.info("Student List contains {}".format(
                studentList))
            # create an empty preference list
            # this will hold preference scores for every tutor
            # for each student.
            prefs = list()
            # get data for given student name
            # pull the first object because it is an array 
            # containing only one object.
            s = [st for st in studentData if st["name"] == s][0]
            # loop through tutors
            for t in tutorData:
                # calculate match score for student/tutor combo
                # match score weights skill requirements of 
                # the student with skill of the tutor.
                # note that this means that a student of skill 5
                # is better fit with a tutor of skill 5 than 
                # a tutor of skill 10 (the best).
                match = [s["name"], t["name"], checkMatch(s, t)]
                # add each match to the preference list.
                prefs.append(match)
            # sort preferences so that high scores are 
            # looked at first.
            prefs = sorted(prefs, 
                           key=lambda x: float(x[2]), 
                           reverse=True)
            # traverse preferences for each student. 
            for p in prefs:
                # pull the data from matches 
                sMatches = [sm[0] for sm in matches] # students
                tMatches = [tm[1] for tm in matches] # tutors
                # ask if the current s/t assignment has already
                # made its way to the matches list 
                # note that the matches list is acting like
                # a results set. When this process ends the 
                # data in matches is what will be looked at.
                sInMatches = p[0] in sMatches
                tInMatches = p[1] in tMatches
                # most of the work happens here in this 
                # if else block 
                if (sInMatches == False and tInMatches == False):
                    if (len(studentList) != 1):
                        # remove student from list...
                        # removing students from the studentList 
                        # is a primary component of this 
                        # procedure. When the studentList is 
                        # empty the while loop will end.
                        logger.info("Adding student/tutor combo: {}\n".format(p))
                        matches.append(p)
                        studentList.remove(s["name"]) 
                        break
                    else:
                        # If the code gets to this point then it 
                        # means that there is only one match 
                        # possible. Remaining student must go 
                        # with remaining tutor. There are no 
                        # better options.
                        logger.info("Final match: {}\n".format(p))
                        matches.append(p)
                        studentList.remove(s["name"]) 
                        break
                else:
                    # if the tutor is in the matches dataset
                    # then action is required for 1) the student 
                    # they are currently paired with as well as
                    # 2) reassigning the tutor to a student with 
                    # a better fit.
                    if tInMatches:
                        ind = tMatches.index(p[1])
                        mat = matches[ind]
                        # determine if student is better fit
                        if (mat[2] >= p[2]):
                            # we can simply skip first preferences 
                            # if the score for this student tutor 
                            # combo is no better than that already 
                            # provided in the schedule.
                            logger.info(
                              "skipping tutor: {} has better match with {}".format(
                              p[1], mat[0]))
                            continue
                        else:
                            # for all other occasions we are going 
                            # to remove the tutor from the results 
                            # set and put the student they were 
                            # assigned back into the studentList for 
                            # reassignment. The matches dataset is 
                            # updated.
                            try:
                                logger.info("Removing {} from matches".format(mat))
                                matches.remove(mat)
                                logger.info("Putting {} back in studentList".format(mat[0]))
                                studentList.append(mat[0])
                                logger.info("Removing {} from studentList".format(p[0]))
                                studentList.remove(p[0])
                                logger.info("Adding {} to matches\n".format(p))
                                matches.append(p)
                                break
                            except:
                                logger.critical("Failed to update")
                                break
    # return matches
    return matches                 



# libs
import random
import pandas as pd
import logging

# logging
logging.basicConfig(level=logging.CRITICAL)
logger = logging.getLogger(__name__)
# Uncomment the following line to turn 
# on INFO messages. It is recommended that 
# you change range(x) to range(1) if reviewing 
# this process. 
#logger.setLevel("INFO")
#logger.setLevel("CRITICAL")

# iterate
runningSum = 0
runningOut = []
for i in range(50):
    tutorid = random.sample(
        range(1, len(tutors)+1),len(students))
    tutorData = [] 
    # Randomly select tutors
    # assumes there are more 
    # tutors than students.
    for tid in tutorid:
       for item in tutors:
           if item["id"] == tid:
               tutorData.append(item)
    # match
    ma = main(students, tutorData)
    # quatify suitability
    sa = sum([x[2] for x in ma])
    # save matches if they are improved
    if (sa > runningSum): 
        runningSum = sa
        runningOut = ma


# create dataframe
print(pd.DataFrame(runningOut))

Output (n=1 iteration)

INFO:__main__:--- Starting on Sam ---
INFO:__main__:Student List contains ['Sam', 'George', 'Jannet', 'Jason', 'Alison', 'Tom']
INFO:__main__:Adding student/tutor combo: ['Sam', 'Steve', 3.0]

INFO:__main__:--- Starting on Jannet ---
INFO:__main__:Student List contains ['George', 'Jannet', 'Jason', 'Alison', 'Tom']
INFO:__main__:Adding student/tutor combo: ['Jannet', 'Kev', 2.75]

INFO:__main__:--- Starting on Alison ---
INFO:__main__:Student List contains ['George', 'Jason', 'Alison', 'Tom']
INFO:__main__:Adding student/tutor combo: ['Alison', 'Ian', 3.0]

INFO:__main__:--- Starting on George ---
INFO:__main__:Student List contains ['George', 'Jason', 'Tom']
INFO:__main__:skipping tutor: Kev has better match with Jannet
INFO:__main__:skipping tutor: Ian has better match with Alison
INFO:__main__:Adding student/tutor combo: ['George', 'Keith', 2.3333333333333335]

INFO:__main__:--- Starting on Tom ---
INFO:__main__:Student List contains ['Jason', 'Tom']
INFO:__main__:Removing ['George', 'Keith', 2.3333333333333335] from matches
INFO:__main__:Putting George back in studentList
INFO:__main__:Removing Tom from studentList
INFO:__main__:Adding ['Tom', 'Keith', 3.0] to matches

INFO:__main__:--- Starting on Jason ---
INFO:__main__:Student List contains ['Jason', 'George']
INFO:__main__:Removing ['Jannet', 'Kev', 2.75] from matches
INFO:__main__:Putting Jannet back in studentList
INFO:__main__:Removing Jason from studentList
INFO:__main__:Adding ['Jason', 'Kev', 3.0] to matches

INFO:__main__:--- Starting on Jannet ---
INFO:__main__:Student List contains ['George', 'Jannet']
INFO:__main__:skipping tutor: Kev has better match with Jason
INFO:__main__:skipping tutor: Ian has better match with Alison
INFO:__main__:skipping tutor: Keith has better match with Tom
INFO:__main__:Adding student/tutor combo: ['Jannet', 'Emma', 2.4285714285714284]

INFO:__main__:--- Starting on George ---
INFO:__main__:Student List contains ['George']
INFO:__main__:skipping tutor: Kev has better match with Jason
INFO:__main__:skipping tutor: Ian has better match with Alison
INFO:__main__:skipping tutor: Keith has better match with Tom
INFO:__main__:skipping tutor: Emma has better match with Jannet
INFO:__main__:Final match: ['George', 'Trish', 2.2]

>>> 
>>> # create dataframe
... print(pd.DataFrame(runningOut))
        0      1         2
0     Sam  Steve  3.000000
1  Alison    Ian  3.000000
2     Tom  Keith  3.000000
3   Jason    Kev  3.000000
4  Jannet   Emma  2.428571
5  George  Trish  2.200000

Output (n=50 iterations)

print(pd.DataFrame(runningOut))

        0       1    2
0     Sam   Steve  3.0
1  Jannet  Casper  3.0
2  Alison     Ian  3.0
3  George     Zed  3.0
4     Tom   Keith  3.0
5   Jason     Kev  3.0

results matching ""

    No results matching ""