Stable Marriage Algorithm using Python
Aim
Write a matching algorithm to assign students to tutors using Python.
Background
I recently watched The Secret Rules of Modern Living: Algorithms which inspired me to think about the Stable Marriage Algorithm. This documentary covers much more complicated matching algorithms than I think about/build here and it is well worth watching. More about the Stable Marriage Algorithm here.
The example below is for a single hour block in which six students have requested help. There are 10 tutors available with varying qualifications that make them suitable for teaching at a certain skill level. In this example all students and tutors are skilled in Biology. I have simplified the approach so that the outcome is obvious. For example, Sam should be matched with Steve, George should be matched with Zed etc.
sma.py
# skill represents the required level of help
students = [{"id": 1, "name": "Sam", "skill": 1, "topic": "Biology"},
{"id": 2, "name": "George", "skill": 2, "topic": "Biology"},
{"id": 3, "name": "Jannet", "skill": 3, "topic": "Biology"},
{"id": 4, "name": "Jason", "skill": 4, "topic": "Biology"},
{"id": 5, "name": "Alison", "skill": 5, "topic": "Biology"},
{"id": 6, "name": "Tom", "skill": 6, "topic": "Biology"}]
# skill represents qualified level to teach
tutors = [{"id": 1, "name": "Trish", "skill": 10, "topic": "Biology"},
{"id": 2, "name": "Callum", "skill": 9, "topic": "Biology"},
{"id": 3, "name": "Sally", "skill": 8, "topic": "Biology"},
{"id": 4, "name": "Emma", "skill": 7, "topic": "Biology"},
{"id": 5, "name": "Keith", "skill": 6, "topic": "Biology"},
{"id": 6, "name": "Ian", "skill": 5, "topic": "Biology"},
{"id": 7, "name": "Kev", "skill": 4, "topic": "Biology"},
{"id": 8, "name": "Casper", "skill": 3, "topic": "Biology"},
{"id": 9, "name": "Zed", "skill": 2, "topic": "Biology"},
{"id": 10, "name": "Steve", "skill": 1, "topic": "Biology"}]
def checkMatch(s, t):
"""
This function is implemented in main() and
provides the primary method of ranking
student/tutor preferences.
Preferences are ranked based on the returned
value here. High values closer to 3 are
considered better preferences. Improvements
to this alogorithm could be made by replacing
this function.
args
----
s: dictionary
- contains:
- name (string)
- skill (integer)
- topic (string)
t: dictionary
- contains:
- name
- skill
- topic
returns
-------
float from 0 to 3.0
example
-------
student = {"id": 1,
"name": "Sam",
"skill": 6,
"topic": "Biology"}
tutor = {"id": 10,
"name": "Sara",
"skill": 7,
"topic": "Biology"}
checkMatch(student, tutor)
>> 2.857142857142857
working
-------
value = 0
topic = 0 + 1 (both in Biology)
skill = 1 + 1 (student skill less than tutor)
bonus = 2 + (6/7)
total = 2.857
"""
value = 0
if s["topic"] == t["topic"]:
value += 1
if s["skill"] <= t["skill"]:
value += 1
bonus = s["skill"]/t["skill"]
value += bonus
else:
pass
return value
def main(studentData, tutorData):
"""
Returns a list of lists containing preferences.
args
----
studentData: list of dictionaries
- Each dictionary contains:
- id (integer)
- name (string)
- skill (integer)
- topic (string)
tutorData: list of dictionaries
- Each dictionary contains:
- id (integer)
- name (string)
- skill (integer)
- topic (string)
Notes
-----
* Dictionary keys are hard wired (e.g. "name")
* Not widely tested
* Assumes len(studentData) == len(tutorData)
* Assumes that students and tutors prefer to
have a close match in skill requests (students)
and skill qualifications (tutors)
"""
matches = []
studentList = [s["name"] for s in studentData]
while len(studentList) > 0:
# loop through student list
# student list is derived from a list of dicts
for s in studentList:
logger.info("--- Starting on {} ---".format(s))
logger.info("Student List contains {}".format(
studentList))
# create an empty preference list
# this will hold preference scores for every tutor
# for each student.
prefs = list()
# get data for given student name
# pull the first object because it is an array
# containing only one object.
s = [st for st in studentData if st["name"] == s][0]
# loop through tutors
for t in tutorData:
# calculate match score for student/tutor combo
# match score weights skill requirements of
# the student with skill of the tutor.
# note that this means that a student of skill 5
# is better fit with a tutor of skill 5 than
# a tutor of skill 10 (the best).
match = [s["name"], t["name"], checkMatch(s, t)]
# add each match to the preference list.
prefs.append(match)
# sort preferences so that high scores are
# looked at first.
prefs = sorted(prefs,
key=lambda x: float(x[2]),
reverse=True)
# traverse preferences for each student.
for p in prefs:
# pull the data from matches
sMatches = [sm[0] for sm in matches] # students
tMatches = [tm[1] for tm in matches] # tutors
# ask if the current s/t assignment has already
# made its way to the matches list
# note that the matches list is acting like
# a results set. When this process ends the
# data in matches is what will be looked at.
sInMatches = p[0] in sMatches
tInMatches = p[1] in tMatches
# most of the work happens here in this
# if else block
if (sInMatches == False and tInMatches == False):
if (len(studentList) != 1):
# remove student from list...
# removing students from the studentList
# is a primary component of this
# procedure. When the studentList is
# empty the while loop will end.
logger.info("Adding student/tutor combo: {}\n".format(p))
matches.append(p)
studentList.remove(s["name"])
break
else:
# If the code gets to this point then it
# means that there is only one match
# possible. Remaining student must go
# with remaining tutor. There are no
# better options.
logger.info("Final match: {}\n".format(p))
matches.append(p)
studentList.remove(s["name"])
break
else:
# if the tutor is in the matches dataset
# then action is required for 1) the student
# they are currently paired with as well as
# 2) reassigning the tutor to a student with
# a better fit.
if tInMatches:
ind = tMatches.index(p[1])
mat = matches[ind]
# determine if student is better fit
if (mat[2] >= p[2]):
# we can simply skip first preferences
# if the score for this student tutor
# combo is no better than that already
# provided in the schedule.
logger.info(
"skipping tutor: {} has better match with {}".format(
p[1], mat[0]))
continue
else:
# for all other occasions we are going
# to remove the tutor from the results
# set and put the student they were
# assigned back into the studentList for
# reassignment. The matches dataset is
# updated.
try:
logger.info("Removing {} from matches".format(mat))
matches.remove(mat)
logger.info("Putting {} back in studentList".format(mat[0]))
studentList.append(mat[0])
logger.info("Removing {} from studentList".format(p[0]))
studentList.remove(p[0])
logger.info("Adding {} to matches\n".format(p))
matches.append(p)
break
except:
logger.critical("Failed to update")
break
# return matches
return matches
# libs
import random
import pandas as pd
import logging
# logging
logging.basicConfig(level=logging.CRITICAL)
logger = logging.getLogger(__name__)
# Uncomment the following line to turn
# on INFO messages. It is recommended that
# you change range(x) to range(1) if reviewing
# this process.
#logger.setLevel("INFO")
#logger.setLevel("CRITICAL")
# iterate
runningSum = 0
runningOut = []
for i in range(50):
tutorid = random.sample(
range(1, len(tutors)+1),len(students))
tutorData = []
# Randomly select tutors
# assumes there are more
# tutors than students.
for tid in tutorid:
for item in tutors:
if item["id"] == tid:
tutorData.append(item)
# match
ma = main(students, tutorData)
# quatify suitability
sa = sum([x[2] for x in ma])
# save matches if they are improved
if (sa > runningSum):
runningSum = sa
runningOut = ma
# create dataframe
print(pd.DataFrame(runningOut))
Output (n=1 iteration)
INFO:__main__:--- Starting on Sam ---
INFO:__main__:Student List contains ['Sam', 'George', 'Jannet', 'Jason', 'Alison', 'Tom']
INFO:__main__:Adding student/tutor combo: ['Sam', 'Steve', 3.0]
INFO:__main__:--- Starting on Jannet ---
INFO:__main__:Student List contains ['George', 'Jannet', 'Jason', 'Alison', 'Tom']
INFO:__main__:Adding student/tutor combo: ['Jannet', 'Kev', 2.75]
INFO:__main__:--- Starting on Alison ---
INFO:__main__:Student List contains ['George', 'Jason', 'Alison', 'Tom']
INFO:__main__:Adding student/tutor combo: ['Alison', 'Ian', 3.0]
INFO:__main__:--- Starting on George ---
INFO:__main__:Student List contains ['George', 'Jason', 'Tom']
INFO:__main__:skipping tutor: Kev has better match with Jannet
INFO:__main__:skipping tutor: Ian has better match with Alison
INFO:__main__:Adding student/tutor combo: ['George', 'Keith', 2.3333333333333335]
INFO:__main__:--- Starting on Tom ---
INFO:__main__:Student List contains ['Jason', 'Tom']
INFO:__main__:Removing ['George', 'Keith', 2.3333333333333335] from matches
INFO:__main__:Putting George back in studentList
INFO:__main__:Removing Tom from studentList
INFO:__main__:Adding ['Tom', 'Keith', 3.0] to matches
INFO:__main__:--- Starting on Jason ---
INFO:__main__:Student List contains ['Jason', 'George']
INFO:__main__:Removing ['Jannet', 'Kev', 2.75] from matches
INFO:__main__:Putting Jannet back in studentList
INFO:__main__:Removing Jason from studentList
INFO:__main__:Adding ['Jason', 'Kev', 3.0] to matches
INFO:__main__:--- Starting on Jannet ---
INFO:__main__:Student List contains ['George', 'Jannet']
INFO:__main__:skipping tutor: Kev has better match with Jason
INFO:__main__:skipping tutor: Ian has better match with Alison
INFO:__main__:skipping tutor: Keith has better match with Tom
INFO:__main__:Adding student/tutor combo: ['Jannet', 'Emma', 2.4285714285714284]
INFO:__main__:--- Starting on George ---
INFO:__main__:Student List contains ['George']
INFO:__main__:skipping tutor: Kev has better match with Jason
INFO:__main__:skipping tutor: Ian has better match with Alison
INFO:__main__:skipping tutor: Keith has better match with Tom
INFO:__main__:skipping tutor: Emma has better match with Jannet
INFO:__main__:Final match: ['George', 'Trish', 2.2]
>>>
>>> # create dataframe
... print(pd.DataFrame(runningOut))
0 1 2
0 Sam Steve 3.000000
1 Alison Ian 3.000000
2 Tom Keith 3.000000
3 Jason Kev 3.000000
4 Jannet Emma 2.428571
5 George Trish 2.200000
Output (n=50 iterations)
print(pd.DataFrame(runningOut))
0 1 2
0 Sam Steve 3.0
1 Jannet Casper 3.0
2 Alison Ian 3.0
3 George Zed 3.0
4 Tom Keith 3.0
5 Jason Kev 3.0