New practice exercise - Perceptron #678

Closed · wants to merge 30 commits

Commits:
3c62326
New practice exercise - Perceptron
depial Oct 3, 2023
7b291e5
Merge remote-tracking branch 'origin/main' into depial-main
cmcaine Jan 30, 2024
053cb63
Update runtests.jl
depial Jan 31, 2024
c821340
Update runtests.jl
depial Jan 31, 2024
a4bc759
Update runtests.jl
depial Jan 31, 2024
185e88c
Delete exercises/practice/perceptron/testtools.jl
depial Jan 31, 2024
475efe3
Update config.json
depial Feb 4, 2024
4a24b2c
Update tests.toml
depial Feb 4, 2024
e4760e2
Update instructions.md
depial Feb 4, 2024
39ba0fa
Update instructions.md
depial Feb 16, 2024
735743f
Update instructions.md
depial Feb 16, 2024
121e5fa
Update exercises/practice/perceptron/perceptron.jl
depial Feb 16, 2024
dd96c6c
Update exercises/practice/perceptron/.meta/example.jl
depial Feb 16, 2024
f7d054f
Merge branch 'exercism:main' into main
depial Feb 17, 2024
7bcaf89
Update example.jl
depial Feb 17, 2024
4ac9805
Update config.json
depial Feb 17, 2024
c8d226b
Update example.jl
depial Feb 22, 2024
fe7eb26
Update runtests.jl
depial Feb 22, 2024
bed28be
Update tests.toml
depial Feb 22, 2024
9d9ce4b
Update config.json
depial Feb 22, 2024
76f21f2
Create introduction.md
depial Feb 22, 2024
ff9ce68
Update instructions.md
depial Feb 22, 2024
9c8193d
Update runtests.jl
depial Feb 23, 2024
5ac2eec
Update runtests.jl
depial Feb 23, 2024
7ec87e8
Update runtests.jl
depial Feb 23, 2024
0af5de7
Merge branch 'exercism:main' into main
depial Mar 12, 2024
233f618
Adding new practice exercise Binary Search Tree
depial Mar 12, 2024
f444359
config changes
depial Mar 12, 2024
d963209
Merge branch 'main' of https://github.com/depial/julia
depial Mar 12, 2024
c947630
Update config.json
depial Mar 12, 2024
15 changes: 15 additions & 0 deletions config.json
@@ -873,6 +873,21 @@
"practices": [],
"prerequisites": [],
"difficulty": 2
},
{
"uuid": "b43a938a-7bd2-4fe4-b16c-731e2e25e747",
"practices": [],
"prerequisites": [],
"slug": "perceptron",
"name": "Perceptron",
"difficulty": 3,
"topics": [
"machine learning",
"loops",
"arrays",
"logic",
"math"
]
}
]
},
50 changes: 50 additions & 0 deletions exercises/practice/perceptron/.docs/instructions.md
@@ -0,0 +1,50 @@
# Instructions

### Introduction
[Perceptron](https://en.wikipedia.org/wiki/Perceptron) is one of the oldest and bestestly named machine learning algorithms out there. Since it is also quite simple to implement, it's a favorite place to start a machine learning journey. Perceptron is what is known as a linear classifier, which means that, if we have two labeled classes of objects, for example in 2D space, it will search for a line that can be drawn to separate them. If such a line exists, Perceptron is guaranteed to find one. See Perceptron in action separating black and white dots below!

<p align="center">
<a title="Miquel Perelló Nieto, CC BY 4.0 &lt;https://creativecommons.org/licenses/by/4.0&gt;, via Wikimedia Commons" href="https://commons.wikimedia.org/wiki/File:Perceptron_training_without_bias.gif"><img width="512" alt="Perceptron training without bias" src="https://upload.wikimedia.org/wikipedia/commons/a/aa/Perceptron_training_without_bias.gif"></a>
</p>

### Details
The basic idea is fairly straightforward. As illustrated above, we cycle through the objects and check if they are on the correct side of our guess at a line. If one is not, we make a correction and continue checking the objects against the corrected line. Eventually the line is adjusted to correctly separate all the objects and we have what is called a decision boundary!

Why is this of any use? The decision boundary we find can help us predict what a new, unlabeled object would likely be classified as, by checking which side of the boundary it falls on.

#### A Brief Word on Hyperplanes
What we have been calling a line in 2D generalizes to a [hyperplane](https://en.wikipedia.org/wiki/Hyperplane), which is a convenient representation, and if you follow the classic Perceptron algorithm you will have to pick an initial hyperplane to start from. How do you pick your starting hyperplane? It's up to you! Be creative! Or not... Actually, Perceptron's convergence time is sensitive to conditions such as the initial hyperplane and even the order in which the objects are looped through, so you might not want to go too wild.

We will be playing in a two dimensional space, so our separating hyperplane will simply be a 1D line. You might remember the standard equation for a line as $y = ax + b$, where $a, b \in \mathbb{R}$. However, to help generalize the idea to higher dimensions, it's convenient to reformulate this equation as $w_0 + w_1 x + w_2 y = 0$ (the line $y = ax + b$ becomes $b + ax - y = 0$, i.e. weights $[b, a, -1]$). This is the form of the hyperplane we will be using, so your output should be $[w_0, w_1, w_2]$. In machine learning, $w_0, w_1, w_2$ are usually referred to as weights.

Scaling a hyperplane by a positive value gives an equivalent hyperplane, e.g. $[w_0, w_1, w_2]$ and $[\alpha \cdot w_0, \alpha \cdot w_1, \alpha \cdot w_2]$ with $\alpha > 0$ describe the same hyperplane. However, hyperplanes scaled by a negative value, e.g. $[w_0, w_1, w_2]$ vs $[-w_0, -w_1, -w_2]$, differ in that their normal vectors (the green arrow in the illustration above) point in opposite directions. By convention, the Perceptron normal points towards the class defined as positive, so this property will be checked but will not result in a test failure.
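To see this equivalence concretely, here is a quick Julia sketch (illustrative only, not part of the exercise), evaluating a point against scaled copies of a hyperplane, with a constant 1 prepended so the dot product includes $w_0$:

```julia
w = [1, 1, 1]          # hyperplane weights [w0, w1, w2]
p = vcat(1, [2, 2])    # point [2, 2] with a 1 prepended for w0

sign(w' * p)           # 1
sign((2 .* w)' * p)    # 1: positive scaling preserves the side
sign((-1 .* w)' * p)   # -1: negative scaling flips the normal
```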

#### Updating
Checking which side of a hyperplane an object lies on can be done by evaluating the object against the hyperplane's normal vector. The resulting value will be positive, negative or zero, and all of the objects from one class should produce values with the same sign. A zero value means the object lies on the hyperplane itself, which we don't want to allow since it's ambiguous. This check might sound complicated, but it's actually quite easy: simply plug the coordinates of the object into the equation of the hyperplane and check the sign of the result. For example, we can look at two objects $v_1, v_2$ in relation to the hyperplane $[w_0, w_1, w_2] = [1, 1, 1]$:

$$\large v_1$$

$$[x_1, y_1] = [2, 2]$$

$$w_0 + w_1 \cdot x_1 + w_2 \cdot y_1 = 1 + 1 \cdot 2 + 1 \cdot 2 = 5 > 0$$


$$\large v_2$$

$$[x_2,y_2]=[-2,-2]$$

$$w_0 + w_1 \cdot x_2 + w_2 \cdot y_2 = 1 + 1 \cdot (-2) + 1 \cdot (-2) = -3 < 0$$

If $v_1$ and $v_2$ have the labels $1$ and $-1$ (as we will be using), then the hyperplane $[1, 1, 1]$ is a valid decision boundary for them, since the signs match.
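In code, this check is a one-liner. A minimal Julia sketch reproducing the two evaluations above (the helper name `evaluate` is ours, not part of the exercise):

```julia
# w0 + w1*x + w2*y for hyperplane weights w and point [x, y]
evaluate(w, p) = w[1] + w[2] * p[1] + w[3] * p[2]

evaluate([1, 1, 1], [2, 2])    # 5  (> 0, matches label 1)
evaluate([1, 1, 1], [-2, -2])  # -3 (< 0, matches label -1)
```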

Now that we know how to tell which side of the hyperplane an object lies on, we can look at how perceptron updates a hyperplane. If an object is on the correct side of the hyperplane, no update is performed on the weights. However, if we find an object on the wrong side, the update rule for the weights is:

$$[w_0', w_1', w_2'] = [w_0 + l_{class}, w_1 + x \cdot l_{class}, w_2 + y \cdot l_{class}]$$

where $l_{class} = \pm 1$ according to the class of the object (i.e. its label), $x, y$ are the coordinates of the object, the $w_i$ are the weights of the hyperplane, and the $w_i'$ are the weights of the updated hyperplane.
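For example, if the current hyperplane is $[1, 1, 1]$ and the object $[2, 2]$ with label $-1$ is on the wrong side, a single update gives $[1 - 1, 1 - 2, 1 - 2] = [0, -1, -1]$. A sketch of the rule in Julia (`update` is an illustrative helper, not a required function):

```julia
# w_i' = w_i + coordinate_i * label, with a constant 1 prepended for w0
update(w, p, label) = w .+ label .* vcat(1, p)

update([1, 1, 1], [2, 2], -1)  # [0, -1, -1]
```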

This update is repeated for each object in turn, and then the whole process is repeated until no updates are made to the hyperplane. When all objects pass without an update, they have been successfully separated and you can return your decision boundary!

Notes:
- Although the perceptron algorithm is deterministic, a decision boundary depends on initialization and is not unique in general, so the tests accept any hyperplane which fully separates the objects.
- The tests here will only include linearly separable classes, so a decision boundary will always be possible (i.e. no need to worry about non-separable classes).
20 changes: 20 additions & 0 deletions exercises/practice/perceptron/.meta/config.json
@@ -0,0 +1,20 @@
```json
{
  "authors": [
    "depial"
  ],
  "contributors": [
    "cmcaine"
  ],
  "files": {
    "solution": [
      "perceptron.jl"
    ],
    "test": [
      "runtests.jl"
    ],
    "example": [
      ".meta/example.jl"
    ]
  },
  "blurb": "Write your own machine learning classifier"
}
```
8 changes: 8 additions & 0 deletions exercises/practice/perceptron/.meta/example.jl
@@ -0,0 +1,8 @@
```julia
function perceptron(points, labels)
    # Prepend a constant 1 to each point so the bias w0 is handled uniformly
    θ, pnts = [0, 0, 0], vcat.(1, points)
    while true
        θ_0 = θ
        # Misclassified (or on the hyperplane) when label * θ'p ≤ 0: apply the update rule
        foreach(i -> labels[i]*θ'*pnts[i] ≤ 0 && (θ += labels[i]*pnts[i]), eachindex(pnts))
        # A full pass with no updates means the classes are separated
        θ_0 == θ && return θ
    end
end
```
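As a quick sanity check, calling this reference solution on the data from the first test should produce a valid separating hyperplane; a hand trace of the loop suggests it returns `[1, 1, 2]` here:

```julia
points = [[1, 2], [3, 4], [-1, -2], [-3, -4], [2, 1], [1, 1]]
labels = [1, 1, -1, -1, 1, 1]
perceptron(points, labels)  # a separating hyperplane; [1, 1, 2] by a hand trace
```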
14 changes: 14 additions & 0 deletions exercises/practice/perceptron/.meta/tests.toml
@@ -0,0 +1,14 @@
```toml
[728853d3-24de-4855-a452-6520b67dec23]
description = "Initial set"

[ed5bf871-3923-47ca-8346-5d640f9069a0]
description = "Initial set w/ opposite labels"

[15a9860e-f9be-46b1-86b2-989bd878c8a5]
description = "Hyperplane cannot pass through origin"

[52ba77fc-8983-4429-91dc-e64b2f625484]
description = "Hyperplane nearly parallel with y-axis"

[3e758bbd-5f72-447d-999f-cfa60b27bc26]
description = "Increasing Populations"
```
3 changes: 3 additions & 0 deletions exercises/practice/perceptron/perceptron.jl
@@ -0,0 +1,3 @@
```julia
function perceptron(points, labels)
    # Perceptronize!
end
```
86 changes: 86 additions & 0 deletions exercises/practice/perceptron/runtests.jl
@@ -0,0 +1,86 @@
```julia
using Test, Random
include("perceptron.jl")

function runtestset()
```
**Contributor:**
I think we should include some tests with manually specified input data of just a few points to make this more approachable and also as good practice (e.g. first few tests could be spaces with just 2-4 points placed manually).

**Contributor Author (depial):**
Sorry, I'm not sure if I've understood, but there are four tests with (six) manually specified points to illustrate a couple of different possible orientations of a hyperplane (testset "Low population"). After that the 40 pseudorandomly generated tests begin (testset "Increasing Populations"). Was this what you meant?

**Contributor (cmcaine), Feb 4, 2024:**
I meant we could take some of the manually specified examples out of the runtestset() function and test them without the support functions, just so that it's less mystifying to a student reading the tests.

And maybe we should have the student write their own function for finding the computed label of a point?

e.g.

```julia
# Student must implement both `perceptron(points, labels) -> boundary (a vector of 3 weights)` and `classify(boundary, point)`

@testset "Boundary is a vector of 3 weights" begin
    boundary = perceptron([[0,0], [3, 3]], [1, -1])
    @test eltype(boundary) <: Real
    @test length(boundary) == 3
end

@testset "Originally provided points should be classified correctly" begin
    boundary = perceptron([[0,0], [3, 3]], [1, -1])
    @test classify(boundary, [0, 0]) == 1
    @test classify(boundary, [3, 3]) == -1
end

@testset "Given 3 labeled points, an unseen point is classified correctly" begin
    # Adding more points constrains the location of the boundary so that we can test
    # the classification of unseen points.
    boundary = perceptron([[0,0], [1, 0], [0, 1]], [-1, 1, 1])
    @test classify(boundary, [0, 0]) == -1
    @test classify(boundary, [2, 5]) == 1
end
```
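A minimal `classify` consistent with these suggested tests might look like the following sketch (one possible implementation; the name and signature are only proposed in this thread):

```julia
# Label is the sign of the point evaluated against the boundary weights
classify(boundary, point) = Int(sign(boundary' * vcat(1, point)))
```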

**Contributor Author:**
Sure! I can throw a couple in there like your first examples, but I've got a question about `classify`:
Due to the wide range of possible decision boundaries returned by Perceptron, beyond carefully selected examples, I'm not sure if testing classification of unseen points is viable. Also, since the algo is effectively just the three parts of classify + update + repeat, couldn't requiring a separate classification function introduce an unnecessary constraint on the task?

**Contributor:**
Cool!

I don't see why it would constrain the student or the task for us to ask them to provide a classify function. What do you mean by that?

I agree that testing with unseen points is a bit tricky, but it's not super hard given that we control the input data, and it's not too bad for us to either only include hand-written tests or to do some geometry to ensure our unseen points will always be classified correctly by a valid linear boundary.

**Contributor Author:**
> Is the learning rate trivial?

Sorry, over-generalization from me there :) I guess I'm considering the basic (pedagogical) Perceptron, i.e. one provided with small, dense, separable populations, since, under separability, the learning rate doesn't affect either the final loss or the upper bound of the number of errors the algorithm can make. That said, I was wrong to say the initial hyperplane affects these, since it doesn't either. It just seems non-trivial to me because it can be returned.

I'll try to make the necessary changes to the PR today and/or tomorrow :)

**Contributor Author (depial), Feb 8, 2024:**

Is there anything you would add (e.g. subtyping) to my suggestion for a slug, or do you think I could just copy and paste?

Edit: In my initial suggestion for the slug, conversion to Float64 is enforced, but the exercise (as presented) could also be handled entirely with integers. Should I drop all references to Float64? Something like:

```julia
mutable struct Perceptron
    # instantiates Perceptron with a decision boundary
    # this struct can remain unmodified
    dbound
    Perceptron() = new([0, 0, 0])
end

function fit!(model::Perceptron, points, labels)
    # updates the field dbound of model (model.dbound) and returns it as a valid decision boundary
    # your code here
end

function predict(model::Perceptron, points)
    # returns a vector of the predicted labels of points against the model's decision boundary
    # your code here
end
```

It might make it appear cleaner and less intimidating for students unfamiliar with the type system.

**Contributor:**
I don't think the struct is really adding much, so I'd remove it. You can pick what you like, tho.

If I try the exercise and hate the struct then I might veto it, but not until then.

**Member:**
> I think we should include some tests with manually specified input data of just a few points to make this more approachable and also as good practice (e.g. first few tests could be spaces with just 2-4 points placed manually).

Agree with this

**Contributor Author:**
> Agree with this

Could you be more specific? There are already four tests with six manually specified points which check for different possible orientations of a decision boundary.

Beyond this, we've had an extensive conversation about using "unseen points". (TLDR: I believe testing unseen points to be potentially more detrimental to understanding than beneficial)

The other ideas for tests were of the type to see if the student is returning the correct object (vector of three real numbers), etc. Is there something else you were thinking?


@testset "Low population" begin
@testset "Initial set" begin
points = [[1, 2], [3, 4], [-1, -2], [-3, -4], [2, 1], [1, 1]]
labels = [1, 1, -1, -1, 1, 1]
reference = [1, 2, 1]
hyperplane = perceptron(points, labels)
@test dotest(points, labels, hyperplane, reference)
end
@testset "Initial set w/ opposite labels" begin
points = [[1, 2], [3, 4], [-1, -2], [-3, -4], [2, 1], [1, 1]]
labels = [-1, -1, 1, 1, -1, -1]
reference = [-1, -2, -1]
hyperplane = perceptron(points, labels)
@test dotest(points, labels, hyperplane, reference)
end
@testset "Hyperplane cannot pass through origin" begin
points = [[1, 2], [3, 4], [-1, -2], [-3, -4], [2, 1], [-1, -1]]
labels = [1, 1, -1, -1, 1, 1]
reference = [-1, 3, 3]
hyperplane = perceptron(points, labels)
@test dotest(points, labels, hyperplane, reference)
end
@testset "Hyperplane nearly parallel with y-axis" begin
points = [[0, 50], [0, -50], [-2, 0], [1, 50], [1, -50], [2, 0]]
labels = [-1, -1, -1, 1, 1, 1]
reference = [2, 0, -1]
hyperplane = perceptron(points, labels)
@test dotest(points, labels, hyperplane, reference)
end
end

@testset "Increasing Populations" begin
for n in 10:50
points, labels, reference = population(n, 25)
hyperplane = perceptron(points, labels)
@test dotest(points, labels, hyperplane, reference)
end
end

end


function population(n, bound)
# Builds a population of n points with labels {1, -1} in area bound x bound around a reference hyperplane
# Returns linearly separable points, labels and reference hyperplane

vertical = !iszero(n % 10) #every tenth test has vertical reference hyperplane
x, y, b = rand(-bound:bound), rand(-bound:bound)*vertical, rand(-bound÷2:bound÷2)
y_intercept = -b ÷ (iszero(y) ? 1 : y)
points, labels, hyperplane = [], [], [b, x, y]
while n > 0
# points are centered on y-intercept, but not x-intercept so distributions can be lopsided
point = [rand(-bound:bound), y_intercept + rand(-bound:bound)]
label = point' * [x, y] + b
if !iszero(label)
push!(points, point)
push!(labels, sign(label))
n -= 1
end
end

points, labels, hyperplane
end

function dotest(points, labels, hyperplane, reference)
points = vcat.(1, points)
test = reduce(hcat, points)' * hyperplane .* labels
if all(>(0), test)
println("Reference hyperplane = $reference\nYour hyperplane = $hyperplane\nSeparated! And the normal points towards the positively labeled side\n")
return true
elseif all(<(0), test)
println("Reference hyperplane = $reference\nYour hyperplane = $hyperplane\nSeparated! But the normal points towards the negatively labeled side\n")
return true
else
println("Reference hyperplane = $reference\nYour hyperplane = $hyperplane\nThe sides are not properly separated...\n")
return false
end
end

Random.seed!(42) # set seed for deterministic test set
runtestset()