To start with the scraping, we're going to implement the RMP scraper for professors. We're going to be using their GraphQL API for this.
First, we need to find out how their API works.
If we go to our school's page, https://www.ratemyprofessors.com/school/881, and open the Network tab in devtools, we can see which requests the page sends to get the professor data.
When we scroll down and click 'show more', the Network tab shows a request being sent to https://www.ratemyprofessors.com/graphql. Clicking on that request and opening the Request tab in the panel that appears shows the data we're sending to this endpoint.
Here we can see that we're sending a query along with a variables object, so we'll need to replicate this for our scraper.
Also notice that in the Headers tab there is an `Authorization` header with the value `Basic dGVzdDp0ZXN0`. If we decode `dGVzdDp0ZXN0` with a [base64](https://www.base64decode.org/) decoder, we can see it says `test:test`, which is the `username:password` pair we need to send with the request.
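The same decoding can be done locally with Python's standard library instead of a website. This is a minimal sketch; the helper name `decode_basic_auth` is made up for illustration and is not part of the scraper.

```python
import base64


def decode_basic_auth(header_value: str) -> tuple[str, str]:
    """Split a 'Basic <base64>' header value into (username, password)."""
    scheme, encoded = header_value.split(" ", 1)
    if scheme != "Basic":
        raise ValueError(f"expected a Basic auth header, got {scheme!r}")
    # The payload is base64 of 'username:password'.
    decoded = base64.b64decode(encoded).decode("utf-8")
    username, password = decoded.split(":", 1)
    return username, password


print(decode_basic_auth("Basic dGVzdDp0ZXN0"))  # prints ('test', 'test')
```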
Next, in the root directory of the repo, we'll create the following files: `scraper/scrapers/rmp.py` and `scraper/main.py`.
In `rmp.py`, we'll start by importing requests and defining our constants at the top:

    RMP_URL = "https://www.ratemyprofessors.com/graphql"
    AUTH_USERNAME = "test"
    AUTH_PASSWORD = "test"

Then, we'll define a function called `get_professors` and store the query we copied from devtools in a variable.
Since we want all of the professors in one request, we can update this line in the first fragment to ask for the first 10000 professors:

    teachers(query: $query, first: 10000, after: "") {

We'll also define a variables dictionary with the same structure as the variables object we saw in devtools.
We then want to send the POST request, passing in all our data.
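As a rough sketch, the query string and the variables dictionary can be bundled into one JSON-serializable payload. Everything here except the `teachers(...)` line is an assumption: the surrounding query fields, the variable names `text` and `schoolID`, and the helper `build_payload` are placeholders, so copy the real query and variables from the Request tab in devtools.

```python
import json

# Placeholder query: paste the full query text copied from devtools here.
# Only the teachers(...) line comes from the walkthrough; the fields around
# it are hypothetical and exist only to make the sketch complete.
QUERY = """
query ($query: TeacherSearchQuery!) {
  search: newSearch {
    teachers(query: $query, first: 10000, after: "") {
      edges { node { firstName lastName } }
    }
  }
}
"""


def build_payload(school_id: str, search_text: str = "") -> dict:
    """Return the JSON body for the GraphQL request.

    The keys inside "variables" are assumptions; mirror whatever the
    variables object in devtools actually contains.
    """
    variables = {"query": {"text": search_text, "schoolID": school_id}}
    return {"query": QUERY, "variables": variables}


# The school ID value must be copied from devtools; this string is a stand-in.
payload = build_payload("SCHOOL_ID_FROM_DEVTOOLS")
print(json.dumps(payload["variables"], indent=2))
```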
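One possible way to send the request with `requests` is sketched below. It assumes the endpoint accepts a JSON body with `query` and `variables` keys, as seen in devtools. The helper `build_request`, the `timeout` value, and the signature of `get_professors` are choices made for this sketch, not something the write-up prescribes.

```python
import requests

RMP_URL = "https://www.ratemyprofessors.com/graphql"
AUTH_USERNAME = "test"
AUTH_PASSWORD = "test"


def build_request(query: str, variables: dict) -> requests.PreparedRequest:
    """Prepare the POST request without sending it (hypothetical helper)."""
    request = requests.Request(
        "POST",
        RMP_URL,
        json={"query": query, "variables": variables},
        # requests turns this tuple into the "Authorization: Basic ..." header.
        auth=(AUTH_USERNAME, AUTH_PASSWORD),
    )
    return request.prepare()


def get_professors(query: str, variables: dict) -> dict:
    """Send the GraphQL request and return the decoded JSON response."""
    with requests.Session() as session:
        response = session.send(build_request(query, variables), timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()


# Usage (performs a real network call, so it is left commented out):
# print(get_professors(query, variables))
```

Passing `auth=(AUTH_USERNAME, AUTH_PASSWORD)` lets `requests` build the same `Basic dGVzdDp0ZXN0` header we saw in devtools, so there is no need to base64-encode it by hand.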
If everything went well, something like this should be printed to the console