A Rust application that scrapes product data from the Alvord Polk website using product numbers stored in an Excel file.
- Reads product numbers from an Excel file.
- Sends HTTP requests to search for products on the website.
- Matches product numbers and retrieves detailed product information.
- Extracts product data and stores it in a `HashMap`.
- Utilizes fake user agents to avoid detection while scraping.
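The matching and storage steps listed above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: `numbers_match`, `collect_fields`, and the stand-in search results are hypothetical, and the HTTP and HTML-parsing steps are left out entirely.

```rust
use std::collections::HashMap;

/// Hypothetical helper: compare two product numbers, ignoring surrounding
/// whitespace and ASCII letter case.
fn numbers_match(wanted: &str, found: &str) -> bool {
    wanted.trim().eq_ignore_ascii_case(found.trim())
}

/// Hypothetical helper: collect (label, value) pairs taken from a product
/// page into a HashMap, trimming whitespace and skipping empty labels.
fn collect_fields(pairs: &[(&str, &str)]) -> HashMap<String, String> {
    pairs
        .iter()
        .filter(|(label, _)| !label.trim().is_empty())
        .map(|(label, value)| (label.trim().to_string(), value.trim().to_string()))
        .collect()
}

fn main() {
    // Stand-in for scraped search results: (product number, fields).
    // The first entry is a made-up placeholder, not real catalog data.
    let search_results = vec![
        ("00000", vec![("Size", "placeholder")]),
        ("01069", vec![("Size", "4.0MM"), ("Price", "$32.05")]),
    ];

    // A product number as it might come out of the Excel file.
    let wanted = " 01069 ";

    for (number, fields) in &search_results {
        if numbers_match(wanted, number) {
            let product = collect_fields(fields);
            println!("{:#?}", product);
        }
    }
}
```

Trimming before comparison is an assumption about how spreadsheet cells might differ from the site's values; the real application may compare product numbers differently.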
This project uses the following Rust crates:
- `calamine` - For reading Excel files.
- `reqwest` - For making HTTP requests.
- `scraper` - For parsing HTML and extracting data.
- `fake_user_agent` - For generating fake user-agent strings.
- Clone the repository:

      git clone https://github.com/waqassahmed03/alvord-polk-scraper.git
      cd alvord-polk-scraper

- Build the project:

      cargo build --release

- Run the project:

      cargo run
- Place your Excel file containing product numbers in the `input` folder with the name `product_numbers.xlsx`.
- Run the application to start scraping.
- The output will be displayed in the console, including the matched product details.
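Before running, it can help to confirm that the input file is where the application expects it. The sketch below is a hypothetical pre-flight check built only on the standard library; `input_file` is an illustrative name, and the path assumes the `input/product_numbers.xlsx` location described above, relative to the project root.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: build the expected input path relative to a
/// project root directory.
fn input_file(project_root: &Path) -> PathBuf {
    project_root.join("input").join("product_numbers.xlsx")
}

fn main() {
    // Assumes the program is started from the project root.
    let path = input_file(Path::new("."));

    if path.is_file() {
        println!("Found {}", path.display());
    } else {
        eprintln!(
            "Missing {}; place the Excel file there before running.",
            path.display()
        );
    }
}
```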
.
├── LICENSE
├── Cargo.lock
├── Cargo.toml                 # Project dependencies and metadata
├── src
│   └── main.rs                # Main Rust source code
└── input
    └── product_numbers.xlsx   # Excel file containing product numbers
The application will print the product data in the following format:
{
"Size": "4.0MM",
"Decimal Equiv.(in.)": "0.1575",
"Length Overall(in.)": "4",
"EDP Number": "01069",
"Length of Flute(in.)": "1",
"Price": "$32.05",
"Diameter of Shank": ".1510-.1500",
"description": " High Speed Steel, Straight Shank, 127-1 Right Hand Spiral, Right Hand Cut ",
"In Stock": "Available for Shipment",
}
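The format above, including the trailing comma after the last entry, resembles what Rust's pretty-printed `Debug` formatting (`{:#?}`) produces for a `HashMap<String, String>`. Assuming that is how the record is printed, a minimal sketch looks like this; `render` is a hypothetical helper, and the two entries are taken from the sample above.

```rust
use std::collections::HashMap;

/// Hypothetical helper: render a product record with pretty Debug formatting.
fn render(product: &HashMap<String, String>) -> String {
    format!("{:#?}", product)
}

fn main() {
    let mut product = HashMap::new();
    product.insert("EDP Number".to_string(), "01069".to_string());
    product.insert("Price".to_string(), "$32.05".to_string());

    // HashMap iteration order is arbitrary, so the key order in the
    // printed output can differ from run to run.
    println!("{}", render(&product));
}
```

Because `HashMap` does not preserve insertion order, fields may appear in a different order on each run; a `BTreeMap` would give sorted, stable output if that matters.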
Feel free to fork this repository, open issues, or submit pull requests.
You can contact me at waqassahmed03@gmail.com.
This project is licensed under the MIT License. See the LICENSE file for more details.