In this lab, we'll use Azure Databricks to wrangle data in blob storage. Before following the steps, make sure you are familiar with the basic concepts of Azure Data Factory (ADF), as well as:
- Mounting blob storage to Databricks
- Python
- DataFrames
Create an Azure Databricks service from the Azure Portal ('Create a resource' > 'Azure Databricks')
Type an Azure Databricks workspace name, e.g. azhol92
Select an appropriate Azure subscription
Select 'Use Existing' and find your hands-on lab resource group name in the drop-down, e.g. azhol-92-rg
Select 'West US' for location
Select 'Premium (+Role-based access controls)' for Pricing Tier
Pin Azure Databricks on your Azure Portal dashboard
Open your Azure Databricks workspace from the Azure Portal, or launch it directly in your browser
Click the 'Clusters' icon on the left panel of the screen
Click '+ Create Cluster' and fill out the form as follows:
Name | Value |
---|---|
Cluster Name | azhol92 |
Cluster Mode | Standard |
Databricks Runtime Ver. | 4.2 |
Python Ver. | 3 |
Driver Type | Same as worker |
Worker Type | Standard_DS3_v2 |
Workers | 2 |
Enable autoscaling | Uncheck |
Auto Termination | Check, 10 minutes |
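If you prefer scripting to the portal UI, the same cluster can also be sketched with the Databricks CLI. The JSON below mirrors the table above; the file name and the exact `spark_version` string are assumptions, so check the values your workspace actually offers:

```shell
# cluster.json -- mirrors the settings in the table above (field values assumed)
cat > cluster.json <<'EOF'
{
  "cluster_name": "azhol92",
  "spark_version": "4.2.x-scala2.11",
  "node_type_id": "Standard_DS3_v2",
  "num_workers": 2,
  "autotermination_minutes": 10
}
EOF

# Requires `pip install databricks-cli` and `databricks configure --token` first
databricks clusters create --json-file cluster.json
```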
Click 'Workspace' in the Azure Databricks portal
Click 'Users', click the small icon next to your user name, and then click 'Import' to import an existing notebook into your Azure Databricks workspace
Select 'URL', copy the URL below, and paste it into the import window:
https://raw.githubusercontent.com/xlegend1024/az-cloudscale-adv-analytics/master/AzureDatabricks/02.datawrangling.ipynb
Click the 'Import' button; the notebook will open automatically in your browser
Click the 'Detached' menu and select your cluster from the list
Find the blob storage account name and key, then update the notebook parameters
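The notebook's mount step can be sketched as follows. The storage account name, container, and mount point below are hypothetical placeholders, and `dbutils` only exists inside a Databricks notebook, so the actual mount call is shown commented:

```python
# Hypothetical values -- replace with the account name and key found in the
# Azure Portal under your Storage account > Access keys.
storage_account = "azhol92storage"
container = "data"
storage_key = "<storage-account-key>"  # never commit a real key to source control

# Build the wasbs:// source URL and the Spark config entry for the account key
source = f"wasbs://{container}@{storage_account}.blob.core.windows.net"
mount_point = f"/mnt/{container}"
extra_configs = {
    f"fs.azure.account.key.{storage_account}.blob.core.windows.net": storage_key
}

# In a Databricks notebook cell (dbutils is provided by the runtime):
# dbutils.fs.mount(source=source, mount_point=mount_point, extra_configs=extra_configs)
```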
Run different languages (Python, Scala, R, SQL) to load data from blob storage into Databricks for data wrangling
Run the commands from the notebook on Databricks
Optionally, we can run machine learning in Databricks
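As a rough sketch of what those multi-language cells look like: the path below is an assumed mount point, and `spark` plus the `%` language magics are only available inside a Databricks notebook, so those lines are shown as comments:

```python
# Hypothetical path under the mount point created earlier -- adjust to your data.
csv_path = "/mnt/data/sample.csv"

# Python cell (`spark` is provided by the Databricks runtime):
# df = spark.read.option("header", "true").option("inferSchema", "true").csv(csv_path)
# df.createOrReplaceTempView("sample")

# SQL cell -- queries the temp view registered above:
# %sql
# SELECT COUNT(*) FROM sample

# Scala cell -- the same read in a different language:
# %scala
# val df = spark.read.option("header", "true").csv("/mnt/data/sample.csv")
```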