This document is designed to help you get started with KQL, particularly in the context of Microsoft Advanced Hunting. KQL is a powerful query language used to extract and analyze data from large datasets. This guide will introduce you to the basics of KQL and provide tips on how to explore and query your data effectively.
- Introduction to KQL
- Understanding Data Tables
- Basic Query Structure
- Common Operators and Commands
- Filtering Data with
where
- Projecting and Extending Data
- Sorting and Ordering Data
- Aggregating Data
- Exploring Table Schemas
- Using Functions and Operators
- Joining Tables
- Best Practices
- Additional Resources
Kusto Query Language (KQL) is a powerful language used to query large datasets stored in Azure Data Explorer, Microsoft Sentinel, Microsoft Defender for Endpoint, and other Microsoft services. It allows you to perform complex queries and data analysis with ease.
- Efficient Data Retrieval: Quickly extract relevant data from massive datasets.
- Advanced Analytics: Perform aggregations, joins, and statistical analysis.
- Flexible Syntax: Build queries that suit your specific needs.
- Integration: Works seamlessly with Microsoft security tools for threat hunting.
In Microsoft Advanced Hunting, data is organized into tables, each containing records of specific event types. Some commonly used tables include:
DeviceProcessEvents
: Information about process creation and termination.DeviceNetworkEvents
: Network connection events.DeviceFileEvents
: File creation, modification, and deletion events.DeviceLogonEvents
: User logon and authentication events.AlertEvents
: Security alerts generated by Microsoft Defender.
Understanding the structure and content of these tables is essential for effective querying.
A KQL query typically starts with a table name, followed by a series of operators that manipulate the data.
Syntax:
TableName
| Operator1
| Operator2
| ...
Example:
DeviceProcessEvents
| where FileName == "malicious.exe"
| project Timestamp, DeviceName, FileName, ProcessCommandLine
| order by Timestamp desc
This query retrieves process events where the file name is malicious.exe
, selects specific columns, and orders the results by the most recent events.
where
: Filters records based on a condition.project
: Selects specific columns to include in the output.extend
: Creates new calculated columns.summarize
: Aggregates data using functions likecount()
,sum()
, oravg()
.order by
: Sorts the results based on specified columns.join
: Combines data from two tables based on a common key.parse_json()
: Parses a JSON string into an object.
The where
operator filters records to include only those that meet specified conditions.
Example:
DeviceNetworkEvents
| where RemoteIP == "192.168.1.10"
Common Comparison Operators:
==
: Equal to!=
: Not equal tocontains
: Contains a substringstartswith
: Starts with a substringin
: In a list of valuesbetween()
: Within a range
Example with Multiple Conditions:
DeviceProcessEvents
| where FileName == "cmd.exe" and ProcessCommandLine contains "net user"
Selects specific columns to include in the output, reducing the amount of data returned.
Example:
| project Timestamp, DeviceName, FileName, FolderPath
Adds new columns or calculates new values based on existing data.
Example:
| extend FileExtension = split(FileName, ".")
This creates a new column FileExtension
by splitting FileName
on the dot character.
The order by
operator sorts the query results based on specified columns, either ascending (asc
) or descending (desc
).
Example:
| order by Timestamp desc
Aggregates data using functions and groups results by specified columns.
Common Aggregate Functions:
count()
: Counts the number of records.sum(ColumnName)
: Sums the values in a column.avg(ColumnName)
: Calculates the average value.max(ColumnName)
: Finds the maximum value.min(ColumnName)
: Finds the minimum value.
Example:
DeviceFileEvents
| where ActionType == "FileDeleted"
| summarize DeletionCount = count() by DeviceName
| order by DeletionCount desc
This query counts the number of file deletions per device and orders the results by the highest counts.
Understanding the structure of a table helps in crafting accurate queries. The | getschema
command displays the columns and data types in a table.
How to Use getschema
:
TableName
| getschema
Example:
DeviceProcessEvents
| getschema
Use this command to:
- Identify available fields.
- Understand data types (e.g.,
string
,datetime
,int
). - Plan which columns to use in your queries.
KQL provides a variety of functions to manipulate and analyze data.
String Functions:
tolower()
: Converts a string to lowercase.toupper()
: Converts a string to uppercase.trim()
: Removes leading and trailing spaces.
Date and Time Functions:
ago()
: Calculates a datetime value relative to now.datetime()
: Converts a string to a datetime value.
Example:
DeviceLogonEvents
| where Timestamp > ago(1d)
This filters events that occurred within the last day.
Combine data from two tables to enrich your results.
Syntax:
Table1
| join kind=JoinType (
Table2
| WhereClause
) on CommonField
Example:
DeviceProcessEvents
| where FileName == "powershell.exe"
| join kind=inner (
DeviceNetworkEvents
| where RemoteIPType == "Public"
) on DeviceId
| project Timestamp, DeviceName, FileName, RemoteIP, ProcessCommandLine
This query finds devices where powershell.exe
ran and made network connections to public IPs.
- Start Simple: Begin with basic queries and gradually add complexity.
- Filter Early: Use
where
clauses at the beginning to limit data processing. - Verify Field Names: Use
| getschema
or sample data to confirm field names. - Comment Your Queries: Use
//
to add comments and explanations. - Limit Results for Testing: Use
| take 10
or| limit 10
during development. - Be Mindful of Data Types: Ensure you use appropriate functions for the data type.
- KQL Documentation: Microsoft Docs - Kusto Query Language
- Advanced Hunting Query Language: Microsoft Docs - Advanced Hunting
- Sample Queries: Explore built-in queries or community samples to learn more.
Step 1: Filter for PowerShell Executions
DeviceProcessEvents
| where FileName == "powershell.exe"
Step 2: Filter for Suspicious Command Lines
| where ProcessCommandLine contains "-EncodedCommand"
Step 3: Project Relevant Fields
| project Timestamp, DeviceName, AccountName, ProcessCommandLine, InitiatingProcessFileName
Step 4: Order Results
| order by Timestamp desc
Full Query:
DeviceProcessEvents
| where FileName == "powershell.exe"
| where ProcessCommandLine contains "-EncodedCommand"
| project Timestamp, DeviceName, AccountName, ProcessCommandLine, InitiatingProcessFileName
| order by Timestamp desc
Explanation:
- This query identifies instances where
powershell.exe
was run with the-EncodedCommand
parameter, which is often used to obfuscate malicious scripts.
KQL is a versatile language that empowers you to explore and analyze data effectively. By mastering the basic operators and understanding how to manipulate data, you can uncover valuable insights and enhance your threat-hunting capabilities.
Happy querying!