Troubleshooting common errors
This page describes common errors, common non-errors, and general troubleshooting.
Check CloudWatch logs to see whether the containers started correctly. Run the diagnostic queries to check whether the data layer is still syncing. (Header sync and extract diffs rarely change, so the usual culprit is a change made to the transformation layer containers.)
To diagnose missing urn or auction data, it's easiest to compare event and storage diff data. For every state-changing event, we should have a corresponding storage diff. Let's use an urn as an example. This query returns the current ink and art storage values, and embeds frob events for this urn. Within each embedded frob event, we're also embedding the historical ink and art state of the urn, that is, the ink and art values at the block height of the corresponding frob event:
{
getUrn(ilkIdentifier: "ETH-A", urnIdentifier: "0xbB7497BAaF231B8b7D92e0cFf9BCf4F2018C2d2d") {
# current `ink` and `art` (data comes from storage diffs)
ink
art
# frob events (data comes from events)
frobs(first: 3) {
# there are probably more than 3 frobs, check here
totalCount
nodes {
tx {
# block height of the event
blockHeight
}
# dink and dart parameters from the event
dink
dart
urn {
# ink and art state at the time of the event
ink
art
}
}
}
}
}
The above query returns a response like this:
{
"data": {
"getUrn": {
"ink": "280130575429533717604",
"art": "553661142717366142318575",
"frobs": {
"totalCount": 105,
"nodes": [
{
"tx": {
"blockHeight": "13767861"
},
"dink": "0",
"dart": "47198512683467635276324",
"urn": {
"ink": "280130575429533717604",
"art": "553661142717366142318575"
}
},
{
"tx": {
"blockHeight": "13760904"
},
"dink": "0",
"dart": "3776167629227625039464",
"urn": {
"ink": "280130575429533717604",
"art": "506462630033898507042251"
}
},
{
"tx": {
"blockHeight": "13760893"
},
"dink": "57670312298673291211",
"dart": "0",
"urn": {
"ink": "280130575429533717604",
"art": "502686462404670882002787"
}
}
]
}
}
},
"meta": {
"graphqlQueryCost": 3
}
}
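One way to sanity-check a response like the one above is to confirm that each frob's dink and dart equal the change in the embedded ink and art between that event and the next-older one. The sketch below is illustrative only: `check_frob_history` is a hypothetical helper, not part of the API. It assumes the nodes are ordered newest first (as in the response above) and that nothing other than these frobs changed the urn in between. A mismatch is a hint to look for missing diffs or intervening events, not proof of a problem.

```python
from typing import Any

# The three frob nodes from the response above, copied verbatim.
SAMPLE_NODES = [
    {"tx": {"blockHeight": "13767861"}, "dink": "0",
     "dart": "47198512683467635276324",
     "urn": {"ink": "280130575429533717604", "art": "553661142717366142318575"}},
    {"tx": {"blockHeight": "13760904"}, "dink": "0",
     "dart": "3776167629227625039464",
     "urn": {"ink": "280130575429533717604", "art": "506462630033898507042251"}},
    {"tx": {"blockHeight": "13760893"}, "dink": "57670312298673291211",
     "dart": "0",
     "urn": {"ink": "280130575429533717604", "art": "502686462404670882002787"}},
]


def check_frob_history(frob_nodes: list[dict[str, Any]]) -> list[str]:
    """Compare each frob's dink/dart with the change in embedded urn state.

    Returns a list of mismatch descriptions; an empty list means every pair
    that could be checked is consistent.
    """
    problems = []
    # Pair each event with the next-older one. The oldest event in the page
    # has no predecessor here, so it is not checked.
    for newer, older in zip(frob_nodes, frob_nodes[1:]):
        height = newer["tx"]["blockHeight"]
        for delta_field, state_field in (("dink", "ink"), ("dart", "art")):
            # Values arrive as decimal strings; Python ints do not overflow.
            expected = int(newer[delta_field])
            actual = int(newer["urn"][state_field]) - int(older["urn"][state_field])
            if expected != actual:
                problems.append(
                    f"block {height}: {delta_field}={expected} but "
                    f"{state_field} changed by {actual}"
                )
    return problems
```

For the sample data, `check_frob_history(SAMPLE_NODES)` returns an empty list: the art values at blocks 13767861 and 13760904 each differ from the previous event's art by exactly the reported dart, and ink is unchanged where dink is 0.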
These events show one dink (change to ink) and two darts (change to art). We should expect to see corresponding storage diffs for this urn at the block heights of these events. Here's one way to look for them:
{
art1: allVatUrnArts(first: 100, filter: {storageDiffByDiffId: {blockHeight: {equalTo: "13767861"}}}) {
nodes {
art
rawUrnByUrnId {
identifier
}
}
}
art2: allVatUrnArts(first: 100, filter: {storageDiffByDiffId: {blockHeight: {equalTo: "13760904"}}}) {
nodes {
art
rawUrnByUrnId {
identifier
}
}
}
ink1: allVatUrnInks(first: 100, filter: {storageDiffByDiffId: {blockHeight: {equalTo: "13760893"}}}) {
nodes {
ink
rawUrnByUrnId {
identifier
}
}
}
}
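With 105 frobs on this urn, writing the aliased query by hand gets tedious. As a rough sketch, the query text can be generated from the frob nodes: an allVatUrnArts lookup for each event with a non-zero dart, and an allVatUrnInks lookup for each event with a non-zero dink. `build_diff_query` and `_selection` are hypothetical helpers; the field and filter names are taken from the query above. Aliases are numbered by event position, so the three events above produce art1, art2 and ink3 rather than art1, art2 and ink1.

```python
from typing import Any


def _selection(alias: str, collection: str, field: str, height: str, page_size: int) -> str:
    """Render one aliased lookup in the same shape as the hand-written query."""
    return (
        f"  {alias}: {collection}(first: {page_size}, "
        f'filter: {{storageDiffByDiffId: {{blockHeight: {{equalTo: "{height}"}}}}}}) {{\n'
        f"    nodes {{\n"
        f"      {field}\n"
        f"      rawUrnByUrnId {{\n"
        f"        identifier\n"
        f"      }}\n"
        f"    }}\n"
        f"  }}"
    )


def build_diff_query(frob_nodes: list[dict[str, Any]], page_size: int = 100) -> str:
    """Build a GraphQL query that looks up storage diffs at each frob's block."""
    selections = []
    for position, node in enumerate(frob_nodes, start=1):
        # Round-trip through int so only a plain number reaches the query text.
        height = str(int(node["tx"]["blockHeight"]))
        if int(node["dart"]) != 0:
            selections.append(
                _selection(f"art{position}", "allVatUrnArts", "art", height, page_size)
            )
        if int(node["dink"]) != 0:
            selections.append(
                _selection(f"ink{position}", "allVatUrnInks", "ink", height, page_size)
            )
    if not selections:
        raise ValueError("no frob changed ink or art; nothing to look up")
    return "{\n" + "\n".join(selections) + "\n}"
```

The returned string can be pasted into a GraphQL client or sent as the `query` field of a request body.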
We could also look at a lower level, by querying diffs directly and embedding any associated vatUrnArtsByDiffId:
{
allStorageDiffs(first: 100, filter: {blockHeight: {equalTo: "13767861"}}) {
nodes {
address
storageKey
storageValue
vatUrnArtsByDiffId(first: 1) {
nodes {
rawUrnByUrnId {
identifier
}
}
}
}
}
}
Here, we expect to see one diff whose identifier matches the urn's address, 0xbB7497BAaF231B8b7D92e0cFf9BCf4F2018C2d2d.
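A block can contain many diffs, so it may help to filter the response programmatically instead of scanning it by eye. The following is a minimal sketch, assuming the response mirrors the shape of the allStorageDiffs query above. `diffs_for_urn` is a hypothetical helper, and comparing addresses case-insensitively is an assumption made here to sidestep checksum casing.

```python
from typing import Any


def diffs_for_urn(response: dict[str, Any], urn_identifier: str) -> list[dict[str, Any]]:
    """Return the storage diff nodes whose embedded urn identifier matches."""
    wanted = urn_identifier.lower()
    matches = []
    for diff in response["data"]["allStorageDiffs"]["nodes"]:
        for art in diff["vatUrnArtsByDiffId"]["nodes"]:
            # Tolerate a null relation rather than failing on it.
            urn = art.get("rawUrnByUrnId") or {}
            if urn.get("identifier", "").lower() == wanted:
                matches.append(diff)
                break
    return matches
```

An empty result for a block where a frob changed art suggests the diff for this urn is missing at that height.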
In case of missing urn data, we can use the backfill urns script to populate it.
This worked example is for urns, but you can take a similar approach for other types of data, like auctions: look for the event history, match it with the storage history, and compare to see if any data is missing.
In the case of missing non-urn data, you will have to run a storage backfill over the block range of missing data.
Queries may time out in production, especially large summary queries like getUrnsByIlk and allClips for large collateral types like ETH-A. If a query times out, the GraphQL API will return a message like this:
{
"errors": [
{
"message": "canceling statement due to statement timeout",
"locations": [
{
"line": 2,
"column": 3
}
],
"path": [
"getUrnsByIlk"
]
}
],
"data": {
"getUrnsByIlk": null
},
"meta": {
"graphqlQueryCost": 158
}
}
In most cases, these queries are already limited to a maximum page size, and the end user can tune this parameter. However, in some cases we may need to add or change the limit. You can tune page size using the PostGraphile pagination cap parameters (start here), or by adding a max results parameter to the underlying SQL function.
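On the client side, a caller can detect this error and retry with a smaller page. The sketch below is one possible approach, not an existing client: `timed_out_paths` and `fetch_with_shrinking_page` are hypothetical helpers, and `run_query` stands in for whatever function sends the query with a given page size and returns the parsed JSON response. Halving the page size on each timeout is an arbitrary choice made for illustration.

```python
from typing import Any, Callable

# Message text taken from the sample timeout response above.
TIMEOUT_MARKER = "canceling statement due to statement timeout"


def timed_out_paths(response: dict[str, Any]) -> list[list[Any]]:
    """Return the path of every error caused by a statement timeout."""
    return [
        error.get("path", [])
        for error in response.get("errors", [])
        if TIMEOUT_MARKER in error.get("message", "")
    ]


def fetch_with_shrinking_page(
    run_query: Callable[[int], dict[str, Any]],
    page_size: int,
    min_page_size: int = 1,
) -> tuple[dict[str, Any], int]:
    """Call run_query(page_size), halving the page size after each timeout.

    Returns the first response without a timeout and the page size that worked.
    """
    while True:
        response = run_query(page_size)
        if not timed_out_paths(response):
            return response, page_size
        if page_size <= min_page_size:
            raise RuntimeError(f"query still times out at page size {page_size}")
        page_size = max(min_page_size, page_size // 2)
```

Applied to the sample response above, `timed_out_paths` returns `[["getUrnsByIlk"]]`.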
There are a few noisy log messages that can look concerning, but are not.
Unique constraint violations in log files (header sync, MCD execute). We occasionally receive duplicate headers due to re-orgs, which will log a warning like this (note that this particular sample reports a foreign key constraint on checked_headers):
{
"blockNumber": 13849045,
"headerHash": "0xa0d85d3fef24a4d0c6e7ed32752140da61bc43dbbbe8b070abc75c0f65b90fab",
"headerId": 6721269,
"level": "warning",
"msg": "error marking header checked: pq: insert or update on table \"checked_headers\" violates foreign key constraint \"checked_headers_header_id_fkey\"",
"time": "2021-12-21T14:28:48Z"
}
Connection errors in header sync. Occasionally header sync will fail to connect to RPC, but will retry. These are only an issue if the failed connection persists:
{
"SubCommand": "headerSync",
"level": "error",
"msg": "headerSync: ValidateHeaders failed: error creating validation window: Post \"https://geth0.mainnet.makerops.services/rpc\": dial tcp 3.222.28.184:443: connect: connection refused",
"time": "2021-12-21T11:34:25Z"
}
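To tell a transient connection error from a persistent one, the log timestamps can be scanned for an unbroken run of failures. The sketch below is a rough illustration under several assumptions: the input is the header sync container's log with one JSON object per line in chronological order, any line that is not a connection error counts as a recovery, and the 10-minute threshold is a placeholder rather than a documented limit. `connection_errors_persist` is a hypothetical helper.

```python
import json
from datetime import datetime, timedelta
from typing import Iterable

# Substring of the sample header sync error above.
CONNECTION_ERROR_MARKER = "connection refused"
# Timestamp layout used by the sample log lines, e.g. 2021-12-21T11:34:25Z.
TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def connection_errors_persist(
    log_lines: Iterable[str],
    threshold: timedelta = timedelta(minutes=10),
) -> bool:
    """Return True if an unbroken run of connection errors spans the threshold."""
    streak_start = None
    for line in log_lines:
        record = json.loads(line)
        when = datetime.strptime(record["time"], TIME_FORMAT)
        if CONNECTION_ERROR_MARKER in record.get("msg", ""):
            if streak_start is None:
                streak_start = when
            if when - streak_start >= threshold:
                return True
        else:
            # Any other log line is treated as a sign that sync has resumed.
            streak_start = None
    return False
```

A single failure, or a short burst followed by normal log lines, returns False; only a run of failures lasting at least the threshold returns True.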
- First, check if headers and diffs are correctly syncing using the diagnostic queries.
- Look in Sentry for application errors.
- Look at CloudWatch logs for each container.
- Look at ECS logs for each service.