Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate tracing #6339

Open
wants to merge 126 commits into
base: unstable
Choose a base branch
from
Open

Conversation

ThreeHrSleep
Copy link
Member

@ThreeHrSleep ThreeHrSleep commented Sep 2, 2024

Issue Addressed

Tracing Integration

Proposed Changes

  • replace slog & log with tracing throughout the codebase
  • implement custom crit log
  • make relevant changes in the formatter
  • replace sloggers
  • re-write SSE logging components

cc: @macladson @eserilev

@ThreeHrSleep ThreeHrSleep marked this pull request as draft September 2, 2024 05:04
@chong-he chong-he added work-in-progress PR is a work-in-progress UX-and-logs labels Sep 3, 2024
@ThreeHrSleep ThreeHrSleep changed the title feat: tracing integration feat: Integrate tracing Sep 4, 2024
@ThreeHrSleep ThreeHrSleep force-pushed the tracing-integration branch 2 times, most recently from 23491cd to 4b63358 Compare September 9, 2024 11:07
Add retropgf funding (sigp#6324)

Migrate from `ethereum-types` to `alloy-primitives` (sigp#6078)

* Remove use of ethers_core::RlpStream

* Merge branch 'unstable' of https://github.com/sigp/lighthouse into remove_use_of_ethers_core

* Remove old code

* Simplify keccak call

* Remove unused package

* Merge branch 'unstable' of https://github.com/ethDreamer/lighthouse into remove_use_of_ethers_core

* Merge branch 'unstable' into remove_use_of_ethers_core

* Run clippy

* Merge branch 'remove_use_of_ethers_core' of https://github.com/dospore/lighthouse into remove_use_of_ethers_core

* Check all cargo fmt

* migrate to alloy primitives init

* fix deps

* integrate alloy-primitives

* resolve dep issues

* more changes based on dep changes

* add TODOs

* Merge branch 'unstable' of https://github.com/sigp/lighthouse into remove_use_of_ethers_core

* Revert lock

* Add BeaconBlocksByRange v3

* continue migration

* Revert "Add BeaconBlocksByRange v3"

This reverts commit e3ce7fc.

* impl hash256 extended trait

* revert some uneeded diffs

* merge conflict resolved

* fix subnet id rshift calc

* rename to FixedBytesExtended

* debugging

* Merge branch 'unstable' of https://github.com/sigp/lighthouse into migrate-to-alloy-primitives

* fix failed test

* fixing more tests

* Merge branch 'unstable' of https://github.com/sigp/lighthouse into remove_use_of_ethers_core

* introduce a shim to convert between the two u256 types

* move alloy to wrokspace

* align alloy versions

* update

* update web3signer test certs

* refactor

* resolve failing tests

* linting

* fix graffiti string test

* fmt

* fix ef test

* resolve merge conflicts

* remove udep and revert cert

* cargo patch

* cyclic dep

* fix build error

* Merge branch 'unstable' of https://github.com/sigp/lighthouse into migrate-to-alloy-primitives

* resolve conflicts, update deps

* merge unstable

* fmt

* fix deps

* Merge branch 'unstable' of https://github.com/sigp/lighthouse into migrate-to-alloy-primitives

* resolve merge conflicts

* resolve conflicts, make necessary changes

* Remove patch

* fmt

* remove file

* merge conflicts

* sneaking in a smol change

* bump versions

* Merge remote-tracking branch 'origin/unstable' into migrate-to-alloy-primitives

* Updates for peerDAS

* Update ethereum_hashing to prevent dupe

* updated alloy-consensus, removed TODOs

* cargo update

* endianess fix

* Merge branch 'unstable' of https://github.com/sigp/lighthouse into migrate-to-alloy-primitives

* fmt

* fix merge

* fix test

* fixed_bytes crate

* minor fixes

* convert u256 to i64

* panic free mixin to_low_u64_le

* from_str_radix

* computbe_subnet api and ensuring we use big-endian

* Merge branch 'unstable' of https://github.com/sigp/lighthouse into migrate-to-alloy-primitives

* fix test

* Simplify subnet_id test

* Simplify some more tests

* Add tests to fixed_bytes crate

* Merge branch 'unstable' into migrate-to-alloy-primitives

update libp2p to version 0.54 (sigp#6249)

* update libp2p to version 0.54.0

* address review

* Merge branch 'unstable' of github.com:sigp/lighthouse into update-libp2p

* Merge branch 'update-libp2p' of github.com:sigp/lighthouse into update-libp2p

Drop block data from BlockError and BlobError (sigp#5735)

* Drop block data from BlockError and BlobError

* Debug release tests

* Fix release tests

* Revert unnecessary changes to lighthouse_metrics

Update manual checkpoint sync content in Lighthouse book (sigp#6338)

* Update manual checkpiont sync

* Update faq

* Minor revision

* Minor revision

Light Client Bug Fix (sigp#6299)

* Light Client Bug Fix

Ignore Rust 1.82 warnings about void patterns (sigp#6357)

* Ignore Rust 1.82 warnings about void patterns

Enable `large_stack_frames` lint (sigp#6343)

* Enable `large_stack_frames` lint

Add blob count label to `DATA_COLUMN_SIDECAR_COMPUTATION` metric (sigp#6340)

* Add blob count label to `DATA_COLUMN_SIDECAR_COMPUTATION` metric, and move metrics into the compute function, recording only successful computation.

* Move `discard_timer_on_break` usage to caller site.

* Merge branch 'unstable' into compute-data-column-metric

* Merge branch 'unstable' into compute-data-column-metric

Metadata request ordering (sigp#6336)

* Send metadata request ordering

* Merge branch 'unstable' into metadata-order

Clarify validator monitor block log (sigp#6342)

* Clarify validator monitor block log

* Merge branch 'unstable' into clarify-block-log

Check known parent on rpc blob process (sigp#5893)

* Check known parent on rpc blob process

* fix test

* Merge branch 'unstable' of https://github.com/sigp/lighthouse into blob-unknown-parent

Remove beta tag from gossipsub 1.2 (sigp#6344)

* Remove the beta tag from gossipsub v1.2

* fix clippy

* Merge branch 'unstable' into remove-beta-tag

Fix lints for Rust 1.81 (sigp#6363)

* Fix lints for Rust 1.81

Use tikv-jemallocator instead of jemallocator (sigp#6354)

* Use tikv-jemallocator instead of jemallocator

* Merge branch 'unstable' into tikv-jemallocator

* Bump tikv-jemallocator and tikv-jemalloc-ctl

Improve `get_custody_columns` validation, caching and error handling (sigp#6308)

* Improve `get_custody_columns` validation, caching and error handling.

* Merge branch 'unstable' into get-custody-columns-error-handing

* Fix failing test and add more test.

* Fix failing test and add more test.

* Merge branch 'unstable' into get-custody-columns-error-handing

* Add unit test to make sure the default specs won't panic on the `compute_custody_requirement_subnets` function.

* Add condition when calling `compute_custody_subnets_from_metadata` and update logs.

* Validate `csc` when returning from enr. Remove `csc` computation on connection since we get them on metadata anyway.

* Add `peers_per_custody_subnet_count` to track peer csc and supernodes.

* Disconnect peers with invalid metadata and find other peers instead.

* Fix sampling tests.

* Merge branch 'unstable' into get-custody-columns-error-handing

* Merge branch 'unstable' into get-custody-columns-error-handing

Delete legacy payload reconstruction (sigp#6213)

* Delete legacy payload reconstruction

* Delete unneeded failing test

* Merge remote-tracking branch 'origin/unstable' into remove-more-ethers

* Merge remote-tracking branch 'origin/unstable' into remove-more-ethers

* Cleanups

simplify rpc codec logic (sigp#6304)

* simplify rpc codec logic

* Merge branch 'unstable' of github.com:sigp/lighthouse into simplify-rpc-codec

* Merge branch 'unstable' of github.com:sigp/lighthouse into simplify-rpc-codec

* Merge branch 'unstable' of github.com:sigp/lighthouse into simply-rpc-codec

* Merge branch 'unstable' into simplify-rpc-codec

* Merge branch 'unstable' into simplify-rpc-codec
Revert "Merge branch 'sigp:stable' into tracing-integration"

This reverts commit 668157c, reversing
changes made to ee25fbe.
@eserilev
Copy link
Collaborator

eserilev commented Oct 1, 2024

Tracing
image

Slog
image

@eserilev
Copy link
Collaborator

eserilev commented Oct 1, 2024

at the moment this is what a crit log looks like

its labeled as ERROR but with an error_type = crit. Im wondering if we can tweak the formatting here to display CRIT in purple instead of ERROR (as we have currently for our slog impl)

image

Copy link
Collaborator

@eserilev eserilev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work so far!

Most of my suggested changes revolve around making sure we stick to display formatting vs debug formatting.

I think we have some options to simplify the crit! macro a bit

For crit level errors we'll want to display the Crit log similarly to how they are currently displayed with slog. This will involve implementing a custom tracing layer that takes in those crit logs and formats them appropriately.

In situations where we use slog:o! I think we can replace those with a tracing span

Theres some instances where we are still importing slog, we should try removing those if possible

beacon_node/beacon_chain/src/builder.rs Outdated Show resolved Hide resolved
beacon_node/beacon_chain/src/light_client_server_cache.rs Outdated Show resolved Hide resolved
beacon_node/beacon_chain/src/migrate.rs Outdated Show resolved Hide resolved
beacon_node/beacon_chain/src/otb_verification_service.rs Outdated Show resolved Hide resolved
beacon_node/beacon_chain/src/state_advance_timer.rs Outdated Show resolved Hide resolved
validator_client/src/sync_committee_service.rs Outdated Show resolved Hide resolved
validator_client/src/sync_committee_service.rs Outdated Show resolved Hide resolved
validator_client/src/validator_store.rs Outdated Show resolved Hide resolved
validator_client/src/validator_store.rs Outdated Show resolved Hide resolved
validator_client/src/validator_store.rs Outdated Show resolved Hide resolved
@michaelsproul michaelsproul added v6.1.0 New release c. Q1 2025 backwards-incompat Backwards-incompatible API change labels Dec 2, 2024
@jimmygchen jimmygchen requested a review from jxs as a code owner January 8, 2025 23:52
Copy link

mergify bot commented Jan 8, 2025

⚠️ The sha of the head commit of this PR conflicts with #6772. Mergify cannot evaluate rules on this PR. ⚠️

Copy link
Member

@macladson macladson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About halfway through now. I've left a question but mostly has been minor formatting issues where cargo fmt isn't doing its job.

I went ahead and pushed a commit to fix what I found, I figured it was faster that way

beacon_node/beacon_chain/src/beacon_block_streamer.rs Outdated Show resolved Hide resolved
let log = self
.log
.new(slog::o!("peer_id" => peer_id.to_string(), "connection_id" => connection_id.to_string()));

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we losing logging verbosity here if we aren't using connection_id or peer_id anymore?

@@ -306,8 +334,8 @@ where

fn handle_established_outbound_connection(
&mut self,
connection_id: ConnectionId,
peer_id: PeerId,
_connection_id: ConnectionId,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing here

Copy link
Member

@macladson macladson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noticed a few unused bits of code lying around. I think in some cases we probably need to restore some of them as I suspect we are losing visibility without them, but others we can just remove.

)
},
"gradual_data_column_publication",
);
}
}

#[allow(dead_code)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this code now unused? Can we remove it?

@@ -192,7 +206,7 @@ impl<E: EthSpec, B: BatchConfig> BatchInfo<E, B> {

/// Return the number of times this batch has failed downloading and failed processing, in this
/// order.
pub fn failed_attempts(&self) -> (usize, usize) {
pub fn _failed_attempts(&self) -> (usize, usize) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we losing anything but this being unused? Can we remove it?

@@ -40,6 +42,7 @@ pub type ProcessingResult = Result<KeepChain, RemoveChain>;

/// Reasons for removing a chain
#[derive(Debug)]
#[allow(dead_code)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More dead code

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like logs have been completely removed from this file

/// Run the bootnode given the CLI configuration.
#[allow(unused_variables)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More unused code?

use std::process::Command;
use std::process::Output;

fn run_cmd(cmd_line: &str) -> Result<Output, std::io::Error> {
// TODO(tracing) do we need this?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plenty of dead code here too

lcli/src/main.rs Outdated
is_restricted: true,
sse_logging: false, // No SSE Logging in LCLI
},
"", // TODO(tracing)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Todo here

@@ -34,8 +32,11 @@ use {
#[cfg(not(target_family = "unix"))]
use {futures::channel::oneshot, std::cell::RefCell};

const LOG_CHANNEL_SIZE: usize = 16384;
const SSE_LOG_CHANNEL_SIZE: usize = 2048;
// TODO(tracing) do we need this
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Todo here as well

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backwards-incompat Backwards-incompatible API change ready-for-review The code is ready for review UX-and-logs v6.1.0 New release c. Q1 2025
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants