Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding local implementation for queue based measuring #1998

Merged
merged 8 commits into from
Jun 27, 2024

Conversation

gustavogaldinoo
Copy link
Contributor

Adding local implementation for queue based measuring and tests

Copy link
Contributor

@DonggeLiu DonggeLiu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @gustavogaldinoo!
Could you please have look at the presubmit failure or run make format?

return measured_snapshots

def measure_manager_inner_loop(experiment: str, max_cycle: int, request_queue,
response_queue, queued_snapshots):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This is not your mistake but rather a common confusion caused by legacy FuzzBench code.
We have typing hinting for some functions but not for others. Sometimes this also happens on the parameter level.
It would be great to type-hint the new code, if it is not too much trouble.
It's also low priority so feel free to leave it for later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the queues especifically, I will probably try to come up with a more generic type in the future, as we'll use it for local experiments and also for cloud experiments, with a cloud type of queue

experiment/measurer/measure_manager.py Outdated Show resolved Hide resolved
experiment/measurer/measure_manager.py Outdated Show resolved Hide resolved
experiment/measurer/test_measure_manager.py Outdated Show resolved Hide resolved
experiment/measurer/measure_manager.py Show resolved Hide resolved
experiment/measurer/measure_worker.py Show resolved Hide resolved
@@ -44,20 +44,21 @@
from database import models
from experiment.build import build_utils
from experiment.measurer import coverage_utils
from experiment.measurer import datatypes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's import this as experiment.measurer.datatypes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the thought process behind this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

datatypes can mean anything, by importing it as expeirment.measurer.datatypes, it's much clearer what it contains.

NUM_RETRIES = 3
RETRY_DELAY = 3
FAIL_WAIT_SECONDS = 30
SNAPSHOT_QUEUE_GET_TIMEOUT = 1
SNAPSHOTS_BATCH_SAVE_SIZE = 100
NUM_WORKERS = 4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WHy hardcode this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, I started like this because it was easier, and forgot to go back and change it.

Should it be an argument passed when starting the experiment? Any tips on possible default values?

region_coverage)
local_experiment = experiment_utils.is_local_experiment()
if local_experiment:
measure_manager_loop(experiment, max_total_time, measurers_cpus,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the goal to start with this only in local experiments? Why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we would have to wait until we had implemented pub sub queues before using this to non local experiments, but I guess we can already allow it to both local and non local, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it should be able to run in prod. But I guess the disadvantage of doing so, is that once we had pub/sub queues we would never need to run this in prod again.
Your choice, we will get better testing if we run this in prod, but it may reveal more problems than we care to fix.

Comment on lines 761 to 762
else:
pool_args = (measurers_cpus,)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do this condition first, and then return to avoid so much nesting.

return pool_args


# pylint: disable=too-many-locals
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be on line 771 otherwise it is applied to the rest of the file.

experiment/measurer/measure_manager.py Show resolved Hide resolved
service/test_automatic_run_experiment.py Outdated Show resolved Hide resolved
case _:
logger.error('Type of response object not mapped! %s',
type(response_object))
if isinstance(response_object, datatypes.ReescheduleRequest):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes more sense to have one data type, with a field to request reschedule.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be a little tricky to do, because before having reeschedules, we used to return a models.Snapshot type in some of our functions. We might not want to add a reeschedule field to it, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, but we can return the response object instead can't we?

logger.error('Type of response object not mapped! %s',
type(response_object))
if isinstance(response_object, datatypes.ReescheduleRequest):
# Need to reeschedule measurement task, remove from the set
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's avoid using the word "schedule" since it means something else in the scheduler. Maybe "retry".
Also please end comments with period.
And why are we removing it from the set? That's more important for a comment to explain, I can see the line removing it, the "why" is the important part.

@gustavogaldinoo gustavogaldinoo force-pushed the local-queue-based-measuring branch from a3c43d2 to bf47d79 Compare June 26, 2024 14:35
Comment on lines +17 to +20
SnapshotMeasureRequest = collections.namedtuple(
'SnapshotMeasureRequest', ['fuzzer', 'benchmark', 'trial_id', 'cycle'])

RetryRequest = collections.namedtuple(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why didn't we make a single datatype?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea. I thought it would be more explicit to create a datatype specifically for the retry, but since both of them are measurement requests, and hold the same fields, I guess its not necessary. Just removed the RetryRequest datatype

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh shoot. I messed up here.
I think there should be a response object that includes a snapshot or retry bool.
BUt response and request should be different. Sorry.

After you undo this, you can land.

@@ -44,20 +44,20 @@
from database import models
from experiment.build import build_utils
from experiment.measurer import coverage_utils
from experiment.measurer.datatypes import (RetryRequest, SnapshotMeasureRequest)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No please don't import classes or functions. We import modules. See the styleguide.
Also can you import like this:
import experiment.measurer.datatypes as measurer_datatypes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, sorry. I thought this is what you meant before in another comment. Just changed it to the import you suggested now

NUM_RETRIES = 3
RETRY_DELAY = 3
FAIL_WAIT_SECONDS = 30
SNAPSHOT_QUEUE_GET_TIMEOUT = 1
SNAPSHOTS_BATCH_SAVE_SIZE = 100
MEASURE_MANAGER_LOOP_TIMEOUT = 10
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not well named.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have any suggestions? I changed it to "MEASUREMENT_LOOP_WAIT" now, but not sure if thats any better

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes WAIT is better.

request_queue = manager.Queue()
response_queue = manager.Queue()

# Since each worker is gonna be in forever loop, we dont need result
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you write "going to be in an infinite loop"

response_queue = manager.Queue()

# Since each worker is gonna be in forever loop, we dont need result
# return. Workers life scope will end automatically when there are no
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Workers'

}
local_measure_worker = measure_worker.LocalMeasureWorker(config)
measure_trial_coverage_args = [()] * measurers_cpus
_result = pool.starmap_async(local_measure_worker.measure_worker_loop,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need map if you are not passing any arguments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to pool.apply_async calls!

if not measurers_cpus:
logger.info('Number of measurer CPUs not passed as argument. using %d',
multiprocessing.cpu_count())
measurers_cpus = multiprocessing.cpu_count()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take this variable and use it in the log instead of calling the function again.

def get_pool_args(measurers_cpus, runners_cpus):
"""Return pool args based on measurer cpus and runner cpus arguments."""
pool_args = ()
if measurers_cpus is not None and runners_cpus is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change this so we return early if they are None, and then we can have less nesting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think we no longer need that function, its currently only being used in the old measure_loop method. Should I remove them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can fix the nesting nonetheless

response queue. In this scenario, we want to remove the snapshot identifier
from the queued_snapshots set, as this allows the measurement task to be
retried in the future"""
# Use normal queue here as multiprocessing queue gives flaky tests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

end sentences with a period.

@@ -12,7 +12,6 @@
# See the License for the specific language governing permissions and
# limitations under the License.
"""Tests for measure_manager.py."""

import os
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These look like good tests!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx!

@gustavogaldinoo gustavogaldinoo merged commit 048a72d into master Jun 27, 2024
5 checks passed
@gustavogaldinoo gustavogaldinoo deleted the local-queue-based-measuring branch June 27, 2024 15:28
DonggeLiu added a commit that referenced this pull request Aug 3, 2024
DanBlackwell added a commit to DanBlackwell/fuzzbench that referenced this pull request Aug 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants