Adding local implementation for queue based measuring #1998

gustavogaldinoo · 2024-06-19T20:48:17Z

Adding local implementation for queue based measuring and tests

experiment/measurer/measure_worker.py

DonggeLiu

Thanks, @gustavogaldinoo!
Could you please have a look at the presubmit failure or run make format?

DonggeLiu · 2024-06-20T00:41:38Z

experiment/measurer/measure_manager.py

+    return measured_snapshots
+
+def measure_manager_inner_loop(experiment: str, max_cycle: int, request_queue,
+                               response_queue, queued_snapshots):


nit: This is not your mistake but rather a common confusion caused by legacy FuzzBench code.
We have type hints for some functions but not for others. Sometimes this also happens at the parameter level.
It would be great to type-hint the new code, if it is not too much trouble.
It's also low priority so feel free to leave it for later.

For the queues specifically, I will probably try to come up with a more generic type in the future, as we'll use it for local experiments and also for cloud experiments, with a cloud type of queue.
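For reference, a type-hinted version of the signature might look like the sketch below. The `queue.Queue` annotations, the `SnapshotKey` alias, and the return type are assumptions: the thread does not say what the queues hold or how queued snapshots are keyed, and a `manager.Queue()` proxy is not literally a `queue.Queue`. The body is a placeholder so the example runs.

```python
import queue
from typing import List, Set, Tuple

# Hypothetical key type for a queued snapshot: (trial_id, cycle).
SnapshotKey = Tuple[int, int]


def measure_manager_inner_loop(experiment: str, max_cycle: int,
                               request_queue: queue.Queue,
                               response_queue: queue.Queue,
                               queued_snapshots: Set[SnapshotKey]) -> List:
    """Placeholder body: drains the response queue and returns its items."""
    measured_snapshots = []
    while True:
        try:
            measured_snapshots.append(response_queue.get_nowait())
        except queue.Empty:
            break
    return measured_snapshots
```

A more generic queue type, as mentioned above, could later replace the `queue.Queue` annotations without touching callers.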

jonathanmetzman · 2024-06-19T20:49:06Z

experiment/measurer/measure_manager.py

@@ -44,20 +44,21 @@
 from database import models
 from experiment.build import build_utils
 from experiment.measurer import coverage_utils
+from experiment.measurer import datatypes


Let's import this as experiment.measurer.datatypes.

What's the thought process behind this?

datatypes can mean anything; by importing it as experiment.measurer.datatypes, it's much clearer what it contains.

jonathanmetzman · 2024-06-20T14:40:05Z

experiment/measurer/measure_manager.py

 NUM_RETRIES = 3
 RETRY_DELAY = 3
 FAIL_WAIT_SECONDS = 30
 SNAPSHOT_QUEUE_GET_TIMEOUT = 1
 SNAPSHOTS_BATCH_SAVE_SIZE = 100
+NUM_WORKERS = 4


Why hardcode this?

Hmmm, I started like this because it was easier, and forgot to go back and change it.

Should it be an argument passed when starting the experiment? Any tips on possible default values?

jonathanmetzman · 2024-06-20T14:43:48Z

experiment/measurer/measure_manager.py

-                 region_coverage)
+    local_experiment = experiment_utils.is_local_experiment()
+    if local_experiment:
+        measure_manager_loop(experiment, max_total_time, measurers_cpus,


Is the goal to start with this only in local experiments? Why?

I thought we would have to wait until we had implemented pub/sub queues before using this for non-local experiments, but I guess we can already allow it for both local and non-local, right?

Yeah, it should be able to run in prod. But I guess the disadvantage of doing so is that once we have pub/sub queues we will never need to run this in prod again.
Your choice: we will get better testing if we run this in prod, but it may reveal more problems than we care to fix.

jonathanmetzman · 2024-06-20T14:45:19Z

experiment/measurer/measure_manager.py

+        else:
+            pool_args = (measurers_cpus,)


Let's do this condition first, and then return to avoid so much nesting.

jonathanmetzman · 2024-06-20T14:45:53Z

experiment/measurer/measure_manager.py

+    return pool_args
+
+
+# pylint: disable=too-many-locals


This should be on line 771; otherwise it is applied to the rest of the file.
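A small sketch of the scoping point, using a made-up function: a pylint disable comment on its own line at module level stays in effect for the rest of the file, while a trailing comment on the `def` line keeps the suppression local to that function. The function below is purely illustrative and is not from the PR.

```python
def summarize(values):  # pylint: disable=too-many-locals
    """Returns (total, count, mean); the suppression covers only this def."""
    total = sum(values)
    count = len(values)
    mean = total / count
    return total, count, mean
```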

jonathanmetzman · 2024-06-20T16:08:08Z

experiment/measurer/measure_manager.py

-                case _:
-                    logger.error('Type of response object not mapped! %s',
-                                 type(response_object))
+            if isinstance(response_object, datatypes.ReescheduleRequest):


I think it makes more sense to have one data type, with a field to request reschedule.

This might be a little tricky to do, because before having reschedules, we used to return a models.Snapshot type in some of our functions. We might not want to add a reschedule field to it, right?

No, but we can return the response object instead, can't we?

jonathanmetzman · 2024-06-20T16:09:09Z

experiment/measurer/measure_manager.py

-                    logger.error('Type of response object not mapped! %s',
-                                 type(response_object))
+            if isinstance(response_object, datatypes.ReescheduleRequest):
+                # Need to reeschedule measurement task, remove from the set


Let's avoid using the word "schedule" since it means something else in the scheduler. Maybe "retry".
Also, please end comments with a period.
And why are we removing it from the set? That's more important for a comment to explain. I can see the line removing it; the "why" is the important part.
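One way the comment could explain the "why", as a hedged sketch. The `Response` type, the `handle_response` helper, and the `(trial_id, cycle)` key are hypothetical names for illustration, and the sketch assumes the manager skips any snapshot whose key is already in `queued_snapshots`. The stated reason follows the test docstring in this PR: removing the identifier allows the measurement task to be retried later.

```python
import collections

# Hypothetical response type, for illustration only.
Response = collections.namedtuple('Response', ['trial_id', 'cycle', 'retry'])


def handle_response(response, queued_snapshots, measured_snapshots):
    """Records a measured snapshot or frees its key so it can be retried."""
    key = (response.trial_id, response.cycle)
    if response.retry:
        # Assumption: snapshots whose key is in queued_snapshots are never
        # enqueued again. Removing the key lets a later loop iteration
        # enqueue this snapshot once more, which is how the retry happens.
        queued_snapshots.discard(key)
        return
    measured_snapshots.append(key)
```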

jonathanmetzman · 2024-06-26T14:38:50Z

experiment/measurer/datatypes.py

+SnapshotMeasureRequest = collections.namedtuple(
+    'SnapshotMeasureRequest', ['fuzzer', 'benchmark', 'trial_id', 'cycle'])
+
+RetryRequest = collections.namedtuple(


Why didn't we make a single datatype?

That's a good idea. I thought it would be more explicit to create a datatype specifically for the retry, but since both of them are measurement requests and hold the same fields, I guess it's not necessary. Just removed the RetryRequest datatype.

Oh shoot. I messed up here.
I think there should be a response object that includes a snapshot or retry bool.
But response and request should be different. Sorry.

After you undo this, you can land.
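A minimal sketch of the split described here, with separate request and response types. `SnapshotMeasureRequest` is copied from the diff above; `SnapshotMeasureResponse` and its `snapshot` and `retry` fields are assumptions made for illustration and are not necessarily what landed.

```python
import collections

# Request type as shown in the diff above.
SnapshotMeasureRequest = collections.namedtuple(
    'SnapshotMeasureRequest', ['fuzzer', 'benchmark', 'trial_id', 'cycle'])

# Hypothetical response type: carries a snapshot, or asks for a retry.
SnapshotMeasureResponse = collections.namedtuple(
    'SnapshotMeasureResponse', ['snapshot', 'retry'])

request = SnapshotMeasureRequest('afl', 'libpng', trial_id=1, cycle=2)
# A worker that could not measure the snapshot yet asks for a retry.
retry_response = SnapshotMeasureResponse(snapshot=None, retry=True)
# A worker that succeeded returns the snapshot (a string stands in here).
done_response = SnapshotMeasureResponse(snapshot='snapshot-1-2', retry=False)
```

With this shape the manager can check `response.retry` instead of dispatching on the response's type.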

jonathanmetzman · 2024-06-26T14:42:19Z

experiment/measurer/measure_manager.py

@@ -44,20 +44,20 @@
 from database import models
 from experiment.build import build_utils
 from experiment.measurer import coverage_utils
+from experiment.measurer.datatypes import (RetryRequest, SnapshotMeasureRequest)


No, please don't import classes or functions. We import modules. See the style guide.
Also, can you import like this:
import experiment.measurer.datatypes as measurer_datatypes

Oops, sorry. I thought this is what you meant before in another comment. Just changed it to the import you suggested.

jonathanmetzman · 2024-06-26T14:43:01Z

experiment/measurer/measure_manager.py

 NUM_RETRIES = 3
 RETRY_DELAY = 3
 FAIL_WAIT_SECONDS = 30
 SNAPSHOT_QUEUE_GET_TIMEOUT = 1
 SNAPSHOTS_BATCH_SAVE_SIZE = 100
+MEASURE_MANAGER_LOOP_TIMEOUT = 10


This is not well named.

Do you have any suggestions? I changed it to "MEASUREMENT_LOOP_WAIT" now, but I'm not sure if that's any better.

Yes, WAIT is better.

jonathanmetzman · 2024-06-26T14:43:27Z

experiment/measurer/measure_manager.py

+        request_queue = manager.Queue()
+        response_queue = manager.Queue()
+
+        # Since each worker is gonna be in forever loop, we dont need result


Can you write "going to be in an infinite loop"?

jonathanmetzman · 2024-06-26T14:43:38Z

experiment/measurer/measure_manager.py

+        response_queue = manager.Queue()
+
+        # Since each worker is gonna be in forever loop, we dont need result
+        # return. Workers life scope will end automatically when there are no


jonathanmetzman · 2024-06-26T14:45:32Z

experiment/measurer/measure_manager.py

+        }
+        local_measure_worker = measure_worker.LocalMeasureWorker(config)
+        measure_trial_coverage_args = [()] * measurers_cpus
+        _result = pool.starmap_async(local_measure_worker.measure_worker_loop,


You don't need map if you are not passing any arguments.

Changed to pool.apply_async calls!
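A rough sketch of the `apply_async` pattern for workers that take no per-call arguments. It uses `ThreadPool` and plain `queue.Queue` objects only to keep the example self-contained; the PR uses a process pool with `manager.Queue()`. The `Worker` class, the doubling "work", and the `None` sentinel that ends each loop are all assumptions for illustration (the PR's workers loop indefinitely).

```python
import queue
from multiprocessing.pool import ThreadPool


class Worker:
    """Toy stand-in for a measure worker; queues are held on the instance."""

    def __init__(self, request_queue, response_queue):
        self.request_queue = request_queue
        self.response_queue = response_queue

    def loop(self):
        """Handles requests until a None sentinel arrives."""
        while True:
            request = self.request_queue.get()
            if request is None:
                return
            self.response_queue.put(request * 2)


def run_workers(num_workers, requests):
    """Starts num_workers loops with apply_async and collects responses."""
    request_queue = queue.Queue()
    response_queue = queue.Queue()
    worker = Worker(request_queue, response_queue)
    with ThreadPool(num_workers) as pool:
        # No per-call arguments, so one apply_async per worker is enough.
        results = [pool.apply_async(worker.loop) for _ in range(num_workers)]
        for request in requests:
            request_queue.put(request)
        for _ in range(num_workers):
            request_queue.put(None)  # One sentinel per worker.
        for result in results:
            result.get()  # Wait for each loop to exit.
    return sorted(response_queue.get() for _ in requests)
```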

jonathanmetzman · 2024-06-26T14:47:36Z

experiment/measurer/measure_manager.py

+    if not measurers_cpus:
+        logger.info('Number of measurer CPUs not passed as argument. using %d',
+                    multiprocessing.cpu_count())
+        measurers_cpus = multiprocessing.cpu_count()


Take this variable and use it in the log instead of calling the function again.
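Something like the sketch below, where the count is computed once and the same variable is logged. The helper name `resolve_measurers_cpus` is hypothetical; in the PR this logic sits inline.

```python
import logging
import multiprocessing

logger = logging.getLogger(__name__)


def resolve_measurers_cpus(measurers_cpus=None):
    """Returns the measurer CPU count, defaulting to all available CPUs."""
    if not measurers_cpus:
        measurers_cpus = multiprocessing.cpu_count()
        logger.info('Number of measurer CPUs not passed as argument. Using %d.',
                    measurers_cpus)
    return measurers_cpus
```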

jonathanmetzman · 2024-06-26T14:48:22Z

experiment/measurer/measure_manager.py

+def get_pool_args(measurers_cpus, runners_cpus):
+    """Return pool args based on measurer cpus and runner cpus arguments."""
+    pool_args = ()
+    if measurers_cpus is not None and runners_cpus is not None:


Change this so we return early if they are None, and then we can have less nesting.

Actually, I think we no longer need that function; it's currently only being used in the old measure_loop method. Should I remove it?

I can fix the nesting nonetheless.
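For the nesting fix, an early-return version could look roughly like this. It is a sketch under assumptions: `local_experiment` is passed in as a parameter to keep the example self-contained, the `else` branch from the diff is taken to be the non-local case, and the local branch is a placeholder because its body is not shown in this thread.

```python
def get_pool_args(measurers_cpus, runners_cpus, local_experiment):
    """Return pool args based on measurer cpus and runner cpus arguments."""
    if measurers_cpus is None or runners_cpus is None:
        return ()
    if not local_experiment:
        return (measurers_cpus,)
    # Placeholder for the local branch, which is not shown in the diff:
    # here we only list the cores that would be left for measurers.
    measurer_cores = tuple(range(runners_cpus, runners_cpus + measurers_cpus))
    return (measurers_cpus, measurer_cores)
```

Each guard handles one case and returns, so the remaining logic stays at a single indentation level.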

jonathanmetzman · 2024-06-26T14:50:47Z

experiment/measurer/test_measure_manager.py

+    response queue. In this scenario, we want to remove the snapshot identifier
+    from the queued_snapshots set, as this allows the measurement task to be
+    retried in the future"""
+    # Use normal queue here as multiprocessing queue gives flaky tests


End sentences with a period.

jonathanmetzman · 2024-06-26T14:54:47Z

experiment/measurer/test_measure_manager.py

@@ -12,7 +12,6 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 """Tests for measure_manager.py."""
-
 import os


These look like good tests!

gustavogaldinoo requested review from DonggeLiu and jonathanmetzman June 19, 2024 20:48

gustavogaldinoo commented Jun 19, 2024

DonggeLiu approved these changes Jun 20, 2024

gustavogaldinoo commented Jun 20, 2024

jonathanmetzman reviewed Jun 20, 2024

gustavogaldinoo added 5 commits June 26, 2024 14:32

Adding local implementation for queue based measuring

0afcea2

Fixing tests and format

748b47d

Initializing logs for every local worker

4570abb

Fixes based on PR feedback

77f1aa6

Removing hardcoded value for num workers

bf47d79

gustavogaldinoo force-pushed the local-queue-based-measuring branch from a3c43d2 to bf47d79 Compare June 26, 2024 14:35

Remove outdated test file to match master branch

4d7d0ac

jonathanmetzman reviewed Jun 26, 2024

Addressing some PR feedback

db0e45c

jonathanmetzman approved these changes Jun 27, 2024

Recreating retry request datatype

503a1b1

gustavogaldinoo merged commit 048a72d into master Jun 27, 2024
5 checks passed

gustavogaldinoo deleted the local-queue-based-measuring branch June 27, 2024 15:28

DonggeLiu added a commit that referenced this pull request Aug 3, 2024

Revert "Adding local implementation for queue based measuring (#1998)"

9fb13da

This reverts commit 048a72d.

DanBlackwell added a commit to DanBlackwell/fuzzbench that referenced this pull request Aug 6, 2024

Revert "Adding local implementation for queue based measuring (google…

15beef1

…#1998)" This reverts commit 048a72d.
