Performance of WebDAV search requests with many shared files #37061

starypatyk · 2023-03-06T22:24:12Z

I have noticed that when many files are shared between users (e.g. via Talk) the WebDAV search requests are quite slow. This is visible in the Photos app when entering the initial page or in the Android client when entering the Media tab. I suspect that this issue #35776 might have the same cause.

After some analysis I have found that in such situation the database query that retrieves the list of files is very complex and takes significant time to execute. For each shared file the following section is added to the SQL WHERE clause:

((`storage` = :dcValueNN) AND (`path_hash` = :dcValueMM)) OR

In my environment I see queries with over 250 sections like this. That makes the query use over 500 separate parameters.

Since usually many files are shared between two users, the storage part of the filter is repeated many times. This provides an opportunity to refactor the query to use the following form (and provide all path_hash values for the same storage in an array):

((`storage` = :dcValueNN) AND (`path_hash` IN (:dcValueMM))) OR

This reduces the number of parameters significantly (in my case to about 25) and seems easier to optimize by the database engine.

In this PR I propose a change to the QuerySearchHelper::searchInCaches method that generates the query refactored as described above.

In my environment (MySQL database, ~250k rows in the oc_filecache table) I measured a very nice improvement in the query execution time (retrieval of the first 200 images in the Photos app):

before the change: ~3700 ms
after the change: ~750 ms

Comments? Thoughts?

apps/files_sharing/lib/Cache.php

come-nc · 2023-03-09T15:56:26Z

lib/private/Files/Cache/QuerySearchHelper.php

@@ -127,9 +161,7 @@ public function searchInCaches(ISearchQuery $searchQuery, array $caches): array
 				));
 		}

-		$storageFilters = array_values(array_map(function (ICache $cache) {
-			return $cache->getQueryFilterForStorage();


From what I can see this was using path and not path hash, did I miss something? Where was the md5 calculation before?

This is indeed tricky to follow.

SearchComparison objects are created with the path comparison here:

server/apps/files_sharing/lib/Cache.php

Line 202 in 1cadbaf

new SearchComparison(ISearchComparison::COMPARE_EQUAL, 'path', $this->getGetUnjailedRoot()),

Once the whole filter part of the query is constructed, it is passed to the searchOperatorToDBExpr method of the SearchBuilder here:

server/lib/private/Files/Cache/QuerySearchHelper.php

Line 169 in 1cadbaf

$searchExpr = $this->searchBuilder->searchOperatorToDBExpr($builder, $filter);

Inside the SearchBuilder the query tree is analysed and the SearchComparison object is converted to a set of values to create an SQL operator here:

server/lib/private/Files/Cache/SearchBuilder.php

Line 142 in 1cadbaf

private function getOperatorFieldAndValue(ISearchComparison $operator) {

The getOperatorFieldAndValue method has a special case for the path field that in some cases converts it to path_hash = md5(path):

server/lib/private/Files/Cache/SearchBuilder.php

Line 176 in 1cadbaf

$field = 'path_hash';

Sometimes the path comparison is not converted into path_hash. If I understand correctly the hint used above can be set by the query optimizer invoked from:

server/lib/private/Files/Cache/QuerySearchHelper.php

Line 167 in 1cadbaf

$this->queryOptimizer->processOperator($filter);

The actual setting of the hint is here:

server/lib/private/Files/Search/QueryOptimizer/PathPrefixOptimizer.php

Line 41 in 1cadbaf

if ($a instanceof ISearchComparison && $b instanceof ISearchComparison && $a->getField() === 'path' && $b->getField() === 'path') {

If I understand all of this correctly, for single file shares the path_hash comparison was always used before my changes.

Then I would feel a lot better if someone who actually understand DB queries perf can review this. I do not have enough knowledge.

We have to be sure your changes does not slow down some other usecases. And it’s complicated to test because some of the optimisations we have are database-specific I think.

come-nc · 2023-03-13T10:49:38Z

Thoughts:

Maybe the support for IN in the SearchBuilder should work on path and automagically convert to path_hash like it’s done for =.
If possible this should be split into one PR that adds the support for IN and one that uses it to ease revert in case we get into trouble.

starypatyk · 2023-03-14T21:41:05Z

@come-nc

Maybe the support for IN in the SearchBuilder should work on path and automagically convert to path_hash like it’s done for =.

Seems doable - but feels more complex and likely more difficult to analyse. On the other hand, I am now wondering if the PathPrefixOptimizer is used at all after my changes. I am not sure how to verify this. I see that the QueryOptimizer is a relatively new piece of code (Aug 2021) created by @icewind1991.

If possible this should be split into one PR that adds the support for IN and one that uses it to ease revert in case we get into trouble.

Certainly possible. Should I go ahead and split this into a separate PR?

XueSheng-GIT · 2023-06-07T15:38:02Z

@starypatyk if you need someone to do some testing, just let me know. I've a lot of shares and search is really slow (takes longer than a minute).

AndyScherzinger · 2023-06-08T20:01:45Z

lib/private/Files/Cache/QuerySearchHelper.php

+				ISearchBinaryOperator::OPERATOR_AND,
+				[
+					new SearchComparison(ISearchComparison::COMPARE_EQUAL, 'storage', $storage),
+					new SearchComparison(ISearchComparison::COMPARE_IN, 'path_hash', array_map(fn ($path) => md5($path), $paths))


Be careful with "IN" it'll break i.e. on Oracle for 1000+ items.

Indeed. Would it make sense to provide a workaround at a lower level?

Rewrite the query with a sequence of ((x IN (list1-1000)) or (x IN (list1001-2000)) ...) e.g. here:

server/lib/private/DB/QueryBuilder/ExpressionBuilder/OCIExpressionBuilder.php

Line 121 in ca9d6db

public function in($x, $y, $type = null): string {

I thought this might be resolved already by Doctrine DBAL, but apparently a similar workaround has been rejected. 😞

I can't tell since I am not an expert on php/doctrine, but @nickvergessen @juliushaertl or @icewind1991 might know.

We are using (IN(1-1000) OR IN(1001-2000)) in mail: https://github.com/nextcloud/mail/blob/0da32c97f64e0cdf950368257447ec8fa6fe7fea/lib/Db/MessageMapper.php#L986-L992.

Not a big fan, but there is no other way if you want to work with all IDs in one result.

starypatyk · 2023-06-08T21:30:51Z

@XueSheng-GIT

@starypatyk if you need someone to do some testing, just let me know. I've a lot of shares and search is really slow (takes longer than a minute).

Would be great to know if these changes improve performance in other cases. 👍

Would you be able to pick up my changes directly from GitHub? I have rebased this PR today - it should be easier to apply it on a recent Nextcloud version.

XueSheng-GIT · 2023-06-20T18:43:35Z

I've applied this pull to my testing instance (NC 27.0.0). I've added 1,000 shares via talk to a test account (share_NNNN.txt).
Because this is especially about webdav search, i took the official reference to perform a search: https://docs.nextcloud.com/server/27/developer_manual/client_apis/WebDAV/search.html

Search for filename 'test' (I hope that's the right way to do this, I never used webdav search so far ;) ):

$ time curl -u username:pass 'https://cloud.domain.invalid/remote.php/dav/' -X SEARCH -H "content-Type: text/xml" --data '<?xml version="1.0" encoding="UTF-8"?>
 <d:searchrequest xmlns:d="DAV:" xmlns:oc="http://owncloud.org/ns">
     <d:basicsearch>
         <d:select>
             <d:prop>
                 <oc:fileid/>
                 <d:displayname/>
                 <d:getcontenttype/>
                 <d:getetag/>
                 <oc:size/>
             </d:prop>
         </d:select>
         <d:from>
             <d:scope>
                 <d:href>/files/username</d:href>
                 <d:depth>infinity</d:depth>
             </d:scope>
         </d:from>
         <d:where>
             <d:like>
                 <d:prop>
                     <d:displayname/>
                 </d:prop>
                 <d:literal>test</d:literal>
             </d:like>
         </d:where>
         <d:orderby/>
    </d:basicsearch>
</d:searchrequest>'
<?xml version="1.0"?>
<d:multistatus xmlns:d="DAV:" xmlns:s="http://sabredav.org/ns" xmlns:oc="http://owncloud.org/ns" xmlns:nc="http://nextcloud.org/ns"/>

real	0m31,679s
user	0m0,043s
sys	0m0,008s

Although, I've applied the patch, the query log still includes all those blocks:
(("storage" = $39) AND ("path" = $40)) OR

Full log:
webdav-search_patched.log

@starypatyk, anything special which must be done after applying this pull?

starypatyk · 2023-06-20T20:52:21Z

Hi @XueSheng-GIT,

The patch should "just work" - no special configuration required.

For the next two weeks I will not be able to diagnose this. I should have time in early July.

starypatyk · 2023-07-06T22:10:56Z

Hi @XueSheng-GIT,

Your WebDAV search query looks correct. I have tried it on my dev instance and it behaves as expected - i.e. the sequence of individual conditions like:

((`storage` = :dcValue5) AND (`path` = :dcValue6)) OR

gets replaced by a much shorter list of:

((`storage` = :dcValue5) AND (`path_hash` IN (:dcValue6))) OR

I do not understand yet what may be the reason this does not work for you. I have committed additional diagnostic code that might help me diagnose the issue.

Would you be able to pick the newest version of code from this PR and re-run your tests?

Please make sure to set logging level to DEBUG as described in the documentation and provide relevant fragment of the nextcloud.log file. There should be many lines similar to this:

{"reqId":"vJekSFwbXgIqyHuokfVX","level":0,"time":"2023-07-06T22:07:54+00:00","remoteAddr":"192.168.21.4","user":"admin","app":"no app in context","method":"SEARCH","url":"/remote.php/dav/?log_secret=f9e9137b0bfde6498e70b75c91f6a5c8","message":"generateStorageFilters: cache: OCA\\Files_Sharing\\Cache","userAgent":"curl/7.81.0","version":"28.0.0.0","data":[]}

XueSheng-GIT · 2023-07-07T12:45:47Z

@starypatyk thanks for following up.

I've picked up the newest version. Loglevel was already set to DEBUG on my test instance.
nextcloud.log contains many of those lines. But Search query is still unchanged and slow.

{"reqId":"fbrG5wnYyA9kLkOlSfVt","level":0,"time":"2023-07-07T14:29:16+02:00","remoteAddr":"fd64:c2a2:5c5e:e201:703c:a494:309:f1e7","user":"usename","app":"no app in context","method":"SEARCH","url":"/remote.php/dav/","message":"generateStorageFilters: cache: OCA\\FilesAccessControl\\CacheWrapper","userAgent":"curl/7.81.0","version":"27.0.0.8","data":[]}

Do you need a more complete log? I don't want to paste it here for privacy reasons.

starypatyk · 2023-07-07T21:56:06Z

Hi @XueSheng-GIT,

Thank you for the log line. It does help much 👍 A key difference is here:

My environment:

"generateStorageFilters: cache: OCA\\Files_Sharing\\Cache"

Your environment:

"generateStorageFilters: cache: OCA\\FilesAccessControl\\CacheWrapper"

I suspect that you use the File access control app and my code is not prepared to handle this case.

I need to rethink the approach.

XueSheng-GIT · 2023-07-09T05:50:08Z

I've disabled File access control to double check the results.

nextcloud log now contains:

{"reqId":"KeYhbHbr8HH0mwj2J6kC","level":0,"time":"2023-07-09T07:28:42+02:00","remoteAddr":"fd64:c2a2:5c5e:e201:b442:4193:b930:e5c8","user":"username","app":"no app in context","method":"SEARCH","url":"/remote.php/dav/","message":"generateStorageFilters: cache: OCA\\Files_Sharing\\Cache","userAgent":"curl/7.81.0","version":"27.0.0.8","data":[]}

Webdav search used:

$ time curl -u username:pass 'https://cloud.domain.invalid/remote.php/dav/' -X SEARCH -H "content-Type: text/xml" --data '<?xml version="1.0" encoding="UTF-8"?>
 <d:searchrequest xmlns:d="DAV:" xmlns:oc="http://owncloud.org/ns">
     <d:basicsearch>
         <d:select>
             <d:prop>
                 <oc:fileid/>
                 <d:displayname/>
                 <d:getcontenttype/>
                 <d:getetag/>
                 <oc:size/>
             </d:prop>
         </d:select>
         <d:from>
             <d:scope>
                 <d:href>/files/username</d:href>
                 <d:depth>infinity</d:depth>
             </d:scope>
         </d:from>
         <d:where>
             <d:like>
                 <d:prop>
                     <d:displayname/>
                 </d:prop>
                 <d:literal>search-string</d:literal>
             </d:like>
         </d:where>
         <d:orderby/>
    </d:basicsearch>
</d:searchrequest>'
<?xml version="1.0"?>
<d:multistatus xmlns:d="DAV:" xmlns:s="http://sabredav.org/ns" xmlns:oc="http://owncloud.org/ns" xmlns:nc="http://nextcloud.org/ns"/>

real	0m1,117s
user	0m0,052s
sys	0m0,004s

pgsql query log:
webdav-search_patched+disabledfac.log

That's a major increase in search speed. 31s reduced to 1s!!!
Thanks a lot for working on this @starypatyk!

starypatyk · 2023-07-09T12:07:27Z

Hi @XueSheng-GIT,

That's a major increase in search speed. 31s reduced to 1s!!!

Great to know that this change can help for your case! Thanks 👍

I am now considering a different approach that should also work with the File access control plugin.

BTW - The queries in your logs seem to suggest that you use PostgreSQL as the database backend. Could you please confirm/deny?

XueSheng-GIT · 2023-07-09T12:30:03Z

BTW - The queries in your logs seem to suggest that you use PostgreSQL as the database backend. Could you please confirm/deny?

Yes, I'm using postgreSQL.

starypatyk · 2023-07-11T19:54:04Z

Hi @XueSheng-GIT,

I have implemented a slightly different approach. The code should now work also when the File access control plugin is active.

Would you be able to get the latest changes and verify?

XueSheng-GIT · 2023-07-12T18:25:31Z

After applying the latest changes, webdav search fails (with and without File access control enabled). Reverting to the previous patch makes it work again.

Here's my search and the error message.

curl -u username:pass 'https://cloud.domain.invalid/remote.php/dav/' -X SEARCH -H "content-Type: text/xml" --data '<?xml version="1.0" encoding="UTF-8"?>
 <d:searchrequest xmlns:d="DAV:" xmlns:oc="http://owncloud.org/ns">
     <d:basicsearch>
         <d:select>
             <d:prop>
                 <oc:fileid/>
                 <d:displayname/>
                 <d:getcontenttype/>
                 <d:getetag/>
                 <oc:size/>
             </d:prop>
         </d:select>
         <d:from>
             <d:scope>
                 <d:href>/files/username</d:href>
                 <d:depth>infinity</d:depth>
             </d:scope>
         </d:from>
         <d:where>
             <d:like>
                 <d:prop>
                     <d:displayname/>
                 </d:prop>
                 <d:literal>search-string</d:literal>
             </d:like>
         </d:where>
         <d:orderby/>
    </d:basicsearch>
</d:searchrequest>'

Error message (response within the console):

<d:error xmlns:d="DAV:" xmlns:s="http://sabredav.org/ns">
  <s:exception>TypeError</s:exception>
  <s:message>OC\Files\Cache\QuerySearchHelper::checkStorageAndPathFilter(): Argument #1 ($operator) must be of type OCP\Files\Search\ISearchBinaryOperator, OC\Files\Search\SearchComparison given, called in /var/www/nextcloud/lib/private/Files/Cache/QuerySearchHelper.php on line 109</s:message>
</d:error>

nextcloud.log

{"reqId":"GPoTv8fK8thbreYbQvtz","level":3,"time":"2023-07-12T20:15:42+02:00","remoteAddr":"fd64:c2a2:5c5e:e201:1a88:be31:f4b:6ec","user":"username","app":"webdav","method":"SEARCH","url":"/remote.php/dav/","message":"OC\\Files\\Cache\\QuerySearchHelper::checkStorageAndPathFilter(): Argument #1 ($operator) must be of type OCP\\Files\\Search\\ISearchBinaryOperator, OC\\Files\\Search\\SearchComparison given, called in /var/www/nextcloud/lib/private/Files/Cache/QuerySearchHelper.php on line 109","userAgent":"curl/7.81.0","version":"27.0.0.8","exception":{"Exception":"TypeError","Message":"OC\\Files\\Cache\\QuerySearchHelper::checkStorageAndPathFilter(): Argument #1 ($operator) must be of type OCP\\Files\\Search\\ISearchBinaryOperator, OC\\Files\\Search\\SearchComparison given, called in /var/www/nextcloud/lib/private/Files/Cache/QuerySearchHelper.php on line 109","Code":0,"Trace":[{"file":"/var/www/nextcloud/lib/private/Files/Cache/QuerySearchHelper.php","line":109,"function":"checkStorageAndPathFilter","class":"OC\\Files\\Cache\\QuerySearchHelper","type":"->"},{"file":"/var/www/nextcloud/lib/private/Files/Cache/QuerySearchHelper.php","line":129,"function":"generateStorageFilters","class":"OC\\Files\\Cache\\QuerySearchHelper","type":"->"},{"file":"/var/www/nextcloud/lib/private/Files/Cache/QuerySearchHelper.php","line":218,"function":"applySearchConstraints","class":"OC\\Files\\Cache\\QuerySearchHelper","type":"->"},{"file":"/var/www/nextcloud/lib/private/Files/Node/Folder.php","line":239,"function":"searchInCaches","class":"OC\\Files\\Cache\\QuerySearchHelper","type":"->"},{"file":"/var/www/nextcloud/apps/dav/lib/Files/FileSearchBackend.php","line":191,"function":"search","class":"OC\\Files\\Node\\Folder","type":"->"},{"file":"/var/www/nextcloud/apps/dav/lib/Files/LazySearchBackend.php","line":63,"function":"search","class":"OCA\\DAV\\Files\\FileSearchBackend","type":"->"},{"file":"/var/www/nextcloud/3rdparty/icewind/searchdav/src/DAV/SearchHandler.php","line":82,"function":"search","class":"OCA\\DAV\\Files\\LazySearchBackend","type":"->"},{"file":"/var/www/nextcloud/3rdparty/icewind/searchdav/src/DAV/SearchPlugin.php","line":119,"function":"handleSearchRequest","class":"SearchDAV\\DAV\\SearchHandler","type":"->"},{"file":"/var/www/nextcloud/3rdparty/sabre/event/lib/WildcardEmitterTrait.php","line":89,"function":"searchHandler","class":"SearchDAV\\DAV\\SearchPlugin","type":"->"},{"file":"/var/www/nextcloud/3rdparty/sabre/dav/lib/DAV/Server.php","line":472,"function":"emit","class":"Sabre\\DAV\\Server","type":"->"},{"file":"/var/www/nextcloud/3rdparty/sabre/dav/lib/DAV/Server.php","line":253,"function":"invokeMethod","class":"Sabre\\DAV\\Server","type":"->"},{"file":"/var/www/nextcloud/3rdparty/sabre/dav/lib/DAV/Server.php","line":321,"function":"start","class":"Sabre\\DAV\\Server","type":"->"},{"file":"/var/www/nextcloud/apps/dav/lib/Server.php","line":364,"function":"exec","class":"Sabre\\DAV\\Server","type":"->"},{"file":"/var/www/nextcloud/apps/dav/appinfo/v2/remote.php","line":35,"function":"exec","class":"OCA\\DAV\\Server","type":"->"},{"file":"/var/www/nextcloud/remote.php","line":172,"args":["/var/www/nextcloud/apps/dav/appinfo/v2/remote.php"],"function":"require_once"}],"File":"/var/www/nextcloud/lib/private/Files/Cache/QuerySearchHelper.php","Line":83,"message":"OC\\Files\\Cache\\QuerySearchHelper::checkStorageAndPathFilter(): Argument #1 ($operator) must be of type OCP\\Files\\Search\\ISearchBinaryOperator, OC\\Files\\Search\\SearchComparison given, called in /var/www/nextcloud/lib/private/Files/Cache/QuerySearchHelper.php on line 109","exception":[],"CustomMessage":"OC\\Files\\Cache\\QuerySearchHelper::checkStorageAndPathFilter(): Argument #1 ($operator) must be of type OCP\\Files\\Search\\ISearchBinaryOperator, OC\\Files\\Search\\SearchComparison given, called in /var/www/nextcloud/lib/private/Files/Cache/QuerySearchHelper.php on line 109"},"id":"64aeedd504caf"}

This test was done on NC 27.0.0.
@starypatyk, any idea why it does fail now?

starypatyk · 2023-07-12T20:44:15Z

Hi @XueSheng-GIT,

Sorry, this was my mistake. 😞

I hope that I have corrected it in my last commit. If possible, please try again.

XueSheng-GIT · 2023-07-13T04:08:47Z

Thanks! A quick test was succesful with the updated pull and as fast as the previous approach. Tested with and without File access control.
I'm just on my mobile, thus a bit limited with testing. I will collect some logs today or tomorrow to verify that everything looks as expected.

XueSheng-GIT · 2023-07-13T17:54:05Z

Here are the logs using the latest pull on NC 27.0.0 with File access control enabled (same webdav search as before #37061 (comment)).

pgsql query log:
webdav-search_patched-4.log

nextcloud log:
nextcloud_patched-4.log

Search speed is still amazing (same as with previous version of the pull). I was not able to see a difference with/without File access control enabled.

I've also applied this patch to one of my production systems (NC 26.0.3). At least initial testing went fine so far.

@starypatyk, thanks again for your effort! Hope that this will find its way to a stable release.
Just let me know if you need further testing/logs from my side.

EDIT:
Forgot to mention and maybe it's obvious, but unified search of the web interface is now fast too! This used to time out on my test instance (>60s). Now the results are shown within 1-2s.

starypatyk · 2023-07-13T19:41:03Z

Hi @XueSheng-GIT,

Thanks for your testing and for the logs. 👍 Good to know that my changes help.

I now need to do some clean-up before submitting it for final review.

lib/private/Files/Cache/QuerySearchHelper.php

Signed-off-by: Dariusz Olszewski <starypatyk@users.noreply.github.com> (cherry picked from commit 0485c07)

Signed-off-by: Dariusz Olszewski <starypatyk@users.noreply.github.com> (cherry picked from commit 1cadbaf)

Signed-off-by: Dariusz Olszewski <starypatyk@users.noreply.github.com>

starypatyk · 2023-07-14T21:43:42Z

Hi @AndyScherzinger, @come-nc, @icewind1991,

After conversation with @XueSheng-GIT, I have updated this PR recently. It seems that these changes can provide even greater performance improvements is some cases - see e.g.: #35776 (comment)

Would you be able to review my changes again?

I have changed the approach slightly - currently I post-process the queries generated by getQueryFilterForStorage() calls.

I tried to put the optimization code in an additional QueryOptimizerStep in the existing QueryOptimizer code, but I was not happy with the result. The code to find the original storage/path conditions was quite awkward and complex. Doing this directly after obtaining the storageFilters seemed more natural:

server/lib/private/Files/Cache/QuerySearchHelper.php

Line 154 in 2987c0d

$optimizedStorageFilters = $this->optimizeStorageFilters($storageFilters);

I do not plan any more changes, unless there are additional suggestions from the review.

There are a few additional topics that I think could be addressed in follow-up PRs:

This might break on Oracle when the IN list gets longer than 1000 items. I do not know if it is better to always split the list here when generating the query tree, or in the QueryBuilder specific for Oracle:

server/lib/private/DB/QueryBuilder/ExpressionBuilder/OCIExpressionBuilder.php

Line 121 in ca9d6db

public function in($x, $y, $type = null): string {
There is a related issue [Bug]: Unified Query Allows Unwieldy SQL Statement With External Storages #38653. At first sight my optimizations will not help in this case. But it seems quite straightforward to include additional optimization for this case in the same manner.
I am not sure how my changes interfere with the QueryOptimizer.

come-nc

Could you also test the opposite case, for a search with not a lot of shared files, which is currently fast (<1s), does your PR add visible overhead?

come-nc · 2023-07-17T13:47:36Z

lib/private/Files/Cache/QuerySearchHelper.php

+				$storage = $a->getValue();
+				/** @psalm-suppress UndefinedInterfaceMethod */
+				$path = $b->getValue();
+				\OC::$server->getLogger()->debug("QuerySearchHelper::checkStorageAndPathFilter: storage=" . $storage . " " . "path=" . $path);


Suggested change

\OC::$server->getLogger()->debug("QuerySearchHelper::checkStorageAndPathFilter: storage=" . $storage . " " . "path=" . $path);

$this->logger->debug("QuerySearchHelper::checkStorageAndPathFilter: storage=" . $storage . " " . "path=" . $path);

I forgot to remove the logging. This was useful during development, quite unlikely to be useful in the future. Removed in f86dab9

come-nc · 2023-07-17T13:49:41Z

lib/private/Files/Cache/QuerySearchHelper.php

+		// Create filters for single file shares
+		$singleFileFilters = [];
+		foreach ($storageToPathsMap as $storage => $paths) {
+			\OC::$server->getLogger()->debug("QuerySearchHelper::optimizeStorageFilters: storage=" . $storage . " " . "paths=" . implode(", ", $paths));


Suggested change

\OC::$server->getLogger()->debug("QuerySearchHelper::optimizeStorageFilters: storage=" . $storage . " " . "paths=" . implode(", ", $paths));

$this->logger->debug("QuerySearchHelper::optimizeStorageFilters: storage=" . $storage . " " . "paths=" . implode(", ", $paths));

I forgot to remove the logging. This was useful during development, quite unlikely to be useful in the future. Removed in f86dab9

Signed-off-by: Dariusz Olszewski <starypatyk@users.noreply.github.com>

starypatyk · 2023-07-18T21:24:37Z

Hi @come-nc,

Could you also test the opposite case, for a search with not a lot of shared files, which is currently fast (<1s), does your PR add visible overhead?

I have not been able to measure the overhead. In my "production" environment I have a lot of shared files, so this does not apply. When I tried to measure differences in my dev environment with no shared files, variation of timing between the runs (100-140 ms) was much higher than the difference between original code and my PR (~ 1 ms).

When there are no shared files the query should not be modified at all by my optimization code, so query execution time should not be affected.

Regarding time required to perform the optimization itself - this is an in-memory operation that should grow roughly linearly with the number of shares. So, even with thousands of shares should be negligible compared to the query execution time.

icewind1991 · 2023-09-21T12:51:43Z

I've started implementing something similar but at a level lower and more generalized: #40555

It applies similar transformations on the search operation before building the db query.

starypatyk · 2023-09-25T19:33:05Z

@icewind1991

I've started implementing something similar but at a level lower and more generalized: #40555

It applies similar transformations on the search operation before building the db query.

Looks great. 👍 I will close this PR when #40555 is merged.

skjnldsv · 2024-02-23T18:45:23Z

Looks great. 👍 I will close this PR when #40555 is merged.

#40555 is merged.

starypatyk added enhancement 3. to review Waiting for reviews feature: dav performance 🚀 labels Mar 6, 2023

github-advanced-security bot found potential problems Mar 6, 2023

View reviewed changes

apps/files_sharing/lib/Cache.php Fixed Show fixed Hide fixed

szaimen requested review from icewind1991, rullzer, a team, ArtificialOwl and come-nc and removed request for a team March 6, 2023 23:03

starypatyk force-pushed the dav-search-use-sql-in branch from 6291a67 to 1cadbaf Compare March 7, 2023 22:18

come-nc reviewed Mar 9, 2023

View reviewed changes

come-nc requested a review from artonge March 13, 2023 10:47

XueSheng-GIT mentioned this pull request Jun 6, 2023

[Bug]: Unified Query Allows Unwieldy SQL Statement With External Storages #38653

Open

9 tasks

starypatyk force-pushed the dav-search-use-sql-in branch from 1cadbaf to 7c958d1 Compare June 8, 2023 19:48

AndyScherzinger reviewed Jun 8, 2023

View reviewed changes

XueSheng-GIT mentioned this pull request Jul 9, 2023

[Bug]: File and tag search slow if a lot of shares present #35776

Closed

9 tasks

github-advanced-security bot found potential problems Jul 13, 2023

View reviewed changes

lib/private/Files/Cache/QuerySearchHelper.php Fixed Show fixed Hide fixed

lib/private/Files/Cache/QuerySearchHelper.php Fixed Show fixed Hide fixed

starypatyk added 7 commits July 14, 2023 22:17

Use SQL IN operator for single file shares

51b8e53

Signed-off-by: Dariusz Olszewski <starypatyk@users.noreply.github.com> (cherry picked from commit 0485c07)

Code scanning / Psalm

f4486d9

Signed-off-by: Dariusz Olszewski <starypatyk@users.noreply.github.com> (cherry picked from commit 1cadbaf)

DIagnostic logging

e063d9f

Signed-off-by: Dariusz Olszewski <starypatyk@users.noreply.github.com>

Alternative approach - refactor generated filter

7814687

Signed-off-by: Dariusz Olszewski <starypatyk@users.noreply.github.com>

Fix incorrect assumption in checkStorageAndPathFilter

f649864

Signed-off-by: Dariusz Olszewski <starypatyk@users.noreply.github.com>

Small refactoring and code clean-up

5e1e692

Signed-off-by: Dariusz Olszewski <starypatyk@users.noreply.github.com>

psalm false positive

2987c0d

Signed-off-by: Dariusz Olszewski <starypatyk@users.noreply.github.com>

starypatyk force-pushed the dav-search-use-sql-in branch from 6075f81 to 2987c0d Compare July 14, 2023 20:19

come-nc reviewed Jul 17, 2023

View reviewed changes

Remove forgotten debug logging

f86dab9

Signed-off-by: Dariusz Olszewski <starypatyk@users.noreply.github.com>

blizzz added this to the Nextcloud 29 milestone Nov 23, 2023

skjnldsv added 2. developing Work in progress and removed 3. to review Waiting for reviews labels Feb 23, 2024

skjnldsv closed this Feb 23, 2024

skjnldsv removed the 2. developing Work in progress label Feb 23, 2024

skjnldsv deleted the dav-search-use-sql-in branch February 23, 2024 18:45

skjnldsv removed this from the Nextcloud 29 milestone Feb 24, 2024

	\OC::$server->getLogger()->debug("QuerySearchHelper::checkStorageAndPathFilter: storage=" . $storage . " " . "path=" . $path);
	$this->logger->debug("QuerySearchHelper::checkStorageAndPathFilter: storage=" . $storage . " " . "path=" . $path);

Performance of WebDAV search requests with many shared files #37061

Performance of WebDAV search requests with many shared files #37061

Conversation

starypatyk commented Mar 6, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

come-nc commented Mar 13, 2023

starypatyk commented Mar 14, 2023

XueSheng-GIT commented Jun 7, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

starypatyk commented Jun 8, 2023

XueSheng-GIT commented Jun 20, 2023

starypatyk commented Jun 20, 2023

starypatyk commented Jul 6, 2023 • edited Loading

XueSheng-GIT commented Jul 7, 2023 • edited Loading

starypatyk commented Jul 7, 2023

XueSheng-GIT commented Jul 9, 2023 • edited Loading

starypatyk commented Jul 9, 2023

XueSheng-GIT commented Jul 9, 2023

starypatyk commented Jul 11, 2023

XueSheng-GIT commented Jul 12, 2023

starypatyk commented Jul 12, 2023

XueSheng-GIT commented Jul 13, 2023

XueSheng-GIT commented Jul 13, 2023 • edited Loading

starypatyk commented Jul 13, 2023

starypatyk commented Jul 14, 2023

come-nc left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

starypatyk commented Jul 18, 2023

icewind1991 commented Sep 21, 2023

starypatyk commented Sep 25, 2023

skjnldsv commented Feb 23, 2024 • edited Loading

starypatyk commented Jul 6, 2023 •

edited

Loading

XueSheng-GIT commented Jul 7, 2023 •

edited

Loading

XueSheng-GIT commented Jul 9, 2023 •

edited

Loading

XueSheng-GIT commented Jul 13, 2023 •

edited

Loading

skjnldsv commented Feb 23, 2024 •

edited

Loading