rust: add basic skipping
The switch from enumerate() to a regular loop is actually slower for
the single-stepping case, but the performance gains from being able to
skip large chunks of data with minimal comparisons more than make up
for it.

This is the most basic optimization for this type of algorithm, and this
is the simplified version of it. If we're not in a match, check the byte
20 positions ahead; if that byte can't be part of a match either, skip
ahead 20. A match is 40 bytes long, so no match can start inside the
skipped span. If the byte ahead could be part of a match, just continue
checking each byte.
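
As a rough, standalone sketch of the skip check described above (not the code from this commit): `find_hex_runs`, `is_hex`, and the `SKIP`/`RUN` constants are names made up for illustration, and the sketch collects start offsets in a `Vec` where the real `scan_slice` calls `hit()`.

```rust
/// True for the bytes the scanner treats as part of a match: 0-9 and a-f.
fn is_hex(b: u8) -> bool {
    b.is_ascii_digit() || (b'a'..=b'f').contains(&b)
}

/// Hypothetical helper: returns the start offset of every run of exactly
/// 40 hex bytes. The commit's scan_slice calls hit() instead of collecting.
fn find_hex_runs(inb: &[u8]) -> Vec<usize> {
    const SKIP: usize = 20; // how far ahead to peek and jump
    const RUN: usize = 40; // length of a match
    let len = inb.len();
    let mut starts = Vec::new();
    let mut count = 0;
    let mut i = 0;
    while i < len {
        // Outside a run: a non-hex byte SKIP positions ahead means no
        // 40-byte run can start anywhere in i..=i+SKIP, so jump to it.
        if count == 0 && i + SKIP < len && !is_hex(inb[i + SKIP]) {
            i += SKIP;
            continue;
        }
        if is_hex(inb[i]) {
            count += 1;
        } else {
            // A run just ended; report it only if it was exactly RUN long.
            if count == RUN {
                starts.push(i - RUN);
            }
            count = 0;
        }
        i += 1;
    }
    // A run that reaches the end of the input has no terminating byte.
    if count == RUN {
        starts.push(len - RUN);
    }
    starts
}

fn main() {
    // 100 filler bytes, one 40-byte hex run, then some trailing text.
    let mut data = vec![b'-'; 100];
    data.extend_from_slice(&[b'a'; 40]);
    data.extend_from_slice(b" trailing text");
    println!("{:?}", find_hex_runs(&data)); // prints [100]
}
```

On this input the first 80 filler bytes cost four comparisons instead of 80; single-stepping resumes only once the peeked byte is a hex digit.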
rmg committed Jun 29, 2023
1 parent 9003c0e commit 46270c3
Showing 2 changed files with 14 additions and 2 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -76,6 +76,7 @@ implementations compare to each other.
 | ripgrep | 0m1.709s | 0m1.541s | 0m0.147s |
 | simple (Go) | 0m1.737s | 0m1.594s | 0m0.142s |
 | simple (Rust) | 0m1.461s | 0m1.325s | 0m0.131s |
+| skip (Rust) | 0m0.231s | 0m0.105s | 0m0.124s |
 | simple (Node) | 0m6.458s | 0m6.043s | 0m0.627s |
 | custom (C) | **0m0.222s** | **0m0.079s** | **0m0.141s** |

15 changes: 13 additions & 2 deletions main.rs
@@ -25,15 +25,26 @@ fn hit(needle: &[u8]) {
 fn scan_slice(inb: &[u8]) -> usize {
     let mut count = 0;
     let len = inb.len();
-    for (i, &b) in inb.into_iter().enumerate() {
+    let mut i = 0usize;
+    while i < len {
+        let b = inb[i];
+        if count == 0 && i+20 < len {
+            let bs = inb[i+20];
+            if !bs.is_ascii_digit() && !(b'a'..=b'f').contains(&bs) {
+                i += 20;
+                continue;
+            }
+        }
         if b.is_ascii_digit() || (b'a'..=b'f').contains(&b) {
             count += 1;
+            i += 1;
             continue
         }
         if count == 40 {
             hit(&inb[i-40..i]);
         }
-        count = 0
+        count = 0;
+        i += 1;
     }
     if count == 40 {
         hit(&inb[len-40..]);
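
One way to gain confidence that skipping never drops a match is to compare the skipping loop against the byte-by-byte loop on generated inputs. The sketch below is illustrative only: `scan_simple`, `scan_skip`, `next`, and `agree_on_random_inputs` are hypothetical names, both scanners return start offsets rather than calling `hit()`, and the xorshift generator is there just to avoid external crates.

```rust
fn is_hex(b: u8) -> bool {
    b.is_ascii_digit() || (b'a'..=b'f').contains(&b)
}

/// Byte-by-byte reference: start offsets of runs of exactly 40 hex bytes.
fn scan_simple(inb: &[u8]) -> Vec<usize> {
    let mut starts = Vec::new();
    let mut count = 0;
    for (i, &b) in inb.iter().enumerate() {
        if is_hex(b) {
            count += 1;
            continue;
        }
        if count == 40 {
            starts.push(i - 40);
        }
        count = 0;
    }
    if count == 40 {
        starts.push(inb.len() - 40);
    }
    starts
}

/// Same scan, but peeks 20 bytes ahead while outside a run.
fn scan_skip(inb: &[u8]) -> Vec<usize> {
    let len = inb.len();
    let mut starts = Vec::new();
    let (mut count, mut i) = (0usize, 0usize);
    while i < len {
        if count == 0 && i + 20 < len && !is_hex(inb[i + 20]) {
            i += 20;
            continue;
        }
        if is_hex(inb[i]) {
            count += 1;
        } else {
            if count == 40 {
                starts.push(i - 40);
            }
            count = 0;
        }
        i += 1;
    }
    if count == 40 {
        starts.push(len - 40);
    }
    starts
}

/// Small xorshift step so the sketch needs no external crates.
fn next(state: &mut u64) -> u64 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}

/// Builds inputs that alternate filler and hex runs with lengths close to
/// the 20- and 40-byte boundaries, then checks that both scanners agree.
fn agree_on_random_inputs(rounds: usize) -> bool {
    let mut state = 0x9E37_79B9_7F4A_7C15u64;
    for _ in 0..rounds {
        let mut data: Vec<u8> = Vec::new();
        for _ in 0..8 {
            let filler = (next(&mut state) % 45) as usize; // 0..=44
            data.extend(std::iter::repeat(b'-').take(filler));
            let run = 35 + (next(&mut state) % 10) as usize; // 35..=44
            data.extend(std::iter::repeat(b'c').take(run));
        }
        if scan_simple(&data) != scan_skip(&data) {
            return false;
        }
    }
    true
}

fn main() {
    println!("scanners agree: {}", agree_on_random_inputs(1000));
}
```

The run lengths are chosen to straddle 40 and the filler lengths to straddle 20, since those are the boundaries where an off-by-one in the skip check would show up.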
