Speedup Trilogy#escape by 3 to 5x #212

byroot · 2024-12-03T11:32:22Z

Copying characters one by one is very inneficient, especially since each time the buffer capacity has to be checked again and again.

Using memcpy for chunks of bytes that don't need any escaping, we can save a lot of time.

Before:

ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
            2k-plain    23.802k i/100ms
       2k-few-escape    22.693k i/100ms
Calculating -------------------------------------
            2k-plain    239.746k (± 0.5%) i/s    (4.17 μs/i) -      1.214M in   5.063403s
       2k-few-escape    227.900k (± 0.6%) i/s    (4.39 μs/i) -      1.157M in   5.078454s

After:

Warming up --------------------------------------
            2k-plain    94.975k i/100ms
       2k-few-escape    61.998k i/100ms
Calculating -------------------------------------
            2k-plain      1.039M (± 1.0%) i/s -      5.224M in   5.025882s
       2k-few-escape    658.756k (± 1.3%) i/s -      3.348M in   5.082975s

Bench:

require "trilogy"
require "benchmark/ips"
t = Trilogy.new(database: "test")

plain = "A" * 2000
few_escapes = (("a" * 19) + "\n") * 100
Benchmark.ips do |x|
  x.report("2k-plain") { t.escape(plain) }
  x.report("2k-few-escape") { t.escape(few_escapes) }
end

@jhawthorn @matthewd

jhawthorn · 2024-12-19T01:36:54Z

inc/trilogy/buffer.h

+ *   TRILOGY_OK     - The character was appended to the buffer
+ *   TRILOGY_SYSERR - A system error occurred, check errno.
+ */
+int trilogy_buffer_puts(trilogy_buffer_t *buffer, const uint8_t *ptr, size_t len);


It's a nitpick, but puts to me implies a newline (like C's puts or Ruby's puts). Maybe we call this trilogy_buffer_write?

jhawthorn · 2024-12-19T01:37:52Z

src/client.c

+    // so might as well pre-expand the buffer.
+    CHECKED(trilogy_buffer_expand(b, len));
+
+    uint8_t *cursor = (uint8_t *)str;


It's technically valid as-is, but we should be able to preserve str's const

Suggested change

uint8_t *cursor = (uint8_t *)str;

const uint8_t *cursor = (const uint8_t *)str;

jhawthorn · 2024-12-19T01:38:06Z

src/client.c

-                CHECKED(trilogy_buffer_putc(b, '\''));
+    if (conn->server_status & TRILOGY_SERVER_STATUS_NO_BACKSLASH_ESCAPES) {
+        while (cursor < end) {
+            uint8_t *next_backslash = memchr(cursor, '\'', (size_t)(end - cursor));


This isn't actually the next backslash, it's the next quote. Maybe we call this next_escape?

jhawthorn · 2024-12-19T01:38:36Z

Looks great to me other than minor naming issues!

Copying characters one by one is very inneficient, especially since each time the buffer capacity has to be checked again and again. Using `memcpy` for chunks of bytes that don't need any escaping, we can save a lot of time. Before: ``` ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23] Warming up -------------------------------------- 2k-plain 23.802k i/100ms 2k-few-escape 22.693k i/100ms Calculating ------------------------------------- 2k-plain 239.746k (± 0.5%) i/s (4.17 μs/i) - 1.214M in 5.063403s 2k-few-escape 227.900k (± 0.6%) i/s (4.39 μs/i) - 1.157M in 5.078454s ``` After: ``` Warming up -------------------------------------- 2k-plain 94.975k i/100ms 2k-few-escape 61.998k i/100ms Calculating ------------------------------------- 2k-plain 1.039M (± 1.0%) i/s - 5.224M in 5.025882s 2k-few-escape 658.756k (± 1.3%) i/s - 3.348M in 5.082975s ``` Bench: ```ruby require "trilogy" require "benchmark/ips" t = Trilogy.new(database: "test") plain = "A" * 2000 few_escapes = (("a" * 19) + "\n") * 100 Benchmark.ips do |x| x.report("2k-plain") { t.escape(plain) } x.report("2k-few-escape") { t.escape(few_escapes) } end ```

jhawthorn · 2024-12-19T21:17:49Z

Thank you!

byroot force-pushed the opt-escape branch 8 times, most recently from bad7364 to 9c56566 Compare December 3, 2024 16:05

jhawthorn reviewed Dec 19, 2024

View reviewed changes

byroot force-pushed the opt-escape branch from 9c56566 to 3c0b787 Compare December 19, 2024 07:46

byroot requested a review from jhawthorn December 19, 2024 07:46

jhawthorn approved these changes Dec 19, 2024

View reviewed changes

jhawthorn merged commit b517fcf into trilogy-libraries:main Dec 19, 2024
9 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Speedup Trilogy#escape by 3 to 5x #212

Speedup Trilogy#escape by 3 to 5x #212

byroot commented Dec 3, 2024

jhawthorn Dec 19, 2024

jhawthorn Dec 19, 2024

jhawthorn Dec 19, 2024

jhawthorn commented Dec 19, 2024

jhawthorn commented Dec 19, 2024

	uint8_t cursor = (uint8_t )str;
	const uint8_t cursor = (const uint8_t )str;

Speedup Trilogy#escape by 3 to 5x #212

Speedup Trilogy#escape by 3 to 5x #212

Conversation

byroot commented Dec 3, 2024

jhawthorn Dec 19, 2024

Choose a reason for hiding this comment

jhawthorn Dec 19, 2024

Choose a reason for hiding this comment

jhawthorn Dec 19, 2024

Choose a reason for hiding this comment

jhawthorn commented Dec 19, 2024

jhawthorn commented Dec 19, 2024