Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

grep: add -o option #353

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

andrewliebenow
Copy link
Contributor

No description provided.

@andrewliebenow andrewliebenow force-pushed the grep-dash-o branch 3 times, most recently from 0c6ae8b to 86ffd89 Compare October 26, 2024 06:12
@andrewliebenow
Copy link
Contributor Author

Added an additional change to prevent grep from aborting/failing when the input contains non-UTF-8 bytes

@jgarzik
Copy link
Contributor

jgarzik commented Nov 3, 2024

Questions,

  1. I see that BSD and Linux grep both have -o. Is this option widely used?

  2. This is a significant change. Is there a performance impact, vs current code? A microbenchmark would be nice.

@andrewliebenow
Copy link
Contributor Author

I ran into a build script that was trying to use -o. I can't remember which program it was I was compiling, I'll try to find it.

Most of the changes are related to switching from UTF-8 types to raw bytes types. It's not too uncommon to want to use text processing tools on "mostly UTF-8 files" (in my case I was trying to search my shell history file, which somehow got some non-UTF-8 bytes in it).

Performance is mostly unchanged:

❯ hyperfine ' ./grep-after-pull-request -F adjective ./webster ' ' ./grep-before-pull-request -F adjective ./webster '
Benchmark 1:  ./grep-after-pull-request -F adjective ./webster 
  Time (mean ± σ):      96.5 ms ±   1.2 ms    [User: 88.4 ms, System: 7.1 ms]
  Range (min … max):    95.2 ms … 100.3 ms    30 runs
 
Benchmark 2:  ./grep-before-pull-request -F adjective ./webster 
  Time (mean ± σ):      94.7 ms ±   0.8 ms    [User: 87.1 ms, System: 6.8 ms]
  Range (min … max):    93.5 ms …  96.3 ms    30 runs
 
Summary
   ./grep-before-pull-request -F adjective ./webster  ran
    1.02 ± 0.02 times faster than  ./grep-after-pull-request -F adjective ./webster 
❯ hyperfine ' ./grep-after-pull-request adjective ./webster ' ' ./grep-before-pull-request adjective ./webster '                                                                 
Benchmark 1:  ./grep-after-pull-request adjective ./webster 
  Time (mean ± σ):     191.8 ms ±   3.9 ms    [User: 181.6 ms, System: 8.7 ms]
  Range (min … max):   183.3 ms … 195.6 ms    15 runs
 
Benchmark 2:  ./grep-before-pull-request adjective ./webster 
  Time (mean ± σ):     210.7 ms ±   3.5 ms    [User: 202.0 ms, System: 7.1 ms]
  Range (min … max):   206.6 ms … 220.4 ms    14 runs
 
Summary
   ./grep-after-pull-request adjective ./webster  ran
    1.10 ± 0.03 times faster than  ./grep-before-pull-request adjective ./webster 

"./webster" is "The 1913 Webster Unabridged Dictionary" from https://sun.aei.polsl.pl/~sdeor/index.php?page=silesia (just under 40 mebibytes).

@andrewliebenow
Copy link
Contributor Author

I couldn't find the program I had been trying to compile, but here's an interesting statistic:

I have a directory containing 42 Git repositories I've cloned. 4 of those repositories contain scripts that use grep -o:

https://github.com/hinto-janai/festival
https://github.com/landley/toybox
https://github.com/libjxl/libjxl
https://github.com/uutils/coreutils

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants