
Issue search results · repo:JuliaText/WordTokenizers.jl language:Julia


18 results

Hi, I hope you are doing great. Thank you for your effort in this package. Just to report that the dependency HTML_Entities prevents this package from working with PackageCompiler. This issue was fixed in this ...
  • AbrJA
  • Opened on Aug 13, 2024
  • #65

julia> Pkg.activate(temp=true) Activating new project at `/var/folders/4n/gvbmlhdc8xj973001s6vdyw00000gq/T/jl_iVQTya` julia> Pkg.add("WordTokenizers") Resolving package versions... some logging ...
  • ablaom
  • 2
  • Opened on Apr 8, 2024
  • #64

Several times the paragraphs have newlines copied from the source document (particularly when copied from a PDF), and these should be ignored when sentences are tokenized. This is the text taken from copying ...
  • sambitdash
  • Opened on Feb 26, 2021
  • #60

Hi @Ayushk4 - @oxinabox and @aviks suggested I ping you. I am interested in investigating and improving the sentence tokenizers part of WordTokenizers.jl. Would that be of interest to you if ...
  • TheCedarPrince
  • 2
  • Opened on Jan 18, 2021
  • #59

The tokenize function returns a vector of words (strings) when an input string is passed. It doesn't lowercase each word by default. For example: julia> text = "This is a this sentence" "This is a this sentence" ...
  • shikhargoswami
  • 3
  • Opened on Jan 4, 2021
  • #57

I get an InitError with WordTokenizers on Julia 1.5 when using WordTokenizers in a package, even if the package is almost empty. Here is an MWE: (@v1.5) pkg> generate TestToken # in shell $ cd TestToken/src ...
  • chengchingwen
  • 3
  • Opened on Aug 26, 2020
  • #55

I think it is better to release a new version of WordTokenizers with the Statistical Tokenizer. It also serves as a dependency for TextAnalysis.ALBERT.
  • tejasvaidhyadev
  • Opened on Aug 25, 2020
  • #52

We should benchmark against https://github.com/huggingface/tokenizers. I don't expect us to win, but it gives us a line to target.
  • oxinabox
  • Opened on Feb 5, 2020
  • #46

BERT and related models use statistical tokenization algorithms, which handle out-of-vocabulary words well in ML models. High-speed implementations of BPE, WordPiece, etc. would be good additions ...
  • Ayushk4
  • 20
  • Opened on Jan 30, 2020
  • #44

julia> WordTokenizers.split_sentences("This is a sentence.Laugh Out Loud. Keep coding. No. Yes! True! ohh!ya! me too.") 7-element Array{SubString{String},1}: "This is a sentence.Laugh Out Loud." "Keep coding." ...
  • oxinabox
  • 2
  • Opened on Oct 11, 2019
  • #38
ProTip! Restrict your search to the title by using the in:title qualifier.