First version of node client #300

Merged
merged 8 commits into from
Jan 20, 2025

Conversation

m-chadda
Collaborator

  • pdf to image
  • extension error
  • cropping error
  • getting images back, not working
  • .
  • pyscripts
  • fixed scaling factor for page
  • new compose
  • scaling factor issues
  • fixed pdla deploy
  • added image url stuff :
  • added image url stuff
  • task creation has image_url
  • rm threadlock and run in threadpool in pdla
  • image file path fixes
  • s3 upload issues
  • done with ocr
  • get tasks error fixed
  • process lock on / route
  • Refactored frontend
  • Rebased with main successfully
  • Added OCR strategy and Task stats in viewer & status view
  • main test
  • removed segment density
  • Moved dashboard to pages
  • Added curl command placeholder for no tasks
  • deployed new pdla
  • added timing for ocr
  • added timing for ocr
  • table html to mkd
  • table html to mkd
  • table html to mkd
  • Cleaned up svg errors, added OCR selection on upload form, images optimized
  • fixing mkd table issue
  • fixing mkd table issue
  • adding \n as string literal
  • adding \n as string literal
  • | is not |?
  • doing it on frontend
  • doing it on frontend
  • base
  • frontend markdown resolved
  • Rebasing with main
  • Added tables markdown css, html viewing options, API key fixes for small screen menu dropdown
  • updated google timeout
  • list item fixes
  • list item ol start from given digit
  • clean ul
  • migrations
  • new migs
  • new api
  • new api spec
  • wiping pg
  • updating gpu
  • updated gpu
  • separated out task workers
  • new workers - fast - highquality
  • cleaned up pyscripts
  • refactoring the models
  • syntax error fix
  • fixed syntax error
  • fixed syntax error
  • fixed pyscripts
  • new dockers for processors
  • new processor names
  • cleanup
  • Need to add chunking strategy in upload form
  • Updating with main
  • Added HTML viewing fixes and chunking strategy button on upload form
  • No tasks state text prompt added
  • task service docker created
  • docker updated
  • docker updated
  • docker updated
  • added uv venv
  • added uv venv
  • new kube
  • paddle ocr issue
  • created init
  • paddle ocr issues, going to build with gpu
  • added libgomp1 to dockerfile
  • added openCV deps in docker
  • updated num of workers
  • repushing
  • new kube
  • unexpected end of data error
  • dockerfile updated because of cudnn error
  • new kube
  • updated env for docker image
  • dockerignore update
  • docker image only copying over essentials
  • docker image only copying over essentials
  • fixed COPY
  • kube
  • deploying new frontend
  • task ocr parsed to float
  • get content updated
  • prints
  • idek
  • ready for deployment
  • deploying new version
  • deploying new version
  • deployment with ocr
  • deployment successful
  • new cmds
  • Fixed bug for API pricing calculator - fast pricing is now correct
  • updated api spec
  • Fixed billing amount bugs
  • better tests
  • stripe tested on dev
  • invoicing done
  • added authorization
  • clean up
  • rm unused code
  • deployed
  • token incorrect
  • fixed incorrect server url
  • rm cron from self deploy
  • deployed newest version
  • updated pdla
  • updated pdla
  • added
  • added threadpool
  • pdla code updated
  • clean up services
  • pdla fast new deployment
  • deploying higher fast throughput
  • updated number of connections
  • invoicing fixes
  • comment purge
  • deployed invoicer update
  • updated max dims for segments
  • updated task
  • cloudflare full strict
  • Added small fixes for frontend and OCRBBox view on hover + OCRText for OCRBBox on hover
  • Small fix - Removed random console.log
  • updating queue consumer
  • updated secrets
  • updated deployment file
  • Fixed MZ and Safari bugs for upload component on landing
  • silly
  • Removed speed approximations for Segmentation model cards
  • file support
  • deployed web
  • silly
  • added polyfill for promise with resolvers
  • libre office integration
  • Fixed chunk_length undefined bug
  • open api spec
  • new venv
  • Removed speed approximations for Segmentation model cards
  • deployed web
  • added polyfill for promise with resolvers
  • Fixed chunk_length undefined bug
  • open api spec
  • new venv
  • sending to task service
  • invalid header
  • idk headers in pdfs suck
  • updates to generate
  • qwen
  • qwen batch
  • qwen batch
  • qwen batch
  • qwen batch
  • qwen fixes
  • qwen fixes
  • qwen fixes
  • qwen batch
  • docx
  • ppt, docx support added
  • lfg
  • lfg
  • added task
  • updated accept list
  • merged
  • fixed conflicts in qwen
  • LLM based ocr for tables added
  • updated html logic
  • updated cost values
  • fixed conflict in process
  • fixed circular deps
  • fixed circular deps
  • updated default value
  • removed print statements
  • deploying new version with llm table support
  • deployed new version with docx, pptx support
  • added dropzone type
  • added module declaration for react-dropzone
  • deployed with LLM table support
  • updated num_workers
  • updated example kube env
  • adding o1 support
  • added latex api
  • testing o1
  • removed o1 stuff
  • fixed monthly order for usage and multi file type support
  • notices
  • Update README.md
  • Update LICENSE
  • Delete LICENSE
  • Create LICENSE
  • Create COMMERCIAL_LICENSE
  • bug fixes
  • prompting
  • Fixed free tier rollover on dashboard
  • Api dialog width fix
  • updated conversion
  • Added file support byline
  • added ppocr to table with llm
  • camelot for tables
  • deploying new web and backend
  • adding o1 support
  • added latex api
  • testing o1
  • removed o1 stuff
  • fixed monthly order for usage and multi file type support
  • bug fixes
  • prompting
  • updated conversion
  • added ppocr to table with llm
  • camelot for tables
  • Small fix in byline for homepage
  • adding sliding window to ppocr
  • img to table
  • img to table
  • ocr result fix
  • ocr result fix
  • fixed pyscripts scaling issues and time logging
  • fixed
  • Added a lot of phone fixes
  • moved file conversion to process, added pdf url to output
  • fixed pyscripts and gen signed url
  • rm prints
  • Removed authentication from phone
  • Fixed up pricing page header and styling
  • cleanup - unused code
  • cleanup - unused code
  • default payment method
  • new deployment
  • cooked
  • desc month
  • month desc
  • reverting
  • reverting
  • updated example secrets
  • pdf_url
  • added new logs for debugging
  • removed logs
  • adding logs to upload task
  • newest version deployed
  • changed api schema
  • implemented new segment models in task
  • update cropping
  • Implemented new bbox stuff on frontend
  • updated rust in docker
  • deploying api schema change
  • deploying api schema change
  • added message for usage limit exceeded
  • Using legacy version of pdfjs
  • Added posthog analytics
  • updated web
  • Update README.md
  • multi gpu support
  • fixed typo for curl command
  • fixed typo for curl command
  • Update README.md
  • Update README.md
  • updated readme
  • updated readme
  • updated readme
  • updated readme
  • increased chunkr timeout
  • increased timeout to 180s
  • Updates
  • added lazy static to client
  • updated terraform
  • add timeout for rrq
  • server on fire
  • server on fire
  • lfg
  • updating nodes to stop ooming
  • separated ocr from segmentation
  • removed ocr lock
  • dep hell
  • throughput maxxing
  • load model for every request with thread locks
  • adding telemetry
  • ocr more stable
  • creating engine for each segment
  • fixed syntax error
  • fixed syntax error
  • fixed syntax error
  • bentoml -> fastapi
  • added packages to pyproject
  • added logging
  • accepting segments as str of json
  • more logging
  • more logging
  • json 422 errors
  • json 422 errors
  • json 422 errors
  • json 422 errors
  • cleanup files
  • ocr bangs
  • pdf_file errors
  • removed async from conversion
  • cleanup
  • issues with file conversion
  • fastapi file response
  • tempfile errors
  • fixed for no conversion required
  • to_pdf updated
  • str -> path
  • pdf path issues
  • pdf path issues
  • removed positional argument
  • idk man
  • background task for deletion
  • ocr still fucky
  • added queue for ocr engines
  • ocr engine pooling
  • logging
  • Added copy button for copy/pasting markdown, html and JSON exactly as is. Can copy text from within chunks now as well.
  • Got rid of extra console logs
  • created ocr microservice
  • using ocr service
  • updated pyproject
  • updated pyproject
  • updated pyproject
  • updated pyproject
  • ocr test cmds added
  • adding parallel support for ocr
  • adding parallel support for ocr
  • adding telemetry
  • adding better telemetry
  • adding better telemetry
  • adding better telemetry
  • adding better telemetry
  • removed logs from ppocr
  • updated dockers
  • updated kube and docker
  • removed torch
  • updated dockers
  • updated docker
  • updated docker
  • updated docker
  • removed unused
  • deps suck
  • deployed new task and ocr services
  • updated timeouts
  • gmi
  • turning ocr off
  • OCR Off
  • if no table html show table image
  • segment chunk UI
  • deployed web
  • pdf file url implemented on web
  • deployed web
  • fixed bug with non pdf file viewing
  • deployed web
  • deployed web
  • made /process async
  • deployed
  • ocr patch
  • added multi OCR strategy
  • idk
  • added multi OCR config
  • added multi OCR config
  • added multi OCR config
  • new login strat
  • deploying
  • new login
  • new port
  • new port
  • new port
  • new port
  • new port
  • new port
  • new port
  • deploying with aws login
  • new port
  • new port
  • back to sq1
  • cool
  • changes to config
  • changes to config
  • changes to config
  • changes to config
  • changes to config
  • adding aws cli
  • deploying with aws textract support
  • adding curl
  • adding unzip
  • update
  • Update README.md
  • task deployed
  • installing into docker
  • debugging
  • added async context manager
  • updating login
  • Update README.md
  • removed print statements
  • Update README.md
  • deployed task
  • Update README.md
  • Update README.md
  • Update README.md
  • Update README.md
  • Added temp-banner for status updates to home page
  • Update README.md
  • deployed web
  • deployed web
  • removed out of date docker compose file
  • Fixed textract bboxes on frontend
  • Fixed OCR BBox stuff in python and for frontend
  • deployed task
  • rapid OCR slaps
  • pushing to l4
  • rapid ocr port change
  • added cuda support
  • added cuda support
  • gpu support for rapid
  • gpu support for rapid
  • opencv
  • adding unitable
  • Fixed import in data.py
  • Fixed imports trainer/utils.py
  • Trying diff image
  • Trying new image
  • Removing text recognition
  • It works
  • added lock
  • many workers
  • many workers
  • fixed ports on load balancer
  • fixed ports on load balancer
  • temp file cleanup
  • garbage collection
  • prints
  • rm comments and prints
  • adding unitable
  • Fixed import in data.py
  • Fixed imports trainer/utils.py
  • Trying diff image
  • Trying new image
  • Removing text recognition
  • It works
  • adding uv to unitable
  • update routes
  • updated uv deps
  • refactor
  • rm modular ports
  • gc collect in server
  • added rapid ocr to kube
  • updating rapidocr
  • updating rapidocr
  • added logging
  • rm modular ports
  • update readme
  • rapid ocr deployed successfully
  • deployed rapidocr
  • created server
  • deployed task
  • deployed task
  • reverted task
  • server created
  • added python-dotenv
  • throw error if no models specified
  • added logs
  • added logs
  • added logs
  • getting first file
  • downloading img as tempfile
  • Image from bytes
  • Image from bytes using bytesIO
  • Image from bytes using bytesIO
  • fixed syntax error
  • splitting routes
  • added html
  • added html
  • logging
  • logging
  • getting direct decode result
  • unable to get html
  • removed iPYTHON
  • test
  • better telemetry
  • more logs
  • more logs
  • using tempfile
  • loading model during inference
  • html fix
  • html fix
  • improving html output
  • improving html output
  • adding logger
  • adding logger
  • adding logger
  • error in response
  • debugging bbox
  • distributed queues
  • handing off
  • issues with file conversion
  • rb
  • updating pnpm lock:
  • rebasing
  • rebasing
  • textractor is poorly written
  • split into segmentation and ocr
  • rebasing
  • added paddle gpu
  • fixed logger bug
  • returning result
  • returning result
  • debugging
  • debugging
  • conversion errors
  • conversion errors
  • debugging
  • response type json
  • adding ororder
  • ocr testing script works - with preprocessing
  • ocr testing script works - with preprocessing
  • mapping ppocr to html
  • adding json output to server
  • done - outputs are mid
  • splitting
  • done with unitable
  • Preprocess worker done
  • segmentation worker updated
  • preprocessing worker done
  • added cropping
  • added chunking
  • segmentation done
  • working on ocr worker
  • preprocessing tested
  • segmentation done
  • rapid ocr output payloads created
  • ocr done with rapid ocr
  • rebasing
  • rebasing
  • debugging rapid ocr
  • ocr works
  • distributed postprocessing and segmentation
  • queue splitting done
  • connected table struct to rust
  • done
  • rename table ocr
  • bbox output
  • binarization
  • constructing table
  • preprocess
  • preprocess
  • merged
  • merged
  • confidence scores
  • colspan row span
  • table ocr implemented - some issues with response
  • resize new table output
  • resize new table output
  • scaling issues
  • updated services
  • Preprocessing for ocr
  • removed preprocessing of image
  • Update README.md
  • Update README.md
  • Update README.md
  • Updates
  • updated dockers
  • added table-ocr docker
  • bruh
  • removed model downloading from docker
  • created kube .yaml files
  • added pdfium binary to docker
  • added wget to dockerfile
  • building dockers
  • preprocess copy error
  • preprocess copy error
  • fixed temp dir err
  • preprocess copy error
  • added col span and row span to cell
  • semaphore on rapid OCR server
  • added semaphore:
  • updates
  • updates
  • updated deployments
  • binary not found
  • fix link
  • fix link
  • alright
  • docker updated
  • binary validation check
  • fixed validation error
  • new deployment
  • better logs
  • new deployment
  • added file package to docker
  • added file package to docker
  • download with given file
  • new deploy
  • updated docker image
  • new deploy
  • 64bit version of pdfium
  • adding grafana
  • new deploy
  • adding prometheus
  • not working out
  • added monitoring values.yaml
  • updated isDefaultDatasource
  • added service monitor
  • added rrq analytics
  • reverting rrq
  • added rrq api key
  • new deployments for rrq analytics
  • new deployments for rrq analytics
  • new deployments for rrq analytics
  • new deployments for rrq analytics
  • ingress update rrq analytics
  • deployment update rrq analytics
  • deployment update rrq analytics
  • new rrq
  • deployed
  • done
  • table preprocessing
  • ocr batching added
  • updated docker and new ocr
  • logs on table ocr
  • new deployments
  • new deployments
  • redistribute pods
  • upgrade rrq
  • gpu consumption
  • cleanup
  • cleaned up task service artifacts
  • preprocess
  • minor refactor and cleanup
  • new rapid ocr models
  • postprocessing done
  • new models rapid ocr
  • better models
  • new rapid ocr models
  • new models rapid ocr
  • better models
  • done
  • gpu accelerator type
  • cooked
  • new deploys
  • rapid ocr cooked
  • throughput
  • fixed engines
  • engines suck
  • rapid ocr 1 engine
  • rm csvs
  • new rapid ocr deployment
  • updated rapid ocr
  • new rapid ocr deployment
  • fixed serialization error
  • new rapid ocr deployment
  • new deployments
  • new analytics
  • markdown tables fixed
  • mkd down fix deployed
  • testing changes
  • Service down header removed
  • Update README.md
  • Update README.md
  • Added github star and fork count to header
  • Removed console.log
  • new deploy
  • new deploy
  • pre apply discounts
  • deployed
  • Changes to github redirect button
  • better logs
  • new deploy
  • new deploy: libreoffice fix
  • deployment: github icon
  • Small fix for header spacing
  • adding github repo info
  • removed print statements
  • frontend updated
  • deployed: github info with token
  • page limits added
  • fixed trigger
  • deploy: page limits & discount bug fixes
  • edits to usage
  • Changed hardcoded limit values for fast and high quality on dashboard
  • deploy
  • deploy
  • updated example secret
  • updated self deployment
  • updates
  • adding paddle
  • sends request
  • updates
  • bruh
  • table ocr v1 done
  • new table ocr
  • adding compose.yaml
  • cleaning up local dev
  • compose yaml updated:
  • issues with task expiration fixed
  • postgres errors
  • working on postgres migration errors
  • works
  • server working
  • updated env
  • updated env
  • auto downloading pdfium binary works
  • Update README.md
  • working on pdla errors
  • segmentation working locally
  • working on table ocr
  • testing options
  • adding llm ocr
  • working on LLM models
  • added structured extraction
  • uncooking merge errors
  • rm unused code
  • server - structured extract done
  • done struc extract
  • added kube for embeddings
  • docker and kube for struc extract
  • added structured extract to ocr off
  • docker and kube for struc extract
  • docker and kube for struc extract
  • docker and kube for struc extract
  • docker and kube for struc extract
  • docker and kube for struc extract
  • new dev deploy
  • added structured extract to ocr off
  • new kube for tei
  • new ports
  • done struc
  • Frontend fixes for new output object
  • cleaned up struc extract code
  • docker and kube for struc extract
  • updating pyscripts
  • updated pyscripts
  • model updated
  • heuristic for hallucinations
  • cleanup
  • cleanup
  • new deployment
  • merged properly
  • updated pyscripts
  • done with testing
  • added paddle service
  • added paddle service
  • done with paddle docker
  • docker compose updated
  • deploy: web
  • new prompts
  • quality fixes structured extract
  • new deployment
  • updated pyscripts
  • deploy: web
  • table recognition working
  • updated compose.yaml
  • ocr done
  • html to mkd v1
  • html to mkd v1
  • colspan works
  • mkd done
  • deploying new ocr
  • deploying changes
  • adding external s3 client to work with docker compose
  • adding external s3 client to work with docker compose
  • docker compose updated
  • updated envs
  • specific dockers for different ocrs
  • specific dockers for different ocrs
  • updated
  • updated
  • paddlex pipeline not working for table
  • paddle proxy working
  • added proxy to ocr services
  • fixed glibc version error
  • updated
  • updated kube with proxy
  • updated kube probe
  • adding error handling in html to mkd
  • deploy html fix
  • ocr worker html safety
  • fixing local dev
  • new prompts
  • quality fixes structured extract
  • compose with vllm added
  • updated
  • kube experimental added
  • readme updated
  • updated readme
  • updated readme
  • updated readme
  • updated readme
  • updated readme
  • Update README.md
  • added stuff to prompt for table ocr
  • updated readme
  • added .env.example
  • updated LLM key in example.env
  • added readme
  • docker compose
  • rm segmentation
  • structured extract local works
  • structured extract local works
  • rm test
  • new model
  • very fast structured extract
  • changes to configs
  • configs updated
  • added vllm support for tables
  • updated table ocr
  • updated prefix
  • updated compose
  • VLLM ocr works
  • VLLM ocr works
  • refactor
  • table ocr with validation bangs
  • works
  • including prompts in binary
  • formula prompt added
  • tested
  • added throttle limits
  • cleaning up openai response
  • cherrypicked
  • experimental kube updated
  • removed host rules
  • bruh
  • idk
  • merged new seg strat
  • merged table ocr with structured extraction
  • built dockers and updated kube for tag d8eba61: cmd,fast,ocr,high,gen-ocr,post,pre,struc,table-ocr
  • done
  • done
  • search config
  • new kube
  • filter out empty chunks
  • new kube
  • prompt and default value changes
  • prompt and default value changes
  • new kube
  • updated openai call
  • working on proxy
  • using redis for rate limits
  • compose changes
  • leaky bucket rate limiter added with redis
  • minor bug fixes
  • minor bug fixes
  • minor bug fixes
  • paddle ocr test added
  • rate limits implemented
  • bugs
  • deployment: rate limits
  • deployment: rate limits
  • kube
  • bug fix
  • table test
  • added doctr service
  • doctr inference server created
  • adding doctr as ocr option
  • doctr ocr done
  • create new dockers
  • docker not working
  • fixed doctr docker
  • deploy: doctr ocr
  • deploy: doctr ocr
  • new dockers
  • doctr update
  • doctr update
  • doctr update
  • doctr cant get nvidia drivers
  • building docker with gpu
  • new dockerimage for doctr
  • new env example
  • new secrets
  • removed rrq message
  • new sha
  • added redis url
  • rebasing
  • testing more tables
  • ocr integrated
  • tested locally
  • new kube for llm table ocr
  • updated readme
  • updated readme
  • updated readme
  • new kube for llm table ocr
  • new kube for llm table ocr
  • timeout issue
  • ocr html validation removed
  • new kube
  • table ocr now fails on ocr error
  • new docker build
  • rrq timeout increase
  • new dockers
  • rebuild dockers
  • Fixed formula rendering for html and markdown in viewer
  • Final fixes
  • presigned url headers
  • compose
  • new dockers and kube for pdf signed url
  • removed dompurify
  • deploying formula support on web
  • Update compose.yaml
  • page llm ocr
  • tweaks to page prompt
  • tweaks to page prompt
  • view window
  • tweaks to page prompt
  • kube for page llm
  • updated session timeout
  • docs(README): fix incorrect git URL
  • updated docker compose to use paddle ocr
  • added comments
  • page llm ocr
  • tweaks to page prompt
  • tweaks to page prompt
  • view window
  • tweaks to page prompt
  • kube for page llm
  • updated session timeout
  • docs(README): fix incorrect git URL
  • Update compose.yaml
  • updated compose.yaml
  • added jobs
  • segmentation strategy:
  • new redoc
  • checkpointing tokenizer broken
  • Sending workstation stuff
  • segmentation strategy:
  • new redoc
  • expiration job created
  • added comments for descriptions
  • small changes
  • checkpoint on segmentation forward pass
  • new dockers
  • added check to only delete finished tasks
  • reading order and most prob segments done
  • Fixed table colspan rowspan rendering
  • terraform for azure
  • renamed kube files to share across providers
  • deleted old gcp kube - deprecated
  • not required for internal communications
  • removed backend config as that is only needed for ingress
  • adding support for cloudflare tunnel
  • cloudflare dns for tunnel and dns records
  • cloudflare removed dns records
  • removed cloudflare terraform
  • adding helm support
  • added helm support for all the services with templates
  • set up secrets for helm and internal communications
  • updated shared secrets
  • added readmes
  • debugging
  • adding redis to kube
  • removed redis from terraform for azure
  • added gpu support for embeddings
  • added nginx setup instructions
  • adding cloudflare tunnel support
  • cloudflared tunnel works
  • updated envs
  • kubernetes on azure
  • clean up
  • updated container name
  • recovered gcp-experimental
  • web bug fixes
  • increase max nodes on azure
  • adding postgres support
  • renamed
  • added postgres support with init db
  • added local filesystem support for s3
  • fixed bugs
  • postgres permission issues fixed
  • added pvc to redis, postgres and s3
  • s3proxy works with azure blob storage
  • added time slicing for gpus
  • time slicing works
  • Fixed table colspan rowspan rendering
  • terraform for azure
  • renamed kube files to share across providers
  • deleted old gcp kube - deprecated
  • not required for internal communications
  • removed backend config as that is only needed for ingress
  • adding support for cloudflare tunnel
  • cloudflare dns for tunnel and dns records
  • cloudflare removed dns records
  • removed cloudflare terraform
  • adding helm support
  • added helm support for all the services with templates
  • set up secrets for helm and internal communications
  • updated shared secrets
  • added readmes
  • debugging
  • adding redis to kube
  • removed redis from terraform for azure
  • added gpu support for embeddings
  • added nginx setup instructions
  • adding cloudflare tunnel support
  • cloudflared tunnel works
  • updated envs
  • kubernetes on azure
  • clean up
  • updated container name
  • recovered gcp-experimental
  • web bug fixes
  • increase max nodes on azure
  • adding postgres support
  • renamed
  • added postgres support with init db
  • added local filesystem support for s3
  • fixed bugs
  • postgres permission issues fixed
  • added pvc to redis, postgres and s3
  • s3proxy works with azure blob storage
  • added time slicing for gpus
  • time slicing works
  • added backend s3 config for terraform
  • updated readme
  • updated backend config
  • autoscale bug fixed
  • updated deployment config
  • deployment changes
  • add configMap to overwrite the envs with envFrom
  • added s3 backend to gcp
  • minio successfully added
  • updated web
  • Fixed table colspan rowspan rendering
  • terraform for azure
  • renamed kube files to share across providers
  • deleted old gcp kube - deprecated
  • not required for internal communications
  • removed backend config as that is only needed for ingress
  • adding support for cloudflare tunnel
  • cloudflare dns for tunnel and dns records
  • cloudflare removed dns records
  • removed cloudflare terraform
  • adding helm support
  • added helm support for all the services with templates
  • set up secrets for helm and internal communications
  • updated shared secrets
  • added readmes
  • debugging
  • adding redis to kube
  • removed redis from terraform for azure
  • added gpu support for embeddings
  • added nginx setup instructions
  • adding cloudflare tunnel support
  • cloudflared tunnel works
  • updated envs
  • kubernetes on azure
  • clean up
  • updated container name
  • recovered gcp-experimental
  • web bug fixes
  • increase max nodes on azure
  • adding postgres support
  • renamed
  • added postgres support with init db
  • added local filesystem support for s3
  • fixed bugs
  • postgres permission issues fixed
  • added pvc to redis, postgres and s3
  • s3proxy works with azure blob storage
  • added time slicing for gpus
  • time slicing works
  • struc extract changes
  • basic VGT splitting
  • embeddings on a100 by default
  • updated time slicing
  • deploy: task expiration
  • new deployment
  • removed api key
  • refactor
  • refactor
  • server for vgt done
  • rm pdla
  • cleanups
  • finalized vgt
  • cooked
  • cooked
  • created server for reading order
  • reading order server works
  • outputs cooked
  • reading order working with pairs
  • added visualizer
  • memory management for VGT
  • experimenting with word grid
  • mem management and reading order
  • oom protection
  • oom protection
  • done reading order
  • rm pr branch
  • merged
  • refactoring new worker with pipeline config
  • new models created
  • refactoring done with new api schema
  • added first pipeline
  • added first pipeline
  • first 2 steps work
  • clean up
  • clean up
  • clean up
  • added mimetype to postgres
  • ready for page processing
  • ocr implemented
  • rename
  • deleted unused services
  • deleted unused docker
  • ocr auto and all done
  • kill server management
  • kill server management
  • making segmentation models
  • added merging and chunking
  • rename step
  • rename step
  • updated chunking
  • refactored in and out models for vgt server
  • refactored in and out models for vgt server
  • retries on ocr with backoff
  • working on segmentation
  • edit readme
  • fix models
  • refactor
  • refactored clients to 1 file
  • json.load array deserialization error
  • segmentation works
  • image processing added to upload form
  • made high res images optional
  • cropping works
  • cropping done
  • generating html in segment processing
  • working on segment generation
  • added macro for llm templates
  • fixed typo
  • fixed typo
  • fixed typo
  • segment processing done
  • init for package
  • added sci kit learn clustering
  • cors
  • cors
  • working on upload model
  • working on upload model
  • added sci kit learn clustering
  • added sci kit learn clustering
  • added sci kit learn clustering
  • added sci kit learn clustering
  • added sci kit learn clustering
  • added sci kit learn clustering
  • reading order
  • ocr and segmentation concurrently
  • cropping issues
  • added sci kit learn clustering v1
  • segment processing works
  • cors
  • refactored rate limits
  • Picture config added
  • rewrite done
  • added changelog
  • fixed typo
  • final reading order
  • final reading order
  • done
  • expose threshold vgt
  • updated readme
  • fixed dockerfiles
  • refactor
  • ok it works
  • increase task replicas
  • vgt
  • adding cpu and mac support
  • added cpu support
  • new vgt +reading order again
  • new vgt +reading order again
  • updated gcp deployment:
  • fixed cooked readme
  • gcp deployment successful
  • chunkr v1.1.0 services added
  • fixed docker issue
  • added cpu support
  • added cpu support
  • removed warning
  • updated segmentation service
  • connected improved segmentation with ocr
  • updated dockers
  • updated dockers
  • updated changelog
  • fix reading order edge case
  • added libreoffice
  • pdf plumber in tests
  • single col pdf fix
  • segmentation bug fix
  • updated readme
  • updated python client and added tests
  • updated python client and added tests
  • updated python client
  • added better docstrings and improved imports
  • removed unused imports
  • improved docstrings
  • removed ruff cache
  • removed test.tsx
  • added url and base64 support
  • refactored
  • deleted outdated code
  • working on configuration testing
  • ocr_strategy and expiration work
  • chunkr client works
  • json schema works
  • readme updated
  • publish new version of python client
  • removed uv.lock
  • renamed services
  • removed unused model
  • fixed random bugs
  • fixed ocr mapping bug
  • improved chunking
  • new client published
  • fix ol bug
  • deployed bug fixes
  • server
  • return batch in doctr as completed
  • Landing page improvements
  • Working on features page
  • Pricing container remaining on landing
  • Bottom features and pricing section progress
  • Bottom features and pricing section progress
  • Bottom features and pricing section progress
  • Starting dashboard overhaul
  • Working on new dashboard
  • Updates
  • Skeleton completed for landing
  • Changing features section
  • Image deletion
  • Remove tracked images
  • Fixing commit history
  • Fixing commit history
  • Working on feature section
  • Features section cards
  • Moving features sections
  • Pricing section sizing fixes
  • Updates to pricing section and home layout
  • Changed pricing cards
  • Footer rework
  • Completed landing page layout, styling and general skeleton. Cleanup, copy and finishing touches remaining
  • Added MomentumScroll for homepage
  • Dynamic additions
  • Adding table for dashboard
  • Dynamic table added
  • Viewer adjustments for overhaul
  • Working on LaTeX rendering html view
  • Completed viewer v1
  • Viewer with detached panel infinite scroll
  • v2 of viewer completed
  • Fixed ref and horizontal press to scroll
  • Toast functionality and segment specific json viewing and copying
  • Small fixes for spacing sizing in Viewer
  • Resizing handle bar style fixes
  • Optimized loading times for viewer * table
  • Optimized routing
  • URL params optimized
  • Working on Upload dialog
  • Reverted formula messups and adding new upload component
  • Small fixes
  • Rebased successfully with main (I think) - added some more nav
  • Small sizing fix for file name tag
  • Completed new upload component
  • Added logic for usage fetching and processing
  • Added taskbar viewing and better nav x
  • New api management component started
  • Completed viewer data piping
  • Checkpoint - viewer completed
  • Checkpoint - deleted older pages and components
  • Checkpoint: Fixes to upload component for different states on DLA toggle
  • Checkpoint: Task table almost completed (CRUD remaining)
  • Working on structured extraction viewer
  • Checkpoint: Extraction viewer complete
  • Task table completed with crud
  • Small fix for structured extraction viewing
  • Fixed timeout between scrolls on viewer
  • Commented out stuff from web for merging to main
  • Commented out stuff from web for merging to main
  • Fixes for svg build
  • Fixes for build errors
  • delete task done
  • cancelling task works
  • working on update task
  • added direct methods
  • ok update works
  • updated migrations
  • updating routes
  • fixed bug in cancellation
  • deployed with deletion bug fixed
  • new Task struct
  • updated task payload
  • reconstructs pipeline from artifacts on update
  • added failure cases to task completion
  • task processing
  • struc
  • new upload implementation
  • 1 billi prompts
  • 1 billi prompts
  • update works
  • update works
  • segment llm prompting works
  • removed dompurify
  • added file to server
  • deployed crud
  • patch chunking
  • Fix for tables in markdown view - switching to new branch
  • Fixes for markdown tables, task table expiry date
  • Small fixes
  • Fixes for markdown table, task table expiry date and added messages to task table for frontend
  • updated client
  • cooked
  • deployments
  • small change to vgt server
  • updated client
  • add timeout as optional
  • rebased main
  • rebase
  • max batch size for segmentation
  • deployed new segmentation
  • inference
  • ?
  • cooked
  • reverted to old ocr
  • new
  • moved inside async lock
  • updated time slicing
  • stable deploy
  • merged
  • merged conflicts fixed
  • cool
  • added reranker
  • new inference
  • cool
  • added reranker
  • rust batching instead of batch/async
  • moved entirely over to batching instead of dynamic batching
  • new COMPOSE
  • new helm
  • new helm
  • batch inference done
  • filter out empty strings
  • filter out empty strings
  • se done
  • updated embeddings
  • add taskresponseasync to python client exports
  • client updates to url
  • done
  • refactor task responses
  • merged
  • rebased
  • async client cooked
  • removed aiohttp import
  • updated
  • added docstring
  • deleted unused scripts
  • 1/2 reading order bug fixed
  • Simplify reading order algorithm by removing column boundary checks and using only width ratio to identify wide elements.
  • Created v1 node client
  • First version of node client completed with simple tests
  • First version of node client completed with simple tests
  • Small fix for imports - publishing
  • Completed first stable version of node client
  • Changed certain models to classes and added functionality tests
  • Updated package.json for devDependencies
  • Rebasing with main

@m-chadda m-chadda linked an issue Jan 20, 2025 that may be closed by this pull request
@m-chadda m-chadda merged commit 490bc14 into main Jan 20, 2025
0 of 2 checks passed
Development

Successfully merging this pull request may close these issues.

feature: create node client
1 participant