Did you know that English tokens represent more than 90% of the training data of generalist LLMs?
We're OpenLLM Europe 🇪🇺, an Open Source community committed to empowering LLM projects in all European languages, specifically medium and low-resource languages. We aim to build the first multimodal multilingual European model with partners all over the continent.
- OpenLLM-Europe 🇪🇺
- Discord: https://discord.com/invite/b5UQTWQn
- Contact:contact@openllm-europe.org - https://github.com/OpenLLM-Europe
Our work is 100% open and fits in with ALT-EDIC's mission, which you can discover here: https://language-data-space.ec.europa.eu/related-initiatives/alt-edic_en
- ALT = Alliance for Language Technologies
- EDIC = European Digital Infrastructure Consortium
The mission of the ALT-EDIC is to develop a common European infrastructure in Language Technologies, focussing particularly on Large Language Models. It seeks to improve European competitiveness, increase the availability of European language data and uphold Europe's linguistic diversity and cultural richness. The ALT-EDIC is a multi-country project, run and funded by the Member States who have agreed to join it. By pooling resources, the members should achieve the critical mass of data and other resources needed to create and finetune Large Language Models, which any single member would find difficult to do alone.
OpenLLM Europe 🇪🇺 contributes to this mission by identifying national initiatives that create LLMs or training datasets and attempting to federate them. Our goal is to federate, create together and promote open source and sovereign Generative AI digital commons.
Here is a list of Open Source projects in AI (mostly LLMs) that we have gathered during our research.
Feel free to use it to build great things together. Feel free to amend it and add projects that we missed. PRs are welcome! Feel free to join our Discord server.
- INSAIT - Contact:bggpt@insait.ai
- CroAI - https://www.linkedin.com/posts/croai_large-language-models-have-demonstrated-impressive-activity-7167796231417520128-AlDs/
- Czech BERT - Contact:sidoj@kiv.zcu.cz
- Danish foundation models - https://www.linkedin.com/in/saattrupdan/
- Danskgpt - Contact:vasu.sharma@recursionpharma.com
- Going Dutch - Contact:edwin@edwinrijgersberg.nl
- Stability AI - Multilingual - https://stability.ai/contact
- NOUS Research - Contact:karan@nousresearch.com
- TartuNLP - Discord: https://discord.gg/tartunlp - Contact:ping@tartunlp.ai
- PORO silogen - Contact (founder):peter.sarlin@silo.ai
- Le Bon LLM - https://www.linkedin.com/company/le-bon-llm/
- OpenLLM France - Contact:contact@openllm-france.fr - https://www.openllm-france.fr
- LAION - Discord: https://discord.com/invite/laion - Contact: contact@laion.ai
- OpenGPTX - Discord: https://discord.gg/ZmF2dJgJ - Contact: opengpt-x@ki-verband.de
- Fraunhofer IAIS - Contact:info@iais.fraunhofer.de
- GFOSS - Contact:info@eellak.gr
- Hilanco - Contact:tavaradi@gmail.com
- HUN-REN - Contact:linginst@nytud.hun-ren.hu
- gaBERT - Discord: https://discord.com/invite/b5UQTWQn - Contact:alan.cowap2@mail.dcu.ie
- Fauno Italian LLM - Contact:bacciu@diag.uniroma1.it
- NLP Odyssey - Discord: https://discord.gg/nlpodyssey - Contact:matteogrella@gmail.com
- LVBERT - Contact:hello@peteris.rocks
- EMBEDDIA - Contact:info@embeddia.eu
- Tilde AI-powered language technologies - Contact:https://www.linkedin.com/in/andrejs-vasiljevs/
- Tollef Jørgensen - Contact:tollef.jorgensen@ntnu.no
- Polbert - Contact:darek@wandb.com
- Sabia - Contact:ramon@maritaca.ai
- LLM for Romanian - Contact:contact@ilds.ro
- Beia, consult international - Contact:office@beia.ro
- Serbian LLM - Serbian 🇷🇸 - https://www.linkedin.com/in/aleksagordic/
- KInit - https://www.linkedin.com/in/juraj-bezdek-6b521346/
- Blip.solution - Contact:juraj.bezdek@blip.solutions
- SloBERTa - Contact:info@embeddia.eu
- Projecte Aina : Aguila Alpaca - Discord: https://discord.gg/projecte-aina - Contact:aina@bsc.es
- BSC – Barcelona Supercomputing Center - Contact:info@bsc.es
- Expert AI - Contact:jmgomez@expert.ai
- AI Sweden - Contact:johanna.bergman@ai.se
- Statisfied - Discord: https://discord.gg/statisfied - Contact:info@statisfied.io
- HPLT - Contact:andreku@ifi.uio.no
- Unbabel - Contact:https://communityonboarding.unbabel.com/signup/step/0
- Occiglot - Contact:brack.cs.tu-darmstadt.de
- TrustLLM - Contact:trine.platou@liu.se
- Luxembourg Institute of Science and Technology - Luxembourg 🇱🇺 - Contact:jordi.cabot@list.lu
- Sosnitskij - https://www.linkedin.com/in/said-azizov-6b5a82256/
- Evidently AI - Multilingual - Discord: https://discord.gg/evidentlyai - Contact:hello@evidentlyai.com
- YugoGPT - 🇷🇸🇭🇷🇧🇦🇲🇰🇽🇰 - Discord: https://discord.gg/yugogpt - https://www.linkedin.com/in/aleksagordic/
- LangFuse - US project using European languages 🇺🇸 - Contact:hi@langfuse.com
- Sayhan - Turkish 🇹🇷 - https://www.linkedin.com/in/sayhan-yalva%C3%A7er-0617641b1/
- Sestek - Turkish 🇹🇷 - Contact:sales@sestek.com
- AI Forever - Armenian 🇦🇲 - https://www.linkedin.com/in/said-azizov-6b5a82256/
- Yandex YaLM 100B - Russian and English 🇷🇺🇬🇧 - Contact:pr@yandex-team.ru
- EleutherAI - International collaboration using English - Discord: https://discord.com/invite/zBGx3azzUn - Contact:contact@eleuther.ai