From f84a90c91b1c27b32d751d9679686c87c7fd5d9b Mon Sep 17 00:00:00 2001
From: Sepideh Alassi <>
Date: Tue, 30 Jul 2024 16:44:30 +0200
Subject: [PATCH 1/2] poster session added

---
 content/pages/2024/poster_session.md | 131 ++++++++++++++++++++++++++-
 poster_parser/parser_posters.py      |  33 +++++++
 poster_parser/posters.md             | 127 ++++++++++++++++++++++++++
 3 files changed, 290 insertions(+), 1 deletion(-)
 create mode 100644 poster_parser/parser_posters.py
 create mode 100644 poster_parser/posters.md

diff --git a/content/pages/2024/poster_session.md b/content/pages/2024/poster_session.md
index 5cb4747..bace716 100644
--- a/content/pages/2024/poster_session.md
+++ b/content/pages/2024/poster_session.md
@@ -9,4 +9,133 @@ slug: poster_session_2024
 
 # Spotlight and Poster Session
 
-TBA
+On Wednesday 28th, a poster session will be held from 4:30 pm to 6:00 pm.
+It will be preceded by a spotlight session taking place at 3:30 pm in room 7.
+
+You can find the list of posters below.
+
+## Validation of Association - implementation in Python library
+
+*Mateusz*, *Data Science and Visualisation*
+
+**Abstract:** During our talk, we would like to present a Python library that implements the ideas proposed in the paper "Validation of Association" by Bogdan Ćmiel and Teresa Ledwina. The authors introduce a novel function-valued measure of dependence known as the quantile dependence function. This measure plays a pivotal role in constructing tests for independence and allows for easily interpretable diagnostic plots that highlight deviations from the null model. The quantile dependence function is specifically designed to identify general dependence structures between variables within different quantiles of their joint distribution. The authors develop new estimators for the dependence function and utilize them to devise innovative tests for independence.
+
+
+
+
+## Revolutionizing Enterprise Operations: Innovative Uses of AI Agents
+
+*Adwaith T A*, *Machine and Deep Learning*
+
+**Abstract:** Unlock the transformative power of AI agents in enterprise environments! This session explores how AI can revolutionize business operations by automating complex tasks, enhancing decision-making, and personalizing customer interactions. Through real-world case studies and practical insights, attendees will learn advanced techniques for developing and integrating AI agents to drive efficiency and innovation. Whether you're a technical expert or a business leader, gain the tools and knowledge to harness AI for significant operational improvements.
+
+
+
+
+## Visualizing and debugging tensors with aesthetic-tensor
+
+*Iliya Zhechev*, *Community, Education, and Outreach*
+
+**Abstract:** Researchers and ML Engineers working hands-on with tensors frequently have to inspect and visualize the contents of tensors in order to understand what they hold. Aesthetic-tensor is a Python library for PyTorch and NumPy that abstracts away tensor visualization behind an intuitive and easy-to-learn API. In this tutorial, we're going to explain the core concept of the API and walk through a few practical examples of how to use the library.
+
+
+
+
+## Using time series to detect anomalies in a wide area network environment. Is it possible to solve such a problem using Python?
+
+*Paweł Żal*, *Scientific Applications*
+
+**Abstract:** Time series problems may seem relevant only to a narrow group of data scientists, but in fact they can make the work of administrators, support people, researchers or developers themselves easier.
+
+In the era of ubiquitous IoT and the consequent intensive use of telemetry, time series make it possible to analyze the timing of events, correlate the dynamics of their occurrence, and ultimately facilitate the understanding of these phenomena.
+
+Using Python, time series can be successfully analyzed, and when the amount of data exceeds the amount of available RAM, Python can use time series databases.
+
+Drawing on my experience, I will present practical cases of using time series to detect anomalies in the operation of equipment, often preceding failures.
+
+
+
+
+## 🧪 `classy-bench`: a low-code library for quickly training and evaluating model baselines for Multi Label Classification applications
+
+*Edoardo Abati*, *Machine and Deep Learning*
+
+**Abstract:** `classy-bench` is a low-code Python library that simplifies the process of training and evaluating baseline models for real-world Multi-Label Classification applications. Simply provide your datasets, and quickly get a benchmark of multiple models tailored to your specific use case. This talk will introduce the library and demonstrate its ease of use through examples.
+
+
+
+
+## pycodehash: boost your pipeline by skipping all unchanged steps!
+
+*Simon Brugman, Ralph*, *Data Science and Visualisation*
+
+**Abstract:** Data pipelines are of paramount importance in data science, engineering and analysis. Often, there are parts of the pipeline that have not changed. Recomputing these nodes is wasteful, especially for larger datasets. PyCodeHash is a novel generic data and Python code hashing library that facilitates downstream caching.
+
+
+
+
+## Mastering Python Performance: Advanced Techniques for Efficiency
+
+*Adwaith T A*, *High Performance Computing*
+
+**Abstract:** Delve into the nuanced realm of Python optimization in this comprehensive session, where we uncover advanced strategies to maximize code performance. From profiling tools like cProfile and Memory Profiler to sophisticated techniques for data handling and parallel processing, we'll explore how to minimize resource consumption and boost execution speed. Geared towards both seasoned developers and newcomers, this talk promises to equip you with the skills to transform your Python projects into models of efficiency and high performance.
+
+
+
+
+## Gatherer: Insight Revelation from Diplomatic Archives
+
+*Egemen Bezci*, *Scientific Applications*
+
+**Abstract:** Gatherer is an open-source Python tool designed to enhance the research efficiency of political scientists and historians conducting extensive archival research on historical diplomatic records. This tool helps digitize physical records, converts unstructured data into structured tabular formats, generates summaries, extracts metadata, and identifies key named entities. These capabilities streamline the primary source collection and synthesis processes, significantly improving the research workflow.
+
+
+
+
+## SpatialData: a FAIR framework for multimodal spatial omics
+
+*Wouter-Michiel Vierdag, Luca Marconato*, *High Performance Computing*
+
+**Abstract:** Spatial omics data generation displaying DNA, RNA and protein within their spatial context has been increasing tremendously in the last couple of years. This has led to challenges for bioinformaticians tasked with analysing the data due to, among other things, data size and the plethora of different formats used by different researchers.
Hence the need for a highly performant, findable, accessible, interoperable and reusable (FAIR) representation of this bioimaging data.
+For this, we developed the SpatialData framework, a solution that combines an on-disk format, the SpatialData format, with a set of Python libraries for accessing and operating on spatial omics data, and tools for interactive data annotation and visualization. The SpatialData library seamlessly integrates with the existing Python ecosystem by building upon standard scientific Python data types, such as xarray, dask, geopandas and anndata. It thereby provides a flexible, community-standards-based, open framework to store, process, and annotate data from virtually any spatial omics technology available to date. With the simplified and interoperable data representation, the ability to easily create unified coordinate systems, and the numerous downstream analysis capabilities, it can facilitate the development, reproducibility and reuse of analysis pipelines, and ultimately unlock new approaches to addressing scientific questions.
+
+
+
+
+## Streamlining Strain-Stress Analysis with Pydidas for XRD experiments
+
+*Gudrun Lotze*, *Scientific Applications*
+
+**Abstract:** X-ray diffraction (XRD) reveals atomic structures in a variety of materials, from chocolate and biomaterials, like bone, to hard coatings for CNC machining tools. Synchrotron radiation facilities constantly strive to engage new scientific communities. However, attracting new user groups can be challenging, as many are not familiar with XRD. Pydidas bridges this gap by offering a comprehensive framework for XRD analysis, featuring data processing, analysis, and visualization tools. It efficiently handles complex HDF5 files and provides near real-time feedback.
+We introduce a new Pydidas workflow for X-ray diffraction-based strain-stress analysis, combining crystallographic data with mechanical properties. This integration aids in optimizing material design by understanding how the microstructure influences macroscopic behaviour, advancing materials development in mechanical and aerospace engineering.
+
+
+
+
+## A modular interface for visualization and pre-processing for multi-channel signals
+
+*Anais Monteils*, *Data Science and Visualisation*
+
+**Abstract:** Filtering and having an overview of data are essential steps before feature extraction or any analytical process. This interface can provide a robust foundation to support this processing stage. The key word: modularity.
+
+Built in the context of High Density Electromyography processing, this interface, developed with PyQt, PyQtGraph, Xarray/Datatree and Jinja, offers multi-channel visualizations in temporal and frequency domains. It enables users to generate reports with specific metrics, and to create and customize processing pipelines. Designed to be as modular as possible, the interface allows users to incorporate their own features that fit their datasets by following a standardized code architecture.
+
+
+
+
+## Project-Based Python Training Resources for AI Scientists, Engineers, and Developers
+
+*Anuradha Kar, PhD*, *Community, Education, and Outreach*
+
+**Abstract:** In this talk, I will discuss the significance of hands-on, project-based training resources for Python learners of all levels who aspire to work in the latest domains of machine learning, deep learning and generative AI algorithms and applications. I will present the steps of developing hands-on training modules for Python programming and AI enthusiasts.
These resources aim to help learners master key concepts while developing applied technical skills that are essential for undertaking the latest AI-based projects in industry and academia. In this talk I will discuss how Python programmers and AI developers at all knowledge levels may use and benefit from such hands-on training approaches, as well as why there is currently a huge need for project-based learning modules in the domains of Python programming, machine learning, generative AI and data science. As a creator of several such practical learning modules myself, I will walk through the steps involved in building these kinds of learning curricula and highlight the need for, and benefits of, such practical, hands-on training methodologies.
+
+
+
+
+## From Logs to Insights: An Exploration of Infrastructure Logging and Clustering
+
+*Arkadiusz Trawiński*, *Data Science and Visualisation*
+
+**Abstract:** The analysis of logging messages is a big challenge because of their massive number, different origins and unspecified formats. These challenges can be partially addressed with NLP techniques, ultimately making it possible to detect, predict, or even avoid incidents. We demonstrate a complete monitoring solution that includes clustering and uncovering warning-incident correlations with a Hawkes model. This model was previously successfully applied to earthquake prediction based on aftershocks. The Hawkes process model is well-defined mathematically and can process a large volume of data.
+
+
diff --git a/poster_parser/parser_posters.py b/poster_parser/parser_posters.py
new file mode 100644
index 0000000..c79c0da
--- /dev/null
+++ b/poster_parser/parser_posters.py
@@ -0,0 +1,33 @@
+# This script parses the proposals exported from Pretalx in CSV format and
+# generates a markdown output that can be pasted into `poster_session.md`.
+# This is really manual and not very elegant, but it works.
+
+# %%
+import pandas as pd
+
+proposals = pd.read_csv("euroscipy-2024_confirmed.csv")
+
+# %%
+proposals.head()
+
+# %%
+output_md = ""
+for _, proposal in proposals.iterrows():
+    session_type = proposal['Session type']
+    if session_type == 'Poster':
+        output_md += f"\n## {proposal['Proposal title']}\n\n"
+
+        track = proposal['Track']
+        # Missing speaker names come through as NaN (a float), not a string.
+        name = proposal["Speaker names"]
+        if not isinstance(name, str):
+            name = "To Be Defined"
+        output_md += f"*{name}*, *{track}*\n\n"
+        output_md += f"**Abstract:** {proposal['Abstract']}\n\n"
+        # Extra blank lines to separate consecutive posters.
+        output_md += "\n\n"
+
+with open('posters.md', 'w') as file:
+    file.write(output_md)
+
+# %%
diff --git a/poster_parser/posters.md b/poster_parser/posters.md
new file mode 100644
index 0000000..250dbbb
--- /dev/null
+++ b/poster_parser/posters.md
@@ -0,0 +1,127 @@
+
+## Validation of Association - implementation in Python library
+
+*Mateusz*, *Data Science and Visualisation*
+
+**Abstract:** During our talk, we would like to present a Python library that implements the ideas proposed in the paper "Validation of Association" by Bogdan Ćmiel and Teresa Ledwina. The authors introduce a novel function-valued measure of dependence known as the quantile dependence function. This measure plays a pivotal role in constructing tests for independence and allows for easily interpretable diagnostic plots that highlight deviations from the null model. The quantile dependence function is specifically designed to identify general dependence structures between variables within different quantiles of their joint distribution.
The authors develop new estimators for the dependence function and utilize them to devise innovative tests for independence.
+
+
+
+
+## Revolutionizing Enterprise Operations: Innovative Uses of AI Agents
+
+*Adwaith T A*, *Machine and Deep Learning*
+
+**Abstract:** Unlock the transformative power of AI agents in enterprise environments! This session explores how AI can revolutionize business operations by automating complex tasks, enhancing decision-making, and personalizing customer interactions. Through real-world case studies and practical insights, attendees will learn advanced techniques for developing and integrating AI agents to drive efficiency and innovation. Whether you're a technical expert or a business leader, gain the tools and knowledge to harness AI for significant operational improvements.
+
+
+
+
+## Visualizing and debugging tensors with aesthetic-tensor
+
+*Iliya Zhechev*, *Community, Education, and Outreach*
+
+**Abstract:** Researchers and ML Engineers working hands-on with tensors frequently have to inspect and visualize the contents of tensors in order to understand what they hold. Aesthetic-tensor is a Python library for PyTorch and NumPy that abstracts away tensor visualization behind an intuitive and easy-to-learn API. In this tutorial, we're going to explain the core concept of the API and walk through a few practical examples of how to use the library.
+
+
+
+
+## Using time series to detect anomalies in a wide area network environment. Is it possible to solve such a problem using Python?
+
+*Paweł Żal*, *Scientific Applications*
+
+**Abstract:** Time series problems may seem relevant only to a narrow group of data scientists, but in fact they can make the work of administrators, support people, researchers or developers themselves easier.
+
+In the era of ubiquitous IoT and the consequent intensive use of telemetry, time series make it possible to analyze the timing of events, correlate the dynamics of their occurrence, and ultimately facilitate the understanding of these phenomena.
+
+Using Python, time series can be successfully analyzed, and when the amount of data exceeds the amount of available RAM, Python can use time series databases.
+
+Drawing on my experience, I will present practical cases of using time series to detect anomalies in the operation of equipment, often preceding failures.
+
+
+
+
+## 🧪 `classy-bench`: a low-code library for quickly training and evaluating model baselines for Multi Label Classification applications
+
+*Edoardo Abati*, *Machine and Deep Learning*
+
+**Abstract:** `classy-bench` is a low-code Python library that simplifies the process of training and evaluating baseline models for real-world Multi-Label Classification applications. Simply provide your datasets, and quickly get a benchmark of multiple models tailored to your specific use case. This talk will introduce the library and demonstrate its ease of use through examples.
+
+
+
+
+## pycodehash: boost your pipeline by skipping all unchanged steps!
+
+*Simon Brugman, Ralph*, *Data Science and Visualisation*
+
+**Abstract:** Data pipelines are of paramount importance in data science, engineering and analysis. Often, there are parts of the pipeline that have not changed. Recomputing these nodes is wasteful, especially for larger datasets. PyCodeHash is a novel generic data and Python code hashing library that facilitates downstream caching.
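+
+To make the idea concrete, here is a minimal sketch of source-based caching in plain Python. It illustrates the concept behind such tools rather than the actual PyCodeHash API; the cache file name and the `cached_step` helper are made up for this example:
+
+    import hashlib
+    import inspect
+    import json
+    from pathlib import Path
+
+    CACHE = Path("step_cache.json")
+
+    def cached_step(func):
+        # Hash the step's source code: if the code changes, the key changes
+        # and the step is recomputed; otherwise the stored result is reused.
+        src = inspect.getsource(func)
+        key = f"{func.__name__}:{hashlib.sha256(src.encode()).hexdigest()}"
+
+        def wrapper():
+            store = json.loads(CACHE.read_text()) if CACHE.exists() else {}
+            if key not in store:
+                store[key] = func()  # result must be JSON-serializable here
+                CACHE.write_text(json.dumps(store))
+            return store[key]
+
+        return wrapper
+
+    @cached_step
+    def expensive_step():
+        return sum(i * i for i in range(10**6))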
+
+
+
+
+## Mastering Python Performance: Advanced Techniques for Efficiency
+
+*Adwaith T A*, *High Performance Computing*
+
+**Abstract:** Delve into the nuanced realm of Python optimization in this comprehensive session, where we uncover advanced strategies to maximize code performance. From profiling tools like cProfile and Memory Profiler to sophisticated techniques for data handling and parallel processing, we'll explore how to minimize resource consumption and boost execution speed. Geared towards both seasoned developers and newcomers, this talk promises to equip you with the skills to transform your Python projects into models of efficiency and high performance.
+
+
+
+
+## Gatherer: Insight Revelation from Diplomatic Archives
+
+*Egemen Bezci*, *Scientific Applications*
+
+**Abstract:** Gatherer is an open-source Python tool designed to enhance the research efficiency of political scientists and historians conducting extensive archival research on historical diplomatic records. This tool helps digitize physical records, converts unstructured data into structured tabular formats, generates summaries, extracts metadata, and identifies key named entities. These capabilities streamline the primary source collection and synthesis processes, significantly improving the research workflow.
+
+
+
+
+## SpatialData: a FAIR framework for multimodal spatial omics
+
+*Wouter-Michiel Vierdag, Luca Marconato*, *High Performance Computing*
+
+**Abstract:** Spatial omics data generation displaying DNA, RNA and protein within their spatial context has been increasing tremendously in the last couple of years. This has led to challenges for bioinformaticians tasked with analysing the data due to, among other things, data size and the plethora of different formats used by different researchers. Hence the need for a highly performant, findable, accessible, interoperable and reusable (FAIR) representation of this bioimaging data.
+For this, we developed the SpatialData framework, a solution that combines an on-disk format, the SpatialData format, with a set of Python libraries for accessing and operating on spatial omics data, and tools for interactive data annotation and visualization. The SpatialData library seamlessly integrates with the existing Python ecosystem by building upon standard scientific Python data types, such as xarray, dask, geopandas and anndata. It thereby provides a flexible, community-standards-based, open framework to store, process, and annotate data from virtually any spatial omics technology available to date. With the simplified and interoperable data representation, the ability to easily create unified coordinate systems, and the numerous downstream analysis capabilities, it can facilitate the development, reproducibility and reuse of analysis pipelines, and ultimately unlock new approaches to addressing scientific questions.
+
+
+
+
+## Streamlining Strain-Stress Analysis with Pydidas for XRD experiments
+
+*Gudrun Lotze*, *Scientific Applications*
+
+**Abstract:** X-ray diffraction (XRD) reveals atomic structures in a variety of materials, from chocolate and biomaterials, like bone, to hard coatings for CNC machining tools. Synchrotron radiation facilities constantly strive to engage new scientific communities. However, attracting new user groups can be challenging, as many are not familiar with XRD. Pydidas bridges this gap by offering a comprehensive framework for XRD analysis, featuring data processing, analysis, and visualization tools.
It efficiently handles complex HDF5 files and provides near real-time feedback.
+We introduce a new Pydidas workflow for X-ray diffraction-based strain-stress analysis, combining crystallographic data with mechanical properties. This integration aids in optimizing material design by understanding how the microstructure influences macroscopic behaviour, advancing materials development in mechanical and aerospace engineering.
+
+
+
+
+## A modular interface for visualization and pre-processing for multi-channel signals
+
+*Anais Monteils*, *Data Science and Visualisation*
+
+**Abstract:** Filtering and having an overview of data are essential steps before feature extraction or any analytical process. This interface can provide a robust foundation to support this processing stage. The key word: modularity.
+
+Built in the context of High Density Electromyography processing, this interface, developed with PyQt, PyQtGraph, Xarray/Datatree and Jinja, offers multi-channel visualizations in temporal and frequency domains. It enables users to generate reports with specific metrics, and to create and customize processing pipelines. Designed to be as modular as possible, the interface allows users to incorporate their own features that fit their datasets by following a standardized code architecture.
+
+
+
+
+## Project-Based Python Training Resources for AI Scientists, Engineers, and Developers
+
+*Anuradha Kar, PhD*, *Community, Education, and Outreach*
+
+**Abstract:** In this talk, I will discuss the significance of hands-on, project-based training resources for Python learners of all levels who aspire to work in the latest domains of machine learning, deep learning and generative AI algorithms and applications. I will present the steps of developing hands-on training modules for Python programming and AI enthusiasts. These resources aim to help learners master key concepts while developing applied technical skills that are essential for undertaking the latest AI-based projects in industry and academia. In this talk I will discuss how Python programmers and AI developers at all knowledge levels may use and benefit from such hands-on training approaches, as well as why there is currently a huge need for project-based learning modules in the domains of Python programming, machine learning, generative AI and data science. As a creator of several such practical learning modules myself, I will walk through the steps involved in building these kinds of learning curricula and highlight the need for, and benefits of, such practical, hands-on training methodologies.
+
+
+
+
+## From Logs to Insights: An Exploration of Infrastructure Logging and Clustering
+
+*Arkadiusz Trawiński*, *Data Science and Visualisation*
+
+**Abstract:** The analysis of logging messages is a big challenge because of their massive number, different origins and unspecified formats. These challenges can be partially addressed with NLP techniques, ultimately making it possible to detect, predict, or even avoid incidents. We demonstrate a complete monitoring solution that includes clustering and uncovering warning-incident correlations with a Hawkes model. This model was previously successfully applied to earthquake prediction based on aftershocks. The Hawkes process model is well-defined mathematically and can process a large volume of data.
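+
+For readers unfamiliar with the model: in a Hawkes process, every event temporarily raises the rate of further events. A minimal sketch of its conditional intensity with an exponential kernel, lambda(t) = mu + sum over past events t_i of alpha * exp(-beta * (t - t_i)), with illustrative parameter values (not those of the poster):
+
+    import numpy as np
+
+    def hawkes_intensity(t, events, mu=0.2, alpha=0.8, beta=1.5):
+        # Baseline rate mu plus an exponentially decaying bump per past event.
+        past = events[events < t]
+        return mu + alpha * np.exp(-beta * (t - past)).sum()
+
+    warnings = np.array([1.0, 1.1, 1.3, 4.0])
+    print(hawkes_intensity(1.5, warnings))  # elevated right after a burst
+    print(hawkes_intensity(3.9, warnings))  # decayed back toward the baseline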
+
+

From bafe691d2e13456b3ed49f7c9344bffcb65423d8 Mon Sep 17 00:00:00 2001
From: Sepideh Alassi <>
Date: Tue, 30 Jul 2024 16:50:20 +0200
Subject: [PATCH 2/2] update the talks and posters pages

---
 content/pages/2024/featured_talks.md       | 238 +++++++++++++++++-
 .../{confirmed.md => confirmed_talks.md}   | 237 ++++++++++++++++-
 poster_parser/euroscipy-2024_confirmed.csv | 216 ++++++++++++++++
 poster_parser/parser_speaker.py            |  26 +-
 4 files changed, 689 insertions(+), 28 deletions(-)
 rename poster_parser/{confirmed.md => confirmed_talks.md} (62%)
 create mode 100644 poster_parser/euroscipy-2024_confirmed.csv

diff --git a/content/pages/2024/featured_talks.md b/content/pages/2024/featured_talks.md
index 2058973..c9babab 100644
--- a/content/pages/2024/featured_talks.md
+++ b/content/pages/2024/featured_talks.md
@@ -8,6 +8,16 @@ slug: accepted_talks_2024
 ---
 # Featured Talks
+You can find the list of EuroSciPy 2024 talks below.
+Please see the list of EuroSciPy 2024 posters [here](https://euroscipy.org/2024/poster_session.html).
+
+## napari: multi-dimensional image visualization, annotation, and analysis in Python
+
+*Grzegorz Bokota, Wouter-Michiel Vierdag*, *Talk (25 mins + Q&A)*, *Data Science and Visualisation*
+
+**Abstract:** Napari is an interactive n-dimensional image viewer for Python. It is able to rapidly render and interactively visualize almost any array-like image data. Additionally, napari can overlay derived data, such as segmentations, points, polygons, surfaces and more. Each of these exists as a layer in the napari viewer, which allows fine control over how the data is displayed. Furthermore, derived data can be edited. Together with the capability of writing plugins, napari lets you seamlessly weave exploration, computation, and annotation in common and custom image analysis workflows.
+
+

 ## LPython: Novel, Fast, Retargetable Python Compiler

 *Naman Gera*, *Talk (25 mins + Q&A)*, *High Performance Computing*


+## fastplotlib: A high-level library for ultra fast visualization of large datasets using modern graphics APIs
+
+*Kushal Kolar, Caitlin Lewis*, *Talk (25 mins + Q&A)*, *Data Science and Visualisation*
+
+**Abstract:** Fast interactive visualization remains a considerable barrier in analysis pipelines for large neuronal datasets. Here, we present fastplotlib, a scientific plotting library featuring an expressive API for very fast visualization of scientific data. Fastplotlib is built upon pygfx which utilizes the GPU via WGPU, allowing it to interface with modern graphics APIs such as Vulkan for fast rendering of objects. Fastplotlib is non-blocking, allowing for interactivity with data after plot generation. Ultimately, fastplotlib is a general purpose scientific plotting library that is useful for the fast and live visualization and analysis of complex datasets.
+
+
+
+
+## Combining Python and Rust to create Polars Plugins
+
+*Marco Gorelli*, *Tutorial*, *Data Science and Visualisation*
+
+**Abstract:** Polars is a dataframe library taking the world by storm. It is very runtime and memory efficient and comes with a clean and expressive API. Sometimes, however, the built-in API isn't enough. And that's where its killer feature comes in: plugins. You can extend Polars, and solve practically any problem.
+
+No prior Rust experience is required; intermediate Python or general programming experience is assumed. By the end of the session, you will know how to write your own Polars Plugin! This talk is aimed at data practitioners.
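+
+As a taste of what a plugin replaces: custom logic can first be prototyped in pure Python with `map_batches`, which runs a Python function over a Series; a plugin then moves that function into compiled Rust and registers it as a native expression. A small sketch (the data and function are illustrative):
+
+    import polars as pl
+
+    df = pl.DataFrame({"x": [1.0, 4.0, 9.0]})
+
+    # Pure-Python stand-in for what a compiled plugin expression would do.
+    out = df.with_columns(
+        pl.col("x").map_batches(lambda s: s.sqrt()).alias("sqrt_x")
+    )
+    print(out)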
+ + + + ## Simulated data is all you need: Bayesian parameter inference for scientific simulators with SBI *Jan Boelts (Teusen)*, *Talk (25 mins + Q&A)*, *Scientific Applications* @@ -28,6 +58,17 @@ slug: accepted_talks_2024 +## Accelerating Python on HPC with Dask + +*Jacob Tomlinson*, *Talk (25 mins + Q&A)*, *High Performance Computing* + +**Abstract:** Dask is a popular Python framework for scaling your workloads, whether you want to leverage all of the cores on your laptop and stream large datasets through memory, or scale your workload out to thousands of cores on large compute clusters. Dask allows you to distribute code using familiar APIs such as pandas, NumPy and scikit-learn or write your own distributed code with powerful parallel task-based programming primitives. + +In this session we will dive into the many ways to deploy Dask workloads on HPC, and how to choose the right method for your workload. Then we will dig into the accelerated side of Dask and how you can leverage GPUs with RAPIDS and Dask CUDA and use UCX to take advantage of accelerated networking like Infiniband and NVLink. + + + + ## Using the Array API to write code that runs with Numpy, Cupy and PyTorch *Tim Head, Sebastian Berg*, *Tutorial*, *High Performance Computing* @@ -37,6 +78,15 @@ slug: accepted_talks_2024 +## Dispatching and Backend Selection Discussion + +*Sebastian Berg*, *Maintainer track*, *High Performance Computing* + +**Abstract:** In this session, we would like to briefly review the successful [`NetworkX` backend selection](https://networkx.org/documentation/stable/reference/backends.html) and work towards a possible future dispatching project under the Scientific Python umbrella, [`spatch`](https://github.com/scientific-python/spatch). + + + + ## From data analysis in Jupyter Notebooks to production applications: AI infrastructure at reasonable scale *Frank Sauerburger*, *Talk (15 mins + Q&A)*, *Machine and Deep Learning* @@ -46,6 +96,15 @@ slug: accepted_talks_2024 +## Introduction to matplotlib for Data Visualization with Python + +*Nefta Kanilmaz*, *Tutorial*, *Data Science and Visualisation* + +**Abstract:** _matplotlib_ is a library for creating visualizations with Python which "...makes easy things easy and hard things possible" (https://matplotlib.org/). This tutorial, intended for beginners, will introduce the library and explain core concepts as well as the main interfaces. Starting with styling simple point data plots, we will explain how to work with several dimensions, shared axes and advanced styling options using _rcParams_. After completing this tutorial, participants will hopefully be equipped with a thorough understanding of _matplotlib_ to navigate the "hard things" in the world of data visualization. + + + + ## Multi-dimensional arrays with Scipp *Mridul Seth*, *Tutorial*, *High Performance Computing* @@ -61,6 +120,17 @@ Scipp is available via pip and conda and runs on Linux, Mac and Windows. +## Just contribute?! + +*Wolf Vollprecht*, *Keynote*, *Community, Education, and Outreach* + +**Abstract:** Open source software is here for everyone - but how are we making sure that everyone has equal access? +In this keynote I will discuss how to lower barriers of entry for new contributors - and the many facets to this: documentation, community, guidelines, and tools. +I will share my personal motivations for contributing to open-source software and my journey over the past five years and all of its learnings. 
+
+
+
+
+## Decorators - A Deep Dive
+
+*Mike Müller*, *Tutorial*, *Scientific Applications*

functions and return new functions.



+## Data augmentation with Scikit-LLM
+
+*Claudio G. Giancaterino*, *Talk (15 mins + Q&A)*, *Machine and Deep Learning*
+
+**Abstract:** Scikit-LLM is an innovative Python library that seamlessly integrates Large Language Models into the Scikit-Learn framework. Scikit-LLM becomes a powerful tool for natural language processing (NLP) tasks within the Scikit-Learn pipeline, and I'll showcase a data augmentation action to build features using zero-shot text classification and text vectorization.
+
+
+
+
+## NumPy's new DType API and 2.0 transition
+
+*Sebastian Berg*, *Maintainer track*, *Scientific Applications*
+
+**Abstract:** NumPy 2 had some significant changes in its API and required many downstream libraries and users to adapt.
+One of the larger new features is that the new DType API is now public. This C-API allows more powerful user defined DTypes, for which the new StringDType is an example. In the first part, I will give a brief overview of this API.
+
+Since many downstream projects needed to adapt and publish new versions, in the second part I recap the current and past difficulties in transitioning to NumPy 2. This part of the session will be a forum for open discussion to gauge the challenges faced by users in making this transition.
+
+
+
+
+## From stringly typed to strongly typed: Insights from re-designing a library to get the most out of type hints
+
+*Janos Gabler*, *Talk (25 mins + Q&A)*, *Community, Education, and Outreach*

How do Python's new features such as sub-interpreters and free-threading without



+## A Qdrant and Specter2 framework for tracking resubmissions of rejected manuscripts in academia
+
+*Daniele Raimondi*, *Talk (15 mins + Q&A)*, *Data Science and Visualisation*
+
+**Abstract:** This presentation introduces a Qdrant vector DB and Specter2 model used to identify whether a rejected academic manuscript is later published in a competing journal. Our method combines AI, data science and analytics to ensure a good identification of manuscripts and authors. The findings offer insights into resubmission patterns, enhancing our understanding of academic publishing dynamics. The system is implemented in Python.
+
+
+
+
+## Reproducible workflows with AiiDA - The power and challenges of full data provenance
+
+*Marnik Bercx, Xing Wang*, *Talk (25 mins + Q&A)*, *Scientific Applications*



+## Architecting Science Tools: A Roadmap for Turning Theory and Data Projects into Python Packages
+
+*Ramon Perez*, *Tutorial*, *Data Science and Visualisation*
+
+**Abstract:** This workshop aims to address the gap between the development of technical work--whether research or data science--and its reproducibility by providing participants with the necessary knowledge to get started creating Python packages. This means that, if you're a researcher (with basic Python knowledge) wanting to make your theories more accessible, or a data professional wanting to share your Python code inside or outside of your organization, this workshop will help you understand how to contribute to, and develop, open-source projects from scratch.
+
+
+
+
+## 10 Years of Open Source: Navigating the Next AI Revolution
+
+*Ines Montani*, *Keynote*, *Community, Education, and Outreach*
+
+**Abstract:** A lot has been happening in the field of AI and Natural Language Processing: there's endless excitement about new technologies, sobering post-hype hangovers and also uncertainty about where the field is heading next. In this talk, I'll share the most important lessons we've learned in 10 years of working on open source software, our core philosophies that helped us adapt to an ever-changing AI landscape and why open source and interoperability still win over black-box, proprietary APIs.
+
+
+

 ## A Comparative Study of Open Source Computer Vision Models for Application on Small Data: The Case of CFRP Tape Laying

-*Thomas Fraunholz*, *Talk (25 mins + Q&A)*, *Machine and Deep Learning*
+*Thomas Fraunholz, Tim Köhler*, *Talk (25 mins + Q&A)*, *Machine and Deep Learning*

 **Abstract:** The world of open source computer vision has never been so exciting - and so challenging. With so many options available to you, what's the best way to solve your real world problem? The questions are always the same: Do I have enough data? Which model should I choose? How can I fine-tune and optimize the hyperparameters?

 If you want to level up your MLOps game and gain practical knowledge of the latest open source computer vision models, this talk is for you!



 ## Building optimized packages for conda-forge and PyPI

-*Wolf Vollprecht*, *Talk (15 mins + Q&A)*, *High Performance Computing*
+*Wolf Vollprecht, Bas Zalmstra*, *Talk (15 mins + Q&A)*, *High Performance Computing*

 **Abstract:** In this talk we're introducing a new tool to build conda packages. It has been adopted by the conda community and is being rolled out in the widely used conda-forge distribution. The new recipe format has been vetted in multiple Conda Enhancement Proposals (CEPs). We are going to introduce the exciting new features of rattler-build (reproducible builds, high speed build execution, etc.). Using some examples, we will then discuss how you can use rattler-build & conda-forge to build highly optimized packages with SIMD and CUDA support. We will also take a look at `cibuildwheel` and recent improvements in the PyPI space for CUDA.


+## Improving the build config of your package
+
+*Ralf Gommers*, *Maintainer track*, *Scientific Applications*
+
+**Abstract:** Goals:
+
+- Share tips, tricks and best practices for configuring the build backend of a Python package with compiled (Cython/C/C++/Rust/Fortran) code
+- Identify shared needs between packages, and discuss gaps in current build backends, documentation, or shared infrastructure
+
+Topics:
+
+- Goals to aim for in your build config (and how to achieve them):
+  - Faster builds and relevant tooling like profiling,
+  - Build logs that actually help when diagnosing issues,
+  - How to debug build failures effectively,
+  - How to check for and visualize build dependencies,
+  - Ensuring builds are reproducible,
+  - Approaches to reducing binary size,
+  - CI config ideas to guard against regressions
+- Recent build-related developments & a post-distutils world
+- What are the most pressing pain points for maintainers?
+
+
+
+
+## Skrub: prepping tables for machine learning
+
+*Guillaume Lemaitre*, *Talk (25 mins + Q&A)*, *Machine and Deep Learning*

Finally, we discuss an ambitious roadmap for improvements, new features, and pla



-## Paths in Parallel: Creating Custom NetworkX Backends
+## Understanding NetworkX's API Dispatching with a parallel backend

 *Erik Welch, Aditi Juneja*, *Talk (25 mins + Q&A)*, *Community, Education, and Outreach*

+**Abstract:** Hi! Have you ever wished your pure Python libraries were faster? Or wanted to fundamentally improve a Python library by rewriting everything in a faster language like C or Rust? Well, wish no more... NetworkX's backend dispatching mechanism redirects your plain old NetworkX function calls to a FASTER implementation present in a separate backend package by leveraging Python's [`entry_points`](https://packaging.python.org/en/latest/specifications/entry-points) specification!
+
+NetworkX is a popular, pure Python library used for graph (aka network) analysis. But when the graph size increases (like a network of everyone in the world), then NetworkX algorithms could take days to solve a simple graph analysis problem. So, to address these performance issues this backend dispatching mechanism was recently developed. In this talk, we will unveil this dispatching mechanism and its implementation details, and how we can use it just by specifying a `backend` kwarg like this:
+
+    >>> nx.betweenness_centrality(G, backend="parallel")
+
+or by passing the backend graph object (type-based dispatching):
+
+    >>> H = nxp.ParallelGraph(G)
+    >>> nx.betweenness_centrality(H)
+
+We'll also go over the limitations of this dispatch mechanism. Then we'll use the example of nx-parallel as a guide to building our own custom NetworkX backend. And then, using NetworkX's existing test suite, we'll test this backend that we build. Ending with a quick dive into the details of the nx-parallel backend.
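+
+The discovery half of the mechanism can be sketched with the standard library alone. Assuming `networkx.backends` is the entry-point group (as documented in the NetworkX backend guide), installed backends can be listed like this:
+
+    from importlib.metadata import entry_points
+
+    # Each installed backend package advertises an entry point in this group;
+    # loading it returns the object that re-implements NetworkX algorithms.
+    for ep in entry_points(group="networkx.backends"):
+        print(ep.name, ep.load())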
+
+
+## OpenGL is dying, let's talk about WebGPU
+
+*Almar Klein*, *Maintainer track*, *Data Science and Visualisation*
+
+**Abstract:** OpenGL is old and on a path to being deprecated. Modern GPU APIs like Vulkan and Metal solve most problems that plague OpenGL, and higher abstractions like wgpu / WebGPU provide a modern interface to control GPU hardware. These are much more pleasant to work with, and also provide performance benefits, especially for Python.
+
+
+
+
+## Enhancing Bayesian Optimization with Ensemble Models for Categorical Domains
+
+*Ilya Komarov*, *Talk (15 mins + Q&A)*, *Data Science and Visualisation*



+## Scientific Python
+
+*Jarrod Millman, Stéfan van der Walt*, *Maintainer track*, *Community, Education, and Outreach*
+
+**Abstract:** Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec odio erat, posuere non sodales ac, iaculis quis augue. Morbi et justo tellus. Donec et est diam. Sed nec libero ac leo vehicula ornare. Proin imperdiet risus at aliquam iaculis. Nam maximus mi vitae turpis aliquet, eget mollis leo imperdiet. Nunc nec sapien ornare, cursus urna in, dignissim ante. Ut nec sapien in nunc elementum pretium. Donec varius ante quis ipsum imperdiet volutpat. Praesent tristique turpis mi, id mattis augue feugiat nec. Phasellus nec feugiat nunc. Suspendisse at felis a lorem congue sodales. Nam consectetur mi in posuere ultricies. Vestibulum nisi lacus, suscipit ac nisl id, mattis faucibus libero.
+
+
+

 ## Regularizing Python using Structured Control Flow

 *Valentin Haenel*, *Talk (25 mins + Q&A)*, *High Performance Computing*

You will learn how to manipulate NumPy arrays, understand how they store data an



+## A Hitchhiker's Guide to Contributing to Open Source
+
+*Sebastian Berg, Nikoleta E. Glynatsi*, *Tutorial*, *Community, Education, and Outreach*
+
+**Abstract:** Open-source projects are essential for scientific programming. They provide many tools and resources that can be customized for different scientific needs. However, sometimes the existing tools in a package don't meet all the requirements of a project. This is when contributing to open-source packages becomes important. By contributing, you can implement new functionalities, improve the software and help keep the open-source community strong.
+
+This workshop will make contributing to open-source projects easier to understand. It will guide participants from just using the software to actively contributing to it. The workshop will address technical challenges such as interacting with web-based hosting services (like GitHub and GitLab), branching, and opening pull requests. Additionally, it will cover how to contribute documentation and ensure the correctness of the code.
+
+
+
+
+## Mostly Harmless Fixed Effects Regression in Python with PyFixest
+
+*Alexander Fischer*, *Talk (25 mins + Q&A)*, *Data Science and Visualisation*
+
+**Abstract:** This session introduces PyFixest, an open source Python library inspired by the "fixest" R package. PyFixest implements fast routines for the estimation of regression models with high-dimensional fixed effects, including OLS, IV, and Poisson regression. The library also provides tools for robust inference, including heteroscedasticity-robust and cluster-robust standard errors, as well as the wild cluster bootstrap and randomization inference.
Additionally, PyFixest implements several routines for difference-in-differences estimation with staggered treatment adoption.
+
+PyFixest aims to faithfully replicate the core design principles of "fixest", offering post-estimation inference adjustments, user-friendly syntax for multiple estimations, and efficient post-processing capabilities. By making efficient use of jit-compilation, it is also one of the fastest solutions for regressions with high-dimensional fixed effects.
+
+The presentation will argue why there is a need for another regression package in Python, cover PyFixest's functionality and design philosophy, and discuss future development prospects.
+
+
+

 ## Probabilistic classification and cost-sensitive learning with scikit-learn

 *Guillaume Lemaitre, Olivier Grisel*, *Tutorial*, *Machine and Deep Learning*

More specifically, we will introduce the concepts of calibrated probabilistic cl



+## Helmholtz Blablador and the LLM models' ecosystem
+
+*Alexandre Strube*, *Talk (25 mins + Q&A)*, *Machine and Deep Learning*
+
+**Abstract:** Helmholtz Blablador is the LLM inference server from the Helmholtz Foundation. This talk explores Blablador's role in hosting open-source LLMs and models developed in-house at the Juelich Supercomputing Centre (JSC), and its place in the open-source LLM ecosystem.
+
+
+

 ## forecasting foundation models: evaluation and integration with sktime – challenges and outcomes

 *Franz Kiraly*, *Talk (25 mins + Q&A)*, *Machine and Deep Learning*

Our key challenges and findings are presented in this talk – for instance, the



+## Introduction to Python
+
+*Mojdeh Rastgoo*, *Tutorial*, *Community, Education, and Outreach*
+
+**Abstract:** This tutorial will provide an introduction to Python intended for beginners.
+
+It will notably introduce the following aspects:
+
+- built-in types
+- control flow (i.e. conditions, loops, etc.)
+- built-in functions
+- basic Python classes
+
+
+
+
+## Image analysis in Python with scikit-image
+
+*Lars Grüter, Marianne Corvellec, Stéfan van der Walt*, *Tutorial*, *Data Science and Visualisation*
+
+**Abstract:** Scientists are producing more and more images with telescopes, microscopes, MRI scanners, etc. They need automatable tools to measure what they've imaged and help them turn these images into knowledge. This tutorial covers the fundamentals of algorithmic image analysis, starting with how to think of images as NumPy arrays, moving on to basic image filtering, and finishing with a complete workflow: segmenting a 3D image into regions and making measurements on those regions.
+
+
diff --git a/poster_parser/confirmed.md b/poster_parser/confirmed_talks.md
similarity index 62%
rename from poster_parser/confirmed.md
rename to poster_parser/confirmed_talks.md
index 887f145..e613743 100644
--- a/poster_parser/confirmed.md
+++ b/poster_parser/confirmed_talks.md
@@ -1,4 +1,13 @@
+## napari: multi-dimensional image visualization, annotation, and analysis in Python
+
+*Grzegorz Bokota, Wouter-Michiel Vierdag*, *Talk (25 mins + Q&A)*, *Data Science and Visualisation*
+
+**Abstract:** Napari is an interactive n-dimensional image viewer for Python. It is able to rapidly render and interactively visualize almost any array-like image data. Additionally, napari can overlay derived data, such as segmentations, points, polygons, surfaces and more. Each of these exists as a layer in the napari viewer, which allows fine control over how the data is displayed.
Furthermore, derived data can be edited. Together with the capability of writing plugins, napari lets you seamlessly weave exploration, computation, and annotation in common and custom image analysis workflows.
+
+

 ## LPython: Novel, Fast, Retargetable Python Compiler

 *Naman Gera*, *Talk (25 mins + Q&A)*, *High Performance Computing*


+## fastplotlib: A high-level library for ultra fast visualization of large datasets using modern graphics APIs
+
+*Kushal Kolar, Caitlin Lewis*, *Talk (25 mins + Q&A)*, *Data Science and Visualisation*
+
+**Abstract:** Fast interactive visualization remains a considerable barrier in analysis pipelines for large neuronal datasets. Here, we present fastplotlib, a scientific plotting library featuring an expressive API for very fast visualization of scientific data. Fastplotlib is built upon pygfx which utilizes the GPU via WGPU, allowing it to interface with modern graphics APIs such as Vulkan for fast rendering of objects. Fastplotlib is non-blocking, allowing for interactivity with data after plot generation. Ultimately, fastplotlib is a general purpose scientific plotting library that is useful for the fast and live visualization and analysis of complex datasets.
+
+
+
+
+## Combining Python and Rust to create Polars Plugins
+
+*Marco Gorelli*, *Tutorial*, *Data Science and Visualisation*
+
+**Abstract:** Polars is a dataframe library taking the world by storm. It is very runtime and memory efficient and comes with a clean and expressive API. Sometimes, however, the built-in API isn't enough. And that's where its killer feature comes in: plugins. You can extend Polars, and solve practically any problem.
+
+No prior Rust experience is required; intermediate Python or general programming experience is assumed. By the end of the session, you will know how to write your own Polars Plugin! This talk is aimed at data practitioners.
+
+
+

 ## Simulated data is all you need: Bayesian parameter inference for scientific simulators with SBI

 *Jan Boelts (Teusen)*, *Talk (25 mins + Q&A)*, *Scientific Applications*


+## Accelerating Python on HPC with Dask
+
+*Jacob Tomlinson*, *Talk (25 mins + Q&A)*, *High Performance Computing*
+
+**Abstract:** Dask is a popular Python framework for scaling your workloads, whether you want to leverage all of the cores on your laptop and stream large datasets through memory, or scale your workload out to thousands of cores on large compute clusters. Dask allows you to distribute code using familiar APIs such as pandas, NumPy and scikit-learn or write your own distributed code with powerful parallel task-based programming primitives.
+
+In this session we will dive into the many ways to deploy Dask workloads on HPC, and how to choose the right method for your workload. Then we will dig into the accelerated side of Dask and how you can leverage GPUs with RAPIDS and Dask CUDA and use UCX to take advantage of accelerated networking like Infiniband and NVLink.
+
+
+

 ## Using the Array API to write code that runs with Numpy, Cupy and PyTorch

 *Tim Head, Sebastian Berg*, *Tutorial*, *High Performance Computing*


+## Dispatching and Backend Selection Discussion
+
+*Sebastian Berg*, *Maintainer track*, *High Performance Computing*
+
+**Abstract:** In this session, we would like to briefly review the successful [`NetworkX` backend selection](https://networkx.org/documentation/stable/reference/backends.html) and work towards a possible future dispatching project under the Scientific Python umbrella, [`spatch`](https://github.com/scientific-python/spatch).
+
+
+

 ## From data analysis in Jupyter Notebooks to production applications: AI infrastructure at reasonable scale

 *Frank Sauerburger*, *Talk (15 mins + Q&A)*, *Machine and Deep Learning*


+## Introduction to matplotlib for Data Visualization with Python
+
+*Nefta Kanilmaz*, *Tutorial*, *Data Science and Visualisation*
+
+**Abstract:** _matplotlib_ is a library for creating visualizations with Python which "...makes easy things easy and hard things possible" (https://matplotlib.org/). This tutorial, intended for beginners, will introduce the library and explain core concepts as well as the main interfaces. Starting with styling simple point data plots, we will explain how to work with several dimensions, shared axes and advanced styling options using _rcParams_. After completing this tutorial, participants will hopefully be equipped with a thorough understanding of _matplotlib_ to navigate the "hard things" in the world of data visualization.
+
+
+

 ## Multi-dimensional arrays with Scipp

 *Mridul Seth*, *Tutorial*, *High Performance Computing*

Scipp is available via pip and conda and runs on Linux, Mac and Windows.



+## Just contribute?!
+
+*Wolf Vollprecht*, *Keynote*, *Community, Education, and Outreach*
+
+**Abstract:** Open source software is here for everyone - but how are we making sure that everyone has equal access?
+In this keynote I will discuss how to lower barriers of entry for new contributors - and the many facets to this: documentation, community, guidelines, and tools.
+I will share my personal motivations for contributing to open-source software and my journey over the past five years and all of its learnings.
+
+
+

 ## Decorators - A Deep Dive

 *Mike Müller*, *Tutorial*, *Scientific Applications*

functions and return new functions.



+## Data augmentation with Scikit-LLM
+
+*Claudio G. Giancaterino*, *Talk (15 mins + Q&A)*, *Machine and Deep Learning*
+
+**Abstract:** Scikit-LLM is an innovative Python library that seamlessly integrates Large Language Models into the Scikit-Learn framework. Scikit-LLM becomes a powerful tool for natural language processing (NLP) tasks within the Scikit-Learn pipeline, and I'll showcase a data augmentation action to build features using zero-shot text classification and text vectorization.
+
+
+
+
+## NumPy's new DType API and 2.0 transition
+
+*Sebastian Berg*, *Maintainer track*, *Scientific Applications*
+
+**Abstract:** NumPy 2 had some significant changes in its API and required many downstream libraries and users to adapt.
+One of the larger new features is that the new DType API is now public. This C-API allows more powerful user defined DTypes, for which the new StringDType is an example. In the first part, I will give a brief overview of this API.
+
+Since many downstream projects needed to adapt and publish new versions, in the second part I recap the current and past difficulties in transitioning to NumPy 2. This part of the session will be a forum for open discussion to gauge the challenges faced by users in making this transition.
+
+
+

 ## From stringly typed to strongly typed: Insights from re-designing a library to get the most out of type hints

 *Janos Gabler*, *Talk (25 mins + Q&A)*, *Community, Education, and Outreach*

How do Python's new features such as sub-interpreters and free-threading without



+## A Qdrant and Specter2 framework for tracking resubmissions of rejected manuscripts in academia
+
+*Daniele Raimondi*, *Talk (15 mins + Q&A)*, *Data Science and Visualisation*
+
+**Abstract:** This presentation introduces a Qdrant vector DB and Specter2 model used to identify whether a rejected academic manuscript is later published in a competing journal. Our method combines AI, data science and analytics to ensure a good identification of manuscripts and authors. The findings offer insights into resubmission patterns, enhancing our understanding of academic publishing dynamics. The system is implemented in Python.
+
+
+

 ## Reproducible workflows with AiiDA - The power and challenges of full data provenance

 *Marnik Bercx, Xing Wang*, *Talk (25 mins + Q&A)*, *Scientific Applications*


+## Architecting Science Tools: A Roadmap for Turning Theory and Data Projects into Python Packages
+
+*Ramon Perez*, *Tutorial*, *Data Science and Visualisation*
+
+**Abstract:** This workshop aims to address the gap between the development of technical work--whether research or data science--and its reproducibility by providing participants with the necessary knowledge to get started creating Python packages. This means that, if you're a researcher (with basic Python knowledge) wanting to make your theories more accessible, or a data professional wanting to share your Python code inside or outside of your organization, this workshop will help you understand how to contribute to, and develop, open-source projects from scratch.
+
+
+
+
+## 10 Years of Open Source: Navigating the Next AI Revolution
+
+*Ines Montani*, *Keynote*, *Community, Education, and Outreach*
+
+**Abstract:** A lot has been happening in the field of AI and Natural Language Processing: there's endless excitement about new technologies, sobering post-hype hangovers and also uncertainty about where the field is heading next. In this talk, I'll share the most important lessons we've learned in 10 years of working on open source software, our core philosophies that helped us adapt to an ever-changing AI landscape and why open source and interoperability still win over black-box, proprietary APIs.
+
+
+

 ## A Comparative Study of Open Source Computer Vision Models for Application on Small Data: The Case of CFRP Tape Laying

-*Thomas Fraunholz*, *Talk (25 mins + Q&A)*, *Machine and Deep Learning*
+*Thomas Fraunholz, Tim Köhler*, *Talk (25 mins + Q&A)*, *Machine and Deep Learning*

 **Abstract:** The world of open source computer vision has never been so exciting - and so challenging. With so many options available to you, what's the best way to solve your real world problem? The questions are always the same: Do I have enough data? Which model should I choose? How can I fine-tune and optimize the hyperparameters?
@@ -200,13 +317,38 @@ If you want to level up your MLOps game and gain practical knowledge of the late ## Building optimized packages for conda-forge and PyPI -*Wolf Vollprecht*, *Talk (15 mins + Q&A)*, *High Performance Computing* +*Wolf Vollprecht, Bas Zalmstra*, *Talk (15 mins + Q&A)*, *High Performance Computing* **Abstract:** In this talk we're introducing a new tool to build conda packages. It has been adopted by the conda community and is being rolled out in the widely used conda-forge distribution. The new recipe format has been vetted in multiple Conda Enhancement Proposals (CEPs). We are going to introduce the exciting new features of rattler-build (reproducible builds, high speed build execution, etc.). Using some examples, we will then discuss how you can use rattler-build & conda-forge to build highly optimized packages with SIMD and CUDA support. We will also take a look at `cibuildwheel` and recent improvements in the PyPI space for CUDA. +## Improving the build config of your package + +*Ralf Gommers*, *Maintainer track*, *Scientific Applications* + +**Abstract:** Goals: + +- Share tips, tricks and best practices for configuring the build backend of a Python package with compiled (Cython/C/C++/Rust/Fortran) code +- Identify shared needs between packages, and discuss gaps in current build backends, documentation, or shared infrastructure + +Topics: + +- Goals to aim for in your build config (and how to achieve them): + - Faster builds and relevant tooling like profiling, + - Build logs that actually help when diagnosing issues, + - How to debug build failures effectively, + - How to check for and visualize build dependencies, + - Ensuring builds are reproducible, + - Approaches to reducing binary size, + - CI config ideas to guard against regressions +- Recent build-related developments & a post-distutils world +- What are the most pressing pain points for maintainers? + + + + ## Skrub: prepping tables for machine learning *Guillaume Lemaitre*, *Talk (25 mins + Q&A)*, *Machine and Deep Learning* @@ -267,23 +409,35 @@ Finally, we discuss an ambitious roadmap for improvements, new features, and pla -## Paths in Parallel: Creating Custom NetworkX Backends +## Understanding NetworkX's API Dispatching with a parallel backend *Erik Welch, Aditi Juneja*, *Talk (25 mins + Q&A)*, *Community, Education, and Outreach* -**Abstract:** Hi! Have you ever wished your Python libraries could run faster algorithms? Or wanted to fundamentally improve a python library by re-writing everything in a faster language like C or Rust? +**Abstract:** Hi! Have you ever wished your pure Python libraries were faster? Or wanted to fundamentally improve a Python library by rewriting everything in a faster language like C or Rust? Well, wish no more... NetworkX's backend dispatching mechanism redirects your plain old NetworkX function calls to a FASTER implementation present in a separate backend package by leveraging the Python's [`entry_point`](https://packaging.python.org/en/latest/specifications/entry-points) specification! + +NetworkX is a popular, pure Python library used for graph(aka network) analysis. But when the graph size increases (like a network of everyone in the world), then NetworkX algorithms could take days to solve a simple graph analysis problem. So, to address these performance issues this backend dispatching mechanism was recently developed. 
In this talk, we will unveil this dispatching mechanism and its implementation details, and how we can use it just by specifying a `backend` kwarg like this: -NetworkX is a popular, pure Python library used for graph(aka network) analysis. But, when the graph size increases (like a network of everyone in the world) then networkx algorithms could take days to solve a simple graph analysis problem. So, to address these performance issues, recently, a backend dispatch mechanism was developed. This mechanism leverages Python's package discovery mechanism using [`entry_points`](https://packaging.python.org/en/latest/specifications/entry-points) specification. + >>> nx.betweenness_centrality(G, backend=“parallel”) -In this talk, we will unveil NetworkX's API Dispatching architecture that lets us redirect the plain old NetworkX’s function call to an alternative implementation present in a backend package, just by specifying a `backend` kwarg like this: +or by passing the backend graph object(type-based dispatching): - >>> nx.betweenness_centrality(Graph, backend=“parallel”) + >>> H = nxp.ParallelGraph(G) + >>> nx.betweenness_centrality(H) -We will also go over all the implementation details and challenges of this dispatching mechanism, and then we’ll use the example of the nx-parallel as a guide to building our own custom NetworkX backend. And then using the NetworkX's testing suite to test this backend of ours. Ending with a quick dive into the details of the nx-parallel backend and a quick walk through of other backends and future ToDos. And then finally conclude with an interactive Q&A. +We'll also go over the limitations of this dispatch mechanism. Then we’ll use the example of nx-parallel as a guide to building our own custom NetworkX backend. And then, using NetworkX's existing test suite, we'll test this backend that we build. Ending with a quick dive into the details of the nx-parallel backend. +## OpenGL is dying, let's talk about WebGPU + +*Almar Klein*, *Maintainer track*, *Data Science and Visualisation* + +**Abstract:** OpenGL is old and on a path to being deprecated. Modern GPU API's like Vulkan and Metal solve most problems that plague OpenGL, and higher abstractions like wgpu / WebGPU provide a modern interface to control GPU hardware. The way that these work is much more pleasant to work with, and also provides performance benefits, especially for Python. + + + + ## Enhancing Bayesian Optimization with Ensemble Models for Categorical Domains *Ilya Komarov*, *Talk (15 mins + Q&A)*, *Data Science and Visualisation* @@ -293,6 +447,15 @@ We will also go over all the implementation details and challenges of this dispa +## Scientific Python + +*Jarrod Millman, Stéfan van der Walt*, *Maintainer track*, *Community, Education, and Outreach* + +**Abstract:** Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec odio erat, posuere non sodales ac, iaculis quis augue. Morbi et justo tellus. Donec et est diam. Sed nec libero ac leo vehicula ornare. Proin imperdiet risus at aliquam iaculis. Nam maximus mi vitae turpis aliquet, eget mollis leo imperdiet. Nunc nec sapien ornare, cursus urna in, dignissim ante. Ut nec sapien in nunc elementum pretium. Donec varius ante quis ipsum imperdiet volutpat. Praesent tristique turpis mi, id mattis augue feugiat nec. Phasellus nec feugiat nunc. Suspendisse at felis a lorem congue sodales. Nam consectetur mi in posuere ultricies. Vestibulum nisi lacus, suscipit ac nisl id, mattis faucibus libero. 
+ + + + ## Regularizing Python using Structured Control Flow *Valentin Haenel*, *Talk (25 mins + Q&A)*, *High Performance Computing* @@ -330,6 +493,30 @@ You will learn how to manipulate NumPy arrays, understand how they store data an +## A Hitchhiker's Guide to Contributing to Open Source + +*Sebastian Berg, Nikoleta E. Glynatsi*, *Tutorial*, *Community, Education, and Outreach* + +**Abstract:** Open-source projects are essential for scientific programming. They provide many tools and resources that can be customized for different scientific needs. However, sometimes the existing tools in a package don't meet all the requirements of a project. This is when contributing to open-source packages becomes important. By contributing, you can implement new functionalities, improve the software and help keep the open-source community strong. + +This workshop will make contributing to open-source projects easier to understand. It will guide participants from just using the software to actively contributing to it. The workshop will address technical challenges such as interacting with web-based hosting services (like GitHub and GitLab), branching, and opening pull requests. Additionally, it will cover how to contribute documentation and ensure the correctness of the code. + + + + +## Mostly Harmless Fixed Effects Regression in Python with PyFixest + +*Alexander Fischer*, *Talk (25 mins + Q&A)*, *Data Science and Visualisation* + +**Abstract:** This session introduces PyFixest, an open source Python library inspired by the "fixest" R package. PyFixest implements fast routines for the estimation of regression models with high-dimensional fixed effects, including OLS, IV, and Poisson regression. The library also provides tools for robust inference, including heteroscedasticity-robust and cluster robust standard errors, as well as the wild cluster bootstrap and randomization inference. Additionally, PyFixest implements several routines for difference-in-differences estimation with staggered treatment adoption. + +PyFixest aims to faithfully replicate the core design principles of "fixest", offering post-estimation inference adjustments, user-friendly syntax for multiple estimations, and efficient post-processing capabilities. By making efficient use of jit-compilation, it is also one of the fastest solutions for regressions with high-dimensional fixed effects. + +The presentation will argue why there is a need for another regression package in Python, cover PyFixest's functionality and design philosophy, and discuss future development prospects. + + + + ## Probabilistic classification and cost-sensitive learning with scikit-learn *Guillaume Lemaitre, Olivier Grisel*, *Tutorial*, *Machine and Deep Learning* @@ -343,6 +530,15 @@ More specifically, we will introduce the concepts of calibrated probabilistic cl +## Helmholtz Blablador and the LLM models' ecosystem + +*Alexandre Strube*, *Talk (25 mins + Q&A)*, *Machine and Deep Learning* + +**Abstract:** Helmholtz Blablador is the LLM inference server from the Helmholtz Foundation. This talk explores Blablador's role in hosting open-source LLM models and models developed in-house at the Juelich Supercomputing Centre (JSC). This talk is about Blablador and the open source LLM models' ecosystem. 
+
+
+
+
## forecasting foundation models: evaluation and integration with sktime – challenges and outcomes

*Franz Kiraly*, *Talk (25 mins + Q&A)*, *Machine and Deep Learning*

@@ -361,3 +557,28 @@ Our key challenges and findings are presented in this talk – for instance, the



+
+## Introduction to Python
+
+*Mojdeh Rastgoo*, *Tutorial*, *Community, Education, and Outreach*
+
+**Abstract:** This tutorial will provide an introduction to Python intended for beginners.
+
+It will notably introduce the following aspects:
+
+- built-in types
+- control flow (i.e. conditions, loops, etc.)
+- built-in functions
+- basic Python classes
+
+
+
+
+## Image analysis in Python with scikit-image
+
+*Lars Grüter, Marianne Corvellec, Stéfan van der Walt*, *Tutorial*, *Data Science and Visualisation*
+
+**Abstract:** Scientists are producing more and more images with telescopes, microscopes, MRI scanners, etc. They need automatable tools to measure what they've imaged and help them turn these images into knowledge. This tutorial covers the fundamentals of algorithmic image analysis, starting with how to think of images as NumPy arrays, moving on to basic image filtering, and finishing with a complete workflow: segmenting a 3D image into regions and making measurements on those regions.
+
+

diff --git a/poster_parser/euroscipy-2024_confirmed.csv b/poster_parser/euroscipy-2024_confirmed.csv
new file mode 100644
index 0000000..8e65437
--- /dev/null
+++ b/poster_parser/euroscipy-2024_confirmed.csv
@@ -0,0 +1,216 @@
+ID,Proposal title,Session type,Track,Abstract,Speaker names
+3HCQFS,"napari: multi-dimensional image visualization, annotation, and analysis in Python",Talk (25 mins + Q&A),Data Science and Visualisation,"Napari is an interactive n-dimensional image viewer for Python. It is able to rapidly render and interactively visualize almost any array-like image data. Additionally, napari can overlay derived data, such as segmentations, points, polygons, surfaces and more. Each of these data exists as a layer in the napari viewer, which allows fine control over how the data is displayed. Furthermore, derived data can be edited. Together with the capability of writing plugins, napari lets you seamlessly weave exploration, computation, and annotation in common and custom image analysis workflows.","Grzegorz Bokota, Wouter-Michiel Vierdag"
+3K8ZXN,"LPython: Novel, Fast, Retargetable Python Compiler",Talk (25 mins + Q&A),High Performance Computing,"Python is one of the most used languages today, known for its simplicity and versatile ecosystem. For performance applications such as High Performance Computing (HPC) or any other kind of numerical computing the standard CPython implementation is often not fast enough. To address these issues, enter the fascinating world of LPython, a Python compiler designed to give you the best possible performance for numerical, array-oriented code, and can also generate code using multiple backends like LLVM, C, C++, WASM.",Naman Gera
+3RENPJ,fastplotlib: A high-level library for ultra fast visualization of large datasets using modern graphics APIs,Talk (25 mins + Q&A),Data Science and Visualisation,"Fast interactive visualization remains a considerable barrier in analysis pipelines for large neuronal datasets. Here, we present fastplotlib, a scientific plotting library featuring an expressive API for very fast visualization of scientific data.
Fastplotlib is built upon pygfx which utilizes the GPU via WGPU, allowing it to interface with modern graphics APIs such as Vulkan for fast rendering of objects. Fastplotlib is non-blocking, allowing for interactivity with data after plot generation. Ultimately, fastplotlib is a general purpose scientific plotting library that is useful for the fast and live visualization and analysis of complex datasets.","Kushal Kolar, Caitlin Lewis"
+7SKUEN,Combining Python and Rust to create Polars Plugins,Tutorial,Data Science and Visualisation,"Polars is a dataframe library taking the world by storm. It is very runtime and memory efficient and comes with a clean and expressive API. Sometimes, however, the built-in API isn't enough. And that's where its killer feature comes in: plugins. You can extend Polars, and solve practically any problem.
+
+No prior Rust experience required, intermediate Python or general programming experience required. By the end of the session, you will know how to write your own Polars Plugin! This talk is aimed at data practitioners.",Marco Gorelli
+7YUTTK,Validation of Association - implementation in Python library,Poster,Data Science and Visualisation,"During our talk, we would like to present a Python library that implements ideas proposed in the paper ""Validation of Association"" by Ćmiel Bogdan and Ledwina Teresa. The authors introduce a novel function-valued measure of dependence known as the quantile dependence function. This measure plays a pivotal role in constructing tests for independence and allows for easily interpretable diagnostic plots that highlight deviations from the null model. The quantile dependence function is specifically designed to identify general dependence structures between variables within different quantiles of their joint distribution. The authors develop new estimators for the dependence function and utilize them to devise innovative tests for independence.",Mateusz
+893KBK,Simulated data is all you need: Bayesian parameter inference for scientific simulators with SBI,Talk (25 mins + Q&A),Scientific Applications,"Simulators play a crucial role in scientific research, but accurately determining their parameters to reproduce observed data remains a significant challenge. Classical parameter inference methods often struggle due to the stochastic or black-box nature of these simulators. Simulation-based inference (SBI) offers a solution by enabling Bayesian parameter inference for simulation-based models: It only requires simulated data as input and returns a posterior distribution over suitable model parameters, including uncertainty estimates and parameter interactions. In this talk, we introduce SBI and present [`sbi`](https://sbi-dev.github.io/sbi/), an open source library that serves as a central resource for SBI practitioners and researchers, offering state-of-the-art SBI algorithms, comprehensive documentation and tutorials.",Jan Boelts (Teusen)
+89BJ9Q,Accelerating Python on HPC with Dask,Talk (25 mins + Q&A),High Performance Computing,"Dask is a popular Python framework for scaling your workloads, whether you want to leverage all of the cores on your laptop and stream large datasets through memory, or scale your workload out to thousands of cores on large compute clusters. Dask allows you to distribute code using familiar APIs such as pandas, NumPy and scikit-learn or write your own distributed code with powerful parallel task-based programming primitives.
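+
+As a taste of the familiar-API side, here is a minimal sketch (illustrative only; the paths and column names are made up, and cluster deployment on HPC is what the session covers in depth):
+
+    >>> import dask.dataframe as dd
+    >>> from dask.distributed import Client
+    >>> client = Client()  # local cluster; swap in a dask-jobqueue cluster on HPC
+    >>> df = dd.read_parquet('telemetry/*.parquet')
+    >>> df.groupby('node').duration.mean().compute()  # pandas-style API, computed in parallel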
+
+In this session we will dive into the many ways to deploy Dask workloads on HPC, and how to choose the right method for your workload. Then we will dig into the accelerated side of Dask and how you can leverage GPUs with RAPIDS and Dask CUDA and use UCX to take advantage of accelerated networking like Infiniband and NVLink.",Jacob Tomlinson
+89KK7L,"Using the Array API to write code that runs with Numpy, Cupy and PyTorch",Tutorial,High Performance Computing,"Python code that works with Numpy, Cupy and PyTorch arrays? Use a GPU when possible, but fall back to using a CPU if there is none? We will show you how you can write Python code that can do all of the above. The not so secret ingredient to do this is the Array API. In this workshop you will learn what the Array API is and how to use it to write programs that can take any compatible array as input.","Tim Head, Sebastian Berg"
+8MXPRW,Dispatching and Backend Selection Discussion,Maintainer track,High Performance Computing,"In this session, we would like to briefly review the successful [`NetworkX` backend selection](https://networkx.org/documentation/stable/reference/backends.html) and work towards a possible future dispatching project under the Scientific Python umbrella, [`spatch`](https://github.com/scientific-python/spatch).",Sebastian Berg
+8NJGVH,From data analysis in Jupyter Notebooks to production applications: AI infrastructure at reasonable scale,Talk (15 mins + Q&A),Machine and Deep Learning,"The availability of AI models and packages in the Python ecosystem has revolutionized many applications across domains. This talk discusses infrastructural decisions and best practices that bridge the gap between interactive data analyses in notebooks and production applications at a reasonable scale, suitable for both commercial and scientific contexts. In particular, the talk introduces the on-premises, Python-based AI architecture employed at MDPI, one of the largest open-access publishers. The presentation emphasizes the impact of the design on reproducibility, decoupling of different resources, and ease of use during the development and exploration phases.",Frank Sauerburger
+8NL9R3,Introduction to matplotlib for Data Visualization with Python,Tutorial,Data Science and Visualisation,"_matplotlib_ is a library for creating visualizations with Python which ""...makes easy things easy and hard things possible"" (https://matplotlib.org/). This tutorial, intended for beginners, will introduce the library and explain core concepts as well as the main interfaces. Starting with styling simple point data plots, we will explain how to work with several dimensions, shared axes and advanced styling options using _rcParams_. After completing this tutorial, participants will hopefully be equipped with a thorough understanding of _matplotlib_ to navigate the ""hard things"" in the world of data visualization.",Nefta Kanilmaz
+8WL8GX,Multi-dimensional arrays with Scipp,Tutorial,High Performance Computing,"Inspired by xarray, Scipp enriches raw NumPy-like multi-dimensional data arrays by adding named dimensions and associated coordinates. For an even more intuitive and less error-prone user experience, Scipp adds physical units to arrays and their coordinates. Through this tutorial, participants will learn about the basics of modelling their data with the Scipp library and using built-in tools in Scipp for scientific data analysis.
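+
+A minimal sketch of the core idea (illustrative, not from the tutorial material): named dimensions and units travel with the data, so bookkeeping mistakes surface early:
+
+    >>> import scipp as sc
+    >>> speed = sc.array(dims=['x'], values=[1.0, 2.0, 3.0], unit='m/s')
+    >>> dist = speed * sc.scalar(2.0, unit='s')  # units propagate: result is in m
+    >>> dist.values
+    array([2., 4., 6.])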
+
+One of Scipp's key features is the possibility of using multi-dimensional non-destructive binning to sort record-based ""tabular""/""event"" data into arrays of bins. This provides fast and flexible binning, rebinning, and filtering operations, all while preserving the original individual records.
+
+Scipp ships with data display and visualization features for Jupyter notebooks, including a powerful plotting interface. Named Plopp, this tool uses a graph of connected nodes to provide interactivity between multiple plots and widgets, requiring only a few lines of code from the user.
+
+Scipp is available via pip and conda and runs on Linux, Mac and Windows.",Mridul Seth
+93QFDT,Revolutionizing Enterprise Operations: Innovative Uses of AI Agents,Poster,Machine and Deep Learning,"Unlock the transformative power of AI agents in enterprise environments! This session explores how AI can revolutionize business operations by automating complex tasks, enhancing decision-making, and personalizing customer interactions. Through real-world case studies and practical insights, attendees will learn advanced techniques for developing and integrating AI agents to drive efficiency and innovation. Whether you're a technical expert or a business leader, gain the tools and knowledge to harness AI for significant operational improvements.",Adwaith T A
+9EUT78,Just contribute?!,Keynote,"Community, Education, and Outreach","Open source software is here for everyone - but how are we making sure that everyone has equal access?
+In this keynote I will discuss how to lower barriers of entry for new contributors - and the many facets to this: documentation, community, guidelines, and tools.
+I will share my personal motivations for contributing to open-source software and my journey over the past five years and all of its learnings.",Wolf Vollprecht
+9J8RG8,Visualizing and debugging tensors with aesthetic-tensor,Poster,"Community, Education, and Outreach","Researchers and ML Engineers working hands-on with tensors frequently have to inspect and visualize the contents of tensors in order to understand what they hold. Aesthetic-tensor is a Python library that works with PyTorch and NumPy and abstracts away tensor visualizations in an intuitive and easy-to-learn API. In this tutorial, we're going to explain the core concept of the API and walk through a few practical examples of how to use the library.",Iliya Zhechev
+9NEHUM,Using time series to detect anomalies in a wide area network environment. Is it possible to solve such a problem using Python?,Poster,Scientific Applications,"Time series issues may seem destined only for a narrow group of data scientists, but in fact they can make the work of administrators, support people, researchers or developers themselves easier.
+
+In the era of ubiquitous IoT, and the consequent intensive use of telemetry, time series make it possible to analyze the timing of events, correlate the dynamics of their occurrence, and ultimately facilitate the understanding of these phenomena.
+
+Using Python, time series can be successfully analyzed, and when the amount of data exceeds the amount of available RAM, Python can use time series databases.
+
+My experience allows me to present cases from the practical use of time series to detect anomalies in the operation of equipment, often preceding failures.",Paweł Żal
+BCAUKU,Decorators - A Deep Dive,Tutorial,Scientific Applications,"Python offers decorators to implement re-usable code for cross-cutting tasks.
+They support the separation of cross-cutting concerns such as logging, caching,
+or checking of permissions.
+This can improve code modularity and maintainability.
+
+This tutorial is an in-depth introduction to decorators.
+It covers the usage of decorators and how to implement simple and more advanced
+decorators.
+Use cases demonstrate how to work with decorators.
+In addition to showing how functions can use closures to create decorators,
+the tutorial introduces callable class instances as an alternative.
+Class decorators can solve problems that used to be tasks for metaclasses.
+The tutorial provides use cases for class decorators.
+
+While the focus is on best practices and practical applications, the tutorial
+also provides deeper insight into how Python works behind the scenes.
+After the tutorial participants will feel comfortable with functions that take
+functions and return new functions.",Mike Müller
+BQCN3F,🧪 `classy-bench`: a low-code library for quickly training and evaluating model baselines for Multi Label Classification applications,Poster,Machine and Deep Learning,"`classy-bench` is a low-code Python library that simplifies the process of training and evaluating baseline models for real-world Multi-Label Classification applications. Simply provide your datasets, and quickly get a benchmark of multiple models tailored to your specific use case. This talk will introduce the library and demonstrate its ease of use through examples.",Edoardo Abati
+BXNEY8,The joys and pains of reproducing research: An experiment in bioimaging data analysis,Talk (25 mins + Q&A),Data Science and Visualisation,"The conversation about reproducibility is usually focused on how to make research workflows (more) reproducible. Here, we consider it from the opposite perspective, and ask: How feasible is it, in practice, to reproduce research which is meant to be reproducible? Is it even done or attempted? We provide a detailed account of such an attempt, trying to reproduce some segmentation results for 3D microscopy images of a developing mouse embryo. The original research is a monumental work of bioimaging and analysis at the single-cell level, published in *Cell* in 2018, along with all the necessary research artifacts. Did we succeed in this attempt? As we share the joys and pains of this journey, many questions arise: How do reviewers assess the reproducibility claims exactly? Incentivizing reproducible research is still an open problem, since it is so much more costly (in time) to produce. And how can we incentivize those who test reproducibility? Not only is it costly to set up computational environments and execute data-intensive scientific workflows, but it may not appear as rewarding at first thought. In addition, there is a human factor: It is thorny to show authors that their publication does not hold up to their reproducibility claims.",Marianne Corvellec
+BYESWT,Building robust workflows with strong provenance,Tutorial,Scientific Applications,"In computational science, different software packages are often glued together as scripts to perform numerical experiments. With increasing complexity, these scripts become unmaintainable, prone to crashes, hard to scale up and to collaborate on. AiiDA solves these problems via a powerful workflow engine and by keeping provenance for the entire workflow.
In this tutorial, we learn how to create dynamic workflows combining different executables that can automatically restart from failed runs and reuse results from completed calculations via caching.","Alexander Goscinski, Julian Geiger, Ali Khosravi"
+BYTCSC,The Array API Standard in SciPy,Talk (15 mins + Q&A),High Performance Computing,"The array API standard is unifying the ecosystem of Python array computing, facilitating greater interoperability between array libraries, including NumPy, CuPy, PyTorch, JAX, and Dask. Find out how we are using it in SciPy to bring support for hardware-accelerated (e.g. GPU) and distributed arrays to our users, and how you can do the same in your library.",Lucas Colley
+CBSXQN,Data augmentation with Scikit-LLM,Talk (15 mins + Q&A),Machine and Deep Learning,"Scikit-LLM is an innovative Python library that seamlessly integrates Large Language Models into the Scikit-Learn framework. Scikit-LLM becomes a powerful tool for natural language processing (NLP) tasks within the Scikit-Learn pipeline, and I'll showcase a data augmentation action to build features using zero-shot text classification and text vectorization.",Claudio G. Giancaterino
+CDB9NG,NumPy's new DType API and 2.0 transition,Maintainer track,Scientific Applications,"NumPy 2 had some significant changes in its API and required many downstream libraries and users to adapt.
+One of the larger new features is that the new DType API is now public. This C-API allows more powerful user defined DTypes, for which the new StringDType is an example. In the first part, I will give a brief overview of this API.
+
+Since many downstream projects needed to adapt and publish new versions, in the second part I recap the current and past difficulties in transitioning to NumPy 2. This part of the session will be a forum for open discussion to gauge the challenges faced by users in making this transition.",Sebastian Berg
+CETWRS,From stringly typed to strongly typed: Insights from re-designing a library to get the most out of type hints,Talk (25 mins + Q&A),"Community, Education, and Outreach","Many scientific Python packages are ""stringly typed,"" i.e., using strings to select algorithms or methods and dictionaries for configuration. While easy for beginners and convenient for authors, these libraries miss out on static typing benefits like error detection before runtime and autocomplete. This talk shares insights from redesigning the optimagic library from the ground up with static typing in mind. Without compromising on simplicity, we achieve better static analysis, autocomplete, and fewer runtime errors. The insights are not specific to numerical optimization and apply to a wide range of scientific Python packages.",Janos Gabler
+DF3VHU,"sktime - python toolbox for time series – introduction and new features 2024: foundation models, deep learning backends, probabilistic models, changepoints and segmentation",Tutorial,Machine and Deep Learning,"sktime is the most widely used scikit-learn compatible framework library for learning with time series. sktime is maintained by a neutral non-profit under permissive license, easily extensible by anyone, and interoperable with the python data science stack.
+
+This tutorial gives a hands-on introduction to sktime, for common time series learning tasks such as forecasting, and an overview of different model categories, pipeline building, feature engineering, model tuning, and autoML.
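+
+A minimal sketch of the basic forecasting workflow (toy data; illustrative only, not the tutorial's own material):
+
+    >>> from sktime.datasets import load_airline
+    >>> from sktime.forecasting.naive import NaiveForecaster
+    >>> y = load_airline()
+    >>> pred = NaiveForecaster(strategy='last').fit(y).predict(fh=[1, 2, 3])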
+
+The tutorial also showcases the newest features in 2024, including support for foundation models, Hugging Face connectors, probabilistic models, categorical features, anomaly and changepoint detection, time series segmentation.",Franz Kiraly
+DLYRXH,Free-threaded (aka nogil) CPython in the Scientific Python ecosystem: status and road ahead,Talk (25 mins + Q&A),Machine and Deep Learning,"CPython 3.13 will be released in October 2024 and has been in beta since May 2024. One of its most awaited features is the possibility to remove the GIL (Global Interpreter Lock) through a compile-time flag.
+
+In this talk we will explain the relevance of free-threaded CPython for the Scientific Python ecosystem, what already works, some of the caveats, and how to try it out on your favourite use case.
+
+In particular we will discuss:
+- the historic effort in the scikit-learn project to add Continuous Integration for the `nogil` fork of CPython 3.9, and the kind of issues that were surfaced
+- the ongoing effort in the Scientific Python ecosystem (Numpy, Scipy, scikit-learn, etc ...) to test free-threaded CPython 3.13 and fix issues along the way
+- how a typical scikit-learn grid-search use case can benefit from free-threaded CPython
+- how to try out free-threaded CPython on your favourite use case
+- possible future developments",Loïc Estève
+E8HD9K,The Parallel Universe in Python - A Time Travel to Python 3.13 and beyond,Talk (25 mins + Q&A),High Performance Computing,"Parallel computing is essential for many performance-critical applications. Python provides many solutions for this problem. New versions of Python will support sub-interpreters and a currently experimental free-threading version without the Global Interpreter Lock (GIL).
+
+This talk starts with a short overview over this topic, clarifying terms such as parallel, concurrent, and distributed computing as well as CPU-bound, memory-bound, and IO-bound problems. The presentation explains how Python and its standard library support parallel programming tasks. In addition, many Python libraries provide very useful approaches and tools for parallel computing. An overview of important libraries provides guidance on which library can be used for what type of parallel problem.
+
+How do Python's new features such as sub-interpreters and free-threading without the Global Interpreter Lock (GIL) impact parallel programming in Python? This talk addresses this question by providing examples where these features might help to make programs simpler and/or faster.",Mike Müller
+EGSCDK,pycodehash: boost your pipeline by skipping all unchanged steps!,Poster,Data Science and Visualisation,"Data pipelines are of paramount importance in data science, engineering and analysis. Often, there are parts of the pipeline that have not changed. Recomputing these nodes is wasteful, especially for larger datasets. PyCodeHash is a novel generic data and Python code hashing library that facilitates downstream caching.","Simon Brugman, Ralph"
+GKYTSY,wgpu and pygfx: next-generation graphics for Python,Talk (25 mins + Q&A),Data Science and Visualisation,"This talk introduces a new render engine for Python, called pygfx (pronounced ""py-graphics""). Its purpose is to bring powerful and reliable visualization to the Python world. Since pygfx is built on wgpu, it has superior performance and reliability compared to OpenGL-based solutions.
It is also designed to be versatile: with its modular architecture, one can assemble graphical scenes for diverse applications, ranging from scientific visualization to video games.",Almar Klein +GQN8AF,The Mission Support System and its use in planning an aircraft campaign,Talk (15 mins + Q&A),Scientific Applications,"The Mission Support System (MSS) is an open source software package that has been used for planning flight tracks of scientific aircraft in multiple measurement campaigns during the last decade. It consists of many components, a data-retrieval tool chain, a wms server which creates 2-D figures from 4-D meterogical data. A client application for displaying the figures in combination with the planned flight track and other data. For data exchange between participants a collaboration server is used. The talk describes how we used these components for a campaign.",Reimar Bauer +H9XRN9,Mastering Python Performance: Advanced Techniques for Efficiency,Poster,High Performance Computing,"Delve into the nuanced realm of Python optimization in this comprehensive session, where we uncover advanced strategies to maximize code performance. From profiling tools like cProfile and Memory Profiler to sophisticated techniques for data handling and parallel processing, we'll explore how to minimize resource consumption and boost execution speed. Geared towards both seasoned developers and newcomers, this talk promises to equip you with the skills to transform your Python projects into models of efficiency and high performance.",Adwaith T A +HCMV78,A Qdrant and Specter2 framework for tracking resubmissions of rejected manuscripts in academia,Talk (15 mins + Q&A),Data Science and Visualisation,"This presentation introduces a Qdrant vector DB and Specter2 model used to identify whether a rejected academic manuscript is later published in a competing journal. Our method combines AI, data science and analytics to ensure a good identification of manuscripts and authors. The findings offer insights into resubmission patterns, enhancing our understanding of academic publishing dynamics. The system is implemented in Python.",Daniele Raimondi +HKVEXW,Reproducible workflows with AiiDA - The power and challenges of full data provenance,Talk (25 mins + Q&A),Scientific Applications,"AiiDA is a workflow manager with a strong focus on reproducibility through automated data provenance. In this talk we discuss what it means to have full “data provenance” for scientific workflows, the advantages it offers, but also the challenges it represents for new users and how we deal with them.","Marnik Bercx, Xing Wang" +HY7QYG,Architecting Science Tools: A Roadmap for Turning Theory and Data Projects into Python Packages,Tutorial,Data Science and Visualisation,"This workshop aims to address the gap between the development of technical work--whether research or data science--and its reproducibility by providing participants with the necessary knowledge to get started creating Python packages. 
This means that, if you're a researcher (with basic Python knowledge) wanting to make your theories more accessible, or a data professional wanting to share your Python code inside or outside of your organization, this workshop will help you understand how to contribute to, and develop, open-source projects from scratch.",Ramon Perez +JFATCJ,10 Years of Open Source: Navigating the Next AI Revolution,Keynote,"Community, Education, and Outreach","A lot has been happening in the field of AI and Natural Language Processing: there's endless excitement about new technologies, sobering post-hype hangovers and also uncertainty about where the field is heading next. In this talk, I'll share the most important lessons we've learned in 10 years of working on open source software, our core philosophies that helped us adapt to an ever-changing AI landscape and why open source and interoperability still wins over black-box, proprietary APIs",Ines Montani +JWAMDE,A Comparative Study of Open Source Computer Vision Models for Application on Small Data: The Case of CFRP Tape Laying,Talk (25 mins + Q&A),Machine and Deep Learning,"The world of open source computer vision has never been so exciting - and so challenging. With so many options available to you, what's the best way to solve your real world problem? The questions are always the same: Do I have enough data? Which model should I choose? How can I fine-tune and optimize the hyperparameters? + +In collaboration with the German Aerospace Center, we investigated these questions to develop a model for quality assurance of CFRP tape laying, with only a small real data set fresh from production. We are very pleased to present a machine learning setup that can empirically answer these questions. Not only for us, but also for you - our setup can easily be transferred to your application! + +Dive with us into the world of Open Source machine learning tools that are perfectly tailored for your next project. Discover the seamless integration of Hugging Face Model Hub, DvC and Ray Tune. You'll also gain unique insights into the fascinating world of CFRP tape laying, specifically how well different architectures of open source models perform on our small dataset. + +If you want to level up your MLOps game and gain practical knowledge of the latest computer vision models and practices, this talk is a must for you. Don't miss the opportunity, and look forward to your next computer vision projects!","Thomas Fraunholz, Tim Köhler" +JXB79J,Building optimized packages for conda-forge and PyPI,Talk (15 mins + Q&A),High Performance Computing,"In this talk we're introducing a new tool to build conda packages. It has been adopted by the conda community and is being rolled out in the widely used conda-forge distribution. The new recipe format has been vetted in multiple Conda Enhancement Proposals (CEPs). We are going to introduce the exciting new features of rattler-build (reproducible builds, high speed build execution, etc.). Using some examples, we will then discuss how you can use rattler-build & conda-forge to build highly optimized packages with SIMD and CUDA support. 
We will also take a look at `cibuildwheel` and recent improvements in the PyPI space for CUDA.","Wolf Vollprecht, Bas Zalmstra"
+KCXHPA,Gatherer: Insight Revelation from Diplomatic Archives,Poster,Scientific Applications,"Gatherer is an open-source Python tool designed to enhance the research efficiency of political scientists and historians conducting extensive archival research on historical diplomatic records. This tool helps digitize physical records, converts unstructured data into structured tabular formats, generates summaries, extracts metadata, and identifies key named entities. These capabilities streamline the primary source collection and synthesis processes, significantly improving the research workflow.",Egemen Bezci
+LDVMAQ,SpatialData: a FAIR framework for multimodal spatial omics,Poster,High Performance Computing,"Spatial omics data generation displaying DNA, RNA and protein within their spatial context has been tremendously increasing in the last couple years. This has led to challenges for bioinformaticians tasked with analysing the data due to, among others, data size and a plethora of different formats being used by different researchers. Hence the need for a highly performant findable, accessible, interoperable and reusable (FAIR) representation of this bioimaging data.
+For this we developed the SpatialData framework, a solution that combines an on-disk format, the SpatialData format, with a set of Python libraries for accessing and operating on spatial omics data, and tools for interactive data annotation and visualization. The SpatialData library seamlessly integrates with the existing Python ecosystem by building upon standard scientific Python data types, such as xarray, dask, geopandas and anndata. It thereby provides a flexible, community standards-based, open framework to store, process, and annotate data from virtually any spatial omics technology available to date. With the simplified and interoperable data representation, the ability to easily create unified coordinate systems, and the numerous downstream analysis capabilities, it can facilitate the development, reproducibility and reuse of analysis pipelines, and ultimately unlock new approaches to unfold scientific questions.","Wouter-Michiel Vierdag, Luca Marconato"
+LMSJ8Z,Improving the build config of your package,Maintainer track,Scientific Applications,"Goals:
+
+- Share tips, tricks and best practices for configuring the build backend of a Python package with compiled (Cython/C/C++/Rust/Fortran) code
+- Identify shared needs between packages, and discuss gaps in current build backends, documentation, or shared infrastructure
+
+Topics:
+
+- Goals to aim for in your build config (and how to achieve them):
+  - Faster builds and relevant tooling like profiling,
+  - Build logs that actually help when diagnosing issues,
+  - How to debug build failures effectively,
+  - How to check for and visualize build dependencies,
+  - Ensuring builds are reproducible,
+  - Approaches to reducing binary size,
+  - CI config ideas to guard against regressions
+- Recent build-related developments & a post-distutils world
+- What are the most pressing pain points for maintainers?",Ralf Gommers
+MFF7GE,Skrub: prepping tables for machine learning,Talk (25 mins + Q&A),Machine and Deep Learning,"When it comes to designing machine learning predictive models, it is reported that data scientists spend over 80% of their time preparing the data to input to the machine learning algorithm.
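+
+(As a preview of the kind of boilerplate reduction discussed below, a sketch using skrub's TableVectorizer; the dataframe `X` and target `y` are assumed, with `X` holding raw heterogeneous columns such as dates and categories:)
+
+    >>> from skrub import TableVectorizer
+    >>> from sklearn.linear_model import Ridge
+    >>> from sklearn.pipeline import make_pipeline
+    >>> model = make_pipeline(TableVectorizer(), Ridge())
+    >>> model.fit(X, y)  # the vectorizer encodes each column type automatically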
+ +Currently, no automated solution exists to address this problem. However, the `skrub` Python library is here to alleviate some of the daily tasks of data scientists and offer an integration with the `scikit-learn` machine learning library. + +In this talk, we provide an overview of the features available in `skrub`. + +First, we focus on the preprocessing stage closest to the data sources. While predictive models usually expect a single design matrix and a target vector (or matrix), in practice, it is common that data are available from different data tables. It is also possible that the data to be merged are slightly different, making it difficult to join them. We will present the `skrub` joiners that handle such use cases and are fully compatible with `scikit-learn` and its pipeline. + +Then, another issue widely tackled by data scientists is dealing with heterogeneous data types (e.g., dates, categorical, numerical). We will present the `TableVectorizer`, a preprocessor that automatically handles different types of encoding and transformation, reducing the amount of boilerplate code to write when designing predictive models with `scikit-learn`. Like the joiner, this transformer is fully compatible with `scikit-learn`.",Guillaume Lemaitre +MPMRUZ,Introduction to Polars: Fast and Readable Data Analysis,Tutorial,Data Science and Visualisation,"Polars is a new, powerful library for doing analysis on structured data. The library focuses on processing speed and a consistent and intuitive API. This tutorial will help you get started with Polars, by showing you how to read and write data and manipulate it with Polars' powerful expression syntax. You'll learn about how the lazy API is an important key to Polars' efficiency.",Geir Arne Hjelle +NGECXK,Optimagic: Can we unify Python's numerical optimization ecosystem?,Talk (25 mins + Q&A),"Community, Education, and Outreach","Python has many high quality optimization algorithms but they are scattered across many different packages. Switching between packages is cumbersome and time consuming. Other languages are ahead of Python in this respect. For example, `Optimization.jl` provides a unified interface to more than 100 optimization algorithms and is widely accepted as a standard interface for optimization in Julia. + +In this talk, we take stock of the existing optimization ecosystem in Python and analyze pain points and reasons why no single package has emerged as a standard so far. We use these findings to derive desirable features a Python optimization package would need to unify the ecosystem. + +We then present optimagic, a NumFocus affiliated Project with the goal of unifying the Python optimization ecosystem. Optimagic provides a common interface to optimization algorithms from scipy, NlOpt, pygmo, and many other libraries. The minimize function feels familiar to users of scipy.optimize who are looking for a more extensive set of +supported optimizers. Advanced users can use optional arguments to configure every aspect of the optimization, create a persistent log file, turn local optimizers global with a multistart framework, and more. + +Finally, we discuss an ambitious roadmap for improvements, new features, and planned community activities for optimagic.",Janos Gabler +NH7LGF,Conformal Prediction with MAPIE: A Journey into Reliable Uncertainty Quantification,Talk (15 mins + Q&A),Data Science and Visualisation,"In the ever-evolving landscape of data science, accurate uncertainty quantification is crucial for decision-making processes. 
Conformal Prediction (CP) stands out as a powerful framework for addressing this challenge by providing reliable uncertainty estimates alongside predictions. In this talk, I'll delve into the world of Conformal Prediction, with a focus on the MAPIE Python library, offering a comprehensive understanding of its advantages and practical applications.",Claudio G. Giancaterino
+NYPQ9E,Streamlining Strain-Stress Analysis with Pydidas for XRD experiments,Poster,Scientific Applications,"X-ray diffraction (XRD) reveals atomic structures in a variety of materials, from chocolate and biomaterials, like bone, to hard coatings for CNC machining tools. Synchrotron radiation facilities constantly strive to engage new scientific communities. However, attracting new user groups can be challenging, as many are not familiar with XRD. Pydidas bridges this gap by offering a comprehensive framework for XRD analysis, featuring data processing, analysis, and visualization tools. It efficiently handles complex HDF5 files and provides near real-time feedback.
+We introduce a new Pydidas workflow for X-ray diffraction-based strain-stress analysis, combining crystallographic data with mechanical properties. This integration aids in optimizing the material design by understanding how the microstructure influences macroscopic behaviour, advancing materials development in mechanical and aerospace engineering.",Gudrun Lotze
+PFVX9L,Federated Learning: Where we are and where we need to be,Talk (25 mins + Q&A),Machine and Deep Learning,"In this talk, we'll review the landscape of open-source federated learning libraries with a lens on actual real world data problems, use cases and actors who could benefit from federated learning. We'll then analyze gaps, weaknesses and explore new ways we could formulate federated learning problems (and their associated libraries!) to build more useful software and use decentralized machine learning in real world use cases.",Katharine Jarmul
+QLVBYY,Understanding NetworkX's API Dispatching with a parallel backend,Talk (25 mins + Q&A),"Community, Education, and Outreach","Hi! Have you ever wished your pure Python libraries were faster? Or wanted to fundamentally improve a Python library by rewriting everything in a faster language like C or Rust? Well, wish no more... NetworkX's backend dispatching mechanism redirects your plain old NetworkX function calls to a FASTER implementation present in a separate backend package by leveraging Python's [`entry_points`](https://packaging.python.org/en/latest/specifications/entry-points) specification!
+
+NetworkX is a popular, pure Python library used for graph (aka network) analysis. But when the graph size increases (like a network of everyone in the world), then NetworkX algorithms could take days to solve a simple graph analysis problem. So, to address these performance issues, this backend dispatching mechanism was recently developed. In this talk, we will unveil this dispatching mechanism and its implementation details, and how we can use it just by specifying a `backend` kwarg like this:
+
+ >>> nx.betweenness_centrality(G, backend=""parallel"")
+
+or by passing the backend graph object (type-based dispatching):
+
+ >>> H = nxp.ParallelGraph(G)
+ >>> nx.betweenness_centrality(H)
+
+We'll also go over the limitations of this dispatch mechanism. Then we’ll use the example of nx-parallel as a guide to building our own custom NetworkX backend. And then, using NetworkX's existing test suite, we'll test this backend that we build.
Ending with a quick dive into the details of the nx-parallel backend.","Erik Welch, Aditi Juneja" +QMV8P3,"OpenGL is dying, let's talk about WebGPU",Maintainer track,Data Science and Visualisation,"OpenGL is old and on a path to being deprecated. Modern GPU API's like Vulkan and Metal solve most problems that plague OpenGL, and higher abstractions like wgpu / WebGPU provide a modern interface to control GPU hardware. The way that these work is much more pleasant to work with, and also provides performance benefits, especially for Python.",Almar Klein +RHUANH,A modular interface for visualization and pre-processing for multi-channel signals,Poster,Data Science and Visualisation,"Filtering and having an overview of data are essential steps before feature extraction or any analytical process. This interface can provide a robust foundation to support this processing stage. The key word: modularity. + +Built in the context of High Density Electromyography processing, this interface developed with PyQT, Pyqtgraph, Xarray/Datatree and Jinja, offers multi-channel visualizations in temporal and frequency domains. It enables users to generate reports with specific metrics, create and customize processing pipelines. Designed to be as modular as possible, the interface allows users to incorporate their own features that fit their datasets by following a standardized code architecture.",Anais Monteils +SPUZPK,Enhancing Bayesian Optimization with Ensemble Models for Categorical Domains,Talk (15 mins + Q&A),Data Science and Visualisation,"Bayesian optimization is a powerful technique for optimizing black-box, costly-to-evaluate functions, widely applicable across diverse fields. However, Gaussian process (GP) models commonly used in Bayesian optimization struggle with functions defined on categorical or mixed domains, limiting optimization in scenarios with numerous categorical inputs. In this talk, we present a solution by leveraging ensemble models for probabilistic modelling, providing a robust approach to optimize functions with categorical inputs. We showcase the effectiveness of our method through a Bayesian optimization setup implemented with the BoTorch library, utilizing probabilistic models from the XGBoostLSS framework. By integrating these tools, we achieve efficient optimization on domains with categorical variables, unlocking new possibilities for optimization in practical applications.",Ilya Komarov +SYXTMZ,Scientific Python,Maintainer track,"Community, Education, and Outreach","Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec odio erat, posuere non sodales ac, iaculis quis augue. Morbi et justo tellus. Donec et est diam. Sed nec libero ac leo vehicula ornare. Proin imperdiet risus at aliquam iaculis. Nam maximus mi vitae turpis aliquet, eget mollis leo imperdiet. Nunc nec sapien ornare, cursus urna in, dignissim ante. Ut nec sapien in nunc elementum pretium. Donec varius ante quis ipsum imperdiet volutpat. Praesent tristique turpis mi, id mattis augue feugiat nec. Phasellus nec feugiat nunc. Suspendisse at felis a lorem congue sodales. Nam consectetur mi in posuere ultricies. 
Vestibulum nisi lacus, suscipit ac nisl id, mattis faucibus libero.","Jarrod Millman, Stéfan van der Walt" +TVUQ3U,"Project-Based Python Training Resources for AI Scientists, Engineers, and Developers",Poster,"Community, Education, and Outreach","In this talk, I will discuss about the significance of hands-on project based training resources for Python learners of all levels who aspire to work in the latest domains of machine learning, deep learning and generative AI algorithms and applications. I will present the steps of development of hands-on training modules for python programming and AI enthusiasts. These resources are aimed to help learners in mastering key concepts while developing applied technical skills that are essential in undertaking latest AI based projects in industry and academia. In this talk I will discuss how python programmers and AI developers at all knowledge levels may use and benefit from such hands-on training approaches as well as how currently there is a huge need for developing project based learning modules in the domains of python programming, machine learning, generative AI and data science. As a creator of several such practical learning modules myself, I will do a walk through of the procedural steps involved in building these kind of learning curriculums and highlight the need and benefits of such practical, hands-on training methodologies.","Anuradha Kar, PhD" +U3EMKF,Regularizing Python using Structured Control Flow,Talk (25 mins + Q&A),High Performance Computing,"In this talk we will present applied research and working code to regularize +Python programs using a Structured Control Flow Graph (SCFG). This is a novel +approach to rewriting programs at the source level such that the resulting +(regularized) program is potentially more amenable to compiler optimizations, +for example when using Numba[1] to compile Python. The SCFG representation of +a program is simpler to analyze and thus significantly easier to optimize +because the higher order semantic information regarding the program structure +is explicitly included. This can be of great benefit to many scientific +applications such as High Performance Computing (HPC), a discipline that relies +heavily on compiler optimizations to turn user source code into highly +performant executables. Additionally the SCFG format is a first step to +representing Python programs as Regionalized Value State Dependence Graphs +(RVSDGs). This is another recently proposed program representation which is +expected to unlock even more advanced compiler optimizations at the +Intermediary Representation (IR) level. The talk will cover an introduction to +the theory of SCFGs and RVSDG and demonstrate how programs are transformed. We +will start with simple Python programs containing control-flow constructs and +then show both the SCFG representation and the resulting regularized result to +illustrate the transformations.",Valentin Haenel +UDVD77,Introduction to NumPy,Tutorial,Data Science and Visualisation,"Are you starting to use Python for scientific computing? Join this tutorial to know more about NumPy, the building block for nearly all libraries in the scientific ecosystem. +You will learn how to manipulate NumPy arrays, understand how they store data and discover how to get optimal performances. 
+UF7LM8,A Hitchhiker's Guide to Contributing to Open Source,Tutorial,"Community, Education, and Outreach","Open-source projects are essential for scientific programming. They provide many tools and resources that can be customized for different scientific needs. However, sometimes the existing tools in a package don't meet all the requirements of a project. This is when contributing to open-source packages becomes important. By contributing, you can implement new functionalities, improve the software, and help keep the open-source community strong.
+
+This workshop will make contributing to open-source projects easier to understand. It will guide participants from just using the software to actively contributing to it. The workshop will address technical challenges such as interacting with web-based hosting services (like GitHub and GitLab), branching, and opening pull requests. Additionally, it will cover how to contribute documentation and ensure the correctness of the code.","Sebastian Berg, Nikoleta E. Glynatsi"
+UGJ3HQ,Mostly Harmless Fixed Effects Regression in Python with PyFixest,Talk (25 mins + Q&A),Data Science and Visualisation,"This session introduces PyFixest, an open-source Python library inspired by the ""fixest"" R package. PyFixest implements fast routines for the estimation of regression models with high-dimensional fixed effects, including OLS, IV, and Poisson regression. The library also provides tools for robust inference, including heteroscedasticity-robust and cluster-robust standard errors, as well as the wild cluster bootstrap and randomization inference. Additionally, PyFixest implements several routines for difference-in-differences estimation with staggered treatment adoption.
+
+PyFixest aims to faithfully replicate the core design principles of ""fixest"", offering post-estimation inference adjustments, user-friendly syntax for multiple estimations, and efficient post-processing capabilities. By making efficient use of JIT compilation, it is also one of the fastest solutions for regressions with high-dimensional fixed effects.
+
+The presentation will argue why there is a need for another regression package in Python, cover PyFixest's functionality and design philosophy, and discuss future development prospects.",Alexander Fischer
+UNYV7V,Probabilistic classification and cost-sensitive learning with scikit-learn,Tutorial,Machine and Deep Learning,"Data scientists are repeatedly told that it is absolutely critical to align their model training methodology with a specific business objective. While this is rather good advice, it usually falls short on details on how to achieve it in practice.
+
+This hands-on tutorial introduces helpful theoretical concepts and concrete software tools to help bridge this gap. The method will be illustrated on a worked practical use case: optimizing the operations of a fraud detection system for a payment processing platform.
+
+More specifically, we will introduce the concept of calibrated probabilistic classifiers, how to evaluate them, and how to fix common causes of mis-calibration.
+In a second part, we will explore how to turn probabilistic classifiers into optimal business decision makers.","Guillaume Lemaitre, Olivier Grisel"
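A minimal sketch of the calibration workflow this abstract describes, using public scikit-learn APIs: fit a classifier, inspect its reliability curve, and recalibrate it. The synthetic dataset, model choice, and binning are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.calibration import CalibratedClassifierCV, calibration_curve

# Imbalanced synthetic data standing in for e.g. fraud detection.
X, y = make_classification(n_samples=5000, weights=[0.9], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Reliability diagram data: observed frequency vs. mean predicted probability.
prob_true, prob_pred = calibration_curve(
    y_test, clf.predict_proba(X_test)[:, 1], n_bins=10
)

# Recalibrate with isotonic regression via cross-validation.
calibrated = CalibratedClassifierCV(clf, method="isotonic", cv=5)
calibrated.fit(X_train, y_train)
```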
+UXHSQC,Helmholtz Blablador and the LLM models' ecosystem,Talk (25 mins + Q&A),Machine and Deep Learning,"Helmholtz Blablador is the LLM inference server of the Helmholtz Foundation. This talk explores Blablador's role in hosting open-source LLM models as well as models developed in-house at the Juelich Supercomputing Centre (JSC), and places it in the context of the wider open-source LLM ecosystem.",Alexandre Strube
+XBXX89,"Forecasting foundation models: evaluation and integration with sktime – challenges and outcomes",Talk (25 mins + Q&A),Machine and Deep Learning,"Foundation models are here for forecasting! This will conclusively solve all forecasting problems with a one-model-fits-all approach! Or … maybe not?
+
+Fact is, a growing number of foundation models for time series and forecasting is hitting the market.
+
+To innocent end users, this situation raises various challenges and questions. How do I integrate the models as candidates into existing forecasting workflows? Are the models performant? How do they compare to more classical choices? Which one to pick? How to know whether to “upgrade”?
+
+At sktime, we have tried them so you don’t have to! Although you will probably be forced to anyway; even then, it’s worth sharing experiences.
+
+Our key challenges and findings are presented in this talk – for instance, the unexpected fragmentation of the ecosystem, difficulties in evaluating the models fairly, and more.
+
+(sktime is an openly governed community with a neutral point of view. You may be surprised to hear that this talk will not try to sell you a foundation model.)",Franz Kiraly
+Z98TJA,Introduction to Python,Tutorial,"Community, Education, and Outreach","This tutorial will provide an introduction to Python intended for beginners.
+
+It will notably introduce the following aspects:
+
+- built-in types
+- control flow (i.e. conditions, loops, etc.)
+- built-in functions
+- basic Python classes",Mojdeh Rastgoo
+ZKS8WX,From Logs to Insights: An Exploration of Infrastructure Logging and Clustering,Poster,Data Science and Visualisation,"The analysis of logging messages is a big challenge because of their massive number, different origins, and unspecified formats. These challenges can be partially addressed with NLP techniques, ultimately making it possible to detect, predict, or maybe even avoid incidents. We demonstrate a complete monitoring solution that includes clustering and uncovers warning-incident correlations with a Hawkes model. This model was previously applied successfully to earthquake prediction based on aftershocks. The Hawkes process model is mathematically well-defined and can process a large volume of data.",Arkadiusz Trawiński
+ZVBAKK,Image analysis in Python with scikit-image,Tutorial,Data Science and Visualisation,"Scientists are producing more and more images with telescopes, microscopes, MRI scanners, etc. They need automatable tools to measure what they've imaged and to help them turn these images into knowledge.
+This tutorial covers the fundamentals of algorithmic image analysis, starting with how to think of images as NumPy arrays, moving on to basic image filtering, and finishing with a complete workflow: segmenting a 3D image into regions and making measurements on those regions.","Lars Grüter, Marianne Corvellec, Stéfan van der Walt"
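A short sketch of the segment-and-measure workflow the scikit-image tutorial builds up to, using standard scikit-image functions. The tutorial works on a 3D image; this 2D synthetic toy image is an assumption for brevity.

```python
import numpy as np
from skimage import filters, measure

# Synthetic image: two bright rectangles plus a little noise.
rng = np.random.default_rng(0)
image = np.zeros((64, 64))
image[10:30, 10:30] = 1.0
image[40:55, 35:60] = 1.0
image += 0.1 * rng.standard_normal(image.shape)

# Threshold (Otsu), label connected regions, and measure each region.
binary = image > filters.threshold_otsu(image)
labels = measure.label(binary)
for region in measure.regionprops(labels):
    print(region.label, region.area, region.centroid)
```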
diff --git a/poster_parser/parser_speaker.py b/poster_parser/parser_speaker.py
index 68a62ea..8f4b8e3 100644
--- a/poster_parser/parser_speaker.py
+++ b/poster_parser/parser_speaker.py
@@ -5,7 +5,7 @@
 # %%
 import pandas as pd
 
-proposals = pd.read_csv("euroscipy-2024_sessions.csv")
+proposals = pd.read_csv("euroscipy-2024_confirmed.csv")
 
 # %%
 proposals.head()
@@ -15,17 +15,21 @@
 output_md = ""
 
 for _, proposal in proposals.iterrows():
-    output_md += f"\n## {proposal['Proposal title']}\n\n"
-    session_type=proposal['Session type']
-    track = proposal['Track']
-    name = proposal["Speaker names"]
-    if not isinstance(name, str):
-        name = "To Be Defined"
-    output_md += f"*{name}*, *{session_type}*, *{track}*\n\n"
-    output_md += f"**Abstract:** {proposal['Abstract']}\n\n"
-    output_md += f'\n\n'
-
-with open('confirmed.md', 'w') as file:
+    session_type = proposal['Session type']
+    # Posters are collected separately by parser_posters.py, so skip them
+    # here entirely (title, speaker line, and abstract alike).
+    if session_type == 'Poster':
+        continue
+    output_md += f"\n## {proposal['Proposal title']}\n\n"
+    track = proposal['Track']
+    name = proposal["Speaker names"]
+    if not isinstance(name, str):
+        name = "To Be Defined"
+    output_md += f"*{name}*, *{session_type}*, *{track}*\n\n"
+    output_md += f"**Abstract:** {proposal['Abstract']}\n\n"
+    output_md += "\n\n"
+
+with open('confirmed_talks.md', 'w') as file:
     file.write(output_md)
 
 # %%
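For reference, the f-strings in the loop above render each confirmed (non-poster) session into markdown of the following shape; the values here are illustrative placeholders, not actual CSV contents:

```markdown
## Example Talk Title

*Jane Doe*, *Talk (25 mins + Q&A)*, *Data Science and Visualisation*

**Abstract:** One-paragraph abstract taken from the 'Abstract' column.
```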