Emily's internship blog #192

EmilyXinyi · 2024-08-13T12:30:07Z

Blog post about Emily's summer internship

welcome · 2024-08-13T12:30:10Z

💖 Thanks for opening this pull request! 💖
scikit-learn community really appreciates your time and effort to contribute to the project.
Please make sure you have read our Contributing Guidelines and filled in our pull request template to the best of your ability.

francoisgoupil · 2024-08-14T14:42:03Z

@EmilyXinyi Could you maybe add a link to or embed your Vlog here? I think it would be a nice addition to this well written blogpost about your internship experience.

EmilyXinyi · 2024-08-14T15:49:24Z

> @EmilyXinyi Could you maybe add a link to or embed your Vlog here? I think it would be a nice addition to this well written blogpost about your internship experience.

I embedded the video for now, though I am not sure how good it would look on the blog because it's a portrait (vertical) video. Alternatively I can post it on some social media first and link it here.

francoisgoupil

Some minor edits

francoisgoupil · 2024-08-14T14:34:45Z

_posts/2024-08-15-emily-internship.md

+
+### Open Source Development
+
+I started my contributions by adapting certain metrics (tweedie, mean absolute percentage error, etc.) to be Array API compatible under the guidance of my mentor, Olivier. The Array API standard is a cross-library API for array operations in Python, designed to improve interoperability and consistency across different array libraries. This also means that scikit-learn algorithms written with NumPy for CPU can run on other hardware (GPU) with PyTorch or CuPy, which can greatly improve performance. As I gained more familiarity with the scikit-learn codebase and the Array API, I began working on adapting “larger” functions to be Array API compatible: functions that are a lot more fundamental, have a lot more dependencies, and are a lot more challenging and a lot more fun to work on.


Suggested change

I started my contributions by adapting certain metrics (tweedie, mean absolute percentage error, etc.) to be Array API compatible under the guidance of my mentor, [Olivier](https://github.com/ogrisel). The Array API standard is a cross-library API for array operations in Python, designed to improve interoperability and consistency across different array libraries. This also means that scikit-learn algorithms written with NumPy for CPU can run on other hardware (GPU) with PyTorch or CuPy, which can greatly improve performance. As I gained more familiarity with the scikit-learn codebase and the Array API, I began working on adapting “larger” functions to be Array API compatible: functions that are a lot more fundamental, have a lot more dependencies, and are a lot more challenging and a lot more fun to work on.
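
As a rough illustration of what "Array API compatible" means in the paragraph above, here is a minimal sketch of a toy mean absolute percentage error that only calls functions defined by the Array API standard through a namespace object. The function name, the `xp` parameter, and the epsilon guard are assumptions made for this example; this is not scikit-learn's actual implementation. Only NumPy is exercised here.

```python
import numpy as np


def mean_absolute_percentage_error(y_true, y_pred, xp=np):
    """Toy MAPE written against an array namespace `xp` (hypothetical helper).

    Only Array API standard functions are called on `xp`, so in principle
    another compliant namespace could be passed instead of NumPy. That is an
    assumption of this sketch and is not tested here.
    """
    y_true = xp.asarray(y_true, dtype=xp.float64)
    y_pred = xp.asarray(y_pred, dtype=xp.float64)
    # Floor the denominator with machine epsilon to avoid dividing by zero.
    eps = xp.asarray(xp.finfo(xp.float64).eps, dtype=xp.float64)
    denom = xp.maximum(xp.abs(y_true), eps)
    return float(xp.mean(xp.abs(y_pred - y_true) / denom))


# Each prediction is off by 10% of its target.
score = mean_absolute_percentage_error([100.0, 200.0], [110.0, 180.0])
print(score)
```

The point of the sketch is that the function body never names NumPy directly; everything goes through `xp`, which is what would let the same logic be reused with a different array library.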

francoisgoupil · 2024-08-14T14:35:27Z

_posts/2024-08-15-emily-internship.md

+
+### Chinese Community Outreach
+
+China has the second-largest user group of scikit-learn. As a community, we believe that we can be more inclusive to ease Chinese contribution and do what is necessary to recruit more Chinese contributors. Therefore, I need to find out who uses scikit-learn and where, learn whether development is happening on platforms other than GitHub (which tends to be very slow in China), and establish scikit-learn’s official presence in the Chinese community.


Suggested change

China has the second-largest user group of scikit-learn according to documentation web analytics. As a community, we believe that we can be more inclusive to ease Chinese contribution and do what is necessary to onboard more Chinese contributors. Therefore, I need to find out who uses scikit-learn and where, learn whether development is happening on platforms other than GitHub (which tends to be very slow in China), and establish scikit-learn’s official presence in the Chinese community.

francoisgoupil · 2024-08-14T14:38:44Z

_posts/2024-08-15-emily-internship.md

+
+I also had weekly Peer Programming sessions with Loïc and Stefanie, where my piled-up questions from the week outside of Array API would be answered, and I would almost always learn something new about developer tools or programming fundamentals. 
+
+On the Chinese community outreach side, I have always worked with the scikit-learn communications team. Here I must give a special shoutout to my manager François, who is also part of the communications team, for always being supportive and believing in my outreach efforts, especially because I was nervous about doing this kind of task and using Chinese in a professional context for the first time. I also got to interact with [Charlie](https://charlie-xiao.github.io/) (yes, the core-dev Charlie), who is located in China and helped me tremendously with tasks that require physical presence.


Suggested change

On the Chinese community outreach side, I have always worked with the scikit-learn communications team. Here I must give a special shoutout to my manager [François](https://www.linkedin.com/in/françois-goupil/), who is also part of the communications team, for always being supportive and believing in my outreach efforts, especially because I was nervous about doing this kind of task and using Chinese in a professional context for the first time. I also got to interact with [Charlie](https://charlie-xiao.github.io/) (yes, the core-dev Charlie), who is located in China and helped me tremendously with tasks that require physical presence.

Charlie-XIAO

Also seems that assets/videos/emily_blog_vid.MOV can be removed now that we are using the TikTok link?

Emily's internship blog

d632043

francoisgoupil assigned francoisgoupil, ogrisel and Charlie-XIAO Aug 14, 2024

adding media

20aec6b

change video to be embedded tiktok link

e0a255b

francoisgoupil reviewed Aug 27, 2024

Charlie-XIAO reviewed Aug 27, 2024

EmilyXinyi and others added 3 commits August 29, 2024 06:44

address comments

4130c7d

edited kubecon to past tense and added picture

82079cc

Merge branch 'scikit-learn:main' into main

494c2a0
