generated from jhudsl/OTTR_Template
-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy path07-sharing-images.qmd
157 lines (96 loc) · 10.6 KB
/
07-sharing-images.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
# Best practices for sharing images
```{r, out.width = "100%", echo = FALSE}
ottrpal::include_slide("https://docs.google.com/presentation/d/1T5Lfei2UVou9b0qaUCrWXmkcIwAao-UcN4pHMPEE4CY/edit#slide=id.g2effc5b673e_0_852")
```
```{r panel-setup, include = FALSE}
xaringanExtra::use_panelset()
xaringanExtra::style_panelset_tabs(font_family = "inherit")
```
Reproducibility is a community activity and because of this sharing images is clearly a big component of why images can aid in reproducibility.
```{r, out.width = "100%", echo = FALSE}
ottrpal::include_slide("https://docs.google.com/presentation/d/1T5Lfei2UVou9b0qaUCrWXmkcIwAao-UcN4pHMPEE4CY/edit#slide=id.g2effc5b673e_0_492")
```
But before you really get going with more containerization and swapping images with others in the wide world, its time we talk about some best practices when doing so.
Some of these best practices are ethically and legally consequential, while others are just to save you time and frustration. Ultimately we want to make sure you are containerizing responsibly.
## Best practices of working with containers
### Do NOT put protected data on your shared image
Sharing is crucial to community driven science.
Not sharing at all is not an option, this impedes the progress of research. BUT we do need to discuss *when*, *what*, and *who* of appropriate sharing. If you work with protected data types, like Protected Health Information (PHI) or Personally Identifiable Information (PII) and want to use your protected data with containers, that's great!
However, there are some very strong dos and don'ts associated with protected data.
If you are working with protected data (or are not sure if you are), we also **HIGHLY encourage you to talk with data privacy experts** and ensure that the practices you are using are appropriate and making sure they protect the individuals' whose data you have.
Bottom line: **DO NOT put protected data like PII or PHI on publicly shared Docker images!**
The more layers of safety checks and cushions for human mistakes (which will happen), the better!
#### Alternatives:
You can use one or more of these alternatives just make sure you clear it with the proper channels like IRBs and security experts!
- Do NOT put the data on the image. Store the data separately from the image. Don't even build the docker image near where those data are stored. You may be able to, from a secure location, run a Docker container and access the data through a volume assuming the data is not moved anywhere. Not only does storing data on an image make the image really big, but obviously in the case of protected data its just not a good idea.
- If for some reason you must put something protected on an image, your other option is you can push the image to a secure and protected location site where only IRB approved individuals have access to it. Amazon Web Services Container registry has options as does Microsoft Azure for example.
- If for some reason you must put something protected on an image, and you can't use a private registry, your other option is: **Don’t push the image anywhere** This obviously makes it harder to share methods, but it also could be possible you could share a version of the Dockerfile of this image that doesn't have protected data information on there and this Dockerfile could just be shared for methods purposes.
```{r, out.width = "100%", echo = FALSE}
ottrpal::include_slide("https://docs.google.com/presentation/d/1T5Lfei2UVou9b0qaUCrWXmkcIwAao-UcN4pHMPEE4CY/edit#slide=id.g30a4ed49e59_0_1276")
```
Data privacy is just one concern of best practices with containers. There's also some other tips we can talk about at this time.
## Do not make one super large image that covers all your use cases
Just as in Lord of the Rings that the "one ring to rule them all" was dangerous, so too is "one docker image to rule them all". Big images are a headache to deal with. They take a long time to build and to pull, and if anything is broken on them, it can be a headache to troubleshoot.
### Alternatives:
There's no limit on the number of images you can make! There can be a fine art to making separate images for each use case. Obviously too many images could be hard to keep track of which is which, but a very intentional organizational plan for your images can take you far.
## Version control your Dockerfiles
Keeping your Dockerfile stored only locally and untracked is likely to lead to headaches. Version control is always a good idea and containerization is no exception! To learn more about version control.
If you do decide to keep your Dockerfiles on GitHub, there are a lot of useful tools to help you manage your images there. GitHub Actions marketplace for examples has a lot of handy tools. [See our GitHub Actions course for more on this](https://hutchdatascience.org/GitHub_Automation_for_Scientists/).
## Versioning is always a good idea
Just like with software development, it's good to have a system to keep track of things as you develop. Container development can easily get out of hand. Especially if others are using your images you want to be clear about which versions of containers are in a more risky earlier "development" stage and which have been more vetted and ready for use.
Versioning tags can be whatever you'd like them to be. [Versioning number systems used elsewhere](https://en.wikipedia.org/wiki/Software_versioning#Schemes) like `major.minor.patch` are also used with images.
You can alter versioning numbers either when you are building your image initially or by using the `tag` command:
```
docker tag cool-new-image:2 username/cool-new-image:2
```
Other strategies for versioning can mirror git branch naming conventions, so `main` for the vetted version of the image and `dev` or `stage` for a version that's still being worked on but will probably eventually be released.
There's no one size fits all for image tags, its really up to whatever you and your team decide works best for your project.
Regardless being intentional, consistent, and clearly documenting any system you choose to use are the main keys.
## Don't use random developers’ docker images
Images and containers can be difficult to have transparency into the build at times. And unfrotunately not everyone on the internet who makes images is trustworthy. To avoid security issues make sure to stick to trusted sources for your docker images. Either verified individuals or companies. Try not to wander too far off the beaten path. Better to make your own image from a trusted source's base image than to use a random strangers' image that promises to do it all.
## Summary of best practices
```{r, out.width = "100%", echo = FALSE}
ottrpal::include_slide("https://docs.google.com/presentation/d/1T5Lfei2UVou9b0qaUCrWXmkcIwAao-UcN4pHMPEE4CY/edit#slide=id.g30a4ed49e59_0_1282")
```
## Container Registries
To share our image with others (or ourselves), we can push it to an online repository. There are a lot of options for container registries. Container registries are generally cross-compatible, meaning you can pull the image from just about anywhere if you have the right command and software. You can use different container registries for different purposes.
[This article has a nice summary of some of the most popular ones](https://octopus.com/blog/top-8-container-registries).
And here's a TL;DR of the most common registries:
- [Dockerhub](https://hub.docker.com/) – widely used, a default
- [Amazon Web Services Container Registry](https://aws.amazon.com/containers/) - options for keeping private
- [Github container registry](https://aws.amazon.com/containers/) - If you are using GitHub packages works with that nicely
- [Singularity](https://docs.sylabs.io/guides/3.5/user-guide/introduction.html) – if you need more robust security
We encourage you to consider what container registries work best for you specific project and team. Here's a starter list of considerations you may want to think of, roughly in the order of importance.
1. If you have protected data and security concerns (like we discussed earlier in this chapter) you may need to pick a container registry that allows privacy and strong security.
2. Price -- not all container registries are free, but many of them aren't. Think about what kind of budget do you have for specific purpose. Paying is generally not a necessity, so don't pay for a container registry subscription unless you need to.
2. What tools are you already using? For example GitHub, Azure, and AWS have their own container registries, if you already are using these services you may consider using their associated registry. (Note GitHub actions works quite seamlessly with Dockerhub, so personally I haven't had a reason to use GitHub Container Registry but it is an option.)
4. Is there an industry standard? Where are your collaborators or those at your institution storing your images?
While there are lots of container registry options, for the purposes of this tutorial, we'll use Dockerhub. Dockerhub is one of the first container registries and still remains one of the largest. For most purposes, using Dockerhub will be just fine.
## Activity Instructions
<input type="checkbox"> First you'll need to make sure you have a dockerhub account.
<input type="checkbox"> Go to the website and login or create an account https://hub.docker.com/
<input type="checkbox"> Next, locally you'll need to log into your account.
```
docker login -u your_user_name
```
```{r, out.width = "100%", echo = FALSE}
ottrpal::include_slide("https://docs.google.com/presentation/d/1T5Lfei2UVou9b0qaUCrWXmkcIwAao-UcN4pHMPEE4CY/edit#slide=id.g2effc5b673e_0_523")
```
<input type="checkbox"> It will ask for your password. Enter that password.
<input type="checkbox"> If you've logged in successfully, now we need to prep our image by putting our username in the name.
We can do that by using the `tag` command. But replace `<your_username>` with whatever your Dockerhub username is.
```
docker tag cool-new-image:2 <your_username>/cool-new-image:2
```
```{r, out.width = "70%", echo = FALSE}
ottrpal::include_slide("https://docs.google.com/presentation/d/1T5Lfei2UVou9b0qaUCrWXmkcIwAao-UcN4pHMPEE4CY/edit#slide=id.g2effc5b673e_0_530")
```
<input type="checkbox"> Now we can push it to our repository.
```
docker <your_username> push username/cool-new-image:2
```
You can also do this with this button on Docker Desktop:
```{r, out.width = "70%", echo = FALSE}
ottrpal::include_slide("https://docs.google.com/presentation/d/1T5Lfei2UVou9b0qaUCrWXmkcIwAao-UcN4pHMPEE4CY/edit#slide=id.g2effc5b673e_0_539")
```
Go to https://hub.docker.com/repositories/ and you should see your image listed there!