Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDDS-11966. Add recon support #16

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

fraochan
Copy link

@fraochan fraochan commented Dec 15, 2024

What changes were proposed in this pull request?

This PR adds the files required for an optional Recon.

Recon does not require a persistent volume. I hesitated to do a deployment, but the Apache Ozone Git project suggests a statefulset as an example. I don't mind converting it to a deployment if you find it necessary.

What is the link to the Apache JIRA

HDDS-11966

How was this patch tested?

I created a local cluster with kind, then :

  • I deployed an Apache Ozone cluster with Recon:
helm install ozone ./charts/ozone --set recon.enabled=true    
  • I waited for the datanodes to register with Recon and checked that the datanodes were listed in the Recon UI localhost:9888/#/Datanodes with a port-forward:
kubectl port-forward svc/ozone-recon 9888:9888
  • I created a bucket with aws-cli:
aws s3api --endpoint http://localhost:9878 create-bucket --bucket=wordcount
helm upgrade ozone ./charts/ozone --set recon.enabled=false

Copy link

@ptlrs ptlrs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR @fraochan. It mostly looks good. I have a few comments.

Comment on lines 60 to 61
- name: WAITFOR
value: {{ $.Release.Name }}-scm-0.{{ $.Release.Name }}-scm-headless:{{ .Values.scm.service.port }}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should consider if waiting for another pod is strictly necessary.
If we do have to wait then we should do it in an initcontainer.

Also, we should use the SCM service instead of addressing the pod directly. At any point we have to consider the pod -scm-0 being down and should instead rely on other SCM pods (if any in HA mode) by using the SCM service.

Copy link
Author

@fraochan fraochan Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was inspired by the Apache Ozone Kubernetes examples of the source repository. But indeed, I agree with you.

Recon runs in a loop until it can communicate with the SCM headless service. In my opinion, deleting this part is enough.

Otherwise, I'll do an initContainer (although I'm not sure yet what I'll do as a check to make Recon wait, probably not too hard but I haven't thought about it yet).

{{- $tolerations := or .Values.recon.tolerations .Values.tolerations }}
{{- $securityContext := or .Values.recon.securityContext .Values.securityContext }}
apiVersion: apps/v1
kind: StatefulSet
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should consider making this pod stateless by using a Deployment.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have been more confident, that was my initial suggestion. But in line with the rest of the Helm Chart, I've made it a Statefulset.

So I completely agree with you and have made the changes.

By the way, I'd say the S3 gateway is also stateless and could be a deployment, couldn't it?

charts/ozone/templates/recon/recon-statefulset.yaml Outdated Show resolved Hide resolved
@@ -73,4 +78,8 @@ app.kubernetes.io/instance: {{ .Release.Name }}
value: "1"
- name: OZONE-SITE.XML_dfs.datanode.use.datanode.hostname
value: "true"
{{- if .Values.recon.enabled }}
- name: OZONE-SITE.XML_ozone.recon.address
value: {{ include "ozone.recon.pod" . }}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should point to the Recon service name instead of the pod.
We should be able to refer to the recon pod without specifying the type of the pod. This is where the service comes into the picture.

ozone.recon.pod defined in this file above can be removed if not needed elsewhere.

Copy link
Author

@fraochan fraochan Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just had to add the rpc port to the service as well, but yes,we might use the service.

@fraochan fraochan changed the title Add recon support HDDS-11966. Add recon support Dec 19, 2024
@fraochan
Copy link
Author

@ptlrs, thanks for the constructive review, I've modified the code accordingly and I'm ready for another one :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants