September 24, 2024

Harbor: the artifacts registry

In this episode of The Landscape, Bart and Sylvain interview Harbor maintainer Vadim Bauer. Harbor, a graduated CNCF project, is a comprehensive container image management solution that goes beyond just storing and retrieving images. It provides features like security, access control, multi-tenancy, and compliance tracking.

In this episode, you will learn about:

  • Harbor’s powerful image replication feature, which synchronizes artifacts to and from other registries and is useful for security, performance, and disaster recovery purposes.
  • Harbor adoption, with success stories from cloud providers like OVH and Vultr, who use it to offer container registry services to their customers.
  • Harbor integration with other cloud native projects like Kubernetes, Trivy, and Docker Distribution, enhancing its functionality.
  • When not to use Harbor: it may not be the best fit for smaller teams already using well-integrated registries like Docker Hub or GitLab Registry.
  • How to use Harbor in an AI workflow to store AI models.
  • How to contribute to Harbor, and the project’s success with the LFX Mentorship program, which helps young engineers get involved in open source projects.

Read the transcript

Bart Farrell  0:00  

In this episode of The Landscape, Sylvain and I will be taking a look at the CNCF project Harbor, which tackles some of the biggest challenges in container image management. Harbor looks at security, compliance, and efficiency, ensuring secure and trusted image deployment with features like vulnerability scanning and image signing. It also offers fine-grained access control, supports multi-tenancy, and tracks all image-related activities for compliance. Plus, it streamlines image distribution across regions. It integrates smoothly with Kubernetes and CI/CD pipelines, making it a central hub in cloud native infrastructures. To find out more, Sylvain and I will be speaking to Vadim Bauer, who’s a maintainer of the project and also the founder of 8gears AG. Let’s take a look at the episode. Vadim, welcome to The Landscape. For our listeners out there, can you tell them a little bit about who you are and what you do?

Vadim Bauer  0:52  

Yeah, thank you for having me. My name is Vadim Bauer, I’m one of the maintainers of Harbor, and I’m happy to be here.

Sylvain Kalache  1:00  

Can you tell us what problem Harbor solves?

Vadim Bauer  1:05  

Yeah, so Harbor, in its essence, is a container registry, and as an OCI container registry it allows you to store images, and you can pull images from and push images to the registry. This is the essence of Harbor. However, in terms of usability and functionality, you not only want to be able to push and pull images, you also want to protect your images and have access control over them. You want to structure them. If you’re a growing organization, even a team of five or six people, you already start to segregate images into different buckets, or rather projects. You want to do this, and provide access to your customers and your users, token management, and all those things. This is where Harbor comes into play: authentication, authorization, and basically moving away from pure image storage. So it’s not just an image storage solution but an image management solution, so you can manage the whole life cycle of your OCI artifacts. We’re not talking only about container images anymore. We’re talking more about OCI artifacts, because images are still a major part, but we have a growing number of other types of artifacts, maybe Wasm, and Helm charts are nowadays stored in OCI registries. This is where Harbor has its functionality, everything around container management, and this is what distinguishes it a bit from, for example, Docker Distribution. I would consider Docker Distribution more of a framework than a running application, because it lacks all the functionality that Harbor brings, like authentication and authorization. It’s just a way to store and retrieve images. And this is where Harbor comes in as an addition to Docker Distribution.
I mean, Harbor is built on top of Docker Distribution, so we’re not reinventing the wheel here. We’re using Docker Distribution underneath for the basic operations of image storage and retrieval. So this is, in essence, what Harbor provides. There are, of course, a lot more features. Some of them are more interesting for larger organizations, and some are more interesting for general image management.
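
Harbor’s project concept shows up directly in image references: the first path segment after the registry host names the project. As a minimal sketch, assuming references of the form `registry/project/repository[:tag]` (the host `harbor.example.com` and all names below are hypothetical, and digest references such as `@sha256:...` are deliberately not handled), a reference can be split like this:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HarborRef:
    registry: str    # registry host, optionally with a port
    project: str     # Harbor project, the unit that access control is attached to
    repository: str  # repository inside the project, may itself contain slashes
    tag: str


def parse_ref(ref: str) -> HarborRef:
    """Split 'registry/project/repository[:tag]' into its parts."""
    registry, _, rest = ref.partition("/")
    project, _, repo_and_tag = rest.partition("/")
    if not (registry and project and repo_and_tag):
        raise ValueError(f"expected registry/project/repository[:tag], got {ref!r}")
    # A tag separator is a colon that appears after the last slash.
    colon = repo_and_tag.rfind(":")
    if colon > repo_and_tag.rfind("/"):
        repository, tag = repo_and_tag[:colon], repo_and_tag[colon + 1:]
    else:
        repository, tag = repo_and_tag, "latest"  # conventional default tag
    return HarborRef(registry, project, repository, tag)


if __name__ == "__main__":
    print(parse_ref("harbor.example.com/team-a/web/api:1.2"))
```

Partitioning on the first slash keeps a port in the registry host intact, so `harbor.example.com:8443/library/nginx` is still read as project `library` and repository `nginx`.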

Sylvain Kalache  3:49  

Good, thank you. You mentioned a number of different features. We won’t have the time to go over all of them today, but is there one particular feature that you really like?

Vadim Bauer  4:05  

Well, there are quite a few. One of the things that I like about Harbor is its roundness, in the sense that it provides all the essential parts that you need if you want to do the whole management part of the images. And this is usable, from my perspective and from what I see, for small organizations: teams of three, five, or ten developers with two or three projects are using Harbor and adopting Harbor because it’s usable for them. And large organizations with thousands of employees and thousands of developers storing terabytes of data are also using Harbor and are happy with it. So this is one of the things that I like about Harbor: it is suitable for small and large teams. Feature-wise, I would say image replication is a powerful capability. You can use replication to bring all your artifacts into one place. For example, you’re consuming artifacts from Docker Hub or from the GitHub registry, and you want to have all those images locally, on your premises or in your environment, just in case, and to make sure that you can scan them for vulnerabilities and things like this. This is one of the powerful features that Harbor provides, and you can use it two ways: you can synchronize to other registries, and you can synchronize from other registries to yourself. This is something that I see people use a lot in Harbor, depending on the perspective: large organizations use it more extensively than smaller ones. But it is quite a popular use case for Harbor.
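
To make the replication idea concrete, here is a minimal sketch of the selection step: deciding which artifacts of a source registry should be copied. It is not Harbor’s implementation. The function name, the artifact list, and the use of Python’s `fnmatch` globbing are assumptions for illustration, and Harbor’s own filter syntax may differ in detail.

```python
from fnmatch import fnmatchcase


def select_for_replication(artifacts, name_pattern, tag_pattern="*"):
    """Return the 'name:tag' references whose name and tag match the glob patterns."""
    selected = []
    for ref in artifacts:
        name, _, tag = ref.rpartition(":")
        if not name:
            continue  # skip references without an explicit tag
        if fnmatchcase(name, name_pattern) and fnmatchcase(tag, tag_pattern):
            selected.append(ref)
    return selected


if __name__ == "__main__":
    # Hypothetical contents of a source registry.
    source = ["library/nginx:1.27", "library/redis:7", "team-a/api:dev"]
    for ref in select_for_replication(source, "library/*"):
        print("would replicate", ref)
```

The same selection works in both directions mentioned above: run it against a remote registry’s listing to pull artifacts in, or against the local listing to push them out.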

Sylvain Kalache  6:00  

Yeah, it makes a lot of sense. As you said, whether it’s for security or maybe for performance, if you hope to cache images or artifacts locally so that when you deploy, you have them already on your machine, and obviously for backup. That’s something that, you know, isn’t exciting, but when you have the right disaster recovery plan in mind, it sounds like Harbor can be part of that. You seem to say that even small teams are using Harbor. One thing that we always want to hear about is success stories. Do you have any success stories to share, whether from a large organization or a smaller one, any specific use case that you’ve heard of from companies that use Harbor to solve a specific pain point?

Vadim Bauer  6:53  

I know of a few cloud providers who are adopting Harbor, or have adopted Harbor, and are using it to offer it one-to-one to their customers. OVH does it, so you can just go to OVH and get a registry there. The other use case is where cloud providers adopt Harbor in a way that it’s not really transparent to the users that there is Harbor underneath. I think Vultr is one of the organizations using it that way. So you have Harbor underneath, but you don’t actually see it. These are a few of the use cases that I would say are a success factor.

Sylvain Kalache  7:45  

I mean, for those who don’t know, OVH, or OVHcloud as it is branded in the US, is a name that many of you in the audience may not know, but it’s the largest European cloud provider. So it’s very impressive to have OVH use Harbor as part of their offering.

Bart Farrell  8:10  

Now, Vadim, is Harbor often integrated with other CNCF projects, and if so, which ones and why?

Vadim Bauer  8:17  

It’s integrated with, or used in combination with, Kubernetes, of course. I mean, this is the main use case, I would say, using it with Kubernetes. Then there is Trivy. I’m not sure if this is a CNCF project, but if it is, then Harbor is using Trivy underneath. So you can use image scanning, vulnerability scanning, and SBOM generation with Harbor, so you have an integration with that. This is where we use Trivy as one of the options, built in. And then, of course, as I briefly mentioned, we’re also using Docker Distribution, and Docker Distribution is also a CNCF project, so we are tightly integrated with those two projects.

Bart Farrell  9:03  

Now, you mentioned quite a few cases where it adds a lot of value, and the kinds of companies that are using it. What are situations in which it doesn’t make sense to use Harbor?

Vadim Bauer  9:14  

A lot of times it depends a bit on the organizational structure. In a smaller organization, I think it’s clear: if you’re using, for example, Docker Hub, GitLab Registry, or the GitHub registry, and the feature set of those registries is suitable for you because it’s nicely integrated and you’re happy with that, there is definitely no need to switch to Harbor if you don’t miss anything there and you just want to use it for storing images. For larger organizations, I would see two use cases where it may not fit. Sometimes, depending on the organizational structure, you have teams that are really independent. They don’t share any code, they don’t share any images, and they’re also independent in how they’re structured and organized, so they probably also host their own registries. Some of those teams in the organization are using ACR, ECR, or Docker Hub, and others are using other products. In those cases, if you have this kind of fragmented landscape of different units using different things, I would say that it might not make sense to use Harbor. It really makes a lot of sense if you want to have all your artifacts in one place, because you want to do some analytics and logic on them. This is something that CISOs want to see, because they want to have control over the artifacts that get deployed. I wouldn’t say it’s important for everyone. It’s a bit about the maturity level that an organization has.
So if you’re adopting containers, there is a bit of a Wild West in the beginning, and then things get structured and organized. You implement processes and structures, and then you have pipelines, security pipelines and supply chain pipelines, and all these workflows are built up over the years. Then you end up with some sort of central place where you want to have your artifacts. And if you’re not there yet, then Harbor is probably not yet the right thing for you.

Bart Farrell  11:50  

And you know, we’re in the midst of this AI hype wave. Is there anything that’s going on in Harbor related to AI that people should be aware of? Do you have anything planned for the future? Anything we should know about that?

Vadim Bauer  12:02  

Well, we’re also confronted with these AI topics in our project, because people are using, or abusing, Harbor for their AI needs. Mostly, organizations are using Harbor to store models in their registry, because OCI is becoming the de facto standard for artifacts, and you can store models in Harbor. You can build your own integrations and your own tools with ORAS, for example, which is an awesome tool, I think also a CNCF one, where you can pack everything into an OCI artifact. And people do this: they use Harbor for storing trained models, the training data, and all those things. Currently, I wouldn’t say it’s not officially supported, but there is no formal support for that. We are looking into ways to support these workflows and make it easier for people to use Harbor for other types of artifacts.
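
What “packing a model into an OCI artifact” amounts to can be sketched as building an OCI image manifest whose single layer is the model file. This is a minimal illustration under stated assumptions, not what ORAS or Harbor do internally: the artifact type `application/vnd.example.model.v1` and the file name are hypothetical, and nothing is pushed to a registry.

```python
import hashlib
import json


def descriptor(media_type: str, blob: bytes, title: str = "") -> dict:
    """Build an OCI content descriptor: media type, sha256 digest, and size."""
    desc = {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(blob).hexdigest(),
        "size": len(blob),
    }
    if title:
        # Annotation commonly used to carry the original file name.
        desc["annotations"] = {"org.opencontainers.image.title": title}
    return desc


def model_manifest(model_bytes: bytes, filename: str, artifact_type: str) -> dict:
    """Describe a model file as a one-layer OCI artifact manifest."""
    return {
        "schemaVersion": 2,
        "mediaType": "application/vnd.oci.image.manifest.v1+json",
        "artifactType": artifact_type,
        # Artifacts without a real config point at the empty JSON object.
        "config": descriptor("application/vnd.oci.empty.v1+json", b"{}"),
        "layers": [descriptor("application/octet-stream", model_bytes, filename)],
    }


if __name__ == "__main__":
    manifest = model_manifest(b"fake model weights", "model.bin",
                              "application/vnd.example.model.v1")  # hypothetical type
    print(json.dumps(manifest, indent=2))
```

Because the manifest is an ordinary OCI manifest, a registry that accepts OCI artifacts can store it next to container images, which is why model storage works today even without formal support.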

Sylvain Kalache  13:17  

That’s very interesting. It looks like that could give, not a second life, but another life to Harbor and make the project central in another capacity. You mentioned analytics. Can you tell us more about this? Are there some built-in analytics features that users can rely on when they are using Harbor?

Vadim Bauer  13:41  

Well, analytics in the sense that you can query the API and find out what artifacts you have, and for each artifact you can find out what vulnerabilities it has. You have a vulnerability scanner with Harbor, and you can plug in other scanners or change scanners. You can export the vulnerability list and then use the exported vulnerabilities for further processing. So these are the analytics capabilities that we provide: exporting CVEs, exporting SBOMs, and also exporting the artifact data, so you can investigate how images are composed, on what base images, and things like this. But we just provide the data, not the analytical part itself.
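
Since Harbor provides the data and leaves the analysis to the user, the “further processing” step could look like the sketch below. It assumes the exported vulnerabilities have already been loaded into a list of dictionaries. The field names `artifact`, `cve`, and `severity` and the sample records are hypothetical and not the actual layout of Harbor’s export.

```python
from collections import Counter


def severity_counts(records):
    """Count exported vulnerability records per severity level."""
    return Counter(record["severity"] for record in records)


def artifacts_with(records, severity):
    """Return the sorted, de-duplicated artifacts that have a finding of the given severity."""
    return sorted({r["artifact"] for r in records if r["severity"] == severity})


if __name__ == "__main__":
    # Hypothetical export rows, for illustration only.
    export = [
        {"artifact": "team-a/api:1.2", "cve": "CVE-0000-0001", "severity": "Critical"},
        {"artifact": "team-a/api:1.2", "cve": "CVE-0000-0002", "severity": "Low"},
        {"artifact": "library/nginx:1.27", "cve": "CVE-0000-0003", "severity": "Low"},
    ]
    print(severity_counts(export))
    print(artifacts_with(export, "Critical"))
```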

Bart Farrell  14:31  

And, you know, there are over 300 contributors, and you’ve been a maintainer for quite some time. Is there anyone that you’d like to give a shout-out to while we’re speaking?

Vadim Bauer  14:41  

I’m happy for the project itself and for the maintainers. I mean, we have had over 300 contributors over the years. And of course, I want to shout out all the Harbor maintainers and contributors who have contributed to Harbor over the last years. I know it’s always a difficult task. The bar is always high. There are always a lot of requests coming from the user base, sometimes wild and crazy requests, and you’re always under a bit of pressure. It’s difficult to maintain these things. So I would shout out to all Harbor maintainers, but also to all maintainers of open source projects, because we’re kind of in the same boat when we’re talking about maintaining open source.

Bart Farrell  15:38  

And would you say that that’s the hardest part, staying on top of all those requests that are coming in and trying to prioritize them? Or is there anything else where, for people who are interested in becoming a maintainer, you’d say, well, I really wish I knew this when I got started?

Vadim Bauer  15:53  

I think it’s a bit of a maturity thing. If you’re starting as a maintainer, you probably try to please all users equally, and you treat all requests seriously. But if you have so many requests, you need to do triage. You need to distinguish which requests are important and relevant, and which bugs are relevant. It’s not just about feature requests. It also applies to bugs, because you cannot fix all the bugs. There are so many use cases and edge cases that you probably cannot fix them all. And that is difficult to cope with, maybe for a lot of engineers and people like us who work in a structured way and really want to have everything in order. To say, okay, this is the bug, that’s it, we have to live with that, it’s not an important bug. It’s a bit difficult to express this to users, because this is what they feel. But if you have the big picture, you need to deal with that, and you can say, yeah, that’s how it is. You have to accept the fact that you cannot please everyone or every user.

Sylvain Kalache  17:18  

So if some of our listeners want to get involved in Harbor, are you looking for specific skills, or is there any specific area where you need help?

Vadim Bauer  17:29  

Well, we’re definitely open to new contributors. There’s definitely a lot of work in the Harbor space, in a lot of areas. The documentation can be improved, so if you’re not a developer at heart but work in the open source space and the community, improving documentation is always welcome. We have also had quite a lot of success, I would say, with the LFX program, which is the Linux Foundation program for young engineers who would like to enter the open source space. We take on new mentees every couple of months, and they contribute to projects. They’re mostly students, and we support them and give them tasks so they can contribute to open source and build projects. We have already quite successfully built two projects that are mainly developed by these LFX mentees. I think if you’re a young student who would like to contribute to open source, this is definitely an interesting option to explore, and a lot of other CNCF projects take part, not just us. If the listener is interested, it’s called the LFX Mentorship program. If you Google that, you will find a lot of information from past students on how to apply and how to do this.

Bart Farrell  19:22  

I’d say it’s a fantastic program. Shout-out to all the folks that are involved there. Like you said, it’s a great way for people to get plugged in, with a framework to get guidance and a pathway to contribution in ways that they can participate. It’s a reminder that you don’t need 15 years of experience to get involved. There are lots of different ways to do it. It can be documentation, it can be bug fixes, and there are lots of different opportunities available for people who want to get involved. For those who don’t know you as a person: when you’re not maintaining Harbor, what kind of stuff do you like to do in your free time?

Vadim Bauer  19:59  

I mean, my problem is a bit that computers are also my hobby. It’s a bit of a tricky situation, because it’s hard for me to distinguish between my work and my hobby. But from a hobby perspective, I like to spend a lot of time in nature. I’m an avid snow fan, so I go snowboarding and skiing in the winter, and I try to spend as much time as possible on the slopes or outside. So this is kind of my hobby: skiing and snowboarding.

Bart Farrell  20:45  

That’s good. It’s also good to know that as a maintainer, you still do have free time, so folks understand that not all of your time has to be spent on Harbor. You have other folks there who are able to help out. But thank you very much for sharing your knowledge with us today. I think the value being provided by Harbor is quite clear, as is its adoption as a graduated project with strong end users such as the ones that you mentioned. I look forward to crossing paths with you, whether it’s online or in person at KubeCon or other cloud native events, and I wish you nothing but the best of luck, as well as all the folks who are maintainers and contributors in Harbor.

Vadim Bauer  21:18  

Thank you for having me.

Bart Farrell  21:20  

Thank you very much. Pleasure. Take care. Bye.