System Architecture and Conway's Law
Learn how organizational communication structures can shape the architecture of systems. The team discusses the balance between creating independent teams for faster development and the potential pitfalls of unnecessary silos, especially for startups and smaller teams.
Transcript
Jon: Are we good? Tim, say you love Ruby.
Tim: I love Ruby.
Chris: It's recorded. That's the episode. Well, we're done here. Good work, everyone.
Jon: Trying to channel my best MKBHD intro. We'll talk about it. We'll get there. Welcome.
Jon: So today we wanted to talk a little bit about system architecture and some of the evolution of hosted environments: the ability for people to deploy entire solutions to the cloud, whether that be cloud native or Dockerized servers behind Kubernetes clusters.
We'll look at all the different ways that we can structure a system and, to some extent, the way that we deploy that system into environments, and how that's impacted the evolution of Kubernetes, systems design, and organizational structures.
We'll also get into what that means to us at Headway and to some of our clients, along with some observations we've seen in the industry, perhaps some personal preferences, some suggestions, and likely a little bit of debate around things we've seen work well and things that maybe haven't been quite our cup of tea.
So let's start with Conway's law. Conway's law, coined in 1967 by Melvin Conway, states: "Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization's communication structure." This concept gained popularity when the book The Mythical Man-Month came out.
And I think really what it boils down to is you're going to design your systems to mirror the structures of your organization and perhaps even unconsciously. So if you have one team that's responsible for the API and one team that's responsible for the front end, they might have entirely different team members.
They might have entirely different managers. And all of a sudden, now you have these two silos in your company of folks that need to try to figure out how to work with one another. And you could argue that, well, that allows the API team to move on their own and it allows the front end team to move on their own and they can go faster without having to integrate and rely on one another.
And I think there are times when that is true, when your organization is as large as an Amazon or a Google or Facebook, you know, one of the sort of web scale type companies. But I think there are also times when that's overkill, especially if you're a startup or even a corporate innovation team bringing a new product to market.
Maybe there's some of your existing infrastructure that you need to deal with in that latter case, but it's, I think, unwise to design a brand new system that doesn't even have users yet to mirror something like a Facebook or an Amazon. So what have any of you seen in terms of that concept? Do you think it's a good thing or perhaps problematic?
Chris: Yeah. One other little wrinkle that we especially have to manage, being an agency that consults with other companies: when you're doing a corporate innovation type project, a lot of times the backend team is internal and the front end team is external, and it's a very clear line there because they don't want contractors using the legacy systems.
They don't want them dealing with that. So that's another reason for that sort of hard division there. But I will say that sometimes it feels as though if we were just given access to the database, we could just write the thing ourselves and it would be a little faster and it would be less of a communication gap.
But that's all organizational stuff and that comes down to how you manage and run your teams and you can solve all those with good communication.
Jon: I agree, Chris. I think that's probably where the rub is though, right? It requires thoughtful organization and good communication. And there's this concept of the exponential communication challenge in software development teams.
The idea is that the number of people on a team requires exponentially more lines of communication the larger the team gets. So with three people, you have three connections, but with four people, you have six connections. And then at five people, it's 10 connections; six people, 15. And if you get up to eight-person teams, maybe five of our team members and three of our client's, there are 28 connection points between eight people.
And that's a lot to manage in terms of making sure everyone's on the same page. It requires, I think, really good project management and good user stories. Not only do the contracts between the two different organizations need to be very well managed, but if we are working, let's say, with a backend API team, and we're the front end team with some designers as well, now we've got three groups of people, right, and all those different things that need to be passed around.
There's a point of diminishing returns, and that's the whole concept of The Mythical Man-Month. They say that, hey, if it takes four people two months to build a system and we need that system in one month, we just need to have eight people. But we all know that that's not exactly how it works. You can, of course, gain some speed by adding people, but the division of labor and the overhead required in order to facilitate that is definitely not something to be discounted.
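The connection counts quoted above come from counting pairs of people: a team of n people has n(n-1)/2 possible one-to-one connections. Strictly speaking that growth is quadratic rather than exponential, though the practical point about overhead is the same. A minimal sketch in Python, where the function name communication_channels is only an illustrative choice:

```python
from itertools import combinations


def communication_channels(team_size: int) -> int:
    """Number of distinct person-to-person pairs on a team of the given size."""
    if team_size < 0:
        raise ValueError("team_size must be non-negative")
    return team_size * (team_size - 1) // 2


# Cross-check the closed form by literally enumerating every pair of people.
for n in range(0, 9):
    assert communication_channels(n) == len(list(combinations(range(n), 2)))

# The team sizes mentioned in the conversation.
for n in (3, 4, 5, 6, 8):
    print(f"{n} people -> {communication_channels(n)} connections")
```

Doubling a team from four to eight people takes the count from 6 to 28, which is one way to see why adding people does not halve the schedule.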
Chris: Yeah, a good example of that is when like developers listening to this, when you have a side project and you finish it in four hours versus like client work where everything seems to take two or three times as long, there's a lot of overhead built in there when you're working with other people, but you can accomplish a lot more.
So there's a trade off there.
Jon: Yeah, it's interesting. We do a lot of Ruby on Rails development at Headway, and the framework was born out of a company called 37signals. They famously use a technique called Shape Up that they designed. And I just saw a tweet the other day from Jason Fried, the CEO there.
And he says nearly all new product work is done by teams of three people at 37signals: two programmers and one designer. And if it's not three, it's two or one, not four or five. He goes on to say, "We don't throw more people at problems. We chisel problems down until they can be tackled by three people at most."
Thoughts on that?
Tim: I think we've experienced that with Dan and I on an engagement of over a year. At one point there might've been four Headway devs full time, but for the majority of that engagement, it was Dan and I. And compared to the partner teams that had at least five people, our ability to do work together far exceeded those teams, even though they had two or three times the number of engineers contributing to the effort.
And I think a big part of that is just because we're close to one another. We know one another, we understand the technology from front to back and we're able to support one another in a very efficient way. So, Dan, do you have anything you'd add to that experience?
Dan: No, I think it's dead on and I think the reason when I reflect on why that works so well, I think that three person team is like the sweet spot for what you could think about as like a context failover, right?
When one person's heavily engaged in whatever the problem is they're solving, there's enough context between those other two people to plan for and move forward the next thing. And you can keep that rolling and build momentum with that. The more people you get involved, the less efficient that becomes.
With fewer people, it doesn't get more efficient, because now you're really bottlenecked on each other, right? If person A is heavily engaged with maybe planning some work, person B might be at a stopping point and they can't do anything. So we found with that three person setup, it seemed to really work well because even though two of us might be developers and one person might be a product person, there was enough context about what the next step was that we could always keep something moving forward, we could always keep that momentum and that flywheel spinning.
Jon: Yeah, and all that to say that it's not always possible for a single person to end to end design and develop a system, especially when they're complex like mobile applications that need to span multiple deployment environments. And I guess by that I mean, multiple operating systems, at least the two biggest ones right now, iOS and Android.
It's difficult for a single person to be able to understand all of the various nuances of both. Certainly it's possible if that's your main focus, but as consultants, we have to build systems for lots of different environments. And so we do have some specialists that all need to come together in order to be able to effectively build products.
So it does work when done thoughtfully. It's just that when it's done accidentally, without the proper setup for communication and planning, if you just say, "Oh, well, we've got these two teams and they've got managers, go build some stuff"... Like, we've even been put in situations where
we have to mock things out. And Chris, you're on a project right now where this is the case, and I think it's going to turn out okay, but it's a little bit like walking on eggshells. We don't have the back end all the way spun up yet. We're building components, and have been for weeks and weeks, if not months, in isolation with Storybook.
And now things are starting to come together with full pages, but we don't have a backend API to query into. So we're having to mock out what we think that'll be. And there's probably going to be some churn.
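One common way to limit that churn is to have the UI code depend on a small contract and supply a mock implementation until the real backend exists. The sketch below is only an analogy in Python (the project described uses front end components in Storybook), and every name in it, including CatalogApi, MockCatalogApi, Product, and render_product_list, is hypothetical:

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Product:
    id: int
    name: str
    price_cents: int


class CatalogApi(Protocol):
    """The contract the front end codes against (a guessed, hypothetical shape)."""

    def list_products(self) -> list[Product]: ...


class MockCatalogApi:
    """Stand-in used until the real backend exists; returns canned data."""

    def list_products(self) -> list[Product]:
        return [
            Product(id=1, name="Sample shirt", price_cents=2500),
            Product(id=2, name="Sample hat", price_cents=1500),
        ]


def render_product_list(api: CatalogApi) -> list[str]:
    """UI-side code depends only on the contract, not on who implements it."""
    return [f"{p.name}: ${p.price_cents / 100:.2f}" for p in api.list_products()]


print(render_product_list(MockCatalogApi()))
```

When the real API lands, the rework is mostly confined to wherever the guessed contract turned out to be wrong, rather than spread through every component.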
Chris: Yeah, for sure. The way I think about it, it's like you have the front end layer, and the furthest edge of the front end layer is pretty solid.
And then as you go up, it starts to get a little soft. And with the backend layer, the furthest edge is really solid, but as you start to get towards the middle, it's a little mushy. So as that stuff hardens, you know, there's going to be a little bit of rework. So there is some inefficiency there that we're going to have to deal with, but yeah, that's common in these sorts of scenarios where you have separate teams and they're both working on separate priorities.
It takes a lot of communication, a lot of documentation to keep everybody on the same page.
Tim: One alternative approach to what Chris and Noah are working on, which I know you were considering with a different client, was to leverage LiveView, or, if you wanted, to replace LiveView with
Hotwire, in a way that would allow you to interface with that existing, aged, hardened API and begin to write fresh UI on top of that existing infrastructure. And that to me is very intriguing, very appealing, not only because I love to write Elixir, but because I think it allows you to get the speed of writing separate UI while not wrestling with the existing infrastructure, but instead migrating towards something more modern.
So I don't know if you wanted to add anything to that, but I think that's a very interesting pitch that you gave.
Jon: Yeah, in this particular case, what's most interesting about it is it's an intentional decision for them to be able to outsource various aspects of the next phase of their business. They're a .NET company and they have an existing back end API.
They're going to be moving towards React for a front end, and yet they need partners to come in and be able to build slices of their system, largely in isolation, which is tricky, and they're cautious about giving too much access to their backend systems to external parties that are not FTEs. So in this case, I pitched that it would be good for us to build a full stack Elixir application leveraging LiveView on the front end, with the ability then, through WebSockets, to get real time updates based on our backend's queries into one or multiple of their systems.
And I thought it would make the most sense, and would be the most scalable and efficient, to have those queries happen from our back end to their back end, rather than having to try and manage all of that on the client itself, because we also needed our own persistence layer for some intermediary functionality that we would be building.
Why build a totally separate React app in a totally separate repo with our own API, and take on the complexities that are required in maintaining those two code bases? Why not just build a full stack system that can still act very much like an SPA for the client and yet gain more control by making those calls from our backend?
So that was my proposal. We're still in the process of working through what that's going to look like to get started, but I'm optimistic that's going to help them achieve their goals and allow us to work in a way that's most effective for our team. So I mean, super leaning into Conway's Law there.
And yet at the same time, I frequently view Conway's Law as a negative. So I suppose it's all a matter of context, really.
Tim: Jon, can you explain that a little bit more, how that's leaning into Conway's Law?
Jon: Right. So the organizational structure is literally two organizations in this particular case, right?
So there's our client and their team, and then Headway and our team of developers. And even within the context of those two organizations, there are more, quote unquote, silos. That's the whole concept: the system that is going to evolve from the humans that are creating it will mirror those communication and organizational structures.
And so we're literally saying one of the organizations is going to build a system in isolation that will then talk to another organization's system through API calls, and that contract will then need to be negotiated between our two organizations.
Chris: So is the API call, is that like the scrum master in this scenario, the project manager?
Jon: Could be. Does that make sense, Tim? Anything else you were wondering about that?
Tim: Yeah, that helps clarify. So that's not necessarily inherent in this systems approach that you described. From a technical perspective, that also exists, whether it's React or any other technology with a separate API talking to that existing backend.
I think the appeal to me in what you pitched is that it eliminates the need to manage, or really even to introduce, two new disparate systems: a React SPA and a separate API, which then talks to the back end. Instead it introduces a single full stack solution. And in that way I think it minimizes Conway's Law. I just wanted to clarify.
Jon: Yes, it minimizes it while still allowing the overall engagement to benefit from that concept.
Tim: Yeah. That's great. Thanks for clarifying.
Jon: But yeah, I guess it minimizes it on the Headway side specifically, which I think is your point, right? So it's less of a burden for our team because we can work unified in a full stack scenario.
And then, by nature of the engagement's requirements, that we're two separate teams helping them build something, that's just the bridge that we have to cross between our two organizations.
Chris: And with React Server Components, now you can actually do full stack React Dev. You don't actually need to make a separate API in a separate spot anymore.
So, I guess, technology agnostic, that does make a lot of sense, to just have one single app, especially when you're keeping team sizes small. I think that's an approach that we usually recommend.
Jon: Yeah, and I know a lot of the argument in the prior decade for using JavaScript in the first place on the server was that you could have the same language on your client as you did on the server.
And so now, with the ability to do full stack JavaScript development with React Server Components, yeah, whether or not you like JavaScript, I do like that better than having to have two entirely separate systems. So that's a little bit about architecture of systems, a little bit about Conway's law with regard to how organizations within their own walls work between teams, and how Headway as a consultancy works with some of our clients. But let's talk a little bit deeper about some of the technology
and this concept of migrating from on prem to the cloud. Of course, we've also talked about the fact that some companies, again with 37signals championing this, are going back and taking control from the cloud to their local on prem situations so that they can own their servers without having to pay for the metered experience.
And that technology, once again, has helped make that so much easier, but at the same time, there is still a good place for the cloud to exist. There's a place where it makes sense, and I think there's still a place for microservices and various aspects of that architecture to help things scale more effectively.
Tim, what do you see in today's hosted infrastructure technology that is solving a problem in a unique way?
Tim: Yeah, it's a good question. I've been very intrigued with what Chris McCord announced, known as the Flame pattern. He announced this a few months back and he recently presented on the pattern at ElixirConf EU.
The video was just published this week, so we can share a link to it if folks are interested in seeing what the Flame pattern does in action. One thing that sort of undergirds Chris's inspiration for Flame was he wanted to not just solve a problem, but to remove the problem. And I think that's a very pertinent goal, very pertinent observation on his part, because we often get focused on solving problems and that's what we have to do as engineers, but it's good to stop sometimes and say, do I need to solve this problem or can I remove it altogether?
So the Flame pattern is not specific to any language. It's a pattern, although Chris is presenting it in the context of Elixir. There's an Elixir library with an adapter for Fly and for Kubernetes. And so, if you're using Fly.io, it's very easy to test out the Flame pattern. If you're using Kubernetes, there's an adapter for it.
And it should work on other platforms. So what is it? The beauty of Flame is that I can take a specific block of code in my app, and all I need to do is wrap that block of code in a Flame block. What that gives me is that the block of code will now be offloaded to a different node in my system, invisibly.
I don't need to do anything more. I simply need to wrap that code in a Flame block and it will scale elastically based on how I configured Flame up front. So I might say scale Flame from zero machines, which would imply that there would be a cold start involved when that block of code is executed,
or make it a minimum of one or two, whatever your minimum is, up to a specific number of machines. That's the beauty of being able to wrap a block of code or, to put it differently, a specific function within your app. What Chris often uses as his example is transcoding video or generating thumbnails from images.
Historically, what we would do is offload the responsibility of transcoding video or generating thumbnails from an image to a separate machine. And in order to do that, we're passing state from our main application to that separate node. And then somehow we're getting notification when that work is complete.
We're getting notification if there's an error in that code, if it failed, what the status of it is, right? We're having to track state for that piece of work that we handed off to a separate node, and then get it back again into our main application. And Flame removes all of that management of state without losing the elasticity and, ultimately, the scalability that we need when doing something that's very CPU intensive like transcoding video.
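To make the calling shape concrete, here is a rough local analogy in Python. It is not the Flame library or its API: RunnerPool and generate_thumbnail are hypothetical names, and a thread pool stands in for the elastic pool of remote machines. What it tries to show is only that the caller wraps the expensive work in a function and gets the result, or the error, back inline, with no job record or status tracking of its own.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, TypeVar

T = TypeVar("T")


class RunnerPool:
    """Hypothetical stand-in for an elastic pool of remote machines.

    Threads play the role of remote nodes here; a real implementation would
    boot and tear down machines between a configured minimum and maximum.
    """

    def __init__(self, max_runners: int) -> None:
        self._executor = ThreadPoolExecutor(max_workers=max_runners)

    def call(self, work: Callable[[], T]) -> T:
        """Run `work` on a runner and block until its result (or error) comes back."""
        return self._executor.submit(work).result()

    def shutdown(self) -> None:
        self._executor.shutdown(wait=True)


def generate_thumbnail(width: int, height: int, max_side: int) -> tuple[int, int]:
    """Placeholder for CPU-heavy work: compute scaled-down dimensions."""
    scale = max_side / max(width, height)
    return (round(width * scale), round(height * scale))


pool = RunnerPool(max_runners=2)

# The caller wraps the expensive block and receives the value inline;
# there is no job table, status polling, or completion callback to manage.
thumb = pool.call(lambda: generate_thumbnail(1920, 1080, max_side=320))
print(thumb)

pool.shutdown()
```

If the wrapped function raises, the exception surfaces at the call site, which is the "no separate error channel" property described above.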
Jon: That's super interesting, Tim, because in the past we may have thought, hey, there's this heavy workload. We need to go create a Lambda function on AWS and write all of that code and deploy it there. Whereas you're saying this is all a part of the stack that you're already building in and it handles it in the background.
So I think that's amazing: it requires no additional work or thought from the engineer outside of the context of their full stack solution to be able to deploy that in an effective and scalable way. That's wonderful.
Tim: One of the main benefits is the DevOps aspect. I don't need to stand up another node that has all the dependencies needed for transcoding or for image processing.
I don't need to manage now two separate nodes. I just have my full stack solution that gets deployed and that's it. I just don't need to manage different things. And there's a significant gain to be realized in that scenario.
Dan: Yeah, I think what's cool about that example that Chris has put together with Flame is that the level of DevOps knowledge you have to ramp up on to be able to use it is a very shallow grade versus all the things that would be required to stand up something over in a cloud
on a Lambda. If you're coming in from zero, like, I'm just doing programming, I'm just doing web development, but I know that I need to do this thing over here, I need to run this Lambda, there's a pretty steep curve of DevOps knowledge that you need to learn to get it set up in a really scalable, functional way.
And here in this paradigm, it's a little bit more straightforward, it's more bite sized, and it's like just a little bit of a chunk that you can add to your existing knowledge, and then you've got it, and it's all you need to know. I think that's really compelling.
Jon: For people who don't have that capability yet baked into their frameworks of choice, or into the libraries that they're developing with, let's say for a React application,
there are some existing similar ideas within some of the hosted environments like Supabase. So, Chris, do you have experience with that?
Chris: Yeah, a lot of the more modern, I guess, popular hosting solutions or even database-as-a-service offerings like Supabase or Netlify or Vercel do a lot of this too.
It's really interesting. We were talking before we started recording here about how web development as a whole is converging around these same ideas. We're kind of realizing that there's a lot of friction in managing your own cloud functions on GCP or AWS. And it's also cross domain, because you want to just build your cool stuff in your domain.
If you're building an e-commerce site where you're trying to sell some clothes, you don't want to care about your infrastructure. You just want it to work and scale up if you get really popular. And there now exist these tools that sit in between you and AWS, GCP, or whatever cloud provider, that make that even easier for you, so you can just go, right? Supabase is a good example: it has cloud functions, but you almost don't know that they're cloud functions.
Vercel also: when you're writing cloud functions in Vercel, it's just part of your code base. You have no idea unless you specifically look; there's no difference to you as a developer between writing a cloud function and just writing a function that fires on the server. I'm getting kind of off topic here, but it's this same sort of paradigm with LiveView, Hotwire, and server components, where we're kind of realizing that everything on the server is bad and also everything on the client is bad.
So the best state is serve as much as you can from the server and refresh the client when you need to.
Tim: Chris, I think that's well said, that convergence that you're describing. It'll be very interesting to see if the Flame pattern inspires libraries in other languages and other communities. Chris has said quite often in his presentations on Flame that
it's very little code. And so I imagine that creating a Flame library in Ruby, for example, wouldn't take a whole ton of effort. And then that same pattern can be realized and taken advantage of across communities, not just Elixir.
Dan: Yeah, actually Fly has an article on their blog from the last couple months, detailing how you would use that same pattern in JavaScript.
There's a lot of caveats with that. It's clearly not as developed as it would be on the Elixir side, but you can start to see those things forming. And that's pretty exciting because I think what it does is when these things are commoditized in a way that developers can just use them without really thinking about it.
I think it opens up what novel things they can build in different ways, right? So if you look at like Chris's example on his documentation of building something that's processing video and generating thumbnails. When you as a developer know that's the kind of thing that you can build and build pretty easily.
I think it broadens your horizons on what potential a build can even have, because if you don't know that's a capability you could have, you might not even reach for it. And so having these things be out there and just be so easy to throw to something like Supabase or Netlify, or to pull in a Flame pattern to do this kind of work, I think is going to be a key to unlocking that next baseline level of what a web application is.
Jon: And the really nice thing, going back to Conway's law for a moment, is that historically, and I'm not trying to pick on Java, but if you had a legacy Java Spring application and you needed the ability to handle a really heavy workload, and let's say you were hosting it on premise, you would probably have a separate team that was working on that higher throughput section of your application, right?
And they would need to say, "All right, well, what are you passing us? We'll process the video. We'll munch it this way. We'll encode it this way, and we'll compress it and kick it back to you." That would be different people trying to achieve that capability, whereas when it's just part of your full stack and has the ability to scale in the background, you don't need that separate team.
The regular, so to speak, developers can handle that.
Dan: I've actually worked on a team who had a video processing feature in some of our applications, and it was scattered across a variety of different Amazon services. And when changes needed to happen, they were heavily bottlenecked by the one person who really understood it, understood where everything was, understood how to deploy it.
And so, like you're saying, when it's then brought into your full stack, that just benefits everybody, right? The context is there for everybody. Junior developers can look at that code and learn and understand and level up, instead of having to go poke around and try to understand all these other concepts just to get to the thing that's actually really business important.
Jon: Well, thanks for the conversation, guys. This has been really good. I think, for me, one of the takeaways is that Conway's law isn't necessarily positive or negative. The ability to design systems that mirror the communication structures of your organization can be beneficial, but it can also be detrimental if not taken into account at the onset of the project itself.
So for example, when we're working with a client and we need to handle one whole slice of what they want to do, it plays to our advantage. When we're working in different code bases and we have arbitrary silos because two people need to be managers, that can perhaps be detrimental. And those artificial silos can lead to delays and more complexity than is necessary.
So it's something to be aware of and something to manage to your advantage. Any other thoughts before we wrap up, guys?
Tim: Yeah. I mean, this has been really good. The questions that we get from clients and the innovations that we all consider when we're trying to help a client or working on something internal, whatever those innovations are that we try to apply to the problem at hand, it's constantly evolving, so it's really good to pause for a minute,
hear from everybody's experience on the call, and be able to just identify: are we solving the right problem, or is there a way for us to remove the problem altogether and just work smarter, not harder?
Chris: I think it was a good conversation. I think that one of the main takeaways that I took from this is your company's communication structure drives whether Conway's Law is good or bad for you, which is obvious, but then once you say it out loud, it makes a lot more sense.
So if you have good communication within your organization, Conway's Law will really help you. If you don't, or there's some gaps there, then maybe it's going to hurt you a little bit. Love it.
Dan: Yeah, I think it depends, honestly. There's going to be times where you have to have it, right? And so I think it's a tension to be managed, but I think the more and more that we can be aware of it and make sure that we're designing systems in a way that aren't just going to fully implement it, then the better off we're going to be.
And I think that's what we're seeing as a lot of these new techniques are converging. Things like we've talked about in this episode and things we'll probably find out about more this year as more and more new approaches are coming together from these battle hardened languages that we're pretty deeply entrenched in.
So I'm super optimistic about the future of software development and web applications.
Jon: Yeah, I'll do a little bit more research around 37signals technology called Kamal, and I'd be curious to see what capabilities they have for some of that elasticity that we talked about and not just spinning up another server or spinning up a beefier server to be able to facilitate some of those higher intensity portions of an application.
So more to come on that. We'll do a blog post or some additional content on social to follow up with this episode, but appreciate the time, everybody. Thanks. Thanks. Thanks everyone. Well, that wraps it up for today. If this content is helpful for you, it'd mean a lot to me if you could rate and review the podcast on iTunes or wherever you get your podcasts from so that we can help share this info with even more people.
Until next time, I'm your host, Jon Kinney, and this is the Even Keeled Podcast.