Keeping Cyber Attacks from Blowing Stuff Up: An Interview with Erfan Ibrahim, CEO of The Bit Bazaar


An Interview with Erfan Ibrahim, CEO of The Bit Bazaar

Cyber Security Dispatch: Season 2, Episode 11

Show Notes:

On today’s episode of the Cyber Security Dispatch we welcome the CEO of The Bit Bazaar, Erfan Ibrahim, to talk about his groundbreaking ideas in the fields of security and resilience. Erfan’s expertise extends beyond the common cyber domains into the world of mechanical infrastructure and hardware, and we hear all about the challenges that this dimension adds to the work. Erfan tells us how he came up in the field and landed in his current position before diving deep into the topic of hardened and layered defenses, something he sees as paramount to resilience. We then go on to chat about institutional architecture, mental models, confidentiality, ‘hyper-quiet’ networks, existing and legacy hardware and much more. So be sure to tune in for one of the most visionary and thoughtful conversations we have had the pleasure of hosting on this podcast.

Key Points From This Episode:

  • Erfan’s professional background and how this sets him apart.

  • The problem with businesses’ drive towards interconnectivity.

  • Creating a hardened, layered defense as opposed to merely a perimeter.

  • How these concerns fit into a real life utility configuration.

  • The importance of institutional architecture beyond personnel.

  • Shifting common mental models of security and how it relates to confidentiality.

  • The benefits of prioritizing ‘hyper-quiet’ networks.

  • The influence of existing hardware on the design of current security.

  • Erfan’s first instructions to conscientious CISOs wanting to create a more secure network.

  • How Erfan views the current state of cyber security and its biggest impediments.

  • Properly measuring the strength of a network and its security.

  • The rise in popularity of the term ‘resiliency’ in place of ‘security’.

  • Erfan gives us his definition of resiliency.

  • And much more!

Links Mentioned in Today’s Episode:

The Bit Bazaar — https://tbbllc.com/
Erfan Ibrahim Linkedin — https://www.linkedin.com/in/erfan-ibrahim-99a5374
Pacific Bell — https://www.bloomberg.com/profiles/companies/PAC1:US-pacific-bell-telephone-co

Moore’s Law — https://www.investopedia.com/terms/m/mooreslaw.asp

Introduction:

[0:00:04.1] AA: Welcome to another edition of Cyber Security Dispatch, this is your host Andy Anderson. In this episode, Keeping Cyber Attacks from Blowing Stuff Up, we talk with Erfan Ibrahim, CEO of The Bit Bazaar and an expert on security in the utility and power sector. A former nuclear engineer, he has spent decades helping to secure both OT (operational technology) and IT (information technology) infrastructure. Erfan shares his refreshingly lucid and frank views on how to secure some of the most sensitive assets.

Namely power plants and the electric grid from potentially deadly cyber-attacks. Sit back and enjoy what Erfan has to share.

Interview:

[0:00:49.1] EI: My name is Erfan Ibrahim and I am the founder and CEO of a company called The Bit Bazaar LLC. I’ve had this company since 2001, and I’ve been helping companies align their IT goals with their business goals through making good choices in networking, cyber security and network management for a variety of verticals.

[0:01:17.5] AA: We were talking before this about kind of your background and it’s a pretty unique one for someone in the cyber security space. I’d love if you could kind of just share how you ended up doing what you’re doing and your path here.

[0:01:31.3] EI: Sure, my background is actually in nuclear engineering. I have an undergrad in physics, a masters in mechanical engineering with a nuclear focus and a PhD in nuclear engineering. I’ve had experience in both fission and fusion engineering. I have been dealing with energy and the complexity of energy systems for over 35 years.

When I shifted gears and entered the information technology world in the mid-90s, when there was a slowdown in the nuclear industry, I began to focus on networks, and the same background of complexity came back as I started understanding how TCP/IP networks work, when they don’t work, and what some of the security implications are. I started in this field from scratch.

I was working on the help desk at Pacific Bell in Dublin, California, and so my knowledge of TCP/IP and networking in general, cyber security, communications, is hands-on. It wasn’t taught to me in a school, I did not take any certifications, I didn’t pass any exams, but I did take business requirements and turn them into network designs, build them, test them and then even maintain them.

At one point, I was managing the western backbone of the internet for Pacific Bell, where we had two ATM (asynchronous transfer mode) switches, and I had two fiber distributed data interface, or FDDI, rings with Cisco 7000 routers and all the major ISPs of the US going through the fabric that I was managing.

I have an operations background, I have a design background, and I also have the ability to troubleshoot and identify problems. Now, over a 23-year period, I have collected quite a lot of experience, I’ve thought about things, and that’s why I have a very different take on the issues that we’re facing in the digital world than a person who typically goes through a college education and starts working in the technical field.

[0:04:08.6] AA: We’ve had a couple of people on like yourself whose careers literally ran alongside the building of the internet, and so I think you’ve seen how the sausage is made, right? There’s a bunch that came later and just sort of ate it and thought it worked and tasted good, right? You’ve actually peered behind the curtain and seen what’s going on back there.

It sounds like some of that stuff is definitely scary. Walk us through some of the things that you’re seeing now that are particularly a concern to you, informed by that background and that understanding of how things actually work and are tied together.

[0:04:49.8] EI: My concern right now is that, for the sake of business expediency, networks that were typically not connected to each other are quickly being daisy-chained to each other so that applications can run across them and end users can get the benefit of access to data quickly. But there hasn’t been much regard for the cyber security implications of connecting disparate networks like that quickly, without having a proper cyber security architecture that provides the appropriate controls at the different logical layers to not allow attackers to come in or, if they’re already inside, to move around.

What I’m seeing is an obsession with confidentiality, where authentication screens are getting more and more sophisticated. Usernames and passwords are getting longer and longer, CAPTCHAs are being used, and all of this falls apart the moment an attacker has a pivot in the network, because those security credentials are available to the hacker once they have pivoted and entered the network. Then there’s very little that can be done to stop them in the current network design.

This is what my concern is, that business is driving interconnectivity with little regard for cyber security.

[0:06:27.1] AA: I think for those who aren’t as deeply knowledgeable about this as you are: the internet is one network, but there are a number of other networks out there using different computers to do different things. For example, a power grid, with its different computers talking to each other, is also a network, but not one you would imagine being connected to a more global network. And it sounds like, yeah, for so many people the only way they were thinking about defense is that perimeter, right?

Hardening that edge, but once you can break through that edge, you’re into this world where things are very fragile, very broken inside. What are you seeing as the solution, right? How do we get to a better place where these potentially fragile networks are more secure, thinking past that initial penetration?

It’s sort of crazy that we don’t assume that individuals will penetrate that perimeter because they’re doing it kind of every day in thousands of different locations, different companies etc.
 
[0:07:44.7] EI: There are a few things that you can introduce into a network to create a layered defense architecture. First and foremost is that boundary wall because you don’t want all kinds of malware coming in from the public internet into your corporate trusted network.

Having the appropriate firewalls and setting up the firewall policies so that there’s role-based access control: individuals log in to the network and, based on their credentials, they’re only allowed to go to certain places. If there’s no purpose for a person to enter the trusted network, they should not be allowed in. Firewalls help you do those kinds of things. A solid firewall with very granular firewall policies to enforce strict role-based access control is the ABC of security.

Beyond that, within the trusted network, there should be a concept of segmentation by business function. It should not be sufficient that once you are authenticated as a trusted employee or a contractor, you have free rein over the trusted network. You should only be allowed to go places where the use cases justify that you should.

In other words, the transaction that you need to be a part of takes you to a certain IP address. Only those access controls should be in place; all others should be blocked. Even though you’ve been authenticated, you can only go to authorized sites or nodes within the trusted network based on your job function. That’s what we call network segmentation.
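To make that concrete, here is a minimal sketch, in Python, of the kind of default-deny, role-based rule set described above. The roles, subnets and ports are hypothetical placeholders; in practice these rules live in firewall policies and switch ACLs rather than application code.

```python
# Minimal sketch of default-deny, role-based access control.
# Roles, networks and ports below are hypothetical examples.
from ipaddress import ip_address, ip_network

# Each role maps to the only (destination network, port) pairs its
# job function requires; everything else is implicitly denied.
ALLOW_RULES = {
    "billing_clerk":  [(ip_network("10.10.20.0/24"), 443)],    # billing web app
    "scada_operator": [(ip_network("10.20.30.0/24"), 20000)],  # DNP3 masters
    "it_admin":       [(ip_network("10.10.0.0/16"), 22)],      # SSH to IT hosts only
}

def is_allowed(role: str, dst_ip: str, dst_port: int) -> bool:
    """Permit only flows explicitly justified by a use case."""
    for net, port in ALLOW_RULES.get(role, []):
        if ip_address(dst_ip) in net and dst_port == port:
            return True
    return False

# An authenticated billing clerk still cannot reach a SCADA master.
print(is_allowed("billing_clerk", "10.10.20.15", 443))   # True
print(is_allowed("billing_clerk", "10.20.30.5", 20000))  # False
```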

You can enforce that by setting up access control lists on switches, and you can also have firewalls within the organization itself to break up the different parts of the network and do authentication. That’s the second level of defense. The third level of defense comes from actively monitoring the traffic that is going over all the critical links and trying to identify if there’s any well-known malware with a signature flying around the network.

If so, suppressing it so that it doesn’t contaminate the network. That’s one type of intrusion detection and prevention. The second kind is harder to do on the information technology side but easier to do on the operational technology side. That is what we call context-based intrusion detection.

You are actively monitoring what the commands are in the various protocols, those communication messages that are going back and forth between users and systems, or between systems, and identifying whether those commands are legitimate or not.

If they’re not, you block them. That’s the next level of protection that you want to provide. Now, that’s much easier to do on the operational technology or OT side, where the systems are supervisory control and data acquisition, what we call industrial control systems, that actually manage physical assets.

It’s much easier to do it there because the protocols are well defined, the commands are well documented and the values are expected, so you can set very granular rules and only allow certain messages and values to get to certain nodes and not others. But on the IT side, which is the information technology or the corporate side, that’s much harder to do because there are so many applications running on the corporate network at any given time. You’ve got email, you’ve got DNS lookups, you’ve got accounting software and you have database queries going on.

It’s very difficult to set up filters that are context-based on the IT side; even if you set them up, you’ll end up with lots of false positives. It’s best that we protect the information technology side using granular segmentation, a signature-based malware detection tool and, of course, the firewalls.
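As a rough sketch of what a context-based check on the OT side can look like, here is a small Python example. The device names, the decoded command fields and the value limits are illustrative assumptions; a real deployment would parse DNP3, IEC 61850 or Modbus traffic with a protocol-aware inspection engine.

```python
# Minimal sketch of context-based intrusion prevention for OT traffic,
# assuming messages have already been decoded into command + setpoint.
# Device names, commands and limits are hypothetical.
from dataclasses import dataclass

@dataclass
class ControlMessage:
    dst: str        # target device, e.g. a voltage regulator
    function: str   # decoded command name
    value: float    # commanded setpoint

# Per-device rules: which commands are legitimate and what values are expected.
CONTEXT_RULES = {
    "voltage_regulator_7": {
        "set_tap_position": (-16, 16),  # plausible tap range
        "read_status": None,            # read-only, no value constraint
    },
}

def permit(msg: ControlMessage) -> bool:
    """Block any command or value that falls outside the expected context."""
    rules = CONTEXT_RULES.get(msg.dst)
    if rules is None or msg.function not in rules:
        return False
    bounds = rules[msg.function]
    if bounds is None:
        return True
    low, high = bounds
    return low <= msg.value <= high

print(permit(ControlMessage("voltage_regulator_7", "set_tap_position", 4)))    # True
print(permit(ControlMessage("voltage_regulator_7", "set_tap_position", 400)))  # False
```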

That would cover the IT side, and on the OT side we add this additional intrusion detection system that’s context-based. Now, the final frontier, of course, is the end node itself. First of all, it needs to be hardened. Hardened means that the operating system should have the latest security patches on it and there are no well-known vulnerabilities that hackers could exploit to bring that node down.

The second thing that you want to do, if the hardware has the memory and processing capabilities, is to virtualize it. You basically set up something like VMware, or you could get an open-source tool, that creates a hypervisor, and what that does is allow you to set up virtual machines, host the operating systems in those virtual machines, and then put the application in the VM as opposed to directly on the hardware.

What that does is, if the machine gets corrupted, you can just replace the virtual machine instance with the last stable one and you’re up and running again, as opposed to trying to reformat the disk, putting in a new OS and then rebuilding the machine, which is what you have to do with traditional systems.
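That recovery step can be as small as a single snapshot revert. Here is a minimal sketch, assuming a libvirt/KVM hypervisor managed with the virsh command line; the domain and snapshot names are hypothetical, and VMware or other hypervisors expose equivalent operations through their own tooling.

```python
# Minimal sketch: discard a possibly corrupted VM and reload the last stable
# snapshot on a libvirt/KVM host. Domain and snapshot names are hypothetical.
import subprocess

DOMAIN = "scada-historian"          # hypothetical VM name
LAST_STABLE = "nightly-known-good"  # snapshot taken while the VM was healthy

def revert_to_last_stable() -> None:
    """Replace the running instance with the last known-good snapshot."""
    subprocess.run(["virsh", "snapshot-revert", DOMAIN, LAST_STABLE], check=True)

if __name__ == "__main__":
    revert_to_last_stable()
```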

These are all things that we can do to go beyond just perimeter security in building up what I call a layered defense architecture. When you set up all these controls, the hacker, whether an outsider or an insider, is thoroughly frustrated as they try to navigate around. Naturally, they lose motivation and they go somewhere else, which is exactly what the goal is. Eventually, all networks are penetrable, and that’s why in ‘advanced persistent threat’, or APT, I write the P in capital letters. That persistence is what pays off for them.

Now, depending on what their goal is in compromising systems, the level of persistence can vary. At least with this layered defense approach, you’re thoroughly discouraging them from continuing.

[0:14:26.3] AA: Yeah, that was great, and I think a great overview of all the options that are there. I think two things that are really interesting are that idea of impermanence, building in that ability to throw things away and then have a fresh one there, which is not one that has traditionally been thought of as a way of hardening a system, right?

Usually it’s virtualization for the very fact that you can replace it quite quickly. It’s beneficial for resiliency but also, potentially, for reducing that persistence of attackers or malware, right? If you’re rebuilding fresh every day, every week, every month, whatever it is, anything that might be living there dies with the old install.

I think what might be really helpful, just for people who are new to these concepts, is for you to ground it in a real-life example. I know you have a lot of experience in the utility and the power space. Could you walk us through how this might look in real life, for example at a utility?

[0:15:39.9] EI: Yeah. Typically, in a utility, you have a control center, which sits in an enterprise in some major urban area, and then you have a set of substations. There are the larger ones that have more intelligence in them, and then there are smaller ones that have less energy flowing through them and not that much intelligence, and then you have street-level transformers, all the way to the customer.

What I’m talking about is the network in the corporate environment, both the IT and the OT side, with the control center as the OT part of the network in the enterprise, and then a bunch of substations. You would set up a firewall, for sure, between the IT and OT sides to make sure that only authorized people are entering the OT side in the enterprise, and then within the enterprise, you can break the OT network up into subnets for segmentation by business function.

If it’s a transmission company, you may have one network that’s monitoring the phasor measurement units, another one that’s got all the energy management system in it, and then you may have another one for fleet maintenance and so on. You can segment the enterprise OT network like that.

If you’re a distribution company, you may have a subnet for all the smart meter-related stuff, you may have one that focuses on all the transformer meters that are out there in the substations, and you may have one that just looks at relays, and so on. You can segment the network up on the corporate side. Then each substation would also have a firewall facing the outside.

Even though there’s no internet connection there at the substation, it’s still trying to block any unauthorized access from the corporate side, so you put a firewall in front. Again, you segment the network in the substation by function: cap banks (capacitor banks), voltage regulators, relays, PLCs and RTUs. You can break the function up based on the application.

Then, beyond the substation, if there are sensors along the distribution line, those sensors can talk either wirelessly, or over twisted pair like a phone connection, or through fiber, back to the substation, and then from the substation back to the corporate network. You create a hierarchical network and, again, you apply the same network segmentation principles based on use cases.

You keep the swim lanes of the different functions separate so that you don’t mix the traffic onto the same logical subnet. The IT, the OT and the management VLANs will all be separate. The hypervisors could be kept in a separate VLAN, or virtual local area network. By doing this, you can bring this layered defense architecture into the utility industry.

[0:19:07.5] AA: What do you think the challenge of doing that is? There needs to be an incredible amount of thoughtfulness and design from the operators who are running these systems. How do you think about the design and the potential cost of segmentation and setting up a network where you are blocking a lot of the traffic? How do people overcome that challenge?

[0:19:32.5] EI: It’s overcome, first, by having a visionary Chief Information Security Officer who has been given the authority by the Chief Operating Officer to clean up the act. That individual is then empowered to bring the network, security and application people together into rooms and, on whiteboards, build cartoon diagrams of what the final state should look like.

Then, through gradual maintenance periods, you migrate from the current architecture to that desired architecture. Unless the security officer is on the same team with the network, security and application people, this is not going to happen. The problem that we are seeing in most of corporate America, and it’s not just limited to the utility industry, is that most Chief Security Officers last about 18 months in their job. The first six months, they’re trying to figure out what’s going on in the organization they moved into. The second six months, they try to make a difference, and the third six months, they’re job hunting for their next job.

It’s almost like being an elected official in the federal government, where you’re being elected every two years. You only work one year, and the next year you’re campaigning to get elected again. This is why a lot of work is left on the table and is not implemented. The next person comes in with their own vision.

What needs to happen is that, institutionally, the architecture needs to hold, the way our constitution is the constant as governments change in our country. The equivalent of the constitution is a proper cyber security architecture that embraces diversity, that embraces change, but is still there as a logical framework for everyone to follow.

At some point, a visionary chief security officer needs to create this architecture and, even if they’re leaving their job after 18 months, pass it on to the next person coming in so they embrace that vision and it’s continued. This constant change is what’s causing the problem. It doesn’t take very long and it doesn’t cost much money to redo it; we’re talking about the same Cisco firewalls, routers and switches, and minimal changes in the configuration to create segmentation.

I will tell you, from my previous work where I was at a national lab, we built a network with a central site and two substations. It took me four or five months with a couple of people to build it up from scratch. Once we had done it and figured out the architecture, we brought that time down from four or five months to six weeks, and after we documented everything, we could do it in two.

I’m talking about starting from bare metal and building up the entire corporate network and two substations’ worth of logic. Now, a few people can do this in two weeks with documentation. Most utilities have lots of people and they can contract people, so this is not an excuse at all. I think it just requires a vision and will, and then a team that’s willing to implement the vision.

[0:23:19.1] AA: Yeah, I mean, I’m being pesky because I think it’s valuable, because I know that security teams get a lot of these questions, right? I think one of the challenges is always that security is often an afterthought, right? First we want to do a function and then we want it to be secure, but doing things for security’s sake alone unfortunately often becomes a second or third or even fourth priority in a lot of organizations.

Maybe that’s starting to change because we’re starting to see more and more attacks, and the scale of the issues that they can create is rising. How do you sell this to a security team? What are some of the potential benefits that go beyond just a more secure system? I’d love to hear your thoughts there.

[0:24:16.5] EI: First and foremost, this mental model that people have, that cyber security equals confidentiality, which means usernames and passwords, digital certificates and encryption, is fundamentally a flawed mental model of cyber security. This is the reason why it’s an afterthought, because they say, we’re going to set up the whole network and when we’re done, we’re going to lock it down with these things. No. That’s not security.

Security is a very simple concept. It says, I will provide data on a need basis, just like in the military. On a need basis, and that’s it. There is no such thing as global authentication of anything. You are a certain person, you have a certain job function, and you will only be allowed access to X, Y or Z, nothing else. Kind of like a customized intranet where people, based on their login privileges, only see certain things on the intranet page, and other people with other job descriptions see other things.

That’s an example of what I’m talking about: sharing knowledge or data on a need basis. If you keep that criterion in the design of a network, you will see all these rules that I’m talking about get implemented right off the bat.

Now, that’s one. Second, I’m going to monitor what you’re doing. Even though I have given you access to only the things that you’re allowed, you may do nefarious things with those assets, so I’ve got to watch you also. That’s where the whole intrusion detection comes in. Intrusion detection works when data is provided on a need basis, because by providing data on a need basis, you are quieting down the network; you’re only allowing authorized traffic to move and nothing else.

If they try to access other things, they won’t be able to; no data will flow. Quiet networks are easy to monitor and to identify anomalies on. Now, when you’ve done both of those, then you lock it down with additional things, with encryption and authentication and all of that, to make it more robust.

Availability of data, availability of applications is paramount. The integrity of data is very important and then the confidentiality. The problem that we’re facing in the energy industry is we’re getting a lot of people from financial services, banking industry, healthcare, popping into the energy sector with their mental model of confidentiality being paramount. Because over there it is.

Your social security number, your account number and things like that are very confidential information, but in the utility industry, the oil pressure or water pressure or the temperature of the oil in a transformer are not confidential. Their integrity and availability are important for making decisions, but their confidentiality is not that important.

So we need to shatter some of these mental models about security being equal to confidentiality because that is what makes it an afterthought rather than sharing data on a need basis and monitoring and preventing anomalous behavior.

[0:27:37.0] AA: In our last conversation you used that term ‘hyper-quiet networks’, which I think was a great way to think about what these networks look like. Once you simplify the traffic that is flowing across these networks, it is a lot easier to monitor and understand them, right? It is not like a network is doing a hundred things; it is doing two things, three things, right? And you can see if it goes out of bounds.

But once a system has started to be architected this way, are there other benefits that you start to see beyond the security perspective? I mean, thinking through that organization, do they start to see better uptime? How do you sell this to people for whom security isn’t the first driving need?

[0:28:24.9] EI: The first benefit that you see with hyper-quiet networks, which occur when you create very granular access control lists on switches and do not allow multicast or broadcast packets to fly around, but only unicast packets that are destined for specific IP addresses, is that it reduces the CPU utilization of routers and switches and firewalls, and that in itself improves the throughput. You have fewer buffer misses.

You have fewer drops of data because there isn’t enough memory to hold it while the other packets are being forwarded. Quiet networks with low CPU utilization are very, very effective in improving the uptime, or the availability, of applications. They also reduce the latency that an end user will experience. Almost think of it like Christmas Day when you are driving around town: look how much faster you can get everywhere, because all the shops are closed and people are not on the street.
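A minimal sketch of that ‘hyper-quiet’ filtering idea, in Python, is below. The subnet and the authorized destination addresses are hypothetical; on real gear this is expressed as switch ACLs rather than code.

```python
# Minimal sketch of a "hyper-quiet" forwarding rule: suppress multicast and
# broadcast chatter and allow only unicast traffic to authorized destinations.
# The subnet and addresses are hypothetical examples.
from ipaddress import ip_address, ip_network

SUBNET = ip_network("10.20.30.0/24")  # hypothetical OT subnet
AUTHORIZED_UNICAST = {ip_address("10.20.30.5"), ip_address("10.20.30.6")}

def forward(dst: str) -> bool:
    """Return True only for unicast packets bound for an authorized node."""
    addr = ip_address(dst)
    if addr.is_multicast or addr == SUBNET.broadcast_address:
        return False                   # quiet the network: no broadcast/multicast
    return addr in AUTHORIZED_UNICAST  # default-deny every other destination

print(forward("10.20.30.5"))    # True  - authorized unicast
print(forward("224.0.0.251"))   # False - multicast (e.g. mDNS)
print(forward("10.20.30.255"))  # False - subnet broadcast
```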

[0:29:33.4] AA: Yeah, and then it sounds like, too, beyond hyper-quiet networks, this is essentially installing virtualized machines at the various endpoints running some of these different pieces of equipment, or whatnot.

The amount of effort needed if one of those breaks goes from a real bear of a project, trying to rebuild that whole operating system, to just reinstalling the server from the base image that you have of whatever system is running there?

[0:30:12.5] EI: Yes. The same way that the IT department of the company will create an image of your laptop and, if your laptop gets corrupted, just reformat the disk and put a new image on, this is even one level more automated than that, where the registries are not even real, they are virtual.

So when you bring in the last stable VM instance and you put it in there, it’s as if it was the original OS in the VM, sitting on this hardware. There is no memory of the last instance, and that is the great benefit of disconnecting the actual internals of a machine from the application by using virtualization.

The other benefit it has is that, even though there is some overhead in memory and processing in managing a VM instance, on the whole it is a more efficient use of the hardware resources, because you can put multiple instances on a single piece of hardware. Not all applications need to access the hardware at the same time.

So you have the ability to use the hardware processors more efficiently by having multiple VM instances on the same piece of hardware, as opposed to locking it down by putting all the applications in one OS. These are some of the benefits of virtualization, in addition to the benefit of hardening the system and making it more resilient to permanent damage.

[0:31:50.6] AA: I’ve had lots of conversations where one of the biggest challenges is the different life cycles of different pieces of equipment, right? The standard cellphone or even laptop has an expected usable life of somewhere between three and maybe ten years at the outside for a laptop, but even after five it is starting to get pretty dated, whereas a good piece of equipment in a utility or a power station or a manufacturing facility is usable for maybe 25, 30, 40 years.

And so, thinking about how those two interact, your idea of taking the operating system and putting it in a virtualized environment, so it’s not necessarily running directly on that piece of hardware, or at least not deeply embedded in it, is really nice. It’s almost like cars, right? You change the brake pads and you change the tires and you change the windshield wipers because we know they wear out much more quickly than we expect the car to.

So it is almost like using virtualization as an expectation in the design. I would love to hear your thoughts on whether you are seeing people start to design that way, or whether that means taking existing equipment and setting it up that way.

[0:33:21.3] EI: Let me focus on the power sector, though this will apply to other verticals also. With Moore’s Law, the price of hardware is coming down very fast. Every 18 months or so, the price of memory halves, the price of processing comes down and the processing speed doubles. So you are getting all of these benefits where the powerful computer you need to host your application is becoming cheaper and cheaper.

If that is the case, there is very little excuse for keeping legacy power systems just because of the hardware. In other words, if a power system is residing on a hardware server like a standard computer, the business case for keeping that old hardware doesn’t make sense, because the application license fee from a Schneider, ABB or Siemens is a lot more expensive than the price of the hardware on which it’s sitting.

So my recommendation is, let’s take advantage of Moore’s Law and swap out the hardware, but still keep the old license. Then you are able, with the larger memory and processing capability, to virtualize. There is a minimum requirement for virtualization; you can’t just take an old Windows Vista-type computer and start thinking you’re going to virtualize that. It is going to slow down tremendously.

So let’s take advantage of the low cost of hardware but keep the legacy application, because its upgrade may require a big fee, and virtualize it, so that you can immediately start benefiting from VMs and their resilience and efficiency without forklift upgrades for the actual power system itself.

Now, in some cases you cannot do that, because the application and the hardware from the vendor are all built into one machine. It’s a purpose-built machine, so there you have to wait for it to reach end of life, but I have seen many controllers of systems that are just sitting on a regular Windows server.

You know, 2012 R2 or something, so if it is like that, there is no harm in spending a couple of thousand dollars and moving it to the next level of computing.

[0:35:51.4] AA: Yeah, I am curious. I have heard different estimates for how much legacy stuff is out there, upwards of like 90% of different systems being legacy systems. In your experience, what’s the ability to virtualize? Where can things not be virtualized?

Is it like Windows 2003 can be virtualized, Vista can be virtualized, and at some point it stops being possible to virtualize things? Where does that road end?

[0:36:19.2] EI: If you don’t have a computer-grade device, it will be hard to virtualize. You should be able to run –

[0:36:26.2] AA: And by computer grade you mean what?

[0:36:28.3] EI: By computer grade I mean something that you can install standard applications on if you want to use the machines that way. You know like an Xbox is basically a glorified Linux machine, right?

That’s a computer, but what’s not a computer is the intelligence on a sensor or a fire alarm or something like that; those are purpose-built. There may be an ASIC chip there, but it is not supporting standard computation with a separate OS and all of that. That’s where you can’t virtualize.

You need more resources, and many times when things are purpose-built, they keep the least amount of resources possible because they are built just for that function. That’s when it becomes difficult to virtualize, unless the vendor has created the purpose-built device to be virtualized, and that’s probably what’s going to happen with the next generation of devices.

Many of the legacy pieces of hardware that are out there have such limited computational capability that trying to put a VM on them would make them keel over and die. That’s the fastest way to make them inoperable, and we tried that on certain machines when I was at my previous job. It didn’t work.

Even Windows 2003 was having difficulty. When we got to the upgraded Dell machines with the 2012 OS, then we were able to virtualize it, and even then we had to put in a lot of extra memory to make it work, but it’s never cheap.

But if you think about this device keeling over and dying, and you think about the monetary cost of that outage, then the investment of a couple of thousand dollars in a piece of hardware is well worth it.

[0:38:21.2] AA: So if someone wanted to start thinking about doing this: like you said, there are visionary CISOs who are lucky enough to have the tenure and the experience, but having run around and met a lot of people in the space, unfortunately there are too many jobs to do and not enough very experienced people to do them. Short of listening to this podcast, how do they take the next step in terms of thinking about building these architectures and going from the really high-level discussion we are having here to a more granular understanding? Where would you point them?

[0:39:05.2] EI: I would first point them to creating three swim lanes, just like on a freeway where you have three lanes and each lane has a different purpose. The leftmost lane you use for overtaking, the middle lane you drive in when you are going the long haul, and the right lane is the one you usually take if you are real slow or you are planning to exit, right?

So there is a function for all three lanes. In the same way, if you do not have virtual local area networks set up for IT functions, OT functions and management functions, you’re doing something wrong.

Mixing those swim lanes, creating one large [inaudible] in which you are moving the IT, OT and management traffic, is a big mistake from a cyber perspective. So the first thing I would do in a cleanup is go site by site and make sure there is the creation of three virtual local area networks: IT, OT, management.

What’s in management? All the remote access through secure shell (SSH) or secure sockets, the connections to remote devices for management. All the syslog events that show the alarms from the devices should travel in that management VLAN. That’s what you use the management VLAN for.

The OT VLAN you use for all the traffic of the actual power system application, or water, or oil and gas, whatever that OT traffic is. Protocols like DNP3, the distributed network protocol, or IEC 61850 for substation automation, or Modbus, or OPC, all of these are OT protocols. They can run in the OT VLAN.

And in the IT VLAN, you run all the IT functions: the DNS stuff, all the email server stuff and the ERP systems, all of that runs in IT. So if you keep the swim lanes separate at every site, you are decades ahead of other people, because right now, today, that doesn’t happen across the board. People get sloppy and they start mixing the swim lanes, and that’s where the trouble begins.
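Here is a minimal sketch of that three-swim-lane separation, classifying a flow into the IT, OT or management VLAN by the service it carries. The VLAN IDs are hypothetical; the port numbers are the conventional ones for these services, but any real mapping would come from the site’s own design.

```python
# Minimal sketch: map a flow to its swim lane (IT, OT or management VLAN)
# by the service it carries. VLAN IDs are hypothetical; port numbers are the
# conventional ones for these services.
VLAN_IT, VLAN_OT, VLAN_MGMT = 10, 20, 30

SERVICE_TO_VLAN = {
    53:    VLAN_IT,    # DNS
    25:    VLAN_IT,    # email (SMTP)
    502:   VLAN_OT,    # Modbus/TCP
    20000: VLAN_OT,    # DNP3 over TCP
    102:   VLAN_OT,    # IEC 61850 MMS
    22:    VLAN_MGMT,  # SSH device management
    514:   VLAN_MGMT,  # syslog alarms
}

def lane_for(dst_port: int):
    """Return the VLAN a flow belongs in, or None if it has no defined lane."""
    return SERVICE_TO_VLAN.get(dst_port)

# A DNP3 poll and an email never share a logical subnet.
print(lane_for(20000) == VLAN_OT, lane_for(25) == VLAN_IT)  # True True
```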

The nice thing about having a management VLAN with the syslog events coming through it is that there’s an element of stealth in cyber security. If an event occurs, the hacker doesn’t know about the event that they created, because the alarms from that event are going through the management VLAN back to a syslog server that actually resides on the IT side, and from the syslog server, emails can be sent to the employees and you can also visualize it on a Splunk-like tool to correlate multiple alarms into an event.
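As a small sketch of that alarm path, the snippet below shows a device forwarding an event to a syslog collector using Python’s standard logging module. The collector address is a hypothetical placeholder that would only be reachable over the management VLAN; the correlation and visualization happen downstream in the Splunk-like tool, not here.

```python
# Minimal sketch: a device forwards its alarms to a syslog collector that is
# reachable only over the management VLAN. The collector address is hypothetical.
import logging
import logging.handlers

collector = logging.handlers.SysLogHandler(address=("192.0.2.10", 514))
log = logging.getLogger("substation7.relay3")
log.setLevel(logging.WARNING)
log.addHandler(collector)

# The attacker who trips this alarm never sees it leave: it travels on the
# management VLAN, not on the segment they compromised.
log.warning("breaker control command rejected by context-based inspection")
```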

So that is what I would recommend: separate these with at least three swim lanes and work towards that layered defense architecture.

[0:42:07.2] AA: What would you say, percentage-wise, from an adoption perspective, of people who are architecting their systems this way? Are we at 1%? Are we at 10%? Are we at 50%?

[0:42:20.8] EI: It’s hard to tell, because what you are asking me is what percentage of security people in enterprises today are visionaries, and I haven’t really done a focus group or a poll to say that. But as I go around the country, I get a lot of nods from utilities. Either they tell me, “We are already implementing many of these practices but not all of them,” or they tell me, “You know, these are great ideas. I wish I had the power to do it.”

I have been telling them, like the AFLAC duck, I’ve been screaming “AFLAC” and people are not listening to me. I don’t get anyone saying, “Oh, this is not going to work,” or, “This is useless.” They all empathize and they all support the vision, but what I am seeing is that the greatest impediment to implementing this is the busy schedules of people just trying to keep the lights on. That is the greatest impediment. They say, “We don’t have the luxury of time to migrate to this amazing architecture.”

And that, I believe, is the result of the way cyber security is viewed in corporate America. Unfortunately, it is still in the realm of IT, which is a big mistake. How can a company that is so digitally dependent for wealth generation relegate its cyber security to IT? It should be front and center as the supporter of business continuity and have its own budget, its own thing, the way we have money for IT, the way we have budget for ergonomic furniture and for sensitivity training against prejudice and all of that, because those things can cause lawsuits that can ruin the company, and that’s why they get priority.

Cyber is not getting that attention yet, and that needs to change. If that changes and the money is freed up to support the function, then the things I am recommending would get done.

[0:44:28.9] AA: And I guess one of the things that is interesting is, for a non-IT person, how would they think about measuring whether you are moving forward on this vision, right? Because there’s that old adage, what you measure you improve, and you can’t improve anything unless you are measuring it, right? How would you measure the success of, or even begin to measure, some of the ideas that you’re talking about?

[0:44:54.8] EI: One of the things that I would do, first of all, if I am an employee, is be the Sherlock Holmes myself and see if I can get into places that I am not supposed to, and then report to management when that happens. That is one thing I would do as an employee. Of course, my career might be limited in that company, but if I really was sincere, I would poke around and see whether those policies hold. If the cyber people or IT people are saying that we have access based on need, let me test it out.

The other thing I would do is encourage frequent penetration tests, like every six months or a year, done in the company: red team, blue team kinds of exercises to identify the vulnerabilities.

I feel like too many people are drinking their own Kool-Aid when the cyber people are telling the corporation, “Yeah, we’re all secure because we tested it.” I really question what they tested. Will they ever really identify a vulnerability and tell management about it? I doubt it.

[0:46:04.3] AA: Yeah, I am curious about your thoughts. I mean, you even use the word secure, and I wonder, some of the people that I’ve started talking to, some of the more visionary CISOs, are almost abandoning the term security. Running a podcast where security is literally in the title, I’m of two minds, but yeah, they are really talking about resiliency, and when you start talking about resiliency, those measures are things like time to respond.

You know, uptime, or if something happens, how quickly can we be back up, and the cost of doing that. What’s your thinking? If you were the COO or the CEO of a company that has OT systems, what would be the measure where you would say, “Hey, this is our North Star for us. This is what I want to make sure that we’re improving on”?

[0:47:08.3] EI: I think it starts with human beings first, and the tangible measure would be: do I have a team with cross-cutting skills that is there to make the changes that are necessary to build this layered defense?

The first thing that I would do is cultivate that team. Second, I would provide the monetary resources and free up their time to go about doing this, and the third thing is I would also help them educate the rest of the people as to what they did, so it doesn’t look like a black box.

Because ultimately, the security of the company is dependent on every employee, not just the small cabal of cyber people. Then, when it comes to resilience, a lot of people don’t understand the concept of resilience. They use the word loosely but they don’t distinguish it from redundancy, and that’s not what resilience is. Having another one that can come on when the first shuts off is not resilience, because there are many traumatic events that can occur, whether weather-related or an advanced persistent threat, that would take all of the duplicates out with the original.

So that doesn’t really create resilience. That helps with reliability under normal circumstances, to improve the availability of the application with a hot swap, but when there is a persistent attack or a natural disaster that hits your geographical area, then the duplicates don’t matter. It’s much more important to create failure scenarios and then develop mitigations for those failure scenarios.

So that’s what I would do if I were the CEO or COO: have these teams work with the company employees to do those contingency plans in the good times, and that is how I would get the resilience and the security.

[0:49:05.5] AA: What would be your definition of resiliency?

[0:49:07.5] EI: My definition of resilience is that it’s the incremental ability to recover from a degraded state, which you get by creating proper decision tree models of failure scenarios. So you think of a system, you think of all the potential ways it can fail, whether from a natural disaster, a human intervention or a system error, and then you sit as a team of people, business, cyber, networks, and you say, “How will I recover from this degraded state? How would I fix the parts that got hurt?”

And as you develop those decision tree models, more than anything else, you are building incremental resilience, because when the bad time comes, you know exactly what to do in each eventuality. Short of that, we don’t have resilience.
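A minimal sketch of that kind of failure-scenario catalog is below. The scenarios and mitigation steps are illustrative assumptions, just to show the shape of a pre-agreed playbook mapped from each degraded state.

```python
# Minimal sketch of a failure-scenario catalog: each degraded state maps to
# the recovery steps agreed on in advance. Scenarios and steps are hypothetical.
FAILURE_SCENARIOS = {
    "substation fiber link cut": [
        "fail over polling to the cellular backup path",
        "dispatch a field crew to splice the fiber",
    ],
    "historian VM corrupted": [
        "revert to the last stable VM snapshot",
        "replay buffered measurements from the RTUs",
    ],
    "control-center HMI hit by ransomware": [
        "tighten the IT/OT firewall to read-only rules",
        "operate affected feeders manually until hosts are rebuilt",
    ],
}

def playbook(degraded_state: str) -> list:
    """Return the pre-agreed recovery steps; an empty list means no plan yet."""
    return FAILURE_SCENARIOS.get(degraded_state, [])

for step in playbook("historian VM corrupted"):
    print("-", step)
```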

[0:49:59.7] AA: Yeah, I have seen some cool diagrams, I think out of [inaudible], that have that modeling done, where, hey, this system could be degraded 90% but we could still essentially get the job done, right? And then we have another pathway that we could also use, and then you start to think, okay, how many different pathways is it possible to have?

You never have full certainty and 100% availability, but you start to think, okay, it’s becoming very, very challenging for either an adversary or a series of events to take all of this out. Yeah, I think –

[0:50:42.0] EI: The most prominent example I would give you of resilience is the human kidney. The kidney has over a million nephrons, which are tiny capillary tubes through which the dirty blood flows and gets purified; the waste goes into the urine duct and the clean blood goes back into the body. There are over a million of those nephrons in the kidney.

Now, this is the resilience: even if 90% of those nephrons are blocked as a result of the condition called nephritis, when those capillaries get blocked because of the inflammation that can occur from hypertension or diabetes or other causes, the kidney still functions and you can live without dialysis. And not only that, but there are certain types of kidney failure, called acute kidney failure, where the kidney has the ability to repair the damage and come back to its original level.

It’s only chronic kidney failure that requires ongoing dialysis. Acute kidney failure requires dialysis for a short period of time, until the kidney repairs itself. That is resilience.

[0:52:02.8] AA: That’s awesome. Unfortunately, I think based on time we’ll just have to end there, but I think that is a great image to end on. We’ll just start designing our systems like kidneys, right?

[0:52:12.2] EI: That’s right.

[0:52:14.4] AA: Erfan, thank you so much for the time and for your thoughts, I really appreciate it. This was one of the most thoughtful interviews I have done in a long time, and I think people will really enjoy hearing it. Thanks so much.

[END]

 
