“Where do we store our data?”
It’s a question that comes up constantly in the tech world, and the answer is a fundamental pillar of the software infrastructure of organizations everywhere. The way data is stored, the speed with which it can be accessed, and the way in which it is protected are all critical for businesses to function properly. Data lakes are one of the newer means of storing data, but how do they work? Are they right for you? And is there a way they can actually help you with software asset management?
At MetrixData 360, we have found that data lakes can be particularly useful in our efforts to wrangle our clients’ software environments into order.
What is a Data Lake?
For a long time, data was stored in warehouses, a familiar term for the veterans of the tech industry. While it may be easy to assume that a data lake is just the newer version of the data warehouse, there are a few distinct differences.
A data lake can be thought of much like a real-life lake. Data streams in from a variety of sources, and the lake accepts and retains all of it, whatever its type or schema.
This data may not be formatted in any way, with no hierarchy or organization to speak of. Instead, it is kept in its rawest form, neither processed nor analysed. This allows businesses to apply the schema and organization model that best fits the nature of the data the lake houses, which often proves to be both the data lake’s greatest strength and its biggest drawback.
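Applying a schema only when the data is read can be sketched in a few lines of Python. This is a minimal illustration rather than a real ingestion pipeline: the records, the field names (`host`, `hostname`, `cores`, `cpu_cores`), and the `read_with_schema` helper are all hypothetical and stand in for whatever your own sources produce.

```python
import json

# Hypothetical raw records as they might land in a lake: no shared schema.
RAW_LINES = [
    '{"host": "srv-01", "cores": "8", "os": "Windows Server 2019"}',
    '{"hostname": "srv-02", "cpu_cores": 16}',
    '{"host": "srv-03"}',
]


def read_with_schema(lines):
    """Apply a schema at read time by normalizing field names and types."""
    for line in lines:
        raw = json.loads(line)
        # Different sources name the same thing differently.
        host = raw.get("host") or raw.get("hostname")
        cores = raw.get("cores", raw.get("cpu_cores"))
        yield {
            "host": host,
            # Missing values stay missing instead of being guessed.
            "cores": int(cores) if cores is not None else None,
        }


records = list(read_with_schema(RAW_LINES))
for record in records:
    print(record)
```

The raw lines are stored untouched, so a different schema can be applied later without re-ingesting anything. The cost is that every reader has to cope with missing or inconsistent fields, as the third record shows.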
How a Data Lake Works
The sheer volume of data that companies hold can be overwhelming, and traditional data management isn’t equipped to handle big data or its analysis. Data warehouses are designed much like their namesake, with rows and columns of organized data. While this imposes limitations, such as a lack of flexibility and the requirement to standardize all data before it is stored, it does allow for quicker operations.
In contrast, data lakes are commonly built either on Hadoop or on Infrastructure as a Service (IaaS). Both AWS and Azure offer data lake architectures to store and analyze current data.
Data lakes provide their structure through one of two methods:
- Metadata stores
- Self-describing data formats
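The difference between the two methods can be shown with a small Python sketch. Everything here is illustrative: the `CATALOG` dictionary stands in for a real metadata store, and the JSON document with a `schema` key is a toy stand-in for self-describing formats such as Parquet or Avro, which embed their schema in the file itself.

```python
import csv
import io
import json

# Method 1: metadata store. The schema lives outside the data, in a catalog.
CATALOG = {"inventory/servers.csv": ["host", "cores"]}  # hypothetical catalog
HEADERLESS_CSV = "srv-01,8\nsrv-02,16\n"


def read_with_catalog(path, text):
    """Look up the column names in the catalog, then read the raw rows."""
    columns = CATALOG[path]
    return [dict(zip(columns, row)) for row in csv.reader(io.StringIO(text))]


# Method 2: self-describing format. The schema travels with the data.
SELF_DESCRIBING = json.dumps(
    {"schema": ["host", "cores"], "rows": [["srv-01", "8"], ["srv-02", "16"]]}
)


def read_self_describing(text):
    """Read the column names from the document itself."""
    doc = json.loads(text)
    return [dict(zip(doc["schema"], row)) for row in doc["rows"]]


from_catalog = read_with_catalog("inventory/servers.csv", HEADERLESS_CSV)
from_document = read_self_describing(SELF_DESCRIBING)
print(from_catalog == from_document)
```

Both readers return the same records. What differs is where the structure is kept: with a metadata store the catalog must stay in sync with the files, while a self-describing file can be interpreted on its own.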
Why is a Data Lake Needed?
If you’re working on a big data project, you’ll need to know which data will get you to your desired outcome, and you’ll need to get your hands on that data so you can analyze and leverage it.
One of the major benefits of data lakes is cheap scalability, which allows you to keep large quantities of data at a reasonable price. A data lake can draw from a variety of sources, store any data you wish, and be queried in whatever way you need, providing you with tremendous flexibility. The use of data lakes can also help break down data silos and allow for a unified view of data across your organization.
Data Lakes and Software Asset Management
Data lakes can prove quite useful in assisting software asset management efforts if executed correctly. At the moment, auditors don’t typically look to data lakes when conducting a software audit. We suspect they will become a target of scrutiny in the future, as their use becomes more sophisticated and widespread.
A good example of when these data lakes might be accessed is with regard to hybrid use benefits in Azure, which allow you to bring your own on-prem licenses to Azure at a discounted rate.
Technically speaking, there would be nothing stopping your company from using hybrid benefits to deploy software on Azure while still having it installed on-prem, and Microsoft is currently turning a blind eye to this loophole, trusting that you’ll respect your arrangement.
However, with the rough year 2020 has proven to be for everyone (except Zoom), Microsoft may be eager to make up losses by conducting audits in new areas. This is why it is best to understand how the data stored in these data lakes can be used to your advantage.
Data lakes can help you hunt down Shadow IT and they can prove to be an excellent resource in the event of an audit. This is because they can often provide missing data such as VM guest to host relationships, processors, cores, perhaps even unique details about your software environment.
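As a rough illustration of recovering VM guest-to-host relationships, the Python sketch below joins guest records to host records. The record layout, the host and VM names, and the `guests_by_host` helper are hypothetical; real data lake exports will look different.

```python
from collections import defaultdict

# Hypothetical host and guest records pulled from a data lake.
HOSTS = {
    "esx-01": {"processors": 2, "cores": 32},
    "esx-02": {"processors": 2, "cores": 16},
}
GUESTS = [
    {"vm": "sql-01", "host": "esx-01"},
    {"vm": "sql-02", "host": "esx-01"},
    {"vm": "web-01", "host": "esx-02"},
    {"vm": "old-01", "host": "esx-09"},  # host is not in the inventory
]


def guests_by_host(guests, hosts):
    """Group VMs under their host and collect VMs whose host is unknown."""
    mapped = defaultdict(list)
    orphans = []
    for guest in guests:
        if guest["host"] in hosts:
            mapped[guest["host"]].append(guest["vm"])
        else:
            orphans.append(guest["vm"])
    return dict(mapped), orphans


mapped, orphans = guests_by_host(GUESTS, HOSTS)
for host, vms in mapped.items():
    print(host, HOSTS[host]["cores"], "cores:", vms)
print("unmapped guests:", orphans)
```

Guests that point at an unknown host are worth a closer look, since they may indicate stale records or infrastructure that your inventory has missed.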
Data lakes can also be used as a means of tracking cloud usage and cross-referencing other data sources. In order to use your data lakes to serve your SAM goals, you’ll need to make sure the data is accurate and complete. You can do so by considering the following:
- Cross-Reference Data Lake Resources with Your Active Directory: Your AD is one of the first areas that will be consulted in the event of a software audit. Within a company, the AD is often disorganized and is far from an accurate picture of your whole environment. That is what our Active Directory Reporting Tool is for: it lets you create an easy-to-understand chart of the assets within your AD. For more information you can check out our AD reporting tool, here.
- Compare with Your Inventory Tools such as ServiceNow CMDB: SAM tools often provide the basis of a company’s software asset management solution and are often used during a software audit, but they can fail to provide an accurate picture of your data. For this reason, it is important that you verify their accuracy.
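The cross-referencing described above comes down to comparing sets of asset names. The following Python sketch assumes each source can be exported as a list of hostnames; the sample names, the `normalize` rule, and the `cross_reference` helper are hypothetical and would need adapting to your own naming conventions.

```python
# Hypothetical hostname exports from three sources.
LAKE = ["SRV-01", "srv-02.corp.example.com", "srv-03"]
ACTIVE_DIRECTORY = ["srv-01", "srv-02", "srv-04"]
CMDB = ["srv-01", "srv-03"]


def normalize(name):
    """Lower-case the name and drop any domain suffix so names compare equal."""
    return name.strip().lower().split(".")[0]


def cross_reference(lake, ad, cmdb):
    """Report which assets each source is missing relative to the others."""
    lake_set, ad_set, cmdb_set = (
        {normalize(name) for name in source} for source in (lake, ad, cmdb)
    )
    return {
        "missing_from_ad": sorted(lake_set - ad_set),
        "missing_from_cmdb": sorted(lake_set - cmdb_set),
        "not_seen_in_lake": sorted((ad_set | cmdb_set) - lake_set),
    }


report = cross_reference(LAKE, ACTIVE_DIRECTORY, CMDB)
for finding, hosts in report.items():
    print(finding, hosts)
```

Each non-empty list is a discrepancy to investigate before an auditor finds it: an asset the data lake sees but your AD or CMDB does not, or a record in those tools that no longer matches anything the data lake observes.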
Related: What are SAM Tools?
Get Your Data Lakes to Serve Your Goals
Data lakes are an excellent way to keep your resources stored and organized. It’s important to be aware of how data lakes can potentially help your organization remain in compliance with software vendors and keep your software infrastructure organized by assisting in software asset management.
Software Asset Management in and of itself is a great way for your organization to find an estimated 20% to 30% in savings on its current IT spend, simply by cutting out unnecessary licensing and making sure that costly fines and unbudgeted spending are avoided.
If you’d like to learn more about how Software Asset Management can benefit your company, you can check out Software Asset Management for Beginners.