Tuesday, September 27, 2016

Is there such a thing as too much unit testing?

Testing, unit tests, and TDD (Test Driven Development) are among the first things we learn when we study computer science at university. Some of us might even be lucky enough to have a dedicated course on TDD.

Code is written by people, for machines, to solve problems. People make mistakes, and this is why it is so important to test our code. I have yet to see an application that worked from the beginning without bugs or issues.

I would say that a complex application that is not covered by unit tests and has no testing process will end up in the trash. At the beginning of each project there is the classic discussion about unit tests and code coverage.

- Do we need unit tests? 
- Yes.

- What is the code coverage target?
- 80%, 60%, 20%, 100%....

It is pretty clear that we need unit tests. An engineer needs to be able to test the code and check whether what we develop works as expected. The answer to the first question is obvious, but when we jump to the second one, the discussion can take hours and hours.
The development team will try to push the target as high as possible, while management might push back. Constraints like time, budget, complexity, or the required quality level can influence the decision.
In situations like this, both sides might have good arguments to support their point of view. We are in a gray zone, where people can easily become defensive. I have often been in situations like this and, unfortunately, the managers usually win; they are the ones who take the final decision.

What should we do in these situations?

The first thing we need to do is to map all the risks that might appear if there are no tests or if the code coverage is too low. Once we have done this, we need to sit down with the whole team and identify how these cases could be mitigated.
Mitigations for this kind of problem can feel strange but still be acceptable at project level. Solutions like increasing the number of testers, fixing bugs in production, or accepting a lower quality level sound odd to technical people, yet they can be acceptable at project level.
The most important thing is to create the risk map. This way, the people who take the decision have the whole picture in front of them.

The second thing we need to do is to identify the components where complexity is high and the risk of having issues is imminent. Once we identify them, we can request that at least these components be covered by tests.
In this way we can ensure that the most complex parts of the system are tested and that the development team can write good, working code from the beginning.

The third thing that can be done is to focus on writing tests that cover custom logic (business logic). It is more likely to have an issue in the custom code that calculates the discount level than in the code that saves the results to a database or makes a remote request. Of course, we might also have a bug at the persistence level, but because this layer is used by many other components, the risk of an issue there going undetected during development is much lower.
When the time available for writing tests is not as much as we want, we should focus on covering the most important parts with tests. Cover the parts of the system where you know you will have issues.
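As a small illustration of putting the test effort on the business logic first, here is a minimal sketch; the calculate_discount function and its rules are hypothetical, not taken from a real project.

# Hypothetical business rule used only for illustration:
# 10% discount for orders of 100 or more units, no discount below that.
import unittest

def calculate_discount(quantity: int) -> float:
    if quantity < 0:
        raise ValueError("quantity cannot be negative")
    return 0.10 if quantity >= 100 else 0.0

# The tests target the custom business logic, where issues are most likely to appear.
class CalculateDiscountTests(unittest.TestCase):
    def test_no_discount_below_threshold(self):
        self.assertEqual(calculate_discount(99), 0.0)

    def test_discount_at_threshold(self):
        self.assertEqual(calculate_discount(100), 0.10)

    def test_negative_quantity_is_rejected(self):
        with self.assertRaises(ValueError):
            calculate_discount(-1)

if __name__ == "__main__":
    unittest.main()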

Conclusion
In the end, it is a trade-off. Getting enough time for 100% coverage rarely happens, and even with 100% code coverage you will still have bugs in production.
Don’t forget that we write code that runs on machines, but needs to be read by people.

Is there such a thing as too much unit testing?
No, but constraints on time, scope, and quality can limit the number of unit tests we write, even though writing fewer tests in turn affects that same triangle (time, scope, quality).

Friday, September 23, 2016

Storing IoT data - Another perspective

The current trend in the IT industry, especially in the IoT world, is to store all the data produced by the devices and systems connected to a network. Storage is so cheap that companies prefer to store and archive all of it, without asking whether the data can be used now or in the future.
The main goal of this approach is to have all the information that might be needed later, for example when a Machine Learning system or Hadoop is used. You never know what parameter or log might become relevant in the future for an insight or a trend.

Things become interesting the moment your devices contain more than 100 or 200 sensors each, with a sample rate of 1s or 0.1s.
Let's take as an example a bottle manufacturer that has plants all around the globe. Each plant has 800-1,000 connected devices that run 24/7.
The quantity of data produced every day can easily reach 0.2-0.5 TB. Each day new data arrives at the platform in the warehouse. The collected data might be processed and stored for later use, but not all of it is processed all the time. You might want to keep data for later use by systems like Machine Learning.
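As a quick back-of-the-envelope check that this volume is plausible, the small sketch below multiplies the figures out; the per-sample size of 20 bytes is my own assumption, not a value from the scenario.

# Rough estimate of the daily data volume for one plant.
# The per-sample size (20 bytes) is an assumption; the other figures come from the scenario above.
devices_per_plant = 1000
sensors_per_device = 150      # "more than 100 or 200 sensors" per device
samples_per_second = 1        # 1s sample rate
bytes_per_sample = 20         # timestamp + value + identifiers (assumed)

samples_per_day = devices_per_plant * sensors_per_device * samples_per_second * 86_400
terabytes_per_day = samples_per_day * bytes_per_sample / 1e12
print(f"~{terabytes_per_day:.2f} TB/day")   # ~0.26 TB/day, inside the 0.2-0.5 TB range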

The question that should pop into our mind is:
Do we need to store all the data that is produced by a plant?

At machine and gateway level, it might be very important to collect metrics at a 0.01s or 0.05s time interval, but at plant level this can be irrelevant; there, a 0.5s or 1s interval can be more than we need. This means we have already reduced the size of the data produced by a device in a plant by a factor of 5-10x.
When we look at the global level, the information produced by devices is not relevant at the level of individual seconds. Not only that, but not all metrics are useful outside the plant. This means that from 100 or 200 counters we can end up with only half of them being relevant at global level.
As a result, the data we store at global level decreases drastically, but at the same time we keep all the relevant information that we might use in the future.
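As a rough illustration of how such downsampling could work, here is a minimal sketch that aggregates 0.1s readings into 1s averages; the reading tuples are hypothetical.

# Minimal downsampling sketch: aggregate raw 0.1s readings into 1s averages.
# 'readings' is a list of (unix_timestamp, value) tuples; the data below is hypothetical.
from collections import defaultdict
from statistics import mean

def downsample(readings, interval_seconds=1.0):
    buckets = defaultdict(list)
    for timestamp, value in readings:
        # Group each reading into the time bucket it belongs to.
        bucket = int(timestamp // interval_seconds) * interval_seconds
        buckets[bucket].append(value)
    # Keep one averaged sample per bucket, ordered by time.
    return sorted((bucket, mean(values)) for bucket, values in buckets.items())

raw = [(0.0, 10), (0.1, 11), (0.2, 12), (1.0, 20), (1.5, 21)]
print(downsample(raw))   # [(0.0, 11), (1.0, 20.5)]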

This approach doesn't mean that we need a complex processing system at plant level. On-premises or in the cloud, the system will look the same. It is our decision when and where we want to store data: at gateway, plant, or global level. We could even have the gateway in the cloud.

As an example, all the information produced by devices could be stored at gateway level for 7 days; anything older than 7 days is deleted automatically. The gateway can have a virtual plant in the cloud, where all the data is stored for 1 year. At these two levels the time interval of the collected metrics is very small (0.1s and 1s). From there, the information is moved into the global repository, which stores only part of the counters, at a 1s time interval. This data can later be used for analytics and prediction.
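Such a tiered policy could be captured in a simple configuration structure; the sketch below only restates the scenario above, and the tier names and the "counters" filter are assumptions used for illustration.

# Illustrative retention/downsampling configuration for the three storage tiers.
RETENTION_POLICY = {
    "gateway": {
        "sample_interval_seconds": 0.1,
        "retention_days": 7,             # data older than 7 days is deleted automatically
        "counters": "all",
    },
    "virtual_plant_cloud": {
        "sample_interval_seconds": 1.0,
        "retention_days": 365,           # kept for 1 year
        "counters": "all",
    },
    "global_repository": {
        "sample_interval_seconds": 1.0,
        "retention_days": None,          # kept long-term for analytics and prediction
        "counters": "relevant_subset",   # only part of the counters reaches this level
    },
}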

As we can see, there are different possible approaches. There is not always a need to store all the data produced by the devices forever; we can filter it or decide the sample rate for each counter.


 

Friday, August 19, 2016

[Post Event] ITCamp Summer Community Event, August 18 - Cluj-Napoca

On August 18, 2016 we had the ITCamp Summer Community Event in Cluj-Napoca. More than 55 people attended. In comparison with past events, we dedicated more time to networking between the sessions and at the end of the event.
There were two sessions in the afternoon. The first one was about .NET Security (including .NET Core) and the second one was about how we can write actor-based concurrency using Elixir. Personally, I discovered a whole new world through Elixir, one that has been available for more than 30 years in the Erlang world. Seeing a system with eight nines of availability (99.999999%) is impressive, but thinking that such a system was available 25 years ago is even more impressive.
As usual, a big THANK YOU to all attendees and to the people who were involved behind the scenes. And another THANK YOU to our sponsors who made this event possible:

Evozon
Recognos
Below you can find the abstract of each session and the slides. At the end of the post you can find pictures from the event.

See you next time!

.NET Security (Radu Vunvulea)
Abstract: When was the last time you pushed a .NET Framework update to a production environment? Did you define an update process for the .NET Framework in production environments? Is the support team aware of what can happen if a security update is not pushed to the machines?
In this session we will take a look at some security problems that the .NET Framework has and what can happen if we don't take security into account. We will look at different versions of .NET (including .NET Core). Best practices related to this topic will be covered and debated.
Slides:

Actor based concurrency with Elixir
Abstract: Because of the increased demand for interconnected systems (IoT, SaaS), we need new, improved ways of handling reliable (soft) real-time systems. One battle-tested approach is based on the Actor Model, pioneered by Erlang and brought to the masses by Elixir.
Topics that will be covered:
- short introduction to functional programming principles with examples in Elixir
- actor model concurrency: why is it needed, advantages & disadvantages
- OTP patterns for handling agents: GenServer & Supervisor

Tuesday, August 9, 2016

Azure CDN – Solutions for things that are not available (yet)

Last week I had some interesting discussions around payload delivery and CDNs. I realized how easily people can misunderstand some features or functionality, taking things for granted or making the wrong assumptions.
In this context, I will write 3 posts about Azure CDN, focusing on what is available, what we can do, and what we cannot do.


Let's see what we can do when we need Azure CDN features that are missing at this moment in time. In the last post we talked about these features, so we will not explain them again here – for details, please see the previous post.

SAS Support for Blob Storage
If you need, on top of Azure Storage, a CDN system that has support for SAS, then you need to be inventive. The simplest solution is to replicate the content into multiple Azure Storage accounts, or even within the same storage account (if the problem is download speed), and to create a web endpoint (which can be hosted as a web app) that provides the locations from which the content can be downloaded.
This web endpoint can have a round-robin mechanism (the simplest option) to serve different locations.

Using this approach, you will also need a mechanism that replicates the content to multiple locations. AzCopy might be a good out-of-the-box solution.
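To make the idea concrete, here is a minimal sketch of such a "content locator" endpoint using round robin; the replica URLs, the route, and the choice of Flask are assumptions for illustration, not a production implementation.

# Minimal sketch of a web endpoint that round-robins download locations
# across replicated storage accounts. The URLs below are hypothetical placeholders.
from itertools import cycle
from flask import Flask, redirect

app = Flask(__name__)

# Content replicated (for example with AzCopy) into several storage accounts.
REPLICA_BASE_URLS = cycle([
    "https://replica1.blob.core.windows.net/content",
    "https://replica2.blob.core.windows.net/content",
    "https://replica3.blob.core.windows.net/content",
])

@app.route("/download/<path:blob_name>")
def download(blob_name: str):
    # Pick the next replica in round-robin order and redirect the client to it.
    # A SAS token could be appended here if the containers are private.
    base_url = next(REPLICA_BASE_URLS)
    return redirect(f"{base_url}/{blob_name}", code=302)

if __name__ == "__main__":
    app.run()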

HTTPS for Custom DNS Domains
The story for Azure Storage is the same as for Azure CDN: at this moment there is no support for HTTPS combined with custom DNS domains. If you really need HTTPS, then the only solution is to serve the content from an Azure Web App. There are not too many options for now.

HTTPS Client Certificates
The story is similar to the previous one. We don't have too many options; the best way for now is to use an Azure Web App and serve your binary content from there.

HTTP/2
None out of the box. As above, serve the content from an Azure Web App.

Fallback to custom URI
For this situation, we have a simple solution: we can go with the approach presented in the SAS Support example. The additional thing we need to do is to have multiple Azure Web Apps that can serve the content location and to add Azure Traffic Manager in front of them.


Access logs of CDNs are raw data
If you really need these logs, then you need to use Azure Blob Storage with the approach described above and activate logging in each Azure Storage account.

As we can see, there are many features supported by Azure CDN, but we will never find a perfect solution that supports everything we need, and the same is true here. What is important to remember is that some of the features we are missing are already marked as In Progress on the Azure feedback portal, which means we will soon have support for them as well.

Monday, August 8, 2016

ITCamp Summer Event in Cluj-Napoca, August 18

For Romanian colleagues.
In August, ITCamp is organizing a new event for IT professionals in Cluj-Napoca. The event will take place on August 18, in the Evozon building.
Attendance is free. We thank our sponsors for their support (Evozon and Recognos).
Program:
17:45 - 18:00 - Registration (coffee and beverages)
18:00 - 19:00 - .NET Security (Radu Vunvulea)
19:00 - 19:30 - Break (coffee and beverages)
19:30 - 20:30 - Actor based concurrency with Elixir (Adrian Magdas)
20:30 - 21:00 - Networking (coffee and beverages)

Session descriptions:
.NET Security (Radu Vunvulea)
When was the last time you pushed a .NET Framework update to a production environment? Did you define an update process for the .NET Framework in production environments? Is the support team aware of what can happen if a security update is not pushed to the machines?
In this session we will take a look at some security problems that the .NET Framework has and what can happen if we don't take security into account. We will look at different versions of .NET (including .NET Core). Best practices related to this topic will be covered and debated.
Actor based concurrency with Elixir
Because of the increased demand for interconnected systems (IoT, SaaS), we need new, improved ways of handling reliable (soft) real-time systems. One battle-tested approach is based on the Actor Model, pioneered by Erlang and brought to the masses by Elixir.
Topics that will be covered:
- short introduction to functional programming principles with examples in Elixir
- actor model concurrency: why is it needed, advantages & disadvantages
- OTP patterns for handling agents: GenServer & Supervisor
Sponsors:
Evozon
Recognos

Sunday, August 7, 2016

Azure CDN – Things that are not available (yet)

Last week I had some interesting discussions around payload delivery and CDNs. I realized how easily people can misunderstand some features or functionality, taking things for granted or making the wrong assumptions.
In this context, I will write 3 posts about Azure CDN, focusing on what is available, what we can do, and what we cannot do.


I was involved in many discussions where people assumed that certain features are available on Azure CDN and were ready to change the current architecture based on those wrong assumptions. Let's take a look at some features or functionality that we don't have on Azure CDN, but that we often assume we have.

SAS Support for Blob Storage 
The most common functionality that people think is available on Azure CDN is Shared Access Signature support for Blob Storage. Shared Access Signature (SAS) is one of the most used and most powerful features of Blob Storage. On Azure CDN we have the ability to cache a blob using its full URL, including the SAS token.
Maybe because of this, people have the impression that Azure CDN will take the SAS validity into consideration and that, once the SAS expires, it will also invalidate the cache.
The reality is that, at this moment, Azure CDN treats a URL to an Azure Blob that carries a SAS just like any other URL. If the content can be accessed, it will copy the payload and replicate it to the CDN nodes. The content is removed from the CDN nodes only when the TTL value on the CDN expires, which has no connection with the SAS.



HTTPS for Custom DNS Domains
Even if Azure CDN has support for custom DNS domains and also has support for HTTPS, it doesn't mean that you can use HTTPS with your own domain name. It sounds strange, but it can be confusing when two features are supported individually while the combination of them is not.
This means that if you need HTTPS, then you will need to use the Azure CDN domain name. The good news is that this is the most voted CDN feature on the Azure feedback portal, and very soon we will have support for it.

HTTPS Client Certificates
When you are using Azure CDN it is important to remember that, even if there is support for HTTPS, there is no support at this moment for client certificates. You are allowed to use only the SSL certificates provided by the CDN.
This is also why we don't yet have support for client certificates on custom DNS domains for Azure CDN, but things will get much better very soon.

HTTP/2
The new protocol of the internet, if we can call it that, is not yet supported. In general this is not a blocker, except when you have a complex web application where you want to minimize the number of requests made by the client browser.
For these situations, working with Azure CDN might not be so simple, but there are workarounds.

Fallback to custom URI
Some CDN providers give us the possibility to specify a URI that can be used as a fallback when the content is not found in the original location. It is useful when you are working with an application where availability is critical and you need to be able to provide a valid location for all content.
Of course, with a little imagination you can solve this problem quite simply: putting Azure Traffic Manager between Azure CDN and your content would allow you to specify a fallback list of URIs.
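If the fallback has to be handled on the client side instead, a minimal sketch could look like the one below; the URLs and the use of the requests library are assumptions for illustration only.

# Minimal client-side fallback sketch: try a list of content locations in order
# and return the first one that answers successfully. The URLs are hypothetical.
import requests

FALLBACK_URIS = [
    "https://myendpoint.azureedge.net/assets/app.js",   # primary location (CDN)
    "https://backup.example.com/assets/app.js",         # fallback location
]

def fetch_with_fallback(uris, timeout=5):
    last_error = None
    for uri in uris:
        try:
            response = requests.get(uri, timeout=timeout)
            if response.status_code == 200:
                return response.content
        except requests.RequestException as error:
            last_error = error
    raise RuntimeError(f"No location could serve the content: {last_error}")

payload = fetch_with_fallback(FALLBACK_URIS)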

Access logs of CDNs are raw data
Many people are asking for access to raw log data, similar to what Azure Storage offers. This might be a useful feature, but at the same time I have some concerns about it.
The main purpose of a CDN is to cache and serve frequently requested content from a location as close as possible to the client. It is used in cases where the RPS (requests per second) is very high. Generating files with raw log data could be expensive and would end up producing very large logs; processing such files might be expensive as well.

It is important to remember that part of these features are already planned by the Azure team, and in the near future we might find them in Azure CDN. Also, don't forget that an out-of-the-box solution that covers everything is hard to find; with a little imagination we can usually find a workaround.

Wednesday, August 3, 2016

Azure CDN – Available features and functionality

Last week I had some interesting discussions around payload delivery and CDNs. I realized how easily people can misunderstand some features or functionality, taking things for granted or making the wrong assumptions.
In this context, I will write 3 posts about Azure CDN, focusing on what is available, what we can do, and what we cannot do.

Let's start with the first topic and talk about the features and functionality available now (Q3 2016) on Azure CDN. You will find a lot of useful information on the Azure page, and I'm pretty sure you have already checked it.
In comparison with the last few years, Microsoft has taken a big step forward: they signed partnerships with Akamai and Verizon (both among the biggest CDN providers in the world).

This was a smart choice, because developing and building your own CDN is not only expensive but also requires a lot of effort. Some things simply don't make sense to build yourself if you don't need anything beyond the standard offering.
A nice thing here is that you don't need a dedicated contract with each provider; you only need your Azure subscription, with no custom contract or invoice from Akamai or Verizon.

In general, the CDN features provided by the two are similar, with some small exceptions that may (or may not) impact your business. Based on the feature list, we could say that Akamai is the more basic one and Verizon offers some additional features, but both of them provide the base functionality required from a CDN.

The things that are available only on Azure CDN from Verizon are:
      1. Reports related to bandwidth, cache status, HTTP error codes, and so on.
      2. Restricting access to the content based on country/area. Verizon allows you to specify a list of countries (DE, UK) or areas (EU, AP – Asia/Pacific) that can(not) access the content. The filtering is done based on the client IP and is applied per directory. It is good to remember that the rule applies recursively at folder level, there is no support for wildcards, and you can specify one rule per folder that can contain multiple countries (areas). Being a recursive rule, if you define a rule one level down, only the rule defined on the sub-folder will apply – a simple rule engine.
      3. Pre-loading content in the cache. Normally, content is loaded in the cache only when it is requested for the first time. Azure CDN from Verizon allows us to specify content that we want to pre-load. Don't forget that you need to specify the exact file that you want to pre-load, and you can pre-load a maximum of 10 files per minute (per profile). There is a kind of support for regular expressions, but each result needs to be a relative path to a file.

Now, let's see the features that are supported by both Azure CDN from Akamai and Azure CDN from Verizon:
      1. HTTP support
      2. DDOS protection
      3. Dual Stack support (IPv4 and IPv6)
      4. Query String Caching – controls how content is cached when the URL contains a query string and is not static. There are 3 options available: cache each unique query string, ignore the query string, or do not cache URLs that contain query strings. The last option is very useful.
      5. Fast purge – allows us to clean the cache even if the TTL (Time To Live) of the content has not expired yet. The purge can be done from the Azure Portal, from PowerShell, or through the REST API (see the sketch after this list).
      6. REST API for CDN configuration (as for all other Azure Services)
      7. Custom domain name support
      8. Caching content from Azure Storage. Access to Azure Storage is done using the REST API. In this way we can cache blobs and even Azure Tables (as long as we know the content is static).
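As an illustration of triggering a fast purge through the REST API, a minimal sketch might look like the one below; the api-version, the resource names, and the token handling are assumptions, so check the current Azure CDN documentation before relying on it.

# Minimal sketch of a fast purge call against the Azure CDN management REST API.
# The URL shape follows the ARM convention for Microsoft.Cdn; the api-version and
# all names/tokens below are placeholders, to be verified against the official docs.
import requests

def purge_cdn_content(subscription_id, resource_group, profile, endpoint,
                      content_paths, bearer_token, api_version="2016-04-02"):
    url = (
        f"https://management.azure.com/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}/providers/Microsoft.Cdn"
        f"/profiles/{profile}/endpoints/{endpoint}/purge"
    )
    response = requests.post(
        url,
        params={"api-version": api_version},
        headers={"Authorization": f"Bearer {bearer_token}"},
        json={"contentPaths": content_paths},  # e.g. ["/images/banner.png"]
    )
    response.raise_for_status()

# Hypothetical usage:
# purge_cdn_content("<subscription-id>", "my-rg", "my-cdn-profile", "my-endpoint",
#                   ["/images/banner.png"], "<aad-access-token>")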

Azure CDN from Verizon is available in two tiers, Standard and Premium. Of course, besides the price, there are features that are available only on Premium, like:
      1. Real-time CDN status – useful if you need real-time data related to bandwidth, number of connections, error codes, and cache status. For a CDN this is normally a nice-to-have; you can do your job without this kind of information.
      2. Advanced reports – detailed geographical reports related to traffic, with different views per day, hour, file, file type, directory, and so on.
      3. Customizable HTTP behavior based on your own rules.

As we can see, there are a lot of features available for Azure CDN. For normal use cases, you can use it without a problem.