Sunday, February 19, 2017

Azure Price estimation - what you need to take into account

Calculating the cost of a solution that runs in the cloud can sometimes be challenging. First of all you need to forecast how many resources are required, and secondly you need to put on paper all the things that are billable by the cloud provider.

Azure Price Calculator
The pricing calculator offered by Microsoft for Azure is a nice tool that helps you take all the items into account. It persists the estimation even if you close the tab, and when you are done you can export it to an Excel file. For basic estimations it is good, but expect that you'll end up pretty fast with a complex Excel file of your own.


Excel Calculator
In the last few years, I was involved at least twice per month in Azure consumption estimations for different projects - from small ones that consume a few hundred euros per month to big ones where you end up with a bill that has six or seven figures per year. 
You'll find different templates on the internet that you can use, and don't be surprised if you end up creating your own template.

Tips and Tricks
I have a list of items that I take into account when I do price estimations for Azure. It doesn't cover everything: some things I ignore, and others I add to the cost estimation by default. It's only a checklist for me, which I decided to share with others.

Things to consider
General
  1. Special support from Azure (premier support)
  2. In general computation is the main cost driver, but don't be surprised if traffic also ends up around 20-30% of your cost, especially in the IoT era
  3. Development, integration and testing environments also cost money; deallocate them when you don't use them
  4. Development, integration and testing environments don't need to run at full capacity
  5. Select the right tier - too big is just a waste of money
  6. Change the tier based on your needs (not only VMs, but also the SQL tier)
Computation
  1. VM disks are not included in the VM price and need to be included in estimations. You pay only for what you use
  2. Outbound traffic from VMs
  3. The minimum number of VMs or compute units is 2, never 1
  4. Scale down when you don't need all the power

Storage
  1. Outbound traffic for Azure Storage (blobs, Azure Table)
  2. Outbound traffic when you do a sync between two different Azure Regions (storage, DB, ...)
  3. Storage cost for backups
  4. Cost of transactions, especially on Azure Tables
  5. Size of the logs
  6. Use CDNs as much as possible
  7. Non-critical use cases can implement a back-off mechanism for cost optimization (reduce number of transactions)
DB
  1. Storage cost of SQL Azure backup
Messaging
  1. Outbound traffic for Service Bus, Event Hub and other messaging systems
  2. Price per number of events for Event Hub
  3. Number of messages for Service Bus
Azure AD
  1. Number of users 
  2. Multi-factor authentication
  3. Number of applications
Service Fabric
  1. The minimum number of VMs in the cluster (it is not 2, it is higher)

Saturday, January 28, 2017

Azure Regions products and services availability

Cloud is evolving. Nowadays we can say that cloud infrastructure is mature enough to support most of the business scenarios on the market.

Evolution
This evolution has also changed how we see the cloud, where data centers (regions) are available and who owns them. Microsoft is the best example of this. At this moment Microsoft has 3 types of Azure Regions:
  • Hosted and managed by Microsoft - publicly available to all of us
  • Azure Government Regions - available exclusively for government (solution providers and customers that work with government)
  • Azure "Private" Regions - special Azure Regions where the Azure infrastructure runs but is controlled and owned by a local data trustee, as in China or Germany

New times, new problems.
At this moment in time, Microsoft has 32 Azure Regions and 6 additional ones already announced (coming soon). Each Azure Region has its own update and delivery timeline. It means that when a new service is launched it will not be available in all 32 Azure Regions at once.
For regions managed by Microsoft, updates and new services arrive faster, but for other regions like China or Germany the latest updates or new services might not be delivered on the same timeline.
A good example is Azure Search, which is available in the North Europe and West Europe regions but is not available in the Germany or UK regions. This means that we need to be aware of what kind of services we are using and ensure that they are available in the regions where we need them.

Product available per region
When we design our system we need to know exactly which product (service) is available in each region. Microsoft made a lot of improvements in this area. They created a dashboard that allows us to check which services are available in each location - https://azure.microsoft.com/en-us/regions/services/.
Another useful dashboard, which can help you a lot during presales, is https://azure.microsoft.com/en-us/regions/. All the Azure Regions are marked on the map. In addition to this you can also find the regions that will be available in the future.

Wait, one more thing
Even if a product is available in a specific region, this doesn't mean that the latest version is available. A few months ago I encountered some problems with a solution that was deployed in Azure Regions from Europe, the USA and Asia. The client extended his business to China, where we realised that the Azure Storage version is not the same. Happily, in our case the problem was solved with some updates in code. Temporarily we have a branch of our solution for China.
These problems usually occur for regions that are not controlled by Microsoft directly, where a partner or government is the owner of the data centers. For regions controlled by Microsoft, you will usually find the latest version of the product within a short period of time.

Conclusion
Keep an eye at all times on which Azure products are available in each Azure Region and take into account that new features might not be available immediately in all regions. 

Monday, January 23, 2017

In-memory data protection (encryption)

In this post we'll talk about memory encryption and why we need to care about it.

Context
With the EU regulations, IT solutions need to secure data about citizens end to end (E2E). This means that we need to offer not only secure storage, secure connections between different nodes and encryption at payload level, but also a secure mechanism to store sensitive information in memory.

Why
You might say that this is not important if the servers where the data is processed are secured. Yes, in an ideal world. But there is no bug-free software.
This means that the M&S (Maintenance and Support) team that has direct access to that server might need to take a memory dump. The memory dump should be encrypted, shared only with a limited number of people, and so on. They could even share the memory dump with external 3rd parties that offer support for other systems or software that are used on that machine.
For this we need to ensure that data privacy is respected.

If the server is not configured properly, the system might automatically create a memory dump when an error occurs and upload it to different providers. On top of this, M&S teams usually prefer to configure the production system in such a way that a memory dump is created automatically when a crash is detected.

How to encrypt data in memory
There are different libraries that can help us achieve this goal; one of them comes with the .NET framework - ProtectedMemory. It gives us the possibility to encrypt an array of bytes that is stored in memory.
ProtectedMemory.Protect(toEncrypt, MemoryProtectionScope.SameProcess);
ProtectedMemory.Unprotect(toEncrypt, MemoryProtectionScope.SameProcess);
As we can see, the encryption is very easy to use. The same in-memory byte array is encrypted and decrypted in place. Because it works on a byte array, it allows us to protect any kind of data.

When working with encryption, we must not forget about the block size. Like other encryption mechanisms, this library uses a block size of 16 bytes. It means that if we encrypt a string or another type of data we need to ensure that the byte array length is a multiple of 16.
Unfortunately there is no out-of-the-box support for this.
Below you can also find an extension method for string that adds padding with a default character. Of course this is not a perfect solution; a padding standard like PKCS7 should be used. When the content is decrypted, the default character used for padding is removed.
public static class StringPaddingExtensions
{
    // ProtectedMemory requires the buffer length to be a multiple of 16 bytes.
    public static byte[] ToByteArrayWithPadding(this String str)
    {
        const int BlockingSize = 16;
        int byteLength = ((str.Length / BlockingSize) + 1) * BlockingSize;
        byte[] toEncrypt = new byte[byteLength];
        // The remaining bytes stay '\0' and act as padding.
        Encoding.ASCII.GetBytes(str).CopyTo(toEncrypt, 0);
        return toEncrypt;
    }

    // Removes the '\0' padding added by ToByteArrayWithPadding.
    public static string RemovePadding(this String str)
    {
        char paddingChar = '\0';
        int indexOfFirstPadding = str.IndexOf(paddingChar);
        if (indexOfFirstPadding < 0)
        {
            return str;
        }
        return str.Remove(indexOfFirstPadding);
    }
}
string contentToEncrypt = "Hello World!";

// Pad to a multiple of 16 bytes, then encrypt in place.
byte[] toEncrypt = contentToEncrypt.ToByteArrayWithPadding();
ProtectedMemory.Protect(toEncrypt, MemoryProtectionScope.SameProcess);

// Decrypt in place and strip the padding.
ProtectedMemory.Unprotect(toEncrypt, MemoryProtectionScope.SameProcess);
string decryptedContent = Encoding.ASCII.GetString(toEncrypt).RemovePadding();
I added a full sample on GitHub, with a unit test that shows how to encrypt/decrypt content: https://github.com/vunvulear/Stuff/blob/master/MemoryEncryption/MemoryEncryption.cs

It is not bulletproof
Even if the content is encrypted while it sits in memory, this doesn't mean that we are fully protected. There are moments during transit and processing when the content will be in clear text. At those moments, a memory dump could easily catch the data in clear.
ProtectedMemory encrypts data that is stored in memory, but it doesn't allow us to process it while encrypted. To be able to process it or do any kind of transformation, we will need it in clear text.
The diagram below shows when data is in an unprotected state.
As we can see, ProtectedMemory offers protection at only one step. For the other steps we need to ensure that the content is protected using other mechanisms. The steps where data is in memory in clear text, and where a memory dump could catch it, are marked in RED.
On top of this, we need to ensure that the GC clears the memory locations where data was stored in clear text.

Why there is always a risk
Unfortunately, most of the time the data we need to process comes as strings. Strings are one of the weakest data types from this perspective, especially in .NET and Java, where they are immutable and copies can linger in memory until collected. Using pinning you could control where the data lives in memory, but this is more common in C++ than in .NET. Writing .NET code in this way would be a challenge and testing it a nightmare.

Conclusion
Data protection is not only a complex discussion, it also cannot be bulletproof. Because of this we need to find a balance between the level of protection we need and how much we want to invest in it. In the end it is risk mitigation.

Thursday, January 12, 2017

Microservices and Team Setup - It is not only about code

Solutions based on a microservices architecture have become a commodity. Teams are not afraid anymore when they find out that the next project is based on a microservice architecture.
The transition from messaging-based architectures to microservice architectures was very useful for teams to understand the main concepts and how such a solution should look. 

Trigger
Working with different frameworks or tools specific to microservices will make the team curious and happy at the beginning. Most people are happy when they have time allocated to learn new stuff. But after this phase, I noticed there is usually a collapse. The moment we are ready to start writing 'production' code, people start to have hiccups, and each of them has problems when they need to deploy, run their code in containers and so on.

Cause
It is easy to forget or ignore that microservices are not only about the service/code level. They are also about the infrastructure level. There is a lot of scripting and automation that needs to be done to have a success story.
Even testing your own code is not so easy anymore. Your microservice will need to run in a sandbox, or you will also need the services around your own service to be able to do integration tests. 

Effect
Most of the team members end up fighting with deployments, where the service should run, how to set up environments, trying to write deployment scripts and so on. Their focus is no longer on writing the business logic that should run in the services themselves, but on the infrastructure part. This usually happens in a disorganized way, with each person looking into the same stuff on their own, and so on. 

Key People
In such teams, there is a high need for DevOps people who think about and prepare the environments and how deployments will work before the development team starts its job. 

Solution
From day 0 of the project, you should be aware that the configuration and infrastructure part of such a solution will be different from other types of applications. You'll need people allocated to this part, who should support the team when they have problems. 
The need for scripts that allow the development team to create the environments they need is high, and they can be a real time-saver.

Conclusion
In any project, you should never forget or neglect these kinds of things. 
Sharing is everything, but specialization is also important. Each member should focus on the things where they are the champion, and you need champions for each area.


Saturday, January 7, 2017

[IoT Home Project] Part 7 - Read/Write data to device twin

In this post we will discover how to:
  • Use a tool to read reported properties from the device twin and set desired properties
  • Read, from the desired properties, how often sensor data is reported to the backend
  • Write, using the reported properties, how often sensor data is reported by the device
Previous post: [IoT Home Project] Part 6 - Stream Analytics and Power BI
GitHub source code: https://github.com/vunvulear/IoTHomeProject 

What is device twin
It's a collection of properties that are specific to each individual device. A device twin contains 3 types of structures (where the last two are similar, but have different roles):
  • Tags: Device specific information that resides on Azure IoT Hub and never reaches the device 
  • Desired Properties: Properties set on Azure IoT Hub (by the backend) that are delivered to the device 
  • Reported Properties: Properties set by the device that are delivered to Azure IoT Hub (to the backend)
You might find it odd that there are 2 kinds of structures in the device twin – tags and desired/reported properties. From a usability point of view, this split is useful and handy. Tags are used when you want to store information related to a device on the backend without sending it to the device.
A good example is a customer ID or the device location. There is no need to send this data to the device if the device doesn't need it. In contrast, desired/reported properties are always sent to/received from the device. Properties like the device configuration, current status or different counters can be found inside them.
More about device twin
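
To make the split concrete, here is a minimal sketch of how the backend side can read and update a twin with the Node.js 'azure-iothub' package; the connection string, device id, tag and property values are placeholders of my own, not values used elsewhere in this project:
'use strict';

var iothub = require('azure-iothub');

// Placeholder connection string and device id.
var connectionString = '<iot-hub-connection-string>';
var deviceId = '<device-id>';

var registry = iothub.Registry.fromConnectionString(connectionString);

registry.getTwin(deviceId, function (err, twin) {
    if (err) {
        console.error('Could not read twin: ' + err.message);
        return;
    }

    // Reported properties are read-only from the backend side.
    console.log('Tags: ' + JSON.stringify(twin.tags));
    console.log('Reported: ' + JSON.stringify(twin.properties.reported));

    // Tags never reach the device; desired properties are delivered to it.
    var patch = {
        tags: { location: 'home' },
        properties: { desired: { sensorDataTimeSampleInSec: 30 } }
    };

    twin.update(patch, function (err) {
        if (err) {
            console.error('Could not update twin: ' + err.message);
        } else {
            console.log('Twin updated');
        }
    });
});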

Tool that can be used to read reported properties from device twin and set desired properties
We could create such a tool ourselves, but it doesn't make sense. The Azure IoT Hub team did a great job and created a desktop app that can be used to read and write device twins, see the list of devices, add new devices, send commands and much more. The tool can be downloaded from here.
There is also a version for Node.JS - https://github.com/Azure/iothub-explorer 

To be able to use this tool, you'll need to specify the IoT Hub connection string in the first tab. Once you have done this you can navigate to the 'Management' tab, where all the registered devices are listed. Select one of the devices and click on 'Twin Props'. 
The new window displays all the information from the device twin. Now, to set a new value for 'sensorDataTimeSampleInSec' we need to add it to the right panel and click on Send (as below).
Done, we have set a new desired property that can be received by the device.

Read how often sensor data are reported to backend from desired properties of device twin
The support from Azure IoT Hub is great from this perspective. The API is well documented and full of examples. On top of this, there are SDKs for the most important programming languages. 
But for now, it doesn't make sense to develop such an application ourselves. 

Until now we used HTTP as the communication protocol between the device and Azure IoT Hub. To be able to use device twins, we need to change the protocol to MQTT. This change will not affect the current code; the only thing that we need to change is the protocol in DeviceComunication.js.
var Protocol = require('azure-iot-device-mqtt').Mqtt; 
We are done, we now use MQTT.
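For reference, this is roughly how the protocol choice plugs into the client creation - a minimal sketch with a placeholder connection string; the project's DeviceComunication.js wires this up slightly differently:
var Client = require('azure-iot-device').Client;
var Protocol = require('azure-iot-device-mqtt').Mqtt;

// Placeholder device connection string.
var connectionString = '<device-connection-string>';

// The protocol is passed in when the client is created.
var client = Client.fromConnectionString(connectionString, Protocol);

client.open(function (err) {
    if (err) {
        console.error('Could not connect: ' + err.message);
    } else {
        console.log('Connected to Azure IoT Hub over MQTT');
    }
});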

The 'getTwin' method of Device.Client retrieves our device twin from the backend, including the desired properties.  
client.getTwin((err, twin) => {
    ...
    var sensorDataTimeSampleInSec = twin.properties.desired.sensorDataTimeSampleInSec;
    ...
});
In the previous step we added a property called 'sensorDataTimeSampleInSec' to the desired properties of the device twin. Once we get this value we need to update the Config object and the configuration file.
Updating the configuration object is simple. Once we update it, the simplest thing we can do is to write the whole configuration back to the config file. For this we will use a module called 'jsonfile', which works well when you need to read/write objects to files in JSON format.
// Update configuration 
Config.sensorDataTimeSampleInSec = sensorDataTimeSampleInSec;

// Update configuration file.
Jsonfile.spaces = 4;
Jsonfile.writeFileSync('./config.json', Config);
Once we have done this we need to stop the timer that was set in 'collectSensorData' and start a new one with the new time interval. To be able to do this we need to store the handle returned by the 'setInterval' method and call clearInterval at the moment when we want to stop it.
collectSensorInterval = setInterval((grovePiSensors, deviceCommunication) => {
...
// Device Twin updated
clearInterval(collectSensorInterval);
var grovePiSensors = new GrovePiSensors(Config.grovePiConfig);
collectSensorData(grovePiSensors, deviceCommunication);
The thing that I don't like is that the collectSensorInterval handle is a global var in app.js. For the moment I didn't have time to find a better way to persist this value and access it from the callback. For now it works.
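As a side note, the device SDK can also push desired-property changes to the device without re-reading the whole twin; a minimal sketch, assuming the same 'twin' object returned by getTwin:
// Fires whenever the backend changes a desired property,
// so there is no need to call getTwin again.
twin.on('properties.desired', function (delta) {
    if (delta.sensorDataTimeSampleInSec) {
        console.log('New sample interval: ' + delta.sensorDataTimeSampleInSec);
        // Update Config, rewrite config.json and restart the timer here,
        // the same way as above.
    }
});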

And we are done. We can now change the time interval from the backend and the device will automatically update itself and change the sensor data sample interval.
The code that was impacted by these changes is in app.js and DeviceCommunication.js.

Write how often sensor data are reported by device using reported properties
At this step we want to update the reported properties of the device twin. In this way we can know what the current configuration of our device is and how often the sensor data is reported. This action should be performed each time the value of 'sensorDataTimeSampleInSec' is updated.
To update the reported properties of the device twin we need to create an object with the values that we want to report. The good thing is that we need to specify only the deltas. The properties that we don't update will not be removed from the reported properties.
As we can see in the image below, each property has a value that specifies when it was last updated.

This can be done as follow:
function deviceTwinReportUpdated(sensorDataTimeSampleInSec, twin) {
  // Report only the delta; other reported properties are left untouched.
  var patch = {
    sensorDataTimeSampleInSec: sensorDataTimeSampleInSec
  };
  twin.properties.reported.update(patch, function (err) {
    if (err) {
      console.log('Error reporting properties: ' + err);
    } else {
      console.log('Device Twin report completed: ' + JSON.stringify(patch));
    }
  });
}
We just set the new value and call the update method of the reported properties, passing the patch that we want to apply.

Remember

  • Tags can be set, read and accessed only by the backend 
  • Reported Properties are set by the device and can be read by the backend 
  • Desired Properties are set by the backend and can be read by the device 
  • Use version and lastUpdated properties to detect updates when necessary

Next Step 
We will set up our Node.js application in such a way that it starts automatically each time we start our device.

Friday, January 6, 2017

Azure IoT Hub and message priority

Azure IoT is evolving. As I said last year, with each week that passes new features are added to Azure IoT Hub that consolidate the service and help us have a better connection with our devices.

I have often seen requirements specifying that there are alerts that need to be sent to/from the device with a higher priority than the rest of the messages. There is no perfect workaround and each use case might require a different approach. We will also discover that a new functionality offered by Azure IoT Hub covers one of the flows.
There are two main flows where priority messages may be required:
  • Device To Backend Communication (Device2Cloud - D2C)
  • Backend To Device Communication (Cloud2Device - C2D)
We will discuss each flow separately. At the end of the post we will map the workarounds and the current solutions onto the flows.

Message priority on Device To Backend Communication
The use case is not complex. We need to be able to send messages from the device to our backend with different priorities, for example when an alert is triggered. Even if Azure IoT Hub is fast, we need to be able to consume the high priority messages as fast as possible. 

1. Routing Rule
The newest and most powerful feature that is available now for this scenario is Routing Rules. We can define a routing rule that looks at the message header (application properties) and redirects the message to a specific Service Bus Topic, Service Bus Queue or Event Hub. 
In this way we can redirect messages with different priorities to different consumers that can handle them, as in the sketch below.
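For example, the device can mark an alert with an application property, and a routing rule with a query such as priority = "high" can redirect it to a dedicated endpoint. A minimal sketch of the device side (the 'priority' property name and value are my own convention, and 'client' is assumed to be an already opened azure-iot-device client):
var Message = require('azure-iot-device').Message;

// A normal telemetry payload...
var alert = new Message(JSON.stringify({ msgType: 'alert', temp: 75 }));

// ...marked with an application property that the routing rule can match.
alert.properties.add('priority', 'high');

client.sendEvent(alert, function (err) {
    if (err) {
        console.error('Send failed: ' + err.message);
    } else {
        console.log('Alert sent with the priority property set');
    }
});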

2. Stream Analytics Jobs
This classical solution involves an instance of Stream Analytics behind Azure IoT Hub that looks at the messages and redirects them to different outputs, based on their priority. 
It is similar to Routing Rules, but a little different, because we need to scale the Stream Analytics job based on load. Also, when we are using Stream Analytics we can look at the message content, in contrast with Routing Rules, which can route messages only based on the message header (properties).

From a performance perspective, it is pretty clear that it is much faster to look only at the message header and not inside the message body. 
The downside of both solutions is that in both cases the messages leave the device with the same priority. There is no way to specify the priority level at the transport level on the device. Even if from a technical perspective this is acceptable, because in the backend we are doing near-real-time processing and routing of messages, there are clients that might have clear requirements specifying that the priority level must be set at device level and that a message with higher priority needs to be consumed before a message with lower priority on the backend.

3. Device Twin 
For this kind of requirement the device twin can be a solution. If we have only 2 or 3 priority levels, then we can send the higher priority content inside the device twin (reported properties). This should work fine as long as the high priority messages are not frequent and the size of the message itself is not big.
In this scenario we are using the device twin as a secondary communication channel on which we send the messages with higher priority.

Until now we looked at solutions based on a single Azure IoT Hub. There are also other solutions that use other services or multiple instances of Azure IoT Hub. This means that the complexity of the system will increase and the running cost will be higher. Of course, everything comes with a price.  

4. Multiple instances of Azure IoT Hub
This solution involves having 2 instances of Azure IoT Hub: one for messages with a normal priority level and another one for higher priority. We now need to manage two instances of Azure IoT Hub, register devices to each of them, pay for both of them and so on.
Even if the solution is ugly, don't think that the cost will double from the IoT Hub perspective. In theory the number of messages with higher priority should be much lower than the number of normal messages, meaning that the second IoT Hub instance can be smaller. 
This workaround can become really ugly if you need multiple priority levels...

5. Secondary Channel
It involves having one or more channels that can be used to send messages with higher priority. Like in the previous case, where we had a secondary instance of IoT Hub, in this case we can have an Event Hub, a Service Bus, or anything that can receive messages. Even a Web App that exposes a REST API would work. 
Solutions like this force you to reinvent the wheel. Things like security, redundancy, authorization and authentication, and device registration need to be defined one more time, even though in Azure IoT Hub they come out of the box. 
Be aware: even if the solution might sound appealing, it is more complex than it looks, and in one way or another you'll need to rewrite features that already exist in IoT Hub.   

Message priority on Backend to Device communication 
In this case, we need to send messages with different priorities from the backend to the device. At this moment there is no built-in support for such a requirement.

1. Device Twin
The device twin can be used, as for Device to Backend communication, to push data with higher priority to the device (this time through desired properties). The good part is that the device will receive the message almost in real time, but the size and the quantity of higher priority messages are limited.

2. Multiple instances of Azure IoT Hub
This solution involves using multiple instances of Azure IoT Hub (at least 2). For each priority level we might have a different instance of IoT Hub. In this way we can deliver messages with higher priority faster than the rest of the messages.
The biggest downside of this solution is not only that we need to register the device in two different instances of IoT Hub, but also that at device level there will be two 'agents' that need to run and listen for new messages, as in the sketch below.
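On the device this usually translates into two clients listening in parallel; a minimal sketch, assuming the device is already registered in both hubs and we have both device connection strings:
var Client = require('azure-iot-device').Client;
var Protocol = require('azure-iot-device-mqtt').Mqtt;

// Placeholder connection strings, one per IoT Hub instance.
var normalClient = Client.fromConnectionString('<normal-priority-device-connection-string>', Protocol);
var priorityClient = Client.fromConnectionString('<high-priority-device-connection-string>', Protocol);

function listen(client, label) {
    client.open(function (err) {
        if (err) {
            console.error(label + ': connection failed - ' + err.message);
            return;
        }
        // Cloud to device messages arrive through the 'message' event.
        client.on('message', function (msg) {
            console.log(label + ' message: ' + msg.getData().toString());
        });
    });
}

listen(normalClient, 'normal');
listen(priorityClient, 'high-priority');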

3. Secondary Channel
For messages with higher priority we can use an external channel. Based on my past experience I highly recommend Azure Tables, which are extremely fast and can be used with success for this kind of use case. Each device can have its own table, where we can use the partition key to specify different priorities.
Yes, of course, this means more complexity at system level and more things to take care of from a security perspective, but this is life (smile). 
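A minimal sketch of how the backend could push a high priority message into such a table with the 'azure-storage' Node.js package (the table name, priority values and entity shape are my own convention):
var azure = require('azure-storage');

// Placeholder storage account connection string.
var tableService = azure.createTableService('<storage-connection-string>');
var entGen = azure.TableUtilities.entityGenerator;

// One table per device, as described above.
var tableName = 'device01messages';

tableService.createTableIfNotExists(tableName, function (err) {
    if (err) { throw err; }

    var entity = {
        PartitionKey: entGen.String('high'),              // priority level
        RowKey: entGen.String(new Date().toISOString()),  // unique per message
        payload: entGen.String(JSON.stringify({ command: 'reboot' }))
    };

    tableService.insertEntity(tableName, entity, function (err) {
        if (err) {
            console.error('Insert failed: ' + err.message);
        } else {
            console.log('High priority message stored');
        }
    });
});
The device would then poll only the partitions with the priorities it cares about.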

What do I prefer?
Let's assume that it is impossible to convince the client to change the requirements. 
In this case, for Device to Backend communication I would go with only two channels. All messages with normal priority or lower would go through Azure IoT Hub, where I would define routing rules. Critical messages I would put into the device twin. I would fight to reduce the number of messages that are marked as critical, challenging each one.

For Backend to Device communication I would use Azure Tables for cases where the number of higher priority messages is not negligible. When the number of high priority messages is low I would go with the device twin.

As you can see, I didn't look at commands. Even if Azure IoT Hub supports commands, they are sent on the same channel as the rest of the messages. It means we cannot map different priorities on top of them, but they can be used with success when we need to track different notifications related to a message (command).



Keep in mind that there is no perfect solution or service. Try to find the solution that best resolves your needs. 

Thursday, January 5, 2017

[IoT Home Project] Part 6 - Stream Analytics and Power BI

In this post we will discover how to:
  • Store all sensor data into blob storage
  • Calculate the average temperature and humidity for every 1 minute time-span
  • Store the average temperature and humidity information in Azure Tables
  • Send the average temperature and humidity information to Power BI
  • Create reports in Power BI to display information for the current day
  • Create a web application that displays Power BI reports

Store all sensor data into blob storage
Once we have sensor data in Azure IoT Hub, we can do anything with it. Let's create a Stream Analytics job that takes sensor data from Azure IoT Hub and pushes it to Azure Blob Storage. Blob Storage is a very cheap place to store data and is perfect for bulk data.
The first step when you create a Stream Analytics job is to specify the inputs and outputs. For now, we need the IoT Hub as input. The input can be created easily, especially when the Stream Analytics job and your instance of Azure IoT Hub are in the same subscription. I named my input of data from IoT Hub 'inputDeviceData'.
The output of the job will be Azure Blob Storage. For this we'll specify an output to the blob storage where we want to persist our content - 'outputSensorDataBlobStorage'. When you write to blobs, multiple files are created. Because of this it is required to specify a container where all the files will be created.
The last thing necessary is to specify the query.  
SELECT
    deviceId as deviceId,
    sensorInf.temp as temp,
    sensorInf.humidity as humidity,
    sensorInf.distance as distance,
    sensorInf.light as light,
    EventEnqueuedUtcTime as receivedTime
INTO
    outputSensorDataBlobStorage
FROM
    inputDeviceData TIMESTAMP BY EventEnqueuedUtcTime
WHERE
    msgType = 'sensorData'
As you can see, in the FROM clause we specify the input source and with INTO the output (blob storage). Even if at this moment we don't have multiple message types, I want only messages of type 'sensorData' to be stored. To do this we need to use the WHERE clause.
In the SELECT we specify the list of fields that we want to store in the output. Do you remember the message format that we send as JSON from the Raspberry Pi? This is how I know the name of each property in the input.
As you can see, there is a default property called 'EventEnqueuedUtcTime' that represents the time when the event arrived at Azure IoT Hub. I decided to ignore the device time and use this value as the reference.
Final Stream Analytics Query: https://github.com/vunvulear/IoTHomeProject/blob/master/stream-analytics-jobs/dump-sensor-data-to-blob-calculate-avg-dump-to-azuretable-and-powerbi.sql

Calculate the average temperature and humidity for every 1 minute time-span
To calculate the average temperature for a specific time interval, the simplest way is on top of Stream Analytics. Each job allows us to specify a time window over which we can apply different aggregation functions like average (AVG).
SELECT 
    deviceId as deviceId,
    System.TimeStamp as timeslot,
    AVG(sensorInf.temp) as avgtemp,
    AVG(sensorInf.humidity) as avghumidity,
    AVG(sensorInf.distance) as avgdistance,
    AVG(sensorInf.light) as avglight
INTO 
    outputSensorDataConsolidatedTableStorage
FROM
    inputDeviceData TIMESTAMP BY EventEnqueuedUtcTime
WHERE
    msgType = 'sensorData'
GROUP BY
    deviceId,
    TumblingWindow(second, 60)
As we can see above, I used the GROUP BY clause to specify the window frame, in our case 1 minute (a tumbling window of 60 seconds). This means that all values for each minute will be aggregated. For the sensor data I applied AVG.

Last but not least, we need to specify the output of our job, which in this case will be an Azure Table. Don't forget to do this before writing the query. On top of the table name you need to specify two additional fields: Partition Key and Row Key. These values represent the property names from the query (SELECT) that will be used to set the partition key and the row key.
These values are mandatory and important. The combination of these two keys forms the key of the entity that is stored in the Azure Table. This combination needs to be unique and is used to retrieve data from the table.
For cost optimization you can increase the batch size to 10 or 50. But in our case the cost impact is low, and batching would force us to see data in the Azure Table only after 10 or 50 minutes - we don't want this.
Both queries can be inside the same job.
This is how the table should look.
Final Stream Analytics Query: https://github.com/vunvulear/IoTHomeProject/blob/master/stream-analytics-jobs/dump-sensor-data-to-blob-calculate-avg-dump-to-azuretable-and-powerbi.sql

Send the average temperature and humidity information to Power BI
For Power BI, the trick is not sending data - there is already a connector built into Stream Analytics. The trick is creating and using Power BI itself. It is not complicated at all, but it might be something new. For me, it was the first time I used Power BI as a service, and I didn't need more than 2 hours to discover all the things I needed. 
First, you need to go to https://powerbi.microsoft.com/en-us/ and sign up for a new account. Once you do this, you can go back to Stream Analytics and create a new output for Power BI. The dataset name and table name will be used later to create the reports. 

The query that sends data to the Azure Table and the one used to send data to Power BI look the same. The only difference is the output. We can calculate the average data only once and send it to both outputs. This is done using a WITH statement, which defines a temporary source - if we can call it that. The final query should look like this:
WITH avgdata AS (
SELECT 
    deviceId as deviceId,
    System.TimeStamp as timeslot,
    AVG(sensorInf.temp) as avgtemp,
    AVG(sensorInf.humidity) as avghumidity,
    AVG(sensorInf.distance) as avgdistance,
    AVG(sensorInf.light) as avglight
FROM
    inputDeviceData TIMESTAMP BY EventEnqueuedUtcTime
WHERE
    msgType = 'sensorData'
GROUP BY
    deviceId,
    TumblingWindow(second, 60)
)

SELECT 
    *
INTO 
    outputSensorDataConsolidatedTableStorage
FROM
    avgdata

SELECT 
    *
INTO 
    outputPowerBIDataConsolidated
FROM
    avgdata

Final Stream Analytics Query: https://github.com/vunvulear/IoTHomeProject/blob/master/stream-analytics-jobs/dump-sensor-data-to-blob-calculate-avg-dump-to-azuretable-and-powerbi.sql

Now that we have defined all the inputs and outputs for Stream Analytics, let's take a look at how it should look (see below).

Create reports in Power BI to display information for the current day
Even if this might sound complicated, it is extremely simple. If you navigate to the Power BI dashboard for your app, you'll find under Datasets a dataset called 'iot' with a table called 'sensorConsolidated' (see the definition of the Stream Analytics output in the image below). This will be used to generate our reports. 
To display a chart with the average temperature data you need to:
  • Create a new report in the 'Reports' section
  • Select 'Area Chart' and drag a new chart onto the canvas
  • Select 'avgtemp' and 'timeslot' from the fields section
We need the 'timeslot' field because it contains the minute for which the average temperature was calculated.

You can create similar reports for the other fields as well. Feel free to play around with different types of reports. Also, if you want, you can drag fields to the 'Values' section, which allows us to calculate the min/max/avg for a specific interval. I did this for each field, as you will see in the image in the next section, where I created a small dashboard. 
I don't want to go too deep into Power BI, because it's a complex subject. The tool is extremely powerful and can be used to create powerful reports in a few seconds.

Create a web application that displays Power BI reports
The cool thing about Power BI is that it allows us to publish the report to a web page. Basically, from the report we can get an HTML snippet that needs to be added to the page where we want to show our reports. The HTML is an iframe that is populated by Power BI. Pretty cool.
From the report, if you go to 'File > Publish To Web' you will end up in the right place, where the iframe is generated and ready to copy/paste.
To publish the report we'll create a Web App (the Free tier is enough) with an empty page where we'll paste the iframe HTML from Power BI. You'll see on GitHub that I created an empty ASP.NET Core application that I deployed in the Web App. This was only for fun; at this moment the index.html page that is under 'wwwroot' is more than enough.
Don't worry if you see that the report is not updated in your web app every second. Inside Power BI there is a cache for reports that refreshes every hour. 
The web app is live for now - http://vunvulear-iot-portal.azurewebsites.net. I don't know for how long I will keep the web app, but it is working now (smile). A screenshot can be found below.
The source code for this web app can be found on GitHub: https://github.com/vunvulear/IoTHomeProject/tree/master/webapp-iot-portal
HINT: To be able to set index.html as the default page in an ASP.NET Core application, you need to navigate to Startup.cs and specify in the Configure method that default files and static content should be used. Static content is expected to be found under the wwwroot folder. 
public void Configure(IApplicationBuilder app)
{
    app.UseDefaultFiles();
    app.UseStaticFiles();
}
At the end of the post, let's take a look at our web report dashboard.

Conclusion
Every time, I'm impressed by how easily we can create and deliver functionality using out-of-the-box services. Stream Analytics and Power BI change the way we need to write our applications.

Next post
[IoT Home Project] Part 7 - Read/Write data to device twin

Next Step 
In the next post we will start to use the device twin to control how often sensor data is sent to Azure IoT Hub.