Arrived! IoT data in Google Cloud

Welcome to Google Cloud! ☁️

I am pleased to report that, after an initial failed attempt and one expired free trial, I am finally publishing IoT data to Google Cloud. For those of you who read, referenced, or followed along with my IoT series (Part 1 and Part 2): Part 1 illustrates some of the challenges of getting data into the cloud and my initial success with Microsoft Azure, while Part 2 covers Azure in more detail as I spin up a “traditional” SQL Server database in the cloud for storing data. After reading this article, you’ll know how to get IoT data into Google Cloud too!

Let’s jump immediately to the good stuff. Here it is: my IoT data, saved in BigQuery and visualized in Data Studio. You can see the report full-screen here.

Setting up the Cloud Components

For this project, we will leverage Pub/Sub, BigQuery, and Dataflow. If you are using the Particle.io platform, the integration with Google Cloud is very easy and well documented. I’d recommend starting with their recommended setup for the webhook and Pub/Sub credentials. If you are new to Google Cloud and have not yet created an account, please do so here.

Particle.io’s architecture — They provide an out-of-the-box integration (via webhook) with Google Cloud

At this point, you should have set up a Google Cloud project, created a Pub/Sub topic, and provisioned the Particle.io service account with access to the topic. Again, Particle.io has a great write-up on this process, so make sure you have the webhook working in “test” mode before proceeding!

After you set up the integration in the Particle Console, make sure the device firmware on the IoT device itself issues a publish command. Here is an example from my code. Notice I am publishing a JSON object made up of several variables: location (lat/lon, string), device ID, humidity reading, temperature reading, soil moisture sensor value, battery voltage, and a timestamp of the sensor reading.

// Build the JSON payload from the latest sensor readings
String parsed_data = String::format(
    "{ \"location\": \"%s\", \"device\": \"%s\", \"humidity\": %d, \"tempF\": %d, \"soil\": %d, \"voltage\": %f, \"sensor_time\": \"%s\" }",
    location.c_str(), myID.c_str(), humidity, temperatureF, cap_sensor_val, voltage, sensor_time.c_str());
// Publish to the event name that triggers the Google Cloud webhook
Particle.publish("webhook_name_here", parsed_data, 60, PRIVATE);
delay(5000); // wait 5 seconds between readings

The data payload for each sensor reading appears in the console like the following:

Example JSON payload from my IoT sensor
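Based on the format string in the firmware above, each payload looks roughly like the following (the values are made up for illustration; note that %f prints the voltage with six decimal places):

{ "location": "42.3601,-71.0589", "device": "e00fce68123456789abcdef0", "humidity": 52, "tempF": 71, "soil": 1840, "voltage": 3.920000, "sensor_time": "06/15/2019 14:32" }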

After the firmware is loaded on the device and the Google Cloud integration is enabled, you should see the successful publish and webhook results in the Particle Console. Here is a screenshot of one of my JSON payloads with the corresponding webhook and response lines in the Events pane.

Webhook results (redacted) in the Particle Console Events pane

At this point, you have your device publishing messages to a Google Pub/Sub topic, but since we don’t have anything else hooked up, the data isn’t going anywhere yet. Please make sure it is working correctly and the results are being displayed in the Particle Console before proceeding further in this tutorial.

If I can be of any help getting to this point, feel free to contact me! 😊

Setting up the rest of the data flow

Overview of the components we’ll be using in this tutorial.

After data is flowing to the Pub/Sub topic, we’ll focus on starting a Dataflow job to move the JSON messages into a BigQuery table. Illustrated above is the architecture we are building in this tutorial. Eventually, we want to visualize the data in Google’s Data Studio, but first you have to get the data landing in BigQuery. To start, we must specify the destination schema and table in BigQuery. Follow Google’s guide for creating a new dataset and schema in BigQuery, then create a new table. Here is a screenshot of the table I created:

The BigQuery table created for the incoming sensor data
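If you prefer to define the table in SQL rather than the console UI, here is a minimal DDL sketch. It assumes the project, dataset, and table names used in my queries later in this article (substitute your own), with column types inferred from the firmware’s format string:

CREATE TABLE `particle-plant-monitor.particle_data.particle_iot_data_raw` (
  location STRING,
  device STRING,
  humidity INT64,
  tempF INT64,
  soil INT64,
  voltage FLOAT64,
  sensor_time STRING -- kept as a string; parsed into a DATETIME in the view later on
);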

After creating the table, you can return to Dataflow and set up the final piece of the data pipeline: taking the messages received from Pub/Sub and pushing them into the BigQuery table we created earlier.

  1. Open Dataflow and choose “Create New Job from Template.”

  2. In the dropdown, choose the “Cloud Pub/Sub to BigQuery” template.

  3. Copy and paste the following items into the dialog boxes:

    1. The fully qualified Pub/Sub topic name used as input

    2. The fully qualified BigQuery table name used for output (the table we created above)

    3. A temporary bucket in Google Cloud Storage (We didn’t talk about this, but GCS is so simple, I’ll let you figure it out. If you get stuck, start here.)

  4. Expand “Advanced Options” and add these parameters to save costs:

    1. Max Workers = 1

    2. Machine Type = n1-standard-1

Ready? Ok, click “Run Job” 🤞

If it works, you’ll see the Dataflow job start up and display a monitoring screen for the pipeline. At this point, data should be flowing between Pub/Sub and BigQuery. The quickest way to test this is to confirm a payload was sent via the Particle Console, then query the BigQuery table to see if the record exists. If not, check the error logs in Dataflow.
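To check from the BigQuery side, a quick query like this (using my table name; substitute yours) confirms whether records are landing:

-- Grab a handful of rows to confirm the pipeline is writing
SELECT *
FROM `particle-plant-monitor.particle_data.particle_iot_data_raw`
LIMIT 10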

In my own testing, I accumulated 161 message failures because I was having trouble aligning the data types for the date-time values published from the sensor readings. If you use my code, you should be fine.

Visualizing the Data

Ok, now that we have the data landing in a table, let’s work on visualizing it. I’d recommend creating a view on top of the BigQuery table so we can use the cleaned-up data in Data Studio later on. Here is some code to get you started:

SELECT
  device,
  PARSE_DATETIME("%m/%d/%Y %H:%M", sensor_time) AS sensor_datetime,
  location,
  voltage,
  soil,
  tempF,
  humidity
FROM `particle-plant-monitor.particle_data.particle_iot_data_raw`
WHERE sensor_time LIKE '%2019%'
  AND humidity > 0

A few notes on the SQL code above:

  1. I filter on values LIKE ‘%2019%’ because of some in-development formatting I was doing with the dates. You probably don’t need that in your query; I simply use it to filter out the bad data in my data set, as I haven’t deleted the old records.

  2. I also filter on humidity > 0, as a fair number of payloads arrive with humidity values of -4 or -3. Given I am using the DHT22 sensor, it’s not a data refresh issue. I think it’s an issue with the new Mesh hardware and the library I am using, but I can’t find the old community thread referencing the issue. More troubleshooting for another time!
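If you want that SELECT saved as an actual view for Data Studio to read from, wrap it in a CREATE VIEW statement. The view name below is just my suggestion; use whatever fits your dataset:

CREATE OR REPLACE VIEW `particle-plant-monitor.particle_data.particle_iot_data_vw` AS
SELECT
  device,
  PARSE_DATETIME("%m/%d/%Y %H:%M", sensor_time) AS sensor_datetime,
  location,
  voltage,
  soil,
  tempF,
  humidity
FROM `particle-plant-monitor.particle_data.particle_iot_data_raw`
WHERE sensor_time LIKE '%2019%'
  AND humidity > 0;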

Run the SQL above in the BigQuery console to see the cleaned-up data in the view.

Once you have the output formatted as you’d like in BigQuery, click the “Explore in Data Studio” button to create a data source in Data Studio and build a report from it. Google will probably ask permission to link the BigQuery source with Data Studio; click Yes.

Once in Data Studio, drag the time dimension into the rows and the desired sensor value into the values. Below, I am charting the average soil moisture reading over time for two different sensors on two different plants. Since I am writing this right after spinning up the data collection, the line charts don’t look very interesting because they only show a few data points. Give it a few warm days (to let your plant take up the water! More here.) and things will start to look interesting! From there you can use the functionality of Data Studio to your heart’s content to create whatever visuals you’d like.
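If you’d like to sanity-check what Data Studio is plotting, the equivalent aggregation can be run directly in BigQuery (again assuming my suggested view name from above):

-- Average soil moisture per device per day, mirroring the Data Studio chart
SELECT
  device,
  DATE(sensor_datetime) AS reading_date,
  AVG(soil) AS avg_soil_moisture
FROM `particle-plant-monitor.particle_data.particle_iot_data_vw`
GROUP BY device, reading_date
ORDER BY reading_date, device;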

Conclusion

Thank you for coming along for the ride on Google Cloud. I hope this article was helpful for your IoT or other streaming data needs! If you are stuck or simply want to learn more, feel free to get in contact with me directly.


Deep Dive

Using Google BigQuery ML to Predict when to Water your Houseplant

Testing IoT Mesh Network Latency