4th of July Hangover: Fireworks Trash

Fireworks are illegal in Massachusetts

But that doesn't stop anyone from setting them off. And during a time of year when Boston is festive with independence spirit, sending our men and women in blue to chase down some hooligans launching pyrotechnics doesn't seem like a good use of resources. Yet, according to local news, Boston residents logged 372 calls to Boston Police about illegal fireworks. There was at least one fireworks injury in Mattapan on Saturday, and in Quincy, someone reportedly lost several fingers.

And ironically, advertising fireworks (or fireworks stores) is perfectly legal in Massachusetts. Just don't bring them here, or you'll face a $100 fine.

Never mind the trash that's left over from all of that...fun.

Courtesy of CITY OF BOSTON/311 APP

I really liked the Boston Globe article on the fireworks trash problem, but as an analytics consultant, I was left with more questions than answers. Where is this a problem? Is it geographical, i.e., concentrated in certain Boston neighborhoods? How frequent are the reported issues? Is this fake news? Is this just two people upset? Or do we have a larger issue here?

To answer all of these questions, we need more data. And like any data analysis, what seems like a simple set of questions takes a lot longer to analyze and answer than originally planned. This post details the analytical approach used, the tools needed, and the conclusions drawn from the data.

(If you want to follow along, head over to the Boston 311 website and download the latest data yourself.)

Assemble the Data Set

At this point, I'm going to assume you've downloaded the 311 data. A quick glance through the data dictionary shows that the 311 requests' text data isn't available for download, and the City of Boston doesn't have a specific qualifier or field value for "Fireworks Trash." We need to enrich the data to get more information. Luckily, Boston has exposed the 311 data via a web portal, and we can see case-specific details there. Here's one example service request, the one that submitted the photo above. Time to explore web scraping!
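
For readers who'd rather script this part, here's a minimal sketch in Python using the requests and BeautifulSoup libraries. (This is an assumption on my part about how you might approach it; it's not how the extract in this post was built, and the portal's markup can change.)

import requests
from bs4 import BeautifulSoup

# Fetch the 311 portal's search results for "fireworks" and print the page title.
# A hypothetical alternative to the IMPORT.IO workflow described below.
url = 'https://311.boston.gov/reports?utf8=%E2%9C%93&q=fireworks'
response = requests.get(url, timeout=30)
response.raise_for_status()
soup = BeautifulSoup(response.text, 'html.parser')
print(soup.title.string if soup.title else 'No title found')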

Introduction to Import.IO

One of my favorite tools for gathering online data is a website called IMPORT.IO, a visual web-scraping tool for extracting data from nearly any website. The neat thing about this tool is the easy-to-use visual interface, which means you don't have to write any code (unless you want to!). There are also advanced features, like scheduled extracts and extracting data behind a login page, for those who want to pay for the premium product. For the purposes of this post, we will be using the Free plan, which limits you to real-time jobs, 500 queries, and 5 extract "runs" of data. Things will get a little advanced, as we'll use some regex to extract the Case Enquiry ID from each service request.

IMPORT.IO homescreen. Accessed 7/7/2017.

For those interested in a comprehensive user's guide to IMPORT.IO, please follow this link. For the sake of this post, I will assume the reader has some experience.

Set Up Extractor in IMPORT.IO

Once signed into the website, go ahead and create a new extractor. Enter the URL from the Boston 311 web portal with the desired search query:

https://311.boston.gov/reports?utf8=%E2%9C%93&q=fireworks

Let IMPORT.IO process that page, and you will notice that it starts to construct a data set on the "Data" tab. Clicking on the "Edit" tab shows you where the web scraper is gathering the various elements of the data. To complete the extract, I added a column after "META" and applied some regex to pull the #101002151226 case ID from the webpage into its own column. (We need this stand-alone column for Tableau later. You can also do this split in Excel using the MID() and FIND() functions.)

\S*#(?:\[[^\]]+\]|\S+)
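
If you'd like to sanity-check that pattern before trusting it in IMPORT.IO, here's a quick Python sketch (the sample text below is made up for illustration):

import re

# The same pattern applied in the IMPORT.IO column.
PATTERN = re.compile(r'\S*#(?:\[[^\]]+\]|\S+)')

sample = 'Trash pick-up request Case: #101002151226'  # made-up sample text
match = PATTERN.search(sample)
if match:
    print(match.group(0))              # '#101002151226'
    print(match.group(0).lstrip('#'))  # '101002151226' -- the stand-alone case ID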

Lastly, I also created a new column to capture the existence of a picture on the service request. Make sure to enable "Capture this link's URL" in the dropdown menu!

Just in case you don't get a prompt, you'll also need to add all of the sub-pages to the extractor. I used the list of URLs below to capture the 20 pages of results from my search (a short script after the list shows one way to generate them):

https://311.boston.gov/?page=1&q=fireworks&utf8=%E2%9C%93

https://311.boston.gov/?page=2&q=fireworks&utf8=%E2%9C%93

https://311.boston.gov/?page=3&q=fireworks&utf8=%E2%9C%93

...

https://311.boston.gov/?page=20&q=fireworks&utf8=%E2%9C%93
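
Rather than typing those out by hand, a few lines of Python can generate the full list (the query string is copied straight from the search URL above):

# Print the 20 paginated search URLs to paste into the extractor.
base = 'https://311.boston.gov/?page={}&q=fireworks&utf8=%E2%9C%93'
for page in range(1, 21):
    print(base.format(page))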

After clicking "save," go ahead and run the extractor and download the data to Excel or CSV. If you did everything correctly, the results should look like this:

Data downloaded from IMPORT.IO

Analysis in Tableau

For those of you who don't have a copy of Tableau, you can always use the free Tableau Public software.

Having downloaded the 311.csv file from the City of Boston website, and having gathered the IMPORT.IO web-scrape results, we can finally join the data in Tableau for visualization. Go ahead and add both files as data sources in Tableau. Join the files together on "Case ID," a 12-digit unique ID for each service request. Use an INNER JOIN to isolate only the service requests common to our web-scrape data and the data downloaded from Boston 311.
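
If you'd like to verify the join outside of Tableau, here's a rough pandas equivalent. (The file names and column names are assumptions on my part; check them against your actual downloads.)

import pandas as pd

# Hypothetical file and column names -- adjust to match your downloads.
boston = pd.read_csv('311.csv')                # City of Boston 311 export
scraped = pd.read_csv('importio_results.csv')  # IMPORT.IO extract

# The INNER JOIN on the 12-digit case ID keeps only the common service requests.
joined = boston.merge(scraped, left_on='CASE_ENQUIRY_ID',
                      right_on='case_id', how='inner')
print(len(joined), 'matching service requests')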

Drag the Latitude/Longitude fields into a new sheet to create a map. Filter on Year = 2016 and 2017. Use Count Distinct of Case ID as the measure. Add whatever fields you'd like to the tooltip. Voila! A map of fireworks trash!
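
The same filter and distinct count can be sanity-checked in pandas, continuing from the sketch above (again, "open_dt" is an assumed column name from the 311 export):

# Filter to 2016-2017 and count distinct case IDs, mirroring the Tableau view.
joined['open_dt'] = pd.to_datetime(joined['open_dt'])
recent = joined[joined['open_dt'].dt.year.isin([2016, 2017])]
print(recent['CASE_ENQUIRY_ID'].nunique(), 'distinct cases in 2016-2017')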

Analysis & Conclusions

Based on the mapping exercise and the demographics of Boston, we can conclude that most incidents appear to be happening in the Mattapan and Dorchester neighborhoods. I can also see that several service requests in the web-scrape data are not showing up in the 311.csv file (for example, #101002151739), even though the Import.IO data shows well over 150 requests. I wonder whether the City of Boston filters the downloadable 311 data to only requests that are open or complete, excluding those in progress? Hard to be sure.

Either way, the analysis isn't returning quite enough data to draw firm conclusions. Sounds like a "Part 2" will be needed to investigate why! 😊

