First, you'll need to load a few of the libraries you'll use throughout the lesson.
from arcgis.gis import GIS
import pandas as pd
from arcgis.features import GeoAccessor, GeoSeriesAccessor
To access content and use the mapping functions, you must be signed in to an ArcGIS organization. In the cell below, replace the username and password with your account credentials. If necessary, you can also replace the domain.
agol_gis = GIS("", username="your_username", password="your_password")
The Montgomery County permit data has been published to ArcGIS Online. To access the content, you'll search for the Commercial Permits since 2010 item. To get more specific results, you can specify the owner's name. To search for content from the Living Atlas, or content shared by other users on ArcGIS Online, set the parameter outside_org=True.
data = agol_gis.content.search(
    'title: Commercial Permits since 2010 owner: rpeake_LearnGIS',
    'Feature layer',
    outside_org=True)
The first result is the Commercial Permits since 2010 item you want. To access it, create a variable that stores the first item in the search results.
permits = data[0]
The Commercial Permits item is a Feature Layer Collection, so its layers property returns a list of FeatureLayer objects. The permit layer is the first layer in this item. To visualize this layer on a map of Montgomery County, Maryland, you'll create permit_map as a map widget and add the permit layer to it.
permit_layer = permits.layers[0]
You can add different layer objects to the map, including FeatureLayer, FeatureCollection, ImageryLayer, and MapImageLayer, by calling the add_layer() method.
Now that you've added the permit data, you'll explore its contents. Geographic data doesn't only contain information about location; it can also include other attributes not seen on a map.
To explore these attributes, you'll convert the layer into a spatial pandas dataframe.
# SpatialDataFrame is deprecated in newer releases; the .spatial accessor registered by the GeoAccessor import does the same job
sdf = pd.DataFrame.spatial.from_layer(permit_layer)
Because this dataset contains several years' worth of data, it is fairly large. Instead of viewing the entire dataset at once, you can choose a small segment as an example. To see just the records at the end of the dataset, use the tail() method.
The permit data contains a long list of attributes that include information about the type, status, and value of the permits issued. Some attributes have self-explanatory names, while others have names that can be difficult to understand without context. To see which attributes may be of interest to your visualization, you'll list them using the columns property.
For more information about the dataset, you can use the describe method to get a set of summary statistics. Appending the T property transposes the result so that the column titles are written as rows.
While you've listed the attribute names already, it is also useful to know the type of each attribute. Attribute types determine how data can be analyzed and mapped. You'll query the types of attributes using the dtypes property.
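The exploration calls described above can be tried on a tiny stand-in table before running them against the full permit layer. This is a minimal sketch: the `toy` dataframe and all of its values are invented for illustration, and only the column names echo the permit data.

```python
import pandas as pd

# Hypothetical stand-in for the permit dataframe (values invented for illustration)
toy = pd.DataFrame({
    "Status": ["Issued", "Finaled", "Open", "Issued", "Finaled", "Issued"],
    "Use_Code": ["BUSINESS", "BUSINESS", "ASSEMBLY", "BUSINESS", "ASSEMBLY", "BUSINESS"],
    "Declared_V": [1000.0, 2500.0, 400.0, 1200.0, 900.0, 3000.0],
})

last_rows = toy.tail(2)           # the last two records
column_names = list(toy.columns)  # attribute names
summary = toy.describe().T        # summary statistics, one row per numeric column
column_types = toy.dtypes         # data type of each column

print(last_rows)
print(column_names)
print(summary)
print(column_types)
```

By default, describe() summarizes only the numeric columns, which is why the text columns do not appear in the transposed summary.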
To get a better idea of the data in each attribute, you can call the unique method to list all the unique values in a column. One of the attributes you're interested in is Work_Type, which shows whether a construction project is new or what improvements are being made to an existing structure.
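As a rough illustration of what unique returns, the sketch below uses a hypothetical series named `work_type` with invented values; in the lesson the equivalent call would be made on the Work_Type column of the permit dataframe.

```python
import pandas as pd

# Hypothetical work types (invented values, not taken from the permit data)
work_type = pd.Series(
    ["CONSTRUCT", "ALTER", "ALTER", "ADD", "CONSTRUCT"], name="Work_Type"
)

distinct = work_type.unique()     # distinct values, in order of first appearance
n_distinct = work_type.nunique()  # how many distinct values there are

print(distinct)
print(n_distinct)
```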
As you saw above, the Status attribute indicates level of completion. There are four permit statuses: Finaled, Issued, Open, and Stop Work. To see how many projects fall under each category of completion, you can use the groupby() method followed by size(). Together, they count the records for each attribute value.
permits_by_status = sdf.groupby(sdf['Status']).size()
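To make the counting step concrete, here is a small sketch on invented data. The `toy` dataframe is hypothetical; it only shows that groupby followed by size returns one count per distinct value, with the values sorted alphabetically in the index.

```python
import pandas as pd

# Hypothetical statuses (invented for illustration)
toy = pd.DataFrame(
    {"Status": ["Issued", "Finaled", "Open", "Issued", "Finaled", "Issued"]}
)

# One row per distinct status, holding the number of records with that status
counts = toy.groupby("Status").size()
print(counts)
```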
Now that you've calculated how many projects are in each stage, you'll visualize them in a pie chart. Many common visualizations can be done with either matplotlib or seaborn. You'll import and use matplotlib below.
%matplotlib inline
import matplotlib.pyplot as plt
permits_by_status.plot(kind='pie', legend=False, label='Permits by Status');
The pie chart you created shows how many permits are listed under each of the four statuses as a percent of the total number of permits. Most of the permits are either Issued or Finaled. Finaled permits are issued permits that have also had the requisite inspections performed.
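The percentages that a pie chart displays can also be computed directly from the counts. The sketch below assumes a hypothetical `counts` series with invented numbers (they are not the real permit totals) and divides each count by the overall total.

```python
import pandas as pd

# Invented counts per status, for illustration only
counts = pd.Series({"Finaled": 40, "Issued": 50, "Open": 8, "Stop Work": 2})

# Share of each status as a percentage of all permits
percent = (counts / counts.sum() * 100).round(1)
print(percent)
```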
It's helpful to visualize the spatial distribution of permit attributes on a map. You'll symbolize the map so that each permit's symbol represents its status using the argument renderer_type='u'.
permits_by_status_map = agol_gis.map('Montgomery County, Maryland')
sdf.spatial.plot(kind='map', map_widget=permits_by_status_map,
                 renderer_type='u',  # specify the unique value renderer using its notation 'u'
                 col='Status')  # column to get unique values from
It is also important to understand what types of permits have been issued to see what structures are being built where. As you did with permit status, you'll count the permits issued by type.
permits_by_type = sdf.groupby(['Use_Code']).size()
The series is not sorted in any useful way. To sort permit types from highest to lowest, you can use the sort_values method.
permits_by_type.sort_values(ascending=False, inplace=True)
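The count-then-sort pattern can be sketched on invented data as follows. The `toy` dataframe and its single-letter codes are hypothetical. The sketch also shows value_counts, a pandas shortcut that counts and sorts in one call, as an alternative to the two-step approach used in the lesson.

```python
import pandas as pd

# Hypothetical use codes (invented for illustration)
toy = pd.DataFrame({"Use_Code": ["B", "A", "B", "C", "B", "A"]})

# Count per use code, then sort from highest to lowest (returns a new series)
by_type = toy.groupby("Use_Code").size().sort_values(ascending=False)

# value_counts counts and sorts in descending order in a single step
shortcut = toy["Use_Code"].value_counts()

print(by_type)
print(shortcut)
```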
The most common use code, Business Buildings, has almost twice as many permits as the second highest, Multi-family Dwelling. The top four use codes together account for the majority of all permits, so you'll focus on these use codes in your visualizations.
Before visualizing or analyzing your data, it is advantageous to clean it. This process involves removing attributes you don't need, renaming fields with unclear names, and filtering the dataset to only show permits with the four most common use codes. These changes won't permanently affect the original dataset, but they will make the data easier to work with and understand.
First, you'll remove the 'Declared_V', 'Building_A', and 'Applicatio' fields using the drop method. These fields describe aspects of the data that aren't important for your analysis.
sdf.drop(['Declared_V', 'Building_A', 'Applicatio'], axis=1, inplace=True)
To ensure that the intended attributes have been removed, you'll list the columns to check.
The fields are no longer listed. Next, you'll rename some of the attribute fields with shortened or unclear names so that their names are more descriptive. To use the rename method, pass a dictionary that maps each current attribute name to the desired name. Then, check the columns property again to ensure they've been renamed.
sdf.rename(columns={"Descriptio": "Description", "BldgAreaNu": "Building_Area", "DeclValNu": "Declared_Value"}, inplace=True)
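The cleaning steps can also be written without inplace=True, so that the original dataframe stays untouched and a cleaned copy is returned. This is a sketch on a hypothetical one-row `toy` dataframe with invented values; only the field names are borrowed from the lesson.

```python
import pandas as pd

# Hypothetical one-row table reusing some of the lesson's field names
toy = pd.DataFrame(
    {"Descriptio": ["New office"], "BldgAreaNu": [1200.0], "Applicatio": ["A-1"]}
)

# drop and rename each return a new dataframe, so the calls can be chained
cleaned = (
    toy.drop(columns=["Applicatio"])
       .rename(columns={"Descriptio": "Description", "BldgAreaNu": "Building_Area"})
)

print(list(cleaned.columns))
print(list(toy.columns))  # the original is unchanged
```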
Next, you'll filter the permits to reduce the number of records in your visualization. As you saw previously, four types of permits make up over half the total number issued. Focusing on just these four types will reduce the amount of data to analyze without ignoring the most important types of development. To remove the other use codes, you'll create a filter.
First, use the head method with the argument 4 to list the top four values in the permits_by_type variable you created earlier.
From now on, you only want to use the filtered permit records, so you'll create the variable filtered_permits and add only the top four types to it.
filtered_permits = list(permits_by_type.head(4).index)
To visualize the top four permit types on a map, you'll apply the filter to the dataframe to keep only the records whose use codes are listed in the filter. Double-check that the filter has been applied by comparing the shape of the original and filtered dataframes.
filtered_df = sdf.loc[sdf['Use_Code'].isin(filtered_permits)]
sdf.shape, filtered_df.shape
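The filtering logic can be checked on a small invented table. In this sketch the `toy` dataframe and its codes are hypothetical, and the top two codes are kept instead of the top four to keep the example short.

```python
import pandas as pd

# Hypothetical use codes: B appears 4 times, A 3 times, C twice, D once
toy = pd.DataFrame(
    {"Use_Code": ["B", "A", "B", "C", "B", "A", "D", "B", "A", "C"]}
)

# Names of the two most common codes
top_two = list(toy["Use_Code"].value_counts().head(2).index)

# Keep only the rows whose code is in that list
filtered = toy.loc[toy["Use_Code"].isin(top_two)]

print(top_two)
print(toy.shape, filtered.shape)
```

Comparing the two shapes shows how many rows the filter removed while the number of columns stays the same.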
The dataset is filtered. Instead of more than 11,000 permits, the filtered dataframe has about 7,500.
To visualize the data in the filtered dataset, you'll add it to a map.
filtered_map = agol_gis.map('Montgomery County, Maryland')
As before, you'll plot the types of permits as unique values.
filtered_df.spatial.plot(kind='map', map_widget=filtered_map,
                         renderer_type='u', col='Use_Code')  # unique values of the use code, as before
Your data shows permits, but what do these permits say about when and where growth is happening in the county? Your data also contains temporal attribute fields, such as Added_Date, which indicates when a permit was first added to the system. The field has several values that break down the data by year, month, and even hour.
You want to plot the data in the Added_Date field by year, month, and day of the week to see the patterns at each unit of time. To break the single attribute field into separate columns, you'll use the to_datetime function to convert the Added_Date field to datetime values and store them in a new column. Then, you'll extract each of the three desired units of time into its own column.
sdf['datetime'] = pd.to_datetime(sdf['Added_Date'])
sdf['year'], sdf['month'], sdf['day_of_week'] = sdf.datetime.dt.year, sdf.datetime.dt.month, sdf.datetime.dt.dayofweek
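The same split can be tried on a few invented dates to see what each derived column holds. The `toy` dataframe below is hypothetical; its dates were chosen only so that the results are easy to verify against a calendar.

```python
import pandas as pd

# Hypothetical dates (invented for illustration)
toy = pd.DataFrame({"Added_Date": ["2011-06-01", "2011-06-04", "2012-01-09"]})

toy["datetime"] = pd.to_datetime(toy["Added_Date"])
toy["year"] = toy["datetime"].dt.year
toy["month"] = toy["datetime"].dt.month
toy["day_of_week"] = toy["datetime"].dt.dayofweek  # Monday is 0, Sunday is 6

print(toy)
```

June 1, 2011 was a Wednesday, so its day_of_week value is 2; June 4, 2011 was a Saturday (5), and January 9, 2012 was a Monday (0).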
To visualize patterns by year, month, and day, you'll use the seaborn library to create three charts.
Note: If you don't have seaborn installed, uncomment the following cell and run the pip install command. The ! allows you to run system commands from within the notebook.
#import sys
#!pip install seaborn
import seaborn as sns
To plot the data by year, you'll use the countplot function to total permits by year and present them as a bar chart.
sns.countplot(x="year", data=sdf);
The chart shows the number of permits issued each year since 2010. (The year 2017 has significantly fewer permits because the dataset only covers part of 2017.) You can compare the number of permits visually by the size of each bar. Although some fluctuation occurs from year to year, most years had similar permit activity, with a slight upward trend that suggests steady growth in the number of construction permits issued.
Using the same function with the arguments x="month" and x="day_of_week", you can also plot permits by these units of time.
sns.countplot(x="month", data=sdf);
This bar chart shows the number of permits issued by month. Based on the chart, the highest permit activity occurs in June and July.
sns.countplot(x="day_of_week", data=sdf);
Almost all permit activity occurs on weekdays, especially Wednesdays (pandas numbers the days 0 through 6, starting with Monday, so Wednesday is 2). Government offices are closed on weekends, so few permits are issued then.
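If the numeric day labels are hard to read, the day names can be derived alongside them. This is a small sketch on invented dates; the `dates` series is hypothetical, and day_name is a standard pandas datetime accessor method.

```python
import pandas as pd

# Hypothetical dates (invented for illustration)
dates = pd.Series(pd.to_datetime(["2011-06-01", "2011-06-04", "2012-01-09"]))

numbers = dates.dt.dayofweek  # 0 = Monday ... 6 = Sunday
names = dates.dt.day_name()   # English day names by default

print(pd.DataFrame({"number": numbers, "name": names}))
```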
It can also be helpful to view the data on a timeline, or a line graph. To view the data continuously, you'll use the set_index method with the argument 'datetime' to make the datetime column you created earlier the index. You'll then use the resample method with the argument 'M' to plot the permit totals by month.
ddf = sdf.set_index('datetime')
ddf['num'] = 1
ddf['num'].resample('M').sum().plot();
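Recent pandas releases deprecate the 'M' alias for month-end resampling in favor of 'ME', so the resample call may emit a warning depending on the installed version. As one possible alternative, the sketch below totals records per calendar month by grouping on monthly periods. The `toy` data is invented, and `ddf_toy` is a hypothetical name mirroring the lesson's ddf.

```python
import pandas as pd

# Hypothetical timestamps (invented for illustration)
toy = pd.DataFrame(
    {"datetime": pd.to_datetime(
        ["2011-06-01", "2011-06-15", "2011-07-03", "2011-09-20"])}
)

ddf_toy = toy.set_index("datetime")
ddf_toy["num"] = 1  # each record counts as one permit

# Group by calendar month; 'M' is the monthly alias for periods
monthly = ddf_toy["num"].groupby(ddf_toy.index.to_period("M")).sum()
print(monthly)
```

Unlike resample, this grouping lists only the months that contain records, so the empty month of August 2011 does not appear with a zero.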
Because you're looking at the data by month, you have a far more granular view. A huge spike in permit activity occurred in mid-2011. What caused this spike? Is it an increase in overall permit activity, or is it mostly an increase in a certain type of permit? You'll plot the number of permits based on Use_Code to find which one caused the spike.
fig = plt.figure(figsize=(15,5))
ax = fig.add_subplot(1, 1, 1)
ax.plot(ddf['num'].resample('M').sum(), 'k', label='Total permits')
for use_code in filtered_permits:
    x = ddf[ddf.Use_Code == use_code]['num'].resample('M').sum()
    ax.plot(x, label=use_code)
ax.legend()  # show which line belongs to which use code
Based on the legend, permit activity spiked in 2011 due to a sharp increase in the number of multifamily dwelling permits issued. This likely means that there was large residential growth in 2011.
In this workflow, you explored permit data for Montgomery County, Maryland. Based on your findings, you can share several visualizations with the city showing historical growth as well as the top issued permit types. You can also suggest when the permit office may be busiest receiving new permits: the spring and summer months, from March through July, show increased permit activity.
This workflow is based on the Learn ArcGIS lesson, Get Started with Insights for ArcGIS.