Section Four: Extraction Analysis

Extraction analysis tools are focused on converting large datasets into smaller ones. Quite often in GIS, we encounter datasets which contain far more data than we need, which leads to the need to make smaller, more concise datasets.  Extraction analysis tools apply to vector layers, raster layers, and data tables.  In Chapter Five, we were first introduced to the idea of selecting data - by attribute, by location, and interactively - and then exporting that selection to a new output layer.  This was one form of extraction analysis, and in this section, we are going to briefly look at selections and exporting again, as well as some vector and raster geoprocessing tools which use spatial interactions instead of selections to create smaller datasets.

7.4.1: Vector Extraction Analysis

Quite often, we obtain vector layers that contain far more data than we need, either because it’s often easier to find a complete dataset (such as all 50 states versus a Colorado-specific layer, or all the highways in the US versus Colorado-specific highways) or because we don’t know what data we need until we examine the attributes (such as cities with a population of 50,000 or more). Whether we are targeting specific features right off the bat or we need to explore our data before making a selection, one of the most common vector layer tasks we perform is extraction analysis.  In the broad world of geoprocessing categories, any time you reduce the size of a vector layer, it is technically considered “extraction analysis”.  However, extraction analysis isn’t limited to just selection-based tools and functions. Any geoprocessing tool which is designed to reduce the size of a dataset, usually based upon spatial relationships, is considered extraction analysis. In this section, we will look at two extraction tools: Clip and Dissolve.

Selecting and Exporting

In Chapter Five, we took an in-depth look at how to select features by attribute, by location, and interactively. We also looked at the common uses of selected data - examining the results in the attribute table, exporting the data to a new layer, exporting the table, saving the data as a layer (.lyr) file, and limiting the features geoprocessing tools run on.  As we already spent so much time on that topic and have practiced it several times in lab, we are not going to repeat what has already been said.  If you need a review of selection tools and the uses of selections, review Chapter Five.
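
As a quick refresher, a selection followed by an export can also be scripted. Below is a minimal arcpy sketch, assuming a hypothetical map layer named "US_Cities" with a POPULATION field; the layer and field names are placeholders, not part of any dataset used in this course.

import arcpy

# Assumption: a feature layer named "US_Cities" is loaded in the map and has a
# POPULATION field. Select by Attribute: cities with a population of 50,000 or more.
big_cities = arcpy.management.SelectLayerByAttribute(
    "US_Cities", "NEW_SELECTION", "POPULATION >= 50000")

# Export the selected (highlighted) features to a new feature class.
arcpy.management.CopyFeatures(big_cities, "Cities_50k_and_Up")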

Clip

Clip is an extraction analysis tool, the fifth of the “top six” geoprocessing tools available in the Geoprocessing menu.  We have some vector layer that is too big - our "dough" - and we need to extract the features which fall exactly inside some other polygon feature - our "cookie cutter".  The tool takes the input layer (the dough) and stamps out just the features that lie inside the borders of the clip feature (the cookie cutter).

Let's look at an example of two ways to solve a problem, each with slightly different results.  You are given a polyline layer of all of the hiking trails in Colorado and a polygon layer of all of the National Parks.  Your task is to extract all of the trails within one park, Rocky Mountain National Park.  Since both the National Park polygon layer and the hiking trails polyline layer contain too much data, you're going to need to extract just the features you need.  To get Rocky Mountain National Park, you complete a Select by Attribute and export the selected (highlighted) features to a new layer, naming it "Rocky_Mountain_National_Park".  So far, so good.  Now, to deal with the trails.  If you were to complete a Select by Location for all of the trails which intersect your new RMNP polygon layer, you would indeed get all of the trails inside the park, but you would also get any trail that extends outside the park.  Select by Location can only select whole features from one layer (trails, in this case) based on a designated spatial relationship (intersect, in this case) with another layer (RMNP, in this case); it cannot cut those features at the park boundary.  Depending on the task at hand, this may be the right answer.  However, if your task is to find the total miles of trails found inside the exact park boundaries, the Select by Location method will inflate your answer, since it would include the miles of trail that extend outside the park boundary.  In order to get an exact value, you'll need to use the Clip tool (the cookie cutter tool) to cut the hiking trails (the cookie dough) exactly at the boundary of the park (the cookie cutter) before finding the total mileage.
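
Scripted, the workflow above might look like the following minimal arcpy sketch. The layer names ("National_Parks", "CO_Trails") and the field name PARK_NAME are hypothetical stand-ins for whatever your actual data uses.

import arcpy

# Step 1 (assumed layer and field names): pull Rocky Mountain National Park out of
# the National Parks polygon layer with a Select by Attribute, then export it.
rmnp = arcpy.management.SelectLayerByAttribute(
    "National_Parks", "NEW_SELECTION", "PARK_NAME = 'Rocky Mountain National Park'")
arcpy.management.CopyFeatures(rmnp, "Rocky_Mountain_National_Park")

# Step 2: Clip - the trails are the dough, the park polygon is the cookie cutter.
arcpy.analysis.Clip("CO_Trails", "Rocky_Mountain_National_Park", "RMNP_Trails_Clipped")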

Figure 7.9: Select by Location vs the Clip Tool for Trails Within Rocky Mountain National Park
RMNP Select by Location: Select by Location selects all trails which intersect the polygon that is Rocky Mountain National Park.
RMNP Post Clip: Clip cuts the trails at the boundary of the park, just like a cookie cutter would.
Clip is used when you want to answer the question: “What lies inside the area where the Clip features are coincident with features from the Input layer?”
Clip and Erase as Cookies

Dissolve

Dissolve, the sixth and last of the “top six” tools in the Geoprocessing menu, aggregates (groups together) features in a single dataset (that is to say, this tool only has one input) based on one or more values found in the attribute table.  Dissolve is considered an extraction analysis tool in the sense that it takes a large dataset and combines features to make a smaller one.  In contrast to tools like Select and Export and Clip, however, the geographic extent of the output remains the same as the input.  In other words, extraction analysis tools like Select and Export and Clip reduce the size of the dataset based on how the features interact spatially with the landscape, while Dissolve doesn't shrink the area the dataset covers; instead, the count of features goes down as features are combined based on one or more established aggregation values.

Similar to games like Candy Crush or Bubble Bobble, where the task is to move through the game by connecting matching candy or colored bubbles, the Dissolve tool combines features based on some common value.  In Figure 7.10 below, an input layer containing hundreds of polygons, each carrying a value from 1 to 6, is dissolved on that number to produce an output layer with just six features.

Figure 7.10: The Dissolve tool vs Bubble Bobble
Bubble Bobble: The goal of games like Bubble Bobble and Candy Crush is to aggregate the objects on the screen based on some attribute - the type of candy or the color of the bubble. In contrast, in the game the object is to remove objects based on that aggregation, while the Dissolve tool keeps all of the input features in the output layer, reducing only the number of features based on the aggregation.
Pre-dissolve, with numbers: A group of features before the Dissolve tool is run.  The symbology reflects the number associated with each feature, for visual clarity.  While it's possible to have the name of the color as a field in the attribute table, this run of the Dissolve tool is based on the number, not the color.
Post-dissolve, with numbers: After the Dissolve tool, the input features have been aggregated based on the number associated with each feature.  The input layer had hundreds of polygons, each with a distinct value of 1 - 6.  The output layer has only six features, still with the same distinct values of 1 - 6.  This example shows all of the features neighboring each other, but the Dissolve tool can also aggregate features which do not touch.  Those that do touch are merged into a single polygon, and those that do not remain where they are and become a multipart polygon feature - a feature that shares a single line in the attribute table without physically touching in the landscape.  Hawaii is a great example of a multipart polygon feature: all of the islands are part of the State of Hawaii, however, they do not physically touch in the real world.
Use Dissolve to combine like features in a single vector layer based upon one or more fields in the attribute table.
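
A minimal arcpy sketch of the Figure 7.10 run is shown below, assuming a hypothetical polygon layer named "Numbered_Polygons" with a field named "Number" holding the values 1 through 6.

import arcpy

# Assumption: "Numbered_Polygons" and its "Number" field are placeholders matching
# the Figure 7.10 example. Every polygon sharing the same Number value is combined;
# polygons that do not touch become multipart features in the six-feature output.
arcpy.management.Dissolve(
    "Numbered_Polygons", "Numbered_Polygons_Dissolved",
    dissolve_field="Number", multi_part="MULTI_PART")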

7.4.2: Raster Extraction Analysis

Much like vectors, we often need to reduce large rasters into smaller ones.  Sometimes it's because we simply have too much data and would like to reduce the quantity, and other times it's necessary to speed up processing.  Vector tools, in general, run fairly quickly within ArcGIS, while raster tools, in general, take much longer.  This is because the software isn't just dealing with the spatial placement and interaction of vertices; instead, it needs to process each pixel individually and write the output value to a brand new raster file. The process of analyzing raster cell values and writing a new raster is processor heavy and time consuming.  Reducing the size of rasters to just the area of focus can greatly speed up processing time, reduce the amount of space required to store raster files, and tax the computer less when it comes to processing power (which also leads to fewer software crashes!).  Other raster extraction analysis tools work similarly to vector tools, such as Reclassify, which acts similarly to Dissolve (but not exactly the same), and Extract by Attribute, which creates a smaller raster dataset based on the values contained in each pixel.  While this class is primarily focused on vector analysis, it's important to at least explore the possibilities of tasks a technician can accomplish with rasters, even at this early stage in the GIS game.

Extract by Attribute and Raster to Point

Way back in Chapter Three, we learned that one property of raster data is that each cell within a raster layer contains one or more numeric values.  Those values can be classification values, where the number is a "coded value" representing a group of data such as water, urban, trees, or soil; they can represent continuous characteristics of the landscape such as elevation, precipitation, or temperature; or they can represent discrete data values, such as noting which pixels show a river or a house.  Regardless of what the values represent, extraction tools are capable of examining the values and writing the output to a new, smaller raster or converting the values to a vector layer.

Extract by Attribute

Extract by Attribute, just like Select by Attribute and Export, utilizes a SQL expression to find the desired cell values and writes the cells where the expression is true to a new output raster.  For example, if you were interested in extracting all of the raster cells within a DEM where the elevation is greater than or equal to 2,500 feet, the Extract by Attribute tool with the SQL expression “Elevation” >= 2500 is the ticket.
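
A minimal arcpy sketch of this extraction is below, assuming a hypothetical integer DEM named "elevation_dem" whose cell values are elevation in feet; in a raster attribute query, the cell value is usually referenced through the VALUE field.

import arcpy
from arcpy.sa import ExtractByAttributes

# Extract by Attribute lives in the Spatial Analyst toolbox, so the extension is needed.
arcpy.CheckOutExtension("Spatial")

# Assumption: "elevation_dem" is an integer DEM in feet. Cells at or above 2,500 feet
# are written to the output; every other cell becomes NoData.
high_ground = ExtractByAttributes("elevation_dem", "VALUE >= 2500")
high_ground.save("dem_2500ft_and_up")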

Figure 7.x: Raster Extraction Analysis Tool: Extract by Attribute

Raster to Point

Raster to Point will pull the value from each cell, such as the elevation in a DEM, and extract it to a vector point layer.  In general, it's not possible to find spatial relationships between vectors and rasters, so a technician must choose to work almost completely in the vector world or almost completely in the raster world (each one has its advantages and disadvantages; however, that is a discussion for another class).  In order to find the start and end elevations for a road polyline feature, for example, the best choice is to extract the elevation for each pixel in a DEM to a vector point layer and eventually perform a spatial join (with some more processing steps in between).  By turning the raster DEM layer into a vector point layer, the technician has made the choice to work in the vector world for the remainder of the analysis (nothing is stopping them from switching back to raster later, if that better suits the needs of the project).
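
The conversion itself is a single tool call. The sketch below assumes the same hypothetical "elevation_dem" raster; the Raster to Point tool stores each cell's value in the output field named grid_code.

import arcpy

# Assumption: "elevation_dem" is a DEM raster. Each cell becomes one point, and the
# cell's value (the elevation) is carried into the point layer's grid_code field.
arcpy.conversion.RasterToPoint("elevation_dem", "elevation_points", "VALUE")

# The points are now vector data and can participate in vector analysis, such as a
# spatial join to the road polylines described above.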

Figure 7.X: Raster Extraction Analysis Tools: Raster to Point

Extract by [Shape, Vector, or Other Raster] 

Like the vector Clip tool, rasters are able to be extracted by an input shape.  That shape can be an arbitrary geometric shape, using tools such as Extract by Circle, Extract by Rectangle, and Extract by Polygon.  If an exact area is known or needs to be preserved, there are tools like the raster version of Clip, which uses a vector layer as the cookie cutter, and Extract by Mask, which uses another raster to determine the extent of the output raster.  For all of these raster-based tools, we can apply the “cookie cutter” idea, where the input raster is the “cookie dough” and the geometric shape, specific vector, or specific other raster is the “cookie cutter”.  And as with the vector tools, the extracted raster is a smaller version of the input raster.
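
Two of those variations are sketched below with arcpy. The raster names ("statewide_dem", "county_mask_raster") and the rectangle coordinates are hypothetical placeholders.

import arcpy
from arcpy.sa import ExtractByMask, ExtractByRectangle

arcpy.CheckOutExtension("Spatial")

# Cookie cutter = another dataset (a raster or polygon layer) used as a mask.
county_dem = ExtractByMask("statewide_dem", "county_mask_raster")
county_dem.save("county_dem")

# Cookie cutter = an arbitrary rectangle defined by its corner coordinates
# (illustrative values only); "INSIDE" keeps the cells within the rectangle.
study_dem = ExtractByRectangle(
    "statewide_dem", arcpy.Extent(400000, 4300000, 450000, 4350000), "INSIDE")
study_dem.save("study_area_dem")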

While the most common use of these tools is to create an output raster which is specific to an area or given project, as mentioned above, sometimes rasters are reduced in size simply to speed up processing.  The task of creating an output raster that is smaller than the original, but might not yet be the exact size/area of the project, is called subsetting or creating a raster subset (depending on whether you need a verb or a noun) and is commonly done to reduce processing time, reduce storage space, and make the transfer of data a whole heck of a lot easier.  Many websites and geospatial data repositories use this method for the downloading of rasters to make sure the transfer is speedy and complete.  You might find these downloads subset by state, county, or an arbitrary bounding box set by the user.  In all cases, however, the goal is speed, accuracy, and completion.

Figure 7.X: Extract by [Shape, Vector, or Mask]
This picture is an example of either an Extract by Rectangle or a Raster Clip, where the extraction area is either a rectangle drawn by the user (Extract by Rectangle) or determined by an input vector feature (Raster Clip).

Reclassify

In Section Two of this chapter, we explored several methods of classification.  Within the group of raster extraction analysis tools, we have a tool simply named Reclassify.  This tool is used to change the stored values within a single raster, creating a new output raster with new values.  Any of the numeric classification methods can be used, but quite often, technicians use manual classes, establishing areas within a raster that meet or do not meet certain criteria.  For example, a technician may be interested only in elevations between 500 and 1,500 meters.  Utilizing a DEM, they set manual breaks (create classes manually), creating three new classes: below 500 meters, between 500 and 1,500 meters, and above 1,500 meters.  The Reclassify tool changes the value inside each pixel from the stored elevation (496, 527, 2,250, etc.) to Class 1 (below 500), Class 2 (between 500 and 1,500), or Class 3 (above 1,500).  The new output raster no longer has elevation values stored within each pixel, but instead has a class value.  There is no way to get the original elevation value from the new, reclassified raster, as that is no longer its purpose.  If the technician needed to know the elevation value, they would need to use the Identify tool and look at the original raster for the matching pixel; the Reclassify tool doesn't change the size or location of each pixel, just the value stored inside.
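
A minimal arcpy sketch of this three-class reclassification is below, again assuming a hypothetical DEM named "elevation_dem" with elevations in meters; the lower and upper limits of 0 and 9,000 are simply values safely outside the elevation range of the data.

import arcpy
from arcpy.sa import Reclassify, RemapRange

arcpy.CheckOutExtension("Spatial")

# Manual breaks (assumed data range): below 500 m -> 1, 500-1,500 m -> 2, above 1,500 m -> 3.
remap = RemapRange([[0, 500, 1],
                    [500, 1500, 2],
                    [1500, 9000, 3]])
elevation_classes = Reclassify("elevation_dem", "VALUE", remap)
elevation_classes.save("elevation_classes")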

Figure 7.X: Reclassify Tool
In this example, we see the input raster is a DEM with elevations in meters. The technician is interested in classifying areas into three groups and assigning each class a new value. Class 1 is areas where the elevation is below 500 meters, Class 2 is areas where the elevation is between 500 and 1,500 meters, and Class 3 is areas above 1,500 meters. The output of the tool shows each new class in a different color with the corresponding class number inside the pixel.
Using the same classes, the Reclassify tool can also hide pixels using the value NoData. (As we know from the properties of a raster, a raster is made up of a grid of rows and columns of pixels; it does not need to be a square with an equal number of rows and columns, but each row and column must be complete.) The technician still uses manual breaks with three classes, but instead of assigning the numbers 1 and 3, the technician assigns the NoData value to tell ArcGIS to hide those pixels from view and exclude them from further analysis.
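
One way to script that NoData variant, continuing the sketch above, is to list only the class you want to keep in the remap table and let the Reclassify tool's missing-values option send everything else to NoData; the names below remain hypothetical.

from arcpy.sa import Reclassify, RemapRange

# Keep only the 500-1,500 meter band as Class 2; values not covered by the remap
# table are written out as NoData (the "NODATA" option) and hidden from analysis.
keep_band = RemapRange([[500, 1500, 2]])
middle_band = Reclassify("elevation_dem", "VALUE", keep_band, "NODATA")
middle_band.save("elevation_500_1500_only")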