Section Eight: Select by Location

Let's look at an example where we have two shapefiles: a point layer representing earthquake epicenters and a polygon layer representing the contiguous United States (the "lower 48"). We want to know which of those earthquakes occurred within the State of California. When we open and examine the attribute table for the earthquake point layer, we find fields for the year, month, day, and strength of each earthquake, yet there is no field for "State". Since our initial goal was to complete a simple SQL query for "Epicenter_State" = 'California', we are kind of at a standstill. No field designating state name means no way to complete a simple SQL query on the table.

We still need to know which earthquakes happened within the borders of the State of California. We've determined there is no field for the state name, so we turn to another tool in our GIS toolbox: Select by Location. Select by Location is the second most common way we find data. It selects features from one layer based on their spatial relationship with another layer. In the image below, we can see the earthquakes which fall inside the California border, since our eyes can determine which ones fall inside, which ones fall outside, and which ones fall on the border. In a way, the software can "see" this too, when we tell it what to "look" for using the Select by Location tool.

Figure 5.15: Understanding Select By Location
In this example, we can clearly see which earthquakes fall inside the outline of the State of California. We also see that the attribute table has no field we could have used with Select by Attribute to figure out which earthquakes happened in California. Select by Location, however, can select the earthquakes within the state based on how the two layers intersect.
After the Select by Location tool has run, we see that 72 of the 5,875 earthquake points in the earthquakes layer were selected as falling within the State of California, even though there is no "state" field in the earthquake layer.
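The idea behind the earthquake example can be sketched in a few lines of code. This is a minimal sketch, not how ArcMap implements the tool. It assumes planar (x, y) coordinates and uses a ray-casting point-in-polygon test. A small square stands in for California, and the names `point_in_polygon` and `select_by_location` are hypothetical. Notice that the selection never reads a "state" attribute. It relies only on where each point sits relative to the polygon.

```python
# Hypothetical sketch: select target points that fall inside a source polygon.
# Coordinates are assumed to be planar (x, y); none of these names are ArcMap's.

def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a ray running right from
    (x, y) crosses. An odd count means the point is inside. Points lying
    exactly on the outline may go either way in this simple version."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def select_by_location(points, polygon):
    """Return IDs of points inside the polygon; no "state" field is consulted."""
    return [p["id"] for p in points if point_in_polygon(p["x"], p["y"], polygon)]


# Hypothetical data: a square stands in for the California outline.
state = [(0, 0), (10, 0), (10, 10), (0, 10)]
quakes = [
    {"id": 1, "year": 1994, "x": 2, "y": 3},
    {"id": 2, "year": 1989, "x": 12, "y": 5},
    {"id": 3, "year": 2014, "x": 9, "y": 9},
]
print(select_by_location(quakes, state))  # [1, 3]
```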

5.8.2: The Select by Location Dialog Box: Top to Bottom

Like we did with the Select by Attribute dialog box, we are going to look at the Select by Location dialog box from top to bottom. We'll use our earthquakes in California example, now that we've seen both the problem (no state field in the earthquake point layer's attribute table) and the result of running the tool. If you haven't yet, take a minute to examine the images above. Note the missing field that a Select by Attribute would need (specifically, a "state" field designating which state each earthquake occurred in). Also note how the Select by Location tool selected (highlighted) features in the earthquake layer because those earthquakes intersected the California polygon in the US states layer (the desired query). Finally, note that California is selected, but not as a result of the Select by Location query. As we will see, it is an input to the tool.

The Select by Location tool still performs queries on tables. It asks the table a question and returns the records for which the relationship is true. In this case, though, the query is less a question in the simple SQL format and more a question about relationships. Both are types of table queries, approached in different ways: non-spatial and spatial, respectively. SQL queries, as we read in a previous section, are not unique to or limited to GIS, since SQL is a database language and databases pop up all over the place. Select by Location relationship queries, however, are unique to GIS, as the only selection criterion is a spatial relationship: how the features interact with each other, using relationships such as intersect, are exactly the same as, and fall completely inside.

Launch the Select by Location Dialog Box

Select by Location can be found in the Selection menu. Unlike Select by Attribute, Select by Location can only be launched from the Selection menu.

Selection Method

The selection methods are the same as in the Select by Attribute dialog box, albeit with slightly different wording. Most of the time we are performing new selections, so the dialog box defaults to "select features from". This means the technician needs to change the selection method as needed for different selection tasks.

When it comes to our California earthquakes example, we used the default of "select features from", since our earthquake point file had no selection to start, and we wanted to identify the features which meet our defined relationship query.


select features from

Creates a new selection based on a Select by Location relationship query. If there is any selection in the target layer (the layer where the selection is being made), this option clears those selected features and creates a whole new selection.

Add to current selection

Selects additional features in the target layer based on a Select by Location relationship query. The current selection (whatever is currently highlighted) may have been made using an SQL query, another Select by Location query, or interactive selection (the next section of the text). Add to current selection can be used any number of times within one table, or at least until all the features are selected and there are none left to add.

Remove from current selection

Deselects the features in the target layer for which the relationship query is true when the tool runs. The selected features being cleared may have been selected by attribute in an earlier step, by another Select by Location relationship query, or by interactive selection.

Select from current selection

Whittles down a selection to a smaller set of features based upon another relationship query. This method would be used when you need to select some features, examine them, and then decide which ones to move forward with by using another query. Multiple queries can be used sequentially to reduce the pool of selected features until only the final selection is left.
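One way to picture the four selection methods is as set operations on feature IDs. In this minimal sketch, `current` is whatever is already selected in the target layer, and `hits` is the set of features for which the relationship query is true. The function name and the short method keys are hypothetical labels for the dialog box options, not ArcMap identifiers.

```python
# Hypothetical sketch: the four selection methods as set operations.
# current = IDs already selected in the target layer
# hits    = IDs for which the relationship query is true

def apply_selection_method(method, current, hits):
    if method == "new":      # select features from
        return set(hits)
    if method == "add":      # add to current selection (union)
        return current | hits
    if method == "remove":   # remove from current selection (difference)
        return current - hits
    if method == "subset":   # select from current selection (intersection)
        return current & hits
    raise ValueError(f"unknown selection method: {method}")


current = {1, 2, 3}
hits = {3, 4}
for m in ("new", "add", "remove", "subset"):
    print(m, sorted(apply_selection_method(m, current, hits)))
# new [3, 4]
# add [1, 2, 3, 4]
# remove [1, 2]
# subset [3]
```

Running "subset" repeatedly with new `hits` is the whittling-down process described above. Each pass can only keep or drop features, never add them.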

Setting the Target Layer(s)

In our earthquakes of California example, the question we are asking is: "Which of the 5,875 earthquakes represented in the point layer happened within the border of the State of California?" After we've established our selection method (create a new selection where there was not one before), we are ready to start placing the different parts of our relationship query into the Select by Location dialog box.

The first part of the question asks "Which of the 5,875 earthquakes represented in the point layer...", since we are curious about the earthquakes. That makes the earthquakes layer the Target Layer. The target layer(s) are those where we would like the selection to be made (the target of our question), and we designate a target layer by placing a check in the box next to its name in the tool.

In our earthquakes of California example, we only have one target layer, the earthquakes, but the Select by Location tool can have many target layers and can select features from all of them at once. Maybe we have an earthquakes layer, a point layer for building locations, and a polyline layer for major freeways, none of which has a state field. We wish to know which features from all three layers fall within the borders of California (this is rather unlikely, but we will make it true for the purpose of the example). In this case, we could place a check in the box next to the name of each of the three layers, making them all targets of our relationship query. When the tool runs, it will select features from all three layers which meet the criteria set in the relationship query.
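Checking several target layers amounts to running the same relationship test once per layer. This minimal sketch assumes every target layer holds points stored as `(feature_id, x, y)` tuples, and it uses an axis-aligned rectangle `(xmin, ymin, xmax, ymax)` as a stand-in for the California outline. A polyline layer such as the freeways would need a line-versus-polygon test instead. All names and data here are hypothetical.

```python
# Hypothetical sketch: one relationship query applied to several checked
# target layers. Layers hold (feature_id, x, y) points; a rectangle
# (xmin, ymin, xmax, ymax) stands in for the California outline.

def inside_box(x, y, box):
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax


def select_from_targets(targets, box):
    """Return {layer_name: [selected ids]} for every target layer."""
    return {
        name: [fid for fid, x, y in features if inside_box(x, y, box)]
        for name, features in targets.items()
    }


california = (0, 0, 10, 10)
targets = {
    "earthquakes": [(1, 2, 2), (2, 15, 3)],
    "buildings": [(10, 5, 5), (11, 9, 1)],
}
print(select_from_targets(targets, california))
# {'earthquakes': [1], 'buildings': [10, 11]}
```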

Like with the Select by Attribute dialog box, the option to “Only show selectable layers in this list” limits the list to layers which are marked as selectable in the List By Selection view of the Table of Contents.

Setting the Source Layer

Continuing with our question, after we've set the target layer as the layer from which we'd like the selection made, the next portion we need to look at is "...the State of California?". This is a bit out of order compared to the way we phrased our question. As we saw with Select by Attribute, though, machine thinking and syntax structure aren't always the same as human thinking and the structure of the English language. With both Select by Attribute and Select by Location, it's our job to set up the query the way the software expects it. It is not the software's job to figure out what you are trying to say. If you expect it to, you will lose each and every time, and the software will not do what you want it to do.

Within the syntax and structure of the Select by Location tool, the Source Layer is the "selector" layer, that is, the layer against which the relationship with the target layer is tested. We would like to select from the earthquakes those which happened in the State of California, so the earthquakes layer is the target and the US_States layer is the source. When we look at the relationship types, or the Spatial Selection Method, the wording of the tool will make this relationship even clearer.


In Section Four of this chapter, we learned how we can use selections once they are made within attribute tables, and one of those uses was Limit the Input Features for a Geoprocessing Tool. Technically, Select by Location is a geoprocessing tool, as it processes spatial data via a tool, so making a selection before running the tool can be helpful.

In the case of our earthquakes in California example, the polygon layer is not made up of just California but of all the US_States (cleverly noted by the name of the layer). This means we need to limit the tool to run only for the State of California. In the US_States layer, there is a selection of one feature: the State of California. Having just the one state selected limits the input features of the geoprocessing tool (Select by Location), as long as we tell the tool to honor that selection by placing a check mark in the Use Selected Features box.


Just below the Source Layer dropdown, we see a check box labeled Use Selected Features. Checking it tells the tool not to use all the states, but to limit the relationship query to only the State of California. When we look at the US_States attribute table, we see (1 out of 51 Selected), and that one feature is California. Below the Source Layer dropdown (where US_States is set), the Use Selected Features box notes (1 feature selected), referring to the State of California. The Select by Location tool both recognizes and honors a selection made in the Source Layer.
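The effect of the Use Selected Features box can be sketched as a filter applied to the source layer before any relationship is tested. This minimal sketch assumes states are rectangles `(xmin, ymin, xmax, ymax)` keyed by name, purely to keep the geometry short. The function `select_points` and its parameters are hypothetical.

```python
# Hypothetical sketch of the "Use selected features" checkbox: when checked,
# only the selected source features take part in the relationship test.
# States are rectangles (xmin, ymin, xmax, ymax) purely for brevity.

def inside_box(x, y, box):
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax


def select_points(points, source, selected_ids=(), use_selected=False):
    # Limit the source layer to its selection only when the box is checked
    if use_selected:
        source = {sid: geom for sid, geom in source.items() if sid in selected_ids}
    return [pid for pid, x, y in points
            if any(inside_box(x, y, geom) for geom in source.values())]


states = {"California": (0, 0, 10, 10), "Nevada": (10, 0, 20, 10)}
quakes = [(1, 3, 4), (2, 15, 4)]
print(select_points(quakes, states))                                     # [1, 2]
print(select_points(quakes, states, {"California"}, use_selected=True))  # [1]
```

With the box unchecked, the Nevada earthquake is selected as well, because every state polygon takes part in the test.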

Understanding the Spatial Selection Method

The last part of the question we need to address is "...happened within the border of...", referring to which earthquakes occurred within the boundary of the State of California. When you examine the picture at the start of this section, you can see that some of the points land inside the boundary of the State of California and some land outside it. The Select by Location tool needs you to establish the spatial selection method for the target layer feature(s). In other words, the tool needs to understand how you would like the source and target layers to interact in order to select features within the target layer. The default is "intersect the source layer feature". This means that in order to be selected, features (rows) in the target layer must intersect the features in the source layer.

In our earthquakes of California example, we have set the tool to examine which points in the earthquake layer intersect the polygon that makes up the State of California. There are other spatial selection methods, as explained below. Again, this is not meant for you to memorize how the tool works. It is meant to help you understand that the tool explores spatial relationships between the target and source layers in many different ways.

Spatial selection method for the target layer feature(s):

intersect the source layer feature(s)

Using the source layer as the “selection” layer, intersect finds features from the target layer which intersect, wholly or in part, the points, polylines, or polygons of the source layer. Even if the symbology of polygons makes them appear to be “hollow”, they still have an area, and the tool will find all features from the target layer which exist somewhere inside the boundary of the source layer.

are within a distance of the source layer feature(s)

After specifying a distance, “are within a distance of” will locate features from the target layer which exist at a distance less than or equal to the specified distance to features in the source layer.
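For point layers, “are within a distance of” comes down to a straight-line distance compared against the search distance with "less than or equal to". This minimal sketch assumes projected (planar) coordinates in the same units as the distance. Latitude/longitude coordinates would need a geodesic distance instead. The function name and data are hypothetical.

```python
import math

# Hypothetical sketch of "are within a distance of": select target points
# whose straight-line distance to any source point is <= the search distance.
# Assumes planar coordinates in the same units as `distance`.

def within_distance(targets, sources, distance):
    return [
        tid for tid, tx, ty in targets
        if any(math.hypot(tx - sx, ty - sy) <= distance for _, sx, sy in sources)
    ]


cities = [("A", 0, 0)]
quakes = [(1, 3, 4), (2, 6, 8), (3, 1, 1)]
print(within_distance(quakes, cities, 5))  # [1, 3]
```

Earthquake 1 sits exactly 5 units away and is still selected, because the comparison is inclusive.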

contain the source layer feature(s)

This relationship is looking for features in the target layer which contain, wholly or in part, features from the source layer. For example, you may wish to find all counties which contain a lake. You could use this relationship with the lakes layer as the source layer and the counties layer as the target layer.

completely contain the source layer feature(s)

This relationship looks for features in the target layer which totally contain features from the source layer, meaning no part of the source layer feature falls outside the target layer feature.

are within the source layer feature(s)

Similar to the target layer containing the source layer feature, “are within the source feature” is the opposite search, looking for features within the target layer which fall, wholly or in part, inside the source layer feature(s).

For example, if you wanted to find all of the roads which are within a single county, including those which start or terminate inside another county, this would be the relationship to use.

are completely within the source layer feature(s)

Much like completely contain, completely within excludes any features which fall outside of the source layer feature(s). If we continue the example from “within the source layer”, this relationship would exclude all of the roads which may start or terminate in another county.
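The containment relationships above come in pairs that run in opposite directions. This minimal sketch shows the "completely" pair, where one test is simply the other with target and source swapped. It assumes features are axis-aligned rectangles `(xmin, ymin, xmax, ymax)`, a simplification chosen so that the containment check is exact. Real county and lake outlines would need a full polygon-in-polygon test. The function names are hypothetical.

```python
# Hypothetical sketch: "completely contain" and "are completely within" are
# the same test with target and source swapped. Features are axis-aligned
# rectangles (xmin, ymin, xmax, ymax), a simplification for brevity.

def completely_contains(target, source):
    """True if no part of the source rectangle falls outside the target."""
    return (target[0] <= source[0] and target[1] <= source[1]
            and source[2] <= target[2] and source[3] <= target[3])


def completely_within(target, source):
    """True if no part of the target falls outside the source (roles swapped)."""
    return completely_contains(source, target)


county = (0, 0, 10, 10)
lake = (2, 2, 4, 4)
print(completely_contains(county, lake), completely_within(lake, county))  # True True
print(completely_contains(lake, county), completely_within(county, lake))  # False False
```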

are identical to the source layer feature(s)

This relationship finds all of the features from the target layer which are exact spatial duplicates of features in the source layer. We say “spatial” since Select by Location doesn’t take any attributes into account. It only looks for features that have the same number of vertices, all in exactly the same locations. This relationship works for points, polylines, and polygons.
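The "geometry only, never attributes" idea can be sketched by comparing vertex lists while ignoring every other field. This minimal sketch assumes each feature stores its shape as a list of (x, y) vertices. It also assumes that "identical" means the same vertices in the same order. Real GIS software may be more forgiving, for example about a polygon ring that starts at a different vertex. The function name and data are hypothetical.

```python
# Hypothetical sketch of "are identical to": compare geometry only.
# Assumes geometry is a list of (x, y) vertices and that identical means
# the same vertices in the same order (exact coordinate equality).

def identical(geom_a, geom_b):
    # List equality checks vertex count and every vertex position at once
    return geom_a == geom_b


parcel = {"owner": "Smith", "geom": [(0, 0), (4, 0), (4, 3), (0, 3)]}
duplicate = {"owner": "Jones", "geom": [(0, 0), (4, 0), (4, 3), (0, 3)]}
moved = {"owner": "Smith", "geom": [(0, 0), (4, 0), (4, 3), (0, 3.5)]}
print(identical(parcel["geom"], duplicate["geom"]))  # True: owners differ, shapes match
print(identical(parcel["geom"], moved["geom"]))      # False: one vertex differs
```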

touch the boundary of the source layer feature(s)

This relationship looks for features from the target layer which intersect the boundary of the source layer feature(s) and are either completely inside or completely outside of them.

For example, consider a Select by Location for counties (target layer) which share a boundary with the states layer (source layer), where only California is selected and we’ve checked the “use selected features” box. It would return Humboldt, Del Norte, Siskiyou, Modoc, Lassen, and Sierra Counties (plus 22 more) in California, and Clark, Nye, and Mineral Counties (plus 5 more) in Nevada, along with the bordering counties in Oregon and Arizona. Because counties nest within states, every county is either completely inside or completely outside California. So in the case of “share a boundary with”, every county whose border touches the State of California’s border satisfies the completely inside or outside condition and is returned.

To find only the counties within California that are also on the border, first perform a Select by Location for counties which “touch the boundary of” the states layer (with California selected and the “use selected features” box checked). Then change the selection method to “select from the currently selected features” and perform a second Select by Location for counties (target layer) which are completely within the source layer, keeping California selected and the “use selected features” box checked.
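The two-step border-county workflow can be sketched with the same selection logic. This minimal sketch assumes counties and the state are axis-aligned rectangles `(xmin, ymin, xmax, ymax)`. It implements "touch the boundary of" as described above: the feature meets the source outline and is either completely inside or completely outside it. It then narrows that result with a "no part outside" test. The helper names and county names are hypothetical, not ArcMap functions.

```python
# Hypothetical sketch of the two-step workflow, with axis-aligned rectangles
# (xmin, ymin, xmax, ymax) standing in for county and state outlines.

def inside(a, b):
    """No part of rectangle a falls outside rectangle b."""
    return b[0] <= a[0] and b[1] <= a[1] and a[2] <= b[2] and a[3] <= b[3]


def interiors_overlap(a, b):
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def closed_intersect(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def touches_boundary(a, b):
    """a meets b's outline and is either completely inside or outside b."""
    if inside(a, b):
        # Inside: touches only if an edge of a lies on an edge of b
        return a[0] == b[0] or a[1] == b[1] or a[2] == b[2] or a[3] == b[3]
    # Outside: the rectangles meet, but only along their outlines
    return closed_intersect(a, b) and not interiors_overlap(a, b)


state = (0, 0, 10, 10)
counties = {
    "border_inside": (0, 0, 3, 3),
    "interior": (4, 4, 6, 6),
    "border_outside": (10, 2, 13, 5),
}
# Step 1: new selection using "touch the boundary of"
step1 = {n for n, g in counties.items() if touches_boundary(g, state)}
# Step 2: "select from the currently selected features", keeping inside ones
step2 = {n for n in step1 if inside(counties[n], state)}
print(sorted(step1))  # ['border_inside', 'border_outside']
print(sorted(step2))  # ['border_inside']
```

The interior county never enters step 1, and the neighboring-state county drops out in step 2. Only the in-state border county survives both passes.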

share a line segment with the source layer feature(s)

Similar to “identical”, this spatial relationship looks for shared vertices, but only between polygon or polyline features (meaning this relationship will not work for points). Where identical required the exact same number of vertices in exactly the same locations, “share a line segment” only needs two matching consecutive vertices, that is, one shared segment. (To find features that share as little as a single point, use “intersect”, the first relationship in the list.)
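"Share a line segment" can be sketched as a search for one pair of consecutive vertices that appears in both features, in either direction. This minimal sketch assumes features are vertex lists, with polygons stored as closed rings whose first vertex is repeated at the end. It also assumes shared borders were digitized with matching vertices, so two collinear but differently-vertexed edges would not be detected. The function names and data are hypothetical.

```python
# Hypothetical sketch of "share a line segment with": look for at least one
# segment (two consecutive vertices) present in both features. Assumes
# closed rings repeat the first vertex and shared edges use matching vertices.

def segments(vertices):
    """Set of undirected segments between consecutive vertices."""
    return {frozenset(pair) for pair in zip(vertices, vertices[1:])}


def share_segment(a, b):
    return bool(segments(a) & segments(b))


west = [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)]
east = [(2, 0), (4, 0), (4, 2), (2, 2), (2, 0)]
corner = [(2, 2), (3, 3), (2, 4), (2, 2)]  # meets west at a single vertex
print(share_segment(west, east))    # True: both contain edge (2, 0)-(2, 2)
print(share_segment(west, corner))  # False: one shared vertex is not a segment
```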

are crossed by the outline of the source feature(s)

This relationship looks for features from the target layer which are crossed by the boundary of the source layer. It excludes those which have any other sort of relationship, including “completely contains”, “shares a segment with”, or “is identical to”. This relationship could be used to find all the roads (target layer) which start or terminate in a county other than the source county.

have their centroid in the source layer feature(s)

A centroid is the geometric center of a polygon or the midpoint of a polyline. When using the centroid relationship, the Select by Location tool first finds the centroid for all of the target features, then determines if that centroid is somewhere inside the source layer features (not at the center of the source feature, just somewhere within the boundary or along the line).

For example, suppose you set up a bunch of circular study plots across an entire county, then broke the county into several units. The centroid relationship would return all of the study plots which had their start point (the center of a circular study plot) within a given unit. Again, use the “use selected features” box; otherwise you’ll return all of the plots within the county instead of those in one or two units.
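The centroid relationship works in two steps: compute one point per target feature, then run an ordinary point-in-polygon test against the source. This minimal sketch uses the standard shoelace-based polygon centroid formula and a ray-casting test. It assumes simple (non-self-intersecting) polygons in planar coordinates. The study plots are drawn as small squares rather than circles, purely for brevity, and all names and coordinates are hypothetical.

```python
# Hypothetical sketch of "have their centroid in": find each target polygon's
# centroid (shoelace-based formula), then test that one point against the
# source polygon with ray casting. Assumes simple planar polygons.

def polygon_centroid(poly):
    area2 = cx = cy = 0.0
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        cross = x1 * y2 - x2 * y1   # twice the signed area of one triangle
        area2 += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    return cx / (3 * area2), cy / (3 * area2)


def point_in_polygon(x, y, poly):
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside


unit = [(0, 0), (10, 0), (10, 10), (0, 10)]  # the selected management unit
plots = {
    "plot_a": [(7, 4), (11, 4), (11, 6), (7, 6)],   # centroid (9, 5): inside
    "plot_b": [(9, 4), (13, 4), (13, 6), (9, 6)],   # centroid (11, 5): outside
}
selected = [name for name, poly in plots.items()
            if point_in_polygon(*polygon_centroid(poly), unit)]
print(selected)  # ['plot_a']
```

Both plots overlap the unit, so both would be selected with "intersect". Only plot_a has its center inside the unit, so only plot_a is selected by the centroid relationship.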


Apply a Search Distance

In addition to the spatial selection methods available, you can apply an additional search distance to the relationship to extend the spatial search beyond the definite boundaries of the source layer. For example, if you wanted to know not only which earthquakes occurred within the State of California, but which ones happened within the state AND within a mile of the state border, you could apply a search distance.  
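A search distance widens "intersect" so that points just outside the outline also qualify. This minimal sketch selects a point if it is inside the polygon, or if its distance to the nearest edge of the outline is no more than the search distance. It assumes planar coordinates measured in miles. The functions and data are hypothetical.

```python
import math

# Hypothetical sketch: "intersect" plus a search distance. A point is selected
# if it is inside the polygon OR within `search` of the polygon's outline.
# Assumes planar coordinates in miles.

def point_segment_distance(px, py, x1, y1, x2, y2):
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        return math.hypot(px - x1, py - y1)
    # Project the point onto the segment, clamping to its endpoints
    t = max(0.0, min(1.0, ((px - x1) * dx + (py - y1) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (x1 + t * dx), py - (y1 + t * dy))


def point_in_polygon(x, y, poly):
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside


def intersects_with_search(x, y, poly, search):
    if point_in_polygon(x, y, poly):
        return True
    n = len(poly)
    return any(
        point_segment_distance(x, y, *poly[i], *poly[(i + 1) % n]) <= search
        for i in range(n)
    )


state = [(0, 0), (100, 0), (100, 100), (0, 100)]
quakes = {"inside": (50, 50), "near": (100.5, 40), "far": (103, 40)}
selected = [q for q, (x, y) in quakes.items()
            if intersects_with_search(x, y, state, 1.0)]
print(selected)  # ['inside', 'near']
```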

If you ever run the tool and notice that the intended selection happened, but so did a whole bunch of features surrounding it, you most likely checked this box. Whenever a tool's outcome is unexpected, examine all of the tool inputs, as ArcMap will always do exactly what you tell it to do. If the outcome was not what you anticipated, you most likely told it to do something you didn't want it to do, and you need to find that mistake in the tool inputs. This is true for all things ArcMap, as you will discover over and over in the course of lab.
