Section Five: Using Selected Data

In the last section, we looked at the basic idea of selecting data - we use one of three selection methods, Select by Attribute, Select by Location, or Interactive Selection, to highlight features in a map and rows in a table in order to isolate those features from all of the others. That's all well and good, but what can we do with these selected features? Before we explore the three selection methods, let's look at some of the more common uses of selected data.

Figure 5.8: Selected Records
number_selected-display
After making a selection, the attribute table notes the number of selected records out of the total number of records. In this example, we see that 20 records out of the 47,610 total records are selected. This means our vector shapefile or feature class (the attribute tables of both look the same) has 47,610 rows/features, and some selection method resulted in 20 of them being highlighted.

Looking for Answers in the Attribute Table

One of the simplest things you can do with selected data is answer a quick question about the data. A selection visually separates values by highlighting the rows that answer a question (a query) about the attributes in the attribute table, or a question about how two layers interact with each other spatially, so we can read the answer straight off the selection. For example, you might have a rivers polygon layer where one of the fields contains the class, and you are interested in how many of those rivers are Class 1. Using an attribute table-based query (the topic of the next section), you can highlight just those rows and obtain a count by looking at the number of selected records out of the total number of records. Finding a count of records is just one example of quick question-and-answer interactions with the data. Other examples include examining other fields once a selection is made to find out more, seeing where selected features lie on the map, and exploring the distribution of features across some area. It's important to note, however, that these quick question-and-answer interactions do not really quantify the data, but show you an answer about the data.
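To make the "count of selected records" idea concrete, here is a minimal sketch in plain Python. It is not ArcGIS code - it only models an attribute table as a list of rows, and the river names, the river_class field, and the select_by_attribute function are all made up for illustration.

```python
# Hypothetical attribute table: each dict is one row (one river feature).
rivers = [
    {"name": "River A", "river_class": 1},
    {"name": "River B", "river_class": 2},
    {"name": "River C", "river_class": 1},
    {"name": "River D", "river_class": 3},
    {"name": "River E", "river_class": 1},
]


def select_by_attribute(table, field, value):
    """Return the indices of the rows whose field equals value (the selection)."""
    return [i for i, row in enumerate(table) if row[field] == value]


# "How many rivers are Class 1?" is answered by the size of the selection.
selected = select_by_attribute(rivers, "river_class", 1)
summary = f"{len(selected)} out of {len(rivers)} selected"
print(summary)  # 3 out of 5 selected
```

The selection itself is just a list of row positions. The table is untouched, which is why the same selection can then be used to look at other fields or to see where those features fall on the map.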

Arithmetic and Statistics

In Section Two of this chapter, the introduction to the attribute table's menus and buttons, we noted that within the field header menu, there was an option to perform arithmetic and statistics on any number field utilizing the Statistics and Summarize tools. These tools can work on either all the values in one field or just a select few. Once you have some features selected based on either a table query or a relationship query, the Statistics tool will note that selection and provide the data for just the selected features. Neat.

Figure 5.9: The Statistics Tool: Selected and All Features
Statistics_of_State_Pop_all_featres
When no selection is made in an attribute table, the Statistics tool will refer to all the features in the entire field. In this example, the US_States layer has 51 features, and the Statistics box is showing the population total with a count of 51.
Statistics_of_State_Pop_selected_featres
When a selection is made, like we did with the US_States layer where seven features are selected, the Statistics tool will compute the values for just the selected features, as indicated by the count of 7.
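The "all features vs. selected features" behavior can be sketched in a few lines of plain Python. This is only a model of the idea, not the Statistics tool itself: the field_statistics function, the state names, and the population values are hypothetical and chosen so the arithmetic is easy to check by hand.

```python
from statistics import mean

# Hypothetical table; the population values are made up for illustration.
states = [
    {"name": "State A", "population": 100},
    {"name": "State B", "population": 250},
    {"name": "State C", "population": 400},
    {"name": "State D", "population": 50},
]


def field_statistics(table, field, selection=None):
    """Summarize a numeric field, using only the selected rows if any are selected."""
    # No selection (None or empty) means "use every row", as described above.
    rows = table if not selection else [table[i] for i in selection]
    values = [row[field] for row in rows]
    return {
        "count": len(values),
        "sum": sum(values),
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
    }


all_stats = field_statistics(states, "population")
sel_stats = field_statistics(states, "population", selection=[1, 2])
print(all_stats["count"], all_stats["sum"])  # 4 800
print(sel_stats["count"], sel_stats["sum"])  # 2 650
```

As in the figure, the count is the quickest way to confirm which set of rows the numbers describe.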

Creating Data Subsets

One of the most common tasks after selecting data is to create a data subset, that is, to export just the selected features to a new feature class or shapefile. The export process will create a new layer containing only the selected features (of the vector data type, either feature class or shapefile, depending on where the new layer is going to be stored - geodatabase or folder, respectively). For example, if you need a layer that contains only the State of Colorado, there is no need to spend hours searching for a Colorado.shp shapefile when layers of the United States as a whole are easy to find. By adding the United States layer to an MXD, selecting the State of Colorado, and exporting a new layer - Colorado.shp - you've got it made in the shade.

The purpose of subsetting data (creating a data subset) is to create a layer with a smaller amount of data which serves our purpose in a more focused way.  Often, we download data which has thousands of features, and we are just not interested in all of those features.  Some may fall outside an area of interest or study site, some may be outside of a time frame which fits the scope of our project, or some might not meet the minimum size requirement.  These are just a few examples of why we might create a data subset - there are theoretically infinite reasons why some features may not make the cut of a specific project - but in all cases, the data set one started with was much too large and needed to be pared down into a subset of the original. 

Figure 5.10: Data Subsets
Data Subset
The US_States layer was subset by region. The output layer has only eight features (vs the 51 of the input layer), all of which fall into the Mountain subregion.
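Subsetting boils down to "select, then copy the selected rows into a new layer." The sketch below models that in plain Python and is not ArcGIS code. It uses a small sample of five states rather than the full 51-feature layer, and the sub_region field name and the export_selected function are assumptions made for the example.

```python
# A small hypothetical sample of a states layer (not all 51 features).
us_states = [
    {"name": "Colorado", "sub_region": "Mountain"},
    {"name": "Utah", "sub_region": "Mountain"},
    {"name": "Oregon", "sub_region": "Pacific"},
    {"name": "Wyoming", "sub_region": "Mountain"},
    {"name": "Texas", "sub_region": "West South Central"},
]


def export_selected(layer, selection):
    """Copy the selected rows into a new layer that is independent of the input."""
    return [dict(layer[i]) for i in selection]


# Select by attribute, then export the selection as its own layer.
selection = [i for i, row in enumerate(us_states) if row["sub_region"] == "Mountain"]
mountain_states = export_selected(us_states, selection)

print(len(us_states), len(mountain_states))  # 5 3
print([row["name"] for row in mountain_states])  # ['Colorado', 'Utah', 'Wyoming']
```

The input layer still has all of its rows afterward. The subset is a separate copy, so editing one does not change the other.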

Export Data as Table

While ArcGIS is really good at spatial relationships and analysis, its table-based analysis is rather limited. We know we can look at basic statistics, such as sum, mean, and median, and there is even a tool (Frequency) to count the occurrences of each unique value in a field, but beyond that, there is not much more it can do.

Not to worry, though. Companies like Microsoft have spent years building software to solve complex problems with numbers, and ArcGIS allows you to export an attribute table as a dBASE database file (.dbf), which Excel can read but cannot write.

Some Notes About Exporting Tables

    • With the possible exception of the final project, we will never need to export tables in this class
    • When exporting, note the file extension - .dbf = data table, .shp = shapefile/vector file
    • Excel can read .dbf, but will not write to one. Open Office (openoffice.org) will, however.
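For a feel of what "exporting a table" means, here is a minimal plain-Python sketch. Two loud assumptions: it writes CSV text rather than a .dbf, because Python's standard library has no .dbf writer, and it assumes that, like the other uses in this section, only the selected rows are exported when a selection exists. The parcels table and the export_table function are hypothetical.

```python
import csv
import io

# Hypothetical attribute table; the values are made up for illustration.
parcels = [
    {"parcel_id": 1, "acres": 2.5},
    {"parcel_id": 2, "acres": 10.0},
    {"parcel_id": 3, "acres": 0.75},
]


def export_table(table, out_file, selection=None):
    """Write the table (or only the selected rows) as CSV text to out_file."""
    rows = table if not selection else [table[i] for i in selection]
    writer = csv.DictWriter(
        out_file, fieldnames=list(table[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)


# io.StringIO stands in for a real file opened with open(path, "w", newline="").
out = io.StringIO()
written = export_table(parcels, out, selection=[0, 2])
print(written)  # 2
print(out.getvalue())
```

Only the attributes are written. No geometry comes along, which is the same distinction the notes above draw between a data table and a shapefile.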

Limit the Input Features for a Geoprocessing Tool

Almost all geoprocessing tools honor a selection, when one is made, within the input layer. That means, if you want to create a buffer of only a subset of a layer, such as all the Class 1 rivers (and exclude the Class 2 and 3 rivers), you can go about this in two ways: 1. make a selection (via Select by Attribute/SQL) and export the subset as its own layer. From there, you can run any geoprocessing tool without any sort of additional inputs; or 2. make the selection and run a geoprocessing tool on the input layer while the selection is made. Either method means the tool will run on a subset of data. The difference is that method 1 leads to an additional layer being created in the process, while method 2 preserves the count of layers. Neither is the "correct" way, they are just different.

A note: Some tools have a tick box stating "Use Selected Features", some do not, and some will not honor a selection. It is pointless to list them as we are not memorizing tool processes, so just know, if you expected the selection to be honored, and it was not, reopen the tool and look for the tick box, or export the selection first, then run the tool on the exported layer.
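The two methods described in this part can be compared side by side in a plain-Python sketch. Nothing here is a real ArcGIS call: buffer_tool is a toy stand-in for a geoprocessing tool that honors a selection, it only records which features it processed rather than building any geometry, and the rivers table and the 100-meter distance are made up for illustration.

```python
# Hypothetical rivers layer.
rivers = [
    {"name": "River A", "river_class": 1},
    {"name": "River B", "river_class": 2},
    {"name": "River C", "river_class": 1},
    {"name": "River D", "river_class": 3},
]


def buffer_tool(layer, distance_m, selection=None):
    """Toy stand-in for a geoprocessing tool that honors a selection.

    Returns one (name, distance) record per feature that was processed.
    """
    # No selection (None or empty) means the tool runs on every feature.
    rows = layer if not selection else [layer[i] for i in selection]
    return [(row["name"], distance_m) for row in rows]


class_1 = [i for i, row in enumerate(rivers) if row["river_class"] == 1]

# Method 1: export the selection to its own layer, then run the tool on it.
class_1_layer = [dict(rivers[i]) for i in class_1]
result_1 = buffer_tool(class_1_layer, 100)

# Method 2: run the tool on the original layer while the selection is active.
result_2 = buffer_tool(rivers, 100, selection=class_1)

print(result_1 == result_2)  # True
print(result_2)  # [('River A', 100), ('River C', 100)]
```

Both routes process the same two features. The only thing that differs is that method 1 leaves an extra layer (class_1_layer) behind.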