Section Two - Vector Data

The first data type we are going to look at, and the type you will use for the majority of your GIS career (and in Introduction to GIS), is vector data.  The term "vector" is not specific to GIS, and if you have any experience with CAD or digital drawing, you know that it applies to those disciplines, too.  In any sort of digital science or art, a vector file simply denotes a type of graphical representation that uses straight lines to construct the outlines of objects.  Make the straight lines short enough and cluster them closely together, and you can draw curves.

In fact, you have had many experiences with vector drawings in the past: dot-to-dots.  These (amazing) grade school activities were mostly used as a learning tool for counting or the alphabet, and consisted of a series of dots on a page which, when connected by straight lines in a specific order, created an image.  Vector data in GIS is really no different. We establish each “dot” (properly called a “vertex”), and create the useful data by connecting those dots together. A vector file which creates a street in the GIS is made up of a series of vertices (the plural of "vertex") connected together. We know that data in the GIS represents reality, so the vertices of that particular street most likely sit at intersections or along curves of that road (since we need to trace out the road in our grown-up dot-to-dot, we need those dots to lie along each curve to help draw it out). Because those intersections and curves exist on the Earth’s surface, those vertices must have XY geographic coordinates - the very thing we spent so much time learning about last chapter.

The job of the vertex is simply to mark a location on the Earth’s surface and really nothing else. Vertices are the building blocks of vector data, but in the GIS when we refer to vector data, we are really referring to the product of our dot-to-dot, and those products come in three varieties, or geometry types. Vector data, like all GIS data, is a representation of real world objects, and we can represent all of those objects using a combination of three geometry types: points, polylines, and polygons.

Looking back on our dot-to-dot, sometimes after it was completed, we saw a shoe or a unicorn or a cat. Vector data is the same, creating polygons - a geometry type which connects three or more vertices in a closed shape - which represent not shoes, unicorns, or cats, but buildings, lakes, parking lots, and forested areas. Rarely, our dot-to-dots didn’t close to create a recognizable shape, but in the GIS we have a second geometry type which doesn’t close - the polyline. Polylines are made up of two or more vertices connected together in a line to represent streets, rivers, and trails. The third geometry type is the point, which is made of one vertex each, representing crime report locations, fire hydrants, and individual trees.

  • It should be noted that while the geometry type “point” is made up of a single vertex - one for each point - the words are not interchangeable. “Vertex” is the building block, and “point” is the resulting geometry type. Remember that all vector files are made up of not only the shape which you see, but the associated attributes as well. Since our goal with GIS data is to combine the spatial and the non-spatial, that is exactly what we must do. Points are a geometry type - a file with associated non-spatial data in the form of attribute tables (which we will cover in lots of detail really soon).
Figure 3.1: Vector Data
In this image, we see three vertices (two in green and one in red) and the polyline that is created by connecting each vertex to the next.  The red vertex is special in that it is the last vertex in the line (called a "node").  In polygon features, the node is both the start and the finish vertex, since polygons are defined as a closed shape consisting of at least three vertices.
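
To make the vertex-versus-geometry distinction concrete, here is a minimal Python sketch. It is only an illustration, not how any GIS stores its data: the names `Vertex`, `Feature`, and `classify` are made up for this example, and it assumes a polygon ring is written with its first vertex repeated at the end to show that the shape closes.

```python
from dataclasses import dataclass, field

# A vertex is nothing more than one (x, y) coordinate pair.
Vertex = tuple[float, float]


@dataclass
class Feature:
    """One vector feature: a geometry built from vertices, plus its attributes."""
    geometry_type: str                 # "point", "polyline", or "polygon"
    vertices: list[Vertex]
    attributes: dict = field(default_factory=dict)


def classify(vertices: list[Vertex]) -> str:
    """Name the geometry type that a list of vertices would form."""
    if len(vertices) == 1:
        return "point"
    # Assumption: a closed ring repeats its first vertex as its last one,
    # so three distinct vertices are stored as four coordinate pairs.
    if len(vertices) >= 4 and vertices[0] == vertices[-1]:
        return "polygon"
    if len(vertices) >= 2:
        return "polyline"
    raise ValueError("a geometry needs at least one vertex")


# A hypothetical fire hydrant: one vertex, plus non-spatial attributes.
hydrant_vertices = [(3.0, 4.0)]
hydrant = Feature(classify(hydrant_vertices), hydrant_vertices, {"kind": "fire hydrant"})
```

Notice that the hydrant is a point feature, while the single coordinate pair inside it is a vertex: the same distinction made in the note above.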

 Vector data is advantageous in the GIS for many reasons.

  • The file itself is small, usually 50 KB to 200 MB, which is often small enough to be emailed as a ZIP file.
  • When using vector features in a map, they create smooth, easily defined lines which can be scaled to any size without being skewed or becoming pixelated.
  • In GIS, all vector files are paired with a data table, or an attribute table, which can hold a large amount of information, or attributes, about each independent geospatial object, or feature (one road, one tree, one building), from the feature’s name (e.g., Colorado) to the population in 2015 (5.4 million), to the racial breakdown of residents by county.  Since the beauty of the GIS is combining the spatial with the non-spatial, having vector files with any available information about an area can lead to problem solving you didn’t even know you could do.

3.2.2: Measuring Distances and Areas with Vectors

Since GIS vector files are representative of real-world objects (representation in the GIS model), they represent all factors of the real world - size, shape, length, area, and location. When a polygon vector file contains representations of a series of lakes, each one was carefully drawn in the GIS with the mouse on top of an image, or digitized (Chapter Six), which means each one has a calculated area. And just like measuring the surface area of a lake (or any irregular object), there is a fair amount of calculus involved. Don’t worry, you do not have to solve for it; the GIS does it for us in a fraction of a second. You should know, however, that solving for the area of a curved shape uses integrals, a method of finding the area under a curve by adding up a large number of rectangles which fit under that curve.  In short, finding the area under a curve is a very challenging thing to do, and finding the area of a rectangle is pretty easy - it's the length multiplied by the width.  If we place a whole bunch of very skinny rectangles under a curve, the rectangles can mimic the curve, and we can solve for the area of each one and add them all together.  When we look at rasters in the next section, we will see that finding the area of objects is as easy as adding a bunch of squares together.  While we haven't gotten to that yet, keep in mind how the area of a vector polygon is solved vs how the area of pixels within a raster is solved.

Figure 3.2: Measuring the Area of Polygons using Integrals
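
If you are curious what the area calculation can look like in practice, here is a small Python sketch. Because a digitized polygon is built from straight edges between vertices, the "area under the curve" can be found exactly by adding up a signed slice of area for each edge, a method commonly called the shoelace formula. This is one well-known technique, not necessarily the one your GIS software uses internally, and the function name `polygon_area` and the sample `lake` are made up for this example. It assumes the vertices sit on a flat, projected coordinate system and that the polygon does not cross itself.

```python
def polygon_area(vertices):
    """Area of a simple polygon from its (x, y) vertices, in squared map units."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]   # wrap around so the last vertex joins the first
        total += x1 * y2 - x2 * y1       # signed slice of area contributed by this edge
    return abs(total) / 2.0              # sign depends on drawing direction, so take abs


# A hypothetical rectangular "lake", 4 units wide and 3 units tall.
lake = [(0, 0), (4, 0), (4, 3), (0, 3)]
print(polygon_area(lake))  # 12.0
```

The result comes out in the square of whatever linear unit the coordinate system uses, so vertices stored in meters give an area in square meters.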

Measuring the lengths of polylines is much easier and more straightforward than measuring polygon areas. When a polyline is digitized, say for a series of roads, the GIS measures the distance between each pair of neighboring vertices and adds the distances up (per the definition of a vector feature). As we learned in Chapter Two, projected coordinate systems contain linear units of measure such as feet or meters, and the GIS calculates the distance accordingly.

Figure 3.3: Measuring the Length of Polylines by Adding the Lengths of the Individual Segments
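
The segment-by-segment idea is short enough to sketch in a few lines of Python. This is an illustration under simple assumptions: the vertices are on a projected (flat) coordinate system, and the function name `polyline_length` and the sample `road` are made up for this example.

```python
import math


def polyline_length(vertices):
    """Total length of a polyline: the sum of its straight segment lengths."""
    # Pair each vertex with the one that follows it, then measure each segment.
    return sum(math.dist(a, b) for a, b in zip(vertices, vertices[1:]))


# A hypothetical road with three vertices, so two segments (lengths 5 and 6).
road = [(0, 0), (3, 4), (3, 10)]
print(polyline_length(road))  # 11.0
```

The answer is in the linear unit of the coordinate system, so vertices stored in feet give a length in feet.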

Point vector data has no length or area, since each point is a coordinate pair found on a specific geographic grid. The measured values of a point are its X and Y coordinates, expressed in degrees if the point is plotted on a geographic coordinate system, or in meters, feet, or international feet if it is plotted on a projected coordinate system.  The "address" of the point is a measurement from the origin of the system, with the X coordinate measuring the east-west distance (longitude, in a geographic system) and the Y coordinate measuring the north-south distance (latitude).  These are indeed measurements, since each one is the distance from 0,0 to the point's location.  While the point itself has no length or area, there is still a measurement involved.

Which takes us back to the idea of creating and measuring polygons and polylines: vertices are the building blocks of vector data.  If you place two or more vertices on a coordinate system and connect them with a line, you get a polyline; place three or more and make the start and end vertex the same one, and you can enclose the shape and find an area; place just one, and you get a point.  These three geometry types are capable of representing all the features on the Earth's surface, and each one, combined with non-spatial data, makes up the vector data type of spatial data.

The unique properties of vector data - its ability to create fluid lines, no loss of quality when scaled, and small file size - make it an excellent medium for GIS work. We can cleanly represent data such as winding roads and rivers, peculiarly shaped buildings, and objects which can be represented simply as points, such as fire hydrants - which is why 90% of the data we work with in this class, and with GIS as a whole, is vector data.

Concept Quiz

3.2.3: Recognizing Vector Data

In the last section, we looked at the idea that vector data comes in three geometry types, is built on a series of vertices placed at XY coordinates, and has associated non-spatial data. In GIS, "vector data" is a category of data, one of the three main data types: vector, raster, and data tables.  Within that 'vector' category, we find two categories: shapefiles and feature classes.  Even though both types are called vector files for ease, it's important to know (and recognize) the difference between them.  In this section, we will start to explore these two kinds of GIS vector files, and when each one is advantageous.  ArcGIS has a few ways to recognize both shapefiles and feature classes within ArcMap and ArcCatalog (two of the programs found in the ArcGIS suite of GIS software), as well as some clues to the vector category in Windows (File) Explorer.

Shapefiles vs Feature Classes

As we've stated, vector data can be broken into two main categories: shapefiles and feature classes.  They are similar in the fact that they both display vector-based data in the GIS, which is characterized by smooth lines and easy flowing design.  Both are made up of a series of vertices, and both have non-spatial data associated with them via attribute tables.  Both can be used to analyze spatial patterns and measure areas and distances.  Both are small in size and do not distort when they are resized in the software. Both can be used as the inputs to geoprocessing tools (Chapter Seven). For the most part, they are almost exactly the same.  So what is the difference between them?

Feature classes are a vector data type that was designed and is "owned" by ESRI, the makers of ArcGIS.  For a very long time, feature classes could only be opened inside of ArcGIS, but that has changed with the explosion of the GIS field.  People were upset that they had to use ESRI software to convert feature classes to shapefiles for use in non-ESRI software, which posed a bit of a conundrum - you had to pay for one software to use its data in another.  In response, ESRI changed that, allowing non-ESRI software to open geodatabases and feature classes.

The power of feature classes comes not from their proprietary nature, but from the fact that they must reside inside a geodatabase. Geodatabases, as we will see in a future chapter, are a special container made to hold certain kinds of spatial data, similar to a folder on your computer's desktop.  Much like you save all of the papers you wrote in freshman English in a folder named something clever like "English 101", geodatabases are a way to store all of the spatial data for a particular GIS project.  However, unlike the folders that were used to organize your classwork, geodatabases are a very powerful container.  Not only are the contents related because they are all used in one project, there are also ways to inter-connect those files so they interact with each other.  These connections are very powerful in GIS, a science made up of a series of data types and non-spatial data, which we use all together to solve spatial problems.  We will take a deeper look at geodatabases later, so for now, just understand their basic concept - a special geo-container that allows for data to be stored in an interconnected way, and from those connections comes a deeper understanding of all the data types as a whole, not just individual files.

Shapefiles are a universal form of GIS vector data.  No particular software company "owns" shapefiles.  They can be seamlessly opened and edited in any GIS software, some CAD software, and even some graphic design software.  However, their universal nature limits their power within any single GIS software.  As we just saw, feature classes are a kind of vector file which resides inside a geodatabase, which gives the feature classes the power of relating to and working with other files inside that geodatabase.  Shapefiles, which can only live inside folders, are each an independent object, even if many shapefiles live inside one folder.  The geodatabase is the container with power, not the folder.  Shapefiles, however, are easy to use, do not require any other spatial data containers to exist, do not need any other datasets to operate successfully, and can be opened with a wide variety of software without the need to convert or import the data in any way.

As time goes on, both in GIS 101 and in your GIS career, the ability to recognize the difference between shapefiles and feature classes will become second nature.  You will even be able to select which one you'd like to use for a particular project.  For now, just understand the basic differences between the files: shapefiles are a universal vector file type which lives inside folders, and with that, they do not require any sort of conversion or importing to fully use them; feature classes are a proprietary ESRI vector file type that lives only inside geodatabases, but with that home comes the power of relationship.

Windows (File) Explorer ... What’s that?

Throughout this text and this class, we attempt to be clear about software and concepts. With that being said, we will take a minute to define what we mean when we refer to “Windows (File) Explorer”.

Windows (File) Explorer is a file manager which displays the hierarchical structure of files, folders, and drives on your computer. It also shows any network drives that have been mapped to drive letters on your computer. Using Windows (File) Explorer, you can copy, move, rename, and search for (non-spatial) files and folders. For example, you can open a folder that contains a Windows-based file, such as a Word document, and then drag the file to another folder or drive, successfully copying it.

  • You might also know Windows (File) Explorer as “My Computer” (left over from the Windows 98 and XP days). If you're a Macintosh convert, Windows (File) Explorer is equivalent to “Finder”, and if you're a Linux/Ubuntu user, Windows (File) Explorer is equivalent to (the default) Nautilus.

Also, Windows (File) Explorer is not to be confused with Internet Explorer (both made by Microsoft, thus named similarly for continuity's sake). Internet Explorer is a program used to browse the internet and visit your favorite websites, most likely LearnGIS.org, while Windows (File) Explorer is used to look at the files and folders you have stored locally on the machine’s internal hard drive. (Since the operating system is called “Windows”, the file management system is simply called “Windows (File) Explorer”.)

GIS data differs from other kinds of data you have used before in that it is often not one file, but a package of files. Vector data is not one file, but actually a package of three to eight 'mini-files'. Due to this “packaging”, we do not, in general, use Windows (File) Explorer (aka: “My Computer”/C Drive/File Explorer/that little file folder button in your task bar/Finder to you Mac folks) to open, move, rename, or edit spatial data. This is because it is so very easy to destroy GIS data by using Windows (File) Explorer. If you were to copy a GIS file from one place to another and miss one of the three to eight files that actually make up your single “roads” file, the GIS data may no longer open in ArcGIS and you can pretty much call your data corrupt.  In GIS, regardless of the suite you are working with, there will be a program or method for copying, deleting, renaming, or moving spatial data, and in the ArcGIS suite of software, we use ArcCatalog.

  • Yet, whenever someone says “in general”, that usually means there is an exception to the rule, and GIS data does indeed have one. The exception is when you need to deliver or store GIS data and wish to do so in ZIP format. Since GIS data is a package of files, when delivering or storing it, it really is best to do so in ZIP or TAR format (depending on whether it is vector (ZIP) or raster (TAR)). This task can be accomplished in Windows (File) Explorer, since ArcGIS does not have a “compress data for delivery” feature.
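
If you would rather not pick out each mini-file by hand when zipping, the same job can be sketched in a few lines of Python using only the standard library. This is a hedged illustration, not an ArcGIS tool: the function name `zip_shapefile` is made up, and it assumes that every piece of the shapefile sits in the same folder and shares the same base name (for example `roads.shp`, `roads.shx`, `roads.dbf`).

```python
import zipfile
from pathlib import Path


def zip_shapefile(shp_path, zip_path):
    """Bundle every file that shares the shapefile's base name into one ZIP."""
    shp = Path(shp_path)
    # Collect roads.shp, roads.shx, roads.dbf, ... but skip any old roads.zip.
    parts = sorted(
        p for p in shp.parent.glob(shp.stem + ".*")
        if p.is_file() and p.suffix.lower() != ".zip"
    )
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for part in parts:
            # arcname keeps only the file name, not the full folder path.
            archive.write(part, arcname=part.name)
    return [p.name for p in parts]
```

Calling `zip_shapefile("data/roads.shp", "roads.zip")` (hypothetical paths) would return the list of file names it packed, which makes it easy to confirm that nothing was left behind.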

What Are Each of the Three to Eight Files?

Extension        Required?   Description
.shp             Yes         The main file that stores the feature geometry. No attributes are stored in this file, only geometry.
.shx             Yes         A companion file to the .shp that stores the position of individual feature IDs in the .shp file.
.dbf             Yes         The dBase table that stores the attribute information of features.
.sbn and .sbx    No          Files that store the spatial index of the features.
.atx             No          Created for each dBase attribute index created in ArcCatalog.
.ixs and .mxs    No          Geocoding index for read-write shapefiles.
.prj             No          The file that stores the coordinate system information.
.xml             No          Metadata for ArcGIS - stores information about the shapefile.
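
Since three of these files are required, a quick check for missing pieces can save you from a "corrupt" shapefile after a copy gone wrong. Here is a minimal Python sketch of that check. The function name `missing_parts` is made up for this example, and it assumes the extensions are lowercase and that all parts live beside the .shp file.

```python
from pathlib import Path

# The three extensions marked "Yes" in the table above.
REQUIRED = (".shp", ".shx", ".dbf")


def missing_parts(shp_path):
    """Return the required extensions that are absent for this shapefile."""
    shp = Path(shp_path)
    # with_suffix swaps the extension: roads.shp -> roads.shx, roads.dbf, ...
    return [ext for ext in REQUIRED if not shp.with_suffix(ext).exists()]
```

An empty list means all three required files are present; a result such as `[".dbf"]` tells you the attribute table was left behind.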

3.2.4: ArcCatalog

As we’ve determined, since spatial data can be inadvertently destroyed in Windows (File) Explorer, each GIS software suite has a solution to the problem. In the case of ArcGIS, we use a program called ArcCatalog, which might be described as the “Windows (File) Explorer for Spatial Data”.  ArcCatalog’s only job is to create and organize spatial data files, including moving, copying, renaming, and deleting spatial files.

Since ArcCatalog was designed to work with spatial data, it does not see shapefiles and feature classes as their three to eight individual files, but instead as a single file. This one feature prevents data from being destroyed because one of the individual files was left behind in a move. As with other software, each file type has an associated file icon, so you can quickly recognize what kind of file it is. As we see in Figure 3.4, Windows (File) Explorer attempts to guess what program will open each of the independent files, but with the exception of the attribute table’s .dbf extension (database file), which is recognized by Microsoft Excel[1], it doesn’t do a very good job.

Figure 3.4: Spatial Data as Seen in Windows (File) Explorer vs Spatial Data as Seen in ArcCatalog
Panels, left to right: Shapefiles as Shown in Windows (File) Explorer; Feature Classes Within a Geodatabase as Shown in Windows (File) Explorer; Shapefiles as Shown in ArcCatalog

ArcCatalog does a darn good job, however. It even uses differently colored file icons, with green or blue icon backgrounds, to mark whether vector files are shapefiles or feature classes, and a symbol on each icon to show which geometry type they are: polygons, polylines, or points.

Figure 3.5: Shapefile and Feature Class File Icons
Icons shown, left to right: Feature Class: Polyline; Feature Class: Point; Feature Class: Polygon; Shapefile: Polyline; Shapefile: Point; Shapefile: Polygon
All of the file icons can be looked up and reviewed in Appendix A of this book.  You can also find them on a page called "File Icons" in the wiki.  You can find the link button in the top menu.