Geographic Information System
A geographic information system (GIS) is a system for creating and managing spatial data and associated attributes. In the strictest sense, it is a computer system capable of integrating, storing, editing, analyzing, and displaying geographically referenced information. In a more generic sense, GIS is a smart map tool that allows users to create interactive queries (user-created searches), analyze the spatial information, and edit data.
Geographic information systems technology can be used for scientific investigations, resource management, asset management, development planning, cartography and route planning. For example, a GIS might allow emergency planners to easily calculate emergency response times in the event of a natural disaster, or a GIS might be used to find wetlands that need protection from pollution.
History of development
35,000 years ago, on the walls of caves near Lascaux, France, Cro-Magnon hunters drew pictures of the animals they hunted. Associated with the animal drawings are track lines and tallies thought to depict migration routes. These early records followed the two-element structure of modern geographic information systems: a graphic file linked to an attribute database.
In the 18th century, modern surveying techniques for topographic mapping were implemented, along with early versions of thematic mapping, e.g. for scientific or census data.
A notable example of this is John Snow's 1854 map depicting a cholera outbreak in London, which provided analysis to narrow the source of the cholera to a contaminated pump, stemming the outbreak.
The early 20th century saw the development of photolithography, in which maps were separated into layers. Computer hardware development spurred by nuclear weapon research led to general purpose computer mapping applications by the early 1960s.
The year 1967 saw the development of the world's first true operational GIS in Ottawa, Ontario by the federal Department of Energy, Mines and Resources. Developed by Roger Tomlinson, it was called Canadian GIS (CGIS) and was used to store, analyse and manipulate data collected for the Canada Land Inventory (CLI), an initiative to determine the land capability for rural Canada by mapping information about soils, agriculture, recreation, wildlife, waterfowl, forestry, and land use at a scale of 1:250,000. A rating classification factor was also added to permit analysis.
CGIS was the world's first such system and improved on earlier mapping applications: it provided capabilities for overlay, measurement, and digitizing/scanning; supported a national coordinate system that spanned the continent; coded lines as arcs having a true embedded topology; and stored the attribute and locational information in separate files. Its developer, geographer Roger Tomlinson, has become known as the father of GIS.
CGIS lasted into the 1990s and built the largest digital land resource database in Canada. It was developed as a mainframe-based system in support of federal and provincial resource planning and management. Its strength was continent-wide analysis of complex data sets. The CGIS was never available in a commercial form. Its initial development and success stimulated various commercial mapping applications sold by vendors such as Intergraph. The development of microcomputer hardware spurred vendors such as ESRI, MapInfo and CARIS to successfully incorporate many of the CGIS features, combining the first-generation approach to separation of spatial and attribute information with a second-generation approach to organizing attribute data into database structures. Industry growth in the 1980s and 1990s was spurred on by the growing use of GIS on Unix workstations and the personal computer. By the end of the 20th century, the rapid growth in various systems had been consolidated and standardized on relatively few platforms, and users were beginning to explore the concept of viewing GIS data over the Internet, requiring data format and transfer standards.
Techniques used in GIS
Relating information from different sources
If you could relate information about the rainfall of your state to aerial photographs of your county, you might be able to tell which wetlands dry up at certain times of the year. A GIS, which can use information from many different sources in many different forms, can help with such analyses. The primary requirement for the source data consists of knowing the locations for the variables. Location may be annotated by x,y, and z coordinates of longitude, latitude, and elevation, or by other geocode systems like ZIP Codes or by highway mile markers. Any variable that can be located spatially can be fed into a GIS. Several computer databases that can be directly entered into a GIS are being produced by government agencies and non-government organizations. Different kinds of data in map form can be entered into a GIS.
A GIS can also convert existing digital information, which may not yet be in map form, into forms it can recognize and use. For example, digital satellite images generated through remote sensing can be analyzed to produce a map-like layer of digital information about vegetative covers. Another fairly developed resource for naming GIS objects is the Getty Thesaurus of Geographic Names (GTGN), which is a structured vocabulary containing around 1,000,000 names and other information about places.
Likewise, census or hydrologic tabular data can be converted to map-like form, serving as layers of thematic information in a GIS.
GIS data represents real-world objects (roads, land use, elevation) with digital data. Real-world objects can be divided into two abstractions: discrete objects (a house) and continuous fields (rainfall amount or elevation). There are two broad methods used to store data in a GIS for both abstractions: raster and vector.
The raster data type consists of rows and columns of cells, with a single value stored in each cell. Most often, raster data are images (raster images), but besides just color, the value recorded for each cell may be a discrete value, such as land use, a continuous value, such as rainfall, or a null value if no data is available. While a raster cell stores a single value, it can be extended by using raster bands to represent RGB (red, green, blue) colors, colormaps (a mapping between a thematic code and RGB value), or an extended attribute table with one row for each unique cell value. The resolution of the raster dataset is its cell width in ground units. For example, in a LIDAR raster image, each cell might be a pixel that represents an area of 3 meters by 3 meters. Usually cells represent square areas of the ground, but other shapes can also be used.
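As a minimal sketch of the raster model described above, the following Python snippet stores a small land-use grid as rows and columns, uses None as the null value, and derives the ground area per class from the cell width. The grid values, the class codes, and the function name area_by_class are illustrative assumptions, not part of any particular GIS.

```python
from collections import Counter

# Hypothetical 3 x 4 land-use raster; None marks cells with no data.
# Class codes are illustrative: 1 = forest, 2 = water, 3 = urban.
CELL_SIZE_M = 3.0  # cell width in ground units (meters)
land_use = [
    [1, 1, 2, None],
    [1, 3, 2, 2],
    [3, 3, None, 2],
]


def area_by_class(raster, cell_size):
    """Return the ground area (square meters) covered by each cell value."""
    counts = Counter(v for row in raster for v in row if v is not None)
    cell_area = cell_size * cell_size
    return {value: n * cell_area for value, n in counts.items()}


areas = area_by_class(land_use, CELL_SIZE_M)
print(areas)
```

Because every cell covers the same ground area, summarizing a raster reduces to counting cells, which is one reason raster operations are simple to implement.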
The vector data type uses geometries such as points, lines (series of point coordinates), or polygons, also called areas (shapes bounded by lines), to represent objects. Examples include property boundaries for a housing subdivision represented as polygons and well locations represented as points. Vector features can be made to respect spatial integrity through the application of topology rules such as 'polygons must not overlap'. Vector data can also be used to represent continuously varying phenomena. Contour lines and triangulated irregular networks (TIN) are used to represent elevation or other continuously changing values. TINs record values at point locations, which are connected by lines to form an irregular mesh of triangles. The faces of the triangles represent the terrain surface.
There are advantages and disadvantages to using a raster or vector data model to represent reality. Raster datasets record a value for all points in the area covered which may require more storage space than representing data in a vector format that can store data only where needed. Raster data also allows easy implementation of overlay operations, which are more difficult with vector data. Vector data can be displayed as vector graphics used on traditional maps, whereas raster data will appear as an image that may have a blocky appearance for object boundaries.
Additional non-spatial data can also be stored besides the spatial data represented by the coordinates of a vector geometry or the position of a raster cell. In vector data, the additional data are attributes of the object. For example, a forest inventory polygon may also have an identifier value and information about tree species. In raster data the cell value can store attribute information, but it can also be used as an identifier that can relate to records in another table.
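The link between geometry and attributes can be sketched as follows. This is an illustrative structure only: the class name PolygonFeature, the attribute names, and the sample values are assumptions. The polygon area uses the shoelace formula on planar coordinates, and the second part shows raster cell values acting as identifiers into a separate table.

```python
from dataclasses import dataclass, field


@dataclass
class PolygonFeature:
    """A vector feature: a ring of (x, y) vertices plus non-spatial attributes."""
    ring: list
    attributes: dict = field(default_factory=dict)

    def area(self):
        """Planar area from the shoelace formula (vertices in ground units)."""
        total = 0.0
        n = len(self.ring)
        for i in range(n):
            x1, y1 = self.ring[i]
            x2, y2 = self.ring[(i + 1) % n]
            total += x1 * y2 - x2 * y1
        return abs(total) / 2.0


# Hypothetical forest inventory polygon with an identifier and a species.
stand = PolygonFeature(
    ring=[(0, 0), (100, 0), (100, 50), (0, 50)],
    attributes={"stand_id": 17, "species": "Douglas fir"},
)
print(stand.attributes["species"], stand.area())

# Raster cell values used as identifiers that relate to another table.
cell_ids = [[1, 1], [2, 1]]
lookup = {1: {"species": "Douglas fir"}, 2: {"species": "Red alder"}}
species_grid = [[lookup[c]["species"] for c in row] for row in cell_ids]
```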
Data capture (entering information into the system) consumes much of the time of GIS practitioners. A variety of methods are used to enter data into a GIS, where it is stored in a digital format.
Existing data printed on paper or mylar maps can be digitized or scanned to produce digital data. A digitizer produces vector data as an operator traces points, lines, and polygon boundaries from a map. Scanning a map results in raster data that could be further processed to produce vector data.
Survey data can be directly entered into a GIS from digital data collection systems on survey instruments. Positions from a global positioning system (GPS), another survey tool, can also be directly entered into a GIS.
Remotely sensed data also plays an important role in data collection. It is gathered by sensors attached to a platform. Sensors include cameras, digital scanners and LIDAR, while platforms usually consist of aircraft and satellites.
The majority of digital data currently comes from photo interpretation of aerial photographs. Soft copy workstations are used to digitize features directly from stereo pairs of digital photographs. These systems allow data to be captured in 2 and 3 dimensions, with elevations measured directly from a stereo pair using principles of photogrammetry. Currently, analog aerial photos are scanned before being entered into a soft copy system, but as high quality digital cameras become cheaper this step will be skipped.
Satellite remote sensing provides another important source of spatial data. Here satellites use different sensor packages to passively measure the reflectance from parts of the electromagnetic spectrum or radio waves that were sent out from an active sensor such as radar. Remote sensing collects raster data that can be further processed to identify objects and classes of interest, such as land cover.
When data is captured, the user should consider whether it should be captured with relative or absolute accuracy, since this choice influences not only how the information will be interpreted but also the cost of data capture.
In addition to collecting and entering spatial data, attribute data is also entered into a GIS. For vector data this includes additional information about the objects represented in the system.
After data is entered into a GIS, it usually requires editing to remove errors, or further processing. Vector data must be made topologically correct before it can be used for some advanced analysis. For example, in a road network, lines must connect with nodes at an intersection. Errors such as undershoots and overshoots must also be removed. For scanned maps, blemishes on the source map may need to be removed from the resulting raster. For example, a fleck of dirt might connect two lines that should not be connected.
Data restructuring can be performed by a GIS to convert data into different formats. For example, a GIS may be used to convert a satellite image map to a vector structure by generating lines around all cells with the same classification, while determining the cell spatial relationships, such as adjacency or inclusion.
Since digital data are collected and stored in various ways, two data sources may not be entirely compatible, so a GIS must be able to convert geographic data from one structure to another.
Projections, coordinate systems and registration
A property ownership map and a soils map might show data at different scales. Map information in a GIS must be manipulated so that it registers, or fits, with information gathered from other maps. Before the digital data can be analyzed, they may have to undergo other manipulations (projection and coordinate conversions, for example) that integrate them into a GIS.
The earth can be represented by various models, each of which may provide a different set of coordinates (e.g., latitude, longitude, elevation) for any given point on the earth's surface. The simplest model is to assume the earth is a perfect sphere. As more measurements of the earth have accumulated, the models of the earth have become more sophisticated and more accurate. In fact, there are models that apply to different areas of the earth to provide increased accuracy (e.g., North American Datum, 1983 - NAD83 - works well in North America, but not in Europe). See Datum for more information.
Projection is a fundamental component of map making. A projection is a mathematical means of transferring information from a model of the Earth, which represents a three-dimensional curved surface, to a two-dimensional medium such as paper or a computer screen. Different projections are used for different types of maps because each projection particularly suits certain uses. For example, a projection that accurately represents the shapes of the continents will distort their relative sizes. See Map projection for more information.
Since much of the information in a GIS comes from existing maps, a GIS uses the processing power of the computer to transform digital information, gathered from sources with different projections and/or different coordinate systems, to a common projection and coordinate system.
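To make the idea of a projection concrete, here is a small sketch of one well-known case, the spherical Mercator projection, together with its inverse. It assumes the simplest Earth model mentioned above (a perfect sphere) with an assumed mean radius; the function names are illustrative, and a production GIS would use a full datum and projection library instead.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean radius of a spherical Earth model


def mercator_forward(lon_deg, lat_deg, radius=EARTH_RADIUS_M):
    """Project longitude/latitude (degrees) to planar x, y (meters)."""
    lon = math.radians(lon_deg)
    lat = math.radians(lat_deg)
    x = radius * lon
    y = radius * math.log(math.tan(math.pi / 4 + lat / 2))
    return x, y


def mercator_inverse(x, y, radius=EARTH_RADIUS_M):
    """Recover longitude/latitude (degrees) from planar x, y (meters)."""
    lon = x / radius
    lat = 2 * math.atan(math.exp(y / radius)) - math.pi / 2
    return math.degrees(lon), math.degrees(lat)


# Round trip for a point roughly at Ottawa.
x, y = mercator_forward(-75.7, 45.4)
lon, lat = mercator_inverse(x, y)
print(round(lon, 6), round(lat, 6))
```

Mercator preserves local shapes but exaggerates areas toward the poles, which illustrates the trade-off described above. Converting between two projected datasets amounts to applying the inverse of one projection and the forward transform of the other.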
Spatial analysis with GIS
It is difficult to relate wetlands maps to rainfall amounts recorded at different points such as airports, television stations, and high schools. A GIS, however, can be used to depict two- and three-dimensional characteristics of the Earth's surface, subsurface, and atmosphere from information points.
For example, a GIS can quickly generate a map with lines that indicate rainfall amounts.
Such a map can be thought of as a rainfall contour map. Many sophisticated methods can estimate the characteristics of surfaces from a limited number of point measurements. A two-dimensional contour map created from the surface modeling of rainfall point measurements may be overlaid and analyzed with any other map in a GIS covering the same area.
In the past 35 years, were there any gas stations or factories operating next to the swamp? Any within two miles and uphill from the swamp? A GIS can recognize and analyze the spatial relationships that exist within digitally stored spatial data. These topological relationships allow complex spatial modelling and analysis to be performed. Topological relationships between geometric entities traditionally include adjacency (what adjoins what), containment (what encloses what), and proximity (how close something is to something else).
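A proximity question such as "within two miles and uphill from the swamp" can be sketched with a distance test plus an attribute test. The example below is a simplification under stated assumptions: sites are reduced to points, distance is great-circle distance on a sphere (haversine formula), and all names, coordinates, and elevations are invented for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius


def haversine_miles(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two lon/lat points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Hypothetical sites: (name, lon, lat, elevation in meters)
swamp = ("swamp", -122.300, 37.500, 12.0)
sites = [
    ("gas station A", -122.310, 37.510, 30.0),  # close and uphill
    ("factory B", -122.305, 37.495, 5.0),       # close but downhill
    ("factory C", -122.100, 37.700, 80.0),      # uphill but far away
]


def near_and_uphill(target, candidates, max_miles=2.0):
    """Names of candidates higher than the target and within max_miles of it."""
    _, t_lon, t_lat, t_elev = target
    return [
        name
        for name, lon, lat, elev in candidates
        if elev > t_elev and haversine_miles(t_lon, t_lat, lon, lat) <= max_miles
    ]


hits = near_and_uphill(swamp, sites)
print(hits)
```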
If all the factories near a wetland were accidentally to release chemicals into the river at the same time, how long would it take for a damaging amount of pollutant to enter the wetland reserve? A GIS can simulate the routing of materials along a linear network. Values such as slope, speed limit, or pipe diameter can be incorporated into network modelling in order to represent the flow of the phenomenon more accurately. Network modelling is commonly employed in transportation planning, hydrology modelling, and infrastructure modelling.
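Routing along a linear network can be sketched with a shortest-path search. The example below uses Dijkstra's algorithm, which is one common choice and not necessarily what any given GIS implements; the network, segment lengths, and flow speeds are hypothetical.

```python
import heapq


def shortest_travel_times(network, source):
    """Dijkstra's algorithm: least travel time from source to each reachable node.

    network maps node -> list of (neighbor, travel_time) edges.
    """
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        time, node = heapq.heappop(queue)
        if time > best.get(node, float("inf")):
            continue  # stale queue entry, a better time was already found
        for neighbor, cost in network.get(node, []):
            candidate = time + cost
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return best


def edge_time(length_km, speed_km_h):
    """Travel time in hours along one segment, given its length and flow speed."""
    return length_km / speed_km_h


# Hypothetical river network flowing toward a wetland.
river = {
    "factory_1": [("junction", edge_time(4.0, 2.0))],
    "factory_2": [("junction", edge_time(3.0, 1.0))],
    "junction": [("wetland", edge_time(6.0, 3.0))],
}
times = shortest_travel_times(river, "factory_1")
print(times["wetland"])
```

Edge weights are where values such as slope, speed limit, or pipe diameter would enter the model: anything that changes how quickly material moves along a segment changes that segment's travel time.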
Powerful analysis techniques with raster data.
Vector overlay is the combination of two separate spatial datasets (points, lines or polygons) to create a new output vector dataset. These overlays are similar to mathematical Venn diagram overlays. A union overlay combines the geographic features and attribute tables of both inputs into a single new output. An intersect overlay defines the area where both inputs overlap and retains a set of attribute fields for each. A symmetric difference overlay defines an output area that includes the total area of both inputs except for the overlapping area.
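The Venn diagram analogy can be illustrated with Python set operations. To keep the sketch free of geometry libraries, each input area is approximated as a set of unit grid cells, which is a simplifying assumption: real vector overlay intersects polygon boundaries and merges attribute tables. The layer names are invented.

```python
def rectangle_cells(x_min, y_min, x_max, y_max):
    """Approximate an axis-aligned rectangle as a set of unit cells (col, row)."""
    return {(x, y) for x in range(x_min, x_max) for y in range(y_min, y_max)}


# Two hypothetical overlapping areas of 16 cells each.
zoning = rectangle_cells(0, 0, 4, 4)
floodplain = rectangle_cells(2, 2, 6, 6)

union = zoning | floodplain                 # everything in either input
intersect = zoning & floodplain             # only where both inputs overlap
symmetric_difference = zoning ^ floodplain  # either input, minus the overlap

print(len(union), len(intersect), len(symmetric_difference))
```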
Data extraction is a GIS process similar to vector overlay, though it can be used in either vector or raster data analysis. Rather than combining the properties and features of both datasets, data extraction involves using a clip or mask to extract the features of one dataset that fall within the spatial extent of another dataset.
In raster data analysis, the overlay of datasets is accomplished through a process known as local operation on multiple rasters or map algebra, through a function that combines the values of each raster's matrix. This function may weigh some inputs more than others through use of an index model that reflects the influence of various factors upon a geographic phenomenon.
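A weighted local operation of this kind can be sketched in a few lines. The rasters, the scores, and the weights below are hypothetical; the only requirement assumed is that all input rasters share the same number of rows and columns and are already registered to each other.

```python
def weighted_overlay(rasters, weights):
    """Cell-by-cell weighted sum of equally sized rasters (a local operation)."""
    rows, cols = len(rasters[0]), len(rasters[0][0])
    result = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            out_row.append(sum(w * ras[r][c] for ras, w in zip(rasters, weights)))
        result.append(out_row)
    return result


# Hypothetical suitability scores (0 to 10) on the same 2 x 2 grid.
slope_score = [[8, 6], [2, 4]]
soil_score = [[4, 10], [6, 2]]

# Assumed index model: slope counts three times as much as soil.
suitability = weighted_overlay([slope_score, soil_score], [0.75, 0.25])
print(suitability)
```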
Spatial Statistics (Geostatistics)
Geostatistics is a way of looking at the statistical properties of spatial data. It covers, among other things, predicting fields from points and point pattern analysis. What distinguishes it from other kinds of statistics is the use of graph theory and matrix algebra to reduce the number of parameters in the data being analyzed. This is necessary because it is the second-order properties of the GIS data that need analyzing.
When we measure any phenomenon, our observation methods dictate the accuracy of any subsequent analysis. Whether our study is concerned with the nature of traffic patterns in an urban core or with the analysis of weather patterns over the Pacific, there will always be a variable or a degree of precision that escapes our measurement; this is determined directly by the scale and distribution of our data collection, or survey methods. To apply statistics to spatial analysis, an 'average' must be determined so that the behavior of points, or gradients, outside of any immediate measurement can be predicted. Limitations in statistics and data collection mean that it is impossible to measure a continuum directly; inferential methods of analysis are needed, and several forms of interpolation are used to predict the behavior of particles and locations not directly measured.
Interpolation is the process by which a surface, usually a raster dataset, is created from data collected at a number of sample points. There are several forms of interpolation, each of which treats the data differently, depending on the properties of the dataset. In comparing interpolation methods, the first consideration should be whether or not the source data will change (exact or approximate). Next is whether the method is subjective (a human interpretation) or objective. Then there is the nature of transitions between points: are they abrupt or gradual? Finally, there is whether a method is global (it uses the entire dataset to form the model) or local (an algorithm is repeated for a small section of terrain).
Related surface models and methods include digital elevation models (DEM), digital terrain models (DTM), triangulated irregular networks (TIN), edge-finding algorithms, Thiessen polygons, Fourier analysis, weighted moving averages, inverse distance weighting, moving averages, kriging, splines, and trend surface analysis.
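As one concrete instance of the interpolation methods named above, the following sketch implements inverse distance weighting in its basic form. The sample gauges and the default power of 2 are assumptions chosen for illustration. In the terms used earlier, this version is exact (it returns the measured value at a sample point), objective, gradual, and global (every sample contributes to every estimate).

```python
import math


def idw(samples, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    samples is a list of (sx, sy, value) tuples. A query that coincides
    with a sample point returns that sample's value unchanged.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in samples:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            return float(value)
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical rain gauges: (x, y, rainfall in mm)
gauges = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0), (0.0, 10.0, 30.0)]
estimate = idw(gauges, 5.0, 0.0)
print(round(estimate, 3))
```

Evaluating idw at the center of every cell in a grid would produce the kind of raster surface from which rainfall contour lines can be drawn.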
Regionalized variable theory
Spatial Autocorrelation Principle: Data collected at any position will have a greater similarity to, or influence on, those locations within its immediate vicinity.
Geocoding is the calculation of spatial locations (X,Y coordinates) from street addresses. A reference theme is required to geocode individual addresses, such as a road centerline file with address ranges. The individual address locations are interpolated, or estimated, by examining address ranges along a road segment. These are usually provided in the form of a table or database. The GIS will then place a dot approximately where that address belongs along the segment of centerline. For example, an address point of 500 will be at approximately the midpoint of a line segment that starts with address 1 and ends with address 1000. Geocoding can also be applied against actual parcel data, typically from municipal tax maps. In this case, the result of the geocoding will be an actually positioned space as opposed to an interpolated point.
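Address interpolation along a centerline can be sketched as simple linear interpolation. The example assumes a straight road segment and an evenly spaced address range; the dictionary field names and coordinates are hypothetical.

```python
def geocode_on_segment(house_number, segment):
    """Estimate (x, y) for a house number by linear interpolation along a segment.

    segment holds the address range and the start/end coordinates of a
    straight road centerline; all field names here are hypothetical.
    """
    low, high = segment["from_address"], segment["to_address"]
    if not low <= house_number <= high:
        raise ValueError("house number outside the segment's address range")
    fraction = (house_number - low) / (high - low)
    (x1, y1), (x2, y2) = segment["start"], segment["end"]
    return (x1 + fraction * (x2 - x1), y1 + fraction * (y2 - y1))


main_street = {
    "from_address": 1,
    "to_address": 1000,
    "start": (0.0, 0.0),
    "end": (999.0, 0.0),
}
point = geocode_on_segment(500, main_street)
print(point)
```

The result is only as good as the assumption of even spacing: a segment with a few large lots at one end would place the dot in the wrong spot, which is one of the caveats of interpolated geocoding.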
It should be noted that there are several (potentially dangerous) caveats that are often overlooked when using interpolation. See the full entry for Geocoding for more information.
Various algorithms are used to help with address matching when the spellings of addresses differ. Address information that a particular entity or organization has data on, such as the post office, may not entirely match the reference theme. There could be variations in street name spelling, community name, etc. Consequently, the user generally has the ability to make matching criteria more stringent, or to relax those parameters so that more addresses will be mapped. Care must be taken to review the results so that addresses are not mapped incorrectly because of overzealous matching parameters.
Reverse geocoding is the process of returning an estimated street address number as it relates to a given coordinate. For example, a user can click on a road centerline theme (thus providing a coordinate) and have information returned that reflects the estimated house number. This house number is interpolated from a range assigned to that road segment. If the user clicks at the midpoint of a segment that starts with address 1 and ends with 100, the returned value will be somewhere near 50. Note that reverse geocoding does not return actual addresses, only estimates of what should be there based on the predetermined range.
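The reverse process can be sketched by projecting the clicked coordinate onto the segment and interpolating within its address range. As before, this assumes a straight segment with evenly spaced addresses, and the field names are hypothetical.

```python
def reverse_geocode(click, segment):
    """Estimate a house number for a clicked point near a straight segment."""
    (x1, y1), (x2, y2) = segment["start"], segment["end"]
    dx, dy = x2 - x1, y2 - y1
    # Project the click onto the segment and clamp to its endpoints.
    t = ((click[0] - x1) * dx + (click[1] - y1) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    low, high = segment["from_address"], segment["to_address"]
    return round(low + t * (high - low))


elm_street = {
    "from_address": 1,
    "to_address": 100,
    "start": (0.0, 0.0),
    "end": (200.0, 0.0),
}
# A click slightly off the centerline, level with the segment's midpoint.
estimate = reverse_geocode((100.0, 3.0), elm_street)
print(estimate)
```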
Data output and cartography
Cartography is the design and production of maps, or visual representations of spatial data. The vast majority of modern cartography is done with the help of computers, usually using a GIS. Most GIS software gives the user substantial control over the appearance of the data.
Cartographic work serves two major functions:
First, it produces graphics on the screen or on paper that convey the results of analysis to the people who make decisions about resources. Wall maps and other graphics can be generated, allowing the viewer to visualize and thereby understand the results of analyses or simulations of potential events. Web Map Servers facilitate distribution of generated maps via the web.
Second, other database information can be generated for further analysis or use, for instance a list of all addresses within 1 mile of a toxic spill.
Graphic display techniques
Traditional maps are abstractions of the real world, a sampling of important elements portrayed on a sheet of paper with symbols to represent physical objects. People who use maps must interpret these symbols. Topographic maps show the shape of land surface with contour lines; the actual shape of the land can be seen only in the mind's eye.
Today, graphic display techniques such as shading based on altitude in a GIS can make relationships among map elements visible, heightening one's ability to extract and analyze information. For example, two types of data were combined in a GIS to produce a perspective view of a portion of San Mateo County, California. The digital elevation model, consisting of surface elevations recorded on a 30-meter horizontal grid, shows high elevations as white and low elevations as black. The accompanying Landsat Thematic Mapper image shows a false-color infrared image looking down at the same area in 30-meter pixels, or picture elements, for the same coordinate points, pixel by pixel, as the elevation information.
A GIS was used to register and combine the two images to render the three-dimensional perspective view looking down the San Andreas Fault, using the Thematic Mapper image pixels, but shaded using the elevation of the landforms. The GIS display depends on the viewing point of the observer and time of day of the display, to properly render the shadows created by the sun's rays at that latitude, longitude, and time of day.
The future of GIS
Many disciplines can benefit from GIS techniques. An active GIS market has resulted in lower costs and continual improvements in the hardware and software components of GIS. These developments will, in turn, result in a much wider use of the technology throughout science, government, business, and industry, with applications including real estate, public health, crime mapping, national defense, sustainable development, natural resources, transportation & logistics.
The Open Geospatial Consortium (OGC), formerly the Open GIS Consortium, is an international industry consortium of 257 companies, government agencies and universities participating in a consensus process to develop publicly available geoprocessing specifications. Open interfaces and protocols defined by OpenGIS Specifications support interoperable solutions that geo-enable the Web, wireless and location-based services, and mainstream IT, and empower technology developers to make complex spatial information and services accessible and useful with all kinds of applications.
Compliant Products are software products that comply with OGC's OpenGIS Specifications. When a product has been tested and certified as compliant through the OGC Testing Program, it is automatically registered as compliant on the OGC site.
Implementing Products are software products that implement OpenGIS Specifications but have not yet passed a compliance test. (Compliance tests are not available for all specifications.) Developers can register their products as implementing draft or approved specifications. (OGC reserves the right to review and verify each entry.)
Open Source GIS Software
The use of open-source software is not new, but its adoption in the GIS industry is a new phenomenon. With the broad use of non-proprietary data formats such as the Shapefile format for vector data and the GeoTIFF format for raster data, as well as the adoption of Open Geospatial Consortium (OGC) protocols such as Web Map Service (WMS) and Web Feature Service (WFS), the barrier has been lowered for productive development using open source software, especially for web and web service oriented applications.
Google Maps is different from other web map servers (like MapQuest, Yahoo! Maps, or Rand McNally) because Google Maps exposes an API that enables users to associate attributes with interactive maps. This is in effect a GIS. However, Google Maps is largely point oriented: apart from the use of different point markers, the user has to click on a marker to get its metadata.
Global change and climate history program
Maps have traditionally been used to explore the Earth and to exploit its resources. GIS technology, as an expansion of cartographic science, has enhanced the efficiency and analytic power of traditional mapping. Now, as the scientific community recognizes the environmental consequences of human activity, GIS technology is becoming an essential tool in the effort to understand the process of global change. Various map and satellite information sources can combine in modes that simulate the interactions of complex natural systems.
Through a function known as visualization, a GIS can be used to produce images - not just maps, but drawings, animations, and other cartographic products. These images allow researchers to view their subjects in ways that literally never have been seen before. The images often are equally helpful in conveying the technical concepts of GIS study-subjects to non-scientists.
Adding the dimension of time
The condition of the Earth's surface, atmosphere, and subsurface can be examined by feeding satellite data into a GIS. GIS technology gives researchers the ability to examine the variations in Earth processes over days, months, and years.
As an example, the changes in vegetation vigor through a growing season can be animated to determine when drought was most extensive in a particular region. The resulting graphic, known as a normalized vegetation index, represents a rough measure of plant health. Working with two variables over time would then allow researchers to detect regional differences in the lag between a decline in rainfall and its effect on vegetation.
GIS technology and the availability of digital data on regional and global scales enable such analyses. The satellite sensor output used to generate a vegetation graphic is produced by the Advanced Very High Resolution Radiometer or AVHRR. This sensor system detects the amounts of energy reflected from the Earth's surface across various bands of the spectrum for surface areas of about 1 square kilometer. The satellite sensor produces images of a particular location on the Earth twice a day. AVHRR is only one of many sensor systems used for Earth surface analysis. More sensors will follow, generating ever greater amounts of data.
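A vegetation index of this kind can be computed cell by cell from two spectral bands. The sketch below assumes the index meant here is the normalized difference vegetation index (NDVI), which is commonly computed from red and near-infrared reflectance as (NIR - red) / (NIR + red); the reflectance values are invented for illustration.

```python
def ndvi(red, nir):
    """Normalized difference vegetation index for one cell: (NIR - red) / (NIR + red)."""
    total = nir + red
    if total == 0:
        return None  # no reflected energy recorded; treat as no data
    return (nir - red) / total


# Hypothetical reflectance values (0 to 1) for a 2 x 2 grid on one date.
red_band = [[0.25, 0.125], [0.25, 0.0]]
nir_band = [[0.75, 0.375], [0.25, 0.0]]

ndvi_grid = [
    [ndvi(r, n) for r, n in zip(red_row, nir_row)]
    for red_row, nir_row in zip(red_band, nir_band)
]
print(ndvi_grid)
```

Higher values suggest more vigorous vegetation. Computing one such grid per date and comparing them through a growing season gives the animated sequence described above.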
GIS and related technology will help greatly in the management and analysis of these large volumes of data, allowing for better understanding of terrestrial processes and better management of human activities to maintain world economic vitality and environmental quality.
In addition to the integration of time in environmental studies, GIS is also being explored for its ability to track and model the progress of humans throughout their daily routines. A concrete example of progress in this area is the recent release of time-specific population data by the US Census. In this data set, the populations of cities are shown for daytime and evening hours, highlighting the pattern of concentration and dispersion generated by North American commuting patterns. The manipulation and generation of data required to produce this data set would not have been possible without GIS.