Machine learning on geospatial big data

Van Zyl, T

dc.contributor.author	Van Zyl, T
dc.date.accessioned	2014-09-30T13:24:59Z
dc.date.available	2014-09-30T13:24:59Z
dc.date.issued	2014-02
dc.identifier.citation	Van Zyl, T. 2014. Machine learning on geospatial big data. In: Big Data: Techniques and Technologies in Geoinformatics, CRC Press: London, UK, pp 131-147	en_US
dc.identifier.isbn	978-1-4665-8651-2
dc.identifier.uri	http://www.crcpress.com/product/isbn/9781466586512
dc.identifier.uri	http://hdl.handle.net/10204/7703
dc.description	Copyright: 2014 CRC Press, London, UK. Abstract only attached.	en_US
dc.description.abstract	When trying to understand the difference between machine learning and statistics, it is important to note that the distinction lies less in the techniques and theory used than in the intended use of the results. In fact, many of the underpinnings of machine learning are statistical in nature. The main intent of statistics is to gain an understanding of the underlying system, in this case a geospatial system, through an analysis of observations or data about that system. Here, the geostatistician or environmental modeller is interested in cause and effect in the underlying system and in gaining a deeper understanding of the system itself. Because environmental modellers and geostatisticians need to understand the underlying system, it is important that the eventual statistical model be interpretable, that is, not a black box. Indeed, one reason for the historically limited use of machine learning algorithms has been precisely this lack of interpretability. Machine learning, on the other hand, is more focused on learning from observations of a system so as to automate functionality. Here, the intention is not one of understanding but one of engineering. For instance, in machine learning a model may be trained to perform automated classification of new unlabelled observations, to forecast future observations of some system, or to spot anomalous events automatically (Vatsavai et al. 2012). Geospatial big data present two opportunities for the increased use of machine learning in the geospatial analytics domain. First, geospatial big data have created a shift towards considering large amounts of data as a resource that can be used to add value to an organization. Second, by virtue of the three V’s of big data (volume, velocity, and variety), there is a shift away from complex models that require extensive computational and memory resources towards techniques that produce results in a more computationally efficient manner. Both of these opportunities open a space in which black-box solutions that produce usable results matter more than strict interpretability and transparency.	en_US
dc.language.iso	en	en_US
dc.publisher	CRC Press	en_US
dc.relation.ispartofseries	Workflow;13340
dc.subject	Geospatial big data	en_US
dc.subject	Machine data learning	en_US
dc.subject	Geoinformatics	en_US
dc.subject	Statistical data processing	en_US
dc.subject	High performance computing	en_US
dc.title	Machine learning on geospatial big data	en_US
dc.type	Book Chapter	en_US
dc.identifier.apacitation	Van Zyl, T. (2014). Machine learning on geospatial big data. In Big data: Techniques and technologies in geoinformatics (pp. 131-147). CRC Press. http://hdl.handle.net/10204/7703	en_ZA
dc.identifier.chicagocitation	Van Zyl, T. "Machine learning on geospatial big data." In Big Data: Techniques and Technologies in Geoinformatics, 131-147. London, UK: CRC Press, 2014. http://hdl.handle.net/10204/7703.	en_ZA
dc.identifier.vancouvercitation	Van Zyl T. Machine learning on geospatial big data. In: Big Data: Techniques and Technologies in Geoinformatics. London, UK: CRC Press; 2014. p. 131-47. [cited yyyy month dd]. http://hdl.handle.net/10204/7703.	en_ZA

Files in this item

Name: vanZyl_2014_ABSTRACT ...

Size: 161.4Kb

Format: PDF

This item appears in the following Collection(s)

Book Chapters



Resources on this site are free to download and reuse according to associated licensing provision. Please read the terms and conditions of usage of each resource.