Guerrilla Analytics: A Practical Approach to Working with Data

Enda Ridge

Language: English

Pages: 276

ISBN: 0128002182

Format: PDF / Kindle (mobi) / ePub


Doing data science is difficult. Projects are typically very dynamic, with requirements that change as data understanding grows. The data itself arrives piecemeal, is added to and replaced, contains undiscovered flaws, and comes from a variety of sources. Teams also have mixed skill sets, and tooling is often limited. Despite these disruptions, a data science team must get off the ground fast and begin demonstrating value with traceable, tested work products. This is when you need Guerrilla Analytics.

In this book, you will learn about:

  • The Guerrilla Analytics Principles: simple rules of thumb for maintaining data provenance across the entire analytics life cycle from data extraction, through analysis to reporting
  • Reproducible, traceable analytics: how to design and implement work products that are reproducible, testable and stand up to external scrutiny
  • Practice tips and war stories: 90 practice tips and 16 war stories based on real-world project challenges encountered in consulting, pre-sales and research
  • Preparing for battle: how to set up your team's analytics environment in terms of tooling, skill sets, workflows and conventions
  • Data gymnastics: over a dozen analytics patterns that your team will encounter again and again in projects


the largest duplicate issues.

Figure 65 Identify duplicates without removing data

Pattern 5: Uniquely Identify a Row of Data With a Hash

Because you are often extracting data from a multitude of disparate sources, these sources will either have completely incompatible ID fields or will have no ID at all, such as in the case of web pages and spreadsheets. A key skill is the ability to deterministically ID a row of data. The best way to do this is with a hash code. However, it is not simply a
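A minimal Python sketch of the idea behind Pattern 5, not the book's own method. The function name `row_hash`, the choice of SHA-256, the unit-separator delimiter and the `<NULL>` token for missing values are all assumptions made for illustration. The delimiter is there so that two rows whose fields differ only in where the boundaries fall do not produce the same ID.

```python
import hashlib


def row_hash(values, sep="\x1f"):
    """Return a deterministic SHA-256 hex digest for one row of field values.

    Hypothetical helper: normalisation rules here are assumptions, and a real
    project would need to agree them (trimming, case, date formats) up front.
    """
    # Convert every field to text; mark missing values explicitly so that
    # None and the empty string do not hash to the same ID.
    parts = ["<NULL>" if v is None else str(v).strip() for v in values]
    # Join with a separator that is unlikely to occur in the data, so that
    # ("ab", "c") and ("a", "bc") yield different inputs to the hash.
    joined = sep.join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # The same row always gets the same ID, whichever source it came from.
    print(row_hash(["Acme Ltd", "2014-03-01", 125.50]))
```

Because the digest depends only on the field values, the same row extracted twice, or from two different sources, receives the same ID without any shared key being present in the data.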

register, 187 test result, 188 without naming conventions, 96 Date field, 123 Date-cleaning code, 142 Debugging intermediate datasets, 86 Decouple data, 59 DME, See Data manipulation environment (DME) Dynamic data workflow, Guerrilla analytics, 18 E Encryption, 228, 235 encrypting media, 235 moving data portable media, 235 End-to-end code, 87 Environment Guerrilla analytics, 90 External data analytics projects, challenges, 20 External software and libraries version control

to query data out of the system. Back-end data extractions are desirable in a project because they provide data in its most raw form, and therefore in a form that is most flexible for the analytics team. The data has not yet been presented in the application layer, where it is often modified for user convenience. However, in this type of data extraction you will typically encounter a variety of database systems, each of which will have their own flavors of programming languages and their own

data that should not be part of the analysis. Filtering is the data manipulation step of removing data records. But if records are removed, you lose the profile of the original data and the ability to try out combinations of filters to assess their impact on the data.

8.6.2. Guerrilla Analytics approach

Instead of deleting filtered records, flag these records for removal. This is like a simple switch. When the filter switch is on, the record is not included in an analysis. When the filter
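The flag-instead-of-delete approach can be sketched in Python as follows. This is an illustration under assumptions, not code from the book: the sample records, the flag column names and the helpers `flag_records` and `apply_filters` are all hypothetical.

```python
# Hypothetical sample data; field names are assumptions for illustration.
records = [
    {"id": 1, "amount": 120.0, "account": "A-100"},
    {"id": 2, "amount": -15.0, "account": "A-101"},
    {"id": 3, "amount": 40.0, "account": "TEST-1"},
    {"id": 4, "amount": -5.0, "account": "TEST-2"},
]


def flag_records(rows, flag_name, predicate):
    """Return copies of the rows with a boolean flag column added.

    No row is removed, so the profile of the original data is preserved.
    """
    return [{**row, flag_name: bool(predicate(row))} for row in rows]


def apply_filters(rows, active_flags):
    """Return the rows not flagged by any filter that is switched on."""
    return [row for row in rows if not any(row[f] for f in active_flags)]


# Add one flag column per candidate filter.
flagged = flag_records(records, "flag_negative_amount", lambda r: r["amount"] < 0)
flagged = flag_records(flagged, "flag_test_account", lambda r: r["account"].startswith("TEST"))

if __name__ == "__main__":
    # Try out combinations of filters without ever losing the original rows.
    for switches in ([], ["flag_negative_amount"], ["flag_negative_amount", "flag_test_account"]):
        kept = [r["id"] for r in apply_filters(flagged, switches)]
        print(switches, "->", kept)
```

Since every record is still present, the impact of each filter, alone or in combination, can be measured by switching flags on and off rather than by re-running a destructive delete.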

time.

  • Test scripts are simplified and this reduces the chance of bugs appearing in test scripts.

14.6.4. Practice Tip 86: Automate Build Test Execution and Reporting

14.6.4.1. Guerrilla Analytics Environment

As the test suite grows, it becomes increasingly cumbersome to manually execute all the test scripts and check for passes or fails. In a Guerrilla Analytics project, the build will change frequently and so tests should be executed and reported on. If test execution and reporting is a
