Pig Cookbook
Table of contents
1 Overview
2 Performance Enhancers

Copyright © 2007 The Apache Software Foundation. All rights reserved.

1. Overview
This document provides hints and tips for Pig users.

2. Performance Enhancers
2.1. Use Optimization
Pig supports a number of optimization rules, which are turned on by default. Become familiar with these rules.
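One way to get familiar with the rules is to look at the plans Pig builds for a script. The sketch below assumes a Pig release that provides the `explain` operator and a tab-separated input file named 'myfile' (hypothetical). `explain` prints the plans Pig generates for the given alias; the exact output format differs between Pig versions.

```pig
-- Hypothetical input: tab-separated file 'myfile' with three fields.
A = load 'myfile' as (t: int, u: int, v);
B = filter A by t > 0;
C = foreach B generate t, u;
-- Print the plans Pig builds for alias C so you can compare them
-- with the statements as written above.
explain C;
```

Comparing the printed plans with the script is a quick way to see how Pig has arranged the operators before committing to a long run.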

2.2. Use Types
If types are not specified in the load statement, Pig assumes the type double for numeric computations. Often your data are of a much smaller type, such as int or long. Specifying the real type speeds up arithmetic computation, and it has the additional advantage of early error detection.
--Query 1
A = load 'myfile' as (t, u, v);
B = foreach A generate t + u;

--Query 2
A = load 'myfile' as (t: int, u: int, v);
B = foreach A generate t + u;

The second query will run more efficiently than the first. In some of our queries we have seen a 2x speedup.
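The same advice applies to any field whose type you know. The sketch below is illustrative only: the input file 'clicks' and its fields are hypothetical, and it assumes a Pig release that supports typed schemas including `long` and `chararray`. It declares the real types in the load statement and, for the case where the load statement cannot be changed, shows an explicit cast inside the `foreach` instead.

```pig
-- Hypothetical input 'clicks': tab-separated (user, clicks, revenue).
-- Declare each field's real type in the load statement.
A = load 'clicks' as (user: chararray, clicks: long, revenue: double);
B = group A by user;
C = foreach B generate group, SUM(A.clicks), SUM(A.revenue);

-- If the load statement cannot be changed, an explicit cast in the
-- foreach tells Pig which type to use for the arithmetic.
D = load 'myfile' as (t, u, v);
E = foreach D generate (int)t + (int)u;
```

Declaring types at load time is usually the cleaner option, since every later statement then sees the typed fields and type errors surface as early as possible.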

2.3. Project Early and Often
Pig does not (yet) determine when a field is no longer needed and drop the field from the row. For example, say you have a query like:
A = load 'myfile' as (t, u, v);
B = load 'myotherfile' as (x, y, z);
C = join A by t, B by x;
D = group C by u;
E = foreach D generate group, COUNT($1);

There is no need for v, y, or z to participate in this query, and there is no need to carry both t and x past the join; just one will suffice. Changing the query above to the query below greatly reduces the amount of data that Pig carries through the map and reduce phases.

A = load 'myfile' as (t, u, v);
A1 = foreach A generate t, u;
B = load
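A minimal sketch of the complete projected query, assuming the same inputs 'myfile' and 'myotherfile' as the first query in this section. The aliases B1 and C1 are hypothetical names chosen for illustration.

```pig
A = load 'myfile' as (t, u, v);
A1 = foreach A generate t, u;          -- v is never used downstream
B = load 'myotherfile' as (x, y, z);
B1 = foreach B generate x;             -- y and z are never used downstream
C = join A1 by t, B1 by x;
C1 = foreach C generate t, u;          -- t equals x after the join; keep only one
D = group C1 by u;
E = foreach D generate group, COUNT($1);
```

Each `foreach` placed right after a load or join discards fields that no later statement references, so less data flows into the join and the group. Since only u is needed for the group here, projecting C to u alone would shrink the data further; t is kept to match the point above that one join key is enough.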