Author: Chuck Lam
Publisher: N.A
ISBN: 9788177228137
Page: 336
View: 2326
Special Features:
· Introduction to MapReduce
· Examples illustrating ideas in practice
· Hadoop's Streaming API
· Other related tools, like Pig and Hive
About The Book: Hadoop in Action introduces the subject and teaches you how to write programs in the MapReduce style. It starts with a few easy examples and then moves quickly to show Hadoop use in more complex data analysis tasks. Included are best practices and design patterns of MapReduce programming. This book requires basic Java skills. Knowing basic statistical concepts can help with the more advanced examples.

Hadoop in 24 Hours, Sams Teach Yourself

Author: Jeffrey Aven
Publisher: Sams Publishing
ISBN: 0134456726
Category: Computers
Page: 496
View: 4710
Apache Hadoop is the technology at the heart of the Big Data revolution, and Hadoop skills are in enormous demand. Now, in just 24 lessons of one hour or less, you can learn all the skills and techniques you'll need to deploy each key component of a Hadoop platform in your local environment or in the cloud, building a fully functional Hadoop cluster and using it with real programs and datasets. Each short, easy lesson builds on all that's come before, helping you master all of Hadoop's essentials, and extend it to meet your unique challenges. Apache Hadoop in 24 Hours, Sams Teach Yourself covers all this, and much more:
· Understanding Hadoop and the Hadoop Distributed File System (HDFS)
· Importing data into Hadoop, and processing it there
· Mastering basic MapReduce Java programming, and using advanced MapReduce API concepts
· Making the most of Apache Pig and Apache Hive
· Implementing and administering YARN
· Taking advantage of the full Hadoop ecosystem
· Managing Hadoop clusters with Apache Ambari
· Working with the Hadoop User Environment (HUE)
· Scaling, securing, and troubleshooting Hadoop environments
· Integrating Hadoop into the enterprise
· Deploying Hadoop in the cloud
· Getting started with Apache Spark
Step-by-step instructions walk you through common questions, issues, and tasks; Q-and-As, Quizzes, and Exercises build and test your knowledge; "Did You Know?" tips offer insider advice and shortcuts; and "Watch Out!" alerts help you avoid pitfalls. By the time you're finished, you'll be comfortable using Apache Hadoop to solve a wide spectrum of Big Data problems.

Big Data Analytics with R and Hadoop

Author: Vignesh Prajapati
Publisher: Packt Publishing Ltd
ISBN: 1782163298
Category: Computers
Page: 238
View: 5276
Big Data Analytics with R and Hadoop is a tutorial-style book that focuses on the powerful big data tasks that can be achieved by integrating R and Hadoop. This book is ideal for R developers who are looking for a way to perform big data analytics with Hadoop. It is also aimed at those who know Hadoop and want to build intelligent applications over big data with R packages. It would be helpful if readers have basic knowledge of R.

Hadoop Beginner's Guide

Author: Garry Turkington
Publisher: Packt Publishing Ltd
ISBN: 1849517304
Category: Computers
Page: 398
View: 8459
Data is arriving faster than you can process it and the overall volumes keep growing at a rate that keeps you awake at night. Hadoop can help you tame the data beast. Effective use of Hadoop, however, requires a mixture of programming, design, and system administration skills. "Hadoop Beginner's Guide" removes the mystery from Hadoop, presenting Hadoop and related technologies with a focus on building working systems and getting the job done, using cloud services to do so when it makes sense. From basic concepts and initial setup through developing applications and keeping the system running as the data grows, the book gives the understanding needed to effectively use Hadoop to solve real-world problems. Starting with the basics of installing and configuring Hadoop, the book explains how to develop applications, maintain the system, and use additional products to integrate with other systems. While teaching different ways to develop applications to run on Hadoop, the book also covers tools such as Hive, Sqoop, and Flume that show how Hadoop can be integrated with relational databases and log collection. In addition to examples on Hadoop clusters running Ubuntu, uses of cloud services such as Amazon EC2 and Elastic MapReduce are covered.

The Stances of e-Government

Policies, Processes and Technologies
Author: Puneet Kumar,Vinod Kumar Jain,Kumar Sambhav Pareek
Publisher: CRC Press
ISBN: 135139617X
Category: Computers
Page: 206
View: 6967
This book focuses on the three inevitable facets of e-government, namely policies, processes and technologies. The policies part discusses the genesis and revitalization of government policies; the processes part covers ongoing e-government practices across developing countries; the technologies part covers the inclusion of novel technologies.

Computer Networks

21st International Conference, CN 2014, Brunów, Poland, June 23-27, 2014. Proceedings
Author: Andrzej Kwiecien,Piotr Gaj,Piotr Stera
Publisher: Springer
ISBN: 3319079417
Category: Computers
Page: 349
View: 880
This book constitutes the thoroughly refereed proceedings of the 21st International Conference on Computer Networks, CN 2014, held in Brunów, Poland, in June 2014. The 34 revised full papers presented were carefully reviewed and selected for inclusion in the book. The papers in these proceedings cover the following topics: computer networks, teleinformatics and communications, new technologies, queueing theory, innovative applications, and networked and IT-related aspects of e-business.

Hadoop Real-World Solutions Cookbook

Author: Jonathan R. Owens,Brian Femiano,Jon Lentz
Publisher: Packt Publishing Ltd
ISBN: 1849519137
Category: Computers
Page: 316
View: 7004
Realistic, simple code examples to solve problems at scale with Hadoop and related technologies.

Apache Oozie

The Workflow Scheduler for Hadoop
Author: Mohammad Kamrul Islam,Aravind Srinivasan
Publisher: "O'Reilly Media, Inc."
ISBN: 1449369758
Category: Computers
Page: 272
View: 1569
Get a solid grounding in Apache Oozie, the workflow scheduler system for managing Hadoop jobs. With this hands-on guide, two experienced Hadoop practitioners walk you through the intricacies of this powerful and flexible platform, with numerous examples and real-world use cases. Once you set up your Oozie server, you’ll dive into techniques for writing and coordinating workflows, and learn how to write complex data pipelines. Advanced topics show you how to handle shared libraries in Oozie, as well as how to implement and manage Oozie’s security capabilities.
· Install and configure an Oozie server, and get an overview of basic concepts
· Journey through the world of writing and configuring workflows
· Learn how the Oozie coordinator schedules and executes workflows based on triggers
· Understand how Oozie manages data dependencies
· Use Oozie bundles to package several coordinator apps into a data pipeline
· Learn about security features and shared library management
· Implement custom extensions and write your own EL functions and actions
· Debug workflows and manage Oozie’s operational details

Hadoop Application Architectures

Designing Real-World Big Data Applications
Author: Mark Grover,Ted Malaska,Jonathan Seidman,Gwen Shapira
Publisher: "O'Reilly Media, Inc."
ISBN: 1491900059
Category: Computers
Page: 400
View: 9939
Get expert guidance on architecting end-to-end data management solutions with Apache Hadoop. While many sources explain how to use various components in the Hadoop ecosystem, this practical book takes you through architectural considerations necessary to tie those components together into a complete tailored application, based on your particular use case. To reinforce those lessons, the book’s second section provides detailed examples of architectures used in some of the most commonly found Hadoop applications. Whether you’re designing a new Hadoop application, or planning to integrate Hadoop into your existing data infrastructure, Hadoop Application Architectures will skillfully guide you through the process. This book covers:
· Factors to consider when using Hadoop to store and model data
· Best practices for moving data in and out of the system
· Data processing frameworks, including MapReduce, Spark, and Hive
· Common Hadoop processing patterns, such as removing duplicate records and using windowing analytics
· Giraph, GraphX, and other tools for large graph processing on Hadoop
· Using workflow orchestration and scheduling tools such as Apache Oozie
· Near-real-time stream processing with Apache Storm, Apache Spark Streaming, and Apache Flume
· Architecture examples for clickstream analysis, fraud detection, and data warehousing

Professional Hadoop Solutions

Author: Boris Lublinsky,Kevin T. Smith,Alexey Yakubovich
Publisher: John Wiley & Sons
ISBN: 1118824180
Category: Computers
Page: 504
View: 1459
The go-to guidebook for deploying Big Data solutions with Hadoop. Today's enterprise architects need to understand how the Hadoop frameworks and APIs fit together, and how they can be integrated to deliver real-world solutions. This book is a practical, detailed guide to building and implementing those solutions, with code-level instruction in the popular Wrox tradition. It covers storing data with HDFS and HBase, processing data with MapReduce, and automating data processing with Oozie. Hadoop security, running Hadoop with Amazon Web Services, best practices, and automating Hadoop processes in real time are also covered in depth. With in-depth code examples in Java and XML and the latest on recent additions to the Hadoop ecosystem, this complete resource also covers the use of APIs, exposing their inner workings and allowing architects and developers to better leverage and customize them.
· The ultimate guide for developers, designers, and architects who need to build and deploy Hadoop applications
· Covers storing and processing data with various technologies, automating data processing, Hadoop security, and delivering real-time solutions
· Includes detailed, real-world examples and code-level guidelines
· Explains when, why, and how to use these tools effectively
· Written by a team of Hadoop experts in the programmer-to-programmer Wrox style
Professional Hadoop Solutions is the reference enterprise architects and developers need to maximize the power of Hadoop.

Pig in Action

Munging Big Data
Author: M. Tim Jones
Publisher: Manning Publications Company
ISBN: 9781617291586
Category: Computers
Page: 325
View: 7592
It's notoriously difficult to query Hadoop data using standard Map/Reduce programming techniques. Pig and the Pig Latin scripting language provide a SQL-like platform that simplifies query construction against data sets in Hadoop, eases the obstacle of Map/Reduce, and opens the door to processing large data sets for casual users, including experimentation on data sets. And it stands up well under stress—Yahoo uses Pig for over half the queries it runs on the world's largest Hadoop cluster. Pig in Action introduces Pig and the Pig Latin language while teaching the fundamentals of big data processing. Readers will explore the intersection of business and data science as they walk through practical questions like executing standard queries, establishing automated data management processes and policies, and developing useful reports. Most importantly, they'll learn techniques to extract valuable insights from data while mastering the features of Pig.

Web Scalability for Startup Engineers

Author: Artur Ejsmont
Publisher: McGraw Hill Professional
ISBN: 0071843663
Category: Computers
Page: 432
View: 3967
This invaluable roadmap for startup engineers reveals how to successfully handle web application scalability challenges to meet increasing product and traffic demands. Web Scalability for Startup Engineers shows engineers working at startups and small companies how to plan and implement a comprehensive scalability strategy. It presents a broad and holistic view of the infrastructure and architecture of a scalable web application. Successful startups often face the challenge of scalability, and the core concepts driving a scalable architecture are language and platform agnostic. The book covers scalability of HTTP-based systems (websites, REST APIs, SaaS, and mobile application backends), starting with a high-level perspective before taking a deep dive into common challenges and issues. This approach builds a holistic view of the problem, helping you see the big picture, and then introduces different technologies and best practices for solving the problem at hand. The book is enriched with the author's real-world experience and expert advice, saving you precious time and effort by learning from others' mistakes and successes.
· Language-agnostic approach addresses universally challenging concepts in Web development and scalability, and does not require knowledge of a particular language
· Fills the gap for engineers in startups and smaller companies who have limited means for getting to the next level in terms of accomplishing scalability
· Strategies presented help to decrease time to market and increase the efficiency of web applications

Vehicle, Mechatronics and Information Technologies

Author: X.D. Yu
Publisher: Trans Tech Publications Ltd
ISBN: 3038262013
Category: Technology & Engineering
Page: 5174
View: 2080
Collection of selected, peer reviewed papers from the 2013 International Conference on Vehicle & Mechanical Engineering and Information Technology (VMEIT 2013), August 17-18, 2013, Zhengzhou, Henan, China. The 1094 papers are grouped as follows: Chapter 1: Design and Researches in Area of Vehicle and General Mechanical Engineering; Chapter 2: Mechatronics, Automation and Control; Chapter 3: Measurement and Instrumentation, Monitoring and Detection Technologies, Fault Diagnosis; Chapter 4: Computation Methods and Algorithms for Modeling, Simulation and Optimization, Data Mining and Data Processing; Chapter 5: Information Technologies, WEB and Networks Engineering, Information Security, Software Application and Development; Chapter 6: Power and Electric Systems, Electronics and Microelectronics, Embedded and Integrated Systems; Chapter 7: Communication, Signal and Image Processing, Data Acquisition, Identification and Recognition Technologies; Chapter 8: Information Technologies in Urban and Civil Engineering, Medicine and Biotechnology; Chapter 9: Material Science and Manufacturing Technology; Chapter 10: Information Technology in Management Engineering, Logistics, Economics, Finance, Assessment; Chapter 11: Related Themes.

Learning Hadoop 2

Author: Garry Turkington,Gabriele Modena
Publisher: Packt Publishing Ltd
ISBN: 1783285524
Category: Computers
Page: 382
View: 3425
If you are a system or application developer interested in learning how to solve practical problems using the Hadoop framework, then this book is ideal for you. You are expected to be familiar with the Unix/Linux command-line interface and have some experience with the Java programming language. Familiarity with Hadoop would be a plus.

Impala in Action

Querying and Mining Big Data
Author: Ricky Saltzer,Istvan Szegedi,Richard L. Saltzer,Paul De Schacht
Publisher: N.A
ISBN: 9781617291982
Category: Computers
Page: 250
View: 8509
Hadoop queries in Pig or Hive can be too slow for real-time data analysis. Impala, an ultra-speedy query engine from Cloudera, supercharges Hadoop by avoiding the typical Map-Reduce overhead and parallelizing queries so that they can run on multiple nodes. This is a big deal for big data, because with Impala, querying Hadoop takes seconds rather than minutes. Impala's dialect is close to standard SQL, and Impala seamlessly accesses HBase and HDFS (Hadoop Distributed File System), allowing considerable freedom in choice of data formats. Impala in Action is a hands-on guide to querying Hadoop using Impala. It starts by comparing Impala to traditional databases and database services on Hadoop. Then it explains Impala's SQL dialect and the basics of data access. Next, it tackles data visualization tasks and provides techniques for securing Impala with Apache Sentry. The book also shows how to embed Impala queries in a Java client and how to connect to JDBC and ODBC clients. Advanced readers will appreciate the deep dive into Impala's architecture and the practical insights into the issues complicated configurations and complex queries can cause. Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.

Hadoop: The Definitive Guide

Author: Tom White
Publisher: "O'Reilly Media, Inc."
ISBN: 9780596551360
Category: Computers
Page: 528
View: 3287
Hadoop: The Definitive Guide helps you harness the power of your data. Ideal for processing large datasets, the Apache Hadoop framework is an open source implementation of the MapReduce algorithm on which Google built its empire. This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters. Complete with case studies that illustrate how Hadoop solves specific problems, this book helps you:
· Use the Hadoop Distributed File System (HDFS) for storing large datasets, and run distributed computations over those datasets using MapReduce
· Become familiar with Hadoop's data and I/O building blocks for compression, data integrity, serialization, and persistence
· Discover common pitfalls and advanced features for writing real-world MapReduce programs
· Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud
· Use Pig, a high-level query language for large-scale data processing
· Take advantage of HBase, Hadoop's database for structured and semi-structured data
· Learn ZooKeeper, a toolkit of coordination primitives for building distributed systems
If you have lots of data -- whether it's gigabytes or petabytes -- Hadoop is the perfect solution. Hadoop: The Definitive Guide is the most thorough book available on the subject. "Now you have the opportunity to learn about Hadoop from a master, not only of the technology, but also of common sense and plain talk." -- Doug Cutting, Hadoop Founder, Yahoo!

Big data

La revolución de los datos masivos
Author: Viktor Mayer-Schönberger,Kenneth Cukier
Publisher: Turner
ISBN: 8415427816
Category: Computers
Page: N.A
View: 1006
An illuminating analysis of one of the great issues of our time, and of the immense impact it will have on the economy, science, and society at large. Big data represents a revolution that is already changing the way we approach business, health care, politics, education, and innovation. Two leading experts in the field analyze what big data is, how it can change our lives, and what we can do to protect ourselves from its risks. A major essay, unique in Spanish and a pioneer in its field, that anticipates a trend growing at a frantic pace.

Big Data

Architettura, tecnologie e metodi per l’utilizzo di grandi basi di dati
Author: Alessandro Rezzani
Publisher: Maggioli Editore
ISBN: 8838789894
Category: Business & Economics
Page: 320
View: 6977
Every day, billions of pieces of digital data are created around the world. This mass of information comes from the sharp increase in devices that automate numerous operations (purchase transaction records and GPS signals from mobile phones, for example) and from the Web: photos, videos, posts, articles, and digital content generated and shared by users through social media. Processing this “big data” requires substantial computing power, technologies, and resources that go well beyond conventional data management and storage systems. The text explores the world of big data, offering a description and classification of it and presenting the opportunities that can arise from its use. It describes the dedicated software and hardware solutions, giving ample space to open source implementations and the main cloud offerings. It is therefore intended as an in-depth guide to the tools and technologies that enable the analysis and management of large quantities of data. The volume is aimed at those in universities and companies (database administrators, IT managers, business intelligence professionals) who want to explore topics related to big data in depth. It is also a valuable aid for company management in understanding how to obtain information usable in decision-making processes. Alessandro Rezzani teaches at Bocconi University in Milan. He is an expert in the design and implementation of data warehouses, ETL processes, multidimensional databases, and reporting solutions. He currently works on the design and implementation of business intelligence solutions at Factory Software. With Apogeo Education he published “Business Intelligence. Processi, metodi, utilizzo in azienda” (2012).

Hadoop MapReduce v2 Cookbook - Second Edition

Author: Thilina Gunarathne
Publisher: Packt Publishing Ltd
ISBN: 1783285486
Category: Computers
Page: 322
View: 824
If you are a Big Data enthusiast and wish to use Hadoop v2 to solve your problems, then this book is for you. This book is for Java programmers with little to moderate knowledge of Hadoop MapReduce. This is also a one-stop reference for developers and system admins who want to quickly get up to speed with using Hadoop v2. It would be helpful to have a basic knowledge of software development using Java and a basic working knowledge of Linux.