Hadoop in Action

Author: Chuck Lam, Mark Davis, Ajit Gaddam
Publisher: Manning Publications
ISBN: 9781617291227
Page: 525
View: 9953
The massive datasets required for most modern businesses are too large to safely store and efficiently process on a single server. Hadoop is an open source data processing framework that provides a distributed file system that can manage data stored across clusters of servers and implements the MapReduce data processing model so that users can effectively query and utilize big data. The new Hadoop 2.0 is a stable, enterprise-ready platform supported by a rich ecosystem of tools and related technologies such as Pig, Hive, YARN, Spark, Tez, and many more. Hadoop in Action, Second Edition, provides a comprehensive introduction to Hadoop and shows how to write programs in the MapReduce style. It starts with a few easy examples and then moves quickly to show how Hadoop can be used in more complex data analysis tasks. It covers how YARN, new in Hadoop 2, simplifies and supercharges resource management to make streaming and real-time applications more feasible. Included are best practices and design patterns of MapReduce programming. The book expands on the first edition by enhancing coverage of important Hadoop 2 concepts and systems, and by providing new chapters on data management and data science that reinforce a practical understanding of Hadoop. Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.
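The MapReduce model the blurb describes can be sketched in plain Python. This is an illustrative simulation of the map, shuffle, and reduce phases only, not Hadoop's actual Java API; the function names and sample documents are invented:

```python
from collections import defaultdict

def map_phase(document):
    # "Map" step: emit (word, 1) pairs for each word in the input
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Group values by key -- the framework does this between map and reduce
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # "Reduce" step: sum the counts for each word
    return {word: sum(values) for word, values in groups.items()}

docs = ["Hadoop stores data", "Hadoop processes data"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
# counts["hadoop"] == 2, counts["data"] == 2
```

In a real Hadoop job the map and reduce functions run in parallel across the cluster, and the shuffle moves intermediate pairs between machines; the division of labor is the same.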

Hadoop in 24 Hours, Sams Teach Yourself

Author: Jeffrey Aven
Publisher: Sams Publishing
ISBN: 0134456726
Category: Computers
Page: 496
View: 5766
Apache Hadoop is the technology at the heart of the Big Data revolution, and Hadoop skills are in enormous demand. Now, in just 24 lessons of one hour or less, you can learn all the skills and techniques you'll need to deploy each key component of a Hadoop platform in your local environment or in the cloud, building a fully functional Hadoop cluster and using it with real programs and datasets. Each short, easy lesson builds on all that's come before, helping you master all of Hadoop's essentials and extend them to meet your unique challenges. Apache Hadoop in 24 Hours, Sams Teach Yourself covers all this, and much more:
- Understanding Hadoop and the Hadoop Distributed File System (HDFS)
- Importing data into Hadoop and processing it there
- Mastering basic MapReduce Java programming and using advanced MapReduce API concepts
- Making the most of Apache Pig and Apache Hive
- Implementing and administering YARN
- Taking advantage of the full Hadoop ecosystem
- Managing Hadoop clusters with Apache Ambari
- Working with the Hadoop User Environment (HUE)
- Scaling, securing, and troubleshooting Hadoop environments
- Integrating Hadoop into the enterprise
- Deploying Hadoop in the cloud
- Getting started with Apache Spark
Step-by-step instructions walk you through common questions, issues, and tasks; Q-and-As, quizzes, and exercises build and test your knowledge; "Did You Know?" tips offer insider advice and shortcuts; and "Watch Out!" alerts help you avoid pitfalls. By the time you're finished, you'll be comfortable using Apache Hadoop to solve a wide spectrum of Big Data problems.

Big Data Analytics with R and Hadoop

Author: Vignesh Prajapati
Publisher: Packt Publishing Ltd
ISBN: 1782163298
Category: Computers
Page: 238
View: 7871
Big Data Analytics with R and Hadoop is a tutorial-style book that focuses on all the powerful big data tasks that can be achieved by integrating R and Hadoop. This book is ideal for R developers who are looking for a way to perform big data analytics with Hadoop. It is also aimed at those who know Hadoop and want to build intelligent applications over big data with R packages. Basic knowledge of R is helpful.

The Stances of e-Government

Policies, Processes and Technologies
Author: Puneet Kumar, Vinod Kumar Jain, Kumar Sambhav Pareek
Publisher: CRC Press
ISBN: 135139617X
Category: Computers
Page: 206
View: 5274
This book focuses on the three inevitable facets of e-government, namely policies, processes, and technologies. The policies facet discusses the genesis and revitalization of government policies; the processes facet examines ongoing e-government practices across developing countries; and the technologies facet covers the inclusion of novel technologies.

Computer Networks

21st International Conference, CN 2014, Brunów, Poland, June 23-27, 2014. Proceedings
Author: Andrzej Kwiecien, Piotr Gaj, Piotr Stera
Publisher: Springer
ISBN: 3319079417
Category: Computers
Page: 349
View: 334
This book constitutes the thoroughly refereed proceedings of the 21st International Conference on Computer Networks, CN 2014, held in Brunów, Poland, in June 2014. The 34 revised full papers presented were carefully reviewed and selected for inclusion in the book. The papers in these proceedings cover the following topics: computer networks, teleinformatics and communications, new technologies, queueing theory, innovative applications, and networked and IT-related aspects of e-business.

Data Algorithms

Recipes for Scaling Up with Hadoop and Spark
Author: Mahmoud Parsian
Publisher: "O'Reilly Media, Inc."
ISBN: 1491906154
Category: Computers
Page: 778
View: 1659
If you are ready to dive into the MapReduce framework for processing large datasets, this practical book takes you step by step through the algorithms and tools you need to build distributed MapReduce applications with Apache Hadoop or Apache Spark. Each chapter provides a recipe for solving a massive computational problem, such as building a recommendation system. You’ll learn how to implement the appropriate MapReduce solution with code that you can use in your projects. Dr. Mahmoud Parsian covers basic design patterns, optimization techniques, and data mining and machine learning solutions for problems in bioinformatics, genomics, statistics, and social network analysis. This book also includes an overview of MapReduce, Hadoop, and Spark. Topics include:
- Market basket analysis for a large set of transactions
- Data mining algorithms (K-means, KNN, and Naive Bayes)
- Using huge genomic data to sequence DNA and RNA
- Naive Bayes theorem and Markov chains for data and market prediction
- Recommendation algorithms and pairwise document similarity
- Linear regression, Cox regression, and Pearson correlation
- Allelic frequency and mining DNA
- Social network analysis (recommendation systems, counting triangles, sentiment analysis)

Hadoop Real-World Solutions Cookbook

Author: Jonathan R. Owens, Brian Femiano, Jon Lentz
Publisher: Packt Publishing Ltd
ISBN: 1849519137
Category: Computers
Page: 316
View: 9642
Realistic, simple code examples to solve problems at scale with Hadoop and related technologies.

Apache Oozie

The Workflow Scheduler for Hadoop
Author: Mohammad Kamrul Islam, Aravind Srinivasan
Publisher: "O'Reilly Media, Inc."
ISBN: 1449369758
Category: Computers
Page: 272
View: 2477
Get a solid grounding in Apache Oozie, the workflow scheduler system for managing Hadoop jobs. With this hands-on guide, two experienced Hadoop practitioners walk you through the intricacies of this powerful and flexible platform, with numerous examples and real-world use cases. Once you set up your Oozie server, you’ll dive into techniques for writing and coordinating workflows, and learn how to write complex data pipelines. Advanced topics show you how to handle shared libraries in Oozie, as well as how to implement and manage Oozie’s security capabilities.
- Install and configure an Oozie server, and get an overview of basic concepts
- Journey through the world of writing and configuring workflows
- Learn how the Oozie coordinator schedules and executes workflows based on triggers
- Understand how Oozie manages data dependencies
- Use Oozie bundles to package several coordinator apps into a data pipeline
- Learn about security features and shared library management
- Implement custom extensions and write your own EL functions and actions
- Debug workflows and manage Oozie’s operational details

Pig in Action

Munging Big Data
Author: M. Tim Jones
Publisher: Manning Publications Company
ISBN: 9781617291586
Category: Computers
Page: 325
View: 7170
It's notoriously difficult to query Hadoop data using standard Map/Reduce programming techniques. Pig and the Pig Latin scripting language provide a SQL-like platform that simplifies query construction against data sets in Hadoop, lowers the Map/Reduce barrier to entry, and opens the door to processing large data sets for casual users, including experimentation on data sets. And it stands up well under stress: Yahoo uses Pig for over half the queries it runs on the world's largest Hadoop cluster. Pig in Action introduces Pig and the Pig Latin language while teaching the fundamentals of big data processing. Readers will explore the intersection of business and data science as they walk through practical questions like executing standard queries, establishing automated data management processes and policies, and developing useful reports. Most importantly, they'll learn techniques to extract valuable insights from data while mastering the features of Pig.
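To illustrate the kind of simplification described above, here is a group-and-count operation, the sort of thing Pig Latin expresses in a statement or two, simulated in plain Python. The dataset and field names are invented, and the Pig Latin shown in the comments is only a sketch of what such a script might look like:

```python
from collections import Counter

# Records a Pig script might LOAD from HDFS: (user, query) pairs
records = [("alice", "hadoop"), ("bob", "pig"),
           ("alice", "hive"), ("bob", "pig")]

# Roughly equivalent Pig Latin (illustrative):
#   grouped = GROUP records BY user;
#   counts  = FOREACH grouped GENERATE group, COUNT(records);
counts = Counter(user for user, _ in records)
# counts == {"alice": 2, "bob": 2}
```

The point is not the Python itself but that Pig compiles such declarative grouping statements into the underlying Map/Reduce jobs for you.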

Web Scalability for Startup Engineers

Author: Artur Ejsmont
Publisher: McGraw Hill Professional
ISBN: 0071843663
Category: Computers
Page: 432
View: 7824
This invaluable roadmap for startup engineers reveals how to successfully handle web application scalability challenges to meet increasing product and traffic demands. Web Scalability for Startup Engineers shows engineers working at startups and small companies how to plan and implement a comprehensive scalability strategy. It presents a broad and holistic view of the infrastructure and architecture of a scalable web application. Successful startups often face the challenge of scalability, and the core concepts driving a scalable architecture are language- and platform-agnostic. The book covers scalability of HTTP-based systems (websites, REST APIs, SaaS, and mobile application backends), starting with a high-level perspective before taking a deep dive into common challenges and issues. This approach builds a holistic view of the problem, helping you see the big picture, and then introduces different technologies and best practices for solving the problem at hand. The book is enriched with the author's real-world experience and expert advice, saving you precious time and effort by learning from others' mistakes and successes.
- Language-agnostic approach addresses universally challenging concepts in web development and scalability; does not require knowledge of a particular language
- Fills the gap for engineers in startups and smaller companies who have limited means for getting to the next level in terms of accomplishing scalability
- Strategies presented help to decrease time to market and increase the efficiency of web applications

Professional Hadoop Solutions

Author: Boris Lublinsky, Kevin T. Smith, Alexey Yakubovich
Publisher: John Wiley & Sons
ISBN: 1118824180
Category: Computers
Page: 504
View: 3800
The go-to guidebook for deploying Big Data solutions with Hadoop. Today's enterprise architects need to understand how the Hadoop frameworks and APIs fit together, and how they can be integrated to deliver real-world solutions. This book is a practical, detailed guide to building and implementing those solutions, with code-level instruction in the popular Wrox tradition. It covers storing data with HDFS and HBase, processing data with MapReduce, and automating data processing with Oozie. Hadoop security, running Hadoop with Amazon Web Services, best practices, and automating Hadoop processes in real time are also covered in depth. With in-depth code examples in Java and XML and the latest on recent additions to the Hadoop ecosystem, this complete resource also covers the use of APIs, exposing their inner workings and allowing architects and developers to better leverage and customize them.
- The ultimate guide for developers, designers, and architects who need to build and deploy Hadoop applications
- Covers storing and processing data with various technologies, automating data processing, Hadoop security, and delivering real-time solutions
- Includes detailed, real-world examples and code-level guidelines
- Explains when, why, and how to use these tools effectively
- Written by a team of Hadoop experts in the programmer-to-programmer Wrox style
Professional Hadoop Solutions is the reference enterprise architects and developers need to maximize the power of Hadoop.

Hadoop Beginner's Guide

Author: Garry Turkington
Publisher: Packt Publishing Ltd
ISBN: 1849517304
Category: Computers
Page: 398
View: 6797
Data is arriving faster than you can process it, and the overall volumes keep growing at a rate that keeps you awake at night. Hadoop can help you tame the data beast. Effective use of Hadoop, however, requires a mixture of programming, design, and system administration skills. "Hadoop Beginner's Guide" removes the mystery from Hadoop, presenting Hadoop and related technologies with a focus on building working systems and getting the job done, using cloud services to do so when it makes sense. From basic concepts and initial setup through developing applications and keeping the system running as the data grows, the book gives you the understanding needed to effectively use Hadoop to solve real-world problems. Starting with the basics of installing and configuring Hadoop, the book explains how to develop applications, maintain the system, and use additional products to integrate with other systems. While covering different ways to develop applications that run on Hadoop, the book also introduces tools such as Hive, Sqoop, and Flume that show how Hadoop can be integrated with relational databases and log collection. In addition to examples on Hadoop clusters on Ubuntu, uses of cloud services such as Amazon EC2 and Elastic MapReduce are covered.

Hadoop Application Architectures

Designing Real-World Big Data Applications
Author: Mark Grover, Ted Malaska, Jonathan Seidman, Gwen Shapira
Publisher: "O'Reilly Media, Inc."
ISBN: 1491900059
Category: Computers
Page: 400
View: 2554
Get expert guidance on architecting end-to-end data management solutions with Apache Hadoop. While many sources explain how to use various components in the Hadoop ecosystem, this practical book takes you through architectural considerations necessary to tie those components together into a complete tailored application, based on your particular use case. To reinforce those lessons, the book’s second section provides detailed examples of architectures used in some of the most commonly found Hadoop applications. Whether you’re designing a new Hadoop application, or planning to integrate Hadoop into your existing data infrastructure, Hadoop Application Architectures will skillfully guide you through the process. This book covers:
- Factors to consider when using Hadoop to store and model data
- Best practices for moving data in and out of the system
- Data processing frameworks, including MapReduce, Spark, and Hive
- Common Hadoop processing patterns, such as removing duplicate records and using windowing analytics
- Giraph, GraphX, and other tools for large graph processing on Hadoop
- Using workflow orchestration and scheduling tools such as Apache Oozie
- Near-real-time stream processing with Apache Storm, Apache Spark Streaming, and Apache Flume
- Architecture examples for clickstream analysis, fraud detection, and data warehousing
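One of the processing patterns mentioned above, removing duplicate records, boils down to grouping records by a key and keeping one record per group. A minimal plain-Python sketch with made-up records follows; in a MapReduce job, the record id would serve as the shuffle key so each reducer sees all duplicates of one id together:

```python
# Deduplicate records by id, keeping the one with the newest timestamp.
records = [
    {"id": 1, "ts": 10, "val": "a"},
    {"id": 1, "ts": 20, "val": "b"},   # newer duplicate of id 1
    {"id": 2, "ts": 15, "val": "c"},
]

latest = {}
for rec in records:
    # Keep a record only if it's the first seen for its id, or newer
    if rec["id"] not in latest or rec["ts"] > latest[rec["id"]]["ts"]:
        latest[rec["id"]] = rec

deduped = sorted(latest.values(), key=lambda r: r["id"])
# deduped keeps id 1 -> "b" (newer) and id 2 -> "c"
```

At cluster scale the in-memory dict is replaced by the framework's shuffle, but the keep-the-newest decision per key is the same.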

Hadoop: The Definitive Guide

Author: Tom White
Publisher: "O'Reilly Media, Inc."
ISBN: 9780596551360
Category: Computers
Page: 528
View: 9801
Hadoop: The Definitive Guide helps you harness the power of your data. Ideal for processing large datasets, the Apache Hadoop framework is an open source implementation of the MapReduce algorithm on which Google built its empire. This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters. Complete with case studies that illustrate how Hadoop solves specific problems, this book helps you:
- Use the Hadoop Distributed File System (HDFS) for storing large datasets, and run distributed computations over those datasets using MapReduce
- Become familiar with Hadoop's data and I/O building blocks for compression, data integrity, serialization, and persistence
- Discover common pitfalls and advanced features for writing real-world MapReduce programs
- Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud
- Use Pig, a high-level query language for large-scale data processing
- Take advantage of HBase, Hadoop's database for structured and semi-structured data
- Learn ZooKeeper, a toolkit of coordination primitives for building distributed systems
If you have lots of data -- whether it's gigabytes or petabytes -- Hadoop is the perfect solution. Hadoop: The Definitive Guide is the most thorough book available on the subject. "Now you have the opportunity to learn about Hadoop from a master, not only of the technology, but also of common sense and plain talk." -- Doug Cutting, Hadoop Founder, Yahoo!

Learning Hadoop 2

Author: Garry Turkington, Gabriele Modena
Publisher: Packt Publishing Ltd
ISBN: 1783285524
Category: Computers
Page: 382
View: 462
If you are a system or application developer interested in learning how to solve practical problems using the Hadoop framework, then this book is ideal for you. You are expected to be familiar with the Unix/Linux command-line interface and have some experience with the Java programming language. Familiarity with Hadoop would be a plus.

Apache Flume: Distributed Log Collection for Hadoop - Second Edition

Author: Steve Hoffman
Publisher: Packt Publishing Ltd
ISBN: 1784399140
Category: Computers
Page: 178
View: 816
If you are a Hadoop programmer who wants to learn about Flume to be able to move datasets into Hadoop in a timely and replicable manner, then this book is ideal for you. No prior knowledge about Apache Flume is necessary, but a basic knowledge of Hadoop and the Hadoop File System (HDFS) is assumed.

Hadoop Operations

Author: Eric Sammer
Publisher: "O'Reilly Media, Inc."
ISBN: 1449327052
Category: Computers
Page: 282
View: 7776
For system administrators tasked with the job of maintaining large and complex Hadoop clusters, this book explains the particulars of Hadoop operations, from planning, installing, and configuring the system to providing ongoing maintenance.

A Guide to the Project Management Body of Knowledge (Pmbok Guide) -- Fifth Ed. (Arabic)

Author: Project Management Institute
Publisher: Project Management Institute
ISBN: 9781628250008
Category: Business & Economics
Page: 587
View: 2211
A Guide to the Project Management Body of Knowledge (PMBOK Guide), Fifth Edition, reflects the collaboration and knowledge of working project managers and provides the fundamentals of project management as they apply to a wide range of projects. This internationally recognized standard gives project managers the essential tools to practice project management and deliver organizational results. A 10th Knowledge Area has been added: Project Stakeholder Management expands upon the importance of appropriately engaging project stakeholders in key decisions and activities. Project data, information, and information flow have been redefined to bring greater consistency and be more aligned with the Data, Information, Knowledge and Wisdom (DIKW) model used in the field of Knowledge Management. Four new planning processes have been added: Plan Scope Management, Plan Schedule Management, Plan Cost Management, and Plan Stakeholder Management. These were created to reinforce the concept that eac

Big Data

Architecture, technologies, and methods for using large databases
Author: Alessandro Rezzani
Publisher: Maggioli Editore
ISBN: 8838789894
Category: Business & Economics
Page: 320
View: 9059
Every day, billions of digital data points are created around the world. This mass of information comes from the notable increase in devices that automate numerous operations, such as purchase transaction records and GPS signals from mobile phones, and from the Web: photos, videos, posts, articles, and digital content generated and shared by users through social media. Processing this "big data" requires computing power, technologies, and resources that go well beyond conventional data management and storage systems. The book explores the world of big data, offering a description and classification of it and presenting the opportunities that can arise from its use. It describes dedicated software and hardware solutions, devoting ample space to Open Source implementations and the main cloud offerings. It is therefore intended as an in-depth guide to the tools and technologies that enable the analysis and management of large quantities of data. The volume is aimed at those in universities and companies (database administrators, IT managers, Business Intelligence professionals) who want to explore big data topics in depth. It is also a valuable aid for company management in understanding how to obtain information usable in decision-making processes. Alessandro Rezzani teaches at Bocconi University in Milan. He is an expert in the design and implementation of Data Warehouses, ETL processes, multidimensional databases, and reporting solutions. He currently works on the design and implementation of Business Intelligence solutions at Factory Software. With Apogeo Education he published "Business Intelligence. Processi, metodi, utilizzo in azienda", 2012.

HBase in Action

Author: Nick Dimiduk, Amandeep Khurana
Publisher: Manning Publications
ISBN: 9781617290527
Category: Computers
Page: 334
View: 4969
Provides information on designing, building, and running applications using HBase.