Hadoop Operations


Author: Eric Sammer
Publisher: O'Reilly Media, Inc.
ISBN: 1449327052
Category: Computers
Page: 282
For system administrators tasked with the job of maintaining large and complex Hadoop clusters, this book explains the particulars of Hadoop operations, from planning, installing, and configuring the system to providing ongoing maintenance.

Hadoop Operations and Cluster Management Cookbook


Author: Shumin Guo
Publisher: Packt Publishing Ltd
ISBN: 1782165177
Category: Computers
Page: 368
Solve specific problems using individual self-contained code recipes, or work through the book to develop your capabilities. This book is packed with easy-to-follow code and commands for illustration, which makes the learning curve quick and easy. If you are a Hadoop cluster system administrator with Unix/Linux system management experience and you are looking for a good grounding in how to set up and manage a Hadoop cluster, then this book is for you. It assumes that you already have some experience with the Unix/Linux command line and are familiar with network communication basics.

Hadoop Operations


Author: Benjamin Bowen
Publisher: CreateSpace
ISBN: 9781503390676
Category:
Page: 156
Introduction
Data warehousing is a success, judging by its 25-year history of use across all industries. Business intelligence met the needs it was designed for: to give non-technical people within the organization access to important, shared data. During the same period that data warehousing and BI matured, the automation and instrumentation of almost all processes and activities changed the data landscape in most companies. Where there were only a few applications and minimal monitoring 25 years ago, today there is ubiquitous computing and data available about every activity. Data warehouses have not been able to keep up with business demands for new sources of information, new types of data, more complex analysis, and greater speed. Companies can put this data to use in countless ways, but for most it remains uncollected or unused, locked away in silos within IT. There has been a gradual maturing of data use in organizations. In the early days of BI, it was enough to provide access to core financial and customer transactions. Better access enabled process changes, and these led to the need for more data and more varied uses of information. These changes put increasing strain on information processing and delivery capabilities that were designed under assumptions of stability and common use. Most companies now have a backlog of new data and analysis requests that BI groups are struggling to meet. Big data is not simply about growing data volumes; it is also about the fact that the data being collected today differs in ways that make it unwieldy for conventional databases and BI tools. Big data is also about the new technologies developed to support the storage, retrieval, and processing of this new data. These technologies originated in the world of web applications and internet-based companies, but they are now spreading into enterprise applications of all sorts.
New technology coupled with new data enables new practices, such as real-time monitoring of operations across retail channels, supply chain practices at finer grain and faster speed, and analysis of customers at the level of individual activities and behaviors. Until recently, large-scale data collection and analysis capabilities like these would have required a Wal-Mart-sized investment, limiting them to large organizations. These capabilities are now available to all, regardless of company size or budget, and this is creating a rush to adopt big data technologies. As the use of big data grows, so will the need for data management. Many organizations already struggle to manage existing data, and big data adds complexity that will only increase the challenge. The combination of new data and new technology requires new data management capabilities and processes to capture the promised long-term value. Wal-Mart handles more than a million customer transactions each hour and imports those into databases estimated to contain more than 2.5 petabytes of data. Radio frequency identification (RFID) systems used by retailers and others can generate 100 to 1,000 times the data of conventional bar code systems. Each day, Facebook handles more than 250 million photo uploads and the interactions of 800 million active users with more than 900 million objects (pages, groups, etc.). More than 5 billion people are calling, texting, tweeting, and browsing on mobile phones worldwide. Organizations are inundated with data: terabytes and petabytes of it. To put this in context, 1 terabyte can hold 2,000 hours of CD-quality music, and 10 terabytes could store the entire US Library of Congress print collection. Exabytes, zettabytes, and yottabytes are on the horizon.
Data is pouring in from every conceivable direction: from operational and transactional systems, from scanning and facilities management systems, from inbound and outbound customer contact points, and from mobile media and the Web.

Learning Hadoop 2


Author: Garry Turkington, Gabriele Modena
Publisher: Packt Publishing Ltd
ISBN: 1783285524
Category: Computers
Page: 382
If you are a system or application developer interested in learning how to solve practical problems using the Hadoop framework, then this book is ideal for you. You are expected to be familiar with the Unix/Linux command-line interface and have some experience with the Java programming language. Familiarity with Hadoop would be a plus.

Expert Hadoop 2 Administration

Managing Spark, YARN, and MapReduce
Author: Sam R. Alapati
Publisher: Addison-Wesley Professional
ISBN: 0134703383
Category: Computers
Page: 848
This is the eBook of the printed book and may not include any media, website access codes, or print supplements that may come packaged with the bound book. The Comprehensive, Up-to-Date Apache Hadoop Administration Handbook and Reference. Series Editor Paul Dix writes: “Sam Alapati has worked with production Hadoop clusters for six years. His unique depth of experience has enabled him to write the go-to resource for all administrators looking to spec, size, expand, and secure production Hadoop clusters of any size.” In Expert Hadoop® Administration, leading Hadoop administrator Sam R. Alapati brings together authoritative knowledge for creating, configuring, securing, managing, and optimizing production Hadoop clusters in any environment. Drawing on his experience with large-scale Hadoop administration, Alapati integrates action-oriented advice with carefully researched explanations of both problems and solutions. He covers an unmatched range of topics and offers an unparalleled collection of realistic examples. Alapati demystifies complex Hadoop environments, helping you understand exactly what happens behind the scenes when you administer your cluster. You’ll gain unprecedented insight as you walk through building clusters from scratch and configuring high availability, performance, security, encryption, and other key attributes. The high-value administration skills you learn here will be indispensable no matter what Hadoop distribution you use or what Hadoop applications you run. With this book, you will learn how to understand Hadoop’s architecture from an administrator’s standpoint; create simple and fully distributed clusters; run MapReduce and Spark applications in a Hadoop cluster; manage and protect Hadoop data and high availability; work with HDFS commands, file permissions, and storage management; move data and use YARN to allocate resources and schedule jobs; manage job workflows with Oozie and Hue; secure, monitor, log, and optimize Hadoop; and benchmark and troubleshoot Hadoop.

Hadoop in 24 Hours, Sams Teach Yourself


Author: Jeffrey Aven
Publisher: Sams Publishing
ISBN: 0134456726
Category: Computers
Page: 496
Apache Hadoop is the technology at the heart of the Big Data revolution, and Hadoop skills are in enormous demand. Now, in just 24 lessons of one hour or less, you can learn all the skills and techniques you'll need to deploy each key component of a Hadoop platform in your local environment or in the cloud, building a fully functional Hadoop cluster and using it with real programs and datasets. Each short, easy lesson builds on all that's come before, helping you master all of Hadoop's essentials and extend it to meet your unique challenges. Apache Hadoop in 24 Hours, Sams Teach Yourself covers all this and much more: understanding Hadoop and the Hadoop Distributed File System (HDFS); importing data into Hadoop and processing it there; mastering basic MapReduce Java programming and using advanced MapReduce API concepts; making the most of Apache Pig and Apache Hive; implementing and administering YARN; taking advantage of the full Hadoop ecosystem; managing Hadoop clusters with Apache Ambari; working with the Hadoop User Experience (HUE); scaling, securing, and troubleshooting Hadoop environments; integrating Hadoop into the enterprise; deploying Hadoop in the cloud; and getting started with Apache Spark. Step-by-step instructions walk you through common questions, issues, and tasks; Q&As, quizzes, and exercises build and test your knowledge; "Did You Know?" tips offer insider advice and shortcuts; and "Watch Out!" alerts help you avoid pitfalls. By the time you're finished, you'll be comfortable using Apache Hadoop to solve a wide spectrum of Big Data problems.

Professional Hadoop Solutions


Author: Boris Lublinsky, Kevin T. Smith, Alexey Yakubovich
Publisher: John Wiley & Sons
ISBN: 1118824180
Category: Computers
Page: 504
The go-to guidebook for deploying Big Data solutions with Hadoop. Today's enterprise architects need to understand how the Hadoop frameworks and APIs fit together, and how they can be integrated to deliver real-world solutions. This book is a practical, detailed guide to building and implementing those solutions, with code-level instruction in the popular Wrox tradition. It covers storing data with HDFS and HBase, processing data with MapReduce, and automating data processing with Oozie. Hadoop security, running Hadoop with Amazon Web Services, best practices, and automating Hadoop processes in real time are also covered in depth. With in-depth code examples in Java and XML and the latest on recent additions to the Hadoop ecosystem, this complete resource also covers the use of APIs, exposing their inner workings and allowing architects and developers to better leverage and customize them. The book is the ultimate guide for developers, designers, and architects who need to build and deploy Hadoop applications; covers storing and processing data with various technologies, automating data processing, Hadoop security, and delivering real-time solutions; includes detailed, real-world examples and code-level guidelines; explains when, why, and how to use these tools effectively; and is written by a team of Hadoop experts in the programmer-to-programmer Wrox style. Professional Hadoop Solutions is the reference enterprise architects and developers need to maximize the power of Hadoop.

HDInsight Essentials - Second Edition


Author: Rajesh Nadipalli
Publisher: Packt Publishing Ltd
ISBN: 1784396664
Category: Computers
Page: 178
If you want to discover one of the latest tools designed to produce stunning Big Data insights, this book features everything you need to get to grips with your data. Whether you are a data architect, developer, or business strategist, HDInsight adds value in everything from development to administration and reporting.

Oracle Big Data Handbook


Author: Tom Plunkett, Brian Macdonald, Bruce Nelson, Mark Hornick, Helen Sun, Keith Laker, Khader Mohiuddin, Debra Harding, Gokula Mishra, David Segleau, Robert Stackowiak
Publisher: McGraw Hill Professional
ISBN: 0071827269
Category: Computers
Page: 464
"Cowritten by members of Oracle's big data team, [this book] provides complete coverage of Oracle's comprehensive, integrated set of products for acquiring, organizing, analyzing, and leveraging unstructured data. The book discusses the strategies and technologies essential for a successful big data implementation, including Apache Hadoop, Oracle Big Data Appliance, Oracle Big Data Connectors, Oracle NoSQL Database, Oracle Endeca, Oracle Advanced Analytics, and Oracle's open source R offerings"--Page 4 of cover.

Hadoop MapReduce v2 Cookbook - Second Edition


Author: Thilina Gunarathne
Publisher: Packt Publishing Ltd
ISBN: 1783285486
Category: Computers
Page: 322
If you are a Big Data enthusiast and wish to use Hadoop v2 to solve your problems, then this book is for you. This book is for Java programmers with little to moderate knowledge of Hadoop MapReduce. This is also a one-stop reference for developers and system admins who want to quickly get up to speed with using Hadoop v2. It would be helpful to have a basic knowledge of software development using Java and a basic working knowledge of Linux.

Real-World Hadoop


Author: Ted Dunning, Ellen Friedman
Publisher: O'Reilly Media, Inc.
ISBN: 1491928913
Category: Computers
Page: 104
If you’re a business team leader, CIO, business analyst, or developer interested in how Apache Hadoop and Apache HBase-related technologies can address problems involving large-scale data in cost-effective ways, this book is for you. Using real-world stories and situations, authors Ted Dunning and Ellen Friedman show Hadoop newcomers and seasoned users alike how NoSQL databases and Hadoop can solve a variety of business and research issues. You’ll learn about early decisions and pre-planning that can make the process easier and more productive. If you’re already using these technologies, you’ll discover ways to gain the full range of benefits possible with Hadoop. While you don’t need a deep technical background to get started, this book does provide expert guidance to help managers, architects, and practitioners succeed with their Hadoop projects. In this book, you will examine a day in the life of big data with India’s ambitious Aadhaar project; review tools in the Hadoop ecosystem, such as Apache Spark, Storm, and Drill, to learn how they can help you; pick up a collection of technical and strategic tips that have helped others succeed with Hadoop; learn from several prototypical Hadoop use cases, based on how organizations have actually applied the technology; and explore real-world stories that reveal how MapR customers combine use cases when putting Hadoop and NoSQL to work, including in production.

Programming MapReduce with Scalding


Author: Antonios Chalkiopoulos
Publisher: Packt Publishing Ltd
ISBN: 1783287020
Category: Computers
Page: 148
This book is an easy-to-understand, practical guide to designing, testing, and implementing complex MapReduce applications in Scala using the Scalding framework. It is packed with examples featuring log processing, ad targeting, and machine learning. This book is for developers who want to learn how to develop MapReduce applications effectively. Prior knowledge of Hadoop or Scala is not required; however, investing some time in those topics would certainly be beneficial.

Hadoop: The Definitive Guide

The Definitive Guide
Author: Tom White
Publisher: O'Reilly Media, Inc.
ISBN: 9780596551360
Category: Computers
Page: 528
Hadoop: The Definitive Guide helps you harness the power of your data. Ideal for processing large datasets, the Apache Hadoop framework is an open source implementation of the MapReduce algorithm on which Google built its empire. This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters. Complete with case studies that illustrate how Hadoop solves specific problems, this book helps you: use the Hadoop Distributed File System (HDFS) for storing large datasets, and run distributed computations over those datasets using MapReduce; become familiar with Hadoop's data and I/O building blocks for compression, data integrity, serialization, and persistence; discover common pitfalls and advanced features for writing real-world MapReduce programs; design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud; use Pig, a high-level query language for large-scale data processing; take advantage of HBase, Hadoop's database for structured and semi-structured data; and learn ZooKeeper, a toolkit of coordination primitives for building distributed systems. If you have lots of data, whether gigabytes or petabytes, Hadoop is the perfect solution. Hadoop: The Definitive Guide is the most thorough book available on the subject. "Now you have the opportunity to learn about Hadoop from a master, not only of the technology, but also of common sense and plain talk." -- Doug Cutting, Hadoop Founder, Yahoo!

Pro Website Development and Operations

Streamlining DevOps for large-scale websites
Author: Matthew Sacks
Publisher: Apress
ISBN: 1430239700
Category: Computers
Page: 124
Pro Website Development and Operations gives you the experience you need to create and operate a large-scale production website. Large-scale websites have their own unique set of design problems, and these problems can get worse when agile methodologies are adopted for rapid results. Managing large-scale websites, deploying applications, and ensuring they perform well often requires a full-scale team involving the development and operations sides of the company, two departments that don't always see eye to eye. When departments struggle with each other, it adds unnecessary complexity to the work, and the result shows in the customer experience. Pro Website Development and Operations shows you how to streamline the work of web development and operations, incorporating the latest insights and methodologies of DevOps, so that your large-scale website is up and running quickly, with little friction and extreme efficiency between divisions. This book provides critical knowledge for any developer engaged in delivering the business and software engineering goals required to create and operate a large-scale production website. It addresses how developers can collaborate effectively with business and engineering teams to ensure applications are smoothly transitioned from product inception to implementation, and are properly deployed and managed. Pro Website Development and Operations provides unique insights into how systems, code, and process can all work together to make large-scale website development and operations ultra-efficient.

Big data

La revolución de los datos masivos
Author: Viktor Mayer-Schönberger, Kenneth Cukier
Publisher: Turner
ISBN: 8415427816
Category: Computers
Page: N/A
An illuminating analysis of one of the great issues of our time, and of the immense impact it will have on the economy, science, and society in general. Big data represents a revolution that is already changing the way we do business, healthcare, politics, education, and innovation. Two leading experts in the field analyze what big data is, how it can change our lives, and what we can do to protect ourselves from its risks. A major essay, unique in Spanish and pioneering in its field, that anticipates a trend growing at a frenetic pace.

Planning for Big Data


Author: Edd Wilder-James
Publisher: O'Reilly Media, Inc.
ISBN: 1449329640
Category: Computers
Page: 83
In an age where everything is measurable, understanding big data is essential. From creating new data-driven products to increasing operational efficiency, big data has the potential to make your organization both more competitive and more innovative. As this emerging field transitions from the bleeding edge to enterprise infrastructure, it's vital to understand not only the technologies involved, but also the organizational and cultural demands of being data-driven. Written by O'Reilly Radar's experts on big data, this anthology describes: the broad industry changes heralded by the big data era; what big data is, what it means to your business, and how to start solving data problems; the software that makes up the Hadoop big data stack, and the major enterprise vendors' Hadoop solutions; the landscape of NoSQL databases and their relative merits; and how visualization plays an important part in data work.

Mastering Hadoop


Author: Sandeep Karanth
Publisher: Packt Publishing Ltd
ISBN: 1783983655
Category: Computers
Page: 374
Do you want to broaden your Hadoop skill set and take your knowledge to the next level? Do you wish to enhance your knowledge of Hadoop to solve challenging data processing problems? Are your Hadoop jobs, Pig scripts, or Hive queries not working as fast as you intend? Are you looking to understand the benefits of upgrading Hadoop? If the answer is yes to any of these, this book is for you. It assumes novice-level familiarity with Hadoop.

Pro Hadoop


Author: Jason Venner
Publisher: Apress
ISBN: 1430219424
Category: Computers
Page: 440
You’ve heard the hype about Hadoop: it runs petabyte-scale data mining tasks insanely fast, it runs gigantic tasks on clouds for absurdly cheap, it’s been heavily committed to by tech giants like IBM, Yahoo!, and the Apache Project, and it’s completely open source (thus free). But what exactly is it, and more importantly, how do you even get a Hadoop cluster up and running? From Apress, the name you’ve come to trust for hands-on technical knowledge, Pro Hadoop brings you up to speed on Hadoop. You learn the ins and outs of MapReduce; how to structure a cluster, design, and implement the Hadoop file system; and how to build your first cloud-computing tasks using Hadoop. Learn how to let Hadoop take care of distributing and parallelizing your software: you just focus on the code, and Hadoop takes care of the rest. Best of all, you’ll learn from a tech professional who’s been in the Hadoop scene since day one. Written from the perspective of a principal engineer with down-in-the-trenches knowledge of what can go wrong with Hadoop, this book shows you how to avoid the common, expensive first errors that everyone makes when creating their own Hadoop system or inheriting someone else’s. Skip the novice stage and the expensive, hard-to-fix mistakes, and go straight to seasoned pro on the hottest cloud-computing framework with Pro Hadoop. Your productivity will blow your managers away.
What you’ll learn: Set up a stand-alone Hadoop cluster the smart way, laid out simply and step by step so you can get up and running quickly to build your next data center, collaborative, data-intensive Internet services application, Software as a Service (SaaS), and more. Optimize your Hadoop production tasks like an experienced pro. Work with time-proven, bulletproof standard patterns that have been tested and debugged in high-volume production. Understand just enough theory to know why something works in Hadoop, without getting bogged down in abstruse walls of theory. Get detailed explanations of not only how to do something with Hadoop, but also why, from a front-line coder with years in the Hadoop game. Turn someone else’s expensive cluster-wide “wrong” into an orderly, productive “right” with professional-level debugging and testing.
Who this book is for: IT professionals interested in investigating Hadoop and implementing it in their organizations, and existing Hadoop users who want to deepen their professional toolkits.
Table of Contents: Getting Started with Hadoop Core; The Basics of a MapReduce Job; The Basics of Multimachine Clusters; HDFS Details for Multimachine Clusters; MapReduce Details for Multimachine Clusters; Tuning Your MapReduce Jobs; Unit Testing and Debugging; Advanced and Alternate MapReduce Techniques; Solving Problems with Hadoop; Projects Based On Hadoop and Future Directions.

YARN Essentials


Author: Amol Fasale, Nirmal Kumar
Publisher: Packt Publishing Ltd
ISBN: 1784397725
Category: Computers
Page: 176
View: 5945
DOWNLOAD NOW »
If you have a working knowledge of Hadoop 1.x but want to start afresh with YARN, this book is ideal for you. You will be able to install and administer a YARN cluster and also discover the configuration settings to fine-tune your cluster both in terms of performance and scalability. This book will help you develop, deploy, and run multiple applications/frameworks on the same shared YARN cluster.

Applied Big Data Analytics in Operations Management


Author: Manish Kumar
Publisher: IGI Global
ISBN: 1522508872
Category: Business & Economics
Page: 251
View: 1132
DOWNLOAD NOW »
Operations management is a tool by which companies can effectively meet customers’ needs using the least amount of resources necessary. With the emergence of sensors and smart metering, big data is becoming an intrinsic part of modern operations management. Applied Big Data Analytics in Operations Management enumerates the challenges of using big data in operations management, along with creative solutions and tools for addressing them. Outlining revolutionary concepts and applications that help businesses predict customer behavior, along with applications of artificial neural networks, predictive analytics, and opinion mining to business management, this comprehensive publication is ideal for IT professionals, software engineers, business professionals, managers, and students of management.