Introduction To Big Data & Hadoop :
Big data is the data produced by many different applications, such as social networking sites, transport systems, search engines etc.
Big data is usually characterized by three properties: variety, volume and velocity of data.
- Variety of Data :
- Structured
Eg : RDBMS tables etc.
- Semi-structured
Eg : XML and Excel documents
- Unstructured
Eg : PDFs, media files, logs etc.
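The three varieties above can be made concrete with a short sketch (Python used purely for illustration; the sample data is made up):

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: a fixed schema, like a row exported from an RDBMS table
structured = io.StringIO("id,name\n1,Alice\n2,Bob\n")
rows = list(csv.DictReader(structured))

# Semi-structured: tags describe the data, but the shape can vary per record
semi = ET.fromstring("<users><user id='1'>Alice</user></users>")
names = [u.text for u in semi.iter("user")]

# Unstructured: no schema at all; only free-form processing (search, NLP) applies
unstructured = "server started ... user Alice logged in ..."
mentions_alice = "Alice" in unstructured

print(rows[0]["name"], names[0], mentions_alice)
```

The further right you go in this list, the less the data's structure is known in advance, which is exactly why big-data systems cannot rely on a fixed relational schema.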
- Volume of Data :
Data storage is growing exponentially; datasets nowadays are measured in petabytes (PB) and exabytes (EB).
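To give these units a sense of scale, here is a small illustrative calculation (binary units, powers of 1024, are assumed):

```python
# Data-volume units as powers of 1024 bytes
KB, MB, GB, TB, PB, EB = (1024 ** n for n in range(1, 7))

# For example: how many 1 TB disks does one petabyte of data fill?
disks_per_pb = PB // TB
print(disks_per_pb)  # 1024

# And an exabyte is 1024 petabytes
print(EB // PB)  # 1024
```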
- Velocity of Data :
The rate at which new data is produced every second is called its velocity.
Hadoop is an open-source implementation for handling big data. It provides distributed storage and the processing of large amounts of data across multiple nodes of computers. Its main feature is: “Partitioning of data across many hosts and the execution of application computations in parallel.”
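The idea of partitioning data and computing on the partitions in parallel can be sketched with a toy MapReduce-style word count (this is only an illustration of the concept, not the Hadoop API; threads stand in for the cluster's hosts):

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def map_partition(lines):
    """Map phase: count words within one partition, independently of the others."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def word_count(lines, partitions=3):
    # Partition the input across "hosts" (here, just threads)
    chunks = [lines[i::partitions] for i in range(partitions)]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        partials = pool.map(map_partition, chunks)
    # Reduce phase: merge the per-partition results into one answer
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

counts = word_count(["big data", "big hadoop", "data data"])
print(counts["data"])  # 3
```

Because each partition is processed independently, adding more hosts lets the map phase scale out, which is the core of Hadoop's parallelism.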
HDFS : abbreviated as Hadoop Distributed File System.
It is designed to store very large datasets and to stream them to applications at a high, sustained data rate.
HDFS stores filesystem metadata and application data separately.
Filesystem metadata is stored on a server called the namenode. The namenode also executes filesystem operations such as opening, closing and renaming files and directories.
Application data is stored on servers called datanodes. Datanodes perform read/write operations on clients’ requests, and also perform block creation, deletion and replication according to instructions from the namenode.
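The namenode/datanode split described above can be modelled in a few lines (a toy model only, not the real HDFS API; the block size and replication factor below are assumptions for the sketch):

```python
REPLICATION = 2   # assumed replication factor for this sketch

class DataNode:
    """Holds actual block bytes; knows nothing about whole files."""
    def __init__(self, name):
        self.name = name
        self.blocks = {}          # block_id -> bytes

class NameNode:
    """Holds metadata only: which blocks make up which file."""
    def __init__(self, datanodes):
        self.datanodes = datanodes
        self.files = {}           # filename -> [block_id, ...]

    def write(self, filename, data, block_size=4):
        """Split data into blocks; each block is replicated across datanodes."""
        block_ids = []
        for i in range(0, len(data), block_size):
            block_id = f"{filename}#{i // block_size}"
            # The namenode chooses the datanodes; the datanodes store the bytes
            for dn in self.datanodes[:REPLICATION]:
                dn.blocks[block_id] = data[i:i + block_size]
            block_ids.append(block_id)
        self.files[filename] = block_ids

    def read(self, filename):
        """Reassemble a file by fetching each block from some datanode holding it."""
        out = b""
        for block_id in self.files[filename]:
            dn = next(d for d in self.datanodes if block_id in d.blocks)
            out += dn.blocks[block_id]
        return out

nn = NameNode([DataNode("dn1"), DataNode("dn2"), DataNode("dn3")])
nn.write("f.txt", b"hello hdfs")
print(nn.read("f.txt"))  # b'hello hdfs'
```

Notice that the namenode never touches the file's bytes, only block IDs, which is why metadata and application data can scale independently.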
HDFS also provides file permissions and authentication.