Abstract—With the rapid development of multimedia hardware and the internet, the number of videos online keeps increasing. Millions of videos are generated and published each day, and this huge volume contains a large number of copied or duplicate videos. Analyzing such a large amount of data is difficult. Hadoop is a distributed computing platform designed for deployment on inexpensive hardware and suited to applications with large data sets. In this paper, we attempt to develop a video search engine based on Hadoop, using two algorithms: Brightness Sequence and TIRI-DCT.

Index Terms—HDFS, Hadoop, Video copy detection, MapReduce, Python

I. INTRODUCTION

A video search engine finds the videos in a data set that are similar to a given input video. This can be done by calculating hash values of the video content with appropriate algorithms. Because of the large amount of data, performing video copy detection on a single machine is too time consuming. A distributed computing platform allocates the computation across several computers, which improves performance and efficiency. Hadoop is a distributed computing platform that uses the MapReduce model. Two algorithms, Brightness Sequence and TIRI-DCT, are implemented, which are more efficient than other video copy detection algorithms.

II. RELATED WORKS

A. Introduction to Hadoop Platform

Hadoop was developed by the Apache Software Foundation. It uses the MapReduce programming model to develop distributed applications. Hadoop offers high fault tolerance, high throughput, and easy scalability. The Hadoop ecosystem mainly includes MapReduce, HDFS, Pig, Hive, HBase, and ZooKeeper.

MapReduce is the programming model of Hadoop and consists of a Map function and a Reduce function. Users only need to implement these two functions to write a distributed program. A MapReduce process is called a job.
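As a rough illustration of the two user-supplied functions, the following single-process Python sketch mimics the map, grouping, and reduce stages of a job. It does not use Hadoop itself; `run_job`, `count_map`, `count_reduce`, and the sample records are hypothetical names chosen for this example only.

```python
from collections import defaultdict


def run_job(records, map_fn, reduce_fn):
    """Simulate one MapReduce job in a single process (no Hadoop involved)."""
    intermediate = defaultdict(list)
    # Map phase: each input record yields zero or more (key, value) pairs.
    for record in records:
        for key, value in map_fn(record):
            # Grouping step: collect intermediate values that share a key.
            intermediate[key].append(value)
    # Reduce phase: combine all values of one key into a single result.
    return {key: reduce_fn(key, values) for key, values in intermediate.items()}


def count_map(record):
    """Hypothetical map function: emit (video name, 1) for every frame record."""
    video_name, _frame = record
    yield video_name, 1


def count_reduce(_key, values):
    """Hypothetical reduce function: add up the counts emitted for one video."""
    return sum(values)


# Made-up input: one (video name, frame) record per frame.
frames = [("a.mp4", "frame0"), ("b.mp4", "frame0"), ("a.mp4", "frame1")]
frame_counts = run_job(frames, count_map, count_reduce)
print(frame_counts)
```

In a real Hadoop job the grouping step is performed by the framework across machines; only the two functions passed to `run_job` correspond to code the user writes.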
The Map function processes an input split and passes its result to the Reduce function; the Reduce function combines the outputs of the Map function. A job is carried out jointly by a JobTracker and several TaskTrackers: the JobTracker schedules jobs and the TaskTrackers execute the tasks. The Map function takes a key-value pair as input and generates a set of intermediate key-value pairs. The MapReduce framework passes the intermediate pairs generated by the Map function on every TaskTracker to the Reduce function, which combines them to produce a smaller set of key-value pairs.

B. Video Copy Detection

In this paper, video copy detection is divided into two parts. First, the features of the videos are extracted and a hash library is formed. Second, the features of the query video are extracted and its hash values are compared with the hash library to find similar videos. Hash value calculation is used to extract the video features. A video hash value is a sequence of characters that expresses the unique features of the video content. In this paper, two hash values are generated by two algorithms, Brightness Sequence and TIRI-DCT.

III. PLATFORM DESIGN

Video copy detection is divided into two parts. In the first part, training videos are uploaded to HDFS and the hash library is generated. OpenCV is used to convert the videos into frames, and the hash values are calculated from the generated images. The hash values are stored in HDFS to form the hash library. In the second part, the query video is converted into frames and its hash values are generated. These hash values are compared with the hash library.

A. Design of MapReduce

The MapReduce design consists of writing the map function and the reduce function. The input of the map function is a key-value pair whose key is the video name and whose value is the hash value. In the reduce function, the query video is compared with the hash library.

IV. VIDEO HASH VALUE GENERATION

A. Brightness Sequence

The video is divided into frames.
The video sequence is X = (X_1, X_2, ..., X_N), where N is the number of frames. The hash value of each frame is generated by the following steps.

Step 1: Each frame is divided into blocks.
Step 2: The brightness of each block is calculated.
Step 3: The brightness value of each block is used as the hash value.

B. TIRI-DCT

A TIRI is a representative image that captures the time-domain information of a video segment; its role is similar to that of a key frame. In this paper, every 30 frames of the video are combined by a weighted sum to obtain the representative frame of that video segment. A DCT is then applied to the representative frame to obtain DCT coefficients, and these coefficients are converted to the final hash values through a threshold.

Step 1: The representative frame is divided into blocks.
Step 2: The DCT is applied to each block, and the DCT coefficients of each block are calculated.
Step 3: The mean of these values is calculated. If a coefficient is greater than the mean, it is added to the hash sequence.

C. Matching

Consider two hash sequences H_A and H_B:

H_A = (H_A1, ..., H_AN)
H_B = (H_B1, ..., H_BN)

where H_A and H_B are the hash values of the query video A and the hash library B, respectively. The Hamming distance between the two hash values is calculated. If the Hamming distance is less than a threshold value, the frames are considered the same.

V. CONCLUSION

Analyzing a large volume of video data on the Hadoop platform is more efficient than on a single machine. The Brightness Sequence and TIRI-DCT algorithms are implemented on the Hadoop platform.

Acknowledgment

We express our sincere thanks to our teachers and other staff for their invaluable support. We would like to express our deep gratitude to our guide, Mrs. Sneha K, Asst. Professor, Department of Computer Science and Engineering, for her encouragement and guidance in the successful completion of this paper. We would also like to thank our parents and friends for their blessings and moral support.
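As a supplementary illustration of Section IV, the block-brightness hashing of part A and the Hamming-distance matching of part C can be sketched in plain Python. This is only a minimal sketch under stated assumptions, not the implementation used on the platform: frames are taken to be small grayscale images stored as lists of pixel rows, and each block's brightness is turned into one bit by comparing it with the frame average, a binarisation assumed here so that a Hamming distance is defined. The function names, the 2x2 block grid, and the threshold of 2 are hypothetical.

```python
def block_means(frame, rows, cols):
    """Mean brightness of each block of a grayscale frame (list of pixel rows)."""
    height, width = len(frame), len(frame[0])
    # Block size; pixels left over at the right and bottom edges are ignored.
    bh, bw = height // rows, width // cols
    means = []
    for r in range(rows):
        for c in range(cols):
            block = [frame[y][x]
                     for y in range(r * bh, (r + 1) * bh)
                     for x in range(c * bw, (c + 1) * bw)]
            means.append(sum(block) / len(block))
    return means


def frame_hash(frame, rows=2, cols=2):
    """Assumed binarisation: a bit is 1 when its block is brighter than average."""
    means = block_means(frame, rows, cols)
    overall = sum(means) / len(means)
    return [1 if m > overall else 0 for m in means]


def hamming_distance(hash_a, hash_b):
    """Number of positions at which two equal-length hashes differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return sum(a != b for a, b in zip(hash_a, hash_b))


def is_same_frame(hash_a, hash_b, threshold=2):
    """Frames match when the distance is below the (hypothetical) threshold."""
    return hamming_distance(hash_a, hash_b) < threshold


# Made-up 4x4 frames: a query, a slightly brighter copy, and an unrelated frame.
query = [[200, 200, 10, 10], [200, 200, 10, 10],
         [10, 10, 200, 200], [10, 10, 200, 200]]
copy = [[pixel + 5 for pixel in row] for row in query]
other = [[210 - pixel for pixel in row] for row in query]

query_hash = frame_hash(query)
print(is_same_frame(query_hash, frame_hash(copy)))
print(is_same_frame(query_hash, frame_hash(other)))
```

Comparing each block with the frame average makes the bits insensitive to a uniform brightness shift, which is why the brightened copy still matches the query in this toy example.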
July 24, 2019