Fun Computer Vision opencv tutorials and ..

OpenCV 3.2 and Deep Regression Networks


OpenCV 3.2 is out, and I am just checking the change log. What is inside, from my point of view?
Tons of the improvements are mainly in the opencv_contrib modules. The GitHub fork is here.

There are several things that should be mentioned. For example, the GOTURN tracker is also part of the opencv_contrib fork, under the tracking module. GOTURN is a tracking algorithm based on a convolutional neural network.

More information can be found in Learning to Track at 100 FPS with Deep Regression Networks.

Deep Regression Networks (GOTURN basic information)

In contrast to online learning (for example the first version of TLD, which uses warped patches of negative examples and geometric transformations of the positive sample), the proposed DRN (Deep Regression Network) uses a pretrained feed-forward network without online learning. The authors pretrain the network on many different video samples, which they call generic object tracking. Let's say there is a video sample with a car. Build the positive sample in the sense that the neural network tracker hits the target in the next frame, and the negative samples the opposite way. Collect many video samples with different kinds of objects, and simply penalize the network when it makes a mistake in tracking and do the opposite when the tracking is OK.
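
To make the "penalize the network when it makes a mistake" part a bit more concrete, here is a minimal sketch of a penalty on a predicted bounding box. It assumes the box is described by its top-left and bottom-right corners and uses an L1 distance, which is the loss the GOTURN paper reports. The `Box` struct and the `l1BoxLoss` function are names made up for this illustration; they are not part of OpenCV.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical box type for this sketch:
// top-left corner (x1, y1) and bottom-right corner (x2, y2).
struct Box {
    double x1, y1, x2, y2;
};

// L1 penalty between the box predicted for the current frame and the ground truth.
// It is zero when the tracker hits the target exactly and grows with every
// pixel the corners are off.
double l1BoxLoss(const Box& predicted, const Box& truth) {
    // The ground truth is expected to be a valid box.
    assert(truth.x2 >= truth.x1 && truth.y2 >= truth.y1);
    return std::fabs(predicted.x1 - truth.x1) + std::fabs(predicted.y1 - truth.y1) +
           std::fabs(predicted.x2 - truth.x2) + std::fabs(predicted.y2 - truth.y2);
}
```

During training, a value like this would be computed for each pair of consecutive frames and pushed back through the network. That part is left out here.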

The idea is simple as hell, and it makes sense. I have to try it. The network should penalize samples that are not connected in space, for example samples taken from frames that are not consecutive, samples with weird behaviour, and samples with strange similarities in object appearance.
Let the neural network learn what tracking is, how to track, what optical flow is, and the relations between the frames and the object itself. This is not a bad idea. I have to try it.
Yes, there are already lots of detectors based on neural networks and other great ideas. I really want to know more about this method and maybe try to implement it.
Makes sense.

Built from convolutional neural networks

The architecture of the network consists of two pretrained convolutional neural networks. One processes and judges the features of the previous frame, and the second one evaluates and judges the current frame. The outputs of both convolutional networks are joined into one input vector for the fully connected layers, which evaluate the relation between the object in the previous and the current video frame. There is nothing new from the technical perspective, just an idea applied to tracking.
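
A toy sketch of the "two branches joined into one vector" step may help. It assumes each convolutional branch has already produced a plain feature vector; the numbers, the `Vec` alias and both function names are invented for this illustration and say nothing about the real layer sizes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Join the features of the previous-frame branch and the current-frame branch
// into one input vector for the fully connected part.
Vec concatFeatures(const Vec& previous, const Vec& current) {
    Vec joined(previous);
    joined.insert(joined.end(), current.begin(), current.end());
    return joined;
}

// One fully connected layer without activation:
// out[i] = bias[i] + sum over j of weights[i][j] * in[j].
Vec fullyConnected(const std::vector<Vec>& weights, const Vec& bias, const Vec& in) {
    assert(weights.size() == bias.size());
    Vec out(bias);
    for (std::size_t i = 0; i < weights.size(); ++i) {
        assert(weights[i].size() == in.size());
        for (std::size_t j = 0; j < in.size(); ++j) {
            out[i] += weights[i][j] * in[j];
        }
    }
    return out;
}
```

In a tracker, the last fully connected layer would have four outputs, one per box coordinate. Here the output size is simply whatever the weight matrix says.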

The conclusion is that I want to try selecting positive and negative samples from connected frames to teach a neural network how to track.

Makes sense?

Amazon Go driven by modern computer vision, sensor fusion and deep learning

In my eyes this is a brilliant idea. Just walk in, pick up whatever you want and leave. Get milk or a piece of bread and leave. Everything is automatically charged to your online account. What else? You can comfortably put the stuff back and go. You pay only for what you take away. Great. And what about the data collected behind the scenes? This is a huge opportunity to optimize the process of selling a product, or even better the whole shopping cart, to a concrete person. The wow effect is right there.


Data about customers 

I would like to talk mainly about this topic. In late 2013, my friends and I came up with the idea of gathering statistics from security cameras in retail stores. The goal is to recognize the interests of your customers and to optimize offers, product placement and more based on the statistics taken from the cameras. By comparing people's interest and behavior in segments (places) of the store with all the other data, like sales, promotions and advertising placement, you can find interesting dependencies in the data. This data is valuable for answering how to change the environment, the product placement and also the prices to increase revenue.

Amazon Go deals with the idea in a different way. It approaches all of this from a different and better perspective: we mainly want to bring something valuable to our customers.

No checkout required shopping.

You will never wait in line again.

Behind all these benefits there is also a huge amount of data for transforming and optimizing the environment and the indoor and outdoor advertisement. Track the goals. Experiment with the advertisement to bring more people to a concrete product, into the store, and much more than we can imagine.

I love this idea from both perspectives.

Amazon Go computer vision versus RFID

RFID gates that count what you take away are already here. They are not very widely used, and their main purpose is security. You can also track customers and product movement in the environment based on RFID.
Why is RFID-based checkout not so widely used? You still need to pay somehow, so it is not truly checkout without a line as in the case of Amazon Go.
But the main reason should be that the statistics about customers are more valuable when you use sensor fusion and can optimize really everything. That makes it more valuable for retailers and traders themselves to invest in this technology and expand it as much as possible.

This all makes sense in a time of big data, data-driven everything, and faster progress in deep and machine learning than ever before.

How does Amazon Go work

A system based purely on video from security cameras is not enough. Human vision without any other senses is not enough either. Basically, the feature space for recognizing actions and products and for tracking concrete customers needs better features, taken from more than one sensor. This technique is called sensor fusion. There is also deep learning behind it, and other not so interesting stuff :). There are lots of articles about this topic, describing the Amazon patents and technical estimates of the technology.

I would like to point out something different: some of my own estimates and speculations.

Amazon Go, cloud and deep learning

Amazon is strong in cloud computing. They are able to scale learning to large amounts of data on their own platform.
The point is that the evaluation of the learned model with multiple sensor inputs should be managed locally. Let's say you collect data from all available sensors and calculate on site what is necessary for the service. You also collect the features and send them back to the cloud to improve the model and learn a better one. Learn the model from data taken from many stores and from different kinds of situations. Evaluate the model and push the increment back to improve the technology.

Even better, provide a cloud service that distributes the learned model to anyone who uses the same sensor configuration as Amazon. This Amazon Go as a Service for retail is potentially a huge market. Really huge.

Go amazon go

LBP cascade for head and people detection in OpenCV

LBP cascades, free to download, for detecting people and heads in OpenCV. Code example and cascade description. All you need to write your own people and head detector like the one in the YouTube video.
The cascades are trained on my own people and head datasets. They are not perfect, but in some cases they are better than the default OpenCV cascades. They are just different. For example, you can count on the head detector having many more false detections than the people detector. The shape and feature space of a head is much more common and closer to other shapes than that of a whole person.
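
One possible way to live with the extra false detections of the head cascade is to keep only the heads that sit in the upper part of some detected person. This is an assumption for illustration, not something the cascades do themselves. The `Box` struct stands in for `cv::Rect` so the sketch compiles without OpenCV, and `headBelongsTo`, `filterHeads` and the `topFraction` value of 0.35 are made-up names and numbers.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for cv::Rect so the sketch compiles without OpenCV.
struct Box {
    int x, y, width, height;
};

// True when the center of `head` lies in the top `topFraction` of `person`.
bool headBelongsTo(const Box& head, const Box& person, double topFraction) {
    const double cx = head.x + head.width / 2.0;
    const double cy = head.y + head.height / 2.0;
    return cx >= person.x && cx <= person.x + person.width &&
           cy >= person.y && cy <= person.y + person.height * topFraction;
}

// Keep only the heads that are supported by at least one person detection.
std::vector<Box> filterHeads(const std::vector<Box>& heads,
                             const std::vector<Box>& people,
                             double topFraction = 0.35) {
    assert(topFraction > 0.0 && topFraction <= 1.0);
    std::vector<Box> kept;
    for (const Box& head : heads) {
        for (const Box& person : people) {
            if (headBelongsTo(head, person, topFraction)) {
                kept.push_back(head);
                break;  // one supporting person is enough
            }
        }
    }
    return kept;
}
```

The trade-off is that a correct head is thrown away whenever the people detector misses the body, so this only helps when the people cascade is the more reliable of the two.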


Issues with OpenCV detectMultiScale head and people detector

Please let me know if the cascades work as expected. In the code example there are basic threshold settings and recommendations.

LBP cascade head detection properties

Sure, you can find them inside the file.

This is just a basic 16-stage LBP cascade head detector developed by
V.K. from
<?xml version="1.0"?>

New LBP cascade people detection properties

This is just a basic 10-stage LBP cascade head detector developed by
V.K. from
<?xml version="1.0"?>

Opencv cascade for car detection conditions of use

Also, do not worry about the conditions of use. Use it only at your own risk. That's it. The dataset used to train this cascade is only mine; I also collected the positive and negative data myself. I just want to say that there are no conditions coming from the datasets either. There are no other conditions of use. Maybe check the OpenCV traincascade utility. Thanks. Yes, share and cite. Just a small, minimal condition, like sending me a million dollars. It's up to you.

Head cascade and people cascade download link.

Head detection download link

People detection download link

Head and people detection tutorial code

You can simply prepare the project inside Visual Studio 2015 using NuGet packages. This approach is easy for beginners and better than the standard installation with all the environment variable problems. Just follow the installation steps here.

This code is based on my previous tutorial, Fast people detection.
Inside the video capture loop, just modify the part like this. Include the cascades in your project and play with different settings. These are not ideal.

// At the top of the file:
#include <opencv2/opencv.hpp>
#include <iostream>
#include <string>
#include <vector>
using namespace cv;
using namespace std;

// Names of my downloaded cascades
 string cascadeHead = "cascadeH5.xml";
 string cascadeName = "cascadG.xml";

// Load the cascades (better done once, before the capture loop)
 CascadeClassifier detectorBody;
 bool loaded1 = detectorBody.load(cascadeName);
 CascadeClassifier detectorHead;
 bool loaded2 = detectorHead.load(cascadeHead);
 if (!loaded1 || !loaded2) {
  cerr << "Could not load the cascade files" << endl;
 }
// img is the current BGR frame from the capture loop.
// Save the original colored frame before img is made gray,
// the rectangles are drawn back onto this colored copy
 Mat original = img.clone();
// Prepare vectors for the results
 vector<Rect> human;
 vector<Rect> head;
// Prepare the gray image
 cvtColor(img, img, COLOR_BGR2GRAY);
// Equalize the histogram
 equalizeHist(img, img);
// Detect bodies and heads in img
// Set the proper min and max size for your case
 detectorBody.detectMultiScale(img, human, 1.04, 4, 0 | 1, Size(30, 80), Size(80, 200));
 detectorHead.detectMultiScale(img, head, 1.1, 4, 0 | 1, Size(40, 40), Size(100, 100));

// Draw the detected people
 for (size_t gg = 0; gg < human.size(); gg++) {
  rectangle(original, human[gg].tl(), human[gg].br(), Scalar(0, 0, 255), 2, 8, 0);
 }

// Draw the detected heads
 for (size_t gg = 0; gg < head.size(); gg++) {
  rectangle(original, head[gg].tl(), head[gg].br(), Scalar(0, 0, 255), 2, 8, 0);
 }
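
To get a feeling for the third argument of detectMultiScale (1.04 for bodies, 1.1 for heads), it helps to count how many window sizes fit between the min and max size. The sketch below is only an approximation of how a cascade detector steps through window sizes; the real OpenCV implementation may differ in details such as rounding and image-size limits. The training window of the cascades is not stated in this post, so `windowW` and `windowH` are placeholders, and `countScales` is a made-up helper name.

```cpp
#include <cassert>
#include <cmath>

// Rough count of the window sizes a cascade would try between the min and max size.
// windowW x windowH is the (assumed) training window of the cascade.
int countScales(int windowW, int windowH, double scaleFactor,
                int minW, int minH, int maxW, int maxH) {
    assert(scaleFactor > 1.0);  // otherwise the window would never grow
    assert(windowW > 0 && windowH > 0);
    int count = 0;
    for (double factor = 1.0; ; factor *= scaleFactor) {
        const long w = std::lround(windowW * factor);
        const long h = std::lround(windowH * factor);
        if (w > maxW || h > maxH) break;     // grew past the largest allowed object
        if (w < minW || h < minH) continue;  // still below the smallest allowed object
        ++count;
    }
    return count;
}
```

With an assumed 20x20 training window and the head limits from the code (40x40 to 100x100), a scale factor of 1.1 gives 9 window sizes, and 1.04 gives noticeably more. More sizes mean a finer search but also more time per frame, so the scale factor is the first setting to play with when the detector is too slow.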


OpenCV Kalman filter example: video head tracking

Example of a Kalman filter in OpenCV with head detection and tracking.
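
Until the tutorial code is out, here is a minimal sketch of the idea behind the filter for a single coordinate, for example the x position of a head center. It assumes a constant-velocity model and is written by hand so it compiles without OpenCV. OpenCV has its own `cv::KalmanFilter` class, which this sketch does not use. The `Kalman1D` struct and the noise values in the usage note are illustrative assumptions.

```cpp
#include <cassert>

// Constant-velocity Kalman filter for one coordinate.
// State: position p and velocity v. Only the position is measured.
struct Kalman1D {
    double p = 0.0, v = 0.0;                            // state estimate
    double P00 = 1.0, P01 = 0.0, P10 = 0.0, P11 = 1.0;  // state covariance
    double q;                                           // process noise (added to the diagonal)
    double r;                                           // measurement noise variance

    Kalman1D(double processNoise, double measurementNoise)
        : q(processNoise), r(measurementNoise) {}

    // Move the state forward by dt: P = F P F^T + Q with F = [1 dt; 0 1].
    void predict(double dt) {
        assert(dt > 0.0);
        p += v * dt;
        const double n00 = P00 + dt * (P10 + P01) + dt * dt * P11 + q;
        const double n01 = P01 + dt * P11;
        const double n10 = P10 + dt * P11;
        const double n11 = P11 + q;
        P00 = n00; P01 = n01; P10 = n10; P11 = n11;
    }

    // Correct the state with a measured position z (H = [1 0]).
    void update(double z) {
        const double s = P00 + r;   // innovation variance
        const double k0 = P00 / s;  // gain for the position
        const double k1 = P10 / s;  // gain for the velocity
        const double y = z - p;     // innovation
        p += k0 * y;
        v += k1 * y;
        // P = (I - K H) P
        const double n00 = (1.0 - k0) * P00;
        const double n01 = (1.0 - k0) * P01;
        const double n10 = P10 - k1 * P00;
        const double n11 = P11 - k1 * P01;
        P00 = n00; P01 = n01; P10 = n10; P11 = n11;
    }
};
```

For head tracking, you would run one such filter for x and one for y, call `predict` on every frame, and call `update` only on frames where the cascade actually returns a detection. On frames without a detection, the predicted position keeps the track alive.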

Two big tutorials will be published soon.
  • New version of the LBP cascades for people detection and head detection
  • Code and tutorial related to this example: a simple Kalman filter for tracking in OpenCV. I just finished the code for you. Stay tuned and share :). Thanks


This LBP cascade for OpenCV will be available soon. The people cascade is just a new version of the old one published here. The head detection cascade is a new one, just trained.

LBP cascade description

I trained the cascade on just 2000 hopefully well-selected positive images and 2400 negative ones. The selection of the negative samples also has a great effect. I need to find out more about this. The same positive samples with a different set of negative samples should lead to a cascade with very different properties. Yes, it is good to have positive samples that are somehow unique (situation, position, background, rotation etc.). The negative samples are simple to capture: random crops from several vacation pictures. Nothing special. But there is also some magic behind it. Instead of trying different kinds of positive samples, try different negatives. They are easy to collect, and the performance of the cascade should be very different. Also, it is not necessary to have a 100 percent clean negative set. If there is one positive inside the negative set, maybe more, it is not necessarily something wrong. More details after the release... Soon, hopefully.
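
The "random crop from vacation pictures" step can be sketched as picking random rectangles that fit inside an image. This is a minimal illustration under my own assumptions, not the script used for these cascades: the `Crop` struct, the `randomCrops` function and the fixed crop size are all hypothetical, and cutting the pixels out of the picture is left to whatever image library you use.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical crop rectangle: top-left corner plus size, in pixels.
struct Crop {
    int x, y, width, height;
};

// Pick `count` random crop rectangles that lie fully inside an image.
// The seed makes a negative set reproducible, so different sets are easy to compare.
std::vector<Crop> randomCrops(int imageW, int imageH, int cropW, int cropH,
                              int count, unsigned seed) {
    assert(cropW > 0 && cropH > 0 && cropW <= imageW && cropH <= imageH);
    assert(count >= 0);
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> xs(0, imageW - cropW);
    std::uniform_int_distribution<int> ys(0, imageH - cropH);
    std::vector<Crop> crops;
    crops.reserve(count);
    for (int i = 0; i < count; ++i) {
        const int x = xs(rng);
        const int y = ys(rng);
        crops.push_back({x, y, cropW, cropH});
    }
    return crops;
}
```

Changing only the seed or the source pictures while keeping the positive set fixed is a cheap way to try the experiment described above: same positives, different negatives.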