Hack-up night with Google Vision API
“We had two bags of books, seventy-five photos from our office IP camera, two laptops with high-performance RAM, an Azure cloud full of services, and a Visual Studio IDE with all kinds of tools, integrations, compilation warnings and errors… and also a quart of orange juice, a quart of water, a case of Budweiser, a big vegetarian pizza, and a desire to come up with a project and deliver a demo of it.
Not that we needed all that for the trip, but once you get locked into a serious neural networks collection, the tendency is to push it as far as you can.”
Imagine me, my colleague @ale_de635, and a project that needs to be presented the next morning.
We found out about this interesting thing at around 4 PM. The project had to be delivered at 9 AM the next day. OK, 17 hours; I had no sleep the night before, and my colleague’s situation was much the same. OK, we were in!
Actually, I was sitting in our office at 4 PM, and the first thing I decided to do was get some sleep. Not during the night, of course; we had a bunch of work ahead of us. So I went straight home, slept until about 10 PM, and got back to the office at 11 PM to meet Alex and start doing great things.
This is the moment when some magic comes on stage, huh.
Reinventing the wheel, tilting at windmills, a service that creates a website from keywords like “lemurs”.
That last one might be interesting, I thought. Imagine a text field where you type “create SPA about watermelons” and everything gets auto-generated, bootstrapped, good-looking, and filled with information from Wikipedia or wherever. However, this idea has little to no value for the end user; who needs an auto-generated SPA about watermelons?
So our next, and final, idea was to implement emotion analysis based on photos from the IP cameras in our office.
The camera saves one photo per hour and sends it to a repository, and at the end of the day you get refreshed statistics about your employees’ emotions: joy, anger, sorrow, et cetera. You can find out how often they feel happy or down just by looking at the end-of-day statistics.
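For the curious, here’s a minimal sketch of what the per-photo step can look like with the Google.Cloud.Vision.V1 client library for C#; the file path and setup are illustrative, not our exact code:

```csharp
using System;
using Google.Cloud.Vision.V1;

class EmotionScan
{
    static void Main()
    {
        // Assumes GOOGLE_APPLICATION_CREDENTIALS points at a service-account key.
        var client = ImageAnnotatorClient.Create();

        // One snapshot from the office IP camera (hypothetical path).
        var image = Image.FromFile("snapshots/office-14-00.jpg");

        // DetectFaces returns one FaceAnnotation per detected face, each
        // carrying likelihoods for joy, anger, sorrow, and surprise.
        foreach (var face in client.DetectFaces(image))
        {
            Console.WriteLine(
                $"joy={face.JoyLikelihood} anger={face.AngerLikelihood} " +
                $"sorrow={face.SorrowLikelihood} surprise={face.SurpriseLikelihood}");
        }
    }
}
```

The likelihoods come back as coarse buckets (VERY_UNLIKELY through VERY_LIKELY) rather than raw scores, which is actually enough for a dashboard like ours.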
Look’n’feel:
Tech stack:
Azure Cosmos DB (DocumentDB),
Google Vision API,
ASP.NET (C#)
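Hooking the pieces together is mostly plumbing. Here’s a hedged sketch of how each analyzed photo could land in Cosmos DB through the classic DocumentDB SDK; the endpoint, key, and database/collection names are placeholders, not our real setup:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Documents.Client;

class EmotionStore
{
    // Endpoint and key below are placeholders.
    static readonly DocumentClient Client = new DocumentClient(
        new Uri("https://my-account.documents.azure.com:443/"),
        "<primary-key>");

    static async Task SaveAsync()
    {
        // One JSON document per analyzed photo; the end-of-day page
        // simply queries and aggregates these.
        var record = new
        {
            id = Guid.NewGuid().ToString(),
            takenAt = DateTime.UtcNow,
            joy = "VERY_LIKELY",
            sorrow = "UNLIKELY"
        };

        await Client.CreateDocumentAsync(
            UriFactory.CreateDocumentCollectionUri("EmotionsDb", "Snapshots"),
            record);
    }
}
```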
What can I say about the Google Vision API so far:
I thought it would be pretty tough to switch from the MS stack to the Google stack, but actually I got up’n’running in about 2 hours; everything else was just implementing logic and UI.
It’s really powerful, and you get a lot of information _for free_. Every camera streams pictures, and a picture is information in itself, but now we can parse that picture at a very low level: objects, faces, emotions, coordinates, labels, tags.
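To make that concrete, a single Annotate call can ask for several feature types at once. Below is a sketch, again with the Google.Cloud.Vision.V1 library and a made-up file path, that pulls faces with bounding coordinates plus labels from one picture:

```csharp
using System;
using Google.Cloud.Vision.V1;

class LowLevelParse
{
    static void Main()
    {
        var client = ImageAnnotatorClient.Create();
        var response = client.Annotate(new AnnotateImageRequest
        {
            Image = Image.FromFile("snapshots/office-14-00.jpg"),
            Features =
            {
                new Feature { Type = Feature.Types.Type.FaceDetection },
                new Feature { Type = Feature.Types.Type.LabelDetection }
            }
        });

        // Coordinates come back as a bounding polygon per face.
        foreach (var face in response.FaceAnnotations)
            Console.WriteLine($"face at {face.BoundingPoly}");

        // Labels are the "tags": a description plus a confidence score.
        foreach (var label in response.LabelAnnotations)
            Console.WriteLine($"{label.Description}: {label.Score:P0}");
    }
}
```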
I’m sure I will use this tool a few more times, but I wonder: which is more powerful, Microsoft Computer Vision or the Google Vision API? Can I build and train my own neural network with the Google Vision API? And how would that transform visual recognition?