Relation Network


With the emergence of deep neural networks (DNNs) such as ResNet, the ImageNet challenge is no longer as difficult as it used to be. As recognition tasks approach being solved, we can move on to higher-level questions such as visual understanding.
Visual Question Answering has attracted much attention in recent years, and visual relations are also becoming an issue we should consider.
more >>

What is attention? Attention-based models in natural language processing


Attention-based Model (1)

What is attention?

In fact, the concept of “attention” in machine learning is much the same as in our daily life, which may sound confusing at first. Imagine: parents always want their kids to concentrate, to focus on the one thing they are doing; that is the so-called “attention”. Likewise, when we are reading a novel, we sometimes cannot understand certain parts, so we focus on those parts more than usual; that is also “attention”. Before this concept appeared, however, what people usually did in both language translation and recognition was simply feed the whole sentence or picture in as input and get a prediction.
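The contrast can be made concrete with a minimal sketch of dot-product attention in NumPy. Everything here is illustrative (made-up names and dimensions, not from any specific model in the post): instead of treating the whole input uniformly, the model scores each input position against a query and takes a weighted average.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(query, keys, values):
    # Score each input position against the query, then take a
    # weighted average of the values: the model "focuses" on the
    # positions with the highest scores.
    scores = keys @ query / np.sqrt(query.shape[0])
    weights = softmax(scores)
    return weights @ values, weights

# Toy example: 4 input positions, 3-dimensional features
rng = np.random.default_rng(0)
keys = rng.standard_normal((4, 3))
values = rng.standard_normal((4, 3))
query = keys[2]  # a query resembling the 3rd input position
context, weights = attention(query, keys, values)
print(weights)  # attention distribution over the 4 positions
```

The weights form a probability distribution over input positions, which is exactly the "focus on some parts more than others" intuition above.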
more >>

Camelyon 16 Cancer detection Approach - 1

What’s going on?

We just got homework from the CSE 190 class, which asks us to solve the Camelyon16 contest. To be honest, I think it is a difficult task since I have no prior knowledge of this area. So I am going to check things out and see what we can do!

Detecting Cancer Metastases on Gigapixel Pathology Images, by Google

They present a framework to detect and localize tumors as small as 100 x 100 pixels in gigapixel microscopy images of roughly 100,000 x 100,000 pixels, leveraging a CNN architecture and obtaining state-of-the-art results on the Camelyon16 dataset. At 8 false positives per image, they detect 92.4% of the tumors, compared with 82.7% for the previous best automated approach, while a human expert achieves only 73.2%. Their AUC is above 97%, and they even discovered 2 slides that were erroneously labeled normal. In a word, I think their approach could considerably reduce false-negative rates in metastasis detection.
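A slide that large cannot be fed to a CNN in one shot; the paper tiles it into small patches and stitches per-patch tumor probabilities into a heatmap. A minimal sketch of such patch-based inference, with illustrative patch and stride sizes and a stand-in for the CNN call:

```python
import numpy as np

def iter_patches(slide, patch=299, stride=128):
    """Yield (x, y, crop) windows from a whole-slide image array.

    Hypothetical helper: patch/stride values are illustrative,
    not the exact settings from the paper.
    """
    h, w = slide.shape[:2]
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            yield x, y, slide[y:y + patch, x:x + patch]

# Toy "slide"; real Camelyon16 slides are ~100,000 x 100,000 pixels
slide = np.zeros((1024, 1024, 3), dtype=np.uint8)
heatmap = {}
for x, y, crop in iter_patches(slide):
    # In practice: prob = cnn.predict(crop); here a stand-in score
    heatmap[(x, y)] = float(crop.mean())
print(len(heatmap))  # number of patches scored
```

Tumor localization then reduces to finding high-probability regions in the stitched heatmap.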
more >>

Visual Relationship Detection with Languages Priors



This paper proposes a model built on the insight that relationships combine objects and predicates: it trains visual models for objects and predicates individually and later combines them. Additionally, it localizes the objects in the predicted relationships as bounding boxes in the image.


While relationship detection poses challenges similar to object detection, one critical difference is that the semantic space of relationships is much larger. A fundamental challenge in visual relationship detection is therefore learning from very few examples.
Visual Phrases: using just 13 common relationships, training a detector per relationship triple needs O(N^2 K) models, and triples like person-jumping-off-bike are rare even though person and bike are common. So the authors propose a visual appearance module that fuses objects and predicates to predict jointly, needing only O(N + K) models.
Word vector embeddings naturally link similar objects through their relationships (e.g., “person riding a horse” and “person riding an elephant”). A language module uses pre-trained word vectors to cast relationships into a vector space where similar relationships are optimized to be close to each other, which can even enable zero-shot learning.

more >>

Visual Translation Embedding Network for Visual Relation Detection



“Person rides bike” offers a comprehensive understanding of an image, connecting CV and NLP. Given the challenging complexity of finding relations, this paper proposes the Visual Translation Embedding network (VTransE: subject + predicate ≈ object) for visual relation detection, which is competitive with Lu's multi-modal model with language priors [].


  • a novel feature extraction layer that enables object-relation knowledge transfer in a fully convolutional fashion, supporting training and inference in a single forward/backward pass.


There are lots of efforts connecting computer vision and natural language, such as visual captioning and question answering (mostly connecting a CNN and an RNN, optimized on specialized datasets for specific tasks such as image captioning or image QA…), but they fall short in understanding relationships.

Introduction to VTransE

Translation Embedding

Assume we have N objects and R relations;
the fundamental challenge is O(N^2 R). A common solution is to learn separate models for objects and predicates, which is O(N + R) but still challenging (compare “ride bike” to “ride elephant”). TransE represents a large-scale knowledge base in a lower-dimensional space, where a relation triple can be interpreted as a vector translation: person + ride ≈ bike, so we only need to learn the ride translation vector in the relation space.
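A tiny NumPy sketch of the translation idea, with made-up 3-d embeddings (in VTransE these are learned features, and the score is optimized rather than computed directly):

```python
import numpy as np

# Hypothetical 3-d embeddings, for illustration only
person = np.array([1.0, 0.0, 2.0])
bike   = np.array([2.0, 1.0, 0.0])
ride   = bike - person  # the learned "ride" translation vector

def plausibility(subj, pred, obj):
    # A triple is plausible when subj + pred lands close to obj
    return -np.linalg.norm(subj + pred - obj)

elephant = np.array([2.1, 1.2, 0.3])
print(plausibility(person, ride, bike))      # 0.0 -- perfect fit
print(plausibility(person, ride, elephant))  # slightly worse
```

One `ride` vector serves all subject/object pairs, which is exactly why only O(N + R) parameters are needed.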

Knowledge Transfer in Relation

For example, person and bike detection serves as the context for predicting ride. Specifically, the authors propose a novel feature extraction layer that extracts the three types of object features used in VTransE: classemes, locations, and RoI visual features. It uses bilinear feature interpolation [15, 20] instead of RoI pooling to keep the box coordinates differentiable.
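A minimal sketch of why bilinear sampling keeps coordinates differentiable: the sampled value is a smooth weighted average of the four nearest feature-map cells, rather than RoI pooling's hard rounding (simplified here to sampling a single fractional point):

```python
import numpy as np

def bilinear_sample(fmap, x, y):
    # Weight the four nearest cells by how close (x, y) is to each;
    # the result varies smoothly with x and y, so gradients can flow
    # back into the box coordinates.
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    dx, dy = x - x0, y - y0
    return (fmap[y0, x0] * (1 - dx) * (1 - dy)
            + fmap[y0, x0 + 1] * dx * (1 - dy)
            + fmap[y0 + 1, x0] * (1 - dx) * dy
            + fmap[y0 + 1, x0 + 1] * dx * dy)

fmap = np.arange(16, dtype=float).reshape(4, 4)
# Halfway between fmap[2, 1] = 9 and fmap[2, 2] = 10
print(bilinear_sample(fmap, 1.5, 2.0))  # 9.5
```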


  • VTransE, a convolutional network that detects objects and relations, the first end-to-end relation detection network

  • A novel visual relation detection learning model for VTransE that incorporates translation embedding and knowledge transfer

  • VTransE outperforms several strong baselines

more >>

Roaming about Clustering II Spectral Clustering

Spectral Clustering, proposed by Ng et al., is really interesting; the proof amazed me…
Unfortunately my math foundation is still not good enough, haha. This post serves to brush up on some linear algebra and, along the way, work through the proof of spectral clustering.
more >>

Qian Shengju (钱湦钜)<br>Focus: ML, DL; still weak at math<br>Hobbies: cycling, long-distance trips<br>Qian Shengju