Keras - Merging layers - Keras 2.0

I am trying to merge two networks. I can accomplish this by doing the following:

merged = Merge([CNN_Model, RNN_Model], mode='concat')

But I get a warning:

merged = Merge([CNN_Model, RNN_Model], mode='concat')
__main__:1: UserWarning: The `Merge` layer is deprecated and will be removed after 08/2017. Use instead layers from `keras.layers.merge`, e.g. `add`, `concatenate`, etc.

So I tried this:

merged = Concatenate([CNN_Model, RNN_Model])
model = Sequential()
model.add(merged)

and got this error:

ValueError: The first layer in a Sequential model must get an `input_shape` or `batch_input_shape` argument.

Can anyone give me the syntax as how I would get this to work?

Don't use Sequential models for models with branches.

Use the Functional API:

from keras.models import Model  

You're right to use the Concatenate layer, but you must pass tensors to it. First you create the layer, then you call it on the input tensors (that's why there are two pairs of parentheses):

concatOut = Concatenate()([CNN_Model.output,RNN_Model.output])

For creating a model out of that, you need to define the path from inputs to outputs:

model = Model([CNN_Model.input, RNN_Model.input], concatOut)

This answer assumes your existing models have only one input and output each.
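Putting it together, a rough sketch could look like this (assuming both models are already built, each with a single input and output; the Dense head, its sizes and the loss are placeholders for whatever your task needs):

from keras.models import Model
from keras.layers import Concatenate, Dense

concatOut = Concatenate()([CNN_Model.output, RNN_Model.output])

#hypothetical head -- adjust units, activation and loss to your problem
x = Dense(64, activation='relu')(concatOut)
out = Dense(1, activation='sigmoid')(x)

model = Model([CNN_Model.input, RNN_Model.input], out)
model.compile(optimizer='adam', loss='binary_crossentropy')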

Mathematically, the difference is this:

  • An Embedding layer performs a select operation. In Keras, this layer is equivalent to:

    K.gather(self.embeddings, inputs)      # just one matrix
    
  • A Dense layer performs a dot-product operation, plus an optional activation:

    outputs = matmul(inputs, self.kernel)  # a kernel matrix
    outputs = bias_add(outputs, self.bias) # a bias vector
    return self.activation(outputs)        # an activation function
    

You can emulate an Embedding layer with a fully-connected layer via one-hot encoding, but the whole point of dense embeddings is to avoid the one-hot representation. In NLP, the word vocabulary size can be on the order of 100k (sometimes even a million). On top of that, it's often necessary to process sequences of words in a batch. Processing a batch of sequences of word indices is much more efficient than a batch of sequences of one-hot vectors. In addition, the gather operation itself is faster than a matrix dot-product, in both the forward and backward pass.
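To make the equivalence concrete, here's a minimal sketch (toy sizes; the names and numbers are placeholders, and the Dense layer is given the same weight matrix and no bias, so both paths return the same row):

import numpy as np
from keras.models import Model
from keras.layers import Input, Embedding, Dense

vocab_size, emb_dim = 10, 4
weights = np.random.rand(vocab_size, emb_dim)

#Embedding: gathers rows of the weight matrix by integer index
idxIn = Input(shape=(1,), dtype='int32')
embModel = Model(idxIn, Embedding(vocab_size, emb_dim, weights=[weights])(idxIn))

#Dense on one-hot vectors: a dot-product that selects the same rows
hotIn = Input(shape=(vocab_size,))
denseModel = Model(hotIn, Dense(emb_dim, use_bias=False, weights=[weights])(hotIn))

i = 3
print(embModel.predict(np.array([[i]])))             #row i of weights
print(denseModel.predict(np.eye(vocab_size)[[i]]))   #same values via one-hot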

You can use a Functional API model and separate four distinct groups:

from keras.models import Model
from keras.layers import Dense, Input, Concatenate, Lambda

inputTensor = Input((8,))

First, we can use Lambda layers to split this input in four:

group1 = Lambda(lambda x: x[:,:2], output_shape=(2,))(inputTensor)
group2 = Lambda(lambda x: x[:,2:4], output_shape=(2,))(inputTensor)
group3 = Lambda(lambda x: x[:,4:6], output_shape=(2,))(inputTensor)
group4 = Lambda(lambda x: x[:,6:], output_shape=(2,))(inputTensor)

Now we follow the network:

#second layer in your image
group1 = Dense(1)(group1)
group2 = Dense(1)(group2)
group3 = Dense(1)(group3)
group4 = Dense(1)(group4)

Before we connect the last layer, we concatenate the four tensors above:

outputTensor = Concatenate()([group1, group2, group3, group4])

Finally, the last layer:

outputTensor = Dense(2)(outputTensor)

#create the model:
model = Model(inputTensor, outputTensor)

Beware of the biases. If you want any of these layers to have no bias, use use_bias=False.
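If you want to sanity-check the shapes, you could compile and fit it on dummy data (a hypothetical regression setup, just for illustration):

import numpy as np

model.compile(optimizer='adam', loss='mse')

X = np.random.rand(32, 8)   #32 samples, 8 features split into 4 groups of 2
y = np.random.rand(32, 2)   #matching the final Dense(2) output
model.fit(X, y, epochs=1, batch_size=8)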


Old answer: backwards

Sorry, I saw your image backwards the first time I answered. I'm keeping this here just because it's done...

from keras.models import Model
from keras.layers import Dense, Input, Concatenate

inputTensor = Input((2,))

#four groups of layers, all of them taking the same input tensor
group1 = Dense(1)(inputTensor)
group2 = Dense(1)(inputTensor)
group3 = Dense(1)(inputTensor)
group4 = Dense(1)(inputTensor)

#the next layer in each group takes the output of the previous layers
group1 = Dense(2)(group1)
group2 = Dense(2)(group2)
group3 = Dense(2)(group3)
group4 = Dense(2)(group4)

#now we join the results in a single tensor again:
outputTensor = Concatenate()([group1, group2, group3, group4])

#create the model:
model = Model(inputTensor, outputTensor)

First, the backend: tf.keras.backend.concatenate()

Backend functions are supposed to be used "inside" layers. You'd only use this in Lambda layers, custom layers, custom loss functions, custom metrics, etc.

It works directly on "tensors".

It's not the right choice unless you're going deep into customizing. (And it was a bad choice in your example code -- see details at the end.)

If you dive into the Keras code, you will notice that the Concatenate layer uses this function internally:

import keras.backend as K
class Concatenate(_Merge):
    #blablabla
    def _merge_function(self, inputs):
        return K.concatenate(inputs, axis=self.axis)
    #blablabla
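If you really do want to call the backend function yourself, it would typically be wrapped in a Lambda layer so Keras still sees a proper layer (a sketch; tensor1 and tensor2 are assumed to be Keras tensors with matching shapes except on the last axis):

import keras.backend as K
from keras.layers import Lambda

#the Lambda layer receives the list of tensors and applies the backend call
joined = Lambda(lambda t: K.concatenate(t, axis=-1))([tensor1, tensor2])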

Then, the layer: keras.layers.Concatenate(axis=-1)

As with any other Keras layer, you instantiate it and call it on tensors.

Pretty straightforward:

#in a Functional API model:
inputTensor1 = Input(shape) #or some tensor coming out of any other layer
inputTensor2 = Input(shape2) #or some tensor coming out of any other layer

#the first parentheses create an instance of the layer
#the second parentheses "call" the layer on the input tensors
outputTensor = keras.layers.Concatenate(axis=someAxis)([inputTensor1, inputTensor2])

This is not suited for Sequential models, unless the previous layer outputs a list (possible, but not common).


Finally, the concatenate function from the layers module: keras.layers.concatenate(inputs, axis=-1)

This is not a layer. It is a function that returns the tensor produced by an internal Concatenate layer.

The code is simple:

def concatenate(inputs, axis=-1, **kwargs):
   #blablabla
   return Concatenate(axis=axis, **kwargs)(inputs)
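So these two lines produce the same result (assuming t1 and t2 are Keras tensors):

from keras.layers import Concatenate, concatenate

merged = Concatenate(axis=-1)([t1, t2])   #the layer: instantiate, then call
merged = concatenate([t1, t2], axis=-1)   #the function: one call, same result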

Older functions

In Keras 1, there were functions meant to receive "layers" as input and return an output "layer". Their names were related to the word merge.

But since Keras 2 doesn't mention or document these, I'd avoid using them, and if old code is found, I'd update it to proper Keras 2 code.


Why the _keras_shape word?

This backend function was not supposed to be used in high-level code. The coder should have used a Concatenate layer:

atoms_bonds_features = Concatenate(axis=-1)([atoms, summed_bond_features])
#just this line is perfect

Keras layers add the _keras_shape property to all their output tensors, and Keras uses this property to infer the shapes of the entire model.

If you use any backend function "outside" a layer or loss/metric, your output tensor will lack this property, and an error will appear saying _keras_shape doesn't exist.

The coder is creating a bad workaround by adding the property manually, when it should have been added by a proper Keras layer. (This may work now, but if Keras is updated this code will break, while proper code will keep working.)

I had to ask this question on the Keras GitHub page and someone helped me implement it properly... here's the issue on GitHub...


Tags: Keras, Keras Layer