Automated action recognition is difficult because data in video streams can be interpreted in a large number of ways.
Many current approaches train systems to look for specific actions. Such approaches struggle when asked to recognize previously unseen actions, and often return inaccurate results when there is overlap between actions from different classes in the training set.
This work applies bandwidth-efficient, non-focal techniques to automatically determine the similarity of previously unseen activities, in such a way that labels can be assigned after the fact. Motion events in a video stream are correlated to create rated sub-streams.
Ratings are clustered, and the resulting number of groups is used to drive a neural network that generates representations of the sub-streams. These representations are then clustered using t-SNE. Reports on sub-stream classifications can then be generated without prior labelling, and labels can be assigned at a later time.
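The rating-clustering step can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the sample ratings, the choice of k, and the use of a simple k-means grouping are all assumptions, since the abstract does not specify the clustering algorithm used on the ratings.

```python
import numpy as np


def kmeans_1d(points, k, iters=50, seed=0):
    """Minimal 1-D k-means used here only to illustrate grouping
    sub-stream ratings; the actual clustering method is unspecified."""
    rng = np.random.default_rng(seed)
    # Initialize centers by sampling k distinct ratings.
    centers = points[rng.choice(len(points), k, replace=False)].copy()
    for _ in range(iters):
        # Assign each rating to its nearest center.
        labels = np.argmin(np.abs(points[:, None] - centers[None, :]), axis=1)
        # Move each center to the mean of its assigned ratings.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean()
    return labels, centers


# Hypothetical ratings for six sub-streams (scalar scores for clarity).
ratings = np.array([0.10, 0.12, 0.50, 0.52, 0.90, 0.95])
labels, centers = kmeans_1d(ratings, k=3)

# The number of distinct groups found would then drive the size of the
# representation network described in the abstract.
n_groups = len(np.unique(labels))
print(n_groups)
```

The representations the network produces could subsequently be embedded and grouped with an off-the-shelf t-SNE implementation, as the abstract describes.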
As a result, the system is able to observe and classify video events without explicit training to search for specific video sequences, thereby minimizing human effort in video surveillance tasks.