Botconf 2017 has ended
Wednesday, December 6 • 10:30 - 11:10
How to Compute the Clusterization of a Very Large Dataset of Malware with Open Source Tools for Fun & Profit?

Malware is now developed at an industrial scale, and human analysts need automated tools to help them.
We present the results of our experiments on a difficult problem: how to cluster a very large set of malware, using only static information, so that new malware can then be classified. To cluster a set of (numerical) objects is to group these objects into meaningful categories. We want objects in the same group to be closer (or more similar) to each other than to those in other groups. Such groups of similar objects are called clusters. When the data are labeled, this problem is called supervised clustering. It is a difficult problem, but easier than the unsupervised clustering problem we face when the data are not labeled.
All our experiments were done with code written in Python, mainly using scikit-learn, so you will probably be able to reproduce the work with your own feature vectors (well, we hope so!).
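
As a rough illustration of this workflow, here is a minimal sketch of clustering numerical feature vectors with scikit-learn and then assigning a new sample to a cluster. The abstract does not say which algorithm or features were used, so everything specific here is an assumption: MiniBatchKMeans is chosen only because it scales to large datasets, the helper names cluster_features and assign_cluster are hypothetical, and the data is synthetic noise around three made-up centers rather than real malware features.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.preprocessing import StandardScaler


def cluster_features(features, n_clusters, seed=0):
    """Standardize the feature vectors and fit a mini-batch k-means model.

    Assumption: k-means is used purely for illustration; it is not
    claimed to be the algorithm used in the talk.
    """
    scaler = StandardScaler().fit(features)
    model = MiniBatchKMeans(
        n_clusters=n_clusters,
        batch_size=1024,
        n_init=3,
        random_state=seed,
    )
    model.fit(scaler.transform(features))
    return scaler, model


def assign_cluster(scaler, model, sample):
    """Return the index of the cluster closest to one new feature vector."""
    row = np.asarray(sample, dtype=float).reshape(1, -1)
    return int(model.predict(scaler.transform(row))[0])


# Synthetic stand-in for static feature vectors (hypothetical data):
# three well-separated groups of 200 samples with 8 features each.
rng = np.random.default_rng(0)
centers = np.array([[0.0] * 8, [10.0] * 8, [-10.0] * 8])
features = np.vstack(
    [c + rng.normal(scale=0.5, size=(200, 8)) for c in centers]
)

scaler, model = cluster_features(features, n_clusters=3)

# A "new" sample drawn near the second group is assigned to a cluster.
new_sample = centers[1] + rng.normal(scale=0.5, size=8)
label = assign_cluster(scaler, model, new_sample)
print("new sample assigned to cluster", label)
```

With your own data, you would replace the synthetic array with a matrix holding one row of numerical features per sample. The number of clusters has to be chosen by the user here, which is itself one of the hard parts of the unsupervised problem.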

We will present some results on our dataset of two million malware samples. We will give some examples of what we have found, and we will discuss interesting directions for future work (that is, problems still to be solved).
Co-authors: Alexandre Letois, Marwan Burelle


Robert Erra

Professor, head of LSE, EPITA

Wednesday December 6, 2017 10:30 - 11:10
Corum Allée du Saint-Esprit, 34000 Montpellier, France