<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Cylance &#8211; @Forensicxs</title>
	<atom:link href="https://www.forensicxs.com/tag/cylance/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.forensicxs.com</link>
	<description>Ethical Hacking &#124; Cybersecurity</description>
	<lastBuildDate>Thu, 13 May 2021 19:01:18 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	
	<item>
		<title>Machine Learning applied to Cybersecurity and Hacking</title>
		<link>https://www.forensicxs.com/machine-learning-applied-to-cybersecurity-and-hacking/</link>
					<comments>https://www.forensicxs.com/machine-learning-applied-to-cybersecurity-and-hacking/#respond</comments>
		
		<dc:creator><![CDATA[Forensicxs]]></dc:creator>
		<pubDate>Fri, 12 Jun 2020 15:13:04 +0000</pubDate>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Algorithm]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Cybersecurity]]></category>
		<category><![CDATA[Cylance]]></category>
		<category><![CDATA[Darktrace]]></category>
		<category><![CDATA[Deep Learning]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<guid isPermaLink="false">https://www.forensicxs.com/?p=504</guid>

					<description><![CDATA[Machine Learning (ML) is a hot topic these days&#8230;why is that ? And first of all, what is ML all about ? What are the mainstream applications of ML ? What are the potential applications to Cybersecurity ? Are Threat Actors likely to use ML in the near future, if so, in which context ? &#8230; <p class="link-more"><a href="https://www.forensicxs.com/machine-learning-applied-to-cybersecurity-and-hacking/" class="more-link">Continue reading<span class="screen-reader-text"> "Machine Learning applied to Cybersecurity and Hacking"</span></a></p>]]></description>
										<content:encoded><![CDATA[
<p class="has-normal-font-size"><strong><span class="has-inline-color has-vivid-red-color">Machine Learning (ML)</span></strong> is a hot topic these days&#8230;why is that ? And first of all, what is ML all about ? What are the mainstream applications of ML ? What are the potential applications to Cybersecurity ? Are Threat Actors likely to use ML in the near future, if so, in which context ? I try to explore these questions in this blog post, and more</p>



<p><strong><span class="has-inline-color has-vivid-cyan-blue-color">What is ML</span></strong></p>



<p>Although it is a hot topic nowadays, ML, like other IT-related research fields, is not a brand new technique</p>



<p>ML started to be theorized in the 1950s with the first breakthroughs in Artificial Intelligence. The first mainstream application of ML was a spam filter in the 1990s, while a more complex architecture of algorithms, known as deep learning, started to take off some years ago</p>



<figure class="wp-block-image size-large is-resized"><img fetchpriority="high" decoding="async" src="https://www.forensicxs.com/wp-content/uploads/2020/06/ML8.png" alt="" class="wp-image-530" width="699" height="399" srcset="https://www.forensicxs.com/wp-content/uploads/2020/06/ML8.png 700w, https://www.forensicxs.com/wp-content/uploads/2020/06/ML8-300x171.png 300w, https://www.forensicxs.com/wp-content/uploads/2020/06/ML8-230x131.png 230w, https://www.forensicxs.com/wp-content/uploads/2020/06/ML8-350x200.png 350w, https://www.forensicxs.com/wp-content/uploads/2020/06/ML8-480x274.png 480w" sizes="(max-width: 699px) 100vw, 699px" /></figure>



<p>Machine Learning is the science of programming computers so they can learn from data. The system learns from examples, also called the training set, using an ML algorithm. The accuracy of the learning output is measured, so the cycle can repeat and converge towards a better result</p>



<p> The typical ML process is as follows :</p>



<figure class="wp-block-gallery columns-1 is-cropped wp-block-gallery-1 is-layout-flex wp-block-gallery-is-layout-flex"><ul class="blocks-gallery-grid"><li class="blocks-gallery-item"><figure><img decoding="async" width="1024" height="472" src="https://www.forensicxs.com/wp-content/uploads/2020/06/ML1-1024x472.jpeg" alt="" data-id="541" data-full-url="https://www.forensicxs.com/wp-content/uploads/2020/06/ML1.jpeg" data-link="https://www.forensicxs.com/?attachment_id=541" class="wp-image-541" srcset="https://www.forensicxs.com/wp-content/uploads/2020/06/ML1-1024x472.jpeg 1024w, https://www.forensicxs.com/wp-content/uploads/2020/06/ML1-300x138.jpeg 300w, https://www.forensicxs.com/wp-content/uploads/2020/06/ML1-768x354.jpeg 768w, https://www.forensicxs.com/wp-content/uploads/2020/06/ML1-830x382.jpeg 830w, https://www.forensicxs.com/wp-content/uploads/2020/06/ML1-230x106.jpeg 230w, https://www.forensicxs.com/wp-content/uploads/2020/06/ML1-350x161.jpeg 350w, https://www.forensicxs.com/wp-content/uploads/2020/06/ML1-480x221.jpeg 480w, https://www.forensicxs.com/wp-content/uploads/2020/06/ML1.jpeg 1528w" sizes="(max-width: 767px) 89vw, (max-width: 1000px) 54vw, (max-width: 1071px) 543px, 580px" /></figure></li></ul></figure>



<p># Get data : acquire a big enough dataset/training set</p>



<p># Clean, Prepare &amp; Manipulate Data : load your dataset from storage, do basic data analysis and visualization, transform the input data into numeric data</p>



<p># Train Model : run the data through a machine learning algorithm, use the&nbsp;model to make predictions, classifications and other tasks</p>



<p># Test Data : verify the accuracy of the output versus expectations</p>



<p># Improve : repeat the above cycle trying to improve the accuracy</p>
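


<p>As a rough illustration of the cycle above, here is a minimal sketch in plain Python. The toy dataset, the two features and the nearest-centroid classifier are assumptions chosen for brevity, not a recommendation for real projects</p>

```python
# Toy end-to-end ML cycle: get data, prepare, train, test.
# The dataset and the nearest-centroid model are illustrative assumptions.

# 1. Get data: (packets per second, mean packet size in bytes) with a label
raw = [
    ((10, 500), "normal"), ((12, 520), "normal"), ((9, 480), "normal"),
    ((11, 510), "normal"), ((900, 60), "attack"), ((950, 64), "attack"),
    ((880, 58), "attack"), ((920, 62), "attack"),
]

# 2. Prepare: scale each feature to the 0..1 range (min-max scaling)
def fit_scaler(rows):
    columns = list(zip(*rows))
    return [(min(c), max(c)) for c in columns]

def scale(row, bounds):
    return tuple((v - lo) / (hi - lo) for v, (lo, hi) in zip(row, bounds))

train_raw = raw[0:3] + raw[4:7]     # six examples for training
test_raw = [raw[3], raw[7]]         # two held-out examples for testing

bounds = fit_scaler([x for x, _ in train_raw])
train = [(scale(x, bounds), y) for x, y in train_raw]
test = [(scale(x, bounds), y) for x, y in test_raw]

# 3. Train: compute one centroid (mean point) per label
def train_centroids(samples):
    grouped = {}
    for x, y in samples:
        grouped.setdefault(y, []).append(x)
    return {
        y: tuple(sum(col) / len(col) for col in zip(*xs))
        for y, xs in grouped.items()
    }

def predict(centroids, x):
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], x))

centroids = train_centroids(train)

# 4. Test: compare predictions with the expected labels
correct = sum(1 for x, y in test if predict(centroids, x) == y)
accuracy = correct / len(test)
print("accuracy:", accuracy)
```

<p>The "Improve" step would then consist of changing the features, the model or the amount of data, and running the same measurement again</p>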



<p>Machine learning is well suited to fluctuating environments, as an ML system can adapt to new data. It also helps to gain insights into complex problems and large amounts of data, for which classical approaches would require a lot of fine-tuning or long lists of rules</p>



<p>There are three main categories of ML systems :</p>



<ol class="wp-block-list"><li><strong><span class="has-inline-color has-vivid-red-color"><em>Supervised learning</em> </span></strong>: the training set you feed to the algorithm includes the desired solutions, also called labels, and the machine learns from them. It is &#8220;task driven&#8221;</li><li><em><strong><span class="has-inline-color has-vivid-red-color">Unsupervised learning</span></strong></em> : the training set is unlabeled and the machine tries to learn without a teacher. It is &#8220;data driven&#8221;</li><li><em><strong><span class="has-inline-color has-vivid-red-color">Reinforcement learning</span></strong></em> : the learning machine can observe the environment, select and perform actions, and get rewards in return. It must learn by itself the best strategy to get the most reward over time. The algorithm &#8220;learns to react to an environment&#8221;</li></ol>
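


<p>To make the "no teacher" idea of unsupervised learning more concrete, the sketch below groups unlabeled values with a bare-bones k-means loop. The data (imagined here as daily login counts), the choice of two clusters and the starting centroids are assumptions made for the illustration</p>

```python
# Minimal 1-D k-means: group unlabeled values into k clusters.
# Data and starting centroids are illustrative assumptions.

def kmeans_1d(values, centroids, max_iter=100):
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assignment step: attach each value to its nearest centroid
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        updated = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:   # converged: nothing moved
            break
        centroids = updated
    return centroids, clusters

logins_per_day = [1, 2, 3, 10, 11, 12]          # no labels attached
centers, groups = kmeans_1d(logins_per_day, [min(logins_per_day), max(logins_per_day)])
print(centers)   # [2.0, 11.0]
print(groups)    # [[1, 2, 3], [10, 11, 12]]
```

<p>No label was ever provided: the two groups emerge from the data alone, which is what distinguishes this from the supervised case</p>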



<figure class="wp-block-image size-large is-resized"><img decoding="async" src="https://www.forensicxs.com/wp-content/uploads/2020/06/ML4.jpg" alt="" class="wp-image-526" width="627" height="456" srcset="https://www.forensicxs.com/wp-content/uploads/2020/06/ML4.jpg 958w, https://www.forensicxs.com/wp-content/uploads/2020/06/ML4-300x219.jpg 300w, https://www.forensicxs.com/wp-content/uploads/2020/06/ML4-768x560.jpg 768w, https://www.forensicxs.com/wp-content/uploads/2020/06/ML4-830x605.jpg 830w, https://www.forensicxs.com/wp-content/uploads/2020/06/ML4-550x400.jpg 550w, https://www.forensicxs.com/wp-content/uploads/2020/06/ML4-230x168.jpg 230w, https://www.forensicxs.com/wp-content/uploads/2020/06/ML4-350x255.jpg 350w, https://www.forensicxs.com/wp-content/uploads/2020/06/ML4-480x350.jpg 480w" sizes="(max-width: 627px) 100vw, 627px" /></figure>



<p>There are many algorithms used in ML, and the trick is to select the best algorithm for the use case, to achieve the best accuracy in the shortest time, as computations can be very long in some cases</p>



<figure class="wp-block-image size-large is-resized"><img decoding="async" src="https://www.forensicxs.com/wp-content/uploads/2020/06/ML6-898x1024.jpg" alt="" class="wp-image-529" width="667" height="760" srcset="https://www.forensicxs.com/wp-content/uploads/2020/06/ML6-898x1024.jpg 898w, https://www.forensicxs.com/wp-content/uploads/2020/06/ML6-263x300.jpg 263w, https://www.forensicxs.com/wp-content/uploads/2020/06/ML6-768x876.jpg 768w, https://www.forensicxs.com/wp-content/uploads/2020/06/ML6-1347x1536.jpg 1347w, https://www.forensicxs.com/wp-content/uploads/2020/06/ML6-830x947.jpg 830w, https://www.forensicxs.com/wp-content/uploads/2020/06/ML6-230x262.jpg 230w, https://www.forensicxs.com/wp-content/uploads/2020/06/ML6-350x399.jpg 350w, https://www.forensicxs.com/wp-content/uploads/2020/06/ML6-480x547.jpg 480w, https://www.forensicxs.com/wp-content/uploads/2020/06/ML6.jpg 1402w" sizes="(max-width: 667px) 100vw, 667px" /></figure>



<p>From a mathematical point of view, the main tools used by all these ML algorithms are a blend of :</p>



<p class="has-text-align-center has-medium-font-size"><em><strong>Linear algebra</strong></em></p>



<p class="has-text-align-center has-medium-font-size"><strong><em>Analytic geometry</em></strong></p>



<p class="has-text-align-center has-medium-font-size"><strong><em>Matrix decompositions</em></strong></p>



<p class="has-text-align-center has-medium-font-size"><strong><em>Vector calculations</em></strong></p>



<p class="has-text-align-center has-medium-font-size"><strong><em>Optimization</em></strong></p>



<p class="has-text-align-center has-medium-font-size"><strong><em>Probability</em></strong></p>



<p class="has-text-align-center has-medium-font-size"><strong><em>Statistics</em></strong></p>



<hr class="wp-block-separator"/>



<p><strong><span class="has-inline-color has-vivid-cyan-blue-color">Why is ML a hot topic these days</span></strong></p>



<p>As ML is not a brand new technique but dates back decades, one could wonder why ML is making the headlines nowadays. There are several good reasons for this :</p>



<div class="wp-block-group"><div class="wp-block-group__inner-container is-layout-flow wp-block-group-is-layout-flow">
<p><span class="has-inline-color has-luminous-vivid-orange-color"><strong>Computers are more powerful than ever</strong> </span>: running an ML algorithm on a huge dataset is not a small task. Today&#8217;s personal computers are so powerful that most ML algorithms can be run from home in a reasonable amount of time and with reasonable computing resources</p>



<p><span class="has-inline-color has-luminous-vivid-orange-color"><strong>A lot of data is available</strong> </span>: there are many good datasets and public sources available on the Internet to feed ML algorithms</p>



<p><span class="has-inline-color has-luminous-vivid-orange-color"><strong>Many tools and libraries are available</strong> </span>: ML tools, mostly based on Python, such as scikit-learn, Keras and TensorFlow, are easily available, with many good tutorials online</p>



<p><span class="has-inline-color has-luminous-vivid-orange-color"><strong>Research is very active</strong> </span>: advances in mathematics and new algorithms are more numerous than ever. Improving existing algorithms and designing new ones is a key topic, especially in fields such as deep learning</p>



<p><strong><span class="has-inline-color has-luminous-vivid-orange-color">There are great books</span></strong> to discover this topic, often with ready-made code and solutions that you can download from GitHub as Jupyter Notebooks</p>



<p><strong><span class="has-inline-color has-luminous-vivid-orange-color">Startups have been growing</span></strong> very fast, and there is now a strong ML ecosystem offering efficient solutions to common business issues</p>
</div></div>



<p>For these reasons, ML is now used at scale in many real life applications</p>



<hr class="wp-block-separator"/>



<p><strong><span class="has-inline-color has-vivid-cyan-blue-color">What are the mainstream applications of ML ?</span></strong></p>



<p><em><strong><span class="has-inline-color has-vivid-red-color">Regression</span></strong></em></p>



<p>Forecasting a company&#8217;s revenue based on many performance metrics</p>



<p>Predicting stock market prices</p>



<p><em><strong><span class="has-inline-color has-vivid-red-color">Clustering</span></strong></em></p>



<p>Segmenting clients based on their purchases</p>



<p>Segmenting articles in a library</p>



<p><em><strong><span class="has-inline-color has-vivid-red-color">Convolutional Neural Network (CNN)</span></strong></em></p>



<p>Analyzing images of products on a production line to automatically classify them</p>



<p>Detecting tumors in brain scans</p>



<p>Making your app react to voice commands using speech recognition, which requires processing audio samples</p>



<p><em><strong><span class="has-inline-color has-vivid-red-color">Recurrent Neural Network (RNN) and Natural Language Processing (NLP)</span></strong></em></p>



<p>Automatically classifying news articles based upon text classification</p>



<p>Automatically flagging offensive comments on discussion forums</p>



<p>Summarizing long documents automatically</p>



<p>Creating a chatbot or a personal assistant</p>



<hr class="wp-block-separator"/>



<p><strong><span class="has-inline-color has-vivid-cyan-blue-color">What are the current ML applications to Cybersecurity</span></strong></p>



<p>Cybersecurity is a very wide discipline, and so is ML, so the possibilities are huge. Let&#8217;s have a look at some existing applications, which are not necessarily fully fledged ML applications, but embed ML technologies at some point</p>



<p><span class="has-inline-color has-vivid-red-color"><strong>Darktrace</strong></span> : <a rel="noreferrer noopener" href="https://www.darktrace.com/en/" target="_blank">https://www.darktrace.com/en/</a></p>



<figure class="wp-block-image size-large"><img decoding="async" width="960" height="500" src="https://www.forensicxs.com/wp-content/uploads/2020/06/ML.jpg" alt="" class="wp-image-557" srcset="https://www.forensicxs.com/wp-content/uploads/2020/06/ML.jpg 960w, https://www.forensicxs.com/wp-content/uploads/2020/06/ML-300x156.jpg 300w, https://www.forensicxs.com/wp-content/uploads/2020/06/ML-768x400.jpg 768w, https://www.forensicxs.com/wp-content/uploads/2020/06/ML-830x432.jpg 830w, https://www.forensicxs.com/wp-content/uploads/2020/06/ML-230x120.jpg 230w, https://www.forensicxs.com/wp-content/uploads/2020/06/ML-350x182.jpg 350w, https://www.forensicxs.com/wp-content/uploads/2020/06/ML-480x250.jpg 480w" sizes="(max-width: 767px) 89vw, (max-width: 1000px) 54vw, (max-width: 1071px) 543px, 580px" /></figure>



<p>Darktrace was founded in 2013, in a collaboration between British intelligence agencies and Cambridge University mathematicians. It&#8217;s a solution based upon behavior analytics, evolving towards an active defense</p>



<p>As perimeter-based protection strategies are not always sufficient, Darktrace is focused on detecting and mitigating attacks in their earliest stages. It calls its detection piece the <a href="https://darktrace.com/technology/#enterprise-immune-system" target="_blank" rel="noreferrer noopener">Enterprise Immune System</a>, modeled after the human body’s defenses. Using unsupervised machine learning — it doesn’t look for signatures or known examples of malware — without knowing what to look for, it develops a pattern of “normal” for the network, then looks for anomalies</p>
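


<p>The idea of learning what is "normal" and then looking for anomalies can be sketched in a few lines. This is not Darktrace's algorithm, which is proprietary. It is a minimal z-score baseline, and the traffic figures and the threshold of 3 are assumptions</p>

```python
# Baseline-and-deviation sketch: learn "normal" from history, flag outliers.
# NOT Darktrace's actual method; values and threshold are assumptions.
from statistics import mean, stdev

def build_baseline(history):
    """Summarise past observations as (mean, standard deviation)."""
    return mean(history), stdev(history)

def is_anomalous(value, baseline, threshold=3.0):
    """True when the value lies more than `threshold` deviations from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold

# Megabytes uploaded per hour by one device during a quiet week (made up)
history = [20, 22, 19, 21, 20, 23, 18, 21]
baseline = build_baseline(history)

print(is_anomalous(22, baseline))    # False: within the usual range
print(is_anomalous(400, baseline))   # True: far outside the baseline
```

<p>No signature of a known attack is involved: the only reference is the device's own past behaviour</p>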



<p>Darktrace is based upon the following technologies :</p>



<p><strong><span class="has-inline-color has-luminous-vivid-orange-color">Probabilistic approach</span></strong> and <strong><span class="has-inline-color has-luminous-vivid-orange-color">machine learning</span></strong> to analyse your network, data exchanges, user and devices behaviour. It looks for trends, can cluster and find subtle deviations</p>



<p>It can alert IT administrators to any <strong><span class="has-inline-color has-luminous-vivid-orange-color">deviations from normal behaviour</span></strong>; as such, it complements traditional firewall systems</p>



<p>The platform is based upon a <strong><span class="has-inline-color has-luminous-vivid-orange-color">physical appliance running on CentOS</span></strong>, together with a distributed database to store metadata (such as IP headers, Ethernet, app logs,&#8230;), and follows all interactions inside and outside your company</p>



<p>Raw data is neither stored nor retained, for security and confidentiality reasons</p>



<p>If an anomaly is detected, <strong><span class="has-inline-color has-luminous-vivid-orange-color">an alert will be triggered</span></strong> and sent to the IT administrator with a copy of the metadata</p>



<p>An HTML interface (called Threat Visualizer) displays the network and device <span class="has-inline-color has-luminous-vivid-orange-color"><strong>topology</strong> </span>and offers user-friendly 3D navigation, allowing you to drill down into the activity of a specific device, and watch for actions such as calling out to a suspicious region or sending out data</p>



<figure class="wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio"><div class="wp-block-embed__wrapper">
https://www.youtube.com/watch?v=nc32nUDyxaI
</div></figure>



<p>Although it is a very interesting appliance, there are several drawbacks to take into account :</p>



<p>By design, the product needs visibility of all network traffic to reach its full potential. In distributed and complex networks, <strong><span class="has-inline-color has-luminous-vivid-orange-color">this can be very expensive</span></strong> in deployment and configuration</p>



<p>Like all ML applications, it requires regular health checks, hence <span class="has-inline-color has-luminous-vivid-orange-color"><strong>maintenance of the deployment</strong> </span>and extra care about false alerts</p>



<p><span class="has-inline-color has-vivid-red-color"><strong>Splunk</strong></span> : <a rel="noreferrer noopener" href="https://www.splunk.com/en_us/software/user-behavior-analytics.html" target="_blank">Splunk User Behavior Analytics</a></p>



<p>Splunk UBA is a machine learning driven solution that helps organizations find hidden threats and anomalous behavior across users, devices, and applications. Its data science-driven approach produces actionable results with risk ratings and supporting evidence, augmenting SOC analysts’ existing techniques</p>



<figure class="wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio"><div class="wp-block-embed__wrapper">
<iframe title="Splunk User Behaviour Analytics (UBA) Introduction &amp; Demo | Somerford" width="525" height="295" src="https://www.youtube.com/embed/z8NWStWFg2Y?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
</div></figure>



<p>Splunk User Behavior Analytics not only captures the footprint of threat actors as they traverse enterprise, cloud, and mobile environments, but also runs them through its advanced machine learning algorithms to baseline, detect deviations and find anomalies continuously. These aberrations are then stitched into a meaningful sequence over time using pattern detection and advanced correlation to reveal the actual kill chain, which is not only comprehensible but also immediately actionable</p>



<p><strong><span class="has-inline-color has-luminous-vivid-orange-color">Splunk is based upon the following technologies</span></strong> :</p>



<p class="has-text-align-center has-medium-font-size">Big data platform</p>



<p class="has-text-align-center has-medium-font-size">Peer group analytics</p>



<p class="has-text-align-center has-medium-font-size">Unsupervised machine learning algorithms</p>



<p class="has-text-align-center has-medium-font-size">Behaviour analytics</p>



<p><span class="has-inline-color has-vivid-red-color"><strong>Cylance</strong></span> : <a href="https://www.cylance.com/en-us/index.html" target="_blank" rel="noreferrer noopener">https://www.cylance.com/en-us/index.html</a></p>



<p>Cylance Inc. is a software firm that develops antivirus programs, to block computer viruses or malware before they have an effect on a user&#8217;s computer. Cylance has been described as the first company to apply artificial intelligence and machine learning to cybersecurity</p>



<p>In February 2019, the company was acquired by <a href="https://en.wikipedia.org/wiki/BlackBerry_Limited" target="_blank" rel="noreferrer noopener">BlackBerry Limited</a></p>



<p>In 2017, the Cylance team released an excellent, must read, E-book to train cybersecurity professionals : <a rel="noreferrer noopener" href="http://shorturl.at/ijzM6" target="_blank">http://shorturl.at/ijzM6</a></p>



<p>Some of the key concepts used for antivirus detection are explained in this valuable document</p>



<figure class="wp-block-image size-large"><img decoding="async" width="1024" height="747" src="https://www.forensicxs.com/wp-content/uploads/2020/06/ML-1024x747.png" alt="" class="wp-image-563" srcset="https://www.forensicxs.com/wp-content/uploads/2020/06/ML-1024x747.png 1024w, https://www.forensicxs.com/wp-content/uploads/2020/06/ML-300x219.png 300w, https://www.forensicxs.com/wp-content/uploads/2020/06/ML-768x560.png 768w, https://www.forensicxs.com/wp-content/uploads/2020/06/ML-830x605.png 830w, https://www.forensicxs.com/wp-content/uploads/2020/06/ML-550x400.png 550w, https://www.forensicxs.com/wp-content/uploads/2020/06/ML-230x168.png 230w, https://www.forensicxs.com/wp-content/uploads/2020/06/ML-350x255.png 350w, https://www.forensicxs.com/wp-content/uploads/2020/06/ML-480x350.png 480w, https://www.forensicxs.com/wp-content/uploads/2020/06/ML.png 1207w" sizes="(max-width: 767px) 89vw, (max-width: 1000px) 54vw, (max-width: 1071px) 543px, 580px" /></figure>



<hr class="wp-block-separator"/>



<p><strong><span class="has-inline-color has-vivid-cyan-blue-color">A more extensive view of ML applications to cybersecurity</span></strong></p>



<p><span class="has-inline-color has-vivid-red-color"><strong>Network protection</strong></span></p>



<p>Regression to predict the network packet parameters and compare them with the normal ones</p>



<p>Classification to identify different classes of network attacks such as scanning and spoofing</p>



<p>Clustering for forensic analysis</p>



<p><span class="has-inline-color has-vivid-red-color"><strong>Endpoint protection</strong></span></p>



<p>Regression to predict the next system call for executable process and compare it with real ones</p>



<p>Classification to divide programs into such categories as malware, spyware and ransomware</p>



<p>Clustering for malware protection on secure email gateways (e.g., to separate legitimate file attachments from outliers)</p>



<p><span class="has-inline-color has-vivid-red-color"><strong>Application security</strong></span></p>



<p>Regression to detect anomalies in HTTP requests (for example, XXE and SSRF attacks and auth bypass)</p>



<p>Classification to detect known types of attacks like injections (SQLi, XSS, RCE, etc.)</p>



<p>Clustering user activity to detect DDOS attacks and mass exploitation</p>



<p><span class="has-inline-color has-vivid-red-color"><strong>User behaviour</strong></span></p>



<p>Regression to detect anomalies in user actions (e.g., logins at unusual times)</p>



<p>Classification to group different users for peer-group analysis</p>



<p>Clustering to separate groups of users and detect outliers</p>



<p><span class="has-inline-color has-vivid-red-color"><strong>Process behaviour</strong></span></p>



<p>Regression to predict the next user action and detect outliers such as credit card fraud</p>



<p>Classification to detect known types of fraud</p>



<p>Clustering to compare business processes and detect outliers</p>
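


<p>Most of the "regression" entries above follow the same pattern: fit a model on normal activity, predict the expected value, and flag observations whose prediction error is unusually large. Here is a minimal sketch using ordinary least squares. The session data and the tolerance are assumptions made for the example</p>

```python
# Regression-based anomaly detection: predict the expected value,
# then flag observations with a large residual.
# Data and tolerance are illustrative assumptions.

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Normal sessions: (requests sent, kilobytes returned)
requests = [1, 2, 3, 4, 5]
kilobytes = [10, 20, 30, 40, 50]
slope, intercept = fit_line(requests, kilobytes)

def is_outlier(n_requests, observed_kb, tolerance_kb=15.0):
    expected = slope * n_requests + intercept
    return abs(observed_kb - expected) > tolerance_kb

print(is_outlier(3, 32))     # False: close to the predicted 30 KB
print(is_outlier(3, 900))    # True: far more data than predicted
```

<p>Classification and clustering would replace the fitted line with a labelled model or with groups of similar observations, but the surrounding workflow stays the same</p>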



<hr class="wp-block-separator"/>



<p><strong><span class="has-inline-color has-vivid-cyan-blue-color">What Black Hat Hackers and Threat Actors could do with ML</span></strong></p>



<p>There could be many applications for hackers and offensive security teams. Here are a few examples</p>



<p><strong><span class="has-inline-color has-vivid-red-color">Breaking captcha</span></strong><span class="has-inline-color has-black-color"> </span>: <a rel="noreferrer noopener" href="http://shorturl.at/cdfI0" target="_blank">http://shorturl.at/cdfI0</a>. The captcha protection is broken using computer vision and image classification ML techniques</p>



<p><strong><span class="has-inline-color has-vivid-red-color">Keyboard strokes detection and password guessing</span></strong><span class="has-inline-color has-black-color"> </span>: <a rel="noreferrer noopener" href="http://shorturl.at/ruwAX" target="_blank">http://shorturl.at/ruwAX</a>. By listening to the keystrokes, the model is able to identify the keys pressed with good reliability, and hence guess passwords and other credentials</p>



<p><strong><span class="has-inline-color has-vivid-red-color">Automated spear phishing</span></strong> : <a rel="noreferrer noopener" href="http://shorturl.at/gjnrE" target="_blank">http://shorturl.at/gjnrE</a>. This technique makes it possible to target high profile accounts on Twitter</p>



<hr class="wp-block-separator"/>



<p><strong><span class="has-inline-color has-vivid-cyan-blue-color">A Machine Learning Case Study&nbsp;: Network intrusion detection using CNN</span></strong></p>



<p>In this long section, we are going to follow a realistic application of ML, using a <strong><span class="has-inline-color has-vivid-red-color">Convolutional Neural Network (CNN)</span></strong></p>



<p>Nowadays, an enormous number of attacks over the Internet put our information continuously at risk. <strong><span class="has-inline-color has-vivid-red-color">Intrusion Detection Systems (IDS)</span></strong> are used as a second line of defense. They watch for suspicious actions in the network to detect attacks</p>



<figure class="wp-block-image size-large is-resized"><img decoding="async" src="https://www.forensicxs.com/wp-content/uploads/2020/11/image-39.png" alt="" class="wp-image-1062" width="406" height="214" srcset="https://www.forensicxs.com/wp-content/uploads/2020/11/image-39.png 385w, https://www.forensicxs.com/wp-content/uploads/2020/11/image-39-300x158.png 300w" sizes="(max-width: 406px) 100vw, 406px" /></figure>



<p>Intrusion detection in networks requires being able to identify the typical signatures of intruders, without confusing them with harmless anomalies or normal traffic</p>



<p>In an attempt to find known attacks or unusual behavior, IDS traditionally inspect the contents of every packet. The problem with packet inspection, however, is that it is hard, or even impossible, to perform at speeds of multiple gigabits per second. In addition, the quantity of data is usually so huge that the analysis becomes impractical</p>



<p>For high-speed lines, it is therefore important to investigate alternatives to packet inspection. One option is <strong><span class="has-inline-color has-vivid-red-color">flow-based intrusion detection</span></strong>. With such an approach, <strong><span class="has-inline-color has-vivid-cyan-blue-color">the communication patterns within the network are analyzed</span></strong>, instead of the contents of individual packets</p>



<p>This Case Study uses this approach, selecting only the features related to the network flow. Because of their ability to handle huge amounts of data and identify patterns, deep learning techniques are a tool of choice</p>
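


<p>To show what "features related to the network flow" can look like, the sketch below aggregates packet records into flows keyed by the usual 5-tuple (source and destination address, source and destination port, protocol) and derives a few statistics per flow. A real deployment would rely on a dedicated flow collector. The record layout and the feature names here are simplified assumptions</p>

```python
# Aggregate packets into flows and compute flow-level features.
# The packet tuples and feature names are simplified assumptions.
from collections import defaultdict

# (timestamp s, src ip, src port, dst ip, dst port, protocol, size in bytes)
packets = [
    (0.00, "10.0.0.5", 51000, "10.0.0.9", 443, "tcp", 120),
    (0.10, "10.0.0.5", 51000, "10.0.0.9", 443, "tcp", 1500),
    (0.50, "10.0.0.5", 51000, "10.0.0.9", 443, "tcp", 380),
    (0.20, "10.0.0.7", 40000, "10.0.0.9", 22, "tcp", 60),
    (0.21, "10.0.0.7", 40000, "10.0.0.9", 22, "tcp", 60),
]

def build_flows(packet_list):
    # Group packets that share the same 5-tuple
    grouped = defaultdict(list)
    for ts, src, sport, dst, dport, proto, size in packet_list:
        grouped[(src, sport, dst, dport, proto)].append((ts, size))

    # Summarise each group without looking at any payload
    flows = {}
    for key, items in grouped.items():
        times = [t for t, _ in items]
        sizes = [s for _, s in items]
        flows[key] = {
            "packets": len(items),
            "bytes": sum(sizes),
            "duration": max(times) - min(times),
            "mean_size": sum(sizes) / len(sizes),
        }
    return flows

for key, features in build_flows(packets).items():
    print(key, features)
```

<p>Five packets collapse into two flow records here, and none of the features requires reading packet contents, which is what makes the approach attractive on high-speed lines</p>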



<p>So, we are going to go through some recent research to showcase the ability of deep learning for IDS. We will use the Research Study available here : <a href="https://github.com/CharlesMure/cassiope-NIDS" target="_blank" rel="noreferrer noopener">https://github.com/CharlesMure/cassiope-NIDS</a></p>



<p>The overall principle is shown in the flow chart below. The goal is to collect data from our network and compare it with the training set, to be able to predict whether our network is free from cyber threats. We will be able to categorize the attacks by their usual types</p>



<figure class="wp-block-image size-large"><img decoding="async" width="605" height="190" src="https://www.forensicxs.com/wp-content/uploads/2020/11/image-40.png" alt="" class="wp-image-1063" srcset="https://www.forensicxs.com/wp-content/uploads/2020/11/image-40.png 605w, https://www.forensicxs.com/wp-content/uploads/2020/11/image-40-300x94.png 300w" sizes="(max-width: 605px) 100vw, 605px" /></figure>



<p>Basically, the software <strong><span class="has-inline-color has-vivid-red-color">argus</span></strong> will <strong>collect data</strong> from our Network flow</p>



<p>The ML model will perform the following tasks :</p>



<ol class="wp-block-list" type="1"><li><strong><span class="has-inline-color has-luminous-vivid-orange-color">Process </span></strong>the data from our network flow</li><li><strong><span class="has-inline-color has-luminous-vivid-orange-color">Train</span></strong> using an existing dataset</li><li><strong><span class="has-inline-color has-luminous-vivid-orange-color">Categorize</span></strong> our network flow</li></ol>



<p>The <strong><span class="has-inline-color has-vivid-red-color">training dataset</span></strong> will be downloaded from the University of New South Wales : <a href="https://www.unsw.adfa.edu.au/unsw-canberra-cyber/cybersecurity/ADFA-NB15-Datasets/" target="_blank" rel="noreferrer noopener">The UNSW-NB15 data set description (adfa.edu.au)</a></p>



<p>This dataset was obtained by recording network data and detecting suspicious flows of data (attacks). These attacks are categorized as per the tables below, which is possible because <strong><span class="has-inline-color has-vivid-cyan-blue-color">each type of attack has a typical Network signature</span></strong> (number of connections to the same IP, packet size…)</p>



<figure class="wp-block-image size-large is-resized"><img decoding="async" src="https://www.forensicxs.com/wp-content/uploads/2020/11/image-41.png" alt="" class="wp-image-1064" width="311" height="107" srcset="https://www.forensicxs.com/wp-content/uploads/2020/11/image-41.png 344w, https://www.forensicxs.com/wp-content/uploads/2020/11/image-41-300x103.png 300w" sizes="(max-width: 311px) 100vw, 311px" /></figure>



<figure class="wp-block-image size-large is-resized"><img decoding="async" src="https://www.forensicxs.com/wp-content/uploads/2020/11/image-42.png" alt="" class="wp-image-1065" width="309" height="457" srcset="https://www.forensicxs.com/wp-content/uploads/2020/11/image-42.png 355w, https://www.forensicxs.com/wp-content/uploads/2020/11/image-42-203x300.png 203w" sizes="(max-width: 309px) 100vw, 309px" /></figure>
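


<p>Before such flow records can be fed to a neural network, categorical fields have to be turned into numbers and numeric fields brought to a comparable scale. Below is a minimal sketch using one-hot encoding and min-max scaling. The field names and values are illustrative assumptions and do not reproduce the actual UNSW-NB15 schema</p>

```python
# Turn a flow record into a numeric vector: one-hot encode the categorical
# field, min-max scale the numeric ones. Field names are assumptions.

records = [
    {"proto": "tcp", "duration": 0.5, "bytes": 2000},
    {"proto": "udp", "duration": 0.1, "bytes": 500},
    {"proto": "tcp", "duration": 2.5, "bytes": 8000},
]

protocols = sorted({r["proto"] for r in records})        # ['tcp', 'udp']
numeric_fields = ["duration", "bytes"]
ranges = {
    f: (min(r[f] for r in records), max(r[f] for r in records))
    for f in numeric_fields
}

def encode(record):
    # One position per known protocol, 1.0 for the matching one
    one_hot = [1.0 if record["proto"] == p else 0.0 for p in protocols]
    # Numeric fields rescaled to the 0..1 range seen in the data
    scaled = []
    for f in numeric_fields:
        lo, hi = ranges[f]
        scaled.append((record[f] - lo) / (hi - lo) if hi > lo else 0.0)
    return one_hot + scaled

for r in records:
    print(encode(r))
```

<p>Each record becomes a fixed-length list of numbers. In practice the ranges and the list of categories should be computed on the training set only, and then reused unchanged on new traffic</p>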



<p>Along the way, we will be using some well-known ML <strong><span class="has-inline-color has-vivid-red-color">libraries</span></strong> :</p>



<p><strong><span class="has-inline-color has-vivid-red-color">scikit-learn</span> </strong>: <a href="https://scikit-learn.org/stable/" target="_blank" rel="noreferrer noopener">https://scikit-learn.org/stable/</a></p>



<figure class="wp-block-image size-large"><img decoding="async" width="355" height="166" src="https://www.forensicxs.com/wp-content/uploads/2020/11/image-43.png" alt="" class="wp-image-1066" srcset="https://www.forensicxs.com/wp-content/uploads/2020/11/image-43.png 355w, https://www.forensicxs.com/wp-content/uploads/2020/11/image-43-300x140.png 300w" sizes="(max-width: 355px) 100vw, 355px" /></figure>



<p>It is a toolkit for ML, written in Python</p>



<p>Simple and efficient tools for predictive data analysis</p>



<p>Accessible to everybody, and reusable in various contexts</p>



<p>Built on NumPy, SciPy, and matplotlib</p>



<p><strong><span class="has-inline-color has-vivid-red-color">TensorFlow</span></strong> : <a href="https://github.com/tensorflow/tensorflow" target="_blank" rel="noreferrer noopener">https://github.com/tensorflow/tensorflow</a></p>



<p>TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML-powered applications</p>



<p><strong><span class="has-inline-color has-vivid-red-color">Keras</span></strong> : <a href="https://keras.io/" target="_blank" rel="noreferrer noopener">https://keras.io/</a></p>



<figure class="wp-block-image size-large is-resized"><img decoding="async" src="https://www.forensicxs.com/wp-content/uploads/2020/11/image-44.png" alt="" class="wp-image-1067" width="331" height="185" srcset="https://www.forensicxs.com/wp-content/uploads/2020/11/image-44.png 317w, https://www.forensicxs.com/wp-content/uploads/2020/11/image-44-300x168.png 300w" sizes="(max-width: 331px) 100vw, 331px" /></figure>



<p>Keras is an API. Keras follows best practices for reducing cognitive load: it offers consistent &amp; simple APIs, it minimizes the number of user actions required for common use cases, and it provides clear &amp; actionable error messages</p>



<p>“Keras is to Deep Learning what Ubuntu is to Operating Systems”</p>



<p>So, let’s dive into our case study</p>



<p><strong><span class="has-inline-color has-vivid-red-color">Data collection from our Network flow</span></strong></p>



<p>We are going to use <strong><span class="has-inline-color has-vivid-red-color">argus</span></strong> : <a href="https://openargus.org/using-argus" target="_blank" rel="noreferrer noopener">openargus &#8211; Using Argus</a></p>



<p><em>Argus processes packet data and generates summary network flow data. If you have packets, and want to know something about whats going on, argus is a great way of looking at aspects of the data that you can&#8217;t readily get from packet analyzers. How many hosts are talking, who is talking to whom, how often, is one address sending all the traffic, are they doing the bad thing? Argus is designed to generate network flow status information that can answer these and a lot more questions that you might have</em></p>



<p><em>Many sites use argus to generate audits from their live networks. argus can run in an end-system, auditing all the network traffic that the host generates and receives, and it can run as a stand-alone probe, running in promiscuous mode, auditing a packet stream that is being captured and transmitted to one of the systems network interfaces. This is how most universities and enterprises use argus, monitoring a port mirrored stream of packets to audit all the traffic between the enterprise and the Internet. The data is collected and then the data is stored in what is described as an argus archive, or a MySQL database</em></p>



<p><em>From there, the data is available for forensic analysis, or anything else one may want to do with the data, such as performance analysis, or operational network management</em></p>



<p>To install argus, download both argus and argus-clients : <a href="https://openargus.org/getting-argus" target="_blank" rel="noreferrer noopener">openargus &#8211; Getting Argus</a></p>



<p>You get two archives :</p>



<p><strong><span class="has-inline-color has-luminous-vivid-orange-color">argus-3.0.8.2.tar.gz</span></strong></p>



<p><strong><span class="has-inline-color has-luminous-vivid-orange-color">argus-clients-3.0.8.2.tar.gz</span></strong></p>



<p>Extract each archive with the Linux command : <strong>tar -xvf</strong> myfile</p>



<p>argus also needs some additional packages</p>



<figure class="wp-block-image size-large"><img decoding="async" width="605" height="117" src="https://www.forensicxs.com/wp-content/uploads/2020/11/image-45.png" alt="" class="wp-image-1068" srcset="https://www.forensicxs.com/wp-content/uploads/2020/11/image-45.png 605w, https://www.forensicxs.com/wp-content/uploads/2020/11/image-45-300x58.png 300w" sizes="(max-width: 605px) 100vw, 605px" /></figure>



<p>To install and run argus, follow the steps in the documentation : <a href="https://openargus.org/documentation" target="_blank" rel="noreferrer noopener">openargus &#8211; Documentation</a></p>



<p>You can now start collecting network data with argus. Here’s an example of captured data</p>



<figure class="wp-block-image size-large"><img decoding="async" width="605" height="153" src="https://www.forensicxs.com/wp-content/uploads/2020/11/image-46.png" alt="" class="wp-image-1069" srcset="https://www.forensicxs.com/wp-content/uploads/2020/11/image-46.png 605w, https://www.forensicxs.com/wp-content/uploads/2020/11/image-46-300x76.png 300w" sizes="(max-width: 605px) 100vw, 605px" /></figure>



<p><strong><span class="has-inline-color has-vivid-red-color">Installation of ML packages</span></strong></p>



<figure class="wp-block-image size-large"><img decoding="async" width="1024" height="37" src="https://www.forensicxs.com/wp-content/uploads/2020/11/image-48-1024x37.png" alt="" class="wp-image-1071" srcset="https://www.forensicxs.com/wp-content/uploads/2020/11/image-48-1024x37.png 1024w, https://www.forensicxs.com/wp-content/uploads/2020/11/image-48-300x11.png 300w, https://www.forensicxs.com/wp-content/uploads/2020/11/image-48-768x27.png 768w, https://www.forensicxs.com/wp-content/uploads/2020/11/image-48.png 1149w" sizes="(max-width: 767px) 89vw, (max-width: 1000px) 54vw, (max-width: 1071px) 543px, 580px" /></figure>



<p>Then, you can download the necessary files from <a href="https://github.com/CharlesMure/cassiope-NIDS" target="_blank" rel="noreferrer noopener">https://github.com/CharlesMure/cassiope-NIDS</a></p>



<p>You will get some Python scripts</p>



<figure class="wp-block-image size-large"><img decoding="async" width="309" height="153" src="https://www.forensicxs.com/wp-content/uploads/2020/11/image-47.png" alt="" class="wp-image-1070" srcset="https://www.forensicxs.com/wp-content/uploads/2020/11/image-47.png 309w, https://www.forensicxs.com/wp-content/uploads/2020/11/image-47-300x149.png 300w" sizes="(max-width: 309px) 100vw, 309px" /></figure>



<p><strong><span class="has-inline-color has-vivid-red-color">model_train.py</span></strong></p>



<p>To train the model, you just need to launch the script model_train.py</p>



<p><strong><span class="has-inline-color has-luminous-vivid-orange-color">python3 model_train.py</span></strong></p>



<p>This script is the “learning” component : it trains the model on the dataset</p>



<figure class="wp-block-image size-large"><img decoding="async" width="1024" height="153" src="https://www.forensicxs.com/wp-content/uploads/2020/11/image-50-1024x153.png" alt="" class="wp-image-1073" srcset="https://www.forensicxs.com/wp-content/uploads/2020/11/image-50-1024x153.png 1024w, https://www.forensicxs.com/wp-content/uploads/2020/11/image-50-300x45.png 300w, https://www.forensicxs.com/wp-content/uploads/2020/11/image-50-768x115.png 768w, https://www.forensicxs.com/wp-content/uploads/2020/11/image-50.png 1143w" sizes="(max-width: 767px) 89vw, (max-width: 1000px) 54vw, (max-width: 1071px) 543px, 580px" /></figure>



<p>The model it trains is a CNN. Below is an example of a CNN structure applied to image recognition (recognizing a handwritten “3” as the correct digit)</p>



<figure class="wp-block-image size-large"><img decoding="async" width="1024" height="588" src="https://www.forensicxs.com/wp-content/uploads/2020/11/image-51-1024x588.png" alt="" class="wp-image-1074" srcset="https://www.forensicxs.com/wp-content/uploads/2020/11/image-51-1024x588.png 1024w, https://www.forensicxs.com/wp-content/uploads/2020/11/image-51-300x172.png 300w, https://www.forensicxs.com/wp-content/uploads/2020/11/image-51-768x441.png 768w, https://www.forensicxs.com/wp-content/uploads/2020/11/image-51.png 1147w" sizes="(max-width: 767px) 89vw, (max-width: 1000px) 54vw, (max-width: 1071px) 543px, 580px" /></figure>



<p>The corresponding Python code is the following. It stacks several layers, including convolutions, and uses the “ReLU” activation function</p>



<figure class="wp-block-image size-large is-resized"><img decoding="async" src="https://www.forensicxs.com/wp-content/uploads/2020/11/image-52-1024x496.png" alt="" class="wp-image-1075" width="733" height="355" srcset="https://www.forensicxs.com/wp-content/uploads/2020/11/image-52-1024x496.png 1024w, https://www.forensicxs.com/wp-content/uploads/2020/11/image-52-300x145.png 300w, https://www.forensicxs.com/wp-content/uploads/2020/11/image-52-768x372.png 768w, https://www.forensicxs.com/wp-content/uploads/2020/11/image-52.png 1144w" sizes="(max-width: 733px) 100vw, 733px" /></figure>
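


<p>To make the convolution and ReLU steps more concrete, here is a minimal pure-Python sketch of what a single 1D convolutional filter followed by ReLU computes. It is not the code from model_train.py : the function names, the kernel values and the input vector are made up for illustration, and a real Keras layer learns its kernels during training</p>

```python
def conv1d(signal, kernel):
    """Slide the kernel over the signal (no padding, stride 1)."""
    width = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(width))
        for i in range(len(signal) - width + 1)
    ]


def relu(values):
    """ReLU keeps positive values and replaces negative ones with 0."""
    return [max(0.0, v) for v in values]


# Hypothetical, already-normalized flow features (illustrative values only).
features = [0.0, 1.0, 3.0, 2.0, 0.0]
# Hypothetical kernel; a trained CNN would learn these weights.
kernel = [1.0, -1.0]

feature_map = relu(conv1d(features, kernel))
print(feature_map)  # [0.0, 0.0, 1.0, 2.0]
```

<p>With this kernel, the filter only “fires” where a value is larger than its right-hand neighbour; ReLU then discards the negative responses. A CNN stacks many such filters and layers</p>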



<p>Once the learning phase is done, the results are saved to a file together with the associated CNN “weights”, for reuse in the prediction steps</p>



<figure class="wp-block-image size-large is-resized"><img decoding="async" src="https://www.forensicxs.com/wp-content/uploads/2020/11/image-53.png" alt="" class="wp-image-1076" width="270" height="114" srcset="https://www.forensicxs.com/wp-content/uploads/2020/11/image-53.png 429w, https://www.forensicxs.com/wp-content/uploads/2020/11/image-53-300x127.png 300w" sizes="(max-width: 270px) 100vw, 270px" /></figure>



<p><strong><span class="has-inline-color has-vivid-red-color">monitoring.py</span></strong></p>



<p>This script is going to extract features from the logs collected by argus</p>



<p>These features are useful to check if the captured packets are suspicious or not</p>



<p>Some of the features that argus can extract are listed below. Our model needs them</p>



<figure class="wp-block-image size-large is-resized"><img decoding="async" src="https://www.forensicxs.com/wp-content/uploads/2020/11/image-56.png" alt="" class="wp-image-1079" width="348" height="378" srcset="https://www.forensicxs.com/wp-content/uploads/2020/11/image-56.png 666w, https://www.forensicxs.com/wp-content/uploads/2020/11/image-56-276x300.png 276w" sizes="(max-width: 348px) 100vw, 348px" /></figure>



<p>The script extracts the features needed by our ML model</p>



<p>First, we open an argus log</p>



<figure class="wp-block-image size-large is-resized"><img decoding="async" src="https://www.forensicxs.com/wp-content/uploads/2020/11/image-57.png" alt="" class="wp-image-1080" width="265" height="65" srcset="https://www.forensicxs.com/wp-content/uploads/2020/11/image-57.png 430w, https://www.forensicxs.com/wp-content/uploads/2020/11/image-57-300x74.png 300w" sizes="(max-width: 265px) 100vw, 265px" /></figure>



<p>The features are computed “on the fly” and stored in a Python list, for example the feature ct_dst_sport_ltm</p>



<figure class="wp-block-image size-large is-resized"><img decoding="async" src="https://www.forensicxs.com/wp-content/uploads/2020/11/image-58.png" alt="" class="wp-image-1081" width="380" height="59" srcset="https://www.forensicxs.com/wp-content/uploads/2020/11/image-58.png 680w, https://www.forensicxs.com/wp-content/uploads/2020/11/image-58-300x46.png 300w" sizes="(max-width: 380px) 100vw, 380px" /></figure>
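


<p>As a rough illustration of such an “on the fly” computation, the sketch below counts, for each flow, how many of the last 100 flows share its destination address and source port, which is how the UNSW-NB15 feature list describes ct_dst_sport_ltm as far as I understand it. This is not the code from monitoring.py : the record field names (dstaddr, sport) and the choice to include the current flow in the count are assumptions</p>

```python
from collections import deque


def ct_dst_sport_ltm(flows, window=100):
    """For each flow, count flows in the last `window` flows (current one
    included) that share its destination address and source port."""
    recent = deque(maxlen=window)  # oldest entries drop out automatically
    counts = []
    for flow in flows:
        key = (flow["dstaddr"], flow["sport"])
        recent.append(key)
        counts.append(sum(1 for seen in recent if seen == key))
    return counts


# Hypothetical flow records; the field names are assumptions, not argus output.
flows = [
    {"dstaddr": "10.0.0.5", "sport": 4444},
    {"dstaddr": "10.0.0.5", "sport": 4444},
    {"dstaddr": "10.0.0.9", "sport": 5353},
    {"dstaddr": "10.0.0.5", "sport": 4444},
]
print(ct_dst_sport_ltm(flows))  # [1, 2, 1, 3]
```

<p>A steadily growing count for the same destination and source port is the kind of “signature” the model can pick up on</p>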



<p>An additional processing step consolidates the data</p>



<figure class="wp-block-image size-large is-resized"><img decoding="async" src="https://www.forensicxs.com/wp-content/uploads/2020/11/image-59-1024x61.png" alt="" class="wp-image-1082" width="700" height="42" srcset="https://www.forensicxs.com/wp-content/uploads/2020/11/image-59-1024x61.png 1024w, https://www.forensicxs.com/wp-content/uploads/2020/11/image-59-300x18.png 300w, https://www.forensicxs.com/wp-content/uploads/2020/11/image-59-768x46.png 768w, https://www.forensicxs.com/wp-content/uploads/2020/11/image-59.png 1313w" sizes="(max-width: 700px) 100vw, 700px" /></figure>



<p><strong><span class="has-inline-color has-vivid-red-color">DeepLInspect.py</span></strong></p>



<p>This script “correlates” the features extracted from argus with the output of our training</p>



<p>First, it loads all the necessary files</p>



<figure class="wp-block-image size-large is-resized"><img decoding="async" src="https://www.forensicxs.com/wp-content/uploads/2020/11/image-60.png" alt="" class="wp-image-1083" width="304" height="171" srcset="https://www.forensicxs.com/wp-content/uploads/2020/11/image-60.png 505w, https://www.forensicxs.com/wp-content/uploads/2020/11/image-60-300x169.png 300w" sizes="(max-width: 304px) 100vw, 304px" /></figure>



<p>Then, the script performs the necessary “correlations” to predict whether our Network flow is suspicious. If it is, the flow is classified into the appropriate category, taken from the training dataset</p>



<figure class="wp-block-image size-large is-resized"><img decoding="async" src="https://www.forensicxs.com/wp-content/uploads/2020/11/image-61-1024x64.png" alt="" class="wp-image-1084" width="741" height="46" srcset="https://www.forensicxs.com/wp-content/uploads/2020/11/image-61-1024x64.png 1024w, https://www.forensicxs.com/wp-content/uploads/2020/11/image-61-300x19.png 300w, https://www.forensicxs.com/wp-content/uploads/2020/11/image-61-768x48.png 768w, https://www.forensicxs.com/wp-content/uploads/2020/11/image-61.png 1144w" sizes="(max-width: 706px) 89vw, (max-width: 767px) 82vw, 740px" /></figure>
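


<p>The last step of such a prediction typically boils down to picking the category with the highest score. Here is a minimal sketch of that idea, not the code from DeepLInspect.py : the category names follow the UNSW-NB15 attack categories, but their exact spelling and order, as well as the scores, are assumptions made for illustration</p>

```python
# Category names as used in UNSW-NB15; the order here is an assumption and
# must match the label encoding used at training time.
CATEGORIES = [
    "Normal", "Analysis", "Backdoor", "DoS", "Exploits",
    "Fuzzers", "Generic", "Reconnaissance", "Shellcode", "Worms",
]


def classify(probabilities, categories=CATEGORIES):
    """Return (category, probability) for the highest-scoring class."""
    if len(probabilities) != len(categories):
        raise ValueError("one probability per category is expected")
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return categories[best], probabilities[best]


# Hypothetical model output for one flow (values made up for illustration).
scores = [0.05, 0.01, 0.02, 0.70, 0.10, 0.04, 0.03, 0.03, 0.01, 0.01]
label, confidence = classify(scores)
print(label, confidence)  # DoS 0.7
print("suspicious" if label != "Normal" else "normal")  # suspicious
```

<p>Returning the probability along with the label makes it possible to apply a confidence threshold before raising an alert</p>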



<p><strong><span class="has-inline-color has-vivid-red-color">What are the next steps ?</span></strong></p>



<p>Building on the prediction script, we could automate the detection and classification of Network flows as normal or suspicious, making our IDS a powerful tool</p>



<p>There are a number of limitations to bear in mind, however</p>



<p>First, to make good predictions, <strong><span class="has-inline-color has-vivid-red-color">you will need a good Dataset</span></strong>. That means fresh data (difficult to collect and label, as this is a huge task) and a good balance between categories (our dataset above is several years old, and its categories are not well balanced). It also means that the dataset must be representative of the latest types of attacks and their typical signatures. In addition, the data needs to be comprehensive (our dataset lacks the timestamp, for example)</p>
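


<p>One common way to soften a poor balance between categories is to weight each class inversely to its frequency during training. The sketch below uses the same formula as scikit-learn’s “balanced” heuristic, n_samples / (n_classes * class_count). The label counts are made up for illustration and are not the real UNSW-NB15 figures</p>

```python
from collections import Counter


def class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count), so that
    rare categories count more during training."""
    counts = Counter(labels)
    total = len(labels)
    return {
        name: total / (len(counts) * count)
        for name, count in counts.items()
    }


# Hypothetical label column; real counts come from the training CSV.
labels = ["Normal"] * 6 + ["DoS"] * 3 + ["Worms"] * 1
for name, weight in sorted(class_weights(labels).items()):
    print(name, round(weight, 2))
```

<p>Weights like these can be passed to the training routine (Keras’s fit method accepts a class_weight argument keyed by class index), but they do not replace fresh, representative data</p>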



<p>Secondly, an IDS machine learning model would need to go deeper in its analysis, by capturing parts of the packets (at least the headers), to understand which content and which applications are involved in the suspicious packets. This would require much more data (“<strong><span class="has-inline-color has-vivid-red-color">Big Data</span></strong>”) to be acquired, along with large repositories and hardware to carry out the computations</p>



<p>Third, an IDS using a <strong><span class="has-inline-color has-vivid-red-color">Hybrid approach</span></strong>, both signature based and Machine Learning based, would be quite powerful : it could classify attacks more accurately, making it possible to implement countermeasures against these attacks and to block them with confidence</p>
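


<p>A hybrid decision could be sketched as follows : a signature rule and the ML prediction are evaluated side by side, and the flow is blocked only when both agree. Everything here is hypothetical (the rule, the thresholds, the function names and the three verdicts); it only illustrates how the two approaches might be combined</p>

```python
def signature_match(flow):
    """Toy signature: many connections to the same destination and port.
    The threshold of 50 is an arbitrary placeholder, not a tuned rule."""
    return flow["ct_dst_sport_ltm"] >= 50


def hybrid_verdict(flow, ml_label, ml_confidence, min_confidence=0.9):
    """Combine both detectors and report how strongly they agree."""
    by_signature = signature_match(flow)
    by_model = ml_label != "Normal" and ml_confidence >= min_confidence
    if by_signature and by_model:
        return "block"        # both detectors agree
    if by_signature or by_model:
        return "alert"        # only one detector fired: ask an analyst
    return "allow"


flow = {"ct_dst_sport_ltm": 72}  # hypothetical extracted feature
print(hybrid_verdict(flow, "DoS", 0.97))                         # block
print(hybrid_verdict(flow, "Normal", 0.99))                      # alert
print(hybrid_verdict({"ct_dst_sport_ltm": 3}, "Normal", 0.99))   # allow
```

<p>Requiring agreement before blocking reduces the risk of cutting legitimate traffic, at the cost of sending more borderline cases to an analyst</p>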



<p>Last but not least, continuing to develop efficient IDS will require a deep understanding of the attacks and the techniques they rely on, in order to define the right features and categories. A strong understanding of the appropriate ML algorithms will also be needed. Therefore, a <strong><span class="has-inline-color has-vivid-red-color">combined Expertise</span></strong> in Network Forensics and Machine Learning will be necessary</p>



<hr class="wp-block-separator"/>



<p><strong><span class="has-inline-color has-vivid-cyan-blue-color">Recommendations for the future</span></strong></p>



<p>With machine learning, cybersecurity systems can analyze patterns and learn from them to help prevent similar attacks and respond to changing behavior. It can help cybersecurity teams be more proactive in preventing threats and responding to active attacks in real time. It can reduce the amount of time spent on routine tasks and enable organizations to use their resources more strategically</p>



<p>As ML is based on data, a must-have for effective ML-based cybersecurity is to gather data in a structured way. The data must have complete, relevant and rich context collected from every potential source, whether that is at the endpoint, on the network or in the cloud</p>



<p>It all starts with taking the right approach to data. One of the biggest challenges is getting data from the endpoint, network and cloud and normalizing it into one state, so that it can be used effectively for machine learning</p>



<p>When it comes to cybersecurity, the potential for machine learning to have a dramatic and lasting impact is real. But only for companies that are forward-thinking enough to take care of their data first</p>



<hr class="wp-block-separator"/>



<p><strong><span class="has-inline-color has-vivid-cyan-blue-color">Additional reading</span></strong></p>



<p>A very good overview of ML and its potential applications to cybersecurity : <a href="https://towardsdatascience.com/machine-learning-for-cybersecurity-101-7822b802790b" target="_blank" rel="noreferrer noopener">https://towardsdatascience.com/machine-learning-for-cybersecurity-101-7822b802790b</a></p>



<p>A huge ML repository dedicated to cybersecurity : <a rel="noreferrer noopener" href="https://github.com/jivoi/awesome-ml-for-cybersecurity" target="_blank">https://github.com/jivoi/awesome-ml-for-cybersecurity</a></p>



<p>The challenges of data gathering for an efficient ML application to cybersecurity : <a href="https://www.securityroundtable.org/the-growing-role-of-machine-learning-in-cybersecurity/" target="_blank" rel="noreferrer noopener">https://www.securityroundtable.org/the-growing-role-of-machine-learning-in-cybersecurity/</a></p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.forensicxs.com/machine-learning-applied-to-cybersecurity-and-hacking/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
