Introduction to Malware Analysis

In this article, I’m going to walk through a detailed review of a typical piece of malware, as an introduction to malware analysis. I will cover techniques such as static and dynamic analysis, reverse engineering and disassembly. For this purpose, I’m going to use a sample provided in the excellent book “Practical Malware Analysis”. I highly recommend buying a copy of this book, as it’s a must-have for any malware analyst. It will take your skills to the next level

The book comes with a library of malware samples, for learning purposes, that can be downloaded here: https://practicalmalwareanalysis.com/labs/

I have selected Lab 7-3 for this article: it only requires basic malware analysis techniques, yet it is an already complex piece of malware and a very good learning opportunity

To analyse this malware, install a virtual machine (I have installed a copy of Windows 10 Enterprise in VirtualBox) and run the sample only from within your VM. Don’t run it on your normal PC!

Please note that the book was written with Windows XP in mind, so running the samples on Windows 10 may not produce all the expected results. It still provides an opportunity to learn, as the example below shows

There are two files available for analysis: Lab07-03.exe and Lab07-03.dll

Malware Static Analysis

This is the process of analyzing the code or structure of a program to determine its function, without running it. Let’s go through it, using some basic tools

VirusTotal.com

This site, owned by Google, is almost a mandatory first step for any malware analyst. It scans the file with many antivirus engines and reports the known signatures it matches, with some details. Let’s upload our files
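Before uploading, it can be handy to compute the SHA-256 of each sample locally, since VirusTotal can also be searched by file hash. Here is a minimal sketch; the function name `sha256_of` and the chunk size are my own choices for illustration.

```python
import hashlib

def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large samples do not need to fit in memory
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `sha256_of("Lab07-03.exe")` inside the VM (the path is whatever location you copied the sample to) gives a string you can paste into the VirusTotal search box.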

Lab07-03.exe

One issue that is immediately visible here is the absence of a standard naming scheme for the virus! In fact, each antivirus vendor gives it its own name

Let’s pick a few of them and learn a few things from Google

This virus is usually a Trojan, that is capable of performing several tasks such as downloading, installing or running malware on the targeted computer

This malware is known to be a Backdoor, that is capable of installing all manner of malware on your computer

Let’s look into the details section and learn a few more things

We see that our executable is in the Portable Executable (PE) format, as all Windows executables and DLLs are. It’s a data structure that contains the information necessary for the Windows OS loader to manage the wrapped executable code. In our case, the code was mainly written in C++

Lab07-03.dll

Let’s search the Web again to learn more from these signatures

This Trojan is a malicious threat to your Windows computer system, capable of sending information to a remote attacker

Timestamp

Let’s compare the timestamps of the .exe and the .dll, as found in VirusTotal

Here is the first one :

timestamp of Lab07-03.exe

to be compared with the second one :

timestamp of Lab07-03.dll

The two timestamps are very close to each other. This suggests that both files were built together, by the same malware author
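The timestamp shown by VirusTotal comes from the PE header itself, so it can also be read offline. The sketch below assumes a well-formed PE file: it follows the pointer stored at offset 0x3C to the PE signature, then reads the TimeDateStamp field of the COFF header. The function name `pe_timestamp` is hypothetical, and keep in mind that this field can be forged by the author.

```python
import struct
from datetime import datetime, timezone

def pe_timestamp(data):
    """Return the COFF TimeDateStamp of a PE image as a UTC datetime."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ header")
    # e_lfanew: 4-byte little-endian offset of the PE signature
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # After the signature: Machine (2 bytes), NumberOfSections (2), TimeDateStamp (4)
    (stamp,) = struct.unpack_from("<I", data, pe_offset + 8)
    return datetime.fromtimestamp(stamp, tz=timezone.utc)
```

Reading both files with `open(path, "rb").read()` and comparing the two results gives the same comparison as above, without leaving the VM.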

Detecting Packers with PEiD

Malware is often obfuscated or packed to make its analysis more difficult. Obfuscation means that the executable code has been hidden in some way. Packing is a subset of obfuscation, in which the code has been compressed and cannot be analyzed easily. Both techniques make static analysis difficult

Here is what PEiD does: in the area marked in red, it reports the packing method, if any. In our case, it identifies a C++ compiler rather than a packer, so the file does not appear to be packed

We get the same result with the .dll file
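PEiD works by matching signatures. A different and complementary heuristic, which is not what PEiD does, is to measure the byte entropy of the file: compressed or encrypted data tends to look random, so a value close to 8 bits per byte is often taken as a hint of packing. A minimal sketch, assuming the whole file is read into memory:

```python
import math
from collections import Counter

def shannon_entropy(data):
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # how many times each byte value occurs
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

This is only an indicator: any threshold you pick is a rule of thumb, and a file with low entropy can still be obfuscated in other ways.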

Strings

A normal program can easily be searched for strings. This usually reveals interesting text that helps to understand the purpose of the program. But, in the case of obfuscated or packed programs, only a few strings will be readable, or none

We have seen above that our files are not packed, so we should be able to gather information. To find the strings, we can use the strings command in Linux
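If the strings command is not at hand, its basic behaviour is easy to approximate. The sketch below is a rough imitation, not the real tool: it only looks for runs of printable ASCII bytes, with a minimum length of 4 as an assumed default.

```python
import re

def ascii_strings(data, min_length=4):
    """Yield runs of at least `min_length` printable ASCII bytes found in `data`."""
    # \x20-\x7e covers the space character through the tilde
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    for match in pattern.finditer(data):
        yield match.group().decode("ascii")
```

For example, `list(ascii_strings(open("Lab07-03.exe", "rb").read()))` would list the readable strings of the sample, assuming that path exists in your VM.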

strings command applied to Lab07-03.exe

In section 1, we see that the malware is manipulating some system files, which looks consistent with a Trojan trying to access our files

In section 2, we see the imported libraries: KERNEL32.dll is a common DLL that contains core functionality, such as access to and manipulation of memory, files, and hardware. MSVCRT.dll contains program code that enables applications written in Microsoft Visual C++ to run properly, which is the case for our malware, as seen previously

In section 3, we see two lines which look identical at first glance, but on closer inspection the kernel32.dll library also appears as kerne132.dll (the “l” is replaced by a “1”). This looks like a basic obfuscation and a potential attempt to replace the legit library with a malicious one
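This kind of look-alike name is easy to miss by eye, so it is worth checking automatically. The sketch below is a simple illustration: it maps a couple of confusable digits back to letters and compares the result with a short list of known DLL names. Both the list and the digit mapping are assumptions chosen for this example, not a complete detector.

```python
KNOWN_SYSTEM_DLLS = {"kernel32.dll", "msvcrt.dll", "ws2_32.dll"}

# Digits that are easily mistaken for letters (illustrative, not exhaustive)
CONFUSABLE = str.maketrans({"1": "l", "0": "o"})

def _skeleton(name):
    """Lower-case the name and fold confusable digits into letters."""
    return name.lower().translate(CONFUSABLE)

def lookalike_of(name, known=KNOWN_SYSTEM_DLLS):
    """Return the known DLL that `name` imitates, or None if genuine or unrelated."""
    lowered = name.lower()
    if lowered in known:
        return None
    for genuine in known:
        if _skeleton(genuine) == _skeleton(lowered):
            return genuine
    return None
```

With this helper, `lookalike_of("kerne132.dll")` returns `"kernel32.dll"`, while the genuine name returns `None`.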

In section 4, we see the path of the malicious library inside the system32 folder

In section 5, we see the call to the Lab07-03.dll, which is another indication that the two files are strongly linked together

In section 6, we see the reference to the legit library in the system files system32

In section 7, we see a threatening message

strings command applied to Lab07-03.dll

In section 1, we see the creation of a Mutex. It’s an object that coordinates multiple processes and threads. Malware typically creates a Mutex to ensure that only one instance of itself is running at a given time

In section 2, we see the inclusion of the library WS2_32.dll, which is a networking dll. A malware that accesses this library is likely to connect and perform network related tasks

In section 3, we can identify an IP address, 127.26.152.13. In a real sample this could be the IP of a Command and Control server, but here it is a loopback address

What is a loopback address? It is an address in the 127.0.0.0/8 range, which always points back to the local machine

Let’s ping this IP address. We receive replies, but they come from our own machine. Nothing is sent over the Internet!

Ping to the loopback IP 127.26.152.13
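The loopback status of an address can also be checked without sending any packet. Here is a small sketch using Python’s standard `ipaddress` module; the wrapper name `is_loopback` is my own.

```python
import ipaddress

def is_loopback(address):
    """True when `address` falls in a loopback range (127.0.0.0/8 for IPv4)."""
    return ipaddress.ip_address(address).is_loopback
```

`is_loopback("127.26.152.13")` returns `True`, which confirms that the address found in the strings stays on the local machine.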

We also see a strange text, “SADFHUHF”. This may be useful later on in our analysis, so let’s keep it in mind

pestudio

This software will greatly complement what we have seen above. We can confirm the libraries imported in the .exe and the .dll, with the corresponding strings

Lab07-03.exe
Lab07-03.dll

Malware Dynamic Analysis

This is the examination of the malware either during its execution, or by inspecting the system after it has run. It is usually an efficient way to identify malware functionality. We are going to install the tools below, and run them before and after executing the malware

Screenshot of the dynamic analysis tools

Procmon

Process Monitor, or procmon, provides a way to monitor registry, file system, network, process and thread activity. It starts capturing these events as soon as it runs

We can follow our malware when we launch it

Lab07-03.exe is run and highlighted in blue

If we click on kernel32.dll, we can see a few more details

We can’t find kerne132.dll. If it runs somewhere, it is hidden from us

Process Explorer

You can use Process Explorer to list active processes, the DLLs loaded by a process, various process properties, and overall system information

When launching the malware, we see the process quickly appearing, and then disappearing after less than a second

Regshot

It is a registry comparison tool that takes and compares two registry snapshots. Take a first shot before running the malware and a second one afterwards, and the tool will report any differences. In our case, here is an extract with a focus on Kernel32.dll

Please note we can’t find kerne132.dll. If the malware calls this DLL, it obviously stays hidden
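The before/after principle behind this kind of tool can be illustrated with two plain dictionaries. This is only a sketch of the comparison idea, not how Regshot is implemented, and the snapshot format (key path mapped to value) is an assumption made for the example.

```python
def diff_snapshots(before, after):
    """Compare two {key: value} snapshots and report what changed."""
    added = {k: after[k] for k in after.keys() - before.keys()}
    removed = {k: before[k] for k in before.keys() - after.keys()}
    changed = {
        k: (before[k], after[k])
        for k in before.keys() & after.keys()
        if before[k] != after[k]
    }
    return {"added": added, "removed": removed, "changed": changed}
```

Anything that shows up under `added` or `changed` after running a sample is a candidate for closer inspection.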

Wireshark

It is a packet capture tool that intercepts and logs network traffic. It provides visualization, packet stream analysis, in-depth analysis of individual packets. Here is a snapshot

As said before, the IP address 127.26.152.13 is a loopback address, so our malware will not communicate over the Internet. Therefore we can’t find any trace of it in Wireshark

Apate DNS

This tool provides a quick way to see the DNS requests made by a piece of malware. It answers those requests with spoofed DNS responses pointing to an IP address specified by the user. It can therefore record the DNS requests, providing insightful information to the analyst

Again, we can’t find any outside calls as our malware calls a local loopback

Summary after Static & Dynamic Analysis

The above analysis has shown us that the malware is made of an .exe and a .dll file. In real life, it would call an external IP and try to interact with a C&C server, to manipulate some of our system files. But, in this exercise, the IP is a local loopback and no interaction with a C&C is possible. So, the dynamic analysis does not reveal much, due to the lack of Internet interaction

The malware seems to rely on kerne132.dll, trying to replace kernel32.dll “behind the scenes”, but the library we see loaded is still kernel32.dll. So far, this is only a hypothesis: nothing in the above analysis really proves it

So, to uncover the secrets of this malware, we need to go deeper and reverse engineer the code. In fact, the methods used so far have only scratched the surface. These techniques are like trying to analyze a black box from the outside. With reverse engineering, we can have a look inside!

Reverse Engineering

We are going to use two different, well-known tools:

IDA Pro : https://www.hex-rays.com/products/ida/ (I will use it as the main analyzer)

Ghidra : https://www.nsa.gov/resources/everyone/ghidra/ (I will use it to show some C code constructs recovered by Ghidra)

Lab07-03.dll

After opening the file in IDA Pro, let’s start by listing the call instructions to get a quick overview

call instructions (with my additional comments)

Please note it corresponds to the Import functions section found in IDA Pro (as expected)

The best explanation for such a DLL, sending and receiving data, creating processes, is that it is designed to receive commands from a remote machine (a potential C&C server). At this point, we have a first overall understanding of what the Lab07-03.dll does. But we can analyze deeper. Before doing so, we notice that this DLL has no Export function, but it has an Entry Point

An Export function is usually necessary to provide a function to be Imported by the EXE. So the absence of this Export function is questionable. We don’t seem to have an answer to this question at this point

Destination address

Here is what we see just before the connect call

We see that the destination IP is 127.26.152.13. We had already found this IP during the strings analysis. We also see that the port is 50h, which is 80 in decimal, the port normally used for web traffic
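The hexadecimal values seen in the disassembly can be checked in a few lines. The sketch below converts the port and packs both values in network byte order, which is how they are laid out in a socket address structure; the helper name `describe_endpoint` is hypothetical.

```python
import socket
import struct

def describe_endpoint(ip_text, port):
    """Return (port, ip) as hex strings in network byte order."""
    packed_ip = socket.inet_aton(ip_text)    # dotted quad -> 4 bytes
    packed_port = struct.pack("!H", port)    # 16-bit big-endian
    return packed_port.hex(), packed_ip.hex()

# 50h in the listing is simply 80 in decimal
assert 0x50 == 80
```

`describe_endpoint("127.26.152.13", 0x50)` returns `('0050', '7f1a980d')`, so these are the byte patterns to look for when reading the data pushed before the connect call.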

In the C code below, we see the creation of the Mutex called SADFHUHF (seen earlier in the strings analysis). We also see the initialization of the loopback IP 127.26.152.13

C Code in Ghidra

Communication with the C&C server

Our machine is going to send a “hello” to the C&C

This is probably the message sent by our machine to the C&C, to confirm that we are ready to receive instructions

Receiving data from the C&C server

The call to recv stores the incoming network data in the buffer “buf”, which is located on the stack

In addition, we also see the following instructions

What happens here is the following: if the C&C sends a sleep message, the loop will detect it and let our machine sleep for 60000h milliseconds (393,216 ms), or about 393 seconds

Here is the corresponding C code

C code in Ghidra
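The sleep duration is worth double-checking, because the value in the listing is hexadecimal and the Windows Sleep function takes milliseconds. A short worked conversion, where the helper name `hex_ms_to_seconds` is my own:

```python
def hex_ms_to_seconds(hex_text):
    """Convert a hexadecimal millisecond count such as '60000' to seconds."""
    milliseconds = int(hex_text, 16)
    return milliseconds / 1000

seconds = hex_ms_to_seconds("60000")   # 60000h = 393216 ms
minutes = seconds / 60
```

Here `seconds` is 393.216 and `minutes` is a little over six and a half, so the machine stays idle for roughly six and a half minutes each time a sleep message arrives.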

The C&C can also send an execution instruction to our machine, which makes this a backdoor. The code first checks whether the stack buffer starts with the “exec” instruction, using a string compare call. If so, it jumps and calls CreateProcessA. The CommandLine argument takes whatever the C&C provides (such as the path to an executable)

So, as a short wrap-up: our machine sends a “hello” to the C&C and then waits for instructions. This gives the attacker a backdoor to launch an executable on our machine, going through port 80
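The command handling described above can be summarized as a small classifier. This is a harmless sketch that only labels a received buffer and never executes anything. The exact message layout (a space after “exec”, a NUL-terminated buffer) is an assumption for illustration, not something confirmed by the listing.

```python
def classify_command(buffer):
    """Map a received buffer to the action described above, without performing it."""
    if buffer.startswith(b"sleep"):
        return ("sleep", None)
    if buffer.startswith(b"exec"):
        # Assumed layout: "exec " followed by the command line, padded with NULs
        command_line = buffer[5:].rstrip(b"\x00").decode("ascii", errors="replace")
        return ("exec", command_line)
    return ("ignore", None)
```

For instance, `classify_command(b"exec C:\\tool.exe\x00")` returns `("exec", "C:\\tool.exe")`, where the path is a made-up example of what a C&C could send.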

Lab07-03.exe

Let’s continue using IDA Pro. We are going to learn many new things about our malware

Parameters to run the malware

Looking into the following code, we realize that the program stops immediately if the correct parameter is not provided

The parameter “WARNING_THIS_WILL_DESTROY_YOUR_MACHINE” must be passed to the program, otherwise it stops abruptly. Now we understand why the program seemed to stop right away when we launched it during the dynamic analysis. We will try again later on in this article, once we have understood this malware better!
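The guard can be pictured as a simple check on the argument list. In this sketch, the requirement of exactly one argument is my assumption; the text above only tells us that the magic string must be supplied.

```python
REQUIRED_ARGUMENT = "WARNING_THIS_WILL_DESTROY_YOUR_MACHINE"

def should_continue(argv):
    """Mirror the guard: program name plus one argument equal to the magic string."""
    return len(argv) == 2 and argv[1] == REQUIRED_ARGUMENT
```

`should_continue(["Lab07-03.exe"])` is `False`, which matches the behaviour observed during the dynamic analysis, where the process disappeared in less than a second.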

Creation of kerne132.dll

In the following code, we notice that two files are opened: kernel32.dll and Lab07-03.dll

Then, the content of Lab07-03.dll is copied into a new file, kerne132.dll, at the location C:\\windows\\system32\\kerne132.dll

We now understand that the file kerne132.dll is meant to imitate the legit kernel32.dll

We notice that the argument C:\\* is passed to the subroutine at 4011E0

Searching the .exe files in our machine

In the subroutine at 4011E0, we see that the code looks for a first file

then it continues with a loop mapping the file system

We notice that during these loops, the code looks for files with a .exe extension

At this point, we can conclude that the malware searches the C: drive for EXE programs and will perform some actions with these files
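The search pattern, walking a directory tree and keeping only the .exe files, looks like this when written in a high-level language. This is a read-only sketch for understanding the logic. It uses Python’s `os.walk` instead of the Windows calls found in the sample, and the function name `find_executables` is hypothetical.

```python
import os

def find_executables(root):
    """Yield paths under `root` whose name ends in .exe (case-insensitive)."""
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith(".exe"):
                yield os.path.join(folder, name)
```

Running it on a small test folder inside the VM gives a feel for how many files a scan of the whole C: drive would touch.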

What the malware does with .exe files

Let’s analyze the subroutine at 4010A0, which is called each time a .exe file is found

At first, we notice that the subroutine maps the file into memory with “CreateFileMappingA” and “MapViewOfFile”

The rep movsd and rep movsb instructions copy a block of memory, first dword by dword, then byte by byte for the remainder (similar to memcpy). We see that the content of dword_403010 is going to be copied over kernel32.dll. So, what is the content of dword_403010?

To find out, let’s click on dword_403010, we move to the data section

Then let’s convert the data into the corresponding string, by pressing the letter A on your keyboard (yes, it’s good to know !)

We find the following change: the string kernel32.dll is being replaced by kerne132.dll

So now we understand. Within the .exe files, the malware looks for the legit string kernel32.dll and replaces it with kerne132.dll. At the same time, Lab07-03.dll is copied to kerne132.dll and placed into C:\Windows\System32

In summary, executables are modified to load kerne132.dll instead of kernel32.dll
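From a defender’s point of view, the same observation gives a quick indicator of compromise: a patched executable contains the look-alike name. The sketch below works on bytes already read into memory. The helper name `references_impostor` is my own, and a match is only a hint that deserves a closer look.

```python
LEGIT = b"kernel32.dll"
IMPOSTOR = b"kerne132.dll"

# Both names have the same length, so overwriting one with the other
# does not shift any other byte in the file.
assert len(LEGIT) == len(IMPOSTOR)

def references_impostor(image):
    """True if the byte image mentions the look-alike DLL name (case-insensitive)."""
    return IMPOSTOR in image.lower()
```

The equal length of the two names is what makes the in-place replacement possible without breaking the rest of the executable.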

Now let’s run the malware with the correct parameter

I launch the Lab07-03.exe as follows

We now see that Lab07-03.dll has a lot of exports (duplicated from kernel32.dll). These exports are in fact forwarded exports that point to kernel32.dll

Once the malware has run, a patched program will behave as if it were still calling the original kernel32.dll

Conclusion

Using several techniques (static analysis, dynamic analysis, reverse engineering), we have been able to understand the key mechanisms of this malware:

  • it scans every .exe file on the C: drive and replaces the imported name kernel32.dll with kerne132.dll
  • it installs the content of Lab07-03.dll as kerne132.dll, which forwards the kernel32.dll exports so that programs keep working, while it launches a backdoor and executes commands from a remote C&C

So overall, this can be a pretty nasty piece of malware, as its operations are covert. Each time a patched .exe is executed, it can launch the backdoor and perform malicious operations on your machine!