Hello everyone, and welcome to a new series on binary exploitation. This is the first part, and many more parts will follow. In this series I will start with assembly basics and the concepts required for basic binary exploitation, explained in layman's terms. So, if you are new to assembly, binary exploitation, or buffer overflows, you are very welcome here, because all the basics required for binary exploitation are explained in detail in this series. I am putting a lot of thought into making it as easy as possible while covering the most important basic concepts you need to learn assembly and get started with binary exploitation. If you are interested in going directly to binary exploitation, here is Part II: Linux 32-bit binary exploitation.
This series covers 32-bit assembly basics, concepts, binary exploitation, and buffer overflow (return-to-libc) exploitation.
Contents:
1. What is Assembly
2. Why Assembly
3. Decimal
4. Binary
- Binary to Decimal Conversion
- Decimal to Binary Conversion
5. Hexadecimal
- Hexadecimal to Decimal Conversion
- Decimal to Hexadecimal Conversion
6. Segment & Offset
7. Data Types in Assembly
8. Registers
a. General Purpose Registers
b. Segment Registers
c. Stack Registers
d. Special Purpose Registers
9. Structure of Assembly Program
10. Linux System Calls
11. Executing an Assembly Program
12. Writing a Hello World Program in Assembly language
Before jumping directly into binary exploitation, some basics are important, so let's hop on to them first.
Binary Exploitation?
To follow along, you need to understand the basics of assembly, registers, binary, and hexadecimal. I will explain the registers and assembly basics that are required for this tutorial. The binary I am going to exploit in this series is an intentionally vulnerable binary, vulnerable to a buffer overflow (return-to-libc) attack.
What is Assembly?
Assembly is a low-level programming language. Programs written in assembly are translated into machine code by an assembler. Each assembly language is designed for one specific computer architecture, and different assemblers for the same architecture can use different syntax.
Why Assembly?
1. If something crashes on Windows/Linux, the error you get usually includes the location/action that caused it. If you are to solve that error, knowing assembly is the most direct way to troubleshoot low-level memory problems.
2. If you need precise control over what your program is doing, a high-level language may not give you full control over the instructions that actually run.
3. Even the most optimized high-level language compiler is still a general-purpose compiler, so the code it produces is general code. For a specific task, carefully hand-optimized assembly can run faster than compiler output, although modern compilers are hard to beat.
4. The main reason would be that the programming languages you already know, like Python, Java, and C++, give you limited functions and features, but in assembly you are limited only by the hardware you own. You can play around with memory and CPU instructions to a great extent, which is pretty much fun.
Topics to Know Before Getting into Assembly
1) Decimal:
The decimal system is a base 10 system, meaning that it uses 10 digits (0 to 9) to make all the numbers.
Example: Let’s take 275
            | Hundreds | Tens   | Units  |
Digit       | 2        | 7      | 5      |
Explanation | 2x10^2   | 7x10^1 | 5x10^0 |
Value       | 200      | 70     | 5      |
So, the output is 200+70+5 = 275.
Let’s take another example: 3456
            | Thousands | Hundreds | Tens   | Units  |
Digit       | 3         | 4        | 5      | 6      |
Explanation | 3x10^3    | 4x10^2   | 5x10^1 | 6x10^0 |
Value       | 3000      | 400      | 50     | 6      |
So, the output is 3000+400+50+6 = 3456
Well, that’s how the decimal system works. I guess you have no doubts about this, so let’s move on to the next one.
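If you like seeing things as code, here is a minimal Python sketch of the same place-value idea. The function name place_values is just something I made up for this illustration.

```python
def place_values(number):
    """Return the value contributed by each decimal digit, leftmost first."""
    digits = [int(ch) for ch in str(number)]
    highest_power = len(digits) - 1
    # Each digit is multiplied by 10 raised to the power of its position.
    return [digit * 10 ** (highest_power - i) for i, digit in enumerate(digits)]


parts = place_values(3456)
print(parts, "->", sum(parts))
```

Adding the parts back together always gives the original number, which is exactly what the tables above show.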
2) Binary: The binary system is a base 2 system. It consists of only 2 values, 0 and 1. Computers understand only binary, so you should understand how a binary value is converted.
Binary to Decimal Conversion:
Multiply each binary digit by 2 raised to the power of its position, counting positions from 0 at the rightmost digit, then add up the results.
Let’s take the binary value 11001 and convert it to decimal:
Bit | Power of 2 | Total |
1 x | 2^4        | 16    |
1 x | 2^3        | 8     |
0 x | 2^2        | 0     |
0 x | 2^1        | 0     |
1 x | 2^0        | 1     |
Now add the totals: 16+8+0+0+1 = 25.
25 is the decimal value of the binary number 11001. That’s how you do it. It might look complicated at first glance, but if you try it once, you will get it in an instant.
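The same multiply-and-add steps can be sketched in a few lines of Python. This is only an illustration (binary_to_decimal is a name I picked); Python's built-in int("11001", 2) does the same job.

```python
def binary_to_decimal(bits):
    """Convert a string of 0s and 1s to its decimal value."""
    total = 0
    # Walk from the rightmost bit, so position 0 is the units place.
    for position, bit in enumerate(reversed(bits)):
        total += int(bit) * 2 ** position
    return total


print(binary_to_decimal("11001"))  # 16 + 8 + 0 + 0 + 1
```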
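Decimal to Binary Conversion: This is much easier than converting binary to decimal. Keep dividing the number by 2, note the remainder of each division, and then read the remainders from the last one to the first.
Let’s take the number 275
275/2 = 137, remainder 1
137/2 = 68, remainder 1
68/2 = 34, remainder 0
34/2 = 17, remainder 0
17/2 = 8, remainder 1
8/2 = 4, remainder 0
4/2 = 2, remainder 0
2/2 = 1, remainder 0
1/2 = 0, remainder 1
Reading the remainders from bottom to top, the binary value of decimal 275 is 100010011.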
Points to Note:
· Divide the number by 2; if it divides evenly the remainder is 0, or else 1
· Repeat with the result of each division until that result is 0
· Usually 1 represents TRUE, and 0 FALSE
· 001001100 is equal to 1001100; the zeros at the start of the value represent nothing, so you can leave them out XD
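Here is a small Python sketch of the repeated-division method described above. The function name decimal_to_binary is my own; treat it as an illustration and not the only way to do it.

```python
def decimal_to_binary(number):
    """Convert a non-negative integer to a binary string by repeated division."""
    if number == 0:
        return "0"
    remainders = []
    while number > 0:
        # divmod gives the quotient and the remainder in one step.
        number, remainder = divmod(number, 2)
        remainders.append(str(remainder))
    # The first remainder is the rightmost bit, so read them in reverse.
    return "".join(reversed(remainders))


print(decimal_to_binary(275))
```

Leading zeros really do not matter: int("001001100", 2) and int("1001100", 2) give the same number.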
3) Hexadecimal: Hexadecimal is a base 16 system. Sizes in memory are powers of 2, for example 8 bits, 16 bits, 32 bits, 64 bits, and onwards through 128, 256, 512 and so on. Since 16 is itself a power of 2 (2^4), each hexadecimal digit stands for exactly 4 bits and two hexadecimal digits describe one byte, which makes hexadecimal perfect for computers. Also, Hex is nothing but the short form of hexadecimal, so don’t think they are different.
You need to remember these before getting into hexadecimal conversion:
Hex | Decimal | Binary |
0   | 0       | 0      |
1   | 1       | 1      |
2   | 2       | 10     |
3   | 3       | 11     |
4   | 4       | 100    |
5   | 5       | 101    |
6   | 6       | 110    |
7   | 7       | 111    |
8   | 8       | 1000   |
9   | 9       | 1001   |
A   | 10      | 1010   |
B   | 11      | 1011   |
C   | 12      | 1100   |
D   | 13      | 1101   |
E   | 14      | 1110   |
F   | 15      | 1111   |
Hexadecimal to Decimal Conversion:
Let’s take the hexadecimal value “D80” as an example and convert it into a decimal value.
Hex digit | 16^position | Decimal value x 16^position | Total |
D x       | 16^2        | 13 x 256                    | 3328  |
8 x       | 16^1        | 8 x 16                      | 128   |
0 x       | 16^0        | 0 x 1                       | 0     |
So, the total is 3328+128+0 = 3456. So, D80 is the hexadecimal value of the decimal number 3456.
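The table above translates almost directly into Python. This is a minimal sketch; HEX_DIGITS and hex_to_decimal are names I chose for the example, and the built-in int("D80", 16) gives the same result.

```python
HEX_DIGITS = "0123456789ABCDEF"


def hex_to_decimal(text):
    """Convert a hexadecimal string such as 'D80' to its decimal value."""
    total = 0
    for position, digit in enumerate(reversed(text.upper())):
        # The index of the digit in HEX_DIGITS is its decimal value (D -> 13).
        total += HEX_DIGITS.index(digit) * 16 ** position
    return total


print(hex_to_decimal("D80"))  # 13*256 + 8*16 + 0*1
```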
Decimal to Hexadecimal Conversion:
Let’s take the decimal value 3456 and convert it to hexadecimal. Divide repeatedly by 16 and note each remainder. The remainders come out last digit first, so you should always read them in reverse order.
3456/16 = 216
216*16 = 3456
3456-3456 = 0. So, the remainder is 0, and the hexadecimal value for decimal 0 is 0
216/16 = 13
13*16 = 208
216-208 = 8. So, the remainder is 8, and the hexadecimal value for decimal 8 is 8
13/16 = 0
0*16 = 0
13-0 = 13. So, the remainder is 13, and the hexadecimal value for decimal 13 is D
Reading the remainders in reverse order gives D, 8, 0. Finally, D80 is the hexadecimal value of the decimal value 3456. Hope you understood this; if not, you can drop a comment below.
Note:
- Hex = Hexadecimal
- In DOS/Windows assemblers, hex is mostly written with an h suffix, as 0D80h
- In Unix environments, hex is written with a 0x prefix, as 0xD80
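The divide-by-16 steps can be sketched in Python like this. The names decimal_to_hex and HEX_DIGITS are mine, used only for illustration; Python's built-in hex() produces the 0x-prefixed form directly.

```python
HEX_DIGITS = "0123456789ABCDEF"


def decimal_to_hex(number):
    """Convert a non-negative integer to a hexadecimal string."""
    if number == 0:
        return "0"
    digits = []
    while number > 0:
        # The remainder of each division by 16 is the next hex digit.
        number, remainder = divmod(number, 16)
        digits.append(HEX_DIGITS[remainder])
    # Remainders come out last digit first, so reverse them.
    return "".join(reversed(digits))


print(decimal_to_hex(3456))
print(hex(3456))  # built-in, uses the 0x prefix and lowercase digits
```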
Segment & Offset:
Everything on your computer is connected through a series of wires called the BUS. On early processors, the addresses sent over the BUS to the RAM were 16 bits wide. So, when the processor needed to write to the RAM, it did so by sending the 16-bit location through the bus. In the old days this meant that computers could only address 65536 bytes (64 KB) of memory, because the largest 16-bit value is 1111111111111111 = 65535.
That was plenty back then, but today that's not quite enough. So, designers came up with a way to send 20 bits over the bus, thus allowing for a total of 1 MB of memory.
Memory is divided into blocks of bytes called Segments, and each byte can be accessed by specifying its Offset within a segment. So, whenever the processor wants to access data, it uses the Segment number together with the Offset number.
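To make segment and offset a bit more concrete, here is a small Python sketch. It assumes the original 8086 real-mode rule, where the 20-bit address is segment x 16 + offset; that formula is not spelled out in the text above, and the function name physical_address is my own, so take this as an illustration only.

```python
def physical_address(segment, offset):
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit address.

    Assumes 8086 real-mode addressing: segment * 16 + offset.
    """
    # Shifting left by 4 bits is the same as multiplying by 16.
    address = (segment << 4) + offset
    # Only 20 address bits exist, so anything above 1 MB wraps around.
    return address & 0xFFFFF


print(hex(physical_address(0x1234, 0x0010)))
```

Notice that the highest address you can reach this way is 0xFFFFF, which is the 1 MB limit mentioned above.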
Before you get into assembly programming, you need to understand the data types and registers in assembly. Registers are the most important things in assembly: every calculation and every memory access goes through them. For that reason I will explain the data types in assembly and list the types of registers with a very brief explanation.
Let’s get into assembly basics now.
Bits are the smallest unit of data on a computer. Each bit can only represent 2 values, 1 and 0. Bits are not much use on their own because they're so damn small, so we got the nibble. A nibble is a collection of 4 bits. The most important unit of data used by your computer is a Byte. A byte is the smallest unit that can be addressed by your processor. It is made up of 8 bits, or 2 nibbles. A word is simply 2 bytes, or 16 bits. Originally a Word was the size of the BUS from the CPU to the RAM. Today most computers have at least a 32-bit bus, but most people were used to 1 word = 16 bits, so they decided to keep it that way.
Data Types in Assembly:
Byte                | 8 Bits             |
Word                | 16 Bits (2 Bytes)  |
Double Word (Dword) | 32 Bits (2 Words)  |
Quad Word (Qword)   | 64 Bits (2 Dwords) |
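A quick way to get a feel for these sizes is to look at the largest unsigned value each one can hold. Here is a minimal Python sketch; the dictionary DATA_SIZES and the helper max_unsigned are names I invented for the example.

```python
# Sizes in bits, taken from the table above.
DATA_SIZES = {"byte": 8, "word": 16, "dword": 32, "qword": 64}


def max_unsigned(bits):
    """Largest unsigned value that fits in the given number of bits."""
    return (1 << bits) - 1


for name, bits in DATA_SIZES.items():
    print(f"{name:5} {bits:2} bits, max unsigned value {max_unsigned(bits)}")
```

Every extra bit doubles the number of values, which is why a word can hold up to 65535 while a byte stops at 255.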
Registers in Assembly:
A processor contains small storage areas called registers. They are too small to store files; instead, they are used to hold information while the program is running.
Registers can be divided into the following categories:
1) General Purpose Registers:
On the original 16-bit x86 processors, all general-purpose registers are 16 bits wide and can be broken up into two 8-bit registers. For example, AX can be broken up into AL (the low byte) and AH (the high byte).
· AX - Accumulator
- Made up of: AH, AL
- Common uses: Math operations, I/O operations, INT 21h (DOS) calls
· BX - Base
- Made up of: BH, BL
- Common uses: Base or Pointer
· CX - Counter
- Made up of: CH, CL
- Common uses: Loops and Repeats
· DX - Data
- Made up of: DH, DL
- Common uses: Various data, character output
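When the 32-bit x86 processors came out (starting with the 80386), these four registers were extended to 32 bits: EAX, EBX, ECX, and EDX. The E stands for Extended, and that's just what they are: 32-bit extensions of the originals, with AX, BX, CX, and DX still accessible as their low 16 bits.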
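To see how the smaller registers sit inside the bigger one, here is a Python sketch that pulls AX, AH, and AL out of a 32-bit EAX value using bit masks. It only models the layout; split_register is a made-up helper name, not something the CPU or any library provides.

```python
def split_register(eax):
    """Return the AX, AH and AL parts of a 32-bit EAX value."""
    ax = eax & 0xFFFF        # low 16 bits of EAX
    ah = (ax >> 8) & 0xFF    # high byte of AX
    al = ax & 0xFF           # low byte of AX
    return {"AX": ax, "AH": ah, "AL": al}


parts = split_register(0x12345678)
print({name: hex(value) for name, value in parts.items()})
```

The same layout applies to EBX/BX/BH/BL, ECX/CX/CH/CL, and EDX/DX/DH/DL.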
2) Segment Registers:
CS - Code Segment. Points to the memory block that stores code
DS - Data Segment. Points to the memory block that stores data
ES - Extra Segment. Commonly used for video stuff
SS - Stack Segment. Points to the memory block that holds the stack, where the processor stores return addresses from routines
3) Stack Registers:
- BP - Base Pointer. Used in conjunction with SP for stack operations
- SP - Stack Pointer. Points to the top of the stack
4) Special Purpose Registers:
IP - Instruction Pointer. Holds the offset of the next instruction to be executed
Flags - These are a bit different from all other registers. Each flag is only 1 bit in size inside the flags register. It's either 1 (true), or 0 (false). There are several flags including the Carry flag, Overflow flag, Parity flag, Direction flag, and more. You usually don't assign values to these manually. The value is set automatically depending on the result of the previous instruction.
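To show what "set automatically" means, here is a rough Python model of an 8-bit addition that also works out the Carry and Parity flags. It is a simplified sketch and not how the CPU is implemented; the function name add8 and the returned dictionary are my own assumptions, and the inputs are assumed to be in the range 0 to 255.

```python
def add8(a, b):
    """Add two 8-bit values and report the result plus two flags."""
    total = a + b
    result = total & 0xFF  # keep only the low 8 bits
    # Carry flag: set when the true sum does not fit in 8 bits.
    carry = 1 if total > 0xFF else 0
    # Parity flag: set when the result has an even number of 1 bits.
    parity = 1 if bin(result).count("1") % 2 == 0 else 0
    return result, {"CF": carry, "PF": parity}


print(add8(0xF0, 0x20))
print(add8(1, 2))
```

The program never writes to the flags itself; they simply fall out of the arithmetic, just like on the real processor.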
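Memory Segments in Assembly:
data & bss = store variables (.data holds initialized variables, .bss holds uninitialized ones)
heap = location in memory where a program can store and manipulate data dynamically at runtime
stack = holds local variables and return addresses; it is managed by compiler-generated code and, in the usual diagrams of a Linux process, sits at the high-address end of memory and grows toward lower addresses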
Structure of Assembly Program:
Data Types in .DATA segment
.byte = 1 byte
.ascii = string (not null terminated)
.asciz = null-terminated string
.int = 32-bit integer
.short = 16-bit integer
.float = single-precision floating-point number
.double = double-precision floating-point number
Data types in .BSS Segment
.comm - declares a common memory area
.lcomm - declares a local common memory area
Space in .BSS is created at runtime; whatever you define here does not occupy any space inside the executable created by the assembler and linker.
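If you want to double-check how many bytes each .DATA directive takes, Python's struct module can stand in as a rough calculator. The mapping from directive names to struct format codes below is my own assumption for a 32-bit x86 target and is only meant as an illustration.

```python
import struct

# Hypothetical mapping from assembler directives to struct format codes.
# "<" selects little-endian layout with standard sizes.
DIRECTIVE_FORMATS = {
    ".byte": "<b",
    ".short": "<h",
    ".int": "<i",
    ".float": "<f",
    ".double": "<d",
}


def directive_size(directive):
    """Number of bytes one value of the given directive occupies."""
    return struct.calcsize(DIRECTIVE_FORMATS[directive])


# .ascii stores just the characters; .asciz adds a terminating null byte.
ascii_bytes = "Hello".encode("ascii")
asciz_bytes = ascii_bytes + b"\x00"

for name in DIRECTIVE_FORMATS:
    print(name, directive_size(name), "byte(s)")
print(len(ascii_bytes), len(asciz_bytes))
```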
Linux System Calls:
The next important concept required to understand 32-bit assembly in Linux is Linux system calls. These are how a user program requests a service from the kernel, such as reading input or producing output.
Before you start programming in assembly, you need to understand how Linux system calls work, as we will be using them a lot. We can use these system calls to run programs and perform operations such as reading and writing files. In high-level languages, system calls are usually made through library functions, which send the request to the kernel and return the result; in assembly we invoke them directly. System calls are helpful in buffer overflow exploitation as well.
Examples: exit(), read(), write() etc.
Arguments to syscall: whenever you invoke a Linux system call, you need to load the appropriate registers with the arguments that the system call requires.
EAX - System call number
EBX - First argument
ECX - Second argument
EDX - Third argument
ESI - Fourth argument
EDI - Fifth argument
For calls that require more than 5 arguments, we pass a pointer to a structure containing the arguments.
System calls are invoked by processes using a software interrupt, INT 0x80. When the interrupt is raised, the kernel calls the system call interrupt handler, which takes all the arguments and does the required work based on the system call number.
Assembly Program to Execute System call:
Defining a system call:
exit(0) is the system call used to exit a program. Making this call takes three steps:
1. The syscall number for exit() is 1, so load EAX with 1. The mov instruction loads the value 1 into the EAX register (%eax)
2. The "status" argument is, let's say, "0", so EBX must be loaded with 0
3. Call the syscall by raising the interrupt 0x80
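The register convention is easy to model in Python. The sketch below does not make a real system call; it only builds a dictionary showing which value would go into which register before INT 0x80 is raised. The helper load_registers and the constant names are hypothetical, made up for this illustration.

```python
# Registers that carry the first five arguments, in order.
ARGUMENT_REGISTERS = ["ebx", "ecx", "edx", "esi", "edi"]

SYS_EXIT = 1   # syscall number for exit() on 32-bit Linux
SYS_WRITE = 4  # syscall number for write() on 32-bit Linux


def load_registers(syscall_number, *args):
    """Model the register values for a 32-bit Linux system call."""
    if len(args) > len(ARGUMENT_REGISTERS):
        raise ValueError("more than 5 arguments: pass a pointer to a structure")
    registers = {"eax": syscall_number}
    # Pair each argument with its register: first -> ebx, second -> ecx, ...
    registers.update(zip(ARGUMENT_REGISTERS, args))
    return registers


print(load_registers(SYS_EXIT, 0))
```

For exit(0) only EAX and EBX matter, which matches the three steps listed above.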
Executing an Assembly Program:
1) On Unix-like systems, an assembly program is in most cases saved with the extension ".s". So, always save your assembly program with an extension of .s
2) Create an object file by assembling the program with the GNU assembler (as)
3) Use the linker (ld) to turn the object file into an executable
Writing a Hello World Program in Assembly language:
Writing a hello world program in assembly is not as easy as in other programming languages. You need a good understanding of the write and exit system calls. So, let me explain how to write a Hello World program in assembly.
Step 1: use the write() syscall to print the "Hello world" message
Step 2: use exit() to exit the program
So, how do you write some data in assembly? You need to understand the arguments used by the write function and how they are passed to the syscall.
write() takes 3 arguments:
· File descriptor: where the data needs to be written
· Buffer: pointer to the memory where the data to be written is stored
· Count: number of bytes to be written, starting from the beginning of the buffer
So, how do we achieve this?
There are file descriptor numbers for all the standard streams. There are 3 standard streams in total: standard input, standard output, and standard error. In the same way, the syscall number for write() is 4.
Here are the commonly used file descriptor numbers
1) stdin, file descriptor 0
2) stdout, file descriptor 1
3) stderr, file descriptor 2
Explanation
As explained above, write() needs a syscall number, file descriptor, buffer, and count. All these 4 should be passed for successful execution.
1) We need to call the write syscall to write something. The syscall number for write() is 4, so store 4 in EAX
2) As we want to output the data to the screen, we need to use STDOUT. The file descriptor for STDOUT is 1, so store 1 in EBX
3) The data to be written is the buffer, so Buf = pointer to a memory location containing the "Hello World" string. Store the address of "Hello World" in ECX
4) The size of the string should be given as the count, so pass 11, which is the size of "Hello World" (including the space), in EDX
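You can try the same three arguments from Python before touching assembly. os.write() is a thin wrapper around the write system call: it takes a file descriptor and a buffer, and the count is simply the length of the buffer. This is only a sketch to show the idea; write_message, STDOUT_FD, and MESSAGE are names I made up.

```python
import os

STDOUT_FD = 1             # file descriptor for standard output
MESSAGE = b"Hello World"  # the buffer; its length (11) is the count


def write_message(fd, message):
    """Ask the kernel to write the buffer to fd; return the bytes written."""
    return os.write(fd, message)


written = write_message(STDOUT_FD, MESSAGE)
```

The return value is the number of bytes the kernel wrote, which for this short message is the same 11 that goes into EDX in the assembly version.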
Hello World Program in Assembly:
Executing:
That's it for this post. As we are done with at least the basics, I think you can get at least a vague idea of what is going on in the debugger if you have read this whole article. In the next post of this series, I will explain Linux 32-bit binary exploitation with an example. So, stay tuned, and if you have any feedback, please comment below.
================== HACKING DREAM ===================
Main Principle of My Hacking Dream is to Promote Hacking Tricks and Tips to All the People in the World, So That Everyone will be Aware of Hacking and protect themselves from Getting Hacked. Hacking Don’t Need Agreements.
I Will Be Very Happy To Help You, So For Queries or Any Problem Comment Below or You Can Send out a Mail At Bhanu@HackingDream.net