Introduction to Smali
This course is meant to be an introduction to Smali code. Smali is essentially a human-readable form of the bytecode that Android runs (Dalvik bytecode), which is what Java code is compiled into. A single Java statement corresponds to a sequence of several Smali instructions. Think of Smali as an assembly language for Java. Smali and assembly are very similar: both are low-level languages, close to the machine. Java is a high-level language that is written by humans and is easy to read and write. Smali, on the other hand, is a machine-level language.
To fully understand Smali code, it is recommended to learn Java, or at least the basics of Java. Smali code is a representation of Java code at the machine level. There are many online tutorials about Java. Please consider studying the basics on your own if modding Android apps is your objective.
Let's take an example. The famous first example in any programming language is "Hello World". The simplest Java code to print the string "Hello World" is:
System.out.println("Hello World");
In Smali, it will be something like this:
# Get the out field in the System class and save it in v0
sget-object v0, Ljava/lang/System;->out:Ljava/io/PrintStream;
# Save "Hello World" in v1
const-string v1, "Hello World"
# Call the virtual method println, passing v0 and v1 as parameters
# equivalent to v0.println(v1)
invoke-virtual {v0, v1}, Ljava/io/PrintStream;->println(Ljava/lang/String;)V
As you can see, one simple Java statement is decomposed into several more verbose Smali instructions. You may ask why it is so complicated. The simple answer is that Java is designed for people to read and write, while Smali is for the virtual machine.
Talking to a machine is a complex task, which is why Smali is more complex than Java.
Smali file format
In Java, you can define multiple classes in one Java file (xxx.java). In Smali, each file contains only one class. The general format is:
.class modifier class_name
.super class_name_of_the_parent_class
.source source_file_name
{implemented interfaces}
{annotation list}
{field list}
{method list}
- The modifiers are public, private, protected, static, final, etc., which are similar to those in Java. In addition, interface and enum indicate that the class is an interface or an enumeration class;
- The class name in Smali is written as Lpackage_name/class_name; with slashes instead of dots. For example, the TextView class in Android belongs to the package android.widget, so in Smali the class is written as Landroid/widget/TextView;
- The source file name is the name of the Java file the class was compiled from, such as Main.java. It is used mainly for debugging;
- For interfaces the syntax is: .implements interface_class_name. A class can have zero or more of these lines, each indicating that the class implements that interface;
- Annotations are the @XXX markers in Java code, such as the common @Override, @Nullable and @NonNull. The Smali syntax is: .annotation xxxxxxx xxxxxx .end annotation. It doesn’t matter if you don’t know the purpose of annotations, because you rarely need to touch them. You only need to recognize them as annotations when you see them in Smali.
- A field is the definition of a variable, and a method is what other programming languages like Pascal or C call a function (this is why I call a method a function in my videos). No further explanation is needed here. If you have trouble understanding the role of a field or a method, please refer to Java material, as these are Java-related topics.
Type
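In Java, types are divided into basic (primitive) types and reference types. There are 9 basic types, counting void. The corresponding symbols in Smali are:
V  void
Z  boolean
B  byte
S  short
C  char
I  int
J  long
F  float
D  double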
For reference types, as mentioned before, the type is written in Smali as Lpackage_name/class_name; as in Landroid/widget/TextView; where TextView is the class and android/widget is its package. Java developers can create their own types, or replace basic types with their own, to make reverse engineering the app harder.
In Smali, adding [ in front of a type indicates an array. For example, [I represents int[] (a one-dimensional array of integers) and [Ljava/lang/String; represents String[] (a one-dimensional array of strings). For multi-dimensional arrays, add one extra [ per additional dimension. For example, [[I represents the two-dimensional array int[][].
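The type rules above can be sketched as a small Java helper. This is only an illustration: the class SmaliTypes and the method toDescriptor are hypothetical names, not part of any Smali tool, and the sketch assumes the input is a fully qualified Java type name.

```java
import java.util.Map;

// Hypothetical helper (not part of any Smali tool): converts a Java type name
// such as "int", "java.lang.String" or "int[][]" into its Smali descriptor.
class SmaliTypes {
    // The nine basic types and their one-letter Smali symbols.
    private static final Map<String, String> PRIMITIVES = Map.of(
        "void", "V", "boolean", "Z", "byte", "B", "short", "S", "char", "C",
        "int", "I", "long", "J", "float", "F", "double", "D");

    static String toDescriptor(String javaType) {
        String type = javaType.trim();
        StringBuilder prefix = new StringBuilder();
        // Each trailing "[]" becomes one leading "[".
        while (type.endsWith("[]")) {
            prefix.append('[');
            type = type.substring(0, type.length() - 2).trim();
        }
        String primitive = PRIMITIVES.get(type);
        if (primitive != null) {
            return prefix + primitive;
        }
        // Reference type: "L", the name with '/' instead of '.', then ";".
        return prefix + "L" + type.replace('.', '/') + ";";
    }

    public static void main(String[] args) {
        System.out.println(toDescriptor("int"));                     // I
        System.out.println(toDescriptor("java.lang.String[]"));      // [Ljava/lang/String;
        System.out.println(toDescriptor("int[][]"));                 // [[I
        System.out.println(toDescriptor("android.widget.TextView")); // Landroid/widget/TextView;
    }
}
```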
Method
The syntax of a method definition in Smali is:
.method modifier method_name(parameter_types)return_type
method code...
.end method
A method can have no inputs (parameters) or many inputs, but it has exactly one output. The output of the method is defined by its return type. If the method doesn't produce any output, it returns void. Notice that in Smali, the input and output types must always be spelled out. When a method takes multiple parameters, their types are written one after another with no separator. For example, method(int, int, String) is represented as method(IILjava/lang/String;). Methods are the most important part of modding. Pay particular attention to the inputs and their types, as well as the type of the output.
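One way to check a signature like the one above is to let Java build it. The sketch below relies on the standard java.lang.invoke.MethodType class, whose descriptor strings use the same notation; the class MethodDescriptorDemo and the helper signature are hypothetical names used only for illustration.

```java
import java.lang.invoke.MethodType;

class MethodDescriptorDemo {
    // Hypothetical helper: returns method_name(parameter_types)return_type.
    static String signature(String name, Class<?> returnType, Class<?>... params) {
        return name + MethodType.methodType(returnType, params).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // method(int, int, String) returning void
        System.out.println(signature("method", void.class, int.class, int.class, String.class));
        // prints method(IILjava/lang/String;)V

        // A method with no parameters still needs the empty parentheses.
        System.out.println(signature("run", void.class)); // prints run()V
    }
}
```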
Calling a method
The main purpose of a method is to break the code into smaller blocks, which makes the code easier to follow. Another advantage is that the code in a method can be reused many times. To use the code in a method, the method is called by its name. In Smali, the name alone is not enough. Smali must specify the method to be called in a very detailed form, including the class_name, method_name, parameter_type and return_type. The specific form is:
class_name->method_name(parameter_type)return_type
Let's take an example.
System.out.println("Hello world");
Here out is a static field of System, its type is PrintStream, and println is a method of PrintStream.
So in Smali, the final call to the println method is:
invoke-virtual {v0, v1}, Ljava/io/PrintStream;->println(Ljava/lang/String;)V
As we learned, println is the method_name and Ljava/io/PrintStream; is the class_name. The method takes one input of type String and produces no output, so it returns void (V).
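To make the four parts easier to see, here is a minimal Java sketch that splits a method reference into its pieces. The class MethodRef and its parse method are hypothetical names for this illustration, and the sketch assumes a well-formed reference in the form shown above.

```java
// Hypothetical helper: splits class_name->method_name(parameter_type)return_type.
class MethodRef {
    final String className;
    final String methodName;
    final String parameterTypes;
    final String returnType;

    MethodRef(String className, String methodName, String parameterTypes, String returnType) {
        this.className = className;
        this.methodName = methodName;
        this.parameterTypes = parameterTypes;
        this.returnType = returnType;
    }

    static MethodRef parse(String ref) {
        int arrow = ref.indexOf("->");
        int open = ref.indexOf('(', arrow);
        int close = ref.indexOf(')', open);
        if (arrow < 0 || open < 0 || close < 0) {
            throw new IllegalArgumentException("Not a method reference: " + ref);
        }
        return new MethodRef(
            ref.substring(0, arrow),        // class_name
            ref.substring(arrow + 2, open), // method_name
            ref.substring(open + 1, close), // parameter_type
            ref.substring(close + 1));      // return_type
    }

    public static void main(String[] args) {
        MethodRef r = parse("Ljava/io/PrintStream;->println(Ljava/lang/String;)V");
        System.out.println(r.className);      // Ljava/io/PrintStream;
        System.out.println(r.methodName);     // println
        System.out.println(r.parameterTypes); // Ljava/lang/String;
        System.out.println(r.returnType);     // V
    }
}
```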
Field
A field is described in Smali as follows:
.field modifier field_name:field_type {= field_value}
For example, suppose the text field is defined in Java as:
public String text;
the corresponding Smali code is:
.field public text:Ljava/lang/String;
When a field is static and final (that is, a static constant) and its type is a basic type, you can directly assign a value to it:
.field public static final ID:I = 0x7f0a0001
If the field has annotations, the syntax is:
.field XXXXX
{Annotation list}
.end field
Reference field
As with calling a method, when referencing a field you also need to specify the field in a very detailed form. The specific form is: class_name->field_name:field_type
For example:
System.out.println("Hello world") needs to load the field out of the System class into the register v0 before calling the println method, like this:
sget-object v0, Ljava/lang/System;->out:Ljava/io/PrintStream;
Registers
You have probably noticed that Smali code contains many identifiers such as v0, v1, v2, p0, p1 and p2. These all represent registers.
So what is a register? Think of a register as a variable, or a place to store data temporarily.
As in assembly language, instructions do not operate on values directly. They read their inputs from registers and write their results to registers.
For example:
Suppose there is a static method abc(String). If you want to call this method in Java, you simply type abc("Hello");. At the machine level (Smali in this case), you cannot pass the string parameter to the method directly. Instead, you need a register (such as v0). First store the string "Hello" in v0, and then call the abc method with v0 as its parameter. As you can see, v0 acts as temporary storage for the word "Hello".
# Define a string constant "Hello" into v0
const-string v0, "Hello"
# Call the abc method, passing the parameter stored in v0
invoke-static {v0}, LXX;->abc(Ljava/lang/String;)V
I hope everyone now understands what a register is.
In addition, the register numbers v0, v1 and v2 are not chosen at random. You need to write .registers N at the beginning of the method to specify the number of registers before you can use the registers v0 to v(N-1).
Parameters
The registers described above are all ordinary registers vN. In addition, Smali defines parameter registers pN, which hold the values of the parameters passed into the method.
If a method has n registers and m parameters, then n must be greater than or equal to m, and the last m of the n registers are the parameter registers. For example:
Take a static method abc(int, int, int), which has 3 parameters in total. The number of registers N, defined by .registers N, cannot be less than 3. In this example N is equal to 5, so the method has the ordinary registers v0 and v1 and the parameter registers p0, p1 and p2.
When the method abc(11, 22, 33) is called, the value in p0 is initialized to 11, the value in p1 is initialized to 22 and the value in p2 is initialized to 33. v0 and v1 are not initialized.
If the number of registers changes to 6 (.registers 6), the registers become as described in the following table:
Registers      Ordinary registers   Parameter registers
.registers 5   v0, v1               p0 = v2, p1 = v3, p2 = v4
.registers 6   v0, v1, v2           p0 = v3, p1 = v4, p2 = v5
Think about it: if you didn't use parameter registers and used only ordinary registers vN in the code, how much code would you have to change after changing the number of registers from 5 to 6? Using the parameter registers makes the change very easy.
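The relationship between vN and pN can be sketched in a few lines of Java. RegisterLayout and layout are hypothetical names for this illustration, and the sketch assumes every parameter occupies exactly one register, which is the case for the int parameters of abc (long and double parameters take two registers each and are not handled here).

```java
import java.util.ArrayList;
import java.util.List;

class RegisterLayout {
    // Hypothetical helper: lists every register of a method and marks the
    // ones that double as parameter registers.
    // Assumption: each parameter occupies exactly one register.
    static List<String> layout(int registers, int parameters) {
        if (parameters > registers) {
            throw new IllegalArgumentException(".registers must be at least the parameter count");
        }
        int locals = registers - parameters; // ordinary registers come first
        List<String> names = new ArrayList<>();
        for (int i = 0; i < registers; i++) {
            if (i < locals) {
                names.add("v" + i);
            } else {
                names.add("v" + i + " (p" + (i - locals) + ")");
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(layout(5, 3)); // [v0, v1, v2 (p0), v3 (p1), v4 (p2)]
        System.out.println(layout(6, 3)); // [v0, v1, v2, v3 (p0), v4 (p1), v5 (p2)]
    }
}
```

Code that refers to p0, p1 and p2 keeps working in both layouts, while code that refers to v2, v3 and v4 would have to be renumbered.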
Hidden parameter
For non-static methods, the number of parameter registers is one more than the number of actual parameters: p0 always holds the current class instance (this in Java), and the regular parameters start from p1. We can verify this with the Java2Smali tool, using Java code that defines two methods, test1 and test2.
The only difference between test1 and test2 is that test1 is a static method while test2 is a non-static method.
Both methods print two parameters in sequence. In the generated Smali for test1(), the first parameter is printed using p0 and the second using p1. In test2(), the two parameters are printed using p1 and p2 respectively.
Does p0 in test2() really represent this? We can also modify the code to verify it.
I will let you analyze the difference. Please refer to any online Java course to learn more.
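A minimal sketch of what such a test class could look like is given here. The class name HiddenParameterDemo, the String parameter types and the helper parameterRegisters are assumptions made for this illustration, and the register names in the comments follow the rule described above rather than actual tool output.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

class HiddenParameterDemo {
    // Static method: a arrives in p0 and b in p1.
    static void test1(String a, String b) {
        System.out.println(a);
        System.out.println(b);
    }

    // Non-static method: p0 holds this, so a arrives in p1 and b in p2.
    void test2(String a, String b) {
        System.out.println(a);
        System.out.println(b);
    }

    // Hypothetical helper: counts the parameter registers of a method of this
    // class, adding one for the hidden this of non-static methods.
    // Assumption: one register per parameter (true for String parameters).
    static int parameterRegisters(String methodName) {
        for (Method m : HiddenParameterDemo.class.getDeclaredMethods()) {
            if (m.getName().equals(methodName)) {
                int hidden = Modifier.isStatic(m.getModifiers()) ? 0 : 1;
                return m.getParameterCount() + hidden;
            }
        }
        throw new IllegalArgumentException("No such method: " + methodName);
    }

    public static void main(String[] args) {
        test1("first", "second");
        new HiddenParameterDemo().test2("first", "second");
        System.out.println(parameterRegisters("test1")); // 2
        System.out.println(parameterRegisters("test2")); // 3
    }
}
```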
Calling a method
There are 5 instructions for calling methods in the Smali syntax, namely: invoke-virtual, invoke-direct, invoke-static, invoke-super and invoke-interface.
The syntax is: invoke-xxxxxx {parameter_list}, class_name->method_name(parameter_type)return_type
So any time you see an instruction that starts with invoke in Smali code, you can immediately tell that it calls a method. The word that follows invoke- depends on the kind of method being called. If the method is static, for example, invoke-static is used.
Calling virtual methods
Virtual methods are actually a Java concept. As you may know, Java subclasses can override the non-final methods they inherit from the parent class. Calls to these methods use the invoke-virtual instruction in order to implement polymorphism, as in the following code:
# Java code:
Object obj = "123";
obj.equals("456");
# The Smali code for calling the equals method is:
invoke-virtual {v0, v1}, Ljava/lang/Object;->equals(Ljava/lang/Object;)Z
On the surface, it calls the equals method of Object. But since obj is actually the string "123", and the equals method is overridden in the String class, the virtual machine ends up calling the equals method of String. This is why invoke-virtual is used.
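The effect of that lookup can be observed from plain Java. In the sketch below, VirtualCallDemo and sameText are hypothetical names; the call inside sameText is written against Object, yet the result depends on the real class of the object at run time.

```java
class VirtualCallDemo {
    // The call below is declared on Object, so the Smali reference names
    // Ljava/lang/Object;->equals, but the override of the real class runs.
    static boolean sameText(Object obj, Object other) {
        return obj.equals(other);
    }

    public static void main(String[] args) {
        // Two distinct String objects with the same content:
        // String.equals compares content, so this prints true.
        System.out.println(sameText("123", new String("123")));

        // Plain Objects use Object.equals, which compares identity: false.
        System.out.println(sameText(new Object(), new Object()));
    }
}
```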
Direct method
When calling a virtual method, the virtual machine first needs to find out whether the method has been overridden. For methods that cannot be overridden, this search is a waste of time, so the invoke-direct instruction is used instead to improve efficiency. It is used for private methods and constructors.
Static method
There is not much to say here: a static method belongs to the class itself rather than to an instance, so it has no hidden this parameter. Static methods are called with invoke-static.
Parent method
If a child class has overridden method XX of its parent class and you want to call the parent's version, you call it in Java through super.XX(). The corresponding instruction is invoke-super.
Interface method
This one is easy to understand. In invoke-xxxxxx {parameter_list}, class_name->method_name(parameter_type)return_type, if the class named by class_name is an interface, then xxxxxx must be interface.
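The five kinds of call can be put side by side in one small Java class. All names here (InvokeKindsDemo, Base, Child, Greeter) are hypothetical, and the instruction named in each comment is the one a typical Android toolchain would be expected to emit according to the rules above; the exact output may vary between compiler versions.

```java
class InvokeKindsDemo {
    interface Greeter {
        String greet();
    }

    static class Base {
        String name() { return "base"; }
    }

    static class Child extends Base implements Greeter {
        Child() {
            super(); // constructor call: invoke-direct
        }

        private String secret() { return "secret"; }

        static String label() { return "static"; }

        @Override
        String name() {
            return "child of " + super.name(); // parent method: invoke-super
        }

        @Override
        public String greet() { return "hello"; }

        String all() {
            Greeter g = this;
            return name()            // overridable method: invoke-virtual
                + "|" + secret()     // private method: invoke-direct
                + "|" + label()      // static method: invoke-static
                + "|" + g.greet();   // call through an interface: invoke-interface
        }
    }

    public static void main(String[] args) {
        System.out.println(new Child().all()); // child of base|secret|static|hello
    }
}
```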
Range
Each of the 5 instructions above also has a range form, namely: invoke-virtual/range, invoke-direct/range, etc.
The syntax is: invoke-xxxxxx/range {vN .. vM}, class_name->method_name(parameter_type)return_type, where N is less than M.
It is equivalent to: invoke-xxxxxx {vN, vN+1, vN+2, ..., vM-2, vM-1, vM}, class_name->method_name(parameter_type)return_type.
The regular form accepts at most five registers, so the range form is generally used for methods with many parameters. It also keeps the generated code compact.
This is the end of this introduction. For more details, please consider studying Java on your own. The Google developer website is also a great source of detailed information.
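The equivalence between the two forms can be sketched as a small expansion routine. RangeDemo and expand are hypothetical names, and the sketch assumes the range is written with ordinary vN registers and two dots between the endpoints.

```java
import java.util.ArrayList;
import java.util.List;

class RangeDemo {
    // Hypothetical helper: expands "{vN .. vM}" into the explicit register list.
    // Assumption: both endpoints are ordinary vN registers.
    static List<String> expand(String range) {
        String body = range.replace("{", "").replace("}", "").trim();
        String[] ends = body.split("\\s*\\.\\.\\s*");
        if (ends.length != 2 || !ends[0].startsWith("v") || !ends[1].startsWith("v")) {
            throw new IllegalArgumentException("Not a register range: " + range);
        }
        int first = Integer.parseInt(ends[0].substring(1));
        int last = Integer.parseInt(ends[1].substring(1));
        if (first > last) {
            throw new IllegalArgumentException("Range must be ascending: " + range);
        }
        List<String> registers = new ArrayList<>();
        for (int i = first; i <= last; i++) {
            registers.add("v" + i);
        }
        return registers;
    }

    public static void main(String[] args) {
        System.out.println(expand("{v3 .. v6}")); // [v3, v4, v5, v6]
    }
}
```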