Introduction to TSLoad
TSLoad is a set of tools that simulates computer workloads in a realistic way. Unlike benchmarks, which are intended to measure the maximum performance of a system by running it at peak throughput (and in doing so ignoring request latency, which in our opinion is important too), TSLoad is intended to simulate the workload generated by N users. It is also modular, so by itself it doesn't support any workload class at all, in the way that ab is intended to benchmark web servers or iozone filesystems. TSLoad is an engine that provides the capability to describe a workload in a flexible way; it takes on the responsibility of creating multiple threads, processing the experiment config in JSON format, and passing it to a module.
TSLoad benefits and disadvantages
Here are some advantages of TSLoad:
- Universal and modular. Our loader supports four different operating modes, which are described in Operating modes. The primary mode of TSLoad is running large-scale experiments that test virtual or cloud platforms with hybrid workloads in a non-stationary way. To do so, you create time series (time series, or TS, is the basis of the TSLoad name) of arrival rates and describe workload classes.
- On the other hand, like any compromise, implementing multiple operating modes in one engine has a cost. For example, time-series mode requires steps, and because of that TSLoad induces some "inter-step" performance effects. They can be ignored in that mode, but they are crucial in benchmark mode.
- Multiplatform. TSLoad supports Linux, Solaris, and Windows (natively, not through Cygwin) on the x86 platform. We also plan to support BSD and possibly AIX, as well as multiple processor architectures: ARM and POWER for Linux and SPARC for Solaris. In addition, we don't require any OS abstraction libraries such as GLib; we provide that layer ourselves with libtscommon, which gives us good control over the codebase.
- Written in C. Because of that, you can be sure that a module you have written behaves as expected (for example, register-register transfers in the bigmem benchmark actually use registers).
- On the other hand, C makes the code bloated. There are already 30k lines of code (think about how many bugs they contain), and during development we missed a lot of high-level features such as exceptions (whose absence caused return-value hell in the TSLoad code).
- Also, if you plan to evaluate the performance of a specific programming language or a framework built on top of one, TSLoad is not an option.
Of course, TSLoad has some disadvantages:
- Unstable and buggy. TSLoad is still in active development, so currently it is hard to guarantee code or API stability.
- However, we have some unit and integration tests that help us find basic errors.
- Complexity. As mentioned in the "Universal and modular" point above, TSLoad is fairly complex, which can induce some unwanted performance effects. It is also designed to support a multi-agent environment (see below), which adds some limitations to the code. For example, to act on an object in C, you would normally pass a pointer to it to the desired function. This doesn't work in TSLoad because the remote server (actually written in Python) doesn't know such pointers, so the high-level libtsload API accepts only the name of an object. As a result, tsexperiment and libtsload keep two separate descriptors of a workload, and each time they need one they have to run a hashmap search.
- Non-standard. Like any new programming tool, it is a new API and config format to learn, which is always painful.
Planned features:
- Monitoring capabilities, which will also be modular. This feature is not yet implemented.
- Multi-agent architecture, in which experiments are configured on a central server and run on remote platforms. This feature was inspired by the MOSBENCH set of benchmarks. It has been prototyped, but is far from release.
The Example
Let's consider the following benchmark, usually called memeat, which I wrote to demonstrate instrumenting systems with DTrace:
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>

#define NUM_TOUCHES 4

long pagesize = 8192L;
char *ptr = NULL;

int main(int argc, char* argv[]) {
    long pages_to_alloc;
    long seconds_to_sleep;
    long page = 0;
    int touch = 0;
    int pid = getpid();

    if(argc < 3)
        return -1;

    pages_to_alloc = atoi(argv[1]);
    seconds_to_sleep = atoi(argv[2]) / NUM_TOUCHES;

    printf("[%6d] Allocating %ldpgs & sleep %d times for %lds\n",
           pid, pages_to_alloc, NUM_TOUCHES, seconds_to_sleep);
    fflush(stdout);

    ptr = (char*) malloc(pages_to_alloc * pagesize);

    for(touch = 0; touch < NUM_TOUCHES; ++touch) {
        /* Touch memory (to ensure that memory is truly allocated) */
        for(page = 0; page < pages_to_alloc; ++page)
            ptr[page * pagesize] = 0;
        sleep(seconds_to_sleep);
    }

    free(ptr);
    return 0;
}
It is good for a one-time demonstration but can't be reused, because it is badly parametrized: NUM_TOUCHES is a macro, the arguments are undocumented, and pagesize is a compile-time constant (valid for SPARC, but not everywhere). It also only roughly simulates real-world arrivals, which are probabilistic (for example, Poisson arrivals, which are popular in queueing theory).
It can be rewritten as a TSLoad module (some code is omitted):
MODEXPORT wlp_descr_t memeat_params[] = {
    { WLP_INTEGER, WLPF_NO_FLAGS,
      WLP_NO_RANGE(), WLP_NO_DEFAULT(),
      "pages_to_alloc",
      "Number of pages to allocate",
      offsetof(struct memeat_workload, pages_to_alloc) },
    { WLP_INTEGER, WLPF_REQUEST,
      WLP_NO_RANGE(), WLP_NO_DEFAULT(),
      "pages_to_touch",
      "Number of pages to touch per request",
      offsetof(struct memeat_request, pages_to_touch) },
    { WLP_NULL }
};

MODEXPORT int memeat_wl_config(workload_t* wl) {
    struct memeat_workload* mw = (struct memeat_workload*) wl->wl_params;
    wl->wl_private = malloc(mw->pages_to_alloc * hi_get_pagesize());
    return 0;
}

MODEXPORT int memeat_wl_unconfig(workload_t* wl) {
    free(wl->wl_private);
    return 0;
}

MODEXPORT int memeat_run_request(request_t* rq) {
    char* ptr = rq->rq_workload->wl_private;
    struct memeat_workload* mw = (struct memeat_workload*) rq->rq_workload->wl_params;
    struct memeat_request* mrq = (struct memeat_request*) rq->rq_params;
    long page = 0;

    mrq->pages_to_touch = mrq->pages_to_touch % mw->pages_to_alloc;

    for(page = 0; page < mrq->pages_to_touch; ++page)
        ptr[page * hi_get_pagesize()] = 0;

    return 0;
}
Now you can set random inter-arrival times and a random number of pages per request, run requests in multiple threads, and configure experiment parameters using a text-based config. For example, the section that describes the workload would look like this:
{
    ...
    "workloads" : {
        "memeat" : {
            "wltype": "memeat",
            "threadpool": "tp_memeat",
            "rqsched": {
                "type": "iat",
                "distribution": "exponential"
            },
            "params" : {
                "pages_to_alloc" : 20000,
                "pages_to_touch" : {
                    "randgen": { "class": "lcg" },
                    "randvar": {
                        "class": "exponential",
                        "rate": 5e-3
                    }
                }
            }
        }
    }
    ...
}