I found it very inconvenient to mess around with gdb or OpenOCD just to load and start the Linux kernel every time I needed it, so I took an ELF loader written in C, built a fairly minimal framework for it, and created a RAM module to hold the loader's code, data and stack.
In my system both the FPGA bitstream and the Linux kernel are stored in SPI flash: the bitstream is at the beginning and the kernel is at offset 0x180000. The kernel is a plain vmlinux, i.e. an uncompressed ELF file.
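
To give a rough idea of what loading an uncompressed ELF involves, here is a minimal sketch in C. It is not the loader used in this project, only an illustration under some assumptions: the image is a 32-bit ELF whose byte order matches the CPU, the flash contents can be read through an ordinary pointer (memory-mapped, or already copied into a buffer), and segments are placed at their physical addresses (`p_paddr`). The names `elf32_load`, `elf32_ehdr_t`, `elf32_phdr_t` and `KERNEL_FLASH_OFFSET` are made up for this example.

```c
#include <stdint.h>
#include <string.h>

/* Minimal ELF32 definitions (a subset of the ELF specification), kept local
 * so the sketch does not depend on a host <elf.h>. */
#define EI_NIDENT 16
#define PT_LOAD   1u

/* Hypothetical name; the value is the kernel offset mentioned in the text. */
#define KERNEL_FLASH_OFFSET 0x180000u

typedef struct {
    uint8_t  e_ident[EI_NIDENT];
    uint16_t e_type, e_machine;
    uint32_t e_version, e_entry, e_phoff, e_shoff, e_flags;
    uint16_t e_ehsize, e_phentsize, e_phnum;
    uint16_t e_shentsize, e_shnum, e_shstrndx;
} elf32_ehdr_t;

typedef struct {
    uint32_t p_type, p_offset, p_vaddr, p_paddr;
    uint32_t p_filesz, p_memsz, p_flags, p_align;
} elf32_phdr_t;

/* Copy every PT_LOAD segment of an ELF32 image into RAM.
 *   image    - start of the ELF file
 *   ram      - pointer through which RAM is written
 *   ram_base - physical address that ram[0] corresponds to
 *   ram_size - size of the RAM window in bytes
 * Returns 0 and stores the entry point in *entry, or -1 if the image is
 * malformed or a segment does not fit into RAM. */
int elf32_load(const uint8_t *image, uint8_t *ram,
               uint32_t ram_base, uint32_t ram_size, uint32_t *entry)
{
    elf32_ehdr_t eh;
    memcpy(&eh, image, sizeof eh);          /* memcpy avoids unaligned reads */

    /* Check the magic number and the ELFCLASS32 marker. */
    if (memcmp(eh.e_ident, "\177ELF", 4) != 0 || eh.e_ident[4] != 1)
        return -1;
    if (eh.e_phentsize != sizeof(elf32_phdr_t))
        return -1;

    for (uint16_t i = 0; i < eh.e_phnum; i++) {
        elf32_phdr_t ph;
        memcpy(&ph, image + eh.e_phoff + (uint32_t)i * eh.e_phentsize,
               sizeof ph);
        if (ph.p_type != PT_LOAD)
            continue;                       /* only loadable segments matter */

        /* Reject segments that are inconsistent or fall outside RAM. */
        if (ph.p_filesz > ph.p_memsz || ph.p_paddr < ram_base ||
            ph.p_memsz > ram_size ||
            ph.p_paddr - ram_base > ram_size - ph.p_memsz)
            return -1;

        uint8_t *dst = ram + (ph.p_paddr - ram_base);
        memcpy(dst, image + ph.p_offset, ph.p_filesz);
        /* The part beyond p_filesz (typically .bss) must be zeroed. */
        memset(dst + ph.p_filesz, 0, ph.p_memsz - ph.p_filesz);
    }

    *entry = eh.e_entry;
    return 0;
}
```

On a target like the one described here, `image` would point at the flash contents at `KERNEL_FLASH_OFFSET`, and after a successful return the loader would jump to the address stored in `*entry`. How the flash is actually read and how the jump is made depend on the CPU and the SPI controller, so those parts are left out.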