CXPACKET means parallelism issues i.e the load has been shared between processors/cores and they're waiting on each other to finish.
The query in question takes a results set of 500,000 rows and aggregates it by four grouping columns into a smaller set of 46,000 records.
To solve this I added the query hint OPTION (MAXDOP n) to the end of a query (where n is number of processors).
Trying the query on it's own generated the CXPACKET wait types, as did any value of MAXDOP over 1.
Specifiying MAXDOP 1 let the query finish in 3 minutes without interuption.
INNER JOIN table2 ON table1.fk = table2.pk
WHERE column1 = criteria1
AND column2 = criteria2
[group by column list]
[order by column list]
OPTION (MAXDOP 1)
NB : MAXDOP is NOT a magic setting to be applied everywhere. Most queries will happily generate a parallel execution plan. Only use this to reduce the number of cores used if you have a query that will not scale or deliberately want to prevent an operation taking resources on the server.
Server level Max Degree of Parallelism setting
MSDN : Query Hints