Why Is It String.Join(List) Instead of List.Join(String)

Why is it string.join(list) instead of list.join(string)?

It's because any iterable can be joined (e.g., list, tuple, dict, set), but its contents and the "joiner" must be strings.

For example:

'_'.join(['welcome', 'to', 'stack', 'overflow'])
'_'.join(('welcome', 'to', 'stack', 'overflow'))
'welcome_to_stack_overflow'

Using something other than strings will raise the following error:

TypeError: sequence item 0: expected str instance, int found
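To make the point about iterables concrete, here is a minimal sketch. The variable names (words, numbers, and so on) are made up for illustration; only str.join itself comes from the answer above.

```python
# Any iterable of strings can be joined: list, tuple, dict (its keys), generator.
words = ['welcome', 'to', 'stack', 'overflow']

from_list = '_'.join(words)
from_tuple = '_'.join(tuple(words))
from_dict = '_'.join({'a': 1, 'b': 2})            # iterating a dict yields its keys
from_generator = '_'.join(w.upper() for w in words)

# Non-string items raise TypeError, so convert them explicitly first.
numbers = [1, 2, 3]
error_message = None
try:
    '_'.join(numbers)
except TypeError as exc:
    error_message = str(exc)

from_numbers = '_'.join(str(n) for n in numbers)

print(from_list)
print(from_dict)
print(from_generator)
print(error_message)
print(from_numbers)
```

Because the method lives on the separator string, it does not have to be reimplemented on every container type.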

Why does the string join function on a list of strings seem to exclude the last item in the list?

This is because the value you specify is the separator. It is put between the list elements. Since the last element has nothing after it, no separator is appended to it.
You can have a look at: https://www.w3schools.com/python/ref_string_join.asp for more information.

A workaround could be:

print(" ".join([e+"x" for e in a]))

What happens?

We use a space as the separator, but before it is placed between the list entries, we append an x to every element of your list a.
I use a list comprehension to do this in a single line. If this looks strange to you, consider this source: https://blog.teamtreehouse.com/python-single-line-loops.

Note that this means the space is not appended to the last item. If you want that, move the space after the x and join with an empty string:

print("".join([e+"x " for e in a]))
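The difference between the variants is easier to see with concrete data. This is a small sketch; the contents of the list a are an assumption, since the question's data is not shown here.

```python
a = ["1", "2", "3"]  # hypothetical sample data

# The separator only goes BETWEEN elements: n items, n - 1 separators.
between = " ".join(a)

# Append "x" to each element first, then separate with a space.
suffix_each = " ".join([e + "x" for e in a])

# Make the space part of each element, so the last item gets one too.
suffix_and_space = "".join([e + "x " for e in a])

print(repr(between))           # '1 2 3'
print(repr(suffix_each))       # '1x 2x 3x'
print(repr(suffix_and_space))  # '1x 2x 3x '
```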

Python string.join(list) on object array rather than string array

You could use a list comprehension or a generator expression instead:

# items is your list of objects (naming it list would shadow the built-in)
', '.join([str(x) for x in items])  # list comprehension
', '.join(str(x) for x in items)    # generator expression
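As an illustration, here is a sketch with a made-up Point class (not from the question) whose __str__ controls how each object is rendered. map(str, ...) is a third spelling that does the same conversion.

```python
class Point:
    """Hypothetical example class; any object with a useful __str__ works."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        return f"({self.x}, {self.y})"


points = [Point(1, 2), Point(3, 4)]

via_comprehension = ', '.join([str(p) for p in points])
via_generator = ', '.join(str(p) for p in points)
via_map = ', '.join(map(str, points))

print(via_generator)  # (1, 2), (3, 4)
```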

Python: why str.join(iterable) instead of str.join(*strings)

For short lists this won't matter, and wrapping the strings in brackets costs you exactly 2 characters to type. But the most common use case (I think) for str.join() is the following:

''.join(process(x) for x in some_input)
# or
result = []
for x in some_input:
    result.append(process(x))
''.join(result)

where some_input can have thousands of entries and you just want to generate the output string efficiently.

If join accepted variable arguments instead of an iterable, this would have to be spelled as:

''.join(*(process(x) for x in some_input))
# or
''.join(*result)

which would create a (possibly long) tuple, just to pass it as *args.

So that's 2 extra characters in the short case versus wasted work in the large-data case.
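The two call styles can be compared side by side. In this sketch, join_varargs is a hypothetical function written only for illustration (Python has no such API), and process and some_input are placeholders standing in for the names used above.

```python
def join_varargs(sep, *strings):
    """Hypothetical variadic join; NOT part of Python, for comparison only."""
    # By the time we get here, Python has already packed every argument
    # into the tuple `strings`.
    return sep.join(strings)


def process(x):
    """Placeholder transformation for the example."""
    return str(x * 2)


some_input = range(5)

# Real API: pass the iterable straight through.
iterable_style = ''.join(process(x) for x in some_input)

# Variadic spelling: the generator must be unpacked with * at the call site.
varargs_style = join_varargs('', *(process(x) for x in some_input))

# Short literal case: this is where the iterable API costs two brackets.
short_iterable = '-'.join(['a', 'b', 'c'])
short_varargs = join_varargs('-', 'a', 'b', 'c')

print(iterable_style, varargs_style, short_iterable, short_varargs)
```

Both styles produce the same string; the difference is only in how the arguments reach the function.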

History note

(Second Edit: based on HISTORY file which contains missing release from all releases. Thanks Don.)

The *args syntax in function definitions was added to Python a long time ago:

==> Release 0.9.8 (9 Jan 1993) <==

Case (a) was needed to accommodate variable-length argument lists;
there is now an explicit "varargs" feature (precede the last argument
with a '*'). Case (b) was needed for compatibility with old class
definitions: up to release 0.9.4 a method with more than one argument
had to be declared as "def meth(self, (arg1, arg2, ...)): ...".

The proper way to pass a list to such functions was the built-in function apply(callable, sequence). (Note that this doesn't mention **kwargs, which first appears in the docs for version 1.4.)

The ability to call a function with * syntax is first mentioned in release notes for 1.6:

There's now special syntax that you can use instead of the apply()
function. f(*args, **kwds) is equivalent to apply(f, args, kwds). You
can also use variations f(a1, a2, *args, **kwds) and you can leave one
or the other out: f(*args), f(**kwds).

But it's missing from the grammar docs until version 2.2.

Before 2.0, str.join() did not even exist and you had to do from string import join.

Why is concatenating strings with ''.join(list) so popular?

This is faster because the join method gets to dive "under the surface" and use lower-level optimizations not available from the Python layer. An explicit loop has to step through the sequence in Python and deal with each object in turn. Your loop also has to build a new string on each iteration, which is slow. join gets to work on a mutable buffer at the C layer or below.

If the objects aren't already in a list ... it depends on the application. However, I suspect that almost any such application will have to go through that loop-ish overhead somewhere just to form the list, so you'd lose some of the advantage of join, although the mutable string would still save time.
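A rough way to check this on your own machine is a timeit comparison. This is only a sketch: the function names and input size are made up, and the measured gap varies by interpreter and version (CPython, for instance, can sometimes extend a string in place on +=, which narrows the difference).

```python
import timeit


def concat_loop(parts):
    """Build the result with repeated += in a Python-level loop."""
    result = ''
    for part in parts:
        result += part
    return result


def concat_join(parts):
    """Build the result with a single str.join call."""
    return ''.join(parts)


parts = [str(i) for i in range(10_000)]  # arbitrary sample size

# Both approaches must produce the same string.
same_result = concat_loop(parts) == concat_join(parts)

loop_time = timeit.timeit(lambda: concat_loop(parts), number=20)
join_time = timeit.timeit(lambda: concat_join(parts), number=20)

print(f"same result: {same_result}")
print(f"loop: {loop_time:.4f}s  join: {join_time:.4f}s")
```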

string Join List and items

You can concatenate all strings using Concat (without changing the original list!) and Join that enumerable:

List<string> list = new List<string>() { "item1", "item2" };
string item3 = "item3";

string result = string.Join(",", list.Concat(new string[] { item3 }));
// result = item1,item2,item3

The problem with your current code is that it calls string.Join(string, params object[]): it treats list as a single object, not as a list of objects.

Join list column with string column in PySpark

It can be done without UDF. First explode the array, then join and group.

Input data:

from pyspark.sql import functions as F

# Assumes an active SparkSession named spark.
df_emp = spark.createDataFrame(
    [(1, 'aaa'),
     (2, 'bbb'),
     (3, 'ccc'),
     (4, 'ddd')],
    ['id', 'Name']
)
df_dept = spark.createDataFrame(
    [(1, 'DE', [1, 2]),
     (2, 'DA', [3, 4])],
    ['dept_id', 'dept_name', 'employees']
)

Script:

# One row per (department, employee id).
df_dept_exploded = df_dept.withColumn('id', F.explode('employees'))
# Attach the employee name to each exploded row.
df_joined = df_dept_exploded.join(df_emp, 'id', 'left')
# Collapse back to one row per department.
df = (
    df_joined
    .groupBy('dept_name')
    .agg(
        F.collect_list('id').alias('employees'),
        F.collect_list('Name').alias('employee_names')
    )
)
df.show()
# +---------+---------+--------------+
# |dept_name|employees|employee_names|
# +---------+---------+--------------+
# |       DE|   [1, 2]|    [aaa, bbb]|
# |       DA|   [3, 4]|    [ccc, ddd]|
# +---------+---------+--------------+

