Why is it string.join(list) instead of list.join(string)?
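It's because any iterable can be joined (e.g., list, tuple, dict, set), but its contents and the "joiner" must be strings. Making join a method of the separator string means it works with every iterable, instead of each container type needing its own join method.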
For example:
'_'.join(['welcome', 'to', 'stack', 'overflow'])
'_'.join(('welcome', 'to', 'stack', 'overflow'))
'welcome_to_stack_overflow'
Using something other than strings will raise the following error:
TypeError: sequence item 0: expected str instance, int found
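As a minimal sketch of the point above, the same separator can join several kinds of iterables, and a non-string item triggers the TypeError. The variable names here are only for illustration:

```python
separator = "_"
words = ["welcome", "to", "stack", "overflow"]

# The same call works for a list, a tuple, or a generator.
from_list = separator.join(words)
from_tuple = separator.join(tuple(words))
from_generator = separator.join(w for w in words)

# Iterating over a dict yields its keys, so the keys are what get joined.
from_dict = separator.join({"welcome": 1, "to": 2})

# Items that are not strings raise TypeError.
try:
    separator.join([1, 2, 3])
except TypeError as exc:
    error_message = str(exc)

print(from_list)
print(from_dict)
print(error_message)
```

Sets can be joined too, but their iteration order is not guaranteed, so the order of the joined result is not predictable.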
Why does string join function on a list of strings seems to exclude last item in list?
This is because the value you specify is the separator. It is put between the list elements. Since nothing follows the last element, no separator is appended after it.
You can have a look at: https://www.w3schools.com/python/ref_string_join.asp for more information.
A workaround could be:
print(" ".join([e+"x" for e in a]))
What happens?
We use a space as the separator, but before the separator is put between the list entries, we add an x to every element in your list a.
I use a list comprehension (an inline for loop) to do it in a single line. If this looks strange to you, consider this source: https://blog.teamtreehouse.com/python-single-line-loops.
Note that this means the whitespace is not appended to the last item. If you want to achieve that, you need to move the whitespace after the x and join with an empty string:
print("".join([e+"x " for e in a]))
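To make the difference concrete, here is a small sketch with a hypothetical sample list a (the original list is not shown in the question, so the values are an assumption):

```python
a = ["1", "2", "3"]  # hypothetical sample data

# The separator only goes between items, never after the last one.
plain = " ".join(a)

# Add "x" to every item first, then separate the results with spaces.
suffixed = " ".join(e + "x" for e in a)

# Put the space inside each item and join with an empty separator,
# which leaves a trailing space after the last item.
trailing = "".join(e + "x " for e in a)

print(repr(plain))
print(repr(suffixed))
print(repr(trailing))
```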
Python string.join(list) on object array rather than string array
You could use a list comprehension or a generator expression instead:
', '.join([str(x) for x in my_list]) # list comprehension
', '.join(str(x) for x in my_list) # generator expression
# my_list stands for your own list; avoid naming it list, which shadows the built-in type
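A short sketch of this with a made-up class (Point is hypothetical, not from the question). Each object is converted with str() before joining; map(str, ...) is an equivalent spelling of the generator expression:

```python
class Point:
    """Hypothetical example class with a custom string representation."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        return f"({self.x}, {self.y})"


points = [Point(1, 2), Point(3, 4)]

# join() needs strings, so convert each object first.
with_generator = ", ".join(str(p) for p in points)
with_map = ", ".join(map(str, points))

print(with_generator)
```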
Python: why str.join(iterable) instead of str.join(*strings)
For short lists this won't matter and it costs you exactly 2 characters to type. But the most common use case (I think) for str.join() is the following:
''.join(process(x) for x in some_input)
# or
result = []
for x in some_input:
result.append(process(x))
''.join(result)
where some_input can have thousands of entries and you just want to generate the output string efficiently.
If join accepted variable arguments instead of an iterable, this would have to be spelled as:
''.join(*(process(x) for x in some_input))
# or
''.join(*result)
which would create a (possibly long) tuple, just to pass it as *args.
So that's 2 characters in the short case vs. being wasteful in the large-data case.
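The trade-off can be sketched with a hypothetical join_varargs function that mimics what a variable-argument join would look like. Neither join_varargs nor process exists in the standard library; both are stand-ins for illustration:

```python
def join_varargs(separator, *strings):
    """Hypothetical varargs-style join; not part of Python."""
    # By the time the body runs, every argument has already been
    # collected into one tuple named strings.
    return separator.join(strings)


def process(x):
    """Stand-in for whatever per-item processing is needed."""
    return str(x).upper()


some_input = ["a", "b", "c"]

# Real str.join: consumes the generator directly.
iterable_style = "".join(process(x) for x in some_input)

# Varargs style: the generator must be unpacked into a tuple at the call site.
varargs_style = join_varargs("", *(process(x) for x in some_input))

print(iterable_style, varargs_style)
```

Both calls produce the same string; the difference is only in the intermediate tuple that the second call has to build.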
History note
(Second Edit: based on HISTORY file which contains missing release from all releases. Thanks Don.)
The *args syntax in function definitions was added to Python a long time ago:
==> Release 0.9.8 (9 Jan 1993) <==
Case (a) was needed to accommodate variable-length argument lists;
there is now an explicit "varargs" feature (precede the last argument
with a '*'). Case (b) was needed for compatibility with old class
definitions: up to release 0.9.4 a method with more than one argument
had to be declared as "def meth(self, (arg1, arg2, ...)): ...".
The proper way to pass a list to such functions was the built-in function apply(callable, sequence). (Note that this doesn't mention **kwargs, which can first be seen in the docs for version 1.4.)
The ability to call a function with the * syntax is first mentioned in the release notes for 1.6:
There's now special syntax that you can use instead of the apply()
function. f(*args, **kwds) is equivalent to apply(f, args, kwds). You
can also use variations f(a1, a2, *args, **kwds) and you can leave one
or the other out: f(*args), f(**kwds).
But it's missing from grammar docs until version 2.2.
Before 2.0, str.join() did not even exist and you had to do from string import join.
Why is concatenating strings with ''.join(list) so popular?
This is faster because the join method gets to dive "under the surface" and use lower-level optimizations not available from the Python layer. The loop has to plod through the sequence and deal with each object in turn. Also, because Python strings are immutable, your loop has to build a new string on each iteration, a slow process. join gets to use a mutable buffer on the C layer or below.
If the objects aren't already in a list ... it depends on the application. However, I suspect that almost any such application will have to go through that loop-ish overhead somewhere just to form the list, so you'd lose some of the advantage of join, although the mutable buffer would still save time.
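One rough way to check this on your own machine is a timeit comparison like the sketch below. The function names and the input size are arbitrary choices, and the numbers vary by interpreter and version (CPython can sometimes optimize += on strings in place), so treat the result as indicative only:

```python
import timeit


def concat_loop(parts):
    """Build the result by repeated concatenation."""
    result = ""
    for part in parts:
        result += part  # may create a new string on each iteration
    return result


def concat_join(parts):
    """Build the result with a single join call."""
    return "".join(parts)


parts = [str(i) for i in range(10_000)]

# Time 20 runs of each approach.
loop_seconds = timeit.timeit(lambda: concat_loop(parts), number=20)
join_seconds = timeit.timeit(lambda: concat_join(parts), number=20)

print(f"loop: {loop_seconds:.4f}s  join: {join_seconds:.4f}s")
```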
string Join List and items
You can concatenate all strings using Concat (without changing the original list!) and Join that enumerable:
List<string> list = new List<string>() { "item1", "item2" };
string item3 = "item3";
string result = string.Join(",", list.Concat(new string[] { item3 }));
// result = item1,item2,item3
The problem with your current code is that it calls string.Join(string, params object[]): it will treat list as a single object, not a list of objects.
Join list column with string column in PySpark
It can be done without a UDF. First explode the array, then join and group.
Input data:
from pyspark.sql import functions as F
df_emp = spark.createDataFrame(
[(1, 'aaa'),
(2, 'bbb'),
(3, 'ccc'),
(4, 'ddd')],
['id', 'Name']
)
df_dept = spark.createDataFrame(
[(1, 'DE', [1, 2]),
(2, 'DA', [3, 4])],
['dept_id', 'dept_name', 'employees']
)
Script:
df_dept_exploded = df_dept.withColumn('id', F.explode('employees'))
df_joined = df_dept_exploded.join(df_emp, 'id', 'left')
df = (
df_joined
.groupBy('dept_name')
.agg(
F.collect_list('id').alias('employees'),
F.collect_list('Name').alias('employee_names')
)
)
df.show()
# +---------+---------+--------------+
# |dept_name|employees|employee_names|
# +---------+---------+--------------+
# | DE| [1, 2]| [aaa, bbb]|
# | DA| [3, 4]| [ccc, ddd]|
# +---------+---------+--------------+