EDA of Terrorist Attacks Around the World
In a world where news headlines are dominated by acts of terror, it’s crucial to examine the patterns and trends underlying these horrific events.
In this article, we use data science to examine global terrorism. We work through the Global Terrorism Dataset with Python and Jupyter Notebook, conducting an Exploratory Data Analysis (EDA) that looks at the who, what, where, and when of terrorist attacks around the world. The goal is to surface the patterns behind these events and the factors associated with them, so that this ever-present threat can be better understood and ultimately countered.
Imports
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
This code imports four Python packages: NumPy, Seaborn, Pandas, and Matplotlib.pyplot.
import numpy as np — This line imports the NumPy package and renames it as np. NumPy is a package for scientific computing with Python, and it provides support for arrays and matrices, as well as mathematical functions to manipulate these arrays.
import seaborn as sns — This line imports the Seaborn package and renames it as sns. Seaborn is a data visualization library based on Matplotlib, which provides additional functionality and a more user-friendly interface.
import pandas as pd — This line imports the Pandas package and renames it as pd. Pandas is a data manipulation library that provides tools to read, write, and analyze data in various formats, including CSV, Excel, SQL, and more.
import matplotlib.pyplot as plt — This line imports the Pyplot module from the Matplotlib package and renames it as plt. Matplotlib is a data visualization library for Python, and Pyplot is a module within Matplotlib that provides a MATLAB-like interface for creating plots and charts.
In summary, this code imports several packages that will be useful for data analysis and visualization.
Import Data
Terror=pd.read_csv('/kaggle/input/global-terrorism-dataset/globalterrorismdb_0718dist.csv' , on_bad_lines='skip' , encoding='latin')
pd.set_option("display.max_columns", 500)  # show up to 500 columns
This code snippet reads a CSV file containing the Global Terrorism Dataset and loads it into a Pandas DataFrame called ‘Terror’. It also sets the option to display up to 500 columns when viewing the DataFrame, allowing you to see more columns at once. Let’s break down each part of the code:
Terror=pd.read_csv('/kaggle/input/global-terrorism-dataset/globalterrorismdb_0718dist.csv', on_bad_lines='skip', encoding='latin')
pd.read_csv()
: This function is part of the Pandas library, which is commonly used for data manipulation and analysis in Python. The function reads a CSV (Comma Separated Values) file and converts it into a DataFrame object.
/kaggle/input/global-terrorism-dataset/globalterrorismdb_0718dist.csv
: This is the file path of the CSV file containing the Global Terrorism Dataset. It is located within the Kaggle platform, which is a popular data science and machine learning platform for sharing datasets and code.
on_bad_lines='skip'
: This is an optional parameter (available from Pandas 1.3) that tells the pd.read_csv() function how to handle lines in the CSV file that cannot be parsed. In this case, it is set to 'skip', which means any problematic lines will be skipped, and the function will continue reading the rest of the file.
encoding='latin'
: This is another optional parameter that specifies the character encoding to be used when reading the CSV file. The 'latin' encoding (also known as ISO-8859-1 or Latin-1) is used here because the file contains bytes that are not valid UTF-8, the encoding pd.read_csv() assumes by default. Reading the file as Latin-1 avoids the decoding errors that UTF-8 would raise.
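To see both options at work, here is a minimal sketch on a made-up CSV held in memory rather than the real dataset. The sample rows and the names raw and df are assumptions for illustration only.

```python
import io
import pandas as pd

# Hypothetical CSV bytes, encoded as Latin-1. The second data row has one
# field too many, so the parser treats it as a bad line.
raw = "city,nkill\nSan José,2\nBogotá,1,extra\nLima,0\n".encode("latin-1")

# The accented characters are single bytes that are not valid UTF-8.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Same options as in the article: decode as Latin-1 and skip unparseable lines.
df = pd.read_csv(io.BytesIO(raw), encoding="latin", on_bad_lines="skip")
print(utf8_ok)
print(df)
```

Skipping bad lines is convenient, but it silently drops rows, so it can be worth comparing the row count against the source file when accuracy matters.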
pd.set_option("display.max_columns",500)
pd.set_option()
: This function is part of the Pandas library and is used to set the value of a specified option. In this case, the option being set is "display.max_columns".
"display.max_columns"
: This is the option key that specifies the maximum number of columns to be displayed when a Pandas DataFrame is printed or viewed.
500
: This is the value assigned to the "display.max_columns" option. By setting it to 500, you will be able to view up to 500 columns at once when exploring the DataFrame. This can be helpful when dealing with large datasets that have many columns, as it allows you to see more of the data at once instead of having the middle columns hidden from the output.
Terror.shape
This code returns the shape of the Terror DataFrame, which is the number of rows and columns in the DataFrame.
Terror.shape
— This line returns a tuple containing two values: the number of rows and the number of columns in the Terror DataFrame. The output will be in the format (number of rows, number of columns). For example, if there are 10,000 rows and 30 columns, the output would be (10000, 30).
Terror.head(1)
Column Extraction
cols=['iyear','imonth','iday','country_txt','city','latitude',
'longitude','location','attacktype1_txt','targtype1_txt','targsubtype1_txt','target1',
'gname','motive','weaptype1_txt','dbsource','region_txt','nkill','nwound','natlty1_txt','weapdetail'
]
This code creates a list called cols that contains the names of 21 columns that are of interest from the Terror DataFrame.
cols=['iyear','imonth','iday','country_txt','city','latitude','longitude','location','attacktype1_txt','targtype1_txt','targsubtype1_txt','target1','gname','motive','weaptype1_txt','dbsource','region_txt','nkill','nwound','natlty1_txt','weapdetail']
— This line creates a list called cols and assigns it a sequence of 21 strings, each of which corresponds to a column name in the Terror DataFrame. These columns contain information about the year, month, and day of the attack, the location and target of the attack, the type of attack, the name of the group responsible, the number of deaths and injuries, and other details about the weapons and motive used in the attack.
Terror_clean=pd.DataFrame(data=Terror , columns=cols)
This code creates a new DataFrame called Terror_clean by selecting a subset of columns from the Terror DataFrame.
Terror_clean=pd.DataFrame(data=Terror , columns=cols)
— This line creates a new DataFrame called Terror_clean by passing the Terror DataFrame as the data argument to the pd.DataFrame() function. The columns argument is set to cols, which is a list of column names that we want to include in the new DataFrame. The resulting Terror_clean DataFrame will only contain the columns specified in the cols list.
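A common alternative for the same step is bracket selection with a list of names. The sketch below uses a tiny made-up stand-in for Terror (the values and the extra column are assumptions) and also shows one practical difference: the constructor fills a misspelled column name with missing values, while bracket selection raises an error.

```python
import pandas as pd

# Hypothetical miniature stand-in for the Terror DataFrame.
Terror = pd.DataFrame({"iyear": [1970, 1971], "city": ["Cairo", "Lima"], "extra": [1, 2]})
cols = ["iyear", "city"]

# Bracket selection; .copy() makes the result independent of Terror.
Terror_clean = Terror[cols].copy()

# The constructor form reindexes, so an unknown name becomes an all-NaN column.
reindexed = pd.DataFrame(data=Terror, columns=["iyear", "no_such_column"])

# Bracket selection reports the unknown name instead.
try:
    Terror[["iyear", "no_such_column"]]
    raised = False
except KeyError:
    raised = True

print(Terror_clean.shape, reindexed["no_such_column"].isna().all(), raised)
```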
Terror_clean.shape
Terror_clean.isnull().sum()
This code returns the number of missing values in each column of the Terror_clean DataFrame.
Terror_clean.isnull().sum()
— This line first calls the isnull() method on the Terror_clean DataFrame, which returns a DataFrame of the same shape as Terror_clean, with the value True in cells where there is a missing value and False otherwise. The sum() method is then called on this DataFrame. Because True counts as 1 and False as 0, the column-wise sum is the number of missing values in each column. The output is a Series indexed by column name, for example: {'iyear': 0, 'imonth': 0, 'iday': 0, 'country_txt': 0, 'city': 50, 'latitude': 447, 'longitude': 447, 'location': 69429, 'attacktype1_txt': 0, 'targtype1_txt': 0, 'targsubtype1_txt': 10373, 'target1': 636, 'gname': 0, 'motive': 123491, 'weaptype1_txt': 0, 'dbsource': 0, 'region_txt': 0, 'nkill': 10313, 'nwound': 16311, 'natlty1_txt': 1559, 'weapdetail': 67670}.
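Raw counts are easier to judge next to the share of rows they represent. The following sketch, on a four-row made-up DataFrame (the values are assumptions), puts both side by side. The mean of a boolean column is the fraction of True values, which is what makes the percentage a one-liner.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with some missing entries.
df = pd.DataFrame({
    "city": ["Cairo", None, "Lima", "Oslo"],
    "nkill": [1.0, np.nan, np.nan, 0.0],
})

missing = df.isnull().sum()                        # count of missing values per column
percent = df.isnull().mean().mul(100).round(1)     # share of rows missing, in percent
summary = pd.DataFrame({"missing": missing, "percent": percent})
print(summary)
```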
Terror_clean.head(2)
Terror_clean.rename(columns={'iyear':'year' , 'imonth':'month' ,'iday':'day','country_txt':'country name','attacktype1_txt':'attacktype','targtype1_txt':'targtype','targsubtype2':'targsubtype2',
'gname':'group_name','weaptype1_txt':'weaptype','dbsource':'source','region_txt':'region1','nkill':'Killed','nwound':'Wounded','weapdetail':'weapon_detail','natlty1_txt':'nationality'
} , inplace=True)
This code renames several columns in the Terror_clean DataFrame using a dictionary.
Terror_clean.rename(columns={'iyear':'year' , 'imonth':'month' ,'iday':'day','country_txt':'country name','attacktype1_txt':'attacktype','targtype1_txt':'targtype','targsubtype2':'targsubtype2', 'gname':'group_name','weaptype1_txt':'weaptype','dbsource':'source','region_txt':'region1','nkill':'Killed','nwound':'Wounded','weapdetail':'weapon_detail','natlty1_txt':'nationality'} , inplace=True)
— This line calls the rename() method on the Terror_clean DataFrame to rename several columns. The new names are specified using a dictionary where the keys are the old column names and the values are the new column names. Keys that do not match an existing column are ignored by default, so the 'targsubtype2':'targsubtype2' entry has no effect and the targsubtype1_txt column keeps its original name.
The inplace=True parameter means that the original Terror_clean DataFrame is modified in place rather than creating a new DataFrame.
For example, the iyear column is renamed to year, the imonth column is renamed to month, the iday column is renamed to day, and so on.
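Because rename() ignores unknown keys by default, a typo in the mapping goes unnoticed. A small sketch on a one-row made-up DataFrame (the values are assumptions) shows the default behavior and the errors="raise" option that turns a mismatched key into an error.

```python
import pandas as pd

# Hypothetical one-row frame with two of the original column names.
df = pd.DataFrame({"iyear": [1970], "targsubtype1_txt": ["Police"]})

# Default: the key that matches no column is skipped silently.
renamed = df.rename(columns={"iyear": "year", "targsubtype2": "targsubtype2"})

# errors="raise" makes the mismatch visible.
try:
    df.rename(columns={"targsubtype2": "targsubtype"}, errors="raise")
    raised = False
except KeyError:
    raised = True

print(list(renamed.columns), raised)
```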
pd.set_option("display.max_columns", 500)  # show up to 500 columns
This code repeats the display option set earlier, so that up to 500 columns are shown when printing a DataFrame.
pd.set_option("display.max_columns", 500)
— By default, Pandas shows only a subset of the columns when a DataFrame has more than a certain number of them. With the limit raised to 500, every column of Terror_clean is displayed.
Terror_clean.head(1)
EXTRACTING FEATURE FOR EDA
Terror_clean['casualities']=Terror_clean['Wounded']+Terror_clean['Killed']
Terror_clean['casualities']
— This code creates a new column called casualities in the Terror_clean DataFrame. The square brackets [] are normally used to select a column, but when the name on the left-hand side of an assignment does not yet exist, Pandas creates a new column with that name.
=Terror_clean['Wounded']+Terror_clean['Killed']
— This code calculates the values for the casualities column. It does this by adding the values in the Wounded column and the Killed column for each row in the Terror_clean DataFrame. This operation returns a Pandas Series containing the sum of the Wounded and Killed values for each row, which is then assigned as the values for the casualities column. If either Wounded or Killed is missing for a row, the sum for that row is also missing (NaN).
In summary, this line of code creates a new column in the Terror_clean DataFrame that calculates the total number of casualties for each terrorist attack by adding the Wounded and Killed columns together. This new column provides a convenient way to analyze the overall impact of terrorist attacks.
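Since both Killed and Wounded contain missing values in this dataset, how the sum treats them matters. The sketch below compares three options on a three-row made-up example (the numbers are assumptions). Which one is appropriate depends on whether a missing count should be read as zero, and that is an analytical choice the article does not make.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: complete, one value missing, both values missing.
df = pd.DataFrame({"Wounded": [1.0, np.nan, np.nan], "Killed": [2.0, 3.0, np.nan]})

plain = df["Wounded"] + df["Killed"]                       # NaN if either value is missing
lenient = df["Wounded"].add(df["Killed"], fill_value=0)    # NaN only if both are missing
zeroed = df["Wounded"].fillna(0) + df["Killed"].fillna(0)  # missing treated as 0 everywhere

print(pd.DataFrame({"plain": plain, "lenient": lenient, "zeroed": zeroed}))
```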
Terror_clean.head(3)
NATIONALITY
nationality_top = Terror_clean[Terror_clean['nationality'] != 'Unknown']
This code creates a new DataFrame called nationality_top by selecting only the rows from the Terror_clean DataFrame where the value in the nationality column is not ‘Unknown’.
Terror_clean['nationality'] != 'Unknown'
— This expression returns a boolean mask that is True for rows where the value in the nationality column is not ‘Unknown’, and False otherwise.
Terror_clean[Terror_clean['nationality'] != 'Unknown']
— This code uses the boolean mask to select only the rows from the Terror_clean DataFrame where the nationality column is not ‘Unknown’. The resulting DataFrame is assigned to the variable nationality_top.
In summary, this line of code creates a new DataFrame called nationality_top that includes only the rows from the Terror_clean DataFrame where the value in the nationality column is known. This new DataFrame will be useful for analyzing patterns and trends related to nationality and terrorist attacks.
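One subtlety of this kind of mask is that a missing value is not equal to the string 'Unknown', so rows with a missing nationality pass the filter. The sketch below shows this on a four-row made-up column (the values are assumptions), along with one way to exclude missing entries as well.

```python
import numpy as np
import pandas as pd

# Hypothetical column containing a labelled unknown and a genuinely missing value.
df = pd.DataFrame({"nationality": ["India", "Unknown", np.nan, "Peru"]})

mask = df["nationality"] != "Unknown"          # the NaN row compares as "not equal"
kept = df[mask]

# Stricter variant: also require the value to be present.
known = df[mask & df["nationality"].notna()]

print(mask.tolist(), len(kept), known["nationality"].tolist())
```

For the counting steps in this article the difference is harmless, because value_counts() leaves missing values out by default.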
nationality_type=nationality_top['nationality'].value_counts().reset_index()
This code creates a new DataFrame called nationality_type that shows the count of each nationality in the nationality_top DataFrame.
nationality_top['nationality']
— This code selects the nationality column from the nationality_top DataFrame.
.value_counts()
— This code returns a Series containing the count of each unique value in the nationality column, sorted from most to least frequent. In other words, it counts how many times each nationality appears in the nationality_top DataFrame.
.reset_index()
— This code converts the resulting Series into a DataFrame and resets the index. In Pandas versions before 2.0, which the rename step that follows relies on, the resulting DataFrame has two columns: index and nationality. The index column contains the unique nationalities, and the nationality column contains the count of each nationality. From Pandas 2.0 onward, the same call names the two columns nationality and count instead.
nationality_type=nationality_top['nationality'].value_counts().reset_index()
— This code creates a new DataFrame called nationality_type by chaining together the methods described above. The nationality_type DataFrame contains the count of each nationality in the nationality_top DataFrame, with one row per nationality.
nationality_type.rename(columns={"index":'Nationality','nationality':'Counts'},inplace=True)
nationality_type
This code renames the columns of the nationality_type DataFrame to more descriptive names.
nationality_type.rename(columns={"index":'Nationality','nationality':'Counts'},inplace=True)
— This code uses the rename() method to rename the columns of the nationality_type DataFrame. The columns parameter is set to a dictionary with the keys ‘index’ and ‘nationality’, which correspond to the old column names, and the values ‘Nationality’ and ‘Counts’, which correspond to the new column names, respectively. The inplace=True parameter means that the original DataFrame is modified in place rather than creating a new DataFrame.
The resulting nationality_type DataFrame has two columns: Nationality and Counts. The Nationality column contains the unique nationalities in the nationality_top DataFrame, and the Counts column contains the count of each nationality.
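The column names produced by value_counts().reset_index() changed in Pandas 2.0, so a rename keyed on 'index' may not match on newer versions. One way to sidestep this, sketched here on a made-up six-row sample (the values are assumptions), is to name both columns explicitly while resetting the index.

```python
import pandas as pd

# Hypothetical stand-in for nationality_top.
nationality_top = pd.DataFrame(
    {"nationality": ["India", "Peru", "India", "Iraq", "India", "Peru"]}
)

nationality_type = (
    nationality_top["nationality"]
    .value_counts()                 # counts, most frequent first
    .rename_axis("Nationality")     # name for the column holding the unique values
    .reset_index(name="Counts")     # name for the column holding the counts
)
print(nationality_type)
```

This produces the same two-column result as the rename step, without depending on what the intermediate columns happen to be called.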
GROUP NAME
Terror_clean['nationality'].unique()
This code returns an array of unique values in the nationality column of the Terror_clean DataFrame.
Terror_clean['nationality']
— This code selects the nationality column from the Terror_clean DataFrame.
.unique()
— This code returns a NumPy array containing the unique values in the nationality column, in order of first appearance. In other words, it returns every distinct nationality that appears in the Terror_clean DataFrame.
For example, if the nationality column contains the values ['United States', 'India', 'Pakistan', 'Unknown', 'India'], the output of this code would be the array ['United States', 'India', 'Pakistan', 'Unknown'].
terr= Terror_clean[Terror_clean['group_name'] != 'Unknown']
Terror_Group_Org=terr['group_name'].value_counts().reset_index()
Terror_Group_Org.rename(columns={"index":'Group_name','group_name':'Counts'},inplace=True)
Terror_Group_Org
This code creates a new DataFrame called Terror_Group_Org that shows the count of each known terrorist group in the Terror_clean DataFrame.
terr = Terror_clean[Terror_clean['group_name'] != 'Unknown']
— This code creates a new DataFrame called terr by selecting only the rows from the Terror_clean DataFrame where the value in the group_name column is not ‘Unknown’. The resulting DataFrame includes only the terrorist attacks where the group responsible is known.
terr['group_name'].value_counts()
— This code returns a Series containing the count of each unique value in the group_name column of the terr DataFrame. In other words, it counts how many times each terrorist group appears in the terr DataFrame.
.reset_index()
— This code converts the resulting Series into a DataFrame and resets the index. This means that the resulting DataFrame will have two columns: index and group_name. The index column contains the unique terrorist group names, and the group_name column contains the count of each group.
Terror_Group_Org.rename(columns={"index":'Group_name','group_name':'Counts'},inplace=True)
— This code renames the columns of the Terror_Group_Org DataFrame to more descriptive names, using the rename() method with the columns parameter set to a dictionary with the old column names as keys and the new column names as values. The inplace=True parameter means that the original DataFrame is modified in place rather than creating a new DataFrame.
The resulting Terror_Group_Org DataFrame has two columns: Group_name and Counts. The Group_name column contains the unique terrorist groups in the terr DataFrame, and the Counts column contains the count of each group.
CITY
Terror_clean.head(3)
This code displays the first three rows of the Terror_clean DataFrame.
Terror_clean.head(3)
— This line calls the head() method on the Terror_clean DataFrame, which returns the first three rows of the DataFrame. This is a useful way to quickly inspect the structure and contents of a DataFrame.
The resulting output shows the first three rows of the Terror_clean DataFrame, with columns including year, month, day, country name, city, latitude, longitude, location, attacktype, targtype, targsubtype1_txt, target1, group_name, motive, weaptype, source, region1, Killed, Wounded, nationality, weapon_detail, and casualities.
city1= Terror_clean[Terror_clean['city'] != 'Unknown']
This code creates a new DataFrame called city1 by selecting only the rows from the Terror_clean DataFrame where the value in the city column is not ‘Unknown’.
Terror_clean['city'] != 'Unknown'
— This expression returns a boolean mask that is True for rows where the value in the city column is not ‘Unknown’, and False otherwise.
Terror_clean[Terror_clean['city'] != 'Unknown']
— This code uses the boolean mask to select only the rows from the Terror_clean DataFrame where the city column is not ‘Unknown’. The resulting DataFrame is assigned to the variable city1.
In summary, this line of code creates a new DataFrame called city1 that includes only the rows from the Terror_clean DataFrame where the value in the city column is known. This new DataFrame will be useful for analyzing patterns and trends related to cities and terrorist attacks.
city_cln=city1['city'].value_counts().reset_index()
city_cln.rename(columns={"index":'City_name','city':'Counts'},inplace=True)
city_cln
This code creates a new DataFrame called city_cln that shows the count of each city in the city1 DataFrame.
city1['city'].value_counts()
— This code returns a Series containing the count of each unique value in the city column of the city1 DataFrame. In other words, it counts how many times each city appears in the city1 DataFrame.
.reset_index()
— This code converts the resulting Series into a DataFrame and resets the index. This means that the resulting DataFrame will have two columns: index and city. The index column contains the unique city names, and the city column contains the count of each city.
city_cln.rename(columns={"index":'City_name','city':'Counts'},inplace=True)
— This code renames the columns of the city_cln DataFrame to more descriptive names, using the rename() method with the columns parameter set to a dictionary with the old column names as keys and the new column names as values. The inplace=True parameter means that the original DataFrame is modified in place rather than creating a new DataFrame.
The resulting city_cln DataFrame has two columns: City_name and Counts. The City_name column contains the unique city names in the city1 DataFrame, and the Counts column contains the count of each city.
WEAPON TYPE
wp=Terror_clean[Terror_clean['weaptype'] != 'Unknown']
weaptype_cln=wp['weaptype'].value_counts().reset_index()
weaptype_cln.rename(columns={"index":'weapon_type','weaptype':'Counts'},inplace=True)
weaptype_cln
This code creates a new DataFrame called weaptype_cln that shows the count of each weapon type in the Terror_clean DataFrame where the weapon type is known.
Terror_clean['weaptype'] != 'Unknown'
— This expression returns a boolean mask that is True for rows where the value in the weaptype column is not ‘Unknown’, and False otherwise.
Terror_clean[Terror_clean['weaptype'] != 'Unknown']
— This code uses the boolean mask to select only the rows from the Terror_clean DataFrame where the weaptype column is not ‘Unknown’. The resulting DataFrame is assigned to the variable wp.
wp['weaptype'].value_counts()
— This code returns a Series containing the count of each unique value in the weaptype column of the wp DataFrame. In other words, it counts how many times each weapon type appears in the wp DataFrame.
.reset_index()
— This code converts the resulting Series into a DataFrame and resets the index. This means that the resulting DataFrame will have two columns: index and weaptype. The index column contains the unique weapon types, and the weaptype column contains the count of each weapon type.
weaptype_cln.rename(columns={"index":'weapon_type','weaptype':'Counts'},inplace=True)
— This code renames the columns of the weaptype_cln DataFrame to more descriptive names, using the rename() method with the columns parameter set to a dictionary with the old column names as keys and the new column names as values. The inplace=True parameter means that the original DataFrame is modified in place rather than creating a new DataFrame.
The resulting weaptype_cln DataFrame has two columns: weapon_type and Counts. The weapon_type column contains the unique weapon types in the wp DataFrame, and the Counts column contains the count of each weapon type.
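The nationality, group, city, and weapon-type sections all follow the same filter, count, and rename pattern, so the steps could be wrapped in a small helper. The function count_known below is a hypothetical name introduced for this sketch, not part of the original notebook, and the sample data is made up.

```python
import pandas as pd

def count_known(df, column, label, unknown="Unknown"):
    """Count each value in `column` except `unknown`, most frequent first.

    Hypothetical helper: returns a DataFrame with columns `label` and 'Counts'.
    """
    known = df.loc[df[column] != unknown, column]
    return known.value_counts().rename_axis(label).reset_index(name="Counts")

# Made-up stand-in for Terror_clean.
sample = pd.DataFrame({
    "weaptype": ["Explosives", "Firearms", "Explosives", "Unknown",
                 "Explosives", "Firearms", "Melee"],
})

weaptype_cln = count_known(sample, "weaptype", "weapon_type")
print(weaptype_cln)
```

With such a helper, each section reduces to a single call that differs only in the column name and the label for the result.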
NATIONALITY
f, ax = plt.subplots(figsize=(15, 10))
sns.barplot(x="Counts", y="Nationality", data=nationality_type[:10],
palette="viridis", ax=ax)
ax.set_title('Nations Suffered Most of the TARGET Attacks')
This code creates a bar plot using the Seaborn library that visualizes the top 10 nationalities that have suffered the most terrorist attacks according to the nationality_type DataFrame.
f, ax = plt.subplots(figsize=(15, 10))
— This code creates a new figure and a set of axes using plt.subplots() with a specified size of (15, 10) in inches, which sets the width to 15 and the height to 10.
sns.barplot(x="Counts", y="Nationality", data=nationality_type[:10], palette="viridis", ax=ax)
— This code creates a horizontal bar plot using Seaborn’s barplot() function, which takes in several arguments:
x="Counts"
: This sets the Counts column of the nationality_type DataFrame as the x-axis variable.
y="Nationality"
: This sets the Nationality column of the nationality_type DataFrame as the y-axis variable. Because the categories are on the y-axis, the bars run horizontally.
data=nationality_type[:10]
: This sets the data to be used as the nationality_type DataFrame, but only for the first 10 rows (i.e., the top 10 nationalities with the highest count).
palette="viridis"
: This sets the color palette to be used in the plot.
ax=ax
: This draws the bars on the axes created by plt.subplots().
ax.set_title('Nations Suffered Most of the TARGET Attacks')
— This code sets the title of the plot to be “Nations Suffered Most of the TARGET Attacks”. The title is set on ax in a separate statement. Chaining .set_title() onto sns.barplot() and assigning the result to ax would leave ax holding the title's text object instead of the axes.
The resulting plot is a horizontal bar chart that shows the top 10 nationalities that have suffered the most terrorist attacks, with the number of attacks represented by the length of the bars.
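The plotting pattern can be tried without the full dataset. The sketch below uses three made-up rows and a placeholder title (both assumptions), selects a non-interactive Matplotlib backend so it runs without a display, and leaves out the palette to keep the example minimal.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical counts standing in for nationality_type.
nationality_type = pd.DataFrame(
    {"Nationality": ["A", "B", "C"], "Counts": [30, 20, 10]}
)

f, ax = plt.subplots(figsize=(6, 3))
returned = sns.barplot(x="Counts", y="Nationality", data=nationality_type, ax=ax)
ax.set_title("Top nationalities by number of attacks")

# barplot() hands back the axes it drew on, so ax remains usable afterwards.
same_axes = returned is ax

# With the categories on the y-axis, each bar's width is its count.
widths = [patch.get_width() for patch in ax.patches]
print(same_axes, widths)
```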
Group Name
f, ax = plt.subplots(figsize=(15, 10))
sns.barplot(x="Counts", y="Group_name", data=Terror_Group_Org[:10],
palette="Blues", ax=ax)
ax.set_title('Groups Responsible Behind the Terror Attacks')
This code creates a bar plot using the Seaborn library that visualizes the top 10 terrorist groups that have carried out the most attacks according to the Terror_Group_Org DataFrame.